**Reference**

HowToGuides

Manuals

LabAlumni

DataAnalysis

Advice for...

**Admin**

analysis:nsb2016:week9

~~DISCUSSION~~

Goals:

- Understand the need for statistical measures to describe patterns of spikes
- Gain familiarity with basic descriptors of spike trains: firing rate, interspike interval distribution, autocorrelation function, cross-correlation function
- Generate artificial Poisson spike trains and compare these to real data
- Develop intuitions about how Poisson spike trains can be used to test interesting hypotheses about neural coding

Resources:

- (optional background reading, a classic book) Spikes: reading the neural code
- (optional visual clarification) convolution example
- (more mathematical treatment, for reference) Gerstner & Kistler chapter on spike train statistics
- (comprehensive treatment, for reference) Kass et al. book, Gruen et al. book
- (toolbox) nSTAT Open-source neural spike train analysis toolbox (with helpful conceptual overview)

Action potentials, or *spikes*, are the main unit of information processing in the brain. Spikes enable fast yet reliable electrical communication over more than the microscopic distances to which passive conduction is limited. Although there are exceptions, spikes are generally the only aspect of neural activity that is transmitted rapidly to postsynaptic targets. At the most general level, sensory input to the brain can be thought of as changing the patterns of spikes in associated sensory nuclei and brain regions; in turn, motor output is ultimately governed by patterns of spikes in motor areas. Thus, any adaptive behavior must ultimately result from the appropriate manipulation of spike patterns. As a result, the recording and interpretation of spike data is a central tool in neuroscience.

This module introduces the most commonly used basic tools for describing spike trains. We focus first on describing spike trains from a single neuron, with a first step towards pairwise analysis at the end.

For most purposes, a spike can be considered as a unitary, punctate event that occurs at a specific time. Of course, in reality, a spike actually has a certain brief lifetime, from its generation and propagation to ultimate decay, but this is not essential to most experimental settings. Thus, a spike train can be described simply as a list of times, each indicating the time at which a given neuron emitted a spike. In statistics, this type of data is known as a *point process*: a spike can be considered a point in time.

Because a point is – at least in principle – infinitesimally small, no two spikes technically occur at **exactly** the same time. Yet, as the figure below shows, multiple presentations of the same stimulus (panel A) can evoke spike trains that are clearly similar across trials (panel B):

(from de Ruyter van Steveninck et al. 1997)

We would like a way to describe the observation that this neuron's response is similar across trials, even though the exact spike times differ between trials.

A simple idea is to count how many spikes occur in a given time interval (time bin). Then, the precise spike time is no longer relevant. Let's first plot a short snippet of an example spike train:

%% loading cd('C:\data\R042-2013-08-18'); S = LoadSpikes([]); %% plot iC = 47; t = [5801 5801.7]; Sr = restrict(S,t(1),t(2)); % get spike times plot([Sr.t{iC} Sr.t{iC}],[-1 -0.5],'Color',[0 0 0]); % note, plots all spikes in one command

To bin these spikes, we can use MATLAB's `histc()`

function. This returns a histogram of spike counts, given a vector of bin edges:

binsize = 0.1; tbin_edges = t(1):binsize:t(2); % vector of time bin edges (for histogram) tbin_centers = tbin_edges(1:end-1)+binsize/2; % vector of time bin centers (for plotting) spk_count = histc(Sr.t{iC},tbin_edges); % get spike counts for each bin spk_count = spk_count(1:end-1); % ignore spikes falling exactly on edge of last bin.

When using `histc`

, it is key to remember that element *n* of the output corresponds to the count in the interval [edge(*n*) edge(*n+1*)>. In other words, a spike occurring exactly at the time of the second edge will be assigned to output bin 2, not bin 1. As a result, `histc()`

needs a special last bin for those spikes occurring exactly at the time of the last edge. The above code ignores this last bin, so that we have a number of output spike counts that is equal to the number of bin centers, which in turn is equal to the number of bin edges minus 1.

We can now plot the spike counts as follows:

hold on; h = bar(tbin_centers,spk_count); set(h,'BarWidth',1,'EdgeColor','none','FaceColor',[1 0 0]); % reformat bar appearance yt = get(gca,'YTick'); ytl = get(gca,'YTickLabel'); set(gca,'YLim',[-1.5 10],'YTick',yt(2:end),'YTickLabel',ytl(2:end,:)); xlabel('time (s)');

This gives

☛ Verify by visual inspection that indeed the binned spike counts, shown as red bars, correspond to the number of spikes in each 100ms bin.

However, it should be clear that this count is a poor description of the underlying spike train. For instance, at time 5801.25 the spike count is zero, yet this time does not seem much different from, say, 5801.4. This occurs because we have sharp bin edges that are placed arbitrarily.

As an aside, note that `histc()`

provides a raw spike count, which is not normalized by the bin size. Thus, to convert the counts into a firing rate estimate in spikes per second (Hz), we need to divide by the bin size.

Perhaps we can mitigate the problem of arbitrary edge placement by making the bin size smaller.

☛ Reduce the bin size to 10 ms and replot.

We now have a finer estimate that tracks the spike bursts and gaps more accurately, but it is still dissatisfying to see that some bins have a count of 2 and others of 1, implying that the firing rate for some bins was twice as large as those nearby, without clear justification. Again, how the bin edges happen to align with the spikes is the problem.

☛ What happens if you reduce the bin size to 1 ms?

At some point, if you make the spike count bins small enough, there will only ever be zero or one spikes per bin. This defeats the purpose of binning in the first place: now again each spike train will be unique. What we require is a measure that avoids the arbitrariness of bin edges.

We saw in the previous section that binned spike counts run into problems both when bins are large (location of edges affect the count) and when bins are small (only one spike per bin at most, so no averaging). A way around this is to use small time bins to minimize the edge effects, combined with a certain amount of filtering (smoothing) to enable contributions from multiple spikes to a single bin.

As an example, imagine that each spike in the above spike train was replaced by a Gaussian, like this:

Then, we end up with a smoothly varying estimate of firing rate. Formally, this operation of replacing each spike (considered a Dirac delta function) with another function (in this example, a Gaussian) is known as *convolution*. Convolving a spike train is related to the idea of *impulse response* to a filter, which we encountered in an earlier module. Each spike is an impulse (input is zero everywhere except at the time of the spike) and the output is a filtered (convoluted) signal which is a smoothed version of the input.

To convolve our example spike train to obtain a smooth estimate of firing rate, we can use MATLAB's `conv2()`

function:

binsize = 0.001; % select a small bin size for good time resolution tbin_edges = t(1):binsize:t(2); tbin_centers = tbin_edges(1:end-1)+binsize/2; spk_count = histc(Sr.t{iC},tbin_edges); spk_count = spk_count(1:end-1); rw_sdf = conv2(spk_count,rectwin(50),'same'); % convolve with rectangular window plot(tbin_centers,rw_sdf,'b'); gau_sdf = conv2(spk_count,gausswin(50),'same'); % convolve with gaussian window plot(tbin_centers,gau_sdf,'g');

(The `same`

option specified in the call to `conv2()`

ensures that the output is the same size as the input.)

When plotted on top of the earlier binned spike train, you should get

Note how convolving with a rectangular function (in blue) results in an estimate that is jagged, but avoids the arbitrary bin edges imposed by the original spike count. For this estimate, each spike is effectively replaced with a rectangle centered at the spike. Convolving with a Gaussian (in green) avoids the jagged edges to make a smooth estimate.

Convolved spike trains are often referred to as *spike density functions* (SDFs) because they are effectively a continuous estimate of firing rate, i.e. describing the density of spikes over time. Of course, the SDFs we computed here are still discretized, but the resolution could be set to something arbitrarily fine by adjusting the bin size.

One final adjustment we need to make is to ensure we actually end up with a firing rate estimate in spikes per second (Hz). Because it is helpful to be able to state the convolution used in a succinct manner, it is best to use the `gausskernel()`

function, which allows us to specify not only the width of the convolution window, but also the standard deviation of the Gaussian:

binsize = 0.001; % in seconds, so everything else should be seconds too gauss_window = 1./binsize; % 1 second window gauss_SD = 0.02./binsize; % 0.02 seconds (20ms) SD gk = gausskernel(gauss_window,gauss_SD); gk = gk./binsize; % normalize by binsize gau_sdf = conv2(spk_count,gk,'same'); % convolve with gaussian window plot(tbin_centers,gau_sdf,'g');

Note that we need to normalize by the binsize (1ms) because our firing rate estimate should be independent of the bin size chosen.

We thus obtain a SDF firing rate estimate with the correct units:

Of course, there is no single correct choice of the SD of the Gaussian kernel used for the convolution. This should be chosen by comparison with the raw spike train: in this case, we don't want to choose a Gaussian that is so wide that the modulation in firing rate is no longer visible.

It is also worth noting that convolution with a Gaussian to obtain a firing rate estimate has limitations. For instance, convolving spikes evoked by stimulus presentation might result in an increase in the firing rate estimate before the stimulus is presented. This can be undesirable and should not be interpreted as evidence of “stimulus anticipation” when in fact it simply results from the convolution! To minimize these sorts of issues, a variety of adaptive firing rate estimates have been developed, such as Bayesian Adaptive Regression Splines (BARS).

If you compute SDFs frequently, it makes sense to put the above into a function, which takes a `ts`

object as input (along with binsize, windowsize and SD parameters), and returns a `tsd`

firing rate estimate.

PETHs or PSTHs show the average firing rate around a time of interest, usually a task event such as the presentation of a stimulus. Once you have a firing rate estimate for the entire session, it is relatively straightforward to average this estimate over a fixed window around each event.

A different aspect of single neuron spike trains is the *regularity* of their spike timing. In principle, an average firing rate of say, 10 Hz could come about in a very regular manner (one spike exactly every 0.1s) or a more irregular manner (say, between 0.05 and 0.15s, with an average of 0.1). This idea is illustrated in the following figure, from a famous paper:

Panels A-C show simulated voltage traces (the membrane potential of a single model neuron), with the trace in panel A displaying very irregular firing, C very regular firing, and B something in between. How can we capture such differences?

A key difference between panels A-C above is the distribution of *interspike intervals* (ISIs). For a very regular spike train, there is a relatively stereotyped time that elapses before the next spike; for an irregular spike train, the interspike intervals are much more variable. Let's plot the ISI histogram for our own example neuron:

iC = 47; spk_t = S.t{iC}; % spike times isi = diff(spk_t); % interspike intervals dt = 0.001; % in s, because spike times are in s isi_edges = 0:dt:0.25; % bin edges for ISI histogram isi_centers = isi_edges(1:end-1)+dt/2; % for plotting isih = histc(isi,isi_edges); bar(isi_centers,isih(1:end-1)); % remember to ignore the last bin of histc() output set(gca,'FontSize',20,'XLim',[0 0.25]); xlabel('ISI (s)'); ylabel('count'); grid on;

You should get

This histogram, which simply counts how many interspike intervals there are in this spike train for every interval from 0-250ms, shows that the most common interspike intervals are around 5ms. There is a hint of an increase in interspike intervals between 100-150ms, in the theta frequency range (7-9 Hz) typical for a hippocampal “place cell”. This ISI distribution can also be seen in the spike train snippet we used for computing firing rates, above: the short ISIs occur in bursts, and then there is a longer interval (in the theta range) until the next burst.

Clearly, this cell does not emit spikes at random intervals; instead, there is structure in the spike train, where certain ISIs are more or less likely than their neighbors.

A basic way to quantify the regularity of a spike train is to use the *coefficient of variation* $C_v$, defined as the ISI standard deviation divided by the ISI mean. If the spike train is perfectly regular, this will be 0; if it is 1, the ISIs are “Poisson-distributed” (more on this below). If it is > 1, the spike train is more irregular (e.g. bursting).

Further indications of structure in spike trains can be found in the *ISI return plot*, a scatterplot of each interspike interval against the next.

☛ Using the `isi`

variable created above, plot each ISI against its successor. Set the axes to run from 0 to 0.25 seconds.

You should get something like this:

This plot shows that a short ISI (in the 5ms range) tends to be followed either by another short ISI, or by a long one in the theta range. It however seems rare for a theta-range ISI to be followed by another theta-range ISI, again in accordance with our visual impression of the spike train.

A useful tool for many spike train analysis questions is to make a comparison with a random, synthetic spike train. Such comparisons can bring into focus salient aspects of real spike trains, and can be used for tests of significance.

A simple way of generating a random spike train is to essentially flip a coin at each time step, scoring a spike for heads, and and no spike for tails:

dt = 0.001; t = [0 10]; % time interval (length) of spike train to generate tvec = t(1):dt:t(2); pspike = 0.5; % probability of generating a spike in bin rng default; % reset random number generator to reproducible state spk_poiss = rand(size(tvec)); % random numbers between 0 and 1 spk_poiss_idx = find(spk_poiss < pspike); % index of bins with spike spk_poiss_t = tvec(spk_poiss_idx)'; % use idxs to get corresponding spike time line([spk_poiss_t spk_poiss_t],[-1 -0.5],'Color',[0 0 0]); % note, plots all spikes in one command axis([0 0.1 -1.5 5]); set(gca,'YTick',[]);

This gives:

This spike train looks different from our real one.

☛ Plot its ISI histogram. Some differences are apparent; for instance, this synthetic spike train clearly has much shorter ISIs overall.

This difference occurs because the mean firing rates between the two spike trains are not equal. Our real spike train has a mean firing rate of about 0.47 spikes/s (as `1./mean(diff(spk_t))`

will verify). In contrast, our synthetic spike train has a theoretical mean firing rate of 500 spikes/s (50% probability of a spike in each 1ms bin).

☛ What should the probability per 1ms bin of a spike be, in order for our synthetic spike train to have a mean firing rate of 0.47 spikes/s? Recompute the ISI histogram for a spike train generated with this probability instead of 0.5. You will need to increase the length of the spike train to 4500s in order to get a number of spikes which is similar to that in the real spike train.

You should get something like:

The number of “short” ISI counts (i.e. the ones shown in this plot) is way smaller in our synthetic spike train than in the real one. So, even though the mean firing rates are the same, our real spike train has relatively many short ISIs, which must be balanced with longer ones to end up with the same mean.

Generating artificial spike trains in this way, by flipping a coin at each time step, is known as a *Poisson point process* (Poisson process for short): it generates interspike intervals drawn from the Poisson distribution. An important property of this distribution is that its values are never negative. This is useful when dealing with counts and interspike intervals, which cannot be negative. For large means, the Poisson distribution will approach a Gaussian distribution, but a Gaussian with a mean close to zero is likely to generate negative values, which are nonsensical when dealing with ISIs or spike counts.

A further property of the Poisson process that each ISI is independent of the last.

☛ Generate the ISI return plot for your Poisson spike train. If you only see a few spikes, increase the axes limits.

You should see a total absence of structure (correlations) in the plot, unlike the equivalent for the real spike train, indicating that a given ISI does not tell you anything about what the next one might be.

A final difference with our actual spike train is that pure Poisson spike trains have no refractory period. Verify that the ISIh of the real spike train does!

Our synthetic Poisson spike train had a fixed probability of emitting a spike in each time bin. This results in a *homogenous* Poisson process, meaning that theoretically, the firing rate does not vary over time. Of course, there will be random fluctuations, but the underlying probability of a spike remains constant. In contrast, an *inhomogenous* Poisson process involves changes over time of the underlying spike probability.

☛ In the above code for generating a spike train, `pspike`

was set to a fixed probability of 0.5. Change `pspike`

to be a vector of the same size as `tvec`

describing a 1 Hz sinusoid which oscillates between 0 and 0.5. Use this changing spike probability to generate your artificial (inhomogenous Poisson) spike train. Verify by visual inspection that indeed the firing rate of this spike train is varying over time.

A different way of characterizing the properties of a spike train is to compute its autocorrelation function, which describes the probability of finding a spike at time $t+t'$ given a spike at time $t$, for some range of lags $t'$. This is analogous to the autocorrelation of gamma-band power in a ventral striatal LFP that we computed in the previous module.

MATLAB doesn't provide a built-in function that can take in a spike train and compute its autocorrelation, so we have to make our own. Here is a simple implementation:

function [ac,xbin] = acf(spk_t,binsize,max_t) % function [ac,xbin] = acf(spike_times,binsize,max_t) % % estimate autocorrelation of input spike train % % INPUTS: % spike_times: [1 x nSpikes] double of spike times (in s) % binsize: acf bin size in s % max_t: length of acf in s % % OUTPUTS: % ac: autocorrelation estimate (normalized so that acf for the zero bin is % 0) % xbin: bin centers (in s) % % MvdM 2013 xbin_centers = -max_t-binsize:binsize:max_t+binsize; % first and last bins are to be deleted later ac = zeros(size(xbin_centers)); for iSpk = 1:length(spk_t) relative_spk_t = spk_t - spk_t(iSpk); ac = ac + hist(relative_spk_t,xbin_centers); % note that hist() puts all spikes outside the bin centers in the first and last bins! delete later. end xbin = xbin_centers(2:end-1); % remove unwanted bins ac = ac(2:end-1); ac = ac./max(ac); % normalize

☛ Using this autocorrelation function and the code for generating Poisson spike trains above, compute and plot the autocorrelation of a Poisson spike train with a mean firing rate of 0.47 Hz (exactly as done above), a 10ms time bin, from -1 to 1 seconds.

You should get something like:

The autocorrelation of a Poisson spike train appears to be essentially flat over different time lags.

☛ Why? What differences to you notice with the acorr of the actual spike train (`[acorr,xbin] = acf(S.t{47},0.01,1);`

)?

(Mathematical note: as noted above, the acf can be thought of as a spike density function conditional on a spike having occurred. A generalization of this idea is the *conditional intensity function (CIF)* which is the spike density function conditional on the spike train's entire history.)

The intuition behind the cross-correlation between two spike trains is the following: given that neuron 1 emits a spike at time $t$, what is the probability of seeing a spike from neuron 2 at different lags $t+t'$? For instance, if neuron 2 reliably fires 10ms after neuron 1, the cross-correlation between 1 and 2 would have a peak at +10ms. The autocorrelation is a special case of this with the two spike trains being the same.

Computing the cross-correlation is a simple modification of our autocorrrelation function:

function [cc,xbin] = ccf(spk_t1,spk_t2,binsize,max_t) % function [cc,xbin] = ccf(spike_times1,spike_times2,binsize,max_t) % % estimate cross-correlation function of input spike train % % INPUTS: % spike_times1: [nSpikes x 1] double spike times in s % spike_times2: [nSpikes x 1] double spike times in s % binsize: ccf bin size in s % max_t: length of acf in s % % OUTPUTS: % cc: cross-correlation estimate (relative to spike_times1) % xbin: bin centers (in s) % % MvdM 2013 xbin_centers = -max_t-binsize:binsize:max_t+binsize; % first and last bins are to be deleted later cc = zeros(size(xbin_centers)); for iSpk = 1:length(spk_t1) relative_spk_t = spk_t2 - spk_t1(iSpk); cc = cc + hist(relative_spk_t,xbin_centers); % note that histc puts all spikes outside the bin centers in the first and last bins! delete later. end xbin = xbin_centers(2:end-1); % remove unwanted bins cc = cc(2:end-1); cc = cc./(length(spk_t1)); % normalize by number of spikes of first input

The only potentially tricky part of cross-correlations is how to do the normalization (there are some different choices that determine if the output is a conditional probability, a correlation, or something else), but we will not worry about this for now. All these normalizations produce outputs that are qualitatively similar.

☛ What do you expect the cross-correlation function of two Poisson spike trains to look like? Why? Verify your intuition.

Let's compute the cross-correlation between two hippocampal place cells, simultaneously recorded from a rat running a T-maze:

iC1 = 5; iC2 = 42; Sr = restrict(S,3200,5650); % restrict to on-track times only [xcorr,xbin] = ccf(Sr.t{iC1},Sr.t{iC2},0.01,1); plot(xbin,xcorr); set(gca,'FontSize',20); xlabel('lag (s)'); ylabel('xcorr'); title(sprintf('%d-%d',iC1,iC2));

You should get:

This is an example of one of my favorite plots in neuroscience. It holds the key to revealing a fundamental and unique property of the rodent hippocampus, thought to reflect a specialization for the rapid encoding of episodic memories. The plot shows (something akin to) the probability of cell 42 spiking at various time lags (between -1 and 1 second) given that cell 5 spikes at time 0. This ccf is clearly asymmetric, in that cell 42 is more likely to spike after cell 5 than before, as evident from the crosscorrelation values on the right side of the plot (5–>42) being larger overall than those on the left (42–>5).

There is something else going on, too: on top of this slow component of the ccf, there is a faster component with sharp peaks repeating at theta frequency (about 8-9 per second). The peak closest to zero is in fact slightly offset, occurring at approximately 30-40ms. Thus, cell 42 is likely to spike just a few tens of ms after cell 5; in contrast, the reverse, where cell 5 spikes a few tens of ms after cell 42, almost never happens.

As it turns out, this highly precise ccf shape results from the fact that each theta cycle in the hippocampal LFP is in fact associated with the sequential activation of a number of place cells, tracing out a trajectory on the maze, as illustrated in this figure from Skaggs et al. 1996:

A different way of showing this idea is as follows (from Malhotra et al. (2012):

This figure illustrates the ccf (labeled 'CCG' here, for cross-correlogram, which is the same thing) that would be expected from generating inhomogenous Poisson spike trains based on a firing rate profile given by two place fields (the blue and red lines in the top panel). In this case, with the animal running from left to right, the blue spikes would tend to come after the red, because the red field is entered before the blue field. This is reflected in the asymmetric shape of the ccf. However, this lacks the sharp theta peaks apparent in the ccf we just computed from our two real place cells. The figure here shows that if we enable an effect called “phase precession” (see here for more optional info on this) then the shape of the ccf does reproduce what is found experimentally. You can also see that accordingly, the blue spikes tend to come just after the red spikes within each theta cycle, forming sequences of place cells.

An important question is to establish if such sequences could come about by just having each place cell *independently* emit spikes in a manner described by some average spike density function, or if the sequences that are actually seen require some sort of *coordination* between place cells.

To test this, we can first convert our spike trains to a SDF over time, and then use that SDF to generate inhomogenous Poisson spike trains. Since these two spike trains are generated by independent coin flips, i.e. whether or not there was a spike for neuron 1 cannot directly affect the probability of a spike in neuron 2, we can determine if this is sufficient to reproduce the ccf seen in the real data (in which there may be some dependency between neuron 1 and neuron 2).

★ Compute the ccf of two spike trains generated from the SDF of our two hippocampal place cells (5 and 42) and compare it to their actual ccf. Use a 50 ms standard deviation for the Gaussian, and 1ms bin size. The sequence of steps for this would look something like this:

% for each of the two neurons, restrict the data to [3200 5650] (the time interval when the rat was running on the track) % compute the spike density function for each, making sure that your tvec runs from 3200 to 5650 also, and that you have a 50ms SD for the Gaussian convolution kernel % to use these SDFs to generate Poisson spike trains, convert the firing rates given by the SDF to a probability of emitting a spike in a given bin. (As you did above for a 0.47 Hz constant firing rate.) % generate Poisson spike trains, making sure to use the same tvec % convert Poisson spike trains to ts objects and compute the ccf

(This comparison of independently generated spikes with the actual data is the essence of this excellent paper.)

★ Write a peri-event time histogram (PETH) function, that takes two inputs: (1) a `ts`

object with spike times of interest, and (2) a `ts`

object or list of event times against which to compute the PETH. It should return the average distribution of spike times relative to the event times, defined to occur at time zero in the PETH. Note that the input spike times do not have to be spikes, but could be other events such as button presses! Extra nice would be an option specifying if the input spikes/events should be binned raw, or smoothed into a SDF first. Hint: how is this related to the ccf?

★★ Choosing the width of the kernel used in generating SDFs can be rather arbitrary. One way to avoid this arbitrariness is to estimate the best $ \sigma $ from the data using a cross-validation procedure: by leaving a small percentage of the spikes out, estimating the SDF from the remaining spikes, and then asking how well the left-out spikes are predicted by this SDF. Implement such an “adaptive smoothing” algorithm, and test it on synthetic and actual data. Comment on the strengths and limitations of this approach.

analysis/nsb2016/week9.txt · Last modified: 2018/07/07 10:19 (external edit)

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International