analysis:course-w16:week10


analysis:course-w16:week10 [2016/02/14 15:07] mvdm [Estimating tuning curves]
analysis:course-w16:week10 [2018/07/07 10:19]

:!: **UNDER CONSTRUCTION -- PLEASE DO NOT USE YET** :!:

===== Spike train analysis II: tuning curves, encoding, decoding =====

Goals:

  * Learn to estimate and plot tuning curves, raw and smoothed
  * Implement a basic Bayesian decoding algorithm
  * Compare decoded and actual position by exporting to a movie file

Resources:

  * (if you have not encountered Bayes' rule and conditional probability before) [[http://www.cs.ubc.ca/~murphyk/Bayes/bayesrule.html | A brief introduction ]] by Kevin Murphy at UBC, and one from [[http://blogs.scientificamerican.com/cross-check/bayes-s-theorem-what-s-the-big-deal/ | Scientific American]]
  * (for reference) [[http://jn.physiology.org/content/79/2/1017.long | Zhang et al. 1998]], the first application of decoding to place cell data, with clear explanations and derivations
  * (for reference) [[http://www.jneurosci.org/content/18/18/7411.long | Brown et al. 1998]], an example of a more sophisticated decoding method

**Note**: this module uses an externally compiled function, ''ndhist''. By default, it will not work on non-Windows machines, but the source code is provided. If you are a non-Windows user and are able to compile it, I would be grateful if you could push the result to %%GitHub%%.

==== Introduction ====

To support adaptive behavior, activity in the brain must correspond in some way to relevant sensory events and planned movements, combine many sources of information into multimodal percepts, and recall traces of past events to inform predictions about the future. In other words, neural activity must somehow //encode// relevant quantities. For instance, it can be [[http://www.ncbi.nlm.nih.gov/pubmed/9817202 | demonstrated]] [[http://www.ncbi.nlm.nih.gov/pubmed/20141292 | behaviorally]] that many animals use estimates of their location and head direction to navigate towards a goal. Where, and how, are these quantities represented in the brain? What are the neural circuits that can compute and update these signals? How do place and direction estimates contribute to deciding which way to go?

This //information processing// view of the brain has been extremely influential, as highlighted by the enduring appeal of [[http://www.ncbi.nlm.nih.gov/pubmed/4966457 | Hubel and Wiesel's demonstrations]] that single cells in macaque V1 respond to bars of light not only within a particular region of visual space, but also with a specific orientation. Such cells are said to be //tuned// for orientation [of the bar], and a typical //tuning curve// would therefore look like this:

{{ :analysis:course:biol377_2-3.png?300 |}}

This tuning curve describes how the cell responds, on average, to different orientations of the stimulus. If the cell were to respond with the same firing rate across the range of stimulus orientations, it would be indifferent to this particular stimulus dimension: it would not encode it. However, because this cell clearly modulates its firing rate with stimulus orientation, it encodes, or represents (I use these terms interchangeably, but some disagree), this quantity in its activity.

We can turn this idea around and note that if orientation is encoded, we can also //decode// the original stimulus from the cell's activity. For instance, if we observed this cell firing at a high rate, we would infer that the stimulus orientation is likely close to the cell's preferred orientation. Note that this requires knowledge of the cell's tuning curve, and that based on one cell only, we are unlikely to be able to decode (or reconstruct, which means the same thing) the stimulus perfectly. The more general view is to say that the cell's activity provides a certain amount of information about the stimulus, or equivalently, that our (decoded) estimate of the stimulus is improved by taking the activity of this cell into account.

This module first explores some practical issues in estimating tuning curves of "place cells" recorded from the rat hippocampus. An introduction to a particular decoding method (Bayesian decoding) is followed by its application to many simultaneously recorded place cells as a rat performs a T-maze task.

==== Estimating place cell tuning curves (place fields) ====

First, load the "place cell" data set also used in the previous module, which contains a number of spike trains recorded simultaneously from the dorsal CA1 area of the hippocampus:

<code matlab>
%%
cd('D:\data\R042\R042-2013-08-18'); % change this to where the R042-2013-08-18 session is stored on your machine

please = []; please.load_questionable_cells = 1;
S = LoadSpikes(please);

pos = LoadPos([]);
</code>

The ''load_questionable_cells'' option in ''LoadSpikes()'' results in the loading of ''*._t'' files, in addition to the familiar ''*.t'' spike time files. The underscore extension indicates a cell with questionable isolation quality, likely contaminated with noise, spikes from other neurons, and/or missing spikes. In general, you do not want to use such neurons for analysis, but in this case we are not concerned with properties of individual neurons. We are instead interested in the information present in a population of neurons, and for this we will take everything we can get!

=== Visual inspection ===

Before looking at the data, we first exclude the pre- and post-run segments of the data:

<code matlab>
LoadExpKeys;
S = restrict(S,ExpKeys.TimeOnTrack,ExpKeys.TimeOffTrack);
pos = restrict(pos,ExpKeys.TimeOnTrack,ExpKeys.TimeOffTrack);
</code>

Now we can plot the position data:

<code matlab>
plot(getd(pos,'x'),getd(pos,'y'),'.','Color',[0.5 0.5 0.5],'MarkerSize',1);
axis off; hold on;
</code>

Note that ''getd()'' is a utility function that retrieves data associated with a specific label; see [[analysis:course-w16:week2|Module 2]] for details.

Next, we plot the spikes of a single cell at the location where the rat was when each spike was emitted:

<code matlab>
iC = 7; % example cell
spk_x = interp1(pos.tvec,getd(pos,'x'),S.t{iC},'linear'); % x position at each spike time
spk_y = interp1(pos.tvec,getd(pos,'y'),S.t{iC},'linear'); % y position at each spike time

h = plot(spk_x,spk_y,'.r');
</code>

Note the use of ''interp1()'' here: it finds the corresponding ''x'' and ''y'' values for each spike time using linear interpolation. You should see:

{{ :analysis:course:week10_scatterfield.png?600 |}}

This cell seems to have a place field just to the left of the choice point on the track (a T-maze). There are also a few spikes on the pedestals, where the rat rests between runs on the track.
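
The ''interp1()'' lookup used above can be checked on made-up numbers. This is a minimal sketch with hypothetical position samples and spike times (not from the R042 session); the ''ex_'' prefix only keeps these names from overwriting the module's variables. It also shows that, with the ''linear'' method, spike times outside the range of the position samples get ''NaN'':

<code matlab>
% hypothetical position samples at 30 Hz; the rat moves at 90 pixels/s
ex_pos_t = (0:5)/30;                    % sample times (s)
ex_pos_x = [100 103 106 109 112 115];   % x position at each sample (pixels)
ex_spk_t = [0.05; 0.10; 0.20];          % hypothetical spike times (s)

% x position at each spike time, interpolated linearly between neighboring samples
ex_spk_x = interp1(ex_pos_t,ex_pos_x,ex_spk_t,'linear');
disp(ex_spk_x)   % 104.5, 109 and NaN (0.20 s lies after the last position sample)
</code>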

=== Estimating tuning curves ===

The figure above is a useful visualization of the raw data, but it is not a tuning curve. As a first step towards estimating this cell's tuning curve (or //encoding model//), we should restrict the spikes to only those occurring when the rat is running on the track:

<code matlab>
% run_start and run_end hold the start and end times of each run on the track
ENC_S = restrict(S,run_start,run_end);
ENC_pos = restrict(pos,run_start,run_end);

% check for cells without any spikes on the track and remove them
keep = ~cellfun(@isempty,ENC_S.t);
ENC_S.t = ENC_S.t(keep);
ENC_S.label = ENC_S.label(keep);

S.t = S.t(keep); % keep the full spike trains consistent with ENC_S
S.label = S.label(keep);
</code>

We have created ''ENC_'' versions of our spike trains and position data, containing only data from when the rat was running on the track (the ''run_start'' and ''run_end'' variables have been previously generated by a different script), and removed all cells from the data set that did not have any spikes on the track.

☛ Plot the above scatterfield again for the restricted spike train. Verify that no spikes are occurring off the track by comparing your plot to the previous one for the full spike trains, above.

To estimate tuning curves from the data, we need to divide the spike count by the time spent at each location on the maze. A simple way of doing that is to obtain 2-D histograms, shown here for the position data:

<code matlab>
clear pos_mat;
pos_mat(:,1) = getd(ENC_pos,'y'); % construct input to 2-D histogram (column 1: y, column 2: x)
pos_mat(:,2) = getd(ENC_pos,'x');

SET_xmin = 80; SET_ymin = 0; % set up bins (in pixels)
SET_xmax = 660; SET_ymax = 520;
SET_xBinSz = 10; SET_yBinSz = 10;

x_edges = SET_xmin:SET_xBinSz:SET_xmax;
y_edges = SET_ymin:SET_yBinSz:SET_ymax;

occ_hist = histcn(pos_mat,y_edges,x_edges);

no_occ_idx = find(occ_hist == 0); % NaN out bins the rat never visited
occ_hist(no_occ_idx) = NaN;

occ_hist = occ_hist .* (1/30); % convert samples to seconds using the video frame rate (30 Hz)

subplot(221);
pcolor(occ_hist); shading flat; axis off; colorbar
title('occupancy');
</code>

We can do the same thing for the spikes of our example neuron:

<code matlab>
% basic spike histogram
clear spk_mat;
iC = 7;
spk_x = interp1(ENC_pos.tvec,getd(ENC_pos,'x'),ENC_S.t{iC},'linear');
spk_y = interp1(ENC_pos.tvec,getd(ENC_pos,'y'),ENC_S.t{iC},'linear');
spk_mat(:,2) = spk_x; spk_mat(:,1) = spk_y; % same column order as pos_mat
spk_hist = histcn(spk_mat,y_edges,x_edges);

spk_hist(no_occ_idx) = NaN;

subplot(222)
pcolor(spk_hist); shading flat; axis off; colorbar
title('spikes');
</code>

...and finally simply divide one by the other:

<code matlab>
% rate map
tc = spk_hist./occ_hist;

subplot(223)
pcolor(tc); shading flat; axis off; colorbar
title('rate map');
</code>

This gives:

{{ :analysis:cosmo2014:example_rawtc.png?900 |}}
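
The occupancy normalization can be sketched in one dimension with hypothetical numbers (a 100 cm linear track with 10 cm bins; none of these values come from the data set). ''accumarray()'' is used here as a simple stand-in for ''histcn()'':

<code matlab>
ex_binSz = 10; ex_nBins = 10;               % 10 bins of 10 cm on a hypothetical 100 cm track
ex_Fs = 30;                                 % video sampling rate (Hz)
ex_pos_x = [5 15 15 25 35 35 35 45 55 65];  % hypothetical position samples (cm)
ex_spk_x = [15 35 35 35 35];                % hypothetical positions at spike times (cm)

ex_occ = accumarray(floor(ex_pos_x(:)/ex_binSz) + 1,1,[ex_nBins 1]);  % samples per bin
ex_spk = accumarray(floor(ex_spk_x(:)/ex_binSz) + 1,1,[ex_nBins 1]);  % spikes per bin

ex_occ_s = ex_occ / ex_Fs;        % time spent in each bin (s)
ex_occ_s(ex_occ == 0) = NaN;      % never-visited bins: firing rate is undefined
ex_tc = ex_spk ./ ex_occ_s;       % firing rate (Hz) = spike count / time spent
disp(ex_tc')   % 0 15 0 40 0 0 0 NaN NaN NaN
</code>

Bin 4 contains four times as many spikes as bin 2, but the rat also spent 1.5 times as long there, so its rate is 40 Hz rather than 60 Hz; unvisited bins stay ''NaN'' rather than 0 Hz.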

Note that the occupancy map shows that the rat spent relatively more time at the choice point than on other segments of the track. However, the coarse binning is not very satisfying. Let's see if we can do better with some smoothing:

<code matlab>
kernel = gausskernel([4 4],2); % Gaussian kernel of 4x4 bins, SD of 2 bins (should sum to 1)

[occ_hist,~,~,pos_idx] = histcn(pos_mat,y_edges,x_edges); % pos_idx: bin indices of each position sample, used later
occ_hist = conv2(occ_hist,kernel,'same');

occ_hist(no_occ_idx) = NaN;
occ_hist = occ_hist .* (1/30); % convert to seconds using video frame rate

subplot(221);
pcolor(occ_hist); shading flat; axis off; colorbar
title('occupancy');

%
spk_hist = histcn(spk_mat,y_edges,x_edges);
spk_hist = conv2(spk_hist,kernel,'same');
spk_hist(no_occ_idx) = NaN;

subplot(222)
pcolor(spk_hist); shading flat; axis off; colorbar
title('spikes');

%
tc = spk_hist./occ_hist;

subplot(223)
pcolor(tc); shading flat; axis off; colorbar
title('rate map');
</code>

Now you should get:

{{ :analysis:cosmo2014:example_smoothtc.png?900 |}}

These are well-formed tuning curves we can use for decoding. Of course, we could bin more finely for increased spatial resolution, but this would slow down the decoding, so for now it's not worth it.
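
The effect of smoothing spike counts and occupancy separately before dividing can be sketched on a toy grid. ''gausskernel()'' is not reproduced here: the 4x4 kernel below is a hypothetical stand-in built with ''exp()'' and normalized to sum to 1, so its values may differ slightly from ''gausskernel([4 4],2)''. The spike and occupancy maps are made up:

<code matlab>
% stand-in for gausskernel([4 4],2): 4 x 4 Gaussian with SD of 2 bins, normalized to sum to 1
[ex_gx,ex_gy] = meshgrid(-1.5:1.5,-1.5:1.5);
ex_kernel = exp(-(ex_gx.^2 + ex_gy.^2) / (2*2^2));
ex_kernel = ex_kernel / sum(ex_kernel(:));

ex_spk = zeros(10,10); ex_spk(5,5) = 4;   % hypothetical: 4 spikes, all in one bin
ex_occ = 3 * ones(10,10);                 % hypothetical: 3 video samples in every bin

ex_spk_s = conv2(ex_spk,ex_kernel,'same');         % smoothed spike counts
ex_occ_s = conv2(ex_occ,ex_kernel,'same') / 30;    % smoothed occupancy (s), 30 Hz video
ex_tc = ex_spk_s ./ ex_occ_s;                      % smoothed rate map (Hz)

fprintf('spikes before/after smoothing: %g / %g\n',sum(ex_spk(:)),sum(ex_spk_s(:)));
fprintf('occupancy in a central vs. a corner bin: %.3f / %.3f s\n',ex_occ_s(5,5),ex_occ_s(1,1));
</code>

Because the kernel sums to 1, smoothing redistributes spikes without changing their total, as long as the kernel stays inside the grid. ''conv2(...,'same')'' treats everything outside the grid as zero, so smoothed occupancy drops near the edges; smoothing both maps with the same kernel keeps this edge effect in the numerator and the denominator alike.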

Next, we obtain a tuning curve for all our cells:

<code matlab>
clear tc all_tc
nCells = length(ENC_S.t);
for iC = 1:nCells
    spk_x = interp1(ENC_pos.tvec,getd(ENC_pos,'x'),ENC_S.t{iC},'linear');
    spk_y = interp1(ENC_pos.tvec,getd(ENC_pos,'y'),ENC_S.t{iC},'linear');

    clear spk_mat;
    spk_mat(:,2) = spk_x; spk_mat(:,1) = spk_y;
    spk_hist = histcn(spk_mat,y_edges,x_edges);
    spk_hist = conv2(spk_hist,kernel,'same'); % same smoothing as for the occupancy

    spk_hist(no_occ_idx) = NaN;

    tc = spk_hist./occ_hist; % occupancy is the same for all cells

    all_tc{iC} = tc;
end
</code>

We can inspect the results as follows:

<code matlab>
%%
ppf = 25; % plots per figure
for iC = 1:length(ENC_S.t)
    nFigure = ceil(iC/ppf);
    figure(nFigure);

    subtightplot(5,5,iC-(nFigure-1)*ppf);
    pcolor(all_tc{iC}); shading flat; axis off;
    caxis([0 10]);
end
</code>

You will see some textbook "place cells" with a clearly defined single place field. There are also cells with other firing patterns.

This module computes tuning curves for location, but the idea is of course more general. For continuous variables in particular, a tuning curve is a natural and powerful way to describe the relationship between two quantities: spikes and location in this case, but there is no reason why you couldn't do something like pupil diameter as a function of arm reaching direction, for instance!

==== Bayesian decoding ====

As noted in the introduction above, given that we have neurons whose activity seems to encode some stimulus variable (location in this case), we can attempt to decode that variable based on the neurons' time-varying activity.

A popular approach to doing this is "one-step Bayesian decoding", illustrated in this figure (from [[http://www.cell.com/neuron/abstract/S0896-6273(10)00507-6 | van der Meer et al. 2010]]):

{{ :analysis:course:mvdm_yorkfieldtrip13.png?900 |}}

For this particular experiment, the goal of decoding is to recover the location of the rat, given neural activity in some time window. More formally, we wish to know $P(\mathbf{x}|\mathbf{n})$, the probability of the rat being at each possible location $x_i$ ($\mathbf{x}$ in vector notation, to indicate that there are many possible locations) given a vector of spike counts $\mathbf{n}$.

If $P(\mathbf{x}|\mathbf{n})$ (the "posterior") is the same for every location bin $x_i$ (i.e. is uniform), all locations are equally likely and we don't have a good guess; in contrast, if the posterior is close to zero for most $x_i$ and high for only a small number of them, we can confidently predict the most likely location. Of course, there is no guarantee that our decoded estimate will agree with the actual location; we will test this later on.

So how can we obtain $P(\mathbf{x}|\mathbf{n})$? We can start with Bayes' rule:

\[P(\mathbf{x}|\mathbf{n})P(\mathbf{n}) = P(\mathbf{n}|\mathbf{x})P(\mathbf{x})\]

If you have not come across Bayes' rule before, or the above equation looks mysterious to you, review the gentle introductions linked at the top of the page. In general, Bayes' rule provides a quantitative way to update prior beliefs in the face of new evidence.
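
As a minimal numerical illustration of Bayes' rule, consider a hypothetical case with only two candidate locations and a uniform prior (the likelihood values are made up):

<code matlab>
% hypothetical example: two candidate locations (A and B)
ex_prior = [0.5 0.5];                    % P(x): uniform prior
ex_lik = [0.3 0.05];                     % P(n|x): probability of the observed spike count at A and at B
ex_p_n = sum(ex_lik .* ex_prior);        % P(n): sum over all locations
ex_post = ex_lik .* ex_prior / ex_p_n;   % Bayes' rule: P(x|n) = P(n|x)P(x)/P(n)
disp(ex_post)   % 0.8571 0.1429, i.e. 6/7 and 1/7
</code>

With a non-uniform prior such as ''[0.1 0.9]'', the same evidence gives a posterior of ''[0.4 0.6]'': the prior still matters when the likelihoods are not decisive.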

The key quantity to estimate is $P(\mathbf{n}|\mathbf{x})$, the probability of observing the spike counts $\mathbf{n}$ in a given time window when the rat is at location $\mathbf{x}$. At the basis of estimating this probability (the "likelihood") lies the tuning curve: this tells us the //average// firing rate at each location. We need a way to convert a given number of spikes (whatever we observe in the current time window for which we are trying to decode activity; 3 spikes for cell 1 in the figure above) to a probability. In other words, what is the probability of observing 3 spikes in a 250 ms time window, given that at this location the cell fires, say, at 5 Hz on average?

A convenient answer is to assume that the spike counts follow a Poisson distribution. This assumption enables us to assign a probability to each possible spike count, given the mean firing rate from the tuning curve. For instance, here are the probabilities of observing different numbers of spikes $k$ (on the horizontal axis) for three different means ($\lambda$ = 1, 4 and 10):

{{ https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/Poisson_pmf.svg/500px-Poisson_pmf.svg.png |}}

In general, from the [[http://en.wikipedia.org/wiki/Poisson_distribution | definition of the Poisson distribution]], it follows that

\[P(n_i|\mathbf{x}) = \frac{(\tau f_i(\mathbf{x}))^{n_i}}{n_i!} e^{-\tau f_i(\mathbf{x})}\]

$f_i(\mathbf{x})$ is the average firing rate of neuron $i$ over $\mathbf{x}$ (i.e. the tuning curve for position), $n_i$ is the number of spikes emitted by neuron $i$ in the current time window, and $\tau$ is the size of the time window used. Thus, $\tau f_i(\mathbf{x})$ is the mean number of spikes we expect from neuron $i$ in a window of size $\tau$; the Poisson distribution describes how likely it is that we observe the actual number of spikes $n_i$ given this expectation.
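
Plugging in the numbers from the question above (3 spikes, a 250 ms window, 5 Hz average rate) gives a concrete value for a single neuron. This sketch evaluates the formula directly; if you have the Statistics Toolbox, ''poisspdf(n,lambda)'' should give the same result:

<code matlab>
% worked example: probability of observing 3 spikes in a 250 ms window
% for a cell whose tuning curve gives 5 Hz at the current location
ex_tau = 0.25;            % window size (s)
ex_f = 5;                 % firing rate from the tuning curve (Hz)
ex_n = 3;                 % observed spike count
ex_lambda = ex_tau * ex_f;   % expected number of spikes in the window (1.25)
ex_p = ex_lambda^ex_n / factorial(ex_n) * exp(-ex_lambda);
fprintf('P(n = %d) = %.4f\n',ex_n,ex_p);   % prints P(n = 3) = 0.0933

% the same formula over a range of counts; these probabilities sum to (almost exactly) 1
ex_k = 0:20;
ex_pk = ex_lambda.^ex_k ./ factorial(ex_k) .* exp(-ex_lambda);
</code>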

In reality, place cell spike counts are typically not Poisson-distributed ([[http://www.ncbi.nlm.nih.gov/pubmed/9501237 | Fenton et al. 1998]]), so this is clearly a simplifying assumption. There are many other, more sophisticated approaches for the estimation of $P(n_i|\mathbf{x})$ (see for instance [[http://www.ncbi.nlm.nih.gov/pubmed/17925266 | Paninski et al. 2007]]), but this basic method works well for many applications.

The above equation gives the probability of observing $n_i$ spikes for a given average firing rate for a single neuron. How can we combine information across neurons? Again we take the simplest possible approach and assume that the spike count probabilities for different neurons are independent. This allows us to simply multiply the probabilities together to give, for $N$ neurons:

\[P(\mathbf{n}|\mathbf{x}) = \prod_{i = 1}^{N} \frac{(\tau f_i(\mathbf{x}))^{n_i}}{n_i!} e^{-\tau f_i(\mathbf{x})}\]

An analogy here is simply to ask: if the probability of a coin coming up heads is $0.5$, what is the probability of two coins, flipped simultaneously, both coming up heads? If the coins are independent, this is simply $0.5 \times 0.5 = 0.25$.
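
The independence assumption can be sketched for two hypothetical neurons and three location bins (the tuning curves and spike counts are made up). The code relies on implicit expansion (MATLAB R2016b or later) to apply each neuron's spike count to its whole row of tuning curve values:

<code matlab>
% hypothetical tuning curves (Hz) of 2 neurons over 3 location bins (rows: neurons, columns: locations)
ex_f = [10 2 1;    % neuron 1 fires most at location 1
         1 2 8];   % neuron 2 fires most at location 3
ex_n = [2; 0];     % observed spike counts in this window
ex_tau = 0.25;     % window size (s)

ex_lambda = ex_tau * ex_f;                                       % expected counts
ex_p1 = ex_lambda.^ex_n ./ factorial(ex_n) .* exp(-ex_lambda);   % P(n_i|x), one row per neuron
ex_pn = prod(ex_p1,1);                                           % independence: multiply across neurons
disp(ex_pn)   % largest at location 1
</code>

Neuron 1 fired twice and neuron 2 stayed silent; both observations favor location 1, and the product combines them.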

Combining the above with Bayes' rule, and rearranging a bit, gives

\[P(\mathbf{x}|\mathbf{n}) = C(\tau,\mathbf{n}) P(\mathbf{x}) \left(\prod_{i = 1}^{N} f_i(\mathbf{x})^{n_i}\right) \exp\left(-\tau \sum_{i = 1}^N f_i(\mathbf{x})\right)\]

This form is easier to evaluate in vectorized MATLAB code. $C(\tau,\mathbf{n})$ collects all factors that do not depend on $\mathbf{x}$ (the $\tau^{n_i}/n_i!$ terms and $1/P(\mathbf{n})$); rather than computing it explicitly, we simply set it to guarantee $\sum_{\mathbf{x}} P(\mathbf{x}|\mathbf{n}) = 1$ (Zhang et al. 1998). For now, we assume that $P(\mathbf{x})$ (the "prior") is uniform, that is, we have no prior information about the location of the rat and let our estimate be completely determined by the likelihood.
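
A minimal 1-D sketch of the full decoding equation, with a uniform prior and the normalization constant set so that the posterior sums to 1. The three Gaussian place fields, the 0.5 Hz baseline rate and the spike counts are all hypothetical, and the code uses implicit expansion (MATLAB R2016b or later):

<code matlab>
% hypothetical example: 3 neurons with Gaussian place fields on a 1-D track of 20 bins
ex_x = 1:20;                                  % location bins
ex_centers = [5; 10; 15];                     % field centers (bins), one per neuron
ex_f = 15 * exp(-(ex_x - ex_centers).^2 / (2*2^2)) + 0.5;   % tuning curves (Hz), 3 x 20
ex_tau = 0.25;                                % window size (s)
ex_n = [0; 3; 0];                             % observed spike counts: only neuron 2 fired
ex_prior = ones(1,20) / 20;                   % uniform prior P(x)

% posterior up to C: P(x) * prod_i f_i(x)^n_i * exp(-tau * sum_i f_i(x))
ex_post = ex_prior .* prod(ex_f.^ex_n,1) .* exp(-ex_tau * sum(ex_f,1));
ex_post = ex_post / sum(ex_post);             % choose C so that the posterior sums to 1

[~,ex_map] = max(ex_post);                    % maximum a posteriori (MAP) location
fprintf('MAP location: bin %d\n',ex_map);     % bin 10, the center of neuron 2's field
</code>

The silent neurons matter too: through the $\exp(-\tau \sum_i f_i(\mathbf{x}))$ term, locations where neurons 1 and 3 would be expected to fire are penalized.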

=== Preparing tuning curves for decoding ===

With the math taken care of, we can now start preparing the data for the decoding procedure. First, we need tuning curves for all neurons.

The ''histcn()''-based tuning curves in ''all_tc'' computed above can be used directly. This section shows an alternative implementation using the compiled ''ndhist()'' function (see the note at the top of this page) on a coarse 63x47 grid with a small amount of smoothing. A coarse grid gives a less detailed estimate than a high-resolution one, but decoding time grows with the number of bins, so decoding a high-resolution smoothed estimate would be very slow. Note that running this section overwrites ''all_tc''; the decoding and visualization code further below assumes the ''histcn()'' grid defined by ''x_edges'' and ''y_edges''.

First, let's inspect the result for a single cell, using the spike positions currently stored in ''spk_x'' and ''spk_y'':

<code matlab>
kernel = gausskernel([4 4],2); % 2-D Gaussian, width 4 bins, SD 2 bins

SET_xmin = 10; SET_ymin = 10; SET_xmax = 640; SET_ymax = 480;
SET_nxBins = 63; SET_nyBins = 47;

spk_binned = ndhist(cat(1,spk_x',spk_y'),[SET_nxBins; SET_nyBins],[SET_xmin; SET_ymin],[SET_xmax; SET_ymax]);
spk_binned = conv2(spk_binned,kernel,'same'); % smoothing

occ_binned = ndhist(cat(1,getd(pos,'x'),getd(pos,'y')),[SET_nxBins; SET_nyBins],[SET_xmin; SET_ymin],[SET_xmax; SET_ymax]);
occ_mask = (occ_binned < 5);
occ_binned = conv2(occ_binned,kernel,'same'); % smoothing

occ_binned(occ_mask) = 0; % don't include bins with fewer than 5 samples

VT_Fs = 30; % video frame rate (Hz)
tc = spk_binned./(occ_binned .* (1 / VT_Fs));
tc(isinf(tc)) = NaN;

pcolor(tc'); shading flat;
axis xy; colorbar; axis off;
</code>

Then, we can do the same for all cells in our data set:

<code matlab>
clear tc
nCells = length(S.t);
for iC = 1:nCells
    spk_x = interp1(pos.tvec,getd(pos,'x'),S.t{iC},'linear');
    spk_y = interp1(pos.tvec,getd(pos,'y'),S.t{iC},'linear');

    spk_binned = ndhist(cat(1,spk_x',spk_y'),[SET_nxBins; SET_nyBins],[SET_xmin; SET_ymin],[SET_xmax; SET_ymax]);
    spk_binned = conv2(spk_binned,kernel,'same');

    tc = spk_binned./(occ_binned .* (1 / VT_Fs));
    tc(isinf(tc)) = NaN;

    all_tc{iC} = tc;
end
</code>

Note that we don't need to recompute the occupancy because it is the same for all cells.

Let's inspect the resulting tuning curves:

<code matlab>
ppf = 25; % plots per figure
for iC = 1:length(S.t)
    nFigure = ceil(iC/ppf);
    figure(nFigure);

    subplot(5,5,iC-(nFigure-1)*ppf);
    pcolor(all_tc{iC}); shading flat; axis off;
end
</code>

As before, you will see some textbook "place cells" with a clearly defined single place field, as well as cells with other firing patterns.

☛ One cell has a completely green place map. What does this indicate, and under what conditions can this happen?

Since we see cells with fields in several different locations, it seems unlikely that a single sensory cue or nonspatial source can account for this activity. Indeed, numerous experiments have demonstrated that many place cells do not depend on any specific sensory cue to maintain a stable firing field.

=== Preparing firing rates for decoding ===

The tuning curves take care of the $f_i(\mathbf{x})$ term in the decoding equations. Now we need to get $\mathbf{n}$, which are simply spike counts:

<code matlab>
clear Q;
binsize = 0.25; % time bin size (s), corresponding to tau in the equations

% assemble tvecs
tvec_edges = ExpKeys.TimeOnTrack:binsize:ExpKeys.TimeOffTrack;
Q_tvec_centers = tvec_edges(1:end-1)+binsize/2;

for iC = length(S.t):-1:1 % loop backwards so that Q is sized on the first iteration
    spk_t = S.t{iC};
    temp = histc(spk_t,tvec_edges); % last element only counts spikes exactly at tvec_edges(end)
    Q(iC,:) = temp(1:end-1); % drop it so that the columns of Q match Q_tvec_centers
end
nActiveNeurons = sum(Q > 0);
</code>

This "Q-matrix" of size ''[nCells x nTimeBins]'' is the starting point for a number of analyses, such as the ensemble reactivation procedure introduced in Peyrache et al. 2009. Let's inspect it briefly:

<code matlab>
% look at it
imagesc(Q_tvec_centers,1:nCells,Q)
set(gca,'FontSize',16); xlabel('time (s)'); ylabel('cell #');
</code>

You should see:

{{ :analysis:course:week10_qmat.png?600 |}}

If you zoom in to a smaller slice of time, you will notice that there are gaps in the data, i.e. segments without any activity whatsoever. This is a quirk of this particular data set: the epochs when the rat is in transit between the pedestals and the track have been removed to facilitate spike sorting.
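
The binning that produces the Q-matrix can be checked on two hypothetical spike trains. ''histc()'' returns one element per edge, and its last element counts only values exactly equal to the last edge, which is why that element is dropped so that each row lines up with the bin centers:

<code matlab>
ex_spk = {[0.1 0.3 0.35 0.9], [0.6 1.0]};   % hypothetical spike times (s) of 2 cells
ex_binsize = 0.25;
ex_edges = 0:ex_binsize:1;                    % 0, 0.25, 0.5, 0.75, 1
ex_centers = ex_edges(1:end-1) + ex_binsize/2;

ex_Q = zeros(numel(ex_spk),numel(ex_centers));   % [nCells x nTimeBins]
for ex_iC = 1:numel(ex_spk)
    ex_counts = histc(ex_spk{ex_iC},ex_edges);   % one element per edge; the last counts spikes == ex_edges(end)
    ex_Q(ex_iC,:) = ex_counts(1:end-1);          % keep one count per time bin
end
disp(ex_Q)   % [1 2 0 1; 0 0 1 0]: the spike at exactly 1.0 s is dropped
</code>

In MATLAB R2014b and later, ''histcounts(x,edges)'' returns ''length(edges)-1'' counts directly; its last bin also includes the right edge, so the spike at 1.0 s would be counted in the last bin instead of being dropped.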

Our Q-matrix only contains non-zero counts when the animal is running on the track; these episodes show up as narrow vertical stripes. To speed up later calculations, let's restrict Q to those times only:

<code matlab>
LoadMetadata;
Q_tsd = tsd(Q_tvec_centers,Q);
Q_tsd = restrict(Q_tsd,metadata.taskvars.trial_iv); % metadata contains experimenter annotations; here, the intervals when the rat ran the track
</code>

The final step before the actual decoding procedure is to reformat the tuning curves a bit to make the decoding easier to run. Instead of keeping each one as a 2-D matrix, we unwrap it into 1-D:

<code matlab>
%% prepare tuning curves
clear tc
nBins = numel(occ_hist); % number of location bins on the histcn grid (x_edges, y_edges)
nCells = length(S.t);
for iC = nCells:-1:1
    tc(:,:,iC) = all_tc{iC};
end
tc = reshape(tc,[size(tc,1)*size(tc,2) size(tc,3)]); % [nBins x nCells]
occUniform = repmat(1/nBins,[nBins 1]); % uniform prior
</code>
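
Unwrapping to 1-D and back relies on MATLAB's column-major storage: ''reshape()'' stacks the columns of a matrix on top of each other, reshaping with the original size recovers the 2-D layout, and ''ind2sub()'' converts a 1-D index back to a row and column. The visualization code further below depends on exactly this correspondence. A small sketch with a hypothetical 3x4 grid:

<code matlab>
ex_tc2d = [1 4 7 10;
           2 5 8 11;
           3 6 9 12];                            % hypothetical 3 x 4 map (rows: y bins, columns: x bins)
ex_tc1d = reshape(ex_tc2d,[numel(ex_tc2d) 1]);   % columns stacked: 1, 2, 3, ..., 12
ex_back = reshape(ex_tc1d,size(ex_tc2d));        % recovers ex_tc2d exactly

[~,ex_idx] = max(ex_tc1d);                       % 1-D index of the largest value (12)
[ex_row,ex_col] = ind2sub(size(ex_tc2d),ex_idx); % row 3, column 4
fprintf('max at 1-D index %d = row %d, column %d\n',ex_idx,ex_row,ex_col);
</code>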

=== Running the decoding algorithm ===

Aaandd... action!

<code matlab>
%% decode
Q_tvec_centers = Q_tsd.tvec;
Q = Q_tsd.data;
nActiveNeurons = sum(Q > 0);

len = length(Q_tvec_centers);
p = nan(length(Q_tvec_centers),nBins);
for iB = 1:nBins
    tempProd = nansum(log(repmat(tc(iB,:)',1,len).^Q));
    tempSum = exp(-binsize*nansum(tc(iB,:),2));
    p(:,iB) = exp(tempProd)*tempSum*occUniform(iB);
end

p = p./repmat(sum(p,2),1,nBins); % renormalize to 1 total probability
p(nActiveNeurons < 1,:) = 0; % ignore time bins with no activity
</code>

☛ Compare these steps with the equations above.

  * There is no log in the equations; why does it appear here?
  * What parts of the equation correspond to the ''tempProd'' and ''tempSum'' variables?

=== Visualizing the results ===

The hard work is done. Now we just need to display the results. Before we do so, we should convert the rat's actual position into our binned form, so that we can compare it to the decoded estimate:

<code matlab>
% pos_mat column 1 is y and column 2 is x, so xBinned holds row (y bin) indices
% and yBinned holds column (x bin) indices of the occupancy grid
xBinned = interp1(ENC_pos.tvec,pos_idx(:,1),Q_tvec_centers);
yBinned = interp1(ENC_pos.tvec,pos_idx(:,2),Q_tvec_centers);
</code>

Now we can visualize the decoding (press Ctrl-C to break out of the loop):

<code matlab>
goodOccInd = find(occ_hist > 0); % bins the rat visited
SET_nxBins = length(x_edges)-1; SET_nyBins = length(y_edges)-1;

dec_err = nan(length(Q_tvec_centers),1);

for iT = 1:length(Q_tvec_centers)
    cla;
    temp = reshape(p(iT,:),[SET_nyBins SET_nxBins]);
    toPlot = nan(SET_nyBins,SET_nxBins);
    toPlot(goodOccInd) = temp(goodOccInd);

    pcolor(toPlot); axis xy; hold on; caxis([0 0.5]);
    shading flat; axis off;

    plot(yBinned(iT),xBinned(iT),'ow','MarkerSize',15); % actual position

    % get x and y coordinates of MAP
    [~,idx] = max(toPlot(:));
    [x_map,y_map] = ind2sub(size(toPlot),idx);

    if nActiveNeurons(iT) > 0
        dec_err(iT) = sqrt((yBinned(iT)-y_map).^2+(xBinned(iT)-x_map).^2);
    end

    plot(y_map,x_map,'g*','MarkerSize',5); % MAP estimate

    h = title(sprintf('t %.2f, nCells %d, dist %.2f',Q_tvec_centers(iT),nActiveNeurons(iT),dec_err(iT)));
    if nActiveNeurons(iT) == 0
        set(h,'Color',[1 0 0]);
    else
        set(h,'Color',[0 0 0]);
    end
    drawnow; pause(0.1);
end
</code>

This plot shows the posterior $P(\mathbf{x}|\mathbf{n})$ as the rat moves around the maze; its actual position is indicated by the white ''o'', and the bin with the highest posterior probability is indicated by the green ''*''. As you can see, the decoded estimate seems to track the rat's actual location as it moves.

☛ No decoding is available for those time bins where no neurons are active, because we manually set the posterior to zero. However, there also seem to be some frames in the animation where some neurons are active (as indicated in the title), yet no decoded estimate is visible. What is the explanation for this?

=== Optional diversion: exporting the results to a movie file ===

By turning the %%MATLAB%% animation into a movie file, it is often easier to explore the results. To do this, we can run the animation code above with a few small modifications. First, before entering the main plotting loop, set the figure to be used to a specific size:

<code matlab>
h = figure; set(h,'Position',[100 100 320 240]);
</code>

This is important for two reasons: first, to keep the size of the resulting movie file manageable (the above sets a 320x240 pixel figure size), and second, because many movie encoders (such as the excellent [[http://www.xvid.org/ | XVid]]) only work with certain frame sizes.

Next, we need to store each frame in a variable that we can later write to file. Replace the last line inside the loop (''drawnow; pause(0.1);'') with:

<code matlab>
f(iT) = getframe(gcf); % store current frame
drawnow;
</code>

If you now run the code again, each frame gets stored in the ''f'' variable as the loop runs. Break out of the loop after a few seconds to test the writing-to-file part:

<code matlab>
fname = 'test.avi';
movie2avi(f,fname,'COMPRESSION','XVid','FPS',10);
</code>

The above will only work if you have the %%XVid%% codec installed: I highly recommend it because it creates movie files that are an order of magnitude smaller than uncompressed files. If you have trouble with %%XVid%%, you can of course still save an uncompressed file for now. For longer movies, it is often necessary to save a file, say, every 500 frames, to prevent the ''f'' variable from getting too large. These segments can then be merged with a video editing program such as [[http://www.virtualdub.org/ | VirtualDub]] (Windows only %%AFAIK%%; please suggest %%OSX/Linux%% alternatives if you know any that work well!).
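
Recent MATLAB releases no longer include ''movie2avi()''; the documented replacement is the ''VideoWriter'' class. A minimal sketch, assuming ''f'' holds the frames stored by ''getframe()'' in the loop above (if ''f'' does not exist, a few uniform gray placeholder frames are created so the snippet runs on its own). The default profile writes Motion JPEG AVI rather than %%XVid%%, and the file name is hypothetical:

<code matlab>
% placeholder frames, only used if f (from getframe in the loop above) does not exist
if ~exist('f','var')
    for ex_iF = 10:-1:1
        f(ex_iF) = im2frame(uint8(20*ex_iF*ones(240,320,3)));   % 320x240 uniform gray frame
    end
end

ex_fname = 'test_videowriter.avi';   % hypothetical output file name
ex_v = VideoWriter(ex_fname);        % default profile: Motion JPEG AVI
ex_v.FrameRate = 10;                 % same frame rate as the movie2avi example
open(ex_v);
writeVideo(ex_v,f);                  % write all stored frames
close(ex_v);
</code>

''VideoWriter'' also lets you call ''writeVideo()'' once per frame inside the loop, which avoids keeping a large ''f'' array in memory.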

- | |||

- | === Quantifying decoding accuracy === | ||

- | |||

- | By running the above loop without plotting we can obtain the "decoding error", that is the distance between the maximum a posteriori (MAP) estimate and the rat's true position: | ||

<code matlab>
%% get distance (no plotting)
dec_err = nan(length(Q_tvec_centers),1);

SET_nxBins = length(x_edges)-1; SET_nyBins = length(y_edges)-1;

xBinned = interp1(ENC_pos.tvec,pos_idx(:,1),Q_tvec_centers);
yBinned = interp1(ENC_pos.tvec,pos_idx(:,2),Q_tvec_centers);

for iT = 1:length(Q_tvec_centers)

    temp = reshape(p(iT,:),[SET_nyBins SET_nxBins]);
    toPlot = nan(SET_nyBins,SET_nxBins);
    toPlot(goodOccInd) = temp(goodOccInd);

    % get x and y coordinates of MAP
    [~,idx] = max(toPlot(:));
    [x_map,y_map] = ind2sub(size(toPlot),idx);

    if nActiveNeurons(iT) > 0 % only score bins in which at least one neuron was active
        dec_err(iT) = sqrt((yBinned(iT)-y_map).^2+(xBinned(iT)-x_map).^2);
    end

end
plot(Q_tvec_centers,dec_err,'.k');
</code>
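
The loop above finds the MAP estimate one time bin at a time. The same result can be computed for all time bins at once, which is convenient when repeating the analysis for many bin sizes. The following sketch uses a made-up posterior matrix ''p'' (one row per time bin, each row a flattened ''nyBins''-by-''nxBins'' map, matching the ''reshape'' in the loop) and a hypothetical occupancy mask ''goodOcc''. ''ind2sub'' returns the row index (the dimension of length ''nyBins'') first. The two outputs must therefore be paired with the columns of ''pos_idx'' in the same order as in the loop.

<code matlab>
% Hypothetical toy data: 6 time bins, 4 x 5 spatial grid
nyBins = 4; nxBins = 5; nT = 6;
p = rand(nT, nyBins*nxBins);       % stand-in for the decoded posterior
p(:, 7) = 2;                       % force linear index 7 to be the MAP in every bin

goodOcc = true(nyBins, nxBins);    % hypothetical mask of visited spatial bins
goodOcc(1, 1) = false;             % pretend bin (1,1) was never visited
p(:, ~goodOcc(:)) = NaN;           % max() ignores NaN, so unvisited bins cannot win

% MAP for every time bin at once: max along each row, then convert the
% linear index into (row, column) subscripts of the nyBins x nxBins map
[~, idx] = max(p, [], 2);
[map_row, map_col] = ind2sub([nyBins nxBins], idx);
</code>

Here linear index 7 of a 4-by-5 map is row 3, column 2, so every time bin returns that location.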

A nice way to plot this is to break it down by lap, in addition to computing the overall average:

<code matlab>
% get trial id for each sample
trial_id = zeros(size(Q_tvec_centers));
trial_idx = nearest_idx3(run_start,Q_tvec_centers); % NOTE: on non-Windows, use nearest_idx.m
trial_id(trial_idx) = 1;
trial_id = cumsum(trial_id); % lap number of each sample (0 before the first lap)
% plot(Q_tvec_centers,trial_id,'.k'); % optional check: lap number over time

figure; set(gca,'FontSize',18);
boxplot(dec_err,trial_id);
xlabel('trial'); ylabel('decoding error (pixels)');

av_error = nanmean(dec_err);
title(sprintf('avg err %.2f',av_error));
</code>
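
The ''cumsum'' trick above turns a handful of lap-start markers into a lap label for every sample. The following sketch runs it on made-up numbers so that the intermediate values can be checked. ''interp1'' with the nearest-neighbor method stands in for ''nearest_idx3''. This assumes that helper returns, for each lap start, the index of the closest time bin. The sketch also computes per-lap means that skip ''NaN'' bins, as ''nanmean'' does for the overall average. ''nanmean'' requires the Statistics Toolbox; newer MATLAB releases also accept ''mean(x,'omitnan')''.

<code matlab>
% Hypothetical toy data
tvec = 0:0.5:5;                  % stand-in for Q_tvec_centers (11 time bins)
run_start = [0.4 2.1 3.6];       % stand-in for run_start (lap onset times, s)
err = (1:numel(tvec))';          % stand-in for dec_err
err(4) = NaN;                    % a bin with no active neurons

% index of the time bin closest to each lap start
lap_idx = interp1(tvec, 1:numel(tvec), run_start, 'nearest');

% mark lap starts with 1; the cumulative sum then labels every sample with its lap
trial_id = zeros(size(tvec));
trial_id(lap_idx) = 1;
trial_id = cumsum(trial_id);     % 0 = before the first lap

% per-lap mean error, skipping NaN bins and samples before the first lap
tid = trial_id(:); e = err(:);
keep = tid > 0 & ~isnan(e);
lap_mean = accumarray(tid(keep), e(keep), [], @mean);

overall_mean = mean(e(~isnan(e)));   % same as nanmean(err)
</code>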

This yields:

{{ :analysis:cosmo2014:dec_err_1step_250ms.png?600 |}}

Thus, on average our estimate is 1.87 pixels away from the true position. Earlier laps seem to have more outliers (time bins where the estimate is far from the true position), but there is no obvious trend across laps.

☛ How does the decoding accuracy depend on the bin size used? Try a range from very small (10 ms) to very large (1 s) bins, making sure to note the average decoding error for 50 ms bins for comparison with results in the next module. What factors need to be balanced if the goal is maximum accuracy?

A different way of looking at the decoding error is to plot it as a function of space:

<code matlab>
cfg = [];
cfg.y_edges = y_edges; cfg.x_edges = x_edges;

dec_err_tsd = tsd(Q_tvec_centers,dec_err);
space_err = TSDbySpace(cfg,ENC_pos,dec_err_tsd);

figure;
pcolor(space_err); shading flat; axis off; colorbar; caxis([0 10]);
</code>
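
''TSDbySpace'' is part of the course codebase, and its internals are not shown here. As a rough illustration of the idea, and not its actual implementation, the following sketch averages a decoding error within spatial bins by hand. It looks up the rat's position at each decoding-bin center with ''interp1'', assigns that position to a bin with ''histc'', and averages per bin with ''accumarray''. All data are made up. The output is ''ny''-by-''nx'' (rows = y, columns = x), with empty bins left as ''NaN''.

<code matlab>
% Hypothetical toy data
pos_t = 0:0.1:1;                        % position sample times (s)
pos_x = linspace(0, 10, numel(pos_t));  % rat moves along x ...
pos_y = 3*ones(size(pos_t));            % ... at constant y
err_t = [0.05 0.25 0.45 0.85];          % decoding-bin centers (s)
err_v = [1 3 NaN 5];                    % decoding error in each bin
x_edges = 0:2:10; y_edges = 0:2:4;
nx = numel(x_edges)-1; ny = numel(y_edges)-1;

% position at each decoding-bin center, as column vectors
x_at = interp1(pos_t, pos_x, err_t(:));
y_at = interp1(pos_t, pos_y, err_t(:));

% spatial bin of each sample (histc: edges(k) <= value < edges(k+1))
[~, xb] = histc(x_at, x_edges);
[~, yb] = histc(y_at, y_edges);

% keep samples with a valid error that fall inside the grid
ev = err_v(:);
keep = ~isnan(ev) & xb >= 1 & xb <= nx & yb >= 1 & yb <= ny;

% mean error per spatial bin; rows = y, columns = x, NaN where there is no data
space_err = accumarray([yb(keep) xb(keep)], ev(keep), [ny nx], @mean, NaN);
</code>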

This gives:

{{ :analysis:cosmo2014:2d_decerror_space_250ms.png?600 |}}

It looks like the decoding error is, on average, larger on the central stem than on the arms of the maze.

☛ What could be some reasons for this? Can you think of ways to test your suggestions?

==== Challenges ====

Visual inspection of the animation or movie suggests that the decoding does a decent job of tracking the rat's true location. However, especially given the number of parameters involved in the analysis (bin size, how firing rates are computed, the Poisson and independence assumptions, etc.), it is important to quantify how well we are doing.

★ Modify the visualization code above to also compute a //decoding error// for each frame. This should be the distance between the rat's actual location and the location with the highest posterior probability (the "maximum a posteriori" or MAP estimate). Plot this error over time, excluding those bins where no cells were active. How does this error change over the course of the session? How does it change if you reduce the bin size for decoding to 100 ms?

analysis/course-w16/week10.txt · Last modified: 2018/07/07 10:19 (external edit)

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International