Spike sorting is the process of assigning experimentally recorded voltage waveforms to putative single neurons. Extracellular recordings generally contain contributions from multiple neurons, as well as noise, and therefore cannot be directly converted to spike trains.
The spike sorting process relies on the idea that extracellularly recorded waveforms are a function of (1) a neuron's morphological (size, shape) and electrical properties (channel distributions, states), and (2) the location of the recording electrode relative to the neuron (see Buzsaki et al. 2012). Thus, spikes generated by different neurons are expected to produce differently sized and shaped waveforms on a given electrode.
If “features” extracted from these waveforms, such as their peak voltage, width, or energy (area under the squared curve), form distinct clouds in feature space, these clouds are inferred to correspond to distinct, isolated neurons.
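To make the idea of waveform features concrete, here is a minimal MATLAB sketch that computes peak, width, and energy for two synthetic spike shapes. The waveforms, the variable names, and the exact feature definitions (width as the number of samples from peak to the following valley, energy as the sum of squared samples) are assumptions for illustration; the feature code used by MClust may differ, for example in normalization.

```matlab
% Two synthetic spike waveforms, 32 samples each (illustrative, not real data)
t = 1:32;
spikeA = 100 * exp(-((t - 8).^2) / 8)  - 30 * exp(-((t - 16).^2) / 30);
spikeB =  40 * exp(-((t - 8).^2) / 18) - 20 * exp(-((t - 20).^2) / 50);
wv = [spikeA; spikeB];                 % nSpikes x nSamples, single channel

% Peak: maximum voltage of each waveform
[peakV, peakIdx] = max(wv, [], 2);

% Energy: sum of squared samples (one possible definition)
energy = sum(wv.^2, 2);

% Width: samples between the peak and the lowest point after it
width = zeros(size(wv, 1), 1);
for iS = 1:size(wv, 1)
    [~, valleyOffset] = min(wv(iS, peakIdx(iS):end));
    width(iS) = valleyOffset - 1;
end

% One row per spike, one column per feature
features = [peakV, width, energy];
disp(features);
```

Each row of `features` is one point in feature space; spikes from the same neuron should land close together, and spikes from different neurons should land in different clouds.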
Spike sorting can be a laborious and frustrating process. It is surprising that despite decades of research, no satisfactory automated method has been found! One reason for this is that correct spike sorting needs to rely on just the right amount of domain knowledge of what is likely and unlikely: refractory periods, electrode instability, adaptation of action potential amplitude, and variation in where on the dendritic tree synaptic inputs arrive all affect the shape of clusters in ways that current automated methods do not capture well.
Yet, there is definitely satisfaction in spike sorting electrodes with plentiful clusters, in anticipation of the beautiful data set that will result!
A conceptual overview of the spike sorting process for tetrode recordings is shown below (from Einevoll et al. 2011):
This image illustrates the process from extracellular potentials recorded by a tetrode (b) to spike trains (h). For our purposes, the most important steps are:
Data acquisition systems (such as our Neuralynx Digital Lynx) can be configured to perform steps b-e on-line during recording, but recording only the raw data (b) is also an option. Recording the raw data affords opportunities to optimize steps c-e but is more storage- and time-intensive.
Assuming you collect aligned spike data (.ntt files for Neuralynx), further analysis for these data proceeds as follows:
(Note that the resulting files are part of a promoted data set, discussed in Module 2.)
The next section provides more detailed information on how to implement each step of the above pipeline.
The rest of this module assumes you are using the R042-2013-08-14 example data set. If you want to work with your own data, create a local copy of your Incoming or InProcess data (see Module 2 for a reminder of what these stages are) before proceeding.
If you recorded .ntt files that you want to spike-sort (or .nst in the case of stereotrodes), rename them into the standard format (Rxxx-yyyy-mm-dd-TTnn.ntt for rats, mxxx-yyyy-mm-dd-TTnn.ntt for mice; note that you should use 3 digits for the animal number, e.g. 001 instead of 1!). You can do this manually or write a simple batch script to do the renaming. This has already been done for the example data set.
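A batch rename along these lines could look like the sketch below. It assumes the raw files are named TT1.ntt, TT2.ntt, and so on, and that the session prefix is supplied by hand; `sessionID`, `dryRun`, `ttNumber`, and `standardName` are hypothetical names introduced here. The script only prints the planned renames until `dryRun` is set to false.

```matlab
sessionID = 'R042-2013-08-14';   % assumed session prefix; replace with your own
dataDir   = pwd;                 % folder containing the raw .ntt files
dryRun    = true;                % set to false to actually rename files

% Extract the tetrode number from a name like 'TT13.ntt' (empty if no match)
ttNumber = @(name) str2double(regexp(name, '^TT(\d+)\.ntt$', 'tokens', 'once'));

% Build the standard name, zero-padding the tetrode number to two digits
standardName = @(name, id) sprintf('%s-TT%02d.ntt', id, ttNumber(name));

files = dir(fullfile(dataDir, 'TT*.ntt'));
for iF = 1:numel(files)
    oldName = files(iF).name;
    if isempty(ttNumber(oldName))
        continue;                % skip files that do not match the assumed pattern
    end
    newName = standardName(oldName, sessionID);
    fprintf('%s -> %s\n', oldName, newName);
    if ~dryRun
        movefile(fullfile(dataDir, oldName), fullfile(dataDir, newName));
    end
end
```

Running with `dryRun = true` first lets you check the mapping before any file is touched.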
If you have already cloned the nsb2014 repository on GitHub (see Module 1 for instructions) you already have the MClust data files. However, some of the MClust .m files conflict with those in the main codebase. Therefore, you need a different path for using MClust. Create a new shortcut as follows:
restoredefaultpath; % start with a clean slate
cd('D:\My_Documents\GitHub\nsb2014\MClust-4.1'); % replace with your own path
p = genpath(pwd); % create list of all folders from here
addpath(p);
After running this shortcut, you should be able to type MClust and have the main window appear. If you do, make sure to close MClust again before proceeding to the next step.
Note that as of 19/July/2014, the official release of MClust contains several bugs that have been fixed in the nsb2014 code, so using the official release is not recommended.
KlustaKwik is an unsupervised clustering algorithm that sorts the waveform data into clusters. This makes spike sorting much faster compared to doing everything manually from scratch!
RunClustBatch() is a script that automatically runs KlustaKwik on all tetrode files in the current working directory. It has some default settings, defined in the .m file that you can look at, such as the maximum number of clusters, which waveform features to use, et cetera. For the sample dataset we will use the defaults, except for one thing: the third wire of the tetrode was disabled, so we want to exclude it from processing. This we do as follows:
RunClustBatch('channelValidity',[1 1 0 1]);
If you have your own data that you want to spike sort which has all channels of a tetrode working, you can simply do RunClustBatch without the optional channelValidity argument. Remember to run this inside your data folder!
Note that by default, RunClustBatch calls KlustaKwik.exe, included by default in the MClust distribution. This is fine for Windows machines, but if you are running something else you will need to visit KlustaKwik's home on SourceForge.
If you get a message that MClust is already running, do a clear all before opening a new one.
As shown in the above diagram, RunClustBatch generates feature files (.fet) containing peak, energy, etc. values for each waveform, and cluster files (.clu) containing the cluster assignments. These are stored in the FD folder. The next step needs these .clu files to work.
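To check what KlustaKwik produced, you can count the spikes assigned to each cluster. The sketch below assumes the usual KlustaKwik text layout for .clu files, in which the first line gives the number of clusters and each following line gives the cluster label of one spike. It writes a tiny example file so that it runs on its own; for real data, point `cluFile` (a name chosen here) at a file in your FD folder, and remove the `delete` call so your file is not erased.

```matlab
% Write a tiny example .clu file (illustrative values only)
cluFile = [tempname '.clu.1'];
fid = fopen(cluFile, 'w');
fprintf(fid, '%d\n', [3 1 2 2 3 1 2]);   % first value: number of clusters
fclose(fid);

% Read all integers back: header first, then one label per spike
fid = fopen(cluFile, 'r');
values = fscanf(fid, '%d');
fclose(fid);
delete(cluFile);                          % clean up the example file

nClusters = values(1);
labels    = values(2:end);

% Count spikes per cluster label
ids    = unique(labels);
counts = arrayfun(@(c) sum(labels == c), ids);
for iC = 1:numel(ids)
    fprintf('cluster %d: %d spikes\n', ids(iC), counts(iC));
end
```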
Type MClust to obtain the main window as shown below:
Because we will be loading a tetrode which has the third lead disabled, uncheck the “Ch 3” box. Then, click “Create/Load FD files” and select “R042-2013-08-14-TT13.ntt”. The gray rectangle should turn green and show the name of the loaded tetrode file.
The goal of the KlustaKwik decision window is to decide which (pre)clusters you want to keep and which you want to toss. Once you have your preclustered tetrode loaded, click the “Select from KKwik” button in the MClust main window. You will see some figures open, like this:
(You may have to move and/or resize your windows to get them to look like this.) Apart from the main control window (“Cutting Control Window”), several other figures are provided to show information about the currently selected cluster:
For each cluster, the information displayed in these windows determines your decision of whether to keep or toss it. The default is to toss; to toggle, click on the “toss” button. The first cluster is selected by default. As you can see from the scatterplot, this is a cluster with a small number of diffuse spikes that did not fit clearly in any cluster. We can toss this one.
To move on to the next cluster, locate the radio button next to “Focus” in the control window and click on the one directly below it. This selects cluster 2.
If your KlustaKwik run returned the same results as mine, you should see that this cluster appears to consist of a tight cloud of spikes, plus some others that are clearly separate:
(Note that this is showing Peak 1 against Peak 4.)
We want to keep the tight cloud, which we will need to manually separate from the other spikes later. Click on the “2” to select a different color than black for this cluster, and toggle to “keep” this one.
Proceed one by one through the other clusters, deciding whether to keep or toss. Assign colors to keeps; I use a specific gray color to indicate questionable clusters, and strong primary colors for ones that seem good. Related clusters can be given similar colors to facilitate manual touch-up later.
If you want to merge two (pre)clusters, rename one of them to match the name of the other by clicking on the “KKxx” labels. All clusters with the same name will be merged. Because it is easier to merge than to split clusters, it is often better to instruct KlustaKwik to “over-cluster” the data to avoid cases like the above example (this example is likely “under-clustered”). The Decision Window also has some other features that are described in the MClust manual.
When you are satisfied, click Exit (Export) to return to the MClust main window. You will see the black panel in the top left now indicates a certain number of clusters.
Click the ManualCut button in the main MClust window to open the manual touch-up process. You will see a list of clusters with the names and colors exported from the KlustaKwik decision window.
In the main cutting window, check the “Redraw Axes” box to make a scatterplot of all clusters appear. By default, this is plotted with small dots, but clusters will be easier to see if we change the marker size to size 2 circles. Do this using the drop-down menus indicated in red below.
Let's first clean up the split cluster we encountered above. To see only this cluster, first click “Hide”, and then uncheck the checkbox for KK02. You should now see only this cluster in the scatterplot.
To see the waveforms for this cluster, click on the drop-down menu to the right of the KK02 label and select Show Waveforms (you may have to scroll down for this). The slider at the bottom controls the number of waveforms plotted; things are usually easier to see by plotting a small number. In this case, inspecting the waveforms should reveal that there are two clearly distinct waveforms visible on lead 4 (the rightmost subplot). These are unlikely to be from the same neuron.
By default the scatterplot shows Peak 1 against Peak 2. Change this to show Peak 1 against Peak 4; this should result in two clearly distinct clusters.
Select “SplitSpikesByCvxHull”. You should get a cursor (crosshair) in the cluster cutting window. Click multiple times around the top cluster to define a convex shape enclosing the spikes. The shape will automatically be completed once you press Enter. You should see that KK02 now contains only the spikes at the top, and a new cluster is appended to the list of clusters (named KK02-split for me) containing the spikes outside the enclosed region:
(Note, by the way, that if you still have the waveforms window open, these did not get updated.)
By clicking through a few different feature projections and inspecting the new waveforms, it now appears that we have a tight cluster that is no longer split into multiple pieces.
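The convex hull split used above can be sketched in a few lines of MATLAB. This is not MClust's own implementation; it only illustrates the geometry, using synthetic Peak 1 and Peak 4 values and hypothetical clicked points (`clickX`, `clickY`).

```matlab
% Two synthetic clouds in a Peak 1 vs Peak 4 projection (illustrative data)
nPer  = 50;
peak1 = [100 + 5 * randn(nPer, 1); 100 + 5 * randn(nPer, 1)];
peak4 = [200 + 5 * randn(nPer, 1);  80 + 5 * randn(nPer, 1)];

% Hypothetical mouse clicks around the top cloud; the last point lies
% inside the others and is dropped by the convex hull
clickX = [ 70 130 130  70 100];
clickY = [160 160 240 240 200];

% Convex hull of the clicked points (indices into clickX/clickY)
k = convhull(clickX, clickY);

% Spikes inside the hull stay in the cluster; the rest form the split
inside   = inpolygon(peak1, peak4, clickX(k), clickY(k));
keepIdx  = find(inside);
splitIdx = find(~inside);

fprintf('%d spikes kept, %d spikes moved to the split cluster\n', ...
    numel(keepIdx), numel(splitIdx));
```

Because the test is done in a single two-dimensional projection, it is worth checking the result in other projections afterwards, as described above.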
The 01_CheckCluster option provides a convenient summary of a cluster's properties, to be reviewed before accepting:
Important indicators of cluster quality are:
Clusters are sets of points in multidimensional feature space. The L-ratio indicates how “tight” the points in a cluster are, and the isolation distance (ID) indicates how well isolated the points in a cluster are from those outside it. Lower L-ratios and higher IDs are good. Typical values for including a cluster are L-ratio < 0.1 and ID > 15.
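For intuition, both measures can be computed from squared Mahalanobis distances between the cluster center and the spikes outside the cluster. The sketch below follows commonly used definitions and is an assumption about the general approach, not a copy of MClust's code: the data are synthetic, all variable names are chosen here, and the chi-square tail probability is obtained from `gammainc` to avoid needing a toolbox.

```matlab
% Synthetic 4-dimensional feature data (illustrative only)
nDim      = 4;
clusterFD = randn(200, nDim);        % spikes in the cluster
otherFD   = randn(300, nDim) + 10;   % all spikes outside the cluster

% Squared Mahalanobis distance of each outside spike to the cluster
mu = mean(clusterFD, 1);
S  = cov(clusterFD);
d  = bsxfun(@minus, otherFD, mu);
D2 = sum((d / S) .* d, 2);

% L-ratio: summed chi-square tail probability of the outside spikes,
% divided by the number of spikes in the cluster
nCluster = size(clusterFD, 1);
L        = sum(gammainc(D2 / 2, nDim / 2, 'upper'));
Lratio   = L / nCluster;

% Isolation distance: D2 of the nCluster-th closest outside spike
% (only defined when there are at least nCluster outside spikes)
D2sorted = sort(D2);
if numel(D2sorted) >= nCluster
    isolationDistance = D2sorted(nCluster);
else
    isolationDistance = NaN;
end

fprintf('L-ratio = %.3g, isolation distance = %.1f\n', Lratio, isolationDistance);
```

With the two clouds placed this far apart, the L-ratio comes out near zero and the isolation distance is large, as expected for a well-isolated cluster.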
However, these statistical measures are not perfect. For instance, a cell that is cut off by the detection threshold can have artificially good values on these measures. Thus it is helpful to also manually rate each cluster on a scale from 1 (good) to 5 (bad). Do this by renaming each cluster you want to keep so that a number from 1 to 5 is the first character. For example, a typical cluster name might be “5, cutoff” or “4, instability”. The LoadSpikes() function can take an optional input argument to only load, for instance, clusters with at least a 4 rating.
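The naming convention makes ratings easy to read back in code. The sketch below is a hypothetical illustration of the convention only; it does not show how LoadSpikes() is called. It assumes that “at least a 4 rating” means a rating of 4 or better, that is, a number of 4 or lower.

```matlab
% Example cluster names following the rating-first convention
clusterNames = {'1', '5, cutoff', '4, instability', '2'};

% The rating is the first character of each name
ratings = cellfun(@(s) str2double(s(1)), clusterNames);

% Keep clusters rated 4 or better (1 = good, 5 = bad)
maxRating = 4;
keptNames = clusterNames(ratings <= maxRating);

disp(keptNames);
```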
These ratings should therefore NOT be based solely on the LR and ID. Guidelines for these are:
When you are satisfied that you have only clusters left that you want to keep, click Export Clusters. Note that by default, this overwrites the clusters associated with the main MClust window. The main MClust window does not automatically update unless you Export!
Then, from the main MClust window, click “Write .t, WV, CQ” which saves a .t file for each cluster containing the spike times (see Module 2 for more information on data formats) as well as waveform and cluster quality files for later use. Any “Write files” option also saves a .clusters file that allows you to load the cluster definitions.
The writing of these files concludes the spike sorting process!
(Run clear all before opening a new MClust session.)
MClust supports three different cluster types: