
~~DISCUSSION~~

===== Module 2: Introduction to neural data types, file formats and preprocessing =====

Goals:

  * Consider the common structure in diverse (neural) data sets
  * Understand how such data can be intuitively and efficiently represented in three custom MATLAB data types (''ts'', ''tsd'', and ''iv'')
  * Learn where to find commonly used utility functions that perform basic operations on these data types
  * Obtain a basic overview of the different file formats saved by a Neuralynx system (as an example)
  * Become aware of the pre-processing steps typically applied to raw data
  * Get to know the different files in a pre-processed ("promoted") vandermeerlab data set, and their relationship to the raw data
  * Use the low-level and wrapped loading functions for all these files

Resources:

  * (reference) [[http://www.mathworks.com/help/matlab/structures.html|Introduction to MATLAB structures (structs)]]; if you are new to MATLAB, you should make sure you work through the howto sections "Create a Structure Array" and "Access Data in a Structure Array".
  * (reference) Same thing for [[http://www.mathworks.com/help/matlab/cell-arrays.html|cell arrays]].
  * (reference) I mean it. Nothing that follows will make sense if you don't know about structs and cell arrays.
  * (reference) {{:analysis:nsb2014:neuralynxdatafileformats.pdf|Neuralynx data file formats specification (technical)}}

===== Introductory remarks =====

Careful analysis of neural data begins with a thorough understanding of the raw data that is saved by your data acquisition system(s). However, raw data is only rarely suitable for analysis beyond a few quick checks. At a minimum, freshly acquired data sets typically must be annotated, and/or the files systematically renamed -- for instance, with the ID of the experimental subject and some information about recording locations -- so that the analyst can select which files to analyze, and combine results across sessions and subjects.
More complex pre-processing steps include spike sorting (the process of assigning spike waveforms to putative single neurons to obtain their spike times), artefact removal, and many others.

Pre-processed data can be loaded into MATLAB, typically using code provided by the vendor, or perhaps by something created by the community. Either way, how you represent the data -- what **data types** you use -- has a major bearing on how effectively you can accomplish multiple [[analysis:nsb2015:week0|principles of careful data analysis]]. Therefore, we will begin with a consideration of the general structure of neural data sets.

A word of warning and encouragement: this module is probably the least exciting of the course, but it's important to get the fundamentals in place before we get to the exciting stuff!

===== Structure of (describing) neural data =====

=== Sampled signals ===

In general, data acquisition systems work by //sampling// (i.e. periodically taking a measurement of) some quantity of interest, such as the potential difference (voltage) between an electrode placed in a brain area of interest and a reference. That is, measurements of a signal are repeatedly taken, at some finite //sampling rate//, as the signal evolves over time. This kind of data is often referred to as //time series data//, and may look like this:

{{ :analysis:nsb2014:tsd_example.png?300 |}}

Time is shown on the horizontal axis ("abscissa"), and the value of this particular quantity (on the vertical axis, "ordinate") is changing over time. At regular intervals -- the inverse of the sampling rate (1 / Fs), to be precise -- we obtain a measurement, indicated by the black dots. We are blind to any changes in between the samples, illustrated by the unbroken line.
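To make the idea of sampling concrete, here is a minimal sketch (not part of the lab codebase) that samples an idealized 2 Hz sine wave for one second. The sampling rate of 20 Hz and the variable names ''Fs'', ''tvec'' and ''data'' are assumptions chosen for illustration only:

<code matlab>
%% sketch: sample an idealized signal at a fixed rate
Fs = 20;                          % assumed sampling rate (Hz)
tvec = (0:Fs-1)'/Fs;              % sample times (s), spaced 1/Fs apart, covering one second
data = sin(2*pi*2*tvec);          % value of a 2 Hz sine wave at each sample time

t_fine = (0:0.001:1)';            % finely spaced times, standing in for the "true" signal
plot(t_fine,sin(2*pi*2*t_fine),'Color',[0.7 0.7 0.7]); % changes we are blind to
hold on;
plot(tvec,data,'.k','MarkerSize',15);                   % what we actually measure
xlabel('time (s)'); ylabel('value');
</code>

Try lowering ''Fs'' and replotting to get a feel for how much of the underlying signal the samples still capture.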
Obviously, this illustration shows a single signal, but many systems will record multiple signals simultaneously, such as an EEG system that records voltages from 256 scalp electrodes, a rodent electrophysiology system that in addition to neural data also records the position of the animal, a video camera that records a signal in each of many pixels, and so on.

The fact that we are dealing with sampled signals has some important consequences for data analysis, broadly captured by the term //sampling theory//; we will introduce a few of these consequences in Module 5. The "Nyquist limit" and the "Moiré effect" (aliasing) are two well-known examples.

=== Point processes (timestamps) ===

Neuroscience attributes particular significance to action potentials, or "spikes", which are typically understood as all-or-none events that occur at a specific point in time (hence the technical term, //point process//). To describe a train of spikes, it is not necessary to state all the times at which there was no spike: it suffices to maintain a list of those times (sometimes called timestamps) at which a spike was emitted. The same description works well for other quantities of interest which are essentially punctate //events//, such as delivery of a reward pellet, initiation of a key press, and so on.

=== Intervals ===

Although uncommon in raw neural data, time intervals (epochs with a certain duration, rather than a point in time) commonly arise in some aspect of experimental procedures and analyses. Intervals describe occurrences that have start and end times, such as a //trial// of an experiment, the presence of a cue (e.g. a light or a tone), et cetera.

Together, these three types of data can describe most data sets encountered in neuroscience.
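As a rough sketch, the three kinds of data can be written down as plain MATLAB arrays. All numbers below are made up and the variable names are illustrative assumptions; these are not yet the lab data types:

<code matlab>
%% sketch: three kinds of data as plain arrays (made-up numbers)
% sampled signal: one value for every sample time
Fs = 1000;                           % assumed sampling rate (Hz)
tvec = (0:Fs-1)'/Fs;                 % one second of sample times
lfp = randn(size(tvec));             % stand-in values, one per sample

% point process: only the times at which something happened
spike_times = [0.012 0.250 0.251 0.730];

% intervals: matched start and end times
trial_start = [0.0 0.5];
trial_end   = [0.4 0.9];

% combining them: count the spikes that fall within each interval
nSpikes = zeros(size(trial_start));
for iT = 1:numel(trial_start)
    nSpikes(iT) = sum(spike_times >= trial_start(iT) & spike_times <= trial_end(iT));
end
</code>

Note that the point process and the intervals take up far less memory than the sampled signal, even though all three cover the same second of time.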
Putting all three together in a simple visualization might look something like this:

{{ :analysis:course-w16:gamma.png?600 |}}

At the top of the figure, you can see several rows containing point process data: the dots indicate spikes, one row per neuron. At the bottom, you see time series data (a local field potential), and the colored blocks show intervals with two different labels, indicated by the color.

☛ Now, think about one of your own experiments. How would you describe the data you collect? What quantities (signals) of interest are time series data, what are point processes, and what is best described as intervals?

===== Introduction to vandermeerlab data types =====

A //data type// is the computer science term for a standardized format of representing data. Classical data types include things like [[http://en.wikipedia.org/wiki/Integer|integers]] and [[http://en.wikipedia.org/wiki/Floating_point|floating-point]] numbers, but our data types of interest are essentially all MATLAB [[http://www.mathworks.com/help/matlab/structures.html|structs]] with particular constraints on field names and formats. (Note for the connoisseurs: the choice to not implement these data types as MATLAB objects is deliberate.)

The three main data types are (1) timestamped data (TSD), (2) timestamps (TS), and (3) intervals (IV), discussed in turn below. Standardizing how we represent these data makes it possible for commonly used functions to be used on any data set -- good for readability and robustness!

==== Timestamped data (TSD) data-type ====

As introduced above, a sampled signal is essentially a list of data points (values), taken at specific times. Thus, what we need to fully describe such a signal is two arrays of the same length: one with the timestamps and the other with the corresponding values.
This is exactly what the timestamped data (TSD) data type is, as illustrated by the ''LoadCSC()'' function:

<code matlab>
%% load data
cd('D:\Data\R016\R016-2012-10-08'); % same session as Module 1
cfg = [];
cfg.fc = {'R016-2012-10-08-CSC02d.ncs'}; % cell array with filenames to load
csc = LoadCSC(cfg);

>> csc

csc = 

     type: 'tsd'
     tvec: [5498360x1 double]
     data: [1x5498360 double]
    label: {'R016-2012-10-08-CSC02d.ncs'}
      cfg: [1x1 struct]
</code>

The TSD data type has the following fields:

  * ''type'': string indicating data type, 'tsd'
  * ''tvec'': //nSamples x 1 double//, timestamps (in seconds)
  * ''data'': //nSignals x nSamples double//, values (units can be specified in cfg if needed)
  * ''label'': //nSignals x 1 cell array//, filenames
  * ''cfg'': content depends on specific data, but always has a ''history'' field. For CSC data, there is also ''hdr'', ''ExpKeys'', and ''SessionID''.

Thus, the ''tvec'' field and the ''data'' field together define the sampled signal. In the above example, we only loaded one ''.ncs'' file (a single local field potential, recorded from a specific electrode in the brain) and therefore there is only one label, containing the filename. To plot this data you can simply do ''plot(csc.tvec,csc.data)''.

☛ Consider the ''tvec'' field in the struct above. If the sampling rate for a given signal is constant, is this field strictly necessary? Can you think of a way to describe such an idealized signal more efficiently (i.e. by taking up less memory)?

☛ How does ''LoadCSC()'' represent multiple, simultaneously acquired, signals? A nice way to do so is to use a config field like ''cfg.fc = FindFiles('*CSC01*.ncs');''.

If at some point you want to construct a ''tsd'' variable yourself, you can do ''help tsd'' to see how. The ''tsd()'' function is a //constructor// for variables of type ''tsd''.
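To see what the field list above amounts to, here is a minimal sketch that builds a ''tsd''-like struct by hand from made-up data. It deliberately does not call ''tsd()'', whose input arguments are not shown here; the variable name ''my_tsd'', the labels, and the empty ''history'' placeholder are assumptions for illustration:

<code matlab>
%% sketch: a hand-built struct with the tsd fields listed above (made-up data)
my_tsd = struct();
my_tsd.type = 'tsd';
my_tsd.tvec = (0:0.5:4.5)';            % nSamples x 1 timestamps (s)
my_tsd.data = [1:10; 10:-1:1];         % nSignals x nSamples values
my_tsd.label = {'up'; 'down'};         % nSignals x 1 cell array
my_tsd.cfg = struct('history',[]);     % placeholder; real history contents are not shown here

% basic consistency checks implied by the field descriptions
nSamples = length(my_tsd.tvec);
assert(size(my_tsd.data,2) == nSamples);             % one value per timestamp
assert(size(my_tsd.data,1) == numel(my_tsd.label));  % one label per signal

plot(my_tsd.tvec,my_tsd.data(1,:));    % plot the first signal against time
</code>

In practice, starting from the ''tsd()'' constructor is the more convenient route, because it sets up these fields for you.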
''LoadCSC()'' calls this function to create an empty ''tsd'' template, and then fills it with data loaded from ''.ncs'' files; you can check if the result meets the specification by calling ''CheckTSD()''.

There are a number of functions that work with ''tsd'' data: some of these can be found in the tsd [[https://github.com/mvdm/neuraldata-w16/tree/master/shared/datatypes/tsd | folder]] in the %%GitHub%% repository. Two other important ones you will meet in this module are ''restrict()'' and ''getd()'' (which also work on other data types, below).

==== Timestamp (TS) data-type ====

A different data type is needed to describe sets of punctate events (a //point process// in statistics), such as times of action potentials (spikes) or task events such as reward delivery times. For this we use the TS (timestamp) data type, defined as follows:

  * ''type'': string indicating data type: 'ts'
  * ''t'': //nSignals x 1 cell array//, timestamps (in seconds)
  * ''label'': //nSignals x 1 cell array//, labels
  * ''usr'': //nSignals x nUsr double//, optional additional data corresponding to each signal
  * ''cfg'': content depends on specific data, but always has a ''history'' field.

An example is provided by the function ''LoadEvents()'', which loads the timestamps of events used in this particular experiment (such as the delivery of reward pellets):

<code>
%% remember to use Cell Mode in the editor to run this code!
cfg = [];
evt = LoadEvents(cfg);

>> evt

evt = 

     type: 'ts'
        t: {1x109 cell}
    label: {1x109 cell}
      cfg: [1x1 struct]
</code>

Note how several of the fields of the resulting ''evt'' struct are [[http://www.mathworks.com/help/matlab/cell-arrays.html|cell arrays]]. Because we provided ''LoadEvents()'' with an empty config input, it by default loads the times of all events it can find. As you can see by the size of the cell arrays, there are 109 labels here.
Let's look at some of them:

<code matlab>
>> evt.label(1:3) % display first three labels

ans = 

    '1 or 5 pellet cue'    '1 pellet cue'    '1 pellet dispensed'
</code>

Taking the second label as an example, it describes an experimental event: the onset of a cue (tone in this case). The corresponding timestamps (in seconds) can be found in the second ''.t'' field:

<code matlab>
>> evt.t{2}

ans =

   1.0e+03 *

  Columns 1 through 8

    1.1475    1.1533    1.1706    1.1798    1.2190    1.2255    1.2380    1.2435

(...)
</code>

These timestamps completely describe a point process (timestamp data).

☛ Why do you think the event times (in the ''.t'' field) are stored in a cell array, rather than in a matrix?

A way to address timestamps by label is provided by the ''getd()'' function:

<code matlab>
plot(getd(evt,'1 pellet cue'),0,'.k') % retrieve times associated with 1 pellet cue and plot each time against zero
</code>

☛ ''getd()'' also works for ''tsd'' data. Try plotting a specific channel this way.

A different function that loads data into a ''ts'' data type is ''LoadSpikes()''. Try it:

<code matlab>
S = LoadSpikes([])
</code>

Notice how instead of creating an empty config variable and passing it as an input, I now just passed an empty array ''[]'' as an input directly. This instructs ''LoadSpikes()'' to load all spike files it can find. As you can see from the labels, two different files were loaded; as will be explained below, ''*.t'' indicates a file containing spike times from one neuron.

☛ How many spikes did the second neuron emit in this session?

As with ''tsd'' data above, you can call the ts constructor ''ts()'' to start with a template that you can then fill with data if you want to build your own. The ts [[https://github.com/mvdm/neuraldata-w16/tree/master/shared/datatypes/ts | folder]] on %%GitHub%% contains some other utility functions that work with timestamp data.
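For intuition about what addressing timestamps by label involves, the following sketch builds a small ''ts''-like struct by hand and looks up one set of timestamps by its label. This is only an illustration using built-in MATLAB functions, not the actual implementation of ''getd()''; the struct ''my_ts'', its labels and its times are made up:

<code matlab>
%% sketch: hand-built ts-like struct and lookup by label (made-up data)
my_ts = struct();
my_ts.type = 'ts';
my_ts.t = {[1.2 3.4 5.6]', [0.5 2.5]'};    % one list of timestamps (s) per signal
my_ts.label = {'cue', 'reward'};           % one label per signal
my_ts.cfg = struct('history',[]);          % placeholder; real history contents are not shown here

% find which cell holds the timestamps for a given label
idx = find(strcmp(my_ts.label,'reward'));
reward_times = my_ts.t{idx};

% number of timestamps for each signal
nEvents = cellfun(@numel,my_ts.t);
</code>

The same ''cellfun()'' idiom works on the output of any loader that returns a ''ts'' struct.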
==== Interval (IV) data-type ====

Interval data -- matched sets of start and end times -- is typically not loaded directly from data files. However, it commonly comes up during analysis, for instance when defining trials, running vs. resting epochs, sharp wave-ripple complexes, et cetera. Interval data is defined as follows:

  * ''type'': //string// to indicate data type, 'iv'
  * ''tstart'': //nIntervals x 1 double//, interval start times (in seconds)
  * ''tend'': //nIntervals x 1 double//, end times (in seconds)
  * ''usr'': //nIntervals x nUsr double//, optional additional data corresponding to intervals
  * ''cfg'': content depends on specific data, but always has a ''history'' field.

Some common ways of creating an iv object from scratch are the following:

<code matlab>
>> a = iv([1 2]) % define a single interval from 1 to 2

a = 

      type: 'iv'
    tstart: 1
      tend: 2
       usr: []
       cfg: [1x1 struct]

>> b = iv([1 2],[3 3]) % define two intervals, 1 to 3 and 2 to 3

b = 

      type: 'iv'
    tstart: [2x1 double]
      tend: [2x1 double]
       usr: []
       cfg: [1x1 struct]
</code>

There are a number of useful functions available that work with interval data. One of the most useful ones is ''TSDtoIV()'', which will be demonstrated below. The ''iv'' [[https://github.com/mvdm/neuraldata-w16/tree/master/shared/datatypes/iv | folder]] in the codebase has a number of functions whose purpose you can guess from their names; for instance, ''IntersectIV()'' computes the intersection between two sets of intervals (i.e. it outputs only those intervals in A which overlap with intervals in B).

===== Data files overview =====

Our next goal is to learn about the different kinds of data and associated information that make up a typical neural recording session (as an example, we will use Neuralynx data; obviously the details will be different for other systems), and to meet the various loading functions that will enable you to access them in the data type formats introduced above.
Make sure you have the data session ''R042-2013-08-18'' from the lab database, and that this is placed in a sensible location (NOT in a %%GitHub%% or project folder! See [[analysis:course-w16:week1|Module 1]] if this is not obvious). This folder contains data from a single recording session that has been pre-processed so that it is ready for analysis. Such a pre-processed data set is referred to as "promoted"; raw data that has just been recorded is "incoming", data being pre-processed is "inProcess". The schematic below (drawn using the [[http://www.graphviz.org/ | dot]] tool in %%GraphViz%%) gives an overview of the major data files and their transformation during pre-processing: <graphviz> strict digraph G { resolution = 300; fontname = Helvetica; overlap = false; rankdir = BT; /* concentrate = true; */ node [fontname = "Helvetica",shape=ellipse,fontsize=9]; edge [fontname = "Helvetica",fontsize=9]; subgraph cluster0 { /* promoted data */ node [style=filled,fillcolor=white,color=black,fontsize=10]; style=filled; color=lightgrey; spk [label="spikes\n*.t files"]; lfp [label="LFPs\n*.ncs files"]; pos [label="position\n*.mat file (tsd)"]; evt [label="events\n*.nev file"]; keys [label="ExpKeys\n*keys.m file"]; md [label="metadata\n*metadata.mat file"]; wv [label="Waveforms\n*wv.mat files"]; cq [label="ClusterQual\n*.mat files"]; } subgraph cluster1 { /* legend */ node [color=black,style=filled,fillcolor=white,fontsize=10]; style = "filled"; color=".3 .3 .7"; label = "legend"; l1 [label="processed file"]; l2 [shape=rectangle,label="raw file",fillcolor=purple,fontcolor=white]; l3 [style=filled,shape=diamond,fillcolor=red,fontcolor=white,nedges=5,label="function"]; } ann [shape=plaintext,label="description of\nsubject, session,\nconditions, etc."]; ann -> keys [label=" annotation"]; ann2 [shape=plaintext,label="details of\ntrials, etc."]; ann2 -> md [label=" annotation"]; raw_evt [style=filled,shape=rectangle,label="*.nev file",fillcolor=purple,fontcolor=white]; raw_evt 
-> evt [label=" rename"]; raw_pos [style=filled,shape=rectangle,label="*.nvt file",fillcolor=purple,fontcolor=white]; raw_pos -> pos [label=" load into MATLAB\n save as tsd"]; raw_lfp [style=filled,shape=rectangle,label="*.ncs files",fillcolor=purple,fontcolor=white]; raw_lfp -> lfp [label=" rename"]; ntt [style=filled,shape=rectangle,label="*.ntt files",fillcolor=purple,fontcolor=white]; ntt_ren [shape=rectangle,label="renamed *.ntt files"]; mclust [shape=rectangle,label="*.clu files"]; temp1 [style=filled,shape=diamond,fillcolor=red,fontcolor=white,nedges=5,label="MClust"]; ntt -> ntt_ren [label=" rename"]; ntt_ren -> mclust [label=" automatic\n preclustering\n (KlustaKwik)"]; mclust -> temp1; ntt_ren -> temp1; temp1 -> spk; // ccqf [style=filled,shape=diamond,fillcolor=red,fontcolor=white,nedges=5,label="CreateCQFile()"]; ntt_ren -> ccqf; spk -> ccqf; ccqf -> wv; ccqf -> cq; } </graphviz> The files you find in a promoted folder such as ''R042-2013-08-18'' are those enclosed in the gray box. They are: * Each ''.ncs'' file ("**N**euralynx **C**ontinuously **S**ampled") contains a single channel of continuously sampled voltage data. The sampling rate and filters for these channels can be configured in the Cheetah data acquisition software. Typically, as in this data set, the sampling rate and filters are set so that these files are local field potentials (LFPs) sampled at 2kHz and filtered between 1 and 475 Hz. It is also possible to have wide-band, 32kHz ''.ncs'' files suitable for spike extraction, but these are not included in the current dataset. (We will discuss filtering in a subsequent module.) * Each ''.t'' file contains a set of times -- a spike train from a putative neuron. The qualifier "putative" is used because this is extracellular data and spike-sorting is not perfect, so it's likely there will be some spikes missing and some spikes included that are not from this neuron. Always remember this even if I will omit the "putative" from now on for short! 
''*.t'' files are generated by %%MClust%%, a spike sorting tool developed by A. David Redish, from the raw ''*.ntt'' ("**N**euralynx **T**e**T**rode") files saved by Neuralynx. ''*.ntt'' files do not contain continuously sampled data; instead, a one-millisecond snapshot across the channels of a tetrode is stored whenever any of the four channels exceeds a threshold set in Cheetah by the experimenter.
  * The ''*.nvt'' file ("**N**euralynx **V**ideo **T**racking") contains the location of the rat as tracked by an overhead camera. For Neuralynx systems, this is typically sampled at 30 Hz. Because the raw files are large, they are usually stored in compressed (zip) format. The ''.nvt'' files are in units of camera pixels (typically 640x480).
  * The ''*.Nev'' file ("**N**euralynx **EV**ents") contains timestamps and labels of events, such as those input by the user during recording, received from experimental components connected to Neuralynx's digital I/O (Input/Output) port, and system messages such as recording start, data loss, et cetera.

A critical part of any promoted data set is the following:

  * The ''*keys.m'' file, referred to as "%%ExpKeys%%" or "keys". This file contains experimenter-provided information that describes this data set. This information is stored as a ''.m'' file so that it can be edited and read by standard text editors (rather than having to be loaded into MATLAB to view, as would be the case for a ''.mat'' file). This file and the correct format for %%ExpKeys%% are explained in more detail [[https://github.com/mvdm/vandermeerlab/blob/master/doc/HOWTO_ExpKeys_Metadata.md|here]].
  * The ''*metadata.mat'' file, which like the %%ExpKeys%% contains descriptive information about the data set, such as start and end times of individual trials, but that is not desirable or practical to include in the %%ExpKeys%% file.
See [[https://github.com/mvdm/vandermeerlab/blob/master/doc/HOWTO_ExpKeys_Metadata.md|here]] for guidelines on what should go in %%ExpKeys%% versus metadata.

Next, we have:

  * ''*wv.mat'' files. There is one file for each ''*.t'' file, containing the average waveforms for that cell.
  * ''*ClusterQual.mat'' files. Also, one file for each ''*.t'' file, containing some cluster quality statistics.

Both of these files are generated by a MATLAB script (''CreateCQFile.m'') or directly from %%MClust%% version 4.1 or higher.

Finally, there is also:

  * the ''*vt.mat'' file. This contains the position data in ''tsd'' format (see above for a description of data types), after potential position artifacts have been removed, and the raw camera pixel units have been converted to centimeters.

:!: **NOTE**: Some older data sessions may not have this conversion to centimeters done. What units the video data are in is not crucial for this tutorial, but in general it is a good idea to be aware of what these units are!

☛ Look at the contents of the ''R042-2013-08-18'' folder. Notice how each file is named: all start with ''R042-2013-08-18'' followed by a suffix indicating the file type and (if necessary) an identifier. Applying this naming scheme consistently is a key part of good data management because it enables provenance tracking -- which cells from what animal, what session, and what condition are contributing to each plot, et cetera. The **rename** steps in the above schematic are an important first step.

===== Using the low-level data loading functions =====

Neuralynx supplies a set of functions that load the raw data into MATLAB (included in your %%GitHub%% clone). We will use these one by one in the following subsections. A common theme is that all of these functions will output a ''Timestamps'' variable, indicating when each data sample or event occurred.
Data acquisition systems need to solve the engineering challenge of aligning many different kinds of signals (video, neural activity, events) on a common timebase, so that relationships between them can be analyzed. These ''Timestamps'' are what ties the different data files together. By default, Neuralynx data loaders return timestamps in microseconds (us).

Before getting started, create a folder with today's date in your [[analysis:course-w16:week1|project folder]], and create a new file in it named ''sandbox.m''. These sandbox files are not meant to be re-used or committed to %%GitHub%% -- as the name indicates, they are just a temporary file that is easier to work with compared to typing everything directly into the MATLAB Command Window.

Next, make sure that your path is [[analysis:course-w16:week1|set correctly]] using a Shortcut button. Also, set MATLAB's current directory to the data folder (''R042-2013-08-18''); you can do this either using the MATLAB %%GUI%% (I often paste from Explorer into MATLAB) or by using the ''cd'' command.

All instructions that follow should be pasted into a [[http://blogs.mathworks.com/videos/2011/07/26/starting-in-matlab-cell-mode-scripts/|cell]] in this sandbox file and executed from there (Ctrl-Enter when a cell is selected), unless they are prefaced with ''>>'' to indicate the Command Prompt.

==== Position data (*.nvt) loading ====

The low-level loading function for video data is ''Nlx2MatVT''. Deploy it as follows:

<code matlab>
%% load video data (make sure the VT1.zip file is unzipped first and now present in MATLAB's working folder!)
[Timestamps, X, Y, Angles, Targets, Points, Header] = Nlx2MatVT('VT1.nvt', [1 1 1 1 1 1], 1, 1, [] );
</code>

The abundance of ones in the function call is basically saying, "load everything" (type ''help Nlx2MatVT'' for the gory details).
Notice that the output arguments (with the exception of the ''Header'') share a common dimension:

<code matlab>
>> whos
  Name              Size                 Bytes  Class     Attributes

  Angles            1x131898           1055184  double
  Header           28x1                   4262  cell
  Points          400x131898         422073600  double
  Targets          50x131898          52759200  double
  Timestamps        1x131898           1055184  double
  X                 1x131898           1055184  double
  Y                 1x131898           1055184  double
</code>

We appear to have 131898 samples of "X" and "Y", the main variables of interest, with corresponding timestamps. We can plot X against Y:

<code matlab>
>> plot(X,Y);
</code>

to get:

{{ :analysis:nsb2014:module2_xvsy.png?600 |}}

You can see the outline of a modified T-maze used for this recording session (rotated 90 degrees). Notice that this way of plotting the position data reveals something strange going on: there are many abrupt jumps to the (0,0) position! As it turns out, these are Neuralynx's way of indicating missing data (samples on which no position data could be acquired).

☛ Plot X against Y again, but this time without the missing data. A good way of doing this is to first define a variable ''keep_idx'' that contains the indices of those samples which you want to keep (i.e. that are not (0,0)).

Inspect the resulting plot. The shape of the T-maze is now clearer; also visible are two roughly circular areas. These are the "pedestals" on which the rat can relax at the beginning and end of the recording session, as well as in between trials (if you want more details about what is going on in this task, see [[http://ctnsrv.uwaterloo.ca/vandermeerlab/doku.php?id=analysis:task:motivationalt|here]]). I plotted my version as follows:

<code matlab>
%% plot video data -- use a new cell so that you can rerun this without also reloading the data
fh = figure; set(fh,'Color',[0 0 0]);
plot(X(keep_idx),Y(keep_idx),'.','Color',[0.7 0.7 0.7],'MarkerSize',1); axis off;
</code>

The first line opens a new figure, and uses its //handle// to set the background to black.
The second line uses additional arguments for ''plot()'' to plot the X and Y data points not as a connected line, but as individual points of size 1 in a gray color. The result:

{{ :analysis:nsb2014:module2_xvsy2.png?600 |}}

It is useful to know how to save figures to a format that is easy to view:

<code matlab>
set(gcf, 'InvertHardCopy', 'off');
print(gcf,'-r75','-dpng','module2_xvsy2.png');
</code>

The first line is necessary to preserve the black background. The second line saves a 75dpi PNG image. PNG is a good choice for saving MATLAB images, because it uses lossless compression and therefore will not cause ugly artifacts the way JPEG will.

Let's look at the Timestamps next, by plotting the X data as a function of time:

<code matlab>
plot(Timestamps(keep_idx),X(keep_idx),'.r','MarkerSize',3)
box off;
set(gca,'FontSize',24);
</code>

Note the use of some different plotting options here, to give:

{{ :analysis:nsb2014:module2_xvsy3.png?600 |}}

The horizontal axis is still in Neuralynx's raw data units (us).

☛ Convert the Timestamps to seconds, and replot.

If you look closely, you can spot some gaps in the data (times when no position data is plotted).

☛ (Optional exercise to test your MATLAB skills) Are these gaps because of (0,0) samples that have been removed? Or because there are no records in the data for those times?

As you should have ascertained, there are in fact two short gaps in the data. These occur on purpose to separate behavior on the T-maze (when you can see the X coordinate changing as the rat runs) from the times when the rat is resting on the pedestal. In the Cheetah software this can be done by simply turning off Recording and then turning it back on. (Sneak preview: although doing this is helpful for some applications, it can be problematic for analyses that assume your data is continuous. We will encounter this when we start using the %%FieldTrip%% toolbox later.)

☛ Determine the video tracker sampling rate from the ''Timestamps'' variable.
Watch out for gaps in the data! (Hint: the ''diff()'' function is useful here!)

This concludes the introduction to Neuralynx video data. The other outputs of ''Nlx2MatVT'' are not used for typical analyses.

==== LFP data file (*.Ncs) loading ====

The Neuralynx loader for Ncs files is ''Nlx2MatCSC''. Use it thusly:

<code matlab>
clear all;
fname = 'R042-2013-08-18-CSC05a.ncs';
[Timestamps, ~, SampleFrequencies, NumberOfValidSamples, Samples, Header] = Nlx2MatCSC(fname, [1 1 1 1 1], 1, 1, []);
</code>

...and inspect the result:

<code matlab>
>> whos
  Name                        Size               Bytes  Class     Attributes

  Header                     33x1                 5182  cell
  NumberOfValidSamples        1x17193           137544  double
  SampleFrequencies           1x17193           137544  double
  Samples                   512x17193         70422528  double
  Timestamps                  1x17193           137544  double
  fname                       1x9                   18  char
</code>

Now we get only 17193 Timestamps, a surprising number because it is substantially fewer than the number of video tracking timestamps we got (on the order of 10 times fewer), even though the video tracking data was only sampled at about 30 Hz, and this LFP data is supposed to be sampled at something like 2kHz!

As it turns out, Neuralynx Ncs data is stored in blocks of 512 samples, with only the first sample of each block timestamped. Hence the [512 x 17193] size of Samples, which contains the actual time-varying voltage signal. This is not a very convenient format for plotting timestamps against voltage, the way we typically would like to do. This is one reason why we generally don't use these low-level loading functions, but instead //wrap// them in a function that is more user-friendly. These loading functions are discussed in the next section.

For now, one more point about this data: ''Samples'' is not in units of volts, but on a scale internal to the Neuralynx system.
To know how these "A-D bits" (analog-to-digital) correspond to real voltages, we need to look in the ''Header'':

<code matlab>
>> Header

Header = 

    '######## Neuralynx Data File Header'
    '## File Name C:\CheetahData\2013-08-18_09-06-16\CSC49.ncs'
    '## Time Opened (m/d/y): 8/18/2013 (h:m:s.ms) 9:6:36.546'
    '## Time Closed (m/d/y): 8/18/2013 (h:m:s.ms) 10:26:2.875'
    ''
    '-FileType CSC'
    '-FileVersion 3.3.0'
    '-RecordSize 1044'
    ''
    '-CheetahRev 5.6.3 '
    ''
    '-HardwareSubSystemName AcqSystem1'
    '-HardwareSubSystemType DigitalLynxSX'
    '-SamplingFrequency 2000'
    '-ADMaxValue 32767'
    '-ADBitVolts 0.000000061037020770982053'
    ''
    '-AcqEntName CSC49'
    '-NumADChannels 1'
    '-ADChannel 80'
    '-InputRange 2000'
    '-InputInverted True'
    ''
    '-DSPLowCutFilterEnabled True'
    '-DspLowCutFrequency 1'
    '-DspLowCutNumTaps 0'
    '-DspLowCutFilterType DCO'
    '-DSPHighCutFilterEnabled True'
    '-DspHighCutFrequency 475'
    '-DspHighCutNumTaps 128'
    '-DspHighCutFilterType FIR'
    '-DspDelayCompensation Disabled'
    '-DspFilterDelay_µs 1984'
</code>

Aha, the ''-ADBitVolts'' entry gives us the conversion from the raw data to volts. Another reason to wrap this low-level function into something that does the conversion for us! As you can see, the header contains some other information, which will be discussed in more detail in later modules.

==== Event file (*.Nev) loading ====

''*.Nev'' (**N**euralynx **Ev**ent) files contain timestamps of various task events. Use as follows:

<code matlab>
fn = FindFile('*Events.nev');
[EVTimeStamps, EventIDs, TTLs, EVExtras, EventStrings, EVHeader] = Nlx2MatEV(fn,[1 1 1 1 1],1,1,[]);
</code>

As before, all the ones in the function call make sure we load everything.
In return, we get:

<code matlab>
>> whos
  Name                Size             Bytes  Class     Attributes

  EVExtras            8x462            29568  double
  EVHeader           12x1               1924  cell
  EVTimeStamps        1x462             3696  double
  EventIDs            1x462             3696  double
  EventStrings      462x1             103104  cell
  TTLs                1x462             3696  double
  fn                  1x44                88  char
</code>

Each of the 462 events in this file has a timestamp (''EVTimeStamps'') and a description (''EventStrings'') as well as some other information we generally don't need. Let's inspect some of the ''EventStrings'':

<code matlab>
>> EventStrings(1:13)

ans = 

    'Starting Recording'
    'Stopping Recording'
    'Starting Recording'
    'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0020).'
    'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0000).'
    'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0020).'
    'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0000).'
    'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0080).'
    'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0000).'
    'TTL Output on AcqSystem1_0 board 0 port 0 value (0x0004).'
    'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0080).'
    'TTL Output on AcqSystem1_0 board 0 port 0 value (0x0000).'
    'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0000).'
</code>

The meaning of these cryptic strings depends on the specific experimental setup. "AcqSystem1_0 board 0 port 0" and "1" refer to connectors on the Neuralynx data acquisition mainbox, which can be hooked up to various experimental peripherals such as photobeams, levers, and pellet dispensers. In this session, Input/Output (I/O) Port 0 was configured as Output, controlling a pellet dispenser and a valve (for sucrose solution delivery). Port 1 was set to be an Input, receiving inputs from three photobeams (one on the central stem of the maze, and one for each reward site on either end of the maze arms).

The ''EventStrings'' above refer to the status of an I/O port, represented as a [[http://en.wikipedia.org/wiki/Hexadecimal|hexadecimal number]] (indicated by the prefix "0x").
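As an illustration of what reading these strings by hand involves, the sketch below pulls the hexadecimal port value out of a few of the event strings shown above and converts it to a decimal number. It uses only built-in MATLAB functions (''regexp()'' and ''hex2dec()''); the short ''EventStrings'' list and the variable names ''tok'' and ''port_value'' are assumptions made for the example, and this is not how the wrapped loaders are necessarily implemented:

<code matlab>
%% sketch: extract the hexadecimal port value from event strings
EventStrings = {'Starting Recording'; ...
    'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0020).'; ...
    'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0000).'; ...
    'TTL Output on AcqSystem1_0 board 0 port 0 value (0x0004).'};

% capture the hex digits that follow "0x"; strings without a port value give an empty result
tok = regexp(EventStrings,'value \(0x([0-9A-Fa-f]+)\)','tokens','once');

port_value = nan(size(EventStrings)); % NaN marks events without a port value
for iE = 1:numel(EventStrings)
    if ~isempty(tok{iE})
        port_value(iE) = hex2dec(tok{iE}{1}); % e.g. '0020' becomes 32
    end
end
</code>

Which peripheral a given value belongs to still has to come from knowledge of the experimental setup.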
The activation of each peripheral is associated with a unique number.

As with the previous low-level loading functions, the Neuralynx loader does not provide us directly with what we want. We'd like a loader that just gives us the times for the events we are interested in, without us having to figure out what hexadecimal number they correspond to and then pull out the matching times. These wrapped loaders will be introduced below.

☛ (Optional exercise to test your %%MATLAB%% skills) Find out which ''EventString'' corresponds to which input or output (food pellet reward on left arm, sucrose water reward on right arm, left reward photobeam, right reward photobeam, central stem photobeam) by plotting the location of the animal at the time of each event. Hint: example pseudocode for a nice approach to find this out would look like the following:

<code>
get list of unique event strings to process -- unique()
for each event string
   find indices of events that match current event string -- strncmp()
   get timestamps for matched events
   find indices of position timestamps that are closest in time -- nearest_idx()
   get x and y coordinates of closest timestamps
   plot x and y coordinates on top of position plot
end
</code>

===== Using the wrapped data loaders =====

You have already seen examples of TSD and TS data types returned by some loading functions. The full set used for Neuralynx data in this course follows below.

You will notice that each loading function takes in a ''cfg'' ("configuration") variable, which is used to specify parameters and options such as the filenames to be loaded. This use of ''cfg'' variables is shared by many other vandermeerlab data analysis functions (as well as those in the %%FieldTrip%% toolbox), and is highly encouraged when you start writing your own code: it encourages well-organized code and enables provenance tracking, two principles of [[analysis:nsb2015:week0|good programming practice]].
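
If you want to adopt the ''cfg'' pattern in your own functions, one common idiom is to define defaults and then overwrite them with whatever the caller supplied. The following is a minimal sketch of that idea only; the option names (''threshold'', ''verbose'') and variable names are hypothetical and not taken from any vandermeerlab function:

<code matlab>
% default options for a hypothetical analysis function
cfg_def = [];
cfg_def.threshold = 5;
cfg_def.verbose = 1;

% options supplied by the caller; only fields to override need to be set
cfg_in = [];
cfg_in.threshold = 8;

% start from the defaults, then overwrite with any user-specified fields
cfg = cfg_def;
if ~isempty(cfg_in) % an empty cfg ([]) means "use all defaults"
    fields = fieldnames(cfg_in);
    for iF = 1:numel(fields)
        cfg.(fields{iF}) = cfg_in.(fields{iF});
    end
end
</code>

After this runs, ''cfg'' holds the caller's ''threshold'' and the default ''verbose'', so the function body only ever needs to read from one struct.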
To find out what ''cfg'' options are used by a given function, use the ''help'' (or ''doc'') function on each data loader, e.g. ''doc LoadCSC''. Some functions will run using default options when you pass an empty ''cfg'' (''[]''), whereas others will require you to input something.

==== LoadPos() ====

This loads raw Neuralynx position data (*.nvt). If no filename is specified in the input cfg, ''LoadPos()'' checks if a single .Nvt file is found in the current directory and loads that one:

<code matlab>
>> posdata = LoadPos([]); % note empty config
LoadPos.m: 100.00% of samples tracked.
>> posdata

posdata = 

     type: 'tsd'
     tvec: [1x131898 double]
     data: [2x131898 double]
    label: {'x'  'y'}
      cfg: [1x1 struct]
</code>

Note that ''LoadPos()'' provides some basic information on the quality of the data (percentage of samples tracked) -- consistent with the [[analysis:nsb2015:week0|"garbage in, garbage out" principle]], this helps ensure that you are aware of any potential issues at the raw data stage.

Because the .Nvt files are large, it is often convenient to save this ''posdata'' variable as a .mat file. This should be named ''Rxxx-yyyy-mm-dd-vt.mat''.

Note that the ''data'' field now has dimensionality [**2** x nSamples]; this is because there is both x and y data as indicated by the ''label'' field. So, if you wanted to plot x against y, you could do ''plot(posdata.data(1,:),posdata.data(2,:),'.');'', but a more general approach that doesn't require knowing which variable is which dimension is ''plot(getd(posdata,'x'),getd(posdata,'y'),'.');''.

==== LoadCSC() ====

To load a ''.Ncs'' file, containing sampled data (a hippocampal local field potential in this case):

<code matlab>
cfg = []; % starting with an empty config is good practice -- that way you avoid carryover of previous values!
cfg.fc = {'R042-2013-08-18-CSC05a.ncs'};
csc = LoadCSC(cfg);
</code>

This gives the following struct of type ''tsd'':

<code matlab>
>> csc

csc = 

     type: 'tsd'
     tvec: [8802816x1 double]
     data: [1x8802816 double]
    label: {'R042-2013-08-18-CSC05a.ncs'}
      cfg: [1x1 struct]
</code>

Note that the format is the same as for the position data above; this is because both ''LoadPos()'' and ''LoadCSC()'' return TSDs.

''LoadCSC()'' outputs some information about the files being loaded; in particular the number of "bad blocks". These will be explored in Module 3 (short version: bad blocks indicate a problem with the recording system and should be fixed).

Finally, the ''cfg'' field has the %%ExpKeys%%, the %%SessionID%% (''R042-2013-08-18''), the headers (''.hdr'') for each .Ncs file, and the ''history''.

==== LoadEvents() ====

By default, ''LoadEvents()'' returns a TS with the labels and timestamps of all unique strings found in the %%EventStrings%%:

<code matlab>
>> evt = LoadEvents([])

evt = 

     type: 'ts'
        t: {1x9 cell}
    label: {1x9 cell}
      cfg: [1x1 struct]
</code>

''evt.label(:)'' will reveal the familiar list of events introduced above. However, by setting options in the ''cfg'' variable, we can get something more specific:

<code matlab>
%%
cfg = [];
cfg.eventList = {'TTL Output on AcqSystem1_0 board 0 port 0 value (0x0004).','TTL Output on AcqSystem1_0 board 0 port 0 value (0x0040).'};
cfg.eventLabel = {'FoodDelivery','WaterDelivery'};
evt = LoadEvents(cfg)

evt = 

     type: 'ts'
        t: {[1x9 double]  [1x9 double]}
    label: {'FoodDelivery'  'WaterDelivery'}
      cfg: [1x1 struct]
</code>

By specifying which %%EventString%% is associated with which human-readable event ('FoodDelivery','WaterDelivery') we now have a more user-friendly events variable. Of course, this requires knowing how these events map onto the event codes (given here in ''cfg.eventList'') generated by the system. Make sure that you know what the event codes generated by your system mean!
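
Event codes such as ''(0x0040)'' are easier to reason about once converted to ordinary numbers. The following is a minimal sketch, not part of the vandermeerlab codebase, that pulls the hexadecimal value out of a few of the event strings shown earlier and works out which bit of the port is set. The variable names (''evStr'', ''ttlVal'', ''bitsSet'') are made up for illustration, and which peripheral is wired to which bit depends entirely on your own setup:

<code matlab>
% a few event strings copied from the EventStrings listing earlier in this module
evStr = {'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0020).', ...
         'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0000).', ...
         'TTL Input on AcqSystem1_0 board 0 port 1 value (0x0080).', ...
         'TTL Output on AcqSystem1_0 board 0 port 0 value (0x0004).'};

% extract the hexadecimal digits following the "0x" prefix
tok = regexp(evStr, '\(0x([0-9A-Fa-f]+)\)', 'tokens', 'once');
hexStr = cellfun(@(c) c{1}, tok, 'UniformOutput', false);

% convert to decimal port values, e.g. 0x0020 is 32
ttlVal = hex2dec(hexStr);

% for each value, list which bits (counting from 1) are set; empty means all lines are off
bitsSet = arrayfun(@(v) find(bitget(v, 1:16)), ttlVal, 'UniformOutput', false);
</code>

Each distinct bit corresponds to one physical line on the port, which is why the value returns to ''0x0000'' when a photobeam is no longer broken or an output is switched off.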
==== LoadSpikes() ====

''LoadSpikes()'' loads spike trains (times of action potentials) in ''*.t'' files. By default, it loads all such files:

<code matlab>
>> S = LoadSpikes([])

S = 

     type: 'ts'
        t: {1x67 cell}
    label: {1x67 cell}
      cfg: [1x1 struct]
      usr: [1x1 struct]
</code>

As you can see, this loaded spike data from 67 neurons. The ''usr'' field by default contains the tetrode number from which each spike train was recorded; this behavior can be disabled by setting ''cfg.getTTnumbers = 0''. If you wish to load *._t files (containing spikes from neurons of questionable cluster quality), do ''cfg.load_questionable_cells = 1;''. See the function documentation for further options.

==== Other ====

The other files of interest are all MATLAB ''.mat'' files which can be loaded directly using the ''load()'' function.

===== Putting it all together =====

Here are two examples that illustrate some simple operations you are now equipped to do. You should run them and make sure you understand what is happening -- how raw data is [[analysis:nsb2015:week0|transformed by some simple steps]]:

<code matlab>
%% example 1: use of restrict()
LoadMetadata;
pos = LoadPos([]);

left_pos = restrict(pos,metadata.taskvars.trial_iv_L); % left trials only
plot(getd(left_pos,'x'),getd(left_pos,'y'),'.'); % looks like right trials!
% the camera reverses the image; can fix with set(gca,'YDir','reverse')

%% example 2: interplay between tsd and iv data
LoadExpKeys;

please = []; please.fc = ExpKeys.goodSWR(1); % local field potential with good "sharp wave-ripple" events
lfp = LoadCSC(please); % aacarey is Canadian and asks nicely; the name of the cfg variable is arbitrary

% detect possible artifacts
cfg = [];
cfg.method = 'zscore'; % first normalize the data
cfg.threshold = -8;
cfg.minlen = 0; % no minimum length on events to detect
cfg.dcn = '<'; % detect intervals with z-score lower than threshold
artifact_iv = TSDtoIV(cfg,lfp); % creates iv with start and end times of possible artifacts

% plot detected intervals
cfg = [];
cfg.display = 'tsd'; % also try 'iv' mode!
PlotTSDfromIV(cfg,artifact_iv,lfp)
</code>

☛ (Optional exercise to test your understanding) Use the function ''IntersectIV()'' to only keep potential artifacts that occur when the rat's x-position is larger than 300.

NOTE: if you are on a Mac, you might get an error related to the ''nearest_idx3'' function; you can fix that for now by changing this to ''nearest_idx'' in the ''PlotTSDfromIV'' function.

===== Challenge =====

★ If you have your own data which is not from a Neuralynx system, there is some work to do! Write loading functions for your system's data that output ''ts'', ''tsd'' and/or ''iv'' data as appropriate. As a first step, you should find out if your system's manufacturer provides loading functions for MATLAB, similar to those from Neuralynx discussed above. If so, you are in luck, and by trying out those functions you should be able to figure out how they store the data, and then output the data in the right formats.

★ Should you get to the point where you have written loading functions for your data, it is important to consider what data integrity checks or other tests you may wish to do. Such checks maximize the chance of detecting potential garbage at the source, before it can cause havoc.
In this module, you have already seen several examples of such checks: ''LoadCSC()'' reports the number of bad blocks detected (and corrects for them) and ''LoadPos()'' handles zero position samples. Make a list of things that can go wrong with your data, and how you would test for, report, and potentially fix them!
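
As a starting point for such a list, here is a minimal sketch of a few generic checks on a ''tsd''-like struct. It runs on synthetic data with a deliberately inserted recording gap, so it does not need any data files. The field names follow the ''tsd'' output shown above; everything else (the variable names, the 1.5x median criterion for a gap) is an assumption chosen for illustration, not what ''LoadCSC()'' actually does:

<code matlab>
% build a synthetic tsd-like struct sampled at 2 kHz
Fs = 2000;
tsd_in = [];
tsd_in.type = 'tsd';
tsd_in.tvec = (0:9999)' / Fs;
tsd_in.data = randn(1, 10000);
tsd_in.label = {'synthetic'};

% simulate a problem: recording paused for 0.5 s halfway through
tsd_in.tvec(5001:end) = tsd_in.tvec(5001:end) + 0.5;

dt = diff(tsd_in.tvec); % time between successive samples
problems = {};

% check 1: one timestamp per sample
if numel(tsd_in.tvec) ~= size(tsd_in.data, 2)
    problems{end+1} = 'tvec and data have different numbers of samples';
end

% check 2: time should only move forward
if any(dt <= 0)
    problems{end+1} = 'timestamps are not strictly increasing';
end

% check 3: missing values in the signal
if any(isnan(tsd_in.data(:)))
    problems{end+1} = 'data contains NaNs';
end

% check 4: gaps, defined here as intervals over 1.5x the typical sample spacing
gap_idx = find(dt > 1.5 * median(dt));
if ~isempty(gap_idx)
    problems{end+1} = sprintf('%d gap(s) in timestamps', numel(gap_idx));
end

% report what was found
fprintf('%d problem(s) found\n', numel(problems));
for iP = 1:numel(problems)
    fprintf('  %s\n', problems{iP});
end
</code>

Collecting the problems in a list rather than stopping at the first one makes it easy to report everything at load time, and to decide separately which issues should be fixed automatically and which should halt the analysis.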
