### Sidebar

Reference

analysis:course:week3

## Anatomy of neural data: sampling, aliasing

Goals:

• Understand the fundamentals of sampling theory: Nyquist frequency, Nyquist rate, aliasing, anti-aliasing
• Gain an awareness of the different ways of representing data (signed, unsigned, integer, floating-point)
• Examine the file structure of Neuralynx continuously sampled data, in all its gory detail
• Get familiar with object-oriented programming in MATLAB

Deliverables:

• An improved version of LoadCSC()

Resources:

## Introductory remarks

Systems for the acquisition of neural data give us a necessarily imperfect view of the brain. Some of these limitations are obvious, such as a finite number of recording sites and contamination of the data with movement artifacts and system noise. Other limitations are perhaps more subtle, and include the pitfalls associated with limited (temporal) sampling of the underlying process and the properties of specific recording systems.

An important phrase to remember in any data analysis workflow is “garbage in, garbage out.” Making sure that your input is not garbage starts with a thorough understanding of the recorded signal and how it is stored. (Of course, if you abuse it enough, even the best data can rapidly turn to garbage. But that is beyond the scope of this module, which deals with the raw data.)

## Step-by-step

### Experiments in aliasing

Subsampling is common in neural data analysis; for instance, the lab's Amplipex system records data sampled at 20kHz, but for some analyses we are only interested in frequencies in the 1-200Hz range. Thus it would be unnecessarily slow to keep 20kHz data around: at this rate, even a basic 64-channel dataset easily exceeds 6GB in size! So, how do we go about subsampling correctly? This section explores the key issue of aliasing: the appearance of “phantom” low-frequency signals through the subsampling of frequencies above the Nyquist frequency. See the Leis chapter for a thorough discussion of this.

We begin by creating a simple 100Hz signal, sampled at 2kHz:

fs1 = 2000;
tvec = 0:1/fs1:4; % construct time axis, sampled at 2kHz

freq1 = 100;
y = sin(2*pi*freq1*tvec); % 100Hz signal

ax1 = subplot(211);
stem(tvec,y); title('original');

Say we wish to subsample down to 500Hz to save space. Naively, we might simply take every 4th sample (in fact, this is what the MATLAB function downsample() does):

subsample_factor = 4;

tvec2 = tvec(1:subsample_factor:end); % take every 4th sample
y2 = y(1:subsample_factor:end);

ax2 = subplot(212);
stem(tvec2,y2,'r'); title('subsampled');
xlabel('time (s)');

It's hard to see what is going on when we are so zoomed out. Let's use the axes handles introduced last week to rectify this:

xl = [1 1.04];
set(ax1,'XLim',xl); set(ax2,'XLim',xl);

That's better. Notice that in the subsampled plot, you can detect the presence of the 100Hz signal by noticing that each sample repeats (has the same value) every 10ms. However, it doesn't look much like the original – as discussed in the Leis book, we can attempt to reconstruct it:

hold on;

y_interp = interp1(tvec2,y2,tvec,'linear');
p1 = plot(tvec,y_interp,'b');

y_interp2 = interp1(tvec2,y2,tvec,'spline');
p2 = plot(tvec,y_interp2,'g');

legend([p1 p2],{'linear','spline'},'Location','Northeast'); legend boxoff

Note carefully the operation of interp1(): this is one of the most frequently used MATLAB functions in signal processing. Calling interp1() here is like saying: I know the values of y2 at times tvec2, please estimate the values of y2 at times tvec.

You should now see this:

As you can see, the “spline” interpolation method recovers the original signal very well. (Apart from “linear” interpolation, there is also “nearest” which often comes in handy.)

Note also the use of the plot object handles p1 and p2 (which you already know you can use to change line properties; see last week) as inputs to legend() (MATLAB doc page).

Now, suppose we were oblivious to the idea of the Nyquist rate and we attempted to subsample to 200Hz, expecting to see our original 100Hz signal.

☛ Modify the subsampling code above to use a subsample_factor of 10 and plot the result.

You should see the following:

Oops!

What is going on here? First, it is obvious we are not seeing a 100Hz signal. Second, what are these weird small numbers (note the scale)?

This is actually a consequence of the finite precision MATLAB uses in representing numbers such as pi. It is important to be aware of this, otherwise you may be extremely puzzled:

sin(2*pi) == 0 % ...right? The answer might surprise you.
fprintf('Welcome to numerical computing!\n');

This page (from Loren on the Art of MATLAB, an excellent blog for MATLAB tips) and the MATLAB docs give the gory details, also discussed in the Leis chapter. For the moment we will not pursue this further and get back to aliasing.

But wait, wasn't the idea that sampling at the Nyquist frequency should be sufficient? Let's try again:

figure(2)
subsample_factor = 10;

tvec2 = tvec(2:subsample_factor:end); % best case scenario -- can detect 100Hz signal
y2 = y(2:subsample_factor:end);

subplot(212)
stem(tvec2,y2,'r');
set(gca,'XLim',xl);

OK, so here is our 100Hz signal, but as illustrated above, we might miss it if we happen to sample at an unlucky phase. Best to not cut it this fine, also for other reasons that will become clear further on in this module. We first wonder, is this even an acceptable way of subsampling, anyway? Let's do another test:

%% 2kHz Fs, 100Hz signal with 450Hz signal superimposed
figure(3)

fs1 = 2000;
tvec = 0:1/fs1:4;

freq1 = 100;
freq2 = 450; % note, above Nyquist frequency for our target subsampled Fs

y = sin(2*pi*freq1*tvec) + 0.5.*sin(2*pi*freq2*tvec);

subplot(211)
stem(tvec,y)
set(gca,'XLim',xl);

Suppose we are aware of Nyquist, but decide we don't care about this 450Hz component in the signal, so we go ahead and subsample down to 500Hz again:

%% ss -- we don't care about the 450Hz signal, but...
subsample_factor = 4;

tvec2 = tvec(1:subsample_factor:end);
y2 = y(1:subsample_factor:end);

subplot(212)
stem(tvec2,y2,'r');
set(gca,'XLim',xl);

The result:

☛ Inspect the subsampled signal. How does it compare to the previous result where we subsampled the pure 100Hz signal without the 450Hz component?

You should notice some points don't have a 100Hz repeat, so some other components crept in. This is aliasing, and is very dangerous: you might excitedly conclude you discovered a novel slower frequency in your favorite brain region and try to publish a paper about it. Don't do it!

So, what is the correct way? Given that subsampling is such an ubiquitous operation, MATLAB does the hard work for us.

☛ Use the decimate() function to subsample down to 500Hz. Inspect the difference with the incorrect method (in red). What do you notice?

In fact, decimate() first filters out frequency components that may cause aliasing. This is known as an anti-aliasing filter. Always remember to use this, and not downsample() or a variant thereof, when you want to downsample your data.

### Neuralynx data in detail

An idealized (neural) signal consists of samples acquired at exactly the specified sampling frequency, with no gaps or missing data. Real data, unfortunately, does not always meet this specification. This section illustrates some flaws typical for the Neuralynx system that may affect subsequent analysis.

#### Using the low-level loader to get the raw data

To get into the guts of actual Neuralynx data, we will not use the sanitized wrapper provided by LoadCSC() but instead use the loading function provided by Neuralynx. Using cell mode in a sandbox file as usual, cd into the R016-2012-10-08 data folder you downloaded previously in Week 1. Then deploy the Neuralynx loader:

%%
% cd to your location here
fname = 'R016-2012-10-08-CSC03b.ncs';
[Timestamps, ~, SampleFrequencies, NumberOfValidSamples, Samples, Header] = Nlx2MatCSC(fname, [1 1 1 1 1], 1, 1, []);

Note the use of the ~ output argument, which indicates that we do not wish to assign the second output of the Nlx2MatCSC function to a variable.

Most data files come with a “header” that describes some properties of the data. Let's look at the header of our file, which is a LFP recorded from the ventral striatum (nucleus accumbens core) of a rat running back and forth along a linear track:

>> Header

'## File Name C:\CheetahData\2012-10-08_11-24-30\CSC10.ncs'
'## Time Opened (m/d/y): 10/8/2012  (h:m:s.ms) 11:25:2.79'
'## Time Closed (m/d/y): 10/8/2012  (h:m:s.ms) 12:32:9.61'
'-CheetahRev 5.5.1 '
''
'-AcqEntName CSC10'
'-FileType CSC'
'-RecordSize 1044'
''
'-HardwareSubSystemName AcqSystem1'
'-HardwareSubSystemType DigitalLynxSX'
'-SamplingFrequency 2000'
''
'-InputRange 1500 '
'-InputInverted True'
'-DSPLowCutFilterEnabled True'
'-DspLowCutFrequency 1'
'-DspLowCutNumTaps 0'
'-DspLowCutFilterType DCO'
'-DSPHighCutFilterEnabled True'
'-DspHighCutFrequency 425'
'-DspHighCutNumTaps 128'
'-DspHighCutFilterType FIR'
'-DspDelayCompensation Disabled'
'-DspFilterDelay_µs 1984'

>> 

You should recognize at least a few of the phrases used: for instance, we see that the sampling frequency (often abbrevi