
analysis:course-w16:week5

Goals:

- Refresh your memory of basic trigonometry and become comfortable plotting periodic functions in MATLAB
- Understand the central idea of the Fourier transform and implement your own decomposition and reconstruction code
- Appreciate the fundamental issue of spectral leakage, and how analysis parameters and signal properties affect it
- Gain a basic familiarity with different methods for spectral estimation, and use some nonparametric estimators to obtain power spectral densities (PSDs) for some interesting neural signals

Resources:

- (background) Chapter 4 from Weeks, “Digital Signal Processing Using MATLAB & Wavelets”
- (background) Chapter 6 from Weeks, “Digital Signal Processing Using MATLAB & Wavelets”
- (optional) MATLAB's guide to spectral analysis

From very early electrical recordings of the human brain (e.g. Berger, 1929) it has been clear that *oscillations* are a salient feature of brain activity. The functional relevance, underlying mechanisms and clinical applications of oscillatory activity continue to be active areas of research; all of these endeavours rely critically on accurate characterizations and quantifications of the data. A fundamental tool in the analysis of oscillations is **Fourier analysis**, which is introduced in this module.

To study oscillations we require *periodic* functions of time, such as `sin(t)`, which repeats its values at regular intervals.

Recall that to plot this sort of function in MATLAB, we first define the time axis (commonly a variable named `tvec`), pass it as an argument to the function we wish to plot, and plot the result:

```matlab
%% plot a simple sinusoid
Fs = 100; % in samples per second (Hz)
t0 = 0; t1 = 1; % start and end times
tvec = t0:1./Fs:t1; % construct time axis

f = 2; % frequency of sine to plot
y = sin(2*pi*f*tvec); % note sin() expects arguments in radians, not degrees (see also sind())

stem(tvec,y);
```

Apart from its frequency (2Hz in the above example), a sinusoid is further characterized by its *phase* and *amplitude*, which are readily manipulated as follows:

```matlab
phi = pi/2;

subplot(221);
y = sin(2*pi*f*tvec + phi); % a phase shift
stem(tvec,y); hold on;
plot(tvec,cos(2*pi*f*tvec),'r--','LineWidth',2) % notice, cosine is simply phase-shifted sine
legend('sin (phase-shifted)', 'cos');

a = 2;
subplot(222);
y = a.*sin(2*pi*f*tvec + phi); % amplitude change
stem(tvec,y); % note scale of y axis!
```

As an aside, sinusoids are the basis for wireless transmission, such as that for FM radio, where a message is encoded in the frequency modulation of a “carrier” signal:

```matlab
%% frequency modulation (FM) signal
f2 = 10; m = 2;

subplot(311)
s1 = sin(2*pi*f*tvec);
plot(tvec,s1); title('message');

subplot(312);
s2 = sin(2*pi*f2*tvec);
plot(tvec,s2); title('carrier');

subplot(313);
s3 = sin(2*pi*f2*tvec + m.*sin(2*pi*f*tvec - pi/2));
plot(tvec,s3); title('FM signal');
```

Notice how the sinusoid in the bottom plot smoothly modulates its frequency. Many oscillations in the brain, such as hippocampal theta and ventral striatal gamma, show such modulations, referred to as “cross-frequency coupling”: in this example, the phase of one (slower, “message”) signal is related (“coupled”) to the frequency of another (faster, “carrier”) signal.

What happens if we sum some sinusoids together? A compact representation for this is a *harmonic series*:

$$S(t)=\sum_{n=1}^{N}{a_{n} \cos(2 \pi n f_{0} t+\phi_{n})}$$

Note that each sinusoid of this series has a frequency that is an integer multiple $n$ of some base frequency $f_0$. This is a compact representation because we only specify the amplitude $a_n$ and phase $\phi_n$ for each term of the series, with the frequencies fixed.

Figure 4.7 in the Weeks chapter shows a sum of four sinusoids, with a base frequency of 2:

```matlab
%% harmonic series example
mag = [0.1 0 1.3 0.5]; % magnitudes for each term
pha = [-pi/6 0 pi 2*pi/3]; % phases for each term
f = 2; % base frequency

signal_out = zeros(size(tvec));
for ii = 1:numel(mag) % note, the book chapter uses i, not best practice!
    this_signal = mag(ii)*cos(2*pi*f*ii*tvec + pha(ii));
    plot(tvec,this_signal,'r:'); hold on;
    signal_out = signal_out + this_signal; % build the sum
end
plot(tvec,signal_out,'LineWidth',2);
```

☛ Why is it not a good idea to use the variable name `i`? (Hint: for the same reason you should not define a variable named `pi`…)

It looks like we can create some interesting signals (blue) by summing simple sinusoids (thin red lines)!

In fact, the central insight underlying Fourier analysis is that we can use a sum (series) of sinusoids to approximate *any* signal to arbitrary precision. An important corollary to this is that any signal can be *decomposed* into a series of sinusoids. Let's demonstrate this.

We will use the MATLAB function `fft()` (for **F**ast **F**ourier **T**ransform; a technical point discussed in the Weeks chapter is that this in fact computes something called the Discrete Fourier Transform, or DFT) to obtain the amplitudes and phases ($a_n$ and $\phi_n$ in the equation above) of a randomly generated signal, and then piece the signal back together by plugging them into a harmonic series:

```matlab
%%
rng('default'); % reset random number generator to reproducible state, so your plot will look like mine!
x = round(rand(1,8)*10); % generate a length 8 vector of integers between 0 and 10
xlen = length(x);

% get magnitudes and phases of Fourier series
X = fft(x);
Xmag = abs(X); % magnitudes, a_n
Xphase = angle(X); % phases, phi_n

n = 0:xlen-1;
t = 0:0.05:xlen-1; % a finer timescale to show the smooth signal later

for iH = xlen-1:-1:0 % reconstruct each harmonic
    s(iH+1,:) = Xmag(iH+1)*cos(2*pi*n*iH/xlen + Xphase(iH+1))/xlen;
    sm(iH+1,:) = Xmag(iH+1)*cos(2*pi*t*iH/xlen + Xphase(iH+1))/xlen;
    % detail: xlen appears here because the fundamental frequency used by fft() depends on this
end

ssum = sum(s); % coarse timescale (original points)
smsum = sum(sm); % fine timescale (to see full signal)

figure;
plot(n, x, 'go', t, smsum, 'b', n, ssum, 'r*');
legend({'original','sum - all','sum - points only'});
```

You should get:

Notice that the reconstruction is essentially perfect. We retrieved the original 8 points by plugging the coefficients returned by `fft()` into a series of sinusoids.

☛ What is the purpose of running the `for` loop backwards in the above code? (Hint: recall that MATLAB slows down substantially if it needs to allocate memory for expanding variables.)

The set of magnitudes and phases that describe the harmonic series that reconstructs the signal are known as the *magnitude spectrum* and *phase spectrum*, respectively. The square of the magnitude spectrum is referred to as the signal *power* at different frequencies.
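As a quick cross-check of these definitions (sketched here in NumPy for compactness; the MATLAB equivalents are `abs()`, `angle()`, and elementwise squaring):

```python
import numpy as np

# 2 Hz sine sampled at 20 Hz for exactly 1 s (20 samples, 2 full periods)
Fs = 20
t = np.arange(0, 1, 1 / Fs)
y = np.sin(2 * np.pi * 2 * t)

Y = np.fft.fft(y)
mag = np.abs(Y)       # magnitude spectrum, a_n
phase = np.angle(Y)   # phase spectrum, phi_n
power = mag ** 2      # power spectrum: squared magnitude

# the 2 Hz component lives in bin 2 (frequency = bin * Fs / N = 2 Hz)
peak_bin = int(np.argmax(mag[:len(y) // 2]))
print(peak_bin)  # 2
```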

You might have noticed the magnitude and phase spectra above have a peculiar form. Let's explore this a little further using a signal with known frequency and phase content:

```matlab
Fs = 20; % in samples per second (Hz)
t0 = 0; t1 = 1; % start and end times
tvec = t0:1/Fs:t1-(1/Fs); % construct time axis; generate exactly 20 samples

f = 2; % signal frequency
y = sin(2*pi*f*tvec); % construct signal, a 2Hz sine wave sampled at 20Hz for 1s

yfft = fft(y,length(y));
yfft_mag = abs(yfft); yfft_ph = angle(yfft);
stem(yfft_mag)
```

The result:

Some issues are apparent:

- `fft()` did not return a frequency axis, so the output is not straightforward to interpret.
- Whatever the coefficients returned, there is not a single peak as we would expect from the pure input signal: in fact there are two peaks.

To understand this, recall that the largest frequency that can be detected in a signal sampled at `Fs` is `Fs/2`, the *Nyquist frequency*. Thus, we would expect the frequency axis from `fft()` to go from 0 to `Fs/2` at most. In addition, it turns out the Fourier transform is defined for *real* as well as for *complex* signals, and it returns spectra for both these components. Since we are only interested in real signals, we get a symmetric spectrum back (note that if we did not use `abs()` on the output of `fft()` we would get some imaginary components as well).
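The symmetry can be checked directly: for any real input, the DFT satisfies X[k] = conj(X[N−k]), so the magnitude spectrum mirrors around the Nyquist bin — hence the two peaks for a single pure sine. A quick NumPy sketch of this property (an aside, not part of the course code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)  # any real signal
X = np.fft.fft(x)

# conjugate symmetry: X[k] == conj(X[N-k]) for k = 1..N-1
N = len(x)
for k in range(1, N):
    assert np.isclose(X[k], np.conj(X[N - k]))

# hence the magnitude spectrum is symmetric about the Nyquist bin
print(np.allclose(np.abs(X[1:]), np.abs(X[1:][::-1])))  # True
```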

To construct a frequency axis that takes both these ideas into account, we can do:

```matlab
Npoints = length(y);
F = [-Npoints/2:Npoints/2-1]./Npoints; % construct frequency axis
yfft_mag = fftshift(yfft_mag); % align output, see note below
stem(F,yfft_mag);
xlabel('Frequency (Fs^{-1})');
```

`fftshift()` cuts the second (negative-frequency) half of the spectrum and pastes it at the beginning, so that our frequency axis is now correct; it is in units of `1 / Fs`, so 0.1 corresponds to the 2Hz we put in.
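The same axis construction can be cross-checked with NumPy's built-in helpers, where `fftfreq` returns the axis in cycles per sample, matching the `Fs^{-1}` units used above:

```python
import numpy as np

Fs = 20
t = np.arange(0, 1, 1 / Fs)        # 20 samples, as in the MATLAB example
y = np.sin(2 * np.pi * 2 * t)

N = len(y)
F = np.fft.fftshift(np.fft.fftfreq(N))         # cycles per sample, -0.5 .. 0.45
mag = np.fft.fftshift(np.abs(np.fft.fft(y)))

# the peaks sit at +-0.1, i.e. 0.1 * Fs = 2 Hz
peaks = F[mag > mag.max() / 2]
print(peaks)  # [-0.1  0.1]
```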

For analysis purposes we don't care about the negative-frequency half of the spectrum, and it is usually ignored to yield what is referred to as the “single-sided spectrum”. Because we would never actually use `fft()` directly to estimate spectral content (see the section on spectral estimation below), we won't do more work to get rid of it now.

Notice also the superscript in the `xlabel`: MATLAB can interpret basic LaTeX math symbols in figure text.

A final point about the output of `fft()` is that it contains the magnitude spectrum at specific frequencies only. For instance, we have the 0.1*Fs point, but not 0.125*Fs! This will become important later.

The above example was constructed nicely so that our signal contained exactly two full periods. This will not be true for real world signals. What happens if we don't have this perfect input?

☛ Change the `tvec` variable above to contain one more sample, like so:

`tvec = t0:1/Fs:t1;`

This means that our signal no longer contains an integer number of periods. The resulting two-sided spectrum is:

Notice that:

1. The peaks now appear at a frequency not exactly equal to the true frequency
2. Other non-zero components have appeared

To explain (1), recall that `fft()` evaluates the magnitudes of specific frequencies only. Inspection of the spectrum above indicates that we don't have a frequency bin that is exactly at 0.1 (the true frequency of our signal). Let's fix that using the second, optional, argument of `fft()`.

```matlab
tvec = t0:1/Fs:t1;
nPoints = [length(tvec) 64 256 1024];

for iP = 1:length(nPoints) % repeat fft with different numbers of points
    nP = nPoints(iP);
    subplot(2,2,iP);

    y = sin(2*pi*f*tvec);
    yfft = fft(y,nP);
    yfft_mag = abs(yfft); yfft_ph = angle(yfft);

    F = [-nP/2:nP/2-1]./nP;
    yfft_mag = fftshift(yfft_mag);
    plot(F,yfft_mag,'kx',F,yfft_mag,'k');

    title(sprintf('%d point FFT',nP));
    xlabel('Frequency (Fs^{-1})');
end
```

This gives:

As we increase the number of points to evaluate the FFT over, we get increased frequency resolution, and indeed the peaks of the spectrum converge to 0.1 as expected. Under the hood, this is in fact accomplished by *padding the input signal with zeros* before the DFT is computed; doing this does not change the spectral content (the zeros are not periodic), but allows a much longer harmonic series, with a smaller fundamental frequency, to be evaluated.
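The claim that zero-padding does not change the spectral content can be verified directly: the padded DFT samples the same underlying spectrum on a finer grid, so every 4th bin of a 4x zero-padded transform reproduces the original coefficients. In NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(8)

X8 = np.fft.fft(x)        # 8-point DFT
X32 = np.fft.fft(x, 32)   # same signal, zero-padded to 32 points

# the 32-point DFT samples the same spectrum on a 4x finer grid:
# every 4th bin equals the corresponding 8-point DFT coefficient
print(np.allclose(X32[::4], X8))  # True
```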

A typical value to use for the number of points to evaluate the FFT over is the next power of 2 (above however many samples your signal contains). This is because part of what makes the FFT fast is that it can repeatedly divide the signal in half; non-power-of-2 values also work, however.
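A small sketch of that next-power-of-2 choice (MATLAB provides `nextpow2` for this; here the equivalent is computed by hand):

```python
import numpy as np

def next_pow2(n):
    """Smallest power of 2 >= n (what MATLAB's nextpow2 is typically used for)."""
    return 1 << (int(n) - 1).bit_length()

n_samples = 21            # e.g. the 21-point signal from above
nfft = next_pow2(n_samples)
print(nfft)  # 32

# the FFT zero-pads up to nfft; bin spacing becomes Fs/nfft
Fs = 20
print(Fs / nfft)  # 0.625
```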

☛ What happens if you try to evaluate the FFT using a number of points *smaller* than that in your signal?

As we increase the number of points to evaluate the FFT, we can obtain coefficients for frequencies of arbitrary precision. For this reason, the power spectrum is often referred to as *power spectral density* or PSD.

☛ Demonstrate that you get the same effect by zero-padding the original 21-point signal (rather than making the signal longer, as in the above example).

It is clear from the above figure that even if we increase the frequency resolution of our FFT with zero-padding, we still have an imperfect estimate of the true frequency content of our signal. In particular, the estimate around the true frequency has a nonzero width – this part of the magnitude spectrum is referred to as the *main lobe* – and we have nonzero spectral content for other frequencies as well (the *side lobes*).

To explore the source of this **spectral leakage**, let's make our signal longer while evaluating the FFT over the same number of points:

```matlab
tvec = t0:1/Fs:t1-(1/Fs);
nRepeats = [1 2 4 8];
nP = 1024;

for iP = 1:length(nRepeats)
    subplot(2,2,iP);

    y = sin(2*pi*f*tvec);
    y = repmat(y,[1 nRepeats(iP)]); % repeat the signal a number of times

    yfft = fft(y,nP);
    yfft_mag = abs(yfft); yfft_ph = angle(yfft);

    F = [-nP/2:nP/2-1]./nP;
    yfft_mag = fftshift(yfft_mag);
    plot(F,yfft_mag,'kx',F,yfft_mag,'k');

    title(sprintf('%d repeats',nRepeats(iP)));
    xlabel('Frequency (Fs^{-1})');
end
```

The result:

Notice that the magnitude spectrum converges to the true frequency as we increase the length of the signal. However, for any signal of finite length, there will always be some spectral leakage. This occurs because we are effectively cutting off the signal abruptly; other than changing the length of the signal (which is not always an option) we can attempt to minimize spectral leakage through using less abrupt cutoffs, discussed in the next section.

Computing an FFT over a window of finite size is as if we are taking a finite-size window (for instance `w(n) = 1 if 0 ≤ n ≤ N; 0 otherwise`; note that this defines a *rectangular* window) and multiplying it with a hypothetical infinite signal.

It turns out that the spectrum of the windowed signal equals the *convolution* of the signal's spectrum and the window's spectrum; this occurs because of the convolution theorem, which states that multiplication in the time domain equals convolution in the frequency domain. (If the idea of convolution is new to you, you can think of it as a kind of “blurring”: for instance, convolving a signal with a 5-point rectangular pulse is equivalent to taking the running average with a 5-point moving window. For a graphical exploration of this idea, see this website or look at the MATLAB documentation for the `conv()` function.)

Thus, it becomes important to understand the spectral properties of the windows used for Fourier analysis. Let's plot a few:

```matlab
nP = 25;
nPFFT = 1024;
windows = {'rectwin','triang','hamming','hanning','blackman'};
cols = 'rgbcmyk';

for iW = 1:length(windows)
    eval(cat(2,'wn = ',windows{iW},'(nP);')); % make sure you understand this
    wn = wn./sum(wn);

    subplot(211); % plot the window
    plot(wn,cols(iW),'LineWidth',2); hold on;

    subplot(212);
    yfft = fft(wn,nPFFT);
    yfft_mag = abs(yfft); yfft_ph = angle(yfft);

    F = [-nPFFT/2:nPFFT/2-1]./nPFFT;
    yfft_mag = fftshift(yfft_mag);
    h(iW) = plot(F,yfft_mag,cols(iW),'LineWidth',2); hold on;
end

xlabel('Frequency (Fs^{-1})');
legend(h,windows);
```

☛ What does the `eval()` statement in the above code do?

The result:

Inspection of the magnitude spectra of the different windows shows that although the rectangular window has the narrowest mainlobe (a desirable characteristic; narrower means a better estimate of the true underlying frequency) it also has the largest sidelobes (an undesirable characteristic; these sidelobe frequencies are not actually present in the signal!). As you can see, different windows represent different tradeoffs between these properties.

☛ Change the y axis to a log scale to better visualize the differences (recall that you can use `get(gca)` to access the properties of the current axes).

☛ Modify the code in the Spectral Leakage section, above, to use a Hamming window instead of a rectangular window. Verify that the sidelobes are now smaller, at the expense of a slightly wider mainlobe.

Note that the integral of the windows should be 1, to preserve power estimates, as guaranteed by the normalization statement in the code above.

The above method of spectral estimation, obtaining the Fourier coefficients by applying the DFT to the entire data set, is also referred to as constructing a *periodogram*. MATLAB has a function for this:

```matlab
[Pxx,F] = periodogram(y,[],nP,Fs);
plot(F,Pxx); xlabel('Frequency (Hz)');
```

Note that this plots the *one-sided* spectrum, and that the units on the frequency axis are now in Hz. In addition, rather than plotting the magnitudes of the sinusoid components as before, we now have *power*, which is simply the magnitude squared.

It is easy to change the window:

```matlab
hold on;
[Pxx,F] = periodogram(y,hanning(length(y)),nP,Fs);
plot(F,Pxx,'r');
```

Note again the trading off of mainlobe width against sidelobe magnitude.

It turns out that the periodogram, although usable, is not a very good spectral estimator in some ways; it is *inconsistent*: its variance does not tend to zero as the data length goes to infinity. This makes later statistical comparisons difficult, and makes the spectrum look noisy, as we'll demonstrate in a later section.

A useful approach to address these issues is to cut the signal up into smaller segments, estimate the spectrum for each segment, and combine the estimates, a process known as Bartlett's method. An example of a spectral estimator that uses this approach is *Welch's method*, which uses segments (or windows) that can overlap.
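The payoff of segment averaging is easy to see on white noise, whose true spectrum is flat. A minimal NumPy sketch of Bartlett's idea (non-overlapping rectangular-windowed segments; Welch adds overlap and tapering; the `periodogram` helper here is a hypothetical one-liner, not MATLAB's function):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4096)   # white noise: flat true spectrum

def periodogram(seg):
    return np.abs(np.fft.rfft(seg)) ** 2 / len(seg)

# single periodogram over the whole signal: a noisy estimate
p_full = periodogram(x)

# Bartlett: average the periodograms of 16 non-overlapping 256-sample segments
segs = x.reshape(16, 256)
p_bart = np.mean([periodogram(s) for s in segs], axis=0)

# relative spread of the estimate drops substantially after averaging
print(np.std(p_full) / np.mean(p_full) > np.std(p_bart) / np.mean(p_bart))  # True
```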

Type `help pwelch` and read the overall description of what this function does. Based on the discussion above, this should make sense.

Here is an illustration of how `pwelch()` performs:

```matlab
Fs = 20; % in samples per second (Hz)
t0 = 0; t1 = 1; f = 2; nRepeats = 4;

tvec = t0:1/Fs:t1-(1/Fs);
nP = 1024;

y = sin(2*pi*f*tvec);
y = repmat(y,[1 nRepeats]);

[Pxx,F] = periodogram(y,rectwin(length(y)),nP,Fs);
plot(F,Pxx); hold on;

wSize = 40;
[Pxx,F] = pwelch(y,rectwin(wSize),wSize/2,nP,Fs);
plot(F,Pxx,'r'); xlabel('Frequency (Hz)');
```

Note that the Welch spectrum sacrifices some frequency resolution, because of the smaller window that results from cutting up the data, but our estimate is now more robust.

We are almost ready to apply these methods to some real data. However, we first need to be aware of an issue that often comes up with real data. `fft()` and the tools that rely on it, such as the spectral estimators in the previous section, assume that the data are sampled at evenly spaced intervals. However, this is only approximately true for Neuralynx data: chunks of 512 samples are sampled at a regular interval (`Fs = 2000`), but the interval between chunks is slightly larger than that within them, such that the overall Fs is slightly smaller than 2000. Because this difference is tiny, it will not affect our spectral estimate much. However, if the difference becomes large, our spectrum will be wrong. For instance:

```matlab
Fs = 20; % in samples per second (Hz)
t0 = 0; t1 = 1; f = 2;
nP = 1024;
gaps = [5 10 15]; % idx of samples to be removed

tvec = t0:1/Fs:t1;%-(1/Fs);
y = sin(2*pi*f*tvec);

subplot(211)
plot(tvec,y,'k*'); hold on;

yfft = fft(y,nP);
yfft_mag = abs(yfft); yfft_ph = angle(yfft);
F = [-nP/2:nP/2-1]./nP;
yfft_mag = fftshift(yfft_mag);

subplot(212);
plot(F,yfft_mag,'k-x'); hold on;
xlabel('Frequency (Fs^{-1})');

% signal with gaps
y = sin(2*pi*f*tvec);
y2 = y;
y2(gaps) = []; tvec(gaps) = []; % remove

subplot(211);
plot(tvec,y2,'bo'); hold on;

yfft = fft(y2,nP);
yfft_mag = abs(yfft); yfft_ph = angle(yfft);
F = [-nP/2:nP/2-1]./nP;
yfft_mag = fftshift(yfft_mag);

subplot(212);
plot(F,yfft_mag,'b-x');
legend('y','y2 (with gaps)')
```

Notice that the estimate for the data with some samples removed (in blue) is substantially distorted: its peak no longer falls at the true frequency.

In general, if you have unevenly spaced samples, you can consider:

- Interpolating the data on an evenly spaced timebase using a function such as `interp1()`. This is only safe if the frequencies you will be analyzing are of a much lower order than the variation in the sample spacing – the interpolation can introduce high-frequency components into the signal.
- Using spectral estimators designed to work with uneven spacing, such as the Lomb-Scargle periodogram.
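As a sketch of the second option, SciPy ships a Lomb-Scargle implementation that recovers the frequency of an irregularly sampled sine; note that `scipy.signal.lombscargle` expects *angular* frequencies, and the signal parameters below are made up for illustration:

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 10, 200))   # 200 unevenly spaced sample times over 10 s
y = np.sin(2 * np.pi * 2 * t)          # a 2 Hz sine, sampled irregularly

freqs_hz = np.linspace(0.1, 5, 500)
pgram = lombscargle(t, y, 2 * np.pi * freqs_hz)  # angular frequencies!

print(freqs_hz[np.argmax(pgram)])  # a peak near 2 Hz
```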

An extension of the uneven sampling problem is gaps in the data. This can occur if there is a technical problem with the recording system, if there are artifacts that have been removed, or simply if, as part of your setup, you naturally have gaps (e.g. intertrial intervals). Of course, you may decide to cut up your data and analyze each segment (with even sampling) separately, much like Bartlett's method itself. However, it is often convenient not to have to do this.

A reasonable solution for dealing with gaps is to fill them with zeros, such that a constant `Fs` is maintained. The zeros will introduce edge effects due to windowing. In addition, if there is temporal structure in where the gaps appear, this will also show up in the spectrum: for instance, if there were a repeating pattern of a 100ms gap followed by 100ms of data, a strong 5Hz component would appear (the pattern repeats every 200ms). Thus, if you are filling gaps in this way, it is prudent to check the *autocorrelation* of the times where gaps occur, to form an idea of where artifactual components might show up; we will treat this later in the course.
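This artifact is straightforward to check numerically: an on/off gating pattern of 100ms of data followed by a 100ms gap repeats every 200ms, so its fundamental frequency is 1/0.2s = 5Hz, and that is where the strongest non-DC component of the gating signal's spectrum sits. A NumPy sketch (the 1kHz sampling rate is chosen arbitrarily):

```python
import numpy as np

Fs = 1000                                    # 1 kHz sampling (arbitrary choice)
n = np.arange(2000)                          # 2 s of samples
gate = ((n // 100) % 2 == 0).astype(float)   # 100 ms of data, 100 ms gap, repeating

G = np.abs(np.fft.rfft(gate))
F = np.fft.rfftfreq(len(gate), 1 / Fs)

peak = F[1:][np.argmax(G[1:])]               # strongest non-DC component
print(peak)  # 5.0 -- the fundamental of the 200 ms on/off pattern
```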

Let's begin by loading a ventral striatal LFP signal:

```matlab
% cd to your R016-2012-10-08 folder
cfg = [];
cfg.fc = {'R016-2012-10-08-CSC04d.ncs'};
csc = LoadCSC(cfg);
```

This dataset contains a baseline (rest) recording followed by two sessions on the track (a “value” and a “risk” session) and another rest recording.

Let's focus on the first rest segment, using the `ExpKeys` to restrict the data:

```matlab
% restrict to prerecord, leaving some time (10s) before rat actually goes on track
csc_pre = restrict(csc,0,csc.cfg.ExpKeys.TimeOnTrack(1)-10);
```

Check if we have regular sampling with no gaps:

```matlab
% check if sampling is ok
plot(diff(csc_pre.tvec)); % only minimal differences
Fs = 1./mean(diff(csc_pre.tvec));
```

The slightly odd structure of the diffs is because of Neuralynx's 512-sample-per-block format. So, technically we don't have uniformly spaced samples, but it's close enough that we don't have to bother with interpolating.

Let's decimate to speed up subsequent processing:

```matlab
dsf = 4;
csc_pre.data = decimate(csc_pre.data,dsf);
csc_pre.tvec = downsample(csc_pre.tvec,dsf);
csc_pre.cfg.hdr{1}.SamplingFrequency = csc_pre.cfg.hdr{1}.SamplingFrequency./dsf;
```

(Note that the above is a quick hack of something that should really be implemented in a proper function: the commands correspond to one conceptual operation of decimation, presenting an opportunity to make the code more readable and maintainable. But I haven't gotten around to that yet.)

☛ Now that you have decimated the data, also update your sampling frequency `Fs`.

Now we can compute the spectrum:

```matlab
wSize = 1024;
[Pxx,F] = periodogram(csc_pre.data,hamming(length(csc_pre.data)),length(csc_pre.data),Fs);
plot(F,10*log10(Pxx),'k'); xlabel('Frequency (Hz)'); ylabel('Power (dB)');
xlim([0 150]);
```

This gives:

Note that we are plotting not the raw power spectrum but rather `10*log10()` of it; this is a convention similar to the definition of the decibel (dB), the unit of signal power also applied to sound waves. The values on your vertical axis may look different, depending on what version of `LoadCSC()` was used.

Regardless, it doesn't look very good! The estimates look very noisy, a characteristic of the periodogram method.

☛ Edit the above code to compute a Welch spectrum instead, using a Hamming window of the same size and 50% overlap. Much better!

This PSD exhibits a number of features characteristic for neural data:

- The sharp spike at 60Hz is *line noise*, an artifact of the 60Hz oscillation in the North American alternating-current power supply.
- The overall shape of the PSD is described as `1/f`, i.e. to a first approximation, the signal power is proportional to the inverse of the frequency. This is characteristic of many biological signals; see e.g. the Buzsaki (2006) book for a discussion of the possible significance of this.
- This LFP is recorded from the ventral striatum, which has been reported to exhibit delta (~4Hz), theta (~7-9Hz), beta (~12-20Hz), and gamma (~40-100Hz) oscillations. These are all apparent as “humps” protruding from the underlying 1/f shape.

☛ Compute the PSD of an LFP, simultaneously recorded with the signal above, but now from the hippocampus. The `ExpKeys` specify the filename of a verified hippocampal LFP (in the `GoodSWR` field, for “good sharp wave-ripple complexes”, characteristic of the hippocampus). How does it compare to the ventral striatal LFP?

☛ Compute the PSD of “white noise”, i.e. a random signal where each sample is drawn independently from the interval (0,1) with equal probability. Is it 1/f? How would you describe its shape? *Hint*: use the MATLAB function `rand()`.

☛ For both LFPs, explore the effects of the window size parameter of the Welch power spectrum, making sure to vary it across an order of magnitude. What do you notice?

☛ Compare the PSD following the use of `decimate()` as in the above example to a PSD obtained from downsampling without using `decimate()`. Are there any differences?

★ Compute PSDs for some piece of data you collected. Make sure you specify briefly in the code comments, or a short separate document, what this data describes, and its basic properties such as sampling rate and any filtering that may have been applied. When computing the PSD, systematically vary a few of the parameters (zero-padding, window size, window type). Do you feel some are more suitable than others? Why? Also note some of your conclusions or interpretations based on what the PSDs show.

analysis/course-w16/week5.txt · Last modified: 2018/07/07 10:19 (external edit)

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International
