Interactions between multiple signals: coherence, Granger causality, and phase-slope index

Goals:

  • Develop an intuition for what determines the coherence between two signals
  • Employ some different methods of estimating coherence to appreciate the tradeoffs involved
  • Understand the main limitation of the coherence measure: a lack of information about directionality
  • Grasp the concept of Granger causality in the context of autoregressive models
  • Explore cases where Granger causality can give inaccurate or misleading results, and learn how to detect such cases
  • Apply an alternative method, the phase-slope index, that can work in cases where G-causality fails

Resources:

  • (background reading, a brief review) Fries (2005) Communication through coherence paper
  • (optional, a nice example application) deCoteau et al. (2007) hippocampus-striatum coherence changes with learning
  • (technical background) Ding et al. (2006) theory of Granger causality and applications to neuroscience

Introduction

So far, our analysis has been limited to individual local field potentials (LFPs) or, phrased more generally, univariate time series. In this module we consider basic methods for characterizing the relationship between simultaneously recorded LFPs. Two such signals could in principle be completely unrelated (independent), or they could display various forms of coordination, such as transient synchronization in a specific frequency band.

Characterizing and quantifying relationships between different signals, recorded from anatomically related areas in the brain, is an important tool in systems and cognitive neuroscience. Much evidence supports the idea that the effective flow of information along a fixed anatomical projection can be dynamically regulated, for instance by emphasizing bottom-up rather than top-down inputs in a task-related manner.

One possible mechanism for this routing of information is “communication through coherence” and its many variants (Fries, 2005) which propose that effective connectivity (i.e. the flow of information) depends on the degree to which two areas exhibit coherent oscillatory activity. In this module we define LFP coherence, explore its properties, and apply it to some example data. (For a brief review on what is meant by structural, functional and effective connectivity, see here).

Coherence is an inherently symmetric measure, i.e. it cannot distinguish whether A influences B or the other way around. To address the directionality question, we also explore Granger causality and phase slopes.

Coherence: definition

Let's start by considering how two oscillating signals may be related. (We will treat the more general case of non-periodic signals later.) One can imagine various possible relationships between the two, such as illustrated here (From Siegel et al. 2012):

Recall that oscillations of a given frequency are characterized by their amplitude and phase. The bottom left panel shows a case in which the amplitudes of two signals are correlated, as can be seen from the signal envelopes in red.

The top two panels show examples of two signals whose phases are related. The signals in the top left panel have the same phase at any given point in time; this situation is often referred to as synchrony. In the top right panel, the phases are not the same at every point, but there is a constant phase relationship (one signal is shifted relative to the other).

An intuitive definition of the coherence between two signals (at a given frequency) is the extent to which two signals display a consistent phase relationship. It is 0 if the phases are completely unrelated, and 1 if the phase relationship is identical at every time point. Thus, both top row panels show signals that are coherent. The zero phase shift “synchrony” in the top left is a special case of the more general idea of “coherence”.

☛ Are the two signals in the lower left panel coherent?

Diversion: the Wiener-Khinchin theorem

We would like to capture formally the coherence between two signals as illustrated above. To do so, it is useful to gain an intuition for another piece of Fourier theory, the Wiener-Khinchin theorem. This theorem states (essentially) that the Fourier transform of a signal's autocorrelation function equals the power spectral density of the signal, i.e. the squared magnitude of the signal's own Fourier transform. This can be illustrated by plotting the autocorrelation function of a periodic signal:

Fs = 500; dt = 1./Fs;
t = [0 2]; tvec = t(1):dt:t(2)-dt;
 
f1 = 8;
data1 = sin(2*pi*f1*tvec)+0.1*randn(size(tvec));
 
[acf,lags] = xcorr(data1,100,'coeff');
lags = lags.*(1./Fs); % convert samples to time
plot(lags,acf); grid on;

The autocorrelation has the same periodicity as the original signal (8Hz, so peaks 0.125s apart). So, its Fourier transform would result in a spectral decomposition with a strong 8Hz peak.
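
To see the theorem in action, we can take the Fourier transform of the autocorrelation directly and check that it peaks at 8Hz, just as the power spectrum of the signal itself does (a minimal sketch, reusing data1 and Fs from above):

[acf_full,~] = xcorr(data1,'biased'); % full-length autocorrelation estimate
nFFT = length(acf_full);
F_acf = Fs*(0:nFFT-1)/nFFT; % frequency axis for the FFT
plot(F_acf,abs(fft(acf_full))); xlim([0 20]); grid on; % expect a clear peak at 8Hz
xlabel('Frequency (Hz)'); ylabel('|FFT of autocorrelation|');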

The key step underlying the formal definition of coherence is to take the Fourier transform, not of the autocorrelation function as above, but of the cross-correlation function between two signals.

If two signals have a consistent phase relationship at a given frequency, this will show up in the cross-correlation:

f2 = 8;
data2 = sin(2*pi*f2*tvec+pi/4)+0.1*randn(size(tvec)); % phase-shifted version of data1
 
[ccf,lags] = xcorr(data1,data2,100,'coeff'); % now a cross-correlation
lags = lags.*(1./Fs); % convert samples to time
plot(lags,ccf); grid on;

Note that the xcorr no longer has a peak at time lag zero, as is the case for an autocorrelation. Rather, the peak is offset by an amount corresponding to the phase difference between the two signals. The correlation at this phase offset is nearly 1, indicating a strong correlation; in other words, signal 1 is very similar to signal 2 some time later; the xcorr is periodic at 8Hz as above. The Fourier transform of this xcorr would thus have a strong coefficient for 8Hz, but the phase for this component would be different from the autocorrelation.

☛ Verify that changing the phase shift in data2 indeed changes the phase of the cross-correlogram.

The Fourier spectrum of the cross-correlation function is known as the cross-spectrum or cross-spectral density (csd).

Definition and example

Now we are ready for the formal definition of coherence (full name: “magnitude-squared coherence”) between two signals x and y:

$$ C_{xy} = \frac{\lvert{P_{xy}}\rvert^2}{P_{xx}P_{yy}} $$

that is, the cross-spectrum $P_{xy}$ normalized by the auto-spectra of the two signals. $|x|$ indicates the modulus (magnitude) of $x$, so for now we are ignoring the angle (phase) component of the cross-spectrum.

Let's see this definition in action:

figure;
subplot(221);
plot(tvec,data1,'r',tvec,data2,'b'); legend({'signal 1','signal 2'});
title('raw signals');
 
[Pxx,F] = pwelch(data1,hanning(250),125,length(data1),Fs);
[Pyy,F] = pwelch(data2,hanning(250),125,length(data1),Fs);
subplot(222)
plot(F,abs(Pxx),'r',F,abs(Pyy),'b'); xlim([0 100]);
xlabel('Frequency (Hz)'); ylabel('power'); title('PSD');
 
[Pxy,F] = cpsd(data1,data2,hanning(250),125,length(data1),Fs);
subplot(223)
plot(F,abs(Pxy)); xlim([0 100]);
xlabel('Frequency (Hz)'); ylabel('power'); title('cross-spectrum');
 
[acf,lags] = xcorr(data1,data2,100,'coeff');
lags = lags.*(1./Fs); % convert samples to time
 
subplot(224)
plot(lags,acf); grid on;
xlabel('time lag (s)'); ylabel('correlation ({\itr})'); title('xcorr');

Note that the cross-spectrum Pxy is computed by a special function, cpsd(), which takes the familiar arguments of window, overlap, nFFT, and Fs.

You should get:

Notice that the cross-spectrum has a clear peak at 8Hz as expected.

☛ What happens if you change the amplitude of one of the input signals? Make the amplitude of data1 twice as large.

As you can see, the cross-spectrum depends on the amplitude of the input signals. This is usually not what we want when analyzing brain signals, because the amplitude at any given time could depend on the electrical properties of our electrode and the precise recording location relative to a source of interest. Thus, coherence normalizes the cross-spectrum by the spectra of the individual signals.

☛ For the two signals of unequal amplitude, compute the coherence by normalizing the cross-spectrum (C = (abs(Pxy).^2)./(Pxx.*Pyy);). Verify that now there is no change in coherence when scaling data1 by a factor 2 as above.

This normalization means that coherence should theoretically be independent of signal amplitude. However, in practice we have noise to worry about: if the signal becomes small enough, the phases will be corrupted by noise.

☛ Find out how small you need to make data1 relative to the noise before a drop in coherence occurs.

Instead of computing the coherence manually from the cross-spectrum and the individual spectra, we can also use mscohere(), which takes the same arguments. An example follows in the next section. However, a useful property of cpsd() is that it can be used to obtain the phase of the cross-spectrum, i.e. the phase lag (or lead) between the two signals.

☛ Instead of plotting the modulus (abs() in the above), plot the angle (angle()) of the cross-spectrum. Verify that it recovers the phase lag used in generating the two signals.

☛ Important! Can (absolute) coherence be interpreted as evidence for a directional relationship such as “A leads B” or “A causes B”? What about the angle?

Properties of the coherence measure

Example 1

Thinking about coherence in terms of the cross-correlation between the signals is often helpful in interpreting coherence values. For instance, it can explain why two signals that have an 8Hz component in their power spectra are not necessarily coherent:

%% just verify some cases where we break the phase relationship
f = 0.5; % freq modulation (Hz) 
f2 = 8;
m = 4; % freq modulation strength
wsz = 250; % window size 
 
subplot(421)
s2 = data2;
plot(tvec,s2,tvec,data1); title('signal 1 - constant phase');
 
subplot(422)
s3 = sin(2*pi*f2*tvec + m.*sin(2*pi*f*tvec - pi/2)) + 0.1*randn(size(tvec));
plot(tvec,s3,tvec,data1); title('signal 2 - varying phase');
 
subplot(423)
[Ps2,F] = pwelch(s2,hanning(wsz),wsz/2,length(data2),Fs);
plot(F,abs(Ps2)); title('PSD');
 
subplot(424)
[Ps3,F] = pwelch(s3,hanning(wsz),wsz/2,length(data2),Fs);
plot(F,abs(Ps3)); title('PSD');
 
subplot(425)
[C,F] = mscohere(data1,s2,hanning(wsz),wsz/2,length(data1),Fs); % shortcut to obtain coherence
plot(F,C); title('coherence'); xlabel('Frequency (Hz)');
 
subplot(426)
[C,F] = mscohere(data1,s3,hanning(wsz),wsz/2,length(data1),Fs);
plot(F,C); title('coherence'); xlabel('Frequency (Hz)');
 
[acf,lags] = xcorr(data1,s2,100,'coeff');
lags = lags.*(1./Fs); % convert samples to time
 
subplot(427)
plot(lags,acf); grid on;
xlabel('time lag (s)'); ylabel('correlation ({\itr})'); title('xcorr');
 
[acf,lags] = xcorr(data1,s3,100,'coeff');
lags = lags.*(1./Fs); % convert samples to time
 
subplot(428)
plot(lags,acf); grid on;
xlabel('time lag (s)'); ylabel('correlation ({\itr})'); title('xcorr');

This should give something like:

In the left column we have two signals with a constant phase relationship, so the cross-correlation has large values (we can predict one signal from the other with high accuracy). In the right column, we have a frequency-modulated signal, such that the phase relationship with the reference (8Hz) signal is much more variable. Accordingly, the cross-correlation values are much smaller (note the scale) and therefore the coherence at 8Hz is much lower compared to the left side as well.

The coherence measure is subject to the same estimation tradeoffs and issues that we encountered previously for spectral estimation in general. These include sensitivity to window size and shape, and the number of windows used for averaging. As you can see from the above plot, our coherence measure looks quite noisy.

☛ Increase the length of the data generated for analysis to 10s instead of 2s and recompute the coherence. What do you notice?

cpsd() and mscohere() use Welch's method of overlapping windows for spectral estimation. Thus, the robustness of the resulting estimate depends critically on the number of windows used: the coherence estimate should clean up somewhat as you increase the data length.
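
To make this concrete, here is a sketch of the comparison suggested by the exercise above (d1, d2, C10 are new variable names): with 10 seconds of data and the same 250-sample window, roughly 39 overlapping windows are averaged instead of roughly 7, and the coherence estimate is correspondingly smoother.

tvec10 = 0:dt:10-dt; % 10 s instead of 2 s
d1 = sin(2*pi*8*tvec10) + 0.1*randn(size(tvec10));
d2 = sin(2*pi*8*tvec10 + pi/4) + 0.1*randn(size(tvec10));
[C10,F10] = mscohere(d1,d2,hanning(250),125,length(d1),Fs);
plot(F10,C10); xlim([0 100]); xlabel('Frequency (Hz)'); ylabel('coherence');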

Example 2

Consider the following pair of signals:

wsize = 50;
 
Fs = 500; dt = 1./Fs;
t = [0 2];
 
tvec = t(1):dt:t(2)-dt;
f1 = 40; f2 = 40;
 
% generate some strange sine waves
mod1 = square(2*pi*4*tvec,20); mod1(mod1 < 0) = 0;
mod2 = square(2*pi*4*tvec+pi,20); mod2(mod2 < 0) = 0;
 
data1 = sin(2*pi*f1*tvec); data1 = data1.*mod1 + 0.01*randn(size(tvec));
data2 = sin(2*pi*f2*tvec); data2 = data2.*mod2 + 0.01*randn(size(tvec)) ;
 
subplot(221);
plot(tvec,data1,'r',tvec,data2,'b'); legend({'signal 1','signal 2'});
title('raw signals');
 
[P1,F] = pwelch(data1,hanning(wsize),wsize/2,length(data2),Fs);
[P2,F] = pwelch(data2,hanning(wsize),wsize/2,length(data2),Fs);
subplot(222)
plot(F,abs(P1),'r',F,abs(P2),'b'); title('PSD');
 
subplot(223);
[C,F] = mscohere(data1,data2,hanning(wsize),wsize/2,length(data1),Fs);
plot(F,C); title('coherence'); xlabel('Frequency (Hz)');
 
[ccf,lags] = xcorr(data1,data2,100,'coeff');
lags = lags.*(1./Fs); % convert samples to time
 
subplot(224)
plot(lags,ccf); grid on;
xlabel('time lag (s)'); ylabel('correlation ({\itr})'); title('xcorr');

Note that the two signals both have 40Hz components in the PSD, but the times at which the 40Hz oscillation is present do not actually overlap between the two signals. With a window size of 50 samples (100ms) we accordingly do not see any coherence at 40Hz. If you look at the cross-correlation, it is close to zero for lags within that 100ms range.

☛ What happens when you change the window size to 500ms?

This pair of signals is clearly a pathological case, but it should be clear that coherence estimates can depend dramatically on the window size used. In this case, the larger window spans oscillation epochs from both signals even though they never overlap in time, causing “spurious” coherence values.

Other points about coherence

  • Note that coherence is a symmetric measure, that is, the coherence for signals A and B is the same as that for B and A. This means that, as with a correlation coefficient, coherence has no sense of directionality. It cannot resolve whether A precedes or causes B, or vice versa.
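
A quick check of this symmetry, reusing data1, data2 and Fs from Example 2 above:

[Cab,F] = mscohere(data1,data2,hanning(50),25,length(data1),Fs);
[Cba,~] = mscohere(data2,data1,hanning(50),25,length(data1),Fs);
max(abs(Cab-Cba)) % should be (numerically) zero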

Application to real data

Overall comparison of vStr-vStr and vStr-HC coherence

Let's load three simultaneously recorded LFPs, two from the same structure (but a different electrode, both in ventral striatum) and one from a different but anatomically related structure (hippocampus):

cd('D:\data\R016\R016-2012-10-03');
LoadExpKeys;
 
cfg = []; 
cfg.fc = cat(2,ExpKeys.goodGamma(1:2),ExpKeys.goodTheta(1));
cfg.label = {'vStr1','vStr2','HC'};
csc = LoadCSC(cfg);
 
csc = restrict(csc,ExpKeys.TimeOnTrack(1),ExpKeys.TimeOffTrack(2)); % restrict to task

Next we can compute the PSDs for each signal in the familiar manner, as well as the coherence between signal pairs of interest:

Fs = csc.cfg.hdr{1}.SamplingFrequency; 
wsize = 2048;
 
nS = length(csc.label);
for iS = 1:nS
    [P{iS},F{iS}] = pwelch(getd(csc,csc.label{iS}),hanning(wsize),wsize/2,2*wsize,Fs);
 
    for iS2 = iS+1:nS
        [C{iS,iS2},Fc{iS}] = mscohere(getd(csc,csc.label{iS}),getd(csc,csc.label{iS2}),hanning(wsize),wsize/2,2*wsize,Fs);
    end
end
 
% plot
subplot(121)
cols = 'kgm';
for iS = 1:nS
h(iS) = plot(F{iS},10*log10(P{iS}),cols(iS),'LineWidth',2); hold on;
end
set(gca,'XLim',[0 150],'XTick',0:25:150,'FontSize',12); grid on;
legend(h,csc.label,'Location','Northeast'); legend boxoff;
xlabel('Frequency (Hz)'); ylabel('Power (dB)'); 
 
subplot(122); clear h;
h(1) = plot(Fc{1},C{1,2},'LineWidth',2); hold on;
h(2) = plot(Fc{1},C{1,3},'r','LineWidth',2);
set(gca,'XLim',[0 150],'XTick',0:25:150,'FontSize',12); grid on;
legend(h,{'vStr1-vStr2','vStr1-HC'},'Location','Northeast'); legend boxoff;
xlabel('Frequency (Hz)'); ylabel('Coherence');

This should give:

You can see that the PSDs show the profile characteristic for each structure: HC has a clear theta peak, which is just about visible as a slight hump in vStr. vStr has large gamma components absent from HC.

The coherence between the two vStr signals is high overall compared to that between vStr and HC. The vStr gamma frequencies are particularly coherent within the vStr. This is what we would expect from plotting the raw signals alongside each other – there is a clear relationship, as you can readily verify.

However, it is more difficult to judge whether, say, a vStr-HC coherence value of 0.1 at 25Hz is meaningful. Such comparisons are easier to make by moving to FieldTrip.

Comparison of vStr-HC coherence between experimental conditions

Based on previous work (e.g. van der Meer and Redish, 2011) we might ask: is the coherence between hippocampus and ventral striatum modulated by task events? Here we will determine if there is a change in coherence between approach to the reward site and reward receipt. This entails estimating the coherence spectrum for two different task epochs, both aligned to the time at which the rat nosepoked in the reward well. FieldTrip is ideal for this, especially because we will be doing the same operations on multiple LFPs.

First, let's load the data. In your path shortcut, remember to add the FieldTrip path first, and then the lab codebase path, and also do a git pull.

cd('D:\data\R016\R016-2012-10-03'); 
LoadExpKeys;
 
cfg = [];
cfg.fc = cat(2,ExpKeys.goodGamma(1:2),ExpKeys.goodTheta(1));
data = ft_read_neuralynx_interp(cfg.fc);
data.label = {'vStr1','vStr2','HC'};

We will segment the data into trials using a “trialfun”, a function that extracts task-specific timestamps and returns them in the specific output format FieldTrip expects (see the manual). For this data set (see the paper for details), the timestamps of interest are the times at which our subject (rat) nosepoked into the reward receptacles, in anticipation of receiving a number of pellets.

%% trialify
data.hdr.Fs = data.fsample;
 
cfg = [];
cfg.trialfun = 'ft_trialfun_lineartracktone2';
cfg.trialdef.hdr = data.hdr;
cfg.trialdef.pre = 2.5; cfg.trialdef.post = 5; % define time window of interest
 
cfg.trialdef.eventtype = 'nosepoke'; % could be 'nosepoke', 'reward', 'cue'; this and what follows are all task-specific
cfg.trialdef.location = 'both'; % could be 'left', 'right', 'both'
cfg.trialdef.block = 'both'; % could be 'value', 'risk', 'both'
cfg.trialdef.cue = {'c1','c3','c5'}; % cell array with choice of elements {'c1','c3','c5','lo','hi'} (1, 3, 5 pellets; low and high risk)
 
[trl, event] = ft_trialfun_lineartracktone2(cfg);
cfg.trl = trl;
 
data_trl = ft_redefinetrial(cfg,data);

Next, we compute the trial-averaged cross-spectrum; note the similarity to the code used for computing spectrograms in a previous module – we have changed cfg.output from 'pow' to 'powandcsd' (csd is short for cross-spectral density):

cfg              = [];
cfg.output       = 'powandcsd';
cfg.method       = 'mtmconvol';
cfg.taper        = 'hanning';
cfg.foi          = 1:1:100; % frequencies to use
cfg.t_ftimwin    = 20./cfg.foi;  % frequency-dependent, 20 cycles per time window
cfg.keeptrials   = 'yes';
cfg.channel      = {'vStr1', 'vStr2', 'HC'};
cfg.channelcmb   = {'vStr2', 'HC'; 'vStr2', 'vStr1'}; % channel pairs to compute csd for

cfg.toi          = -2:0.05:0; % pre-nosepoke baseline (time 0 is time of nosepoke)

TFR_pre = ft_freqanalysis(cfg, data_trl);

Now we can compute the coherence from the cross-spectrum and the individual spectra:

cfg            = [];
cfg.method     = 'coh'; % compute coherence; other measures of connectivity are also available
fd             = ft_connectivityanalysis(cfg,TFR_pre);

And finally plot the results – for this we are bypassing ft's built-in plotter so that we can add some custom touches more easily:

figure;
cols = 'rgb';
for iCmb = 1:size(fd.labelcmb,1)
    lbl{iCmb} = cat(2,fd.labelcmb{iCmb,1},'-',fd.labelcmb{iCmb,2});
 
    temp = nanmean(sq(fd.cohspctrm(iCmb,:,:)),2);
    h(iCmb) = plot(fd.freq,temp,cols(iCmb));
    hold on;
end
legend(h,lbl);

☛ The resulting coherence spectra are for the pre-nosepoke period (see cfg.toi in the frequency analysis step above). Also compute the coherence spectrum for the post-nosepoke period (0 to 2 seconds).
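
Since TFR_post will also be needed for the pairwise phase consistency example further below, here is a minimal sketch of one way to obtain it (cfg_post and cfg_coh are new variable names; the settings mirror the pre-nosepoke analysis, with only the time window changed):

cfg_post              = [];
cfg_post.output       = 'powandcsd';
cfg_post.method       = 'mtmconvol';
cfg_post.taper        = 'hanning';
cfg_post.foi          = 1:1:100;
cfg_post.t_ftimwin    = 20./cfg_post.foi; % 20 cycles per time window
cfg_post.keeptrials   = 'yes';
cfg_post.channel      = {'vStr1', 'vStr2', 'HC'};
cfg_post.channelcmb   = {'vStr2', 'HC'; 'vStr2', 'vStr1'};
cfg_post.toi          = 0:0.05:2; % post-nosepoke window (time 0 is the nosepoke)
 
TFR_post = ft_freqanalysis(cfg_post, data_trl);
 
cfg_coh        = [];
cfg_coh.method = 'coh';
fd             = ft_connectivityanalysis(cfg_coh, TFR_post); % overwrites fd with the post-nosepoke version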

As you will see, a few differences in the HC-vStr coherence between the two windows are visible, such as the elevated 15Hz coherence during reward approach. To get a sense of where this may be coming from (artifact or biological? a single trial or reliable?) it would be helpful to plot the raw LFPs aligned to the nosepoke time; of course, more sessions would need to be analyzed as well to obtain errorbars on the data.

Notice that our trial selection pooled trials on which 1, 3, and 5 food pellets were delivered (cfg.trialdef.cue = {'c1','c3','c5'};).

☛ Compare the coherence spectra for the 1-pellet and 5-pellet trials. What do you notice?

Time-frequency coherence analysis

A limitation of the coherence analyses up to this point has been that, like Welch's PSD, they have been averages. Just like the spectrogram provided a time-frequency view of signal power, we can attempt to compute a coherogram, coherence as a function of time and frequency.

In fact, the previous steps in FieldTrip already did this, so we can plot it (note, this is the “post-nosepoke” epoch):

iC = 1; % which signal pair to plot
lbl = [fd.labelcmb{iC,:}]; % get the label of this pair
imagesc(fd.time,fd.freq,sq(fd.cohspctrm(iC,:,:))); axis xy; colorbar
xlabel('time (s)'); ylabel('Frequency (Hz)'); title(lbl);

You should get:

As you can see, the coherence at higher frequencies in particular looks very noisy. These “spurious” high coherence bins commonly show up when estimating coherence, typically when one or both signals have little power in certain frequencies.

☛ Change the time window to a fixed 1s. How does the coherogram change?

We can improve the robustness of our estimate by giving up some time resolution. Of course, averaging over more trials is another approach; other spectral estimation methods such as wavelets can also improve things (if you are interested in this, there is a nice MATLAB tutorial on wavelet coherence here).

Beyond coherence

Coherence is only one of many measures that attempt to characterize the relationship between LFPs. A glance at the documentation for ft_connectivityanalysis() reveals a who's who of popular neuroscience tools for assessing functional connectivity. A review of these methods is beyond the scope of this module, but in general they address some of the limitations of the coherence measure. For instance:

  • Phase slope index (PSI), Granger causality, and partial directed coherence (PDC) are directional measures that under certain circumstances can capture the direction of the flow of information between two signals. We will discuss a few of these in this module, below.
  • Weighted phase lag index (WPLI) can exclude contributions from a volume-conducted source common to both signals
  • Pairwise phase consistency (PPC) addresses some statistical issues of how coherence estimates are affected by the amount of data

For an illustration of how these improved methods can give a more reliable estimate of interactions than coherence, let's give pairwise phase consistency a try:

cfg            = [];
cfg.method     = 'ppc';
fd             = ft_connectivityanalysis(cfg,TFR_post);

☛ Plot the coherogram as above, changing cohspectrm to ppcspctrm in the plotting code. You should get (again for the post-nosepoke epoch):

Note that some of the spurious high-frequency events have now been eliminated. In general, however, estimates of coherence and other connectivity measures require relatively large amounts of data to obtain – more than the small number of trials from one single session considered here.

Amplitude cross-correlation

As should be clear from the discussion of coherence so far, it is a non-directional measure – it doesn't address whether signal A leads or lags signal B. There are many methods out there that can be used to address this directionality question. One that you already have the tools to perform is computing the amplitude cross-correlation between two signals, filtered in a specific frequency band. Looking at the lower left panel in the figure at the top of the page, you can see that the amplitude envelope (red line) is clearly correlated between the two signals. Computing the cross-correlation would establish at what time lead (or lag) that correlation is maximal; a peak offset from zero would indicate a specific temporal asymmetry suggesting one signal leads the other.

We will not cover this method in detail here since you already know how to compute amplitude envelopes and cross-correlations; however, if you'd like to delve more into this, example code that performs this analysis, including a very nice shuffling procedure to determine chance level, can be found on the vandermeerlab papers repository here. A recent paper introducing the method is Adhikari et al. (2010).
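
For reference, a minimal sketch of this approach (x and y are placeholder signals at sampling frequency Fs; the 6-10 Hz band is just an example):

[b,a] = butter(4,[6 10]/(Fs/2)); % bandpass filter, e.g. a theta band
xf = filtfilt(b,a,x); yf = filtfilt(b,a,y); % zero-phase filtering
xe = abs(hilbert(xf)); ye = abs(hilbert(yf)); % amplitude envelopes
[cc,lags] = xcorr(xe-mean(xe),ye-mean(ye),Fs,'coeff'); % +/- 1 s of lags
plot(lags./Fs,cc); xlabel('time lag (s)'); ylabel('envelope xcorr');
[~,idx] = max(cc); lags(idx)./Fs % a peak offset from zero suggests a lead or lag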

Granger causality: introduction

The concept of Granger causality (Granger 1969) is simple: if a signal $X$ “Granger-causes” signal $Y$, then knowing the value of $X$ improves your ability to predict $Y$ beyond what can be predicted from the history of $Y$ alone (see also Seth 2007). Thus, Granger causality is inferred based on the relative fits of statistical models applied to time series data.

More mathematically:

\[ M_1: Y(t) = \sum_{l = 1}^{L} a_l Y(t-l) + \epsilon_1 \\ M_2: Y(t) = \sum_{l = 1}^{L} \left[ a'_l Y(t-l) + b'_l X(t-l) \right] + \epsilon_2 \]

If $M_2$ provides a better fit to the data (best predicts the value of $Y(t)$) then $X$ is said to Granger-cause $Y$. The parameter $L$ indicates the number of samples into the past that are included in the model; as was the case in our discussion of filters, the order of the model refers to how many past samples are included (i.e. the value of $L$).

In general, the above models $M$ are examples of autoregressive (AR) models: the dependent variable $Y(t)$ is regressed against linear combinations of past values of that variable itself, where the coefficients $a$ and $b$ can be thought of as the regression coefficients or weights of each past value. You may also encounter the term vector autoregressive (VAR) models; this is simply the multivariate extension of AR models, and $M_2$ above is a VAR model since it has two variables. There is a large literature on (V)AR models, since they are a major tool in forecasting all sorts of things ranging from the stock market to the weather.
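
As a toy illustration of this idea (plain MATLAB with made-up signals; not the FieldTrip workflow used below), we can generate a Y that depends on the past of X, fit both models by least squares, and compare the residual variances:

rng(0); N = 5000;
X = randn(N,1); Y = zeros(N,1);
for t = 3:N, Y(t) = 0.5*Y(t-1) + 0.8*X(t-2) + 0.5*randn; end
 
% M1: predict Y(t) from its own past only (order L = 2)
A1 = [Y(2:N-1) Y(1:N-2)];
e1 = Y(3:N) - A1*(A1\Y(3:N)); % residuals
 
% M2: also include past values of X
A2 = [A1 X(2:N-1) X(1:N-2)];
e2 = Y(3:N) - A2*(A2\Y(3:N));
 
log(var(e1)/var(e2)) % log variance ratio > 0: X Granger-causes Y; swap X and Y to get ~0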

Generating artificial data

To explore how to fit AR models to data, it's a good idea to start with some artificial data of which we know the structure. Earlier in this module we did so “by hand”, but here we will use FieldTrip's useful ft_connectivitysimulation():

cfg             = [];
cfg.ntrials     = 1000;
cfg.triallength = 5; % in seconds
cfg.fsample     = 1000;
cfg.nsignal     = 2; % two signals, X and Y, which start out as identical white noise
 
cfg.method      = 'linear_mix';
cfg.mix         = [0; 0]; % multiply white noise for X and Y by this
cfg.delay       = [0; 0]; % per-signal delay in samples (both zero here)
cfg.bpfilter    = 'no';
cfg.absnoise    = 1; % add independent noise to both signals, so now X and Y should be independent
 
data            = ft_connectivitysimulation(cfg);
data.label      = {'X','Y'};

The above code generates 1000 trials of 5 seconds each of independent white noise for two signals $X$ and $Y$. We do so in a somewhat roundabout way, by first setting the common signal in X and Y to zero (cfg.mix = [0; 0]) and then adding independent noise of amplitude 1 to each (cfg.absnoise = 1). Why we do it this way will become clear later, when we generate more interesting combinations of signals.

☛ Verify that indeed the two signals X and Y are uncorrelated, as one would expect from independently generated white noise. One way to do so is to compute a correlation coefficient for each trial and plot the distribution of resulting correlation coefficients (use corrcoef()).
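
One possible approach (a sketch; data.trial is the cell array of trials in the FieldTrip data structure created above):

nT = length(data.trial);
r = nan(nT,1);
for iT = 1:nT
    cc = corrcoef(data.trial{iT}(1,:),data.trial{iT}(2,:));
    r(iT) = cc(1,2);
end
hist(r,50); xlabel('correlation (r)'); ylabel('count'); % distribution should be centered on zero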

Next, we can fit our AR model:

cfg_ar         = [];
cfg_ar.order   = 3;
cfg_ar.toolbox = 'bsmart';
mdata          = ft_mvaranalysis(cfg_ar, data);

Note the order parameter, which specifies how far back to estimate coefficients for (the $L$ parameter in the equations above). Although this is FieldTrip code, it uses the BSMART toolbox under the hood to fit the model. The ft_mvaranalysis() function has some useful options we aren't using right now, such as the ability to estimate errorbars with the jackknife option. This takes a long time, however, so we don't do this now.

What we are interested in are the coefficients $a$ and $b$, i.e. the extent to which we can predict each signal from its own past, and how much that prediction can be improved by knowledge of the other signal.

To plot these coefficients, we can do:

figure; subplot(221)
 
labels = {'X->X','X->Y';'Y->X','Y->Y'}; cols = 'rgbc';
nP = 0;
for iI = 1:cfg.nsignal   
    for iJ = 1:cfg.nsignal
        nP = nP + 1;
 
        h(nP) = plot(1:cfg_ar.order,sq(mdata.coeffs(iI,iJ,:)),cols(nP));
        hold on;
        plot(1:cfg_ar.order,sq(mdata.coeffs(iI,iJ,:)),'.','MarkerSize',20,'Color',cols(nP));
 
    end
end
set(gca,'FontSize',18,'LineWidth',1); box off;
set(h,'LineWidth',2);
xlabel('lag (samples)'); ylabel('coefficient');
title('cfg.delay = [0; 0];');
legend(h,labels(:));

You should see that the coefficient values are very small (on the order of $10^{-4}$). This is what we expect from signals that we know to be uncorrelated; these values should not be statistically different from zero, which would mean that we cannot predict anything about our signal based on its past – the definition of white noise!

Let's now create some signals that do have some structure:

%%
cfg.mix         = [0.8; 0.8]; % X and Y are identical white noise with amplitude 0.8
cfg.absnoise    = 0.2; % add amplitude 0.2 *independent* noise
cfg.delay       = [0; 2]; % advance Y 2 samples relative to X
 
data            = ft_connectivitysimulation(cfg);
data.label      = {'X','Y'};

☛ Fit the VAR model again, and plot the coefficients in the next subplot. You should get something like:

Note how for the delay case, we correctly estimate that X can be predicted from Y, at the expected delay of 2 samples.

It is important to be aware of the limitations of Granger causality. The term “Granger-causes” (rather than simply “causes”) is used to reflect the inherently descriptive nature of VAR models, which cannot distinguish true causality from a number of alternative scenarios. Prominent among these is the possibility of a common input Z affecting both X and Y, but with different time lags: X may then “Granger-cause” Y without any direct anatomical connection between them. A different, all-too-common case is when signals X and Y have different signal-to-noise ratios; we will highlight this issue in the next section. More generally, it is unclear what conclusions can be drawn from Granger causality in systems with recurrent (feedback) connections, which are of course ubiquitous in the brain – a nice paper demonstrating and discussing this is Kispersky et al. 2011.
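
To make the common-input scenario concrete, here is a hypothetical sketch (plain MATLAB) that can be fed into the toy regression code above: an unobserved Z drives both X and Y, but with different lags, so X will appear to “Granger-cause” Y even though neither signal influences the other.

rng(1); N = 5000;
Z = randn(N,1); % unobserved common input
X = [0; Z(1:N-1)] + 0.2*randn(N,1); % X receives Z with a 1-sample delay
Y = [0; 0; 0; Z(1:N-3)] + 0.2*randn(N,1); % Y receives Z with a 3-sample delay
% Y(t) depends on Z(t-3), which is reflected in X(t-2): X "Granger-causes" Y spuriously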

Spectrally resolved Granger causality

Given how ubiquitous oscillations are in neural data, it is often informative not to fit VAR models directly in the time domain (as we did in the previous section), but to move to the frequency domain. Intuitively, spectrally resolved Granger causality measures how much of the power in $X$ that is not accounted for by $X$ itself can be attributed to $Y$ (technical paper). To explore this, we'll generate some more artificial data:

nTrials = 1000;
 
cfg             = [];
cfg.ntrials     = nTrials;
cfg.triallength = 5;
cfg.fsample     = 1000;
cfg.nsignal     = 2;
 
cfg.method      = 'linear_mix';
cfg.mix         = [0.5; 0.5];
cfg.delay       = [0; 4];
cfg.bpfilter    = 'yes';
cfg.bpfreq      = [50 100]; % white noise gets filtered in this frequency band
cfg.absnoise    = 0.5; % add independent noise to both signals
 
data            = ft_connectivitysimulation(cfg);
data.label      = {'X','Y'};

Note that we are now using the bpfilter cfg option, which filters the original white noise in the specified frequency band. Thus, X and Y are 50% identical signal with frequency content between 50 and 100 Hz, and 50% independent noise. (You can inspect what this looks like by doing ft_databrowser([],data)).

Next, we perform the frequency decomposition, FieldTrip-style:

cfg_TFR = [];
cfg_TFR.channel = {'X','Y'};
cfg_TFR.channelcmb = {'X' 'Y'};
cfg_TFR.method = 'mtmfft';
cfg_TFR.output = 'fourier';
cfg_TFR.foi = 1:1:150;
cfg_TFR.taper = 'hanning';
 
TFR = ft_freqanalysis(cfg_TFR,data);

Now we can compute the Granger spectra:

cfg_G = [];
cfg_G.method = 'granger';
cfg_G.channel = {'X','Y'};
cfg_G.channelcmb = {'X' 'Y'};
 
C = ft_connectivityanalysis(cfg_G,TFR);

…and plot the results:

figure;
for iP = 1:4
    subplot(2,2,iP);
    plot(C.freq,C.grangerspctrm(iP,:));
    set(gca,'FontSize',14,'YLim',[0 0.5]);
    title([C.labelcmb{iP,:}]);
end

You should get something like:

These panels show how much of the power in X (or Y) can be predicted based on itself, or the other signal. You can see that the top right panel (Y→X) has higher coefficients than the reverse (X→Y), consistent with the 4-sample advancement of Y relative to X.

☛ Why are there non-zero coefficients for the X→Y direction? Test your hypothesis with artificial data.

Now, let's consider the following case:

cfg             = [];
cfg.ntrials     = nTrials;
cfg.triallength = 5;
cfg.fsample     = 1000;
cfg.nsignal     = 2;
 
cfg.method      = 'linear_mix';
cfg.mix         = [1; 0.5]; % X bigger than Y
cfg.delay       = [0; 0];
cfg.bpfilter    = 'yes';
cfg.bpfreq      = [50 100]; % white noise gets filtered in this frequency band
cfg.absnoise    = 0.5; % add independent noise to both signals
 
data            = ft_connectivitysimulation(cfg);
data.label      = {'X','Y'};

Note that $X$ is the same signal as $Y$, but twice as large; there is no delay between them. Independent noise is then added to both $X$ and $Y$ as before.

☛ Compute the Granger cross-spectra for these signals as was done above.

You should see that, according to the Granger spectra, X appears to Granger-cause Y.

☛ How is this possible, given that we generated X and Y to have zero delay?

This case, in which two (near-)identical signals have different signal-to-noise ratios, is very common in neuroscience. As you have seen, Granger causality can be easily fooled by this.

How can we detect whether we are dealing with a Granger-causality false positive like this? An elegant way is to reverse both signals in time and test again; if the Granger asymmetry persists, this is a tell-tale sign of a signal-to-noise artifact.
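
Time-reversing the trials in the simulated FieldTrip data structure can be done along these lines (a sketch):

data_rev = data;
for iT = 1:length(data_rev.trial)
    data_rev.trial{iT} = fliplr(data_rev.trial{iT}); % reverse each channel in time
end
% then repeat ft_freqanalysis() and ft_connectivityanalysis() on data_rev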

☛ Reverse the two signals and compute the Granger cross-spectra, both for the zero-delay artifact case and for the true causal case above. Verify that this reverse-Granger test accurately distinguishes the two cases. This paper discusses these issues in more detail.

Phase-slope index

If we have a situation such as the above, it is possible that a true lag or lead between two signals is obscured by different signal-to-noise ratios. If such a case is detected by the reverse-Granger analysis, how can we proceed with identifying the true delay?

A possible solution is offered by the analysis of phase slopes: the idea that for a given lead or lag between two signals, the phase lag (or lead) should systematically depend on frequency (Nolte et al. 2008; see also precedents in the literature such as Schoffelen et al. 2005).

Catanese and van der Meer (2016) diagram the idea as follows:

In the example in (A) above, the red signal always leads the blue signal by 5 ms, which results in a different phase lag at each frequency (20, 25 and 33.3 Hz in this example). This is because 5 ms is a much bigger slice of a full oscillation cycle at 33.3 Hz than it is at 20 Hz; the bottom panel shows the linear relationship between phase lag and frequency for these examples, with the positive slope of the red-blue phase difference indicating that the red signal leads.

(B) shows the raw phase differences for an example real data session in the top panel: note that the phase lag as a function of frequency contains approximately linear regions in the “low-gamma” (45-65 Hz, green) and “high-gamma” (70-90 Hz, red) frequency bands, with slopes in opposite directions. The phase slope (middle panel) is the derivative of the raw phase lag; the reversal of its sign around 65-70 Hz indicates that low and high gamma are associated with opposite directionality in the vStr-mPFC system, with vStr leading for low gamma and mPFC leading for high gamma oscillations. The bottom panel shows the phase slope index (PSI), which normalizes the raw phase slope by its standard deviation.

Thus, to summarize, the phase slope index (PSI) is a normalized form of the phase slope – obtained by dividing the raw phase slope at each frequency by its standard deviation (estimated using a bootstrap). The phase slope itself is obtained by taking the derivative (slope) of the raw phase differences across frequencies; as discussed above, these raw phase differences can be obtained by estimating the phase (angle) of the cross-spectrum.
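
To connect this to tools we have already used, the raw phase differences can be read off the angle of the cross-spectrum, and the phase slope is their derivative with respect to frequency. A sketch (s1 and s2 are placeholder signals at sampling frequency Fs; the window settings are arbitrary):

[Pxy,F] = cpsd(s1,s2,hanning(500),250,1024,Fs);
phi = unwrap(angle(Pxy)); % phase differences (radians) as a function of frequency
phase_slope = diff(phi)./diff(F); % raw phase slope (radians per Hz)
lag_s = phase_slope./(2*pi); % implied time lag in seconds (cf. the equation below)
plot(F(2:end),lag_s); xlabel('Frequency (Hz)'); ylabel('implied lag (s)');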

The time lag (or lead) between two signals given a phase slope is:

\begin{equation} t_{a-b} = [\frac{\phi_{a-b}(f+df) - \phi_{a-b}(f)}{df}]/ 360^{\circ} \label{eq:psi} \end{equation}

where $t_{a-b}$ is the time lag (or lead) in seconds between signals $a$ and $b$, to be inferred from the phase differences $\phi_{a-b}$ (in degrees) observed at frequencies $f$ and $f+df$. For instance, given a phase difference $\phi_{a-b} = 45^{\circ}$ between signals $a$ and $b$ at $f = 25$Hz, and $\phi_{a-b} = 36^{\circ}$ at $f = 20$Hz, $t_{a-b} = [(45-36)/(25-20)]/360 = 5$ms (the example in panel A above). As $df \to 0$, the fraction shown in square brackets above corresponds to the derivative $\phi_{a-b}'(f)$, i.e. the phase slope. Positive time lags indicate that $a$ leads $b$.
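
A quick numeric check of this worked example:

t_ab = ((45-36)/(25-20))/360 % = 0.005 s, i.e. signal a leads b by 5 ms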

To test how this works, let's generate two signals with an ambiguous Granger-relationship:

nTrials = 1000;
 
cfg             = [];
cfg.ntrials     = nTrials;
cfg.triallength = 5;
cfg.fsample     = 1000;
cfg.nsignal     = 2;
 
cfg.method      = 'linear_mix';
cfg.mix         = [1; 0.3]; % X bigger than Y
cfg.delay       = [0; 4];
cfg.bpfilter    = 'yes';
cfg.bpfreq      = [50 100]; % white noise gets filtered in this frequency band
cfg.absnoise    = 0.5; % add independent noise to both signals
 
data            = ft_connectivitysimulation(cfg);
data.label      = {'X','Y'};

Note that Y leads X, but X has larger amplitude than Y.

☛ Verify that according to the Granger spectra, there is no evidence to support an asymmetric (Granger-causal) relationship between Y and X. Since we generated the signals with a 4 sample lead for Y, we know this to be incorrect.

Now, let's compute the phase slope. We start with the Fourier decomposition, as before:

cfg_TFR = [];
cfg_TFR.channel = {'X','Y'};
cfg_TFR.channelcmb = {'X' 'Y'};
cfg_TFR.method = 'mtmfft';
cfg_TFR.output = 'fourier';
cfg_TFR.foi = 1:1:150;
cfg_TFR.taper = 'hanning';
 
TFR = ft_freqanalysis(cfg_TFR,data);

But now, we use a different method for the connectivity analysis:

cfg_psi = [];
cfg_psi.method = 'psi';
cfg_psi.bandwidth = 8; % number of frequencies to compute slope over
cfg_psi.channel = {'X','Y'};
cfg_psi.channelcmb = {'X' 'Y'};
 
C = ft_connectivityanalysis(cfg_psi,TFR);

We plot the phase slope between Y and X:

figure;
plot(C.freq,sq(C.psispctrm(2,1,:)));
xlabel('Frequency'); ylabel('Phase slope');

The positive phase slope correctly identifies that Y leads X.

☛ What are the units on the vertical axis?

Challenges

★ If you have your own data with at least two signals that you suspect may be related, identify an appropriate functional connectivity analysis and apply it to the data. Comment on why you chose that particular method.

★ The “theta” rhythm, which is about 8 Hz in moving rodents, is important in coordinating the spike timing of hippocampal neurons. However, theta frequencies also appear in LFPs recorded from other brain areas, including the prefrontal cortex and the ventral striatum. One hypothesis is that those areas simply “inherit” theta activity from their hippocampal inputs. Test this idea using data from R020, which has electrodes in hippocampus and ventral striatum, and your chosen connectivity analysis method.
