analysis:nsb2014:week0



1. Garbage in, garbage out.

Analysis will be meaningless if performed on bad data. Even if you start out with good data, there are many analysis steps that have the power to corrupt.

An important corollary of this is that you need to determine at every step whether you are dealing with garbage or not. Two essential habits that help with this are visualization (explored in Module 5) and unit testing (put simply, the practice of testing specific pieces of functionality or “units”; employed throughout the modules).

To see why this principle is critical, consider performing a complex multistep experimental procedure such as implanting a recording probe. In this setting, the surgeon always verifies the success of the previous step before proceeding. One would never attempt to insert a probe without first making sure the dura has been removed, or apply dental cement to the skull without first making sure the skull is dry. Apply the same mindset to analysis and confirm the success of every step before proceeding!
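As a minimal sketch of the unit-testing habit, one can feed an analysis step synthetic data whose correct answer is known in advance, and assert that the step recovers it. The signal, the threshold, and the variable names below are invented for illustration and are not part of the course codebase.

```matlab
% Synthetic test signal: zeros with two known supra-threshold events.
signal = zeros(1, 100);
signal(20:25) = 10;   % first event, samples 20-25
signal(60:70) = 10;   % second event, samples 60-70

% Step under test: count the events that rise above a threshold.
threshold = 5;
above = signal > threshold;             % true wherever the signal is above threshold
n_events = sum(diff([0 above]) == 1);   % each 0 -> 1 transition starts one event

% Unit test: the step must recover the number of events we put in.
assert(n_events == 2, 'Expected 2 events, found %d', n_events);
```

If the assertion fails, the problem is caught at this step rather than several steps downstream, where it would be much harder to trace.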

2. Plan ahead.

Before the start of data collection, you should identify the steps in your data processing pipeline. Doing so often highlights key dependencies and potentially important controls, which helps ensure that you collect the data in a way that lets you do or test what you set out to. Work from the raw data all the way to the statistical tests, plots, or resource that is the final outcome of the analysis.

There are two steps to this:

First, think in terms of data, and transformations on those data, and create a schematic that captures each data type and the transformations. For instance, to determine whether the number of sharp wave-ripple complexes (SWRs) that occur depends on an experimental manipulation, this analysis workflow might be represented as follows (generated with GraphViz):

The above workflow shows how raw local field potential (LFP) data is first loaded (by the LoadCSC() function) and then filtered (FilterLFP()). Note that at this stage, you can simply make up function names, as long as they are descriptive (see Principle 3, below). Next, SWR events are detected from the filtered LFP, and the number of events in each trial is counted before a statistical test is applied.

The labels in square brackets, such as [TSD], refer to standardized data types, introduced in Module 2. Briefly, a TSD object describes one or more time-varying signals (such as an LFP or videotracker data), an IV object describes interval data (such as SWR events, which have a start and end time as well as some properties such as their power), and a TS object describes timestamps (for instance spikes). By standardizing the form in which these data types are handled, we can more easily implement unit tests and write clean, modular code.
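One way to picture these three data types is as plain structs with a fixed set of fields. The field names below (tvec, data, tstart, tend, t) and the variable names are assumptions chosen for illustration; Module 2 defines the actual types.

```matlab
% TSD: time-varying signal(s) sampled at known times (e.g. an LFP).
lfp_tsd.tvec = (0:1000)' / 1000;              % sample times in seconds (hypothetical field)
lfp_tsd.data = sin(2*pi*8*lfp_tsd.tvec);      % one signal value per sample time

% IV: intervals with a start time and an end time (e.g. SWR events).
swr_iv.tstart = [0.10; 0.55];                 % interval start times in seconds
swr_iv.tend   = [0.15; 0.62];                 % interval end times in seconds

% TS: bare timestamps (e.g. spike times).
spike_ts.t = [0.012; 0.340; 0.341; 0.900];

% A standard form makes generic sanity checks easy to write.
assert(numel(lfp_tsd.tvec) == size(lfp_tsd.data, 1));  % one data row per time point
assert(all(swr_iv.tend > swr_iv.tstart));              % every interval has positive length
assert(issorted(spike_ts.t));                          % timestamps are in ascending order
```

Because every function can rely on the same fields being present, checks like these can be written once and applied to any data of that type.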

The second step: based on a data analysis workflow such as the above, write out the pseudocode that would implement the workflow. This would look something like:

cfg = []; cfg.target = 'SWR';
LFP = LoadCSC(cfg); % looks at ExpKeys to find the filename of the LFP tagged as containing SWRs
 
cfg = []; cfg.f = [150 220]; % passband (Hz) covering the ripple band
LFPfilt = FilterLFP(cfg,LFP);
 
cfg = []; cfg.method = 'zscore';
cfg.threshold = 5; cfg.select = '>'; % return intervals where the threshold is exceeded
SWR = MakeIV(cfg,LFPfilt); % make intervals (corresponding to SWR events)
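The detection step in the pseudocode can be made concrete. The sketch below shows one possible way to turn a z-scored signal into intervals where a threshold is exceeded. It is not the implementation of MakeIV, and the synthetic envelope, the sampling rate, and all variable names are assumptions made for illustration.

```matlab
% Synthetic stand-in for a ripple-band envelope: flat baseline plus two bursts.
Fs = 1000;                          % assumed sampling rate (Hz)
tvec = (0:1999)' / Fs;              % 2 s of sample times
envelope = ones(size(tvec));
envelope(501:520)   = 20;           % burst 1 (20 samples)
envelope(1201:1230) = 20;           % burst 2 (30 samples)

% z-score the signal by hand, avoiding any toolbox dependency.
z = (envelope - mean(envelope)) / std(envelope);

% Mark samples above threshold, then locate the start and end of each run.
threshold = 5;
above = z > threshold;
d = diff([0; above; 0]);            % +1 where a run starts, -1 just after it ends
start_idx = find(d == 1);
end_idx   = find(d == -1) - 1;

% Convert sample indices to times; each (tstart, tend) pair is one interval.
tstart = tvec(start_idx);
tend   = tvec(end_idx);
n_events = numel(tstart);           % event count, the quantity the workflow compares
```

Padding the thresholded vector with a zero at each end ensures that a run touching the first or last sample still gets both a start and an end.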

3. Learn about, and implement, good programming practice.

There are many resources and opinions on what constitutes this, but some of the most important ideas are:

  • Readability. Whatever analysis you do, you will have to do it again. Maybe tomorrow, maybe next year. You might think you will remember what you did and why, but you probably won't. Even if somehow you do, it's likely someone else will have to run and understand your code. Whether or not they can will reflect on you. So, use expressive variable and function names. Comment a lot. Write example workflows.
  • Don't repeat yourself. Implementing each piece of functionality only once means your code will be easier to troubleshoot, re-use, and extend.
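
As a small illustration of the "don't repeat yourself" idea, the sketch below defines a normalization once and reuses it, instead of pasting the same formula for every signal. The function handle and the example signals are invented for this illustration.

```matlab
% Define the operation once, with an expressive name...
zscore_signal = @(x) (x - mean(x)) / std(x);

% ...and reuse it for every signal that needs it.
lfp_a = [1 2 3 4 5];
lfp_b = [10 20 30 40 50];

z_a = zscore_signal(lfp_a);
z_b = zscore_signal(lfp_b);

% If the normalization ever needs to change, there is exactly one
% line to edit and one piece of functionality to test.
```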

4. Use protection.

Disk, computer, and connection failures happen, usually when you are least prepared. Make sure that even your least-prepared state is sufficient to withstand them.

analysis/nsb2014/week0.1403819199.txt.gz · Last modified: 2018/07/07 10:19 (external edit)