==== Introduction: Principles of careful data analysis ====
  
=== 4. Write to share ===
  
A desirable endpoint of a successful analysis is that you can share the code and the raw data with anyone, and they will be able to generate all the figures and results in the paper. A nice example is [[http://www.jneurosci.org/content/34/5/1892.full | Bekolay et al. J Neurosci 2014]], where the Notes section links to a %%GitHub%% release with very nicely organized and documented code that reproduces the results.
  
This means, among other things, that:
  
  * Annotate your data. We use two annotation files, %%ExpKeys%% and metadata, which contain a number of mandatory descriptors common across all our lab's tasks, as well as more experiment-specific information. Our current lab-general specification for these files can be found [[https://github.com/mvdm/vandermeerlab/blob/master/doc/HOWTO_ExpKeys_Metadata.md|here]], and task-specific descriptors can be found [[http://ctnsrv.uwaterloo.ca/vandermeerlab/doku.php?id=analysis:dataanalysis#task_descriptions_and_metadata|here]]. During my postdoc in the Redish lab, a similar standardized annotation system enabled me to analyze and compare three large data sets, recorded by three different people from different brain regions ([[http://www.cell.com/neuron/abstract/S0896-6273(10)00507-6 | van der Meer et al. Neuron 2010]]). A sketch of what such a file might look like follows this list.
  * Don't hard-code the locations of any files. Follow the [[http://ctnsrv.uwaterloo.ca/vandermeerlab/doku.php?id=analysis:course-w16:week2#data_files_overview|database format and file naming conventions]] so that it is sufficient to specify the root folder where the data are located (see the path-handling sketch after this list).
  * Be explicit about which versions of the various pieces of software you used to generate the results. Taken to the limit, this means also specifying the exact operating system version and shared libraries -- an issue best addressed by including a disk image or virtual machine (see e.g. [[http://www.russpoldrack.org/2015_12_01_archive.html|this blog post]] by Russ Poldrack for discussion). A nice way to handle this for code on %%GitHub%% is to create a [[https://help.github.com/articles/creating-releases/|release]] for a publication (essentially an easily linkable snapshot of the code in the repository). A version-logging sketch also follows this list.
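To make the annotation idea concrete, here is a minimal sketch of what an %%ExpKeys%%-style annotation could contain, written in Python for brevity. The field names and values are invented for illustration and are not the lab's actual specification; see the HOWTO linked above for that.

<code python>
# Hypothetical ExpKeys-style annotation for one recording session.
# Field names and values are illustrative only -- see the lab's
# HOWTO_ExpKeys_Metadata.md for the real specification.
ExpKeys = {
    "species": "rat",
    "subject": "R042",          # subject identifier (made up)
    "date": "2015-07-16",       # recording date
    "task": "linear_track",     # task descriptor
    "target": ["dCA1"],         # recording target(s)
    "TimeOnTrackSec": 1200.0,   # time on task, in seconds
    "notes": "example session annotation",
}
</code>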
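For the path-handling point, the idea is that analysis code receives a single root folder as its only machine-specific setting and derives every file path from it. A minimal sketch, with a hypothetical folder layout and file names:

<code python>
from pathlib import Path

def session_paths(data_root, subject_id, session_id):
    """Build all file paths relative to a single user-supplied root,
    so only data_root changes across machines."""
    session_dir = Path(data_root) / subject_id / session_id  # hypothetical layout
    return {
        "keys": session_dir / f"{session_id}_ExpKeys.m",
        "position": session_dir / f"{session_id}_pos.mat",
    }

# The only line that differs between machines:
paths = session_paths("/data/vandermeerlab", "R042", "R042-2015-07-16")
</code>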
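For the version-number point, one lightweight habit is to have every analysis script log its environment next to the results it produces; exactly what to log here is my choice, not a lab requirement. A sketch:

<code python>
import platform
import sys

import numpy
import scipy

def log_environment(fh=sys.stdout):
    """Record OS and library versions alongside the results they produced."""
    fh.write(f"python {sys.version.split()[0]} on {platform.platform()}\n")
    fh.write(f"numpy {numpy.__version__}, scipy {scipy.__version__}\n")

log_environment()
</code>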
=== 5. Be safe ===
  
  * [[https://en.wikipedia.org/wiki/Resampling_(statistics)|Resampling]] (aka bootstrapping, shuffling, permutation testing): generating synthetic data sets based on some known distribution, usually to compare to actual data (see the shuffle sketch after this list).
  * Model comparison: the process of determining which model best describes the data (an AIC-based sketch also follows below).
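To illustrate the resampling entry above: the sketch below builds a null distribution for a difference in means by repeatedly shuffling condition labels, then asks how extreme the observed difference is. All numbers are made up:

<code python>
import numpy as np

rng = np.random.default_rng(0)

# Two made-up samples, e.g. firing rates under two conditions
a = rng.normal(5.0, 1.0, size=40)
b = rng.normal(5.5, 1.0, size=40)
observed = b.mean() - a.mean()

# Null distribution: shuffle the condition labels, recompute the statistic
pooled = np.concatenate([a, b])
n_shuffles = 10000
null = np.empty(n_shuffles)
for i in range(n_shuffles):
    rng.shuffle(pooled)
    null[i] = pooled[len(a):].mean() - pooled[:len(a)].mean()

# Two-sided p-value: fraction of shuffles at least as extreme as observed
p = np.mean(np.abs(null) >= abs(observed))
print(f"observed difference {observed:.3f}, shuffle p = {p:.4f}")
</code>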
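To illustrate model comparison: one common recipe is to fit each candidate model and penalize complexity, for example with AIC. A minimal sketch on made-up data, comparing polynomial fits of different order:

<code python>
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 2.0 * x + rng.normal(0, 0.2, size=x.size)  # ground truth is linear

def aic(y, yhat, k):
    """AIC for Gaussian residuals: n*log(RSS/n) + 2k, with k parameters."""
    n = y.size
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k

for order in (1, 2, 5):
    coef = np.polyfit(x, y, order)
    print(f"order {order}: AIC = {aic(y, np.polyval(coef, x), order + 1):.1f}")
# The lowest AIC should (usually) favor the true, linear model.
</code>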
=== 7. Test on synthetic data ===
  
Analysis pipelines can get complicated quickly, such that it can be difficult to track down where things may be going wrong. A great way to verify the integrity of single analysis steps, as well as entire workflows, is to test on data you generate yourself, so that you know what the answer should be. For instance, if you input Poisson (random) spike data with a constant firing rate, totally independent of your experimental conditions, your analysis had better not report a significant difference!
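A minimal sketch of such a test: generate Poisson spike counts with the same constant rate in two "conditions", run a statistical comparison many times, and check that the false positive rate stays near the nominal alpha. The rate, trial counts, and choice of test here are all made up for illustration:

<code python>
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

rate_hz, dur_s = 5.0, 2.0      # constant firing rate, trial duration
n_trials, n_runs, alpha = 30, 1000, 0.05

false_pos = 0
for _ in range(n_runs):
    # Identical rate in both "conditions": any detected effect is spurious
    a = rng.poisson(rate_hz * dur_s, size=n_trials)
    b = rng.poisson(rate_hz * dur_s, size=n_trials)
    _, p = stats.ttest_ind(a, b)
    false_pos += p < alpha

print(f"false positive rate: {false_pos / n_runs:.3f} (nominal {alpha})")
# A rate well above alpha here would flag a bug somewhere in the pipeline.
</code>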