Data analysis in Neuroscience, Week 1: Good habits for data analysis (paths, backups, versioning, annotation)


  • Set up a working MATLAB installation with appropriate path shortcuts
  • Use GitHub to acquire the lab codebase
  • Create a well-designed folder structure for your projects (including this course)
  • Choose and implement a backup strategy for your project files
  • Understand the flow of raw data, from acquisition to analysis
  • Connect to the database, download a data set, and test your path setup



Installing MATLAB

If you are using a lab computer, it will have MATLAB installed. Verify that it can start successfully (you'll get the >> prompt in the Command Window).

If you are using your own computer, you can download MATLAB from the website. For it to work, the username must be 'mvdmlab' and you need the network.lic license file, available from the lab Dropbox. If you want to use MATLAB off-campus, you may also need to connect to the campus network using a VPN (see the uWaterloo VPN page for how to do this). It may also be possible to acquire your own working copy of MATLAB that does not have these restrictions.

This course assumes a basic working knowledge of MATLAB, corresponding roughly to the material in the "Getting Started with MATLAB" Primer. If you are unsure, take a look at the table of contents. If there are things you don't recognize, use the Primer itself, or Chapter 2 of the MATLAB for Neuroscientists book to get up to speed.

Setting up GitHub

GitHub is a hosting service built around Git, a system for “distributed version control”: it keeps track of changes to a set of collaboratively edited files, such as pieces of MATLAB code. This makes it easy to share improvements between collaborators. If you are new to GitHub, watch the video under Resources above.

If you don't already have a GitHub account, go to GitHub and sign up.

Download and install the GUI client of your choice; for Windows, I recommend GitHub Windows to get started. For installing Git and setting up GitHub on various operating systems, see GitHub: Set Up Git.

Configure your client. For GitHub Windows, you'll first need to sign in with your account, then click Tools > Options. Set the “Default Storage Directory” to something reasonable: on lab computers, this should be a folder on the D: drive (for example, D:\My_Documents\GitHub\). Also check that your username and e-mail address are correct (I am mvdm).

Next, on the GitHub website, search for the repository called BIOL680 and Fork it. Notice that this creates your own personal copy of the original repository.

Go back to your local client and hit Refresh. You should now see your forked repo appear, along with the option to Clone it. Do this. Verify that your local filesystem now contains a BIOL680 folder in the location you specified in your GitHub client configuration: this is the local version of your forked repo, which you can now make changes to.

Open the file in the MATLAB editor (if the editor is not open, type edit at the MATLAB command prompt) and add a line to it with your name and the date. Save your updated file (notice the * in the MATLAB editor window that indicates unsaved changes).

In your local GitHub client, hit Refresh (making sure you are looking at the BIOL680 repo, of course). It should indicate that there is now a file to be committed, and highlight the line you added. Write a short commit message, and hit Commit. Notice you now have “unsynced commits”. To upload (push) your change to GitHub, hit Sync. On the GitHub site for this repo, verify that your change appears there.

Important: the change you just made is limited to *your fork* only. The original repo that you forked is *not* automatically updated with your change! To remedy this, select “Pull Requests” on your repo site and choose “New Pull Request”. The resulting page will show the diff of the file in your repo against the same file in the original repo, so it should highlight the change you made. Select “Click to create a Pull Request…”, add a title, and click “Send Pull Request”.

Now, the owner of the original repo (in this case, mvdm) will need to approve the request before the change is merged into the original repo. You will get a confirmation message by e-mail when this is done.

This process works the same way for changes in the other direction. If the original repo is updated, your fork does not automatically update. If it did, it could have some nasty consequences: someone could break your code without you knowing about it! You are in control of your own fork and need to check for, and approve, any changes you want. A good way of doing this is to Watch the original repo using the button on the GitHub site.

Note: if you are confused or curious about GitHub, or distributed version control in general, a great way to get answers beyond reading the documentation and doing the tutorials is to go for drinks with the folks in the Eliasmith lab!

Using GitHub to acquire the lab codebase

Using your experience from the previous section, fork and create a local clone of the vandermeerlab repo. Verify that it exists in your local filesystem before proceeding.

Configuring MATLAB to use the lab codebase

See Setting up MATLAB. Things should be set up so that you have a shortcut that does the following:

  • Restore the default path
  • Add all folders from the vandermeerlab repo
  • cd to your BIOL680 folder

Check that the shortcut works before proceeding.
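
The three shortcut steps above can be sketched as follows. This is only a minimal example, not the lab's actual shortcut: both folder locations are assumptions based on the example storage directory mentioned earlier, so adjust them to wherever your clones live.

```matlab
% Sketch of a path shortcut. Both folder locations below are assumptions
% based on the example GitHub storage directory; edit them for your setup.
repoRoot = 'D:\My_Documents\GitHub';             % assumed clone location
labCode  = fullfile(repoRoot, 'vandermeerlab');  % assumed repo folder name
course   = fullfile(repoRoot, 'BIOL680');        % assumed repo folder name

restoredefaultpath;                  % 1. restore the default path

if exist(labCode, 'dir') == 7
    addpath(genpath(labCode));       % 2. add the lab repo and all its subfolders
else
    warning('Lab codebase not found at %s', labCode);
end

if exist(course, 'dir') == 7
    cd(course);                      % 3. start in the course folder
else
    warning('Course folder not found at %s', course);
end
```

The existence checks are there so that a mistyped folder name produces a visible warning instead of a silently incomplete path.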

Grab a data session from the lab database

Use an FTP client such as FileZilla or WinSCP to connect to the lab FTP server, mvdmlab-nas1. Configure your FTP client to require “explicit FTP over TLS” and use BIOL680 as both username and password. In the BIOL680 folder, download the folder R016-2012-10-08. A good place to put this folder is in D:\data\promoted\R016\. (In general you want to keep your data separate from your code; for instance, multiple analysis projects may use the same data, so you don't want to duplicate it.)

You will have to be on campus to connect. If you still cannot log in to the server, send me your IP address and I will temporarily enable access for you. If it still does not work, get the .zip here.

Verify things are working

As explained in the Noble paper, create a folder with today's date; I do this within a folder called daily to keep things manageable. Create a sandbox.m file in it and use Cell Mode to check that you can load a data file from the data folder you grabbed (because the loader function is in your path):

%% load data
% first, cd to where the data you just grabbed is located
[csc,csc_info] = LoadCSC('R016-2012-10-08-CSC02d.ncs');
tvec = Range(csc);
raw_LFP = Data(csc);
%% plot
nSamples = 10000;
plot(tvec(1:nSamples),raw_LFP(1:nSamples)); % show the first nSamples of the signal

You should replace the comment above with a cd command to change directory to where your data is located. Do not place the data in your code folder!
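
As an illustration, the cd step might look like the sketch below. The folder is an assumption that follows the location suggested earlier for the downloaded session; substitute the place where you actually put the data.

```matlab
%% load data
% Assumed data location, following the suggestion on this page; adjust as needed.
dataDir = 'D:\data\promoted\R016\R016-2012-10-08';

if exist(dataDir, 'dir') == 7
    cd(dataDir);    % loaders called after this look for files in this folder
else
    warning('Data folder not found: %s', dataDir);
end
```

Checking that the folder exists first gives a clearer message than the error the loader would otherwise raise about a missing file.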

If you get no errors and see a nice neural signal, save your sandbox.m script, then commit and sync to your GitHub fork. If you do get errors, verify that your path is set up correctly: type path to get a listing, which should include the various vandermeerlab folders. If it does not, go back to the Setting up MATLAB steps.
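
A quicker check than reading the full path listing is to ask MATLAB where it finds the loader. The sketch below uses the built-in which function and assumes only that the loader is named LoadCSC, as in the sandbox code.

```matlab
% Ask MATLAB which file it would run for LoadCSC.
loaderPath = which('LoadCSC');   % empty if the function is not on the path

if isempty(loaderPath)
    disp('LoadCSC not found: rerun your path shortcut.');
else
    fprintf('LoadCSC found at %s\n', loaderPath);
end
```

If the reported location is not inside your vandermeerlab clone, another copy of the function is shadowing the one you intended to use.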

For Mac/OS X users

If you are running MATLAB on OS X (and possibly Linux), the above sandbox.m code will probably fail. This is most likely because the vandermeerlab codebase downloaded from GitHub calls low-level Windows functions. The following steps have worked for someone using OS X 10.8 with MATLAB R2013a:

  • Head over to the Neuralynx website.
  • Download the Neuralynx to Matlab Import for Linux and Mac OS X package (direct link).
  • Extract the archive you have downloaded into a folder, and add that folder to your path shortcut (see Setting up MATLAB).
  • Navigate to the extracted folder/binaries/, find the file Nlx2MatCSC_v3.mexmaci, and rename it to Nlx2MatCSC.mexmaci (removing _v3).
  • Again, make sure this folder is included in your path, and try running sandbox.m again.
  • If you add neuralynx above vandermeerlab in your path, MATLAB should use the new neuralynx binaries. If not, you may need to delete vandermeerlab/util/neuralynx/ for this to work.

sandbox.m should now run properly, and you should see the plot of the neural signal.
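
To confirm which copy of the Neuralynx loader MATLAB will actually use, something along these lines may help. The folder location is hypothetical (it depends on where you extracted the archive); addpath with '-begin' and which with '-all' are standard MATLAB calls.

```matlab
% Hypothetical location of the extracted Neuralynx binaries; edit for your setup.
nlxDir = fullfile(getenv('HOME'), 'neuralynx', 'binaries');

if exist(nlxDir, 'dir') == 7
    addpath(nlxDir, '-begin');   % put this folder ahead of everything else on the path
end

% List every Nlx2MatCSC on the path; the first entry is the one MATLAB runs.
hits = which('Nlx2MatCSC', '-all');
if isempty(hits)
    disp('Nlx2MatCSC is not on the path.');
else
    fprintf('MATLAB will use: %s\n', hits{1});
end

% The file extension MATLAB expects for compiled MEX files on this machine.
fprintf('MEX extension expected on this platform: %s\n', mexext);
```

If the first entry still points into the vandermeerlab folder, the path order is not what you intended.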

For Linux users

Follow the instructions above for Mac/OS X users, except that you will have to recompile the binaries. You will probably need C and C++ compilers installed (on Ubuntu, install the build-essential package). The additional steps are:

  • You may want to just delete the existing binaries.
  • Edit to set PLATFORM=64PC or PLATFORM=32PC depending on your architecture, and edit INCLMATLAB and BINMATLAB so that they point to the correct directories for your MATLAB installation. If you don't remember where these are, run locate mexsh in the shell and you should see the path.
  • You can rename all the files in the binary directory with the shell command:
    > rename 's/_v3//' *

This worked on 64-bit Ubuntu with MATLAB R2013b.
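
As an alternative to searching the filesystem, MATLAB itself can report where it is installed. The sketch below prints the architecture and two candidate directories; that INCLMATLAB should point at the MEX header folder and BINMATLAB at the platform binary folder is an assumption, so check it against the comments in the file you are editing.

```matlab
% Report the installation details needed when recompiling the binaries.
arch    = computer('arch');                            % e.g. 'glnxa64' on 64-bit Linux
inclDir = fullfile(matlabroot, 'extern', 'include');   % folder containing mex.h
binDir  = fullfile(matlabroot, 'bin', arch);           % platform-specific binaries

fprintf('Architecture:            %s\n', arch);
fprintf('Candidate for INCLMATLAB: %s\n', inclDir);    % assumed mapping
fprintf('Candidate for BINMATLAB:  %s\n', binDir);     % assumed mapping
fprintf('MEX file extension:      %s\n', mexext);
```

An architecture ending in 64 corresponds to the 64-bit PLATFORM setting.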

Read up on the data preprocessing pipeline

Be backup-aware

If you are using a lab computer, only put data and code on the D:\ drive. This drive actually consists of two underlying hard drives (a RAID 1 array in “mirroring” mode), so that if one fails, your data is still available. However, this does not protect against accidentally deleting data, overwriting a key file, or any other sort of data corruption or damage. Some options to minimize the impact of those:

  • Save your code (and other work that does not take up huge amounts of space) on Dropbox, Google Drive, or a similar service that keeps a (limited) revision history
  • Schedule periodic back-ups to another location
  • Periodically commit your work to GitHub
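
As one possible way to do the second option by hand, the sketch below copies a project folder to a dated backup folder. Both locations are hypothetical, and the script copies only when you run it; making it periodic would need a scheduler or a habit.

```matlab
% Copy a project folder to a dated backup folder.
% Both locations are hypothetical; edit them for your own machine.
srcDir     = 'D:\My_Documents\GitHub\BIOL680';   % assumed project folder
backupRoot = 'E:\backup';                        % hypothetical backup drive

stamp  = datestr(now, 'yyyy-mm-dd');             % e.g. 2013-09-20
dstDir = fullfile(backupRoot, ['BIOL680_' stamp]);

if exist(srcDir, 'dir') == 7 && exist(backupRoot, 'dir') == 7
    copyfile(srcDir, dstDir);                    % copies the folder contents recursively
    fprintf('Backed up %s to %s\n', srcDir, dstDir);
else
    disp('Source or backup location not found; nothing copied.');
end
```

Because each run writes to a folder named after the date, earlier backups are kept and an overwritten file can be recovered from a previous day.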

That's it!

You're set! Just make sure that your sandbox.m file appears in your repo (not the original that you forked), online on GitHub. I will look at this file to verify that you got to this point OK.

analysis/course/week1.1379692240.txt.gz · Last modified: 2018/07/07 10:19 (external edit)