Data analysis in Neuroscience, Week 1: Good habits for data analysis (paths, backups, versioning, annotation)

Goals:

  • Set up a working MATLAB installation with appropriate path shortcuts
  • Use GitHub to acquire the lab codebase
  • Create a well-designed folder structure for your projects (including this course)
  • Choose and implement a backup strategy for your project files
  • Understand the flow of raw data, from acquisition to analysis
  • Connect to the database, download a data set, and test your path setup

Resources:

Step-by-step:

Installing MATLAB

If you are using a lab computer, it will have MATLAB installed. Verify that it can start successfully (you'll get the >> prompt in the Command Window).

If you are using your own computer, you can download MATLAB from the mathworks.com website. For it to work, the username must be 'mvdmlab' and you need the network.lic license file, available from the lab Dropbox. If you want to use MATLAB off-campus, you may also need to connect to the campus network using a VPN (see the uWaterloo VPN page on how to do this). It may also be possible to acquire your own working copy of MATLAB that does not have these restrictions.

This course assumes a basic working knowledge of MATLAB, corresponding roughly to the material in the "Getting Started with MATLAB" Primer. If you are unsure, take a look at the table of contents. If there are things you don't recognize, use the Primer itself, or Chapter 2 of the MATLAB for Neuroscientists book to get up to speed.

Setting up GitHub

GitHub is a hosting service for Git, a system for “distributed version control”: Git keeps track of changes to a set of collaboratively edited files, such as pieces of MATLAB code, so that it is easy to share improvements between collaborators. If you are new to GitHub, watch the video under Resources above.

If you don't already have a GitHub account, go to GitHub and sign up.

Download and install the GUI client of your choice; for Windows, I recommend GitHub Windows to get started.

Configure your client. For GitHub Windows, you'll first need to sign in with your account, then click Tools > Options. Set the “Default Storage Directory” to something reasonable: on lab computers, this should be something on the D: drive (for example, D:\My_Documents\GitHub\). Also check that your username and e-mail address look ok (I am mvdm).

Next, on the GitHub website, search for the repository called BIOL680 and Fork it. Notice that this creates your own personal copy of the original repository.

Go back to your local client and hit Refresh. You should now see your forked repo appear, along with the option to Clone it. Do this. Verify that your local filesystem now contains a BIOL680 folder in the location you specified in your GitHub client configuration: this is the local version of your forked repo, which you can now make changes to.

Open the readme.md file in the MATLAB editor (if no editor window is open, type edit at the MATLAB command prompt) and add a line to it with your name and the date. Save your updated file (notice the * in the MATLAB editor window that indicates unsaved changes).

In your local GitHub client, hit Refresh (making sure you are looking at the BIOL680 repo, of course). It should indicate that there is now a file to be committed, and highlight the line you added. Write a short commit message, and hit Commit. Notice you now have “unsynced commits”. To upload (push) your change to GitHub, hit Sync. On the GitHub site for this repo, verify that your change appears there.

Important: the change you just made is limited to *your fork* only. The original repo that you forked is *not* automatically updated with your change! To remedy this, select “Pull Requests” on your repo site and choose “New Pull Request”. The resulting page will show the diff of the file in your repo against the same file in the original repo, so it should highlight the change you made. Select “Click to create a Pull Request…”, add a title, and click “Send Pull Request”.

Now, the owner of the original repo (in this case, mvdm) will need to approve the request before the change is merged into the original repo. You will get a confirmation message by e-mail when this is done.

This process works the same way for changes in the other direction. If the original repo is updated, your fork does not automatically update. If it did, it could have some nasty consequences: someone could break your code without you knowing about it! You are in control of your own fork and need to check for, and approve, any changes you want. A good way of doing this is to Watch the original repo using the button on the GitHub site.

Note: if you are confused or curious about GitHub, or distributed version control in general, a great way to get answers beyond reading the documentation and doing the tutorials is to go for drinks with the folks in the Eliasmith lab!

Using GitHub to acquire the lab codebase

Using your experience from the previous section, fork and create a local clone of the vandermeerlab repo. Verify that it exists in your local filesystem before proceeding.

Configuring MATLAB to use the lab codebase

See Setting up MATLAB. Things should be set up so that you have a shortcut that does the following:

  • Restore the default path
  • Add all folders from the vandermeerlab repo
  • cd to your BIOL680 folder

Check that the shortcut works before proceeding.
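A minimal sketch of what such a shortcut could contain, assuming both repos were cloned into D:\My_Documents\GitHub\ (the example location from the GitHub setup above). The folder locations are assumptions; adjust them to match your own machine.

```matlab
% Sketch of a path shortcut; both repo locations below are assumptions.
github_root = 'D:\My_Documents\GitHub';                % assumed GitHub storage directory
lab_code    = fullfile(github_root, 'vandermeerlab');  % assumed clone of the lab codebase
course_dir  = fullfile(github_root, 'BIOL680');        % assumed clone of the course repo

restoredefaultpath;                  % start from MATLAB's factory path

if exist(lab_code, 'dir') == 7
    addpath(genpath(lab_code));      % add the repo folder and its subfolders
else
    warning('Lab codebase not found at %s', lab_code);
end

if exist(course_dir, 'dir') == 7
    cd(course_dir);                  % work from the course folder
else
    warning('Course folder not found at %s', course_dir);
end
```

The existence checks mean a wrong folder name produces a warning instead of silently leaving you with an incomplete path.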

Grab a data session from the lab database

Use an FTP client such as FileZilla or WinSCP to connect to the lab FTP server, mvdmlab-nas1 (129.97.62.84). Configure your FTP client to require “explicit FTP over TLS” and use BIOL680 as username and password. In the BIOL680 folder, download the folder R016-2012-10-08. A good place to put this folder is in D:\data\promoted\R016\. (In general you want to keep your data separate from your code; for instance, multiple analysis projects may use the same data, so you don't want to duplicate it.)

You will have to be on campus to connect. If you still cannot log in to the server, send me your IP address and I will temporarily enable access for you. If it still does not work, get the .zip here.

Verify things are working

As explained in the Noble paper, create a folder with today's date; I do this within a folder called daily to keep things manageable. Create a sandbox.m file in it and use Cell Mode to check that you can load a data file from the data folder you grabbed (because the loader function is in your path):

%% load data
% first, cd to where the data you just grabbed is located
[csc,csc_info] = LoadCSC('R016-2012-10-08-CSC02d.ncs');
tvec = Range(csc);
raw_LFP = Data(csc);
 
%% plot
nSamples = 10000;
plot(tvec(1:nSamples),raw_LFP(1:nSamples));

You should replace the comment above with a cd command to change directory to where your data is located. Do not place the data in your code folder!
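As a sketch of the two housekeeping steps above, the following creates a dated folder inside daily and then changes to the data folder. It assumes that daily lives inside your current folder and that the data was placed in D:\data\promoted\R016\R016-2012-10-08; both locations are assumptions to adjust.

```matlab
% Create today's working folder inside daily (location is an assumption).
daily_root = fullfile(pwd, 'daily');
today_dir  = fullfile(daily_root, datestr(now, 'yyyy-mm-dd'));
if exist(today_dir, 'dir') ~= 7
    mkdir(today_dir);   % also creates daily itself if it does not exist yet
end

% Change to the data folder before loading (path is an assumption).
data_dir = 'D:\data\promoted\R016\R016-2012-10-08';
if exist(data_dir, 'dir') == 7
    cd(data_dir);
else
    warning('Data folder not found at %s', data_dir);
end
```

Naming folders as yyyy-mm-dd keeps them in chronological order when sorted by name.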

If you get no errors and see a nice neural signal, save your sandbox.m script. Commit and sync to your GitHub fork. If you do get errors, verify that your path is set up correctly: type path to get a listing, which should include the various vandermeerlab folders. If it does not, go back to the Setting up MATLAB steps.
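A quicker check than reading the full path listing is to ask MATLAB where it finds the loader function. This sketch uses only the built-in which; LoadCSC is the loader named in the example above.

```matlab
% Ask MATLAB which file it would run for LoadCSC (empty if not on the path).
loader = which('LoadCSC');
if isempty(loader)
    disp('LoadCSC is not on the path; rerun your path shortcut.');
else
    fprintf('LoadCSC found at %s\n', loader);
end
```

If the reported location is not inside your vandermeerlab clone, another copy of the function is shadowing the one you expect.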

Read up on the data preprocessing pipeline

Be backup-aware

If you are using a lab computer, only put data and code on the D:\ drive. This actually has two underlying hard drives (a RAID 1 array in “mirroring” mode) such that if one fails, your data is still available. However, this does not protect against accidentally deleting data, overwriting a key file, any sort of data corruption or damage, et cetera. Some options to minimize the impact of those:

  • Save your code (and other work that does not take up huge amounts of space) on Dropbox, Google Drive, or a similar service that keeps a (limited) revision history
  • Schedule periodic back-ups to another location
  • Periodically commit your work to GitHub
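As one possible way to do the "back up to another location" option from within MATLAB, the sketch below copies a project folder to a timestamped folder on another drive. Both paths are hypothetical (E:\backups in particular is made up for illustration), and this makes a single copy only; running it periodically is up to you or your operating system's task scheduler.

```matlab
% Sketch of a one-off timestamped backup; both locations are hypothetical.
project_dir = 'D:\My_Documents\GitHub\BIOL680';   % assumed folder to back up
backup_root = 'E:\backups';                       % hypothetical backup drive/folder

stamp = datestr(now, 'yyyy-mm-dd_HHMMSS');        % e.g. date plus time of day
dest  = fullfile(backup_root, ['BIOL680_' stamp]);

if exist(project_dir, 'dir') == 7 && exist(backup_root, 'dir') == 7
    [ok, msg] = copyfile(project_dir, dest);      % copies the folder contents into dest
    if ok
        fprintf('Backup written to %s\n', dest);
    else
        warning('Backup failed with message %s', msg);
    end
else
    disp('Project or backup folder not found; nothing copied.');
end
```

Writing each backup to a new timestamped folder means an accidental deletion or overwrite in the project does not propagate to earlier copies.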

That's it!

You're set! Just make sure that your sandbox.m file appears in your repo (not the original that you forked), online on GitHub. I will look at this file to verify that you got to this point OK.

analysis/course/week1.1379522346.txt.gz · Last modified: 2018/07/07 10:19 (external edit)