#### Data analysis in Neuroscience, Week 1: Good habits for data analysis (paths, backups, versioning, annotation)

Goals:

• Set up a working MATLAB installation with appropriate path shortcuts
• Use GitHub to acquire the lab codebase
• Create a well-designed folder structure for your projects (including this course)
• Choose and implement a backup strategy for your project files
• Understand the flow of raw data, from acquisition to analysis

Resources:

Step-by-step:

#### Installing MATLAB

If you are using a lab computer, it will have MATLAB installed. Verify that it can start successfully (you'll get the >> prompt in the Command Window).

If you are using your own computer, you can download MATLAB from the mathworks.com website. For it to work, you need the username to be 'mvdmlab' and the network.lic license file available from the lab Dropbox. If you want to use MATLAB off-campus, you may also need to connect to the campus network using a VPN (see the uWaterloo VPN page for how to do this). It may also be possible to acquire your own working copy of MATLAB that does not have these restrictions.

This course assumes a basic working knowledge of MATLAB, corresponding roughly to the material in the "Getting Started with MATLAB" Primer. If you are unsure, take a look at the table of contents. If there are things you don't recognize, use the Primer itself, or Chapter 2 of the MATLAB for Neuroscientists book to get up to speed.

#### Setting up GitHub

GitHub is a hosting service built on Git, a system for “distributed version control”: it keeps track of changes to a set of collaboratively edited files, such as pieces of MATLAB code. This makes it easy to share improvements between collaborators. If you are new to GitHub, watch the video under Resources above.

Download and install the GUI client of your choice; for Windows, I recommend GitHub Windows to get started. For installing Git and setting up GitHub on various operating systems, see GitHub: Set Up Git.

Configure your client. For GitHub Windows, you'll first need to sign in with your account, then click Tools > Options. Set the “Default Storage Directory” to something reasonable: on lab computers, this should be something on the D: drive (for example, D:\My_Documents\GitHub\). Also check that your username and e-mail address look ok (I am mvdm).

Next, on the GitHub website, search for the repository called BIOL680 and Fork it. Notice that this creates your own personal copy of the original repository.

Go back to your local client and hit Refresh. You should now see your forked repo appear, along with the option to Clone it. Do this. Verify that your local filesystem now contains a BIOL680 folder in the location you specified in your GitHub client configuration: this is the local version of your forked repo, which you can now make changes to.

Open the readme.md file in the MATLAB editor (if you don't have one, type edit at the MATLAB command prompt) and add a line to it with your name and date. Save your updated file (notice the * in the MATLAB editor window that indicates unsaved changes).

In your local GitHub client, hit Refresh (making sure you are looking at the BIOL680 repo of course). It should indicate that there is now a file to be committed, and highlight the line you added. Write a short commit message, and hit Commit. Notice you now have “unsynced commits”. To upload (push) your change to GitHub, hit sync. On the GitHub site for this repo, verify that your change appears there.

Important: the change you just made is limited to *your fork* only. The original repo that you forked is *not* automatically updated with your change! To remedy this, select “Pull Requests” on your repo site and choose “New Pull Request”. The resulting page will show the diff of the file in your repo against the same file in the original repo, so it should highlight the change you made. Select “Click to create a Pull Request…”, add a title, and click “Send Pull Request”.

Now, the owner of the original repo (in this case, mvdm) will need to approve the request before the change is merged into the original repo. You will get a confirmation message by e-mail when this is done.

This process works the same way for changes in the other direction. If the original repo is updated, your fork does not automatically update. If it did, it could have some nasty consequences: someone could break your code without you knowing about it! You are in control of your own fork and need to check for, and approve, any changes you want. A good way of doing this is to Watch the original repo using the button on the GitHub site.

Note: if you are confused or curious about GitHub, or distributed version control in general, a great way to get answers beyond reading the documentation and doing the tutorials is to go for drinks with the folks in the Eliasmith lab!

#### Using GitHub to acquire the lab codebase

Using your experience from the previous section, fork and create a local clone of the vandermeerlab repo. Verify that it exists in your local filesystem before proceeding.

#### Configuring MATLAB to use the lab codebase

See Setting up MATLAB. Things should be set up so that you have a shortcut that does the following:

• Restore the default path
• Add all folders from the vandermeerlab repo
• cd to your BIOL680 folder

Check that the shortcut works before proceeding.
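
As a rough illustration, the three steps above might look like the following when placed in a shortcut. This is only a sketch: it assumes both repos were cloned under D:\My_Documents\GitHub\ (the example location from the GitHub setup section), and the variable names are made up for this example. The Setting up MATLAB page remains the reference.

```matlab
% Sketch of a path-setup shortcut. Both folder locations are ASSUMPTIONS;
% edit them to match where your GitHub client cloned the repos.
github_root = 'D:\My_Documents\GitHub';            % assumed clone location
lab_code    = fullfile(github_root, 'vandermeerlab');
course_dir  = fullfile(github_root, 'BIOL680');

restoredefaultpath;                                 % start from a clean path

if exist(lab_code, 'dir')
    addpath(genpath(lab_code));                     % add the repo and all its subfolders
else
    warning('vandermeerlab repo not found at %s', lab_code);
end

if exist(course_dir, 'dir')
    cd(course_dir);                                 % work from the course folder
else
    warning('BIOL680 folder not found at %s', course_dir);
end
```

The existence checks are there so that a wrong folder location produces a readable warning instead of a silently incomplete path.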

#### Grab a data session from the lab database

Use an FTP client such as FileZilla or WinSCP to connect to the lab FTP server, mvdmlab-nas1 (129.97.62.84). Configure your FTP client to require “explicit FTP over TLS” and use BIOL680 as username and password. In the BIOL680 folder, download the folder R016-2012-10-08. A good place to put this folder is in D:\data\promoted\R016\. (In general you want to keep your data separate from your code; for instance, multiple analysis projects may use the same data, so you don't want to duplicate it.)

[Screenshot: correct FileZilla configuration]

You will have to be on campus to connect. If you still cannot log in to the server, send me your IP address and I will temporarily enable access for you. If it still does not work, get the .zip here.

#### Verify things are working

As explained in the Noble paper, create a folder with today's date; I do this within a folder called daily to keep things manageable. Create a sandbox.m file in it and use Cell Mode to check that you can load a data file from the data folder you grabbed (because the loader function is in your path):

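%% load data
% first, cd to where the data you just grabbed is located
% (assumption: LoadCSC is the lab's loader, and the folder contains .ncs files)
ncs_files = dir('*.ncs');            % list the Neuralynx .ncs files in this folder
csc = LoadCSC(ncs_files(1).name);    % load the first one found
tvec = Range(csc);                   % time axis of the loaded signal
raw_LFP = Data(csc);                 % signal values

%% plot
nSamples = 10000;                    % only plot the first part of the signal
plot(tvec(1:nSamples),raw_LFP(1:nSamples));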

You should replace the comment above with a cd command to change directory to where your data is located. Do not place the data in your code folder!
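
The dated-folder step described at the start of this section could be scripted along these lines. This is a minimal sketch, not a lab convention: the daily folder name comes from the text, while the yyyy-mm-dd format and the variable names are assumptions (that format makes folders sort chronologically).

```matlab
% Create (if needed) and enter a folder named after today's date, inside a
% 'daily' folder under the current directory. The date format is an ASSUMPTION.
daily_root = fullfile(pwd, 'daily');
today_dir  = fullfile(daily_root, datestr(now, 'yyyy-mm-dd'));

if ~exist(today_dir, 'dir')
    mkdir(today_dir);     % also creates 'daily' if it does not exist yet
end
cd(today_dir);            % subsequent files are created in today's folder
```

After running this from your BIOL680 folder, typing edit sandbox.m creates the script in today's folder.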

If you get no errors and see a nice neural signal, save your sandbox.m script, then commit and sync it to your GitHub fork. If you do get errors, verify that your path is set up correctly: type path to get a listing, which should include the various vandermeerlab folders. If it does not, go back to the Setting up MATLAB steps.
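
Instead of scanning the full path listing by eye, you can ask MATLAB directly whether it can find a given function. The sketch below assumes the loader is called LoadCSC (the name used in the discussion at the bottom of this page); substitute whichever function your script actually calls.

```matlab
% Check whether a function is visible on the MATLAB path.
% 'LoadCSC' is an ASSUMED loader name; change it as needed.
fn   = 'LoadCSC';
code = exist(fn, 'file');    % 0 = not found, 2 = file on the path, 3 = MEX-file

if code == 0
    fprintf('%s is not on the path; rerun your path shortcut.\n', fn);
else
    fprintf('%s found at: %s\n', fn, which(fn));
end
```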

#### For Mac/OS X users

If you are running Matlab on OS X (and possibly Linux), the above sandbox.m code will probably fail. This is most likely because the vandermeerlab codebase downloaded from GitHub calls on low-level Windows functions. The following steps have worked for someone using OS X 10.8, with Matlab R2013a:

• Head over to the Neuralynx website, then download and extract the package that contains the Nlx2MatCSC binaries.
• Navigate to the extracted folder/binaries/, find the file Nlx2MatCSC_v3.mexmaci, and rename it to Nlx2MatCSC.mexmaci (removing _v3)
• Again, make sure this folder is included in your path, and try running the sandbox.m again.
• If you add neuralynx above vandermeerlab in your path, Matlab should use the new neuralynx binaries. If not, you may need to delete vandermeerlab/util/neuralynx/ for this to work.

The sandbox.m should run properly now, and you should see the plot you're supposed to see.
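
If several binaries need the _v3 suffix removed, the renaming can also be done from within Matlab. The sketch below demonstrates the idea on an empty stand-in file in a temporary folder, so it is safe to run as-is. To apply it to the real files, point the hypothetical bin_dir variable at the extracted binaries folder and drop the two demo lines.

```matlab
% Strip '_v3' from every file name that contains it.
% DEMO SETUP: a scratch folder with one empty stand-in file.
bin_dir = tempname;                                               % scratch folder
mkdir(bin_dir);
fclose(fopen(fullfile(bin_dir, 'Nlx2MatCSC_v3.mexmaci'), 'w'));   % demo file only

files = dir(fullfile(bin_dir, '*_v3.*'));     % every file whose name ends in _v3
for i = 1:numel(files)
    old_name = files(i).name;
    new_name = strrep(old_name, '_v3', '');   % e.g. Nlx2MatCSC_v3 -> Nlx2MatCSC
    movefile(fullfile(bin_dir, old_name), fullfile(bin_dir, new_name));
    fprintf('%s -> %s\n', old_name, new_name);
end
```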

#### For Linux users

Follow the instructions above for Mac/OS X users, except you may need to recompile the binaries. You will need C and C++ compilers installed; on Ubuntu, install the build-essential package.

• You may want to just delete the existing binaries.
• Edit compile.sh to set PLATFORM=64PC or PLATFORM=32PC depending on your architecture, and edit INCLMATLAB and BINMATLAB so that they point to the correct directories for your Matlab installation. If you don't remember, run locate mexsh in the shell and you should see the path.
• You can rename all the files in the binary directory with the shell command:
> rename 's/_v3//' *

This worked on 64 bit Ubuntu with Matlab R2013b.

#### Be backup-aware

If you are using a lab computer, only put data and code on the D:\ drive. This actually has two underlying hard drives (a RAID 1 array in “mirroring” mode) such that if one fails, your data is still available. However, this does not protect against accidentally deleting data, overwriting a key file, any sort of data corruption or damage, et cetera. Some options to minimize the impact of those:

• Save your code (and other work that does not take up huge amounts of space) on Dropbox, Google Drive, or a similar service that keeps a (limited) revision history
• Schedule periodic back-ups to another location
• Periodically commit your work to GitHub
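
As one possible way to implement the second option, the sketch below copies a project folder into a timestamped backup folder. The folder locations are placeholders under the system temp directory so the example can run anywhere; in practice src_dir would be your project folder and backup_root a location on a different drive. It is not a substitute for a proper scheduled backup tool.

```matlab
% Copy a project folder into a timestamped backup folder.
% Both locations are HYPOTHETICAL demo paths; replace them with your own.
src_dir     = fullfile(tempdir, 'demo_project');
backup_root = fullfile(tempdir, 'demo_backups');

% Demo setup: make sure the project folder holds at least one file.
if ~exist(src_dir, 'dir'), mkdir(src_dir); end
fid = fopen(fullfile(src_dir, 'sandbox.m'), 'w');
fprintf(fid, '%% demo file\n');
fclose(fid);

% One backup folder per run, named by date and time.
stamp    = datestr(now, 'yyyy-mm-dd_HHMMSS');
dest_dir = fullfile(backup_root, stamp);
mkdir(dest_dir);

[ok, msg] = copyfile(src_dir, dest_dir);      % copy the folder contents
if ~ok
    error('Backup failed: %s', msg);
end
fprintf('Backed up %s to %s\n', src_dir, dest_dir);
```

Keeping each run in its own timestamped folder means an accidental overwrite in the project does not also overwrite the previous backup.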

#### That's it!

You're set! Just make sure that your sandbox.m file appears in your repo (not the original that you forked), online on GitHub. I will look at this file to verify that you got to this point OK.

## Discussion

, 2013/09/18 12:59

I enjoyed Noble's paper, but I think the use of dated directories may be missing the point of version control systems. Completed analyses at various stages can be saved with Git tags, and different revisions made for producing different output documents (e.g., poster presentations, submissions for various journals) can be better managed as Git branches. That way files from one analysis or revision can simply be modified for another, maintaining history, avoiding redundancy, and allowing for fixes to be made to all versions simultaneously. Branches may be advanced for someone new to VCS, but tagging is quite simple and very useful.

, 2013/09/18 14:55, 2013/09/25 15:15

Great point – when you get comfortable using your favorite VCS, there is no need for the dated directory system any more. I have found the dated directory system useful to get started with a certain amount of discipline and mindfulness about how to manage files, without introducing the complexity of a true VCS. The two systems can actually coexist quite well, e.g. you can have a 'shared' or 'promoted' folder under VCS where you move files that you keep using and developing, and the dated directories are really just a scratchpad. (In fact starting out with the date system often enables users to “discover” that what they really want is a VCS!)

, 2013/09/20 13:44

I added a section on fixing the LoadCSC function in Matlab running on OS X and making sandbox.m work.

, 2013/09/20 13:46

Great, thanks! (Also @jlocklin for the Linux section)
