
~~DISCUSSION~~

=== Data analysis in Neuroscience, Week 1: Good habits for data analysis (paths, backups, versioning, annotation) ===

Goals:

  * Set up a working MATLAB installation with appropriate path shortcuts
  * Use %%GitHub%% to acquire the lab codebase
  * Create a well-designed folder structure for your projects (including this course)
  * Choose and implement a backup strategy for your project files
  * Understand the flow of raw data, from acquisition to analysis
  * Connect to the database, download a data set, and test your path setup

Resources:

  * Must-read: [[http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000424|Noble, A Quick Guide to Organizing Computational Biology Projects]] ([[http://www.ploscompbiol.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pcbi.1000424&representation=PDF|direct link to pdf]]) (yes, yours is one)
  * Optional: [[http://www.youtube.com/GitHubGuides|Introduction to version control with GitHub]] ([[http://git-scm.com/book/en/Getting-Started-Git-Basics|more detailed doc pages]], [[http://stackoverflow.com/questions/tagged/git|git tagged questions on StackOverflow]], [[https://github.s3.amazonaws.com/media/progit.en.pdf|Pro Git manual]], surprisingly readable)
  * Optional: MATLAB documentation on [[http://www.mathworks.com/help/matlab/matlab_env/understanding-file-locations-in-matlab.html|File Locations]] and [[http://www.mathworks.com/help/matlab/matlab_env/what-is-the-matlab-search-path.html#br8ch8o|Search Paths]]

Step-by-step:

=== Installing MATLAB ===

If you are using a lab computer, it will have MATLAB installed. Verify that it starts successfully (you'll get the ''>>'' prompt in the Command Window).

If you are using your own computer, you can download MATLAB from the mathworks.com website. For it to work, the username must be 'mvdmlab' and you need the ''network.lic'' license file, available from the lab Dropbox.
If you want to use MATLAB off-campus, you may also need to connect to the campus network using a VPN (see the uWaterloo VPN page for instructions). It may also be possible to acquire your own working copy of MATLAB that does not have these restrictions.

This course assumes a basic working knowledge of MATLAB, corresponding roughly to the material in the [[http://www.mathworks.com/academia/student_center/tutorials/launchpad.html|"Getting Started with MATLAB" Primer]]. If you are unsure, take a look at the table of contents. If there are things you don't recognize, use the Primer itself, or Chapter 2 of the MATLAB for Neuroscientists book, to get up to speed.

=== Setting up GitHub ===

%%GitHub%% hosts repositories managed with Git, a system for "distributed version control": it keeps track of changes to a set of collaboratively edited files, such as pieces of MATLAB code, so that collaborators can easily share improvements. If you are new to %%GitHub%%, watch the video under Resources above.

If you don't already have a %%GitHub%% account, go to [[http://www.github.com|GitHub]] and sign up.

Download and install the GUI client of your choice; for Windows, I recommend [[http://windows.github.com/|GitHub Windows]] to get started. For installing Git and setting up %%GitHub%% on various operating systems, see [[https://help.github.com/articles/set-up-git|GitHub: Set Up Git]].

Configure your client. For %%GitHub%% Windows, you'll first need to sign in with your account, then click Tools > Options. Set the "Default Storage Directory" to something reasonable: on lab computers, this should be a location on the D: drive (for example, %%D:\My_Documents\GitHub\%%). Also check that your username and e-mail address look OK (I am ''mvdm'').

Next, on the %%GitHub%% website, search for the repository called ''BIOL680'' and ''Fork'' it. Notice that this creates your own personal copy of the original repository.

Go back to your local client and hit Refresh.
You should now see your forked repo appear, along with the option to ''Clone'' it. Do this.

Verify that your local filesystem now contains a ''BIOL680'' folder in the location you specified in your %%GitHub%% client configuration: this is the local version of your forked repo, which you can now make changes to.

Open the ''readme.md'' file in the MATLAB editor (if the editor is not open, type ''edit'' at the MATLAB command prompt) and add a line to it with your name and date. Save your updated file (notice the ''*'' in the MATLAB editor window that indicates unsaved changes).

In your local %%GitHub%% client, hit Refresh (making sure you are looking at the ''BIOL680'' repo, of course). It should indicate that there is now a file to be committed, and highlight the line you added. Write a short commit message, and hit Commit. Notice you now have "unsynced commits".

To upload (push) your change to %%GitHub%%, hit ''sync''. On the %%GitHub%% site for this repo, verify that your change appears there.

Important: the change you just made is limited to **your fork** only. The original repo that you forked is **not** automatically updated with your change! To remedy this, select "Pull Requests" on your repo site and choose "New Pull Request". The resulting page will show the diff of the file in your repo against the same file in the original repo, so it should highlight the change you made. Select "Click to create a Pull Request...", add a title, and click "Send Pull Request".

Now, the owner of the original repo (in this case, ''mvdm'') will need to approve the request before the change is merged into the original repo. You will get a confirmation message by e-mail when this is done.

This process works the same way for changes in the other direction. If the original repo is updated, your fork does not automatically update. If it did, it could have some nasty consequences: someone could break your code without you knowing about it!
You are in control of your own fork and need to check for, and approve, any changes you want. A good way of doing this is to Watch the original repo using the button on the %%GitHub%% site.

Note: if you are confused or curious about %%GitHub%%, or distributed version control in general, a great way to get answers beyond reading the documentation and doing the tutorials is to go for drinks with the folks in the Eliasmith lab!

=== Using GitHub to acquire the lab codebase ===

Using your experience from the previous section, fork and create a local clone of the ''vandermeerlab'' repo. Verify that it exists in your local filesystem before proceeding.

=== Configuring MATLAB to use the lab codebase ===

See [[computing:matlabsetup|Setting up MATLAB]]. Things should be set up so that you have a shortcut that does the following:

  * Restore the default path
  * Add all folders from the vandermeerlab repo
  * ''cd'' to your BIOL680 folder

Check that the shortcut works before proceeding.

=== Grab a data session from the lab database ===

Use an FTP client such as [[https://filezilla-project.org/|FileZilla]] or ''WinSCP'' to connect to the lab FTP server, ''mvdmlab-nas1'' (129.97.62.84). Configure your FTP client to require "explicit FTP over TLS" and use ''BIOL680'' as username and password. In the ''BIOL680'' folder, download the folder ''R016-2012-10-08''. A good place to put this folder is in ''D:\data\promoted\R016\''. (In general you want to keep your data separate from your code; for instance, multiple analysis projects may use the same data, so you don't want to duplicate it.)

The correct FileZilla configuration is the following:

{{ :analysis:course:ftp_config.png?600 |}}

You will have to be on campus to connect. If you still cannot log in to the server, send me your IP address and I will temporarily enable access for you. If it still does not work, get the .zip {{:analysis:course:r016-2012-10-08.zip|here}}.
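
For reference, the path shortcut described under "Configuring MATLAB to use the lab codebase" might contain something like the sketch below. This is only an illustration of the three steps listed there, not the lab's official setup (see [[computing:matlabsetup|Setting up MATLAB]] for that). The ''github_root'' variable is a hypothetical name, and its value assumes you used the example storage directory suggested earlier; change it to wherever your clones actually live.

<code matlab>
% Sketch of a path shortcut. github_root is an assumed location: edit it to match your setup.
github_root = 'D:\My_Documents\GitHub';

restoredefaultpath;                                       % start from MATLAB's default path
addpath(genpath(fullfile(github_root,'vandermeerlab')));  % add the lab codebase and all its subfolders
cd(fullfile(github_root,'BIOL680'));                      % work from your course folder
</code>

After running the shortcut, typing ''which LoadCSC'' should report a location inside your ''vandermeerlab'' folder; if it reports that the function is not found, the path was not set up as intended.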
=== Verify things are working ===

As explained in the Noble paper, create a folder with today's date; I do this within a folder called ''daily'' to keep things manageable. Create a ''sandbox.m'' file in it and use [[http://blogs.mathworks.com/videos/2011/07/26/starting-in-matlab-cell-mode-scripts/|Cell Mode]] to check that you can load a data file from the data folder you grabbed (this works because the loader function is in your path):

<code matlab>
%% load data
% first, cd to where the data you just grabbed is located
[csc,csc_info] = LoadCSC('R016-2012-10-08-CSC02d.ncs');

tvec = Range(csc);
raw_LFP = Data(csc);

%% plot
nSamples = 10000;
plot(tvec(1:nSamples),raw_LFP(1:nSamples));
</code>

You should replace the comment above with a ''cd'' command to change directory to where your data is located. Do not place the data in your code folder!

If you get no errors and see a nice neural signal, save your ''sandbox.m'' script. Commit and sync to your %%GitHub%% fork.

If you do get errors, verify that your path is set up correctly (you can type ''path'' to get a listing; it should have the various vandermeerlab folders in it. If not, go back to the Setting up MATLAB steps.)

=== For Mac/OS X users ===

If you are running MATLAB on OS X (and possibly Linux), the above ''sandbox.m'' code will probably fail. This is most likely because the ''vandermeerlab'' codebase downloaded from %%GitHub%% calls low-level Windows functions. The following steps have worked for someone using OS X 10.8 with MATLAB R2013a:

  * Head over to the [[http://neuralynx.com/research_software/file_converters_and_utilities|Neuralynx website]].
  * Download the **//Neuralynx to Matlab Import for Linux and Mac OS X//** package ([[http://neuralynx.com/software/Nlx2Mat_relDec11.tar|direct link]]).
  * Extract the archive you have downloaded into a folder, and add that folder to your path shortcut (see [[computing:matlabsetup|Setting up MATLAB]]).
  * Navigate to the ''extracted folder/binaries/'', find the file ''Nlx2MatCSC_v3.mexmaci'', and rename it to ''Nlx2MatCSC.mexmaci'' (removing ''_v3'').
  * Again, make sure this folder is included in your path, and try running ''sandbox.m'' again.
  * If you add neuralynx above vandermeerlab in your path, MATLAB should use the new neuralynx binaries. If not, you may need to delete ''vandermeerlab/util/neuralynx/'' for this to work.

The ''sandbox.m'' script should now run properly and produce the expected plot.

=== For Linux users ===

Follow the instructions above for Mac/OS X users, except that you may need to recompile the binaries. This requires C and C++ compilers; on Ubuntu, install the ''build-essential'' package.

  * You may want to just delete the existing binaries.
  * Edit ''compile.sh'' to set ''PLATFORM=64PC'' or ''PLATFORM=32PC'' depending on your architecture, and edit INCLMATLAB and BINMATLAB so that they point to the correct directories for your MATLAB installation. If you don't remember where MATLAB is installed, run ''locate mexsh'' in the shell and you should see the path.
  * You can rename all the files in the binary directory with the shell command: <code>rename 's/_v3//' *</code>

This worked on 64-bit Ubuntu with MATLAB R2013b.

=== Read up on the data preprocessing pipeline ===

[[computing:datapromotion|Data promotion]]

=== Be backup-aware ===

If you are using a lab computer, only put data and code on the ''D:\'' drive. This drive is backed by two underlying hard drives (a [[http://en.wikipedia.org/wiki/RAID#RAID_1|RAID 1 array]] in "mirroring" mode), so that if one fails, your data is still available. However, this does not protect against accidentally deleting data, overwriting a key file, any sort of data corruption or damage, //et cetera//.
Some options to minimize the impact of those:

  * Save your code (and other work that does not take up huge amounts of space) on Dropbox, Google Drive, or a similar service that keeps a (limited) revision history
  * Schedule periodic backups to another location
  * Periodically commit your work to %%GitHub%%

=== That's it! ===

You're set! Just make sure that your ''sandbox.m'' file appears online on %%GitHub%% in your own repo (not the original that you forked). I will look at this file to verify that you got to this point OK.
