wiki

~~DISCUSSION~~

=== Module 1: Good habits for data analysis (paths, backups, versioning, annotation) ===

Goals:

  * Set up a working MATLAB installation with appropriate path shortcuts
  * Use %%GitHub%% to acquire the analysis code we will use
  * Perform some elementary %%GitHub%% operations (pull, add, commit, push) and create a branch for your project
  * Create a well-designed folder structure for your projects (including this course)
  * Choose and implement a backup strategy for your project files
  * Understand the (pre)processing pipeline from raw data to promoted data set
  * Connect to the database, download a data set, and test your path setup

Resources:

  * Must-read: [[http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000424|Noble, A Quick Guide to Organizing Computational Biology Projects]] ([[http://www.ploscompbiol.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pcbi.1000424&representation=PDF|direct link to pdf]]) (yes, yours is one)
  * Optional: [[http://www.youtube.com/GitHubGuides|Introduction to version control with GitHub]] ([[http://git-scm.com/book/en/Getting-Started-Git-Basics|more detailed doc pages]], [[http://stackoverflow.com/questions/tagged/git|git tagged questions on StackOverflow]], [[https://github.s3.amazonaws.com/media/progit.en.pdf|Pro Git manual]], surprisingly readable)
  * Optional: MATLAB documentation on [[http://www.mathworks.com/help/matlab/matlab_env/understanding-file-locations-in-matlab.html|File Locations]] and [[http://www.mathworks.com/help/matlab/matlab_env/what-is-the-matlab-search-path.html#br8ch8o|Search Paths]]

Step-by-step:

=== Installing MATLAB ===

If you are using a lab computer, it will have MATLAB installed. Verify that it can start successfully (you'll get the ''>>'' prompt in the Command Window).
These modules assume a basic working knowledge of MATLAB, corresponding roughly to the material in the [[http://www.mathworks.com/academia/student_center/tutorials/launchpad.html|"Interactive MATLAB tutorial"]]. If you are unsure, take a look at the table of contents. If there are things you don't recognize, work through the tutorial. If you prefer a different format, you can use Chapter 2 of the MATLAB for Neuroscientists book to get up to speed.

If you are uncertain of your MATLAB abilities, two great ways to keep learning are:

  * [[http://www.mathworks.com/matlabcentral/about/cody/ | Cody]], a continually expanding set of problems with solutions to work through, with a satisfying points system to track your progress
  * [[http://stackoverflow.com/questions/tagged/matlab | MATLAB questions on StackOverflow]], a Q&A site where you can browse previous questions and add new ones

=== Setting up GitHub ===

%%GitHub%% is a hosting service for Git, a system for "distributed version control" that keeps track of changes to a set of collaboratively edited files, such as pieces of MATLAB code. This system makes it easy to share improvements between collaborators. If you are new to %%GitHub%%, you can watch the video under Resources above to get an overall idea of how it works and why it is useful.

If you don't already have a %%GitHub%% account, go to [[http://www.github.com|GitHub]] and sign up. E-mail me (%%MvdM%%) your account name, so I can give you access to the code repository.

Meanwhile, download and install the Git client of your choice if you don't already have one installed. For Windows, I recommend [[http://windows.github.com/|GitHub Windows]] as a user-friendly way to get started. For installing Git and setting up GitHub on various operating systems, see [[https://help.github.com/articles/set-up-git|GitHub: Set Up Git]]; note that on the NS&B computers, you need to select "Run as administrator" when running the installer.

Next, configure your client.
For %%GitHub%% Windows, after starting up the %%GUI%% you'll first need to sign in with your account, then click Tools > Options. Set the "Default Storage Directory" to something reasonable: on vandermeerlab computers, this should be something on the D: drive (for example, %%D:\My_Documents\GitHub\%%). Also check that your username and e-mail address look ok (I am ''mvdm'').

=== Cloning the NS&B codebase ===

Once you have access to the code repository, we can create a local copy of the NS&B codebase. To do this, first open a shell. In Windows, you can do this from within the %%GitHub%% client by selecting "Tools and Options" in the top right and then "Open in Git shell", or by clicking the "Git shell" icon. Make sure that you are in the ''GitHub'' directory. Now, clone the ''nsb2014'' repository and ''cd'' to it:

<code>
git clone https://github.com/mvdm/nsb2014.git
cd nsb2014
</code>

Verify that this has created an ''nsb2014'' folder with various subfolders and files in it, indicating that you have a local copy of the codebase. Because Git is tracking the contents of this folder, it is now easy to "pull" the latest version from %%GitHub%%:

<code>
git pull
</code>

This "pull" should do nothing, because you already have the latest version. But the basic idea is that you can stay up-to-date easily as well as contribute to the codebase so that everyone else can benefit. As you might expect, that part is known as a "push", which we will do in the next step.

=== A first commit and push ===

Open the ''README.md'' file. The ''.md'' extension is for %%Markdown%%, a lightweight set of commands to format text (syntax reference is [[https://help.github.com/articles/markdown-basics | here]]). Add your name to the list and save the file. Then go to your Git shell and type ''git status''. Git has noticed the change, but it says that this change is not yet "staged for commit". In other words, Git has seen the modification but will not include it in a commit until you add it.
Let's fix this:

<code>
git add README.md
git commit -m "Added name to list in README file"
</code>

If you now do a ''git status'' you will see that you are ahead of the origin (the online repository) by 1 commit. This makes sense because you just made a change. Let's push this by doing ''git push''. If you get an "access denied" type error, email me (mvdm@uwaterloo.ca) your %%GitHub%% username and I will give you permission. If everything goes to plan you should now be able to see the updated README file [[https://github.com/mvdm/nsb2014 | on GitHub]].

A schematic of these basic operations (pull, commit, push) is shown below.

<graphviz dot center>
digraph G {
  remote -> local [label=" pull"];
  local -> staging [label=" commit"];
  staging -> remote [label=" push"];
}
</graphviz>

What happens if in between your pull and push someone else pushes a change? In that case you cannot push your changes unless you do a pull first and [[http://stackoverflow.com/questions/161813/fix-merge-conflicts-in-git | resolve any conflicts]].

=== Create a branch for your project ===

To prevent a large number of conflicts from occurring as we all work on our projects, you should create a **branch**. Changes to a branch don't affect the original until a **merge** is done. Create a new branch as follows:

<code>
git checkout -b myprojectname
</code>

Now, create a subdirectory and empty README:

<code>
mkdir myprojectname
cd myprojectname
touch README.md
git add README.md
</code>

Commit and push the empty project to GitHub:

<code>
git commit -m "Initialized the project"
git push origin myprojectname
</code>

You should now be able to see your branch on [[https://github.com/mvdm/nsb2014 | GitHub]]. Note that you can also view and compare others' branches. Please take care to only work on your own branch for now.
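
If you want to rehearse the branch workflow before touching the shared repository, the following is a minimal sketch that runs entirely in a throwaway local repository. It assumes a bash-like shell with Git installed; the identity ''Example User'', the scratch directory, and the file contents are placeholders, and no ''push'' is performed because the scratch repository has no remote.

```bash
#!/usr/bin/env bash
# Rehearse stage -> commit -> branch in a scratch repository (no remote involved).
set -euo pipefail

scratch="$(mktemp -d)"                     # throwaway directory, safe to delete later
cd "$scratch"
git init --quiet
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

# First commit on the default branch
echo "# scratch repo" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Create a project branch and add an empty README in its own subdirectory
git checkout --quiet -b myprojectname
mkdir myprojectname
touch myprojectname/README.md
git add myprojectname/README.md
git status --short                         # lists the staged file
git commit --quiet -m "Initialized the project"

git branch                                 # the current branch is marked with *
git log --oneline                          # two commits: initial + project init
```

Running ''git status'' and ''git branch'' before every commit is a cheap habit that confirms you are committing to your own branch and not to ''master''.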
=== Using GitHub to acquire the FieldTrip toolbox ===

Using your experience from the previous section, create a local clone of the %%FieldTrip%% toolbox at ''https://github.com/mvdm/fieldtrip''. Make sure that you ''cd'' to your ''GitHub'' folder, i.e. that you are not within some other project such as ''nsb2014'', before cloning. If things worked correctly you should have ''fieldtrip'' and ''nsb2014'' folders within your %%GitHub%% folder; **not** a ''fieldtrip'' folder within your ''nsb2014'' folder!

We will use this toolbox extensively for the analysis of local field potentials. Be aware that it is about 1.2 GB!

Note: there is also an official %%FieldTrip%% %%GitHub%% repository. The one you are cloning is a **fork** of it. I (MvdM) did this so that I could make a few changes to the %%FieldTrip%% code to make it play well with Neuralynx data.

=== Configuring MATLAB to use the code from GitHub ===

Open MATLAB and [[http://www.mathworks.com/help/matlab/matlab_env/create-matlab-shortcuts-to-rerun-commands.html | create a shortcut]] titled "NS&B 2014 - Hippocampus". The code for the shortcut should be

<code matlab>
restoredefaultpath; % start with a clean slate
cd('D:\My_Documents\GitHub\FieldTrip'); % or whatever you chose, obviously
p = genpath(pwd); % create list of all folders from here
addpath(p);

cd('D:\My_Documents\GitHub\nsb2014\code-matlab');
p = genpath(pwd); % create list of all folders from here
addpath(p);
</code>

This ensures that whenever you click this button, you have a "clean" path of only the MATLAB default plus your local versions of the two %%GitHub%% repositories.

Note: if you don't like the ''.git'' folders in your path, you can get clever with [[http://www.mathworks.com/help/matlab/matlab_prog/regular-expressions.html|regular expressions]] to remove these:

<code>
p = regexprep(p,'D.*?\.git.*?;','');
</code>

=== Establish a sensible folder structure ===

So far, you have local %%GitHub%% repository clones added to MATLAB's path.
But as you work on your project, you will write your own analysis code. You will also have data files to work with; some that you download as part of these modules, and some that you will perhaps collect yourself. It is important to consider where all of these files will go, and how you will manage them. I recommend using three separate locations:

  * //%%GitHub%% folders//. Files in here you only change (or add) when you can improve what is already there. This content is backed up and version-controlled (i.e. you can see the complete history of changes and revert to any version you want) through the %%GitHub%% system. These files can be shared by multiple different projects, including working through these modules, analysis related to the data you collect, and perhaps a %%PhD%% project! For me, this folder is in ''D:\My_Documents\GitHub\''.
  * //Project folders//. Each project has a home folder which holds the code for that project. As explained in the Noble paper, it is a good idea to create a new folder for each day you work on the project. If you find you are copying certain functions or snippets of code from day to day, those should be moved to a ''shared'' folder inside the project folder. It is critical that the contents of the project folder are backed up in case of computer failure. I use Dropbox for this, so an example project folder I have is ''D:\My_Documents\Dropbox\projects\vStrGammaProbe\''.
  * //Data folders//. Data, both raw and preprocessed, should live in a different place: ''D:\data\'' in my case. This is because different projects may access the same data, and because backup strategies for data are typically different from those for code.

With this three-way division, when you want to work on a project, you would click the appropriate MATLAB shortcut for it first. Following the example above, this should add the appropriate %%GitHub%% folders to the path. Next, the ''shared'' folder of the project is also added to the path.
Data is generally not added to the path, because some data files in different folders may have the same name. Then, you create a new folder with today's date, and you are ready to go!

There are several situations when it is appropriate to move code from your //project folder// to a //%%GitHub%% folder//:

  * you improve a piece of code that was already on %%GitHub%%
  * you have a new piece of code in the //shared// project folder that is proving useful
  * you reach a milestone, such as an analysis that tests a certain hypothesis

Make sure you push these changes and additions into your own branch. If you think your additions would be useful or interesting to others, you can issue a [[https://help.github.com/articles/using-pull-requests | pull request]] to have your contribution incorporated into the ''master'' branch.

=== Grab a data session from the lab database ===

If you can access the NS&B share (usually ''Z:\''), the data can be found in the ''NSB_2014\4_MouseHippocampus\DataAnalysisTutorial\data'' folder. For this module you will only need the ''R016-2012-10-08'' folder, but feel free to just copy all the folders. A good place to put these is in ''D:\data\promoted\'' (Rxxx indicates different rats, followed by the date of each session). In general you want to keep your data separate from your code; for instance, multiple analysis projects may use the same data, so you don't want to duplicate it.

If you cannot access the NS&B share, use an %%FTP%% client such as [[https://filezilla-project.org/|FileZilla]] to connect to the lab %%FTP%% server, ''mvdmlab-nas1'' (129.97.62.84). Configure your %%FTP%% client to require "explicit %%FTP%% over %%TLS%%" and use ''BIOL680'' as username and password. The correct FileZilla configuration is shown here:

{{ :analysis:course:ftp_config.png?600 |}}

If you cannot log in to the server, send me your IP address and I will enable access for you.
=== Verify things are working ===

As explained in the Noble paper, create a folder with today's date in your project folder. Create a ''sandbox.m'' file in it, and use [[http://blogs.mathworks.com/videos/2011/07/26/starting-in-matlab-cell-mode-scripts/|Cell Mode]] to check that you can load a data file:

<code matlab>
%% load data
cd('D:\Data\R016\R016-2012-10-08'); % replace this with where you saved the data

cfg = [];
cfg.fc = {'R016-2012-10-08-CSC02d.ncs'}; % cell array with filenames to load
csc = LoadCSC(cfg);
</code>

When you execute the above cell (Ctrl+Enter when it is selected in the Editor), you should get:

<code matlab>
LoadCSC: Loading 1 files...
LoadCSC: R016-2012-10-08-CSC02d.ncs 44/10761 bad blocks found (0.41%).

>> csc

csc =

     tvec: [5498360x1 double]
     data: [1x5498360 double]
    label: {'R016-2012-10-08-CSC02d.ncs'}
      cfg: [1x1 struct]

>> csc.cfg

ans =

      history: [1x1 struct]
          hdr: {[1x1 struct]}
      ExpKeys: [1x1 struct]
    SessionID: 'R016-2012-10-08'
</code>

What you have loaded is in fact a local field potential recorded from the rat ventral striatum. The different file types and data fields above will be explained in more detail in the next module. For now, let's just take a peek at the data:

<code matlab>
plot(csc.tvec,csc.data);
xlim([1338.6 1339.2]);
</code>

You should see some interesting oscillations -- we will explore these in detail in upcoming modules. If you see this, you have successfully completed this module!

{{ :analysis:nsb2014:verify.png?600 |}}

=== Folder and file naming ===

FIXME

=== For Mac/OS X users ===

If you are running MATLAB on OS X (and possibly Linux), the above ''sandbox.m'' code will probably fail. The following steps have worked for someone using %%OS%% X 10.8, with MATLAB R2013a:

  * Head over to the [[http://neuralynx.com/research_software/file_converters_and_utilities|Neuralynx website]].
  * Download the **//Neuralynx to MATLAB Import for Linux and Mac %%OS%% X//** package ([[http://neuralynx.com/software/Nlx2Mat_relDec11.tar|direct link]]).
  * Extract the archive you have downloaded into a folder, and add that folder to your path shortcut.
  * Navigate to the ''extracted folder/binaries/'', find the file ''Nlx2MatCSC_v3.mexmaci'', and rename it to ''Nlx2MatCSC.mexmaci'' (removing ''_v3'').
  * Again, make sure this folder is included in your path, and try running the ''sandbox.m'' again.
  * If you add neuralynx above nsb2014 in your path, MATLAB should use the new neuralynx binaries. If not, you may need to delete ''nsb2014/util/neuralynx/'' for this to work.

The ''sandbox.m'' should run properly now, and you should see the plot you're supposed to see.

=== For Linux users ===

Follow the instructions above for Mac/OS X users, except you may need to recompile the binaries (note that you will need C and C++ compilers installed; on Ubuntu, install the ''build-essential'' package):

  * You may want to just delete the existing binaries.
  * Edit ''compile.sh'' to set ''PLATFORM=64PC'' or ''PLATFORM=32PC'' depending on your architecture, and edit INCLMATLAB and BINMATLAB so that they point to the correct directories for your MATLAB installation. If you don't remember, run ''locate mexsh'' in the shell and you should see the path.
  * You can rename all the files in the binary directory with the shell command: <code>rename 's/_v3//' *</code>

This worked on 64-bit Ubuntu with MATLAB R2013b.
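
The ''rename'' command above uses the Perl-expression syntax found on Debian and Ubuntu; on distributions that ship a different ''rename'', the same effect can be had with a plain bash loop. The sketch below is only an illustration under stated assumptions: it runs on dummy files in a scratch directory, and the file names (''Nlx2MatCSC_v3.mexa64'', ''placeholder_v3.mexa64'') are hypothetical stand-ins for whatever your binaries folder actually contains.

```bash
#!/usr/bin/env bash
# Strip the "_v3" suffix from file names using only bash and mv.
set -euo pipefail

# Scratch directory with dummy files (hypothetical names, for demonstration only)
demo_dir="$(mktemp -d)"
touch "$demo_dir/Nlx2MatCSC_v3.mexa64" "$demo_dir/placeholder_v3.mexa64"

cd "$demo_dir"
for f in *_v3*; do
    [ -e "$f" ] || continue       # skip if the glob matched nothing
    mv -- "$f" "${f/_v3/}"        # remove the first "_v3" from the file name
done

ls                                # the listed names no longer contain "_v3"
```

To apply this to the real files, ''cd'' into the binaries folder and run only the ''for'' loop.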
