If it’s not in source control, it doesn’t exist.
https://www.troyhunt.com/10-commandments-of-good-source-control/
Source: PhD Comics.
Statisticians, as opposed to closet mathematicians, rarely do things in a vacuum.
We talk to scientists/clients about their data and questions.
We write code (a lot!) together with team members or coauthors.
We run code/programs on different platforms.
We write manuscripts/reports with co-authors.
We distribute software so potential users have access to our methods.
In every project you have at least one other collaborator, future-you. You don’t want future-you to curse past-you.
A centralized repository helps coordinate multi-person projects.
Time machine. Keep track of all changes and revert to any earlier version easily (reproducibility).
Storage efficiency.
Synchronize files across multiple computers and platforms.
GitHub is becoming a de facto central repository for open source development. E.g., all packages in Julia are distributed through GitHub; Hadley Wickham also recommends Git/GitHub as best practice for R package development.
Advertise yourself through GitHub.
Open source: Git, Apache Subversion (aka svn), CVS, Mercurial.
Proprietary: Visual SourceSafe (VSS), etc.
Dropbox? Mostly for file backup and sharing, limited version control (1 month?).
We use Git in this course.
Currently the most popular version control system according to Google Trends.
Initially designed and developed by Linus Torvalds in 2005 for Linux kernel development. "Git" is British English slang for an unpleasant person.
I’m an egotistical bastard, and I name all my projects after myself. First ‘Linux’, now ‘git’.
Linus Torvalds
Svn is a centralized version control system:
Git is a distributed version control system:
A Git server enabling multi-person collaboration through a centralized repository.
github.com: unlimited public repositories; private repositories cost money, but the Student Developer Pack offers unlimited private repositories for free.
bitbucket.org: unlimited public repositories; unlimited private repositories for academic accounts (register for free using your edu email).
We use github.com in this course for developing and submitting homework.
A Git client on your own machine.
Linux: shipped with many Linux distributions, e.g., Ubuntu. If not, install using a package manager, e.g., yum install git on CentOS.
Mac: install with port install git (MacPorts) or another package manager.
Windows: GitHub Desktop (GUI), TortoiseGit (is this good?).
Don’t rely totally on a GUI or IDE. Learn to use Git on the command line, which is needed for cluster and cloud computing.
git - the simple guide, by Roger Dudler (a Korean version is available).
Synchronize local Git directory with remote repository: git pull (same as git fetch plus git merge).
Modify files in local working directory.
Add snapshots to staging area:
git add FILES
Commit: store snapshots permanently to (local) Git repository
git commit -m "MESSAGE"
Push commits to remote repository:
git push
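The cycle above can be tried end to end without a server. The following is a minimal sketch, assuming a bare repository in a temporary directory stands in for the remote; the names sandbox, remote.git, work, other, and notes.txt are made up for illustration.

```shell
# Hypothetical sandbox: a bare repository plays the role of the server.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"

# Clone it, as you would clone from github.com (Git warns that the repo is empty).
git clone -q "$sandbox/remote.git" "$sandbox/work"
cd "$sandbox/work"
git config user.name "Your Name"
git config user.email "your_email@ucla.edu"

# Modify files in the working directory.
echo "first draft" > notes.txt

# Add a snapshot to the staging area, then commit it to the local repository.
git add notes.txt
git commit -q -m "Add first draft of notes"

# Push the commit; -u records the upstream so a plain `git push` works later.
git push -q -u origin HEAD

# A second clone stands in for a collaborator who pushes a change.
git clone -q "$sandbox/remote.git" "$sandbox/other"
cd "$sandbox/other"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
echo "a second line" >> notes.txt
git commit -q -a -m "Add a second line to notes"
git push -q

# Back in the first clone, do by hand what `git pull` does in one step:
# fetch downloads the new commit, merge integrates it into the current branch.
cd "$sandbox/work"
git fetch -q
git merge -q          # with no argument, merges the upstream branch
cat notes.txt
```

Running git pull in place of the last fetch and merge should leave the working copy in the same state.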
Register for an account on a Git server, e.g., github.com.
Upload your SSH public key to the server.
Identify yourself at local machine, e.g.,
git config --global user.name "Your Name"
git config --global user.email "your_email@ucla.edu"
Name and email appear in each commit you make.
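To see how the identity ends up in a commit, here is a small sketch, assuming a throwaway repository in a temporary directory. It sets the identity without --global, which stores it for that repository only, so your real settings stay untouched; the file README.md is made up for illustration.

```shell
# Throwaway repository (hypothetical location chosen by mktemp).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Without --global, the values are written to this repository's .git/config only.
git config user.name "Your Name"
git config user.email "your_email@ucla.edu"

# Check what Git will use in this repository.
git config --get user.name
git config --get user.email

# Make a commit and print its author: the configured name and email.
echo "hello" > README.md
git add README.md
git commit -q -m "Add README"
git log -1 --format='%an <%ae>'
```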
Initialize a project.
Create a repository, e.g., snustat-326-621a-fall
on the server.
Clone the repository to your local machine (replace USERNAME with the GitHub account or organization that owns the repository):
git clone git@github.com:USERNAME/snustat-326-621a-fall.git
Working with your local copy.
git pull
: update local Git repository with remote repository (fetch + merge).
git status
: display the current status of the working directory.
git log FILENAME
: display the commit history of FILENAME.
git diff
: show differences (by default, changes in the working directory that are not yet staged; use git diff HEAD to compare with the most recent commit).
git add file1 file2 ...
: add file(s) to the staging area.
git commit
: commit changes in staging area to Git directory.
git push
: publish commits in local Git repository to remote repository.
git reset --soft HEAD~1
: undo the last commit, keeping its changes staged.
git checkout FILENAME
: restore FILENAME to its last staged or committed version, discarding uncommitted changes to that file.
git rm FILENAME
: delete the file and stop tracking it (use git rm --cached FILENAME to stop tracking it but keep the local file).
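The inspection and undo commands above are easiest to understand by trying them on something disposable. A minimal sketch, assuming a throwaway repository in a temporary directory; report.txt and the commit messages are made up for illustration.

```shell
# Throwaway repository (hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "your_email@ucla.edu"

# Two commits to have some history to look at.
echo "version 1" > report.txt
git add report.txt
git commit -q -m "Add report"
echo "version 2" > report.txt
git add report.txt
git commit -q -m "Revise report"

# An uncommitted edit: inspect it three ways.
echo "accidental edit" >> report.txt
git status --short             # report.txt is listed as modified
git diff                       # shows the added line
git log --oneline report.txt   # the two commits that touched report.txt

# Discard the uncommitted edit to report.txt.
git checkout -- report.txt

# Undo the last commit; its changes stay in the staging area.
git reset --soft HEAD~1
git status --short             # report.txt is staged as modified
git log --oneline              # only the first commit remains
```

After the soft reset the file still contains the second version, so nothing is lost; committing again would recreate the commit with a new message.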
For this course, you need to have two branches:
develop
for your own development.
master
for releases, i.e., homework submission.
Note that master is the default branch when you initialize the project; create and switch to the develop branch immediately after project initialization.
Commonly used commands:
git branch branchname
: create a branch.
git branch
: show all project branches.
git checkout branchname
: switch to a branch.
git tag
: show tags (major landmarks).
git tag tagname
: create a tag.
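The branch and tag commands can be sketched as follows, assuming a throwaway repository in a temporary directory; the tag name hw1-draft and the file names are made up for illustration. A branch can only be created once the repository has at least one commit, so the sketch starts with one.

```shell
# Throwaway repository (hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "your_email@ucla.edu"

# First commit on the default branch.
echo "# Homework" > README.md
git add README.md
git commit -q -m "Add README"

# Create the develop branch, list branches, and switch to it.
git branch develop
git branch                 # the current branch is marked with *
git checkout -q develop

# Work on develop, then mark a landmark with a tag.
echo "draft" > hw1.Rmd
git add hw1.Rmd
git commit -q -m "Start homework 1"
git tag hw1-draft
git tag                    # lists the tags
```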
A link to join the 326.621A GitHub Classroom and a link to create an individual GitHub repository for homework are provided on eTL. First join the classroom, and then create your own homework repo by accepting these two invitations in turn.
For each homework, the teaching assistant will make a pull request. Merge each pull request to your homework repo.
Maintain two branches, master and develop. The develop branch is your main playground, the place where you develop solutions (code) to homework problems and write up your report. The master branch is your presentation area. Submit your homework files (the R Markdown file Rmd, the html file converted from R Markdown, and all code and data sets needed to reproduce the results) in the master branch.
Before each homework’s due date, commit your work to the master branch and push it to GitHub. The teaching assistant and the instructor will check out your committed master branch for grading. Commit time will be used as your submission time. That means that if you commit your Homework 1 submission after the deadline, points will be deducted for late submission according to the syllabus.
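Because commit time counts as submission time, it can help to check the timestamp of your latest commit before the deadline. A minimal sketch, assuming a throwaway repository in a temporary directory; in your homework repo you would run the last command with master in place of HEAD.

```shell
# Throwaway repository (hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "your_email@ucla.edu"

echo "solution" > hw1.Rmd
git add hw1.Rmd
git commit -q -m "Submit homework 1"

# Abbreviated hash, commit date (ISO 8601), and subject of the latest commit.
git log -1 --format='%h %cI %s' HEAD
```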
On your local machine:
Clone the repository and create the develop branch, where you work on solutions.
# clone the project (replace USERNAME with the account or organization that owns the repository)
git clone git@github.com:USERNAME/snustat-326-621a-fall.git
# enter project folder
cd snustat-326-621a-fall
# what branches are there?
git branch
# create develop branch
git branch develop
# switch to the develop branch
git checkout develop
# create folder for HW1
mkdir hw1
cd hw1
# let's write solutions
echo "sample solution" > hw1.Rmd
echo "some bug" >> hw1.Rmd
# commit the code
git add hw1.Rmd
git commit -m "start working on problem #1"
# push to remote repo (-u sets the upstream for the new develop branch)
git push -u origin develop
Submit the HW1 solution to the master branch and tag it.
# which branch are we in?
git branch
# change to the master branch
git checkout master
# merge develop branch to master branch
# git pull origin develop
git merge develop
# push to the remote master branch
git push
# tag version hw1
git tag hw1
git push --tags
RStudio has good Git integration, but practice command-line operations as well.
Be judicious about what to put in the repository.
Not too little: Make sure collaborators or yourself can reproduce everything on other machines.
Not too much: No need to put all intermediate files in repository. Make good use of the .gitignore
file.
Strictly speaking, a version control system is for source files only; e.g., xxx.Rmd, xxx.bib, and figure files are necessary to produce a PDF file. The PDF file itself doesn’t need to be version controlled or, if version controlled, doesn’t need to be committed frequently.
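As an illustration of keeping intermediate files out of the repository, here is a minimal sketch, assuming a throwaway repository in a temporary directory. The ignore patterns and file names are hypothetical examples, not a recommended list for this course.

```shell
# Throwaway repository (hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Hypothetical ignore rules: one pattern per line, # starts a comment.
cat > .gitignore <<'EOF'
# intermediate files
*.log
*_cache/
.Rhistory
EOF

echo "source" > hw1.Rmd     # a source file, should be tracked
echo "noise" > knit.log     # an intermediate file, matches *.log

# Untracked files are listed, ignored ones are not.
git status --short

# Ask Git which rule causes a file to be ignored.
git check-ignore -v knit.log
```

The .gitignore file itself is normally committed, so that collaborators share the same rules.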
Commit early, commit often and don’t spare the horses.
Adding an informative message when you commit is not optional. Spending one minute on commit message saves hours later for your collaborators and “future-you”. Read the following mantra to yourself 3 times:
Write every commit message like the next person who reads it is an axe-wielding maniac who knows where you live.
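One way to write an informative message from the command line is a short subject plus a body that explains why. A minimal sketch, assuming a throwaway repository in a temporary directory; the file and the message text are made up for illustration.

```shell
# Throwaway repository (hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "your_email@ucla.edu"

echo "fit <- lm(y ~ x)" > analysis.R
git add analysis.R

# Each -m adds a paragraph: the first is the subject line, the second the body.
git commit -q \
  -m "Add linear model for problem 1" \
  -m "Fit y on x with lm() as a baseline before trying other models."

git log -1 --format='%s'    # subject only
git log -1 --format='%b'    # body only
```

Running git commit without -m opens an editor instead, which is more comfortable for longer messages.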