Movies and television portray programming as a frantic activity. A bomb’s timer counts down, and the nerdy sidekick who was a burden up until that point works feverishly on a laptop. Code flies across the screen until she hits the final return key with a satisfying clack! The timer stops, and all is right in the world again. But programming is nothing like that. It includes design documents, UI mock-ups, and code reviews. And more than anything, it involves trial and error, especially if you’re a beginner (like me).
Now, you may already be convinced of the value of keeping your coding projects in source control. But committing all this trial and error to your repository will make for a large and untidy project, with many revisions. Most of the commits you make will contain stuff that’s broken, so why save it? The git branch feature can help keep all this messy code away from the stuff you know works.
In this article we’ll look at what branching your code means, how to do it, and ways to manage updates to the “main” branch.
Creating a Git Branch
We’ve learned from our introduction to source control that you can create a new repository by running the “git init” command in a terminal.
When you do this, it creates a hidden directory in your current path, called “.git.” If you can’t see it, consult some of our prior articles on viewing hidden files. This directory contains all the information needed to keep the revision history of your files. Once you set your file manager to show hidden files, you can open the .git folder and see a number of sub-directories. One of these is called “refs” (shown in the below image), and its “heads” sub-directory contains listings for all the branches in your project. Once you’ve made your first commit you’ll have just one, dubbed “master” (newer git setups may call it “main” instead).
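Here is a minimal sketch of that first step, if you’d like to poke around yourself. The throwaway directory from mktemp is my own assumption, so nothing in your real projects gets touched.

```shell
set -e
# Hypothetical scratch location; any empty directory works.
repo="$(mktemp -d)"
cd "$repo"

git init -q            # creates the hidden .git directory

ls -A                  # -A lists hidden entries too, so .git shows up
ls .git/refs           # the refs directory, including "heads"
ls .git/refs/heads     # prints nothing: no branch file exists before the first commit
```

The last listing stays empty because git only writes a branch’s file once there is a commit for it to point to.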
The refs directory keeps a record of the branch, but not the branch itself. At least, not the branch you’re currently using. Those files are kept in the working directory (“~/Temp/post-programming-git-branch” in the above example) so you can access them in a normal way. Doing exactly that, we create our first file (named “first-file.txt”) in the working directory, i.e. outside the .git directory. Check this in using the trusty “git add” and “git commit” commands from the prior intro to git article.
We can see now the working directory has the file we committed, while git has automatically created another file inside the .git directory (named “COMMIT_EDITMSG,” it contains the commit message we just entered).
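The check-in might look something like the sketch below. The scratch directory, the placeholder name and e-mail, and the commit message are all assumptions on my part. I also pin the branch name to “master” so it matches this article, since newer git setups may default to “main.”

```shell
set -e
repo="$(mktemp -d)"    # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.email "you@example.com"    # placeholder identity for this repo only
git config user.name "Your Name"

touch first-file.txt                 # create an empty file in the working directory
git add first-file.txt               # stage it
git commit -q -m "Add first file"    # commit it

ls                         # the working directory holds only first-file.txt
cat .git/COMMIT_EDITMSG    # git's own copy of the commit message
ls .git/refs/heads         # the "master" branch file now exists
```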
Now, let’s create a new git branch called “testing”:
git branch testing
Checking Out The Branch, and Working in It
We can issue the “git branch” command without any names in order to get a list of all branches, as well as which one we’re currently using (i.e. which is “checked out”). The current branch is marked with an asterisk.
Now, if we examine the directory structure, we’ll see “.git/refs/heads” now has two files, one each for “master” and “testing.”
Each of these files holds the ID of the most recent commit on its branch, and git follows the chain of commits back from there. For example, if we examine the “master” branch with the “git log” command, we can see an entry for each of the commits made to that branch.
The process of “checking out” a branch means that changes you commit will be part of that branch, but not other branches. For example, suppose we check out the testing git branch with the following command:
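To see this for yourself, something like the following sketch should do. As before, the scratch directory, placeholder identity, and commit message are assumptions rather than anything from the original walkthrough.

```shell
set -e
repo="$(mktemp -d)"    # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the article's branch name
git config user.email "you@example.com"    # placeholder identity
git config user.name "Your Name"
touch first-file.txt
git add first-file.txt
git commit -q -m "Add first file"

git branch testing             # create the new branch
git branch                     # list branches; the asterisk marks the current one
ls .git/refs/heads             # one file per branch: master and testing

cat .git/refs/heads/master     # a single commit ID, not a list
git rev-parse master           # the same ID, as git resolves it
git log --oneline master       # one line per commit reachable from master
```

Both branch files contain the same ID at this point, because “testing” was created from the commit “master” points to.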
git checkout testing
Then we add a line of text to first-file.txt, of course committing it afterwards. If you switch back to the master branch, you’ll find the file is still blank (the cat command displays a file’s contents in the terminal):
But returning to the “testing” branch, the file still has the added text:
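The back-and-forth described above can be sketched like this. The line of text, commit messages, and scratch directory are made up for illustration.

```shell
set -e
repo="$(mktemp -d)"    # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the article's branch name
git config user.email "you@example.com"    # placeholder identity
git config user.name "Your Name"
touch first-file.txt
git add first-file.txt
git commit -q -m "Add first file"
git branch testing

git checkout -q testing
echo "A line added on the testing branch" >> first-file.txt
git commit -q -am "Add a line in testing"    # -a stages the modified file

git checkout -q master
cat first-file.txt     # prints nothing: master still has the empty file

git checkout -q testing
cat first-file.txt     # prints the added line again
```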
But all we ever see in the directory proper is the single file “first-file.txt.” Where then are these two alternate versions of the same file? The .git directory accounts for these changes, but not in the way you might expect.
How Git Stores Things
If you were to examine a project kept in other version control systems such as Subversion, you’d find that a branch looks like a separate directory holding its own copy of each individual file. This means if you have a repository of one file, then create branches, a full checkout contains one copy of that file per branch. (Subversion stores these copies cheaply on the server, but they show up as complete directories when checked out.) In fact, you actually use “svn copy” as the command to create a branch in Subversion!
The output of the above tells us the entire contents of “trunk” (SVN’s version of “master”) has been copied. A look in a file manager confirms this:
Git, on the other hand, has no per-branch directories. Instead, each version of a file’s contents is stored as an object in the “.git/objects” directory, and each one is tracked with an ID. For example, in the below image, you’ll see files with long ID-style names. Each lives in a folder named with two hexadecimal characters (8c and 8d below).
Together, these folder and file names make up the ID for a particular object (actually a SHA-1 hash). You can explore their contents using the git cat-file command. Based on their modified dates, we can see the one beginning “8d” came first, and it shows nothing, while the one starting “8c” contains the line of text we added to the file in the “testing” branch.
The takeaway is that git branches (including the default “master”) aren’t separate “folders” containing copies of files. Rather, each one is a pointer to a commit, and the chain of commits behind it records how the files changed over time. This is more efficient storage-wise, but the result is the same. Anything you do (break?) in a git branch stays in there until you merge it.
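If you want to try the same exploration, here is a rough sketch. Your IDs will differ from the “8c” and “8d” in the screenshot, so the script looks them up rather than hard-coding them. The scratch directory, identity, and file contents are assumptions.

```shell
set -e
repo="$(mktemp -d)"    # hypothetical scratch directory
cd "$repo"
git init -q
git config user.email "you@example.com"    # placeholder identity
git config user.name "Your Name"
touch first-file.txt
git add first-file.txt
git commit -q -m "Add first file"
echo "A line of text" >> first-file.txt
git commit -q -am "Add a line"

# Ask git for the object ID of the file's contents in the latest commit.
id="$(git rev-parse HEAD:first-file.txt)"

# The first two characters name the folder, the rest name the file.
dir="$(printf '%s' "$id" | cut -c1-2)"
file="$(printf '%s' "$id" | cut -c3-)"
ls ".git/objects/$dir/$file"

git cat-file -t "$id"    # the object's type (a "blob" holds file contents)
git cat-file -p "$id"    # the object's contents, i.e. the line of text
```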
Strategies for Merging Back to the Main Branch (To Delete, or Not Delete?)
Suppose you had an existing project with a number of files, which you then branched into “testing,” and did some work on it. Now you want to get those changes back into the “master” branch. You can do so with the following command, run from the “master” branch:
git merge testing
Provided there aren’t conflicts, the change consisting of the new text will be applied to “master.” The result is “first-file.txt” will be identical in both branches. Yet the working directory never held two alternate copies of the file; git swapped its contents in and out as we changed branches.
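A small sketch of the whole round trip, ending with the merge, might look like this. The scratch directory, identity, and text are again my own placeholders.

```shell
set -e
repo="$(mktemp -d)"    # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the article's branch name
git config user.email "you@example.com"    # placeholder identity
git config user.name "Your Name"
touch first-file.txt
git add first-file.txt
git commit -q -m "Add first file"

git checkout -q -b testing                 # create and switch in one step
echo "A line added on the testing branch" >> first-file.txt
git commit -q -am "Add a line in testing"

git checkout -q master
git merge -q testing                # bring testing's commit into master
git diff --quiet master testing     # exits successfully when the branches match
cat first-file.txt                  # master now shows the added line
```

Because “master” had no new commits of its own here, git can simply move the “master” pointer forward to the commit “testing” points to.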
Now comes the question, what to do with the branch? There are two common strategies for dealing with branches.
- One is to keep a git branch like “testing” to play around in (shown above). You try some things out and, if you can make them work, merge the changes back into “master.” Then you can end work on that branch, or even get rid of it entirely. This means your “master” is the most stable version of the code, because it should only contain things you know work. You can also think of this approach as using “feature branches.” This is because their purpose is to refine one (or several) specific additions to the code.
- A different approach is to do all your main work in the “master” branch, making commits along the way. Once you’re happy with the functionality, you create a branch from it in which you’ll do bug-fixing and the like. This results in more of a “release branch” (shown below), since such a branch ends with the release. It also means your “master” branch is typically the most unstable version of your code.
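Both strategies can be sketched in a few commands. The branch names “feature-x” and “release-1.0” are hypothetical, as are the scratch directory and commit messages.

```shell
set -e
repo="$(mktemp -d)"    # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the article's branch name
git config user.email "you@example.com"    # placeholder identity
git config user.name "Your Name"
touch first-file.txt
git add first-file.txt
git commit -q -m "Add first file"

# Strategy one: experiment in a feature branch, merge, then delete it.
git checkout -q -b feature-x
echo "An experiment that worked" >> first-file.txt
git commit -q -am "Add feature"
git checkout -q master
git merge -q feature-x
git branch -d feature-x     # -d only deletes a branch that has been merged

# Strategy two: work on master, then branch off for release bug-fixing.
git branch release-1.0
git branch                  # lists master and release-1.0
```

Deleting the merged branch doesn’t lose anything, since its commits are now part of “master.”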
Use Whichever Method Is Best For You
The first approach here is more common when working with git. Its other features (specifically tags) make it easy to mark a particular snapshot of the code as “stable.” It also lends itself better to larger, collaborative projects. But when working on your own personal projects, feel free to use whichever of these makes the most sense to you. The great thing about using git is that you capture all your changes, so you can always go back and find something. And organizing your project with git branches makes that a little easier to do.
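As a quick illustration of the tags mentioned above, here is a minimal sketch. The tag name “v1.0” and its message are assumptions, not a naming rule.

```shell
set -e
repo="$(mktemp -d)"    # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the article's branch name
git config user.email "you@example.com"    # placeholder identity
git config user.name "Your Name"
touch first-file.txt
git add first-file.txt
git commit -q -m "Add first file"

git tag -a v1.0 -m "First stable snapshot"    # annotated tag on the current commit
git tag                                       # list tags
git log -1 --oneline v1.0                     # the commit the tag marks
```

Unlike a branch, a tag stays put when you make new commits, which is what makes it handy for marking a snapshot as stable.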
How do you approach branches in your projects? Do you use them to experiment, then trash them afterwards? Or do they represent a detailed history of your work on the project?
Let us know below in the comments if you have any clever ways of dealing with branches in your git projects!