Manage Your File Versioning Like a Programmer With Git

If you've ever worked with colleagues on a document, you know this pain: someone takes the first pass (document.doc), then emails it to everyone. The next person does the same, as does the third, each tacking a revision (document_rev3.doc) to the file's name. The manager likes what she sees and tags it as complete (document_final.doc)... until last-minute changes come along (document_final_Aaron's changes_reviewed_by_Bob_now_really_final.doc).

Now imagine changes like this in dozens of source code files. Programmers created version control systems (VCS) to solve this problem. They're a bit technical in nature, but can be useful for power users as well. Let's look at the basics of version control using the top system today, Git.

Overview of Version Control in Use

The primary purpose of VCSs is to capture versions (also called revisions) of a file over time. This means not relying on crude methods like the one described above. In these systems, when you access "document.doc" you're working on the most recent version by default. But you can also go back into the file's history and retrieve past versions. Systems built specifically to manage file versions began in software development, but have made their way into more "mainstream" applications as well.

The Evolution of Version Control in Software Development

Version control applications originally came about as a way to prevent programmers from breaking each other's code. The first systems simply allowed users to lock files from other edits, while second-generation systems (such as Subversion) added full-project checkout from a central repository.

While these second-gen systems are still widely used, the industry is migrating towards third-generation systems. The main characteristic of these systems is that they are distributed, meaning no central repository is required. Rather, each user has their own full copy of the repository (including all past revisions). Any changes (even full revisions) stay "local" until the user "pushes" them to another repository. Other users can then pull those changes into their own copies, make further changes, and repeat the process.

Current Version Control for Non-Programmers

The ability to capture multiple versions of the same thing is now a common feature for average users too. With access to cheaper storage, both on the local machine and via easy-to-use cloud services, users now have the ability to track the history of their files. General-purpose programs do this in a number of ways.

Operating systems can capture a history of files or folders. This can happen automatically every time a file is saved, or on a regular schedule. Some desktop applications also let you capture versions or snapshots of files. For example, LibreOffice provides the "Save Version" feature (as shown in the top image below) that allows you to save multiple versions of the document in one file. Lastly, web/cloud applications such as Google Drive, ownCloud, etc. will also keep past iterations of a file or document (as shown for Google Docs in the bottom image below), either behind the scenes or at your express command.

[Image: version history in Google Docs]

Why Use Programming-Centric Version Control Then?

Since all these options at the OS or application level already exist, why bother with these nerdy programmer tools? There are a couple of drawbacks to the above methods, including:

Some operating systems will save each and every tweak you make to a file in a new version. This will not only take up unnecessary storage, but also make it difficult to identify a particular restore point for a file.
These solutions may also use a file as the basic unit to manage. But version control systems typically work on the directory level (including sub-directories). A VCS makes it easy to see precisely what file changed and when.
These same solutions also don't give you the ability to label the iterations in most cases. This likewise makes it hard to find the file with the passage of text you overwrote at some point before you realized how brilliant it was.
Versioning by operating systems and applications such as Word also introduces single points of failure. For Word it's the file itself (if it gets corrupted, say goodbye to all your past copies as well). For the operating system it's the hard drive, unless you're a responsible user and back up regularly.
The nature of web-based services means that your older revisions may only be available out in the cloud, not on your own machine. For example, if you try to view the history of a Dropbox file on Linux (as shown in the image below), you're redirected to the website.

If any or all of these issues make you want something more, check out the next section where we solve them with Git.

Version Control With Git

In the following sections we'll go through the basic steps of versioning with Git (setting up a repository, committing, branching, merging, and pushing) for a very important project: this very article. We'll do so using the command line version of Git. Knowing the terminal commands will make it easy enough to find their GUI equivalents, and it'll guarantee you can use it on every platform.

1. Setting Up the Git Repository

If your machine doesn't have git installed (Macs do, as do some Linux distributions), you can either grab the Windows installer from the project's download page or install on Linux with a command (specific to your distro) such as:

        sudo apt install git

The next thing we need to do is set up a git repository, which is just a folder holding your project. To take advantage of Git versioning, you need to initialize the folder as a repository. You can do this at a command line with the following command:

        git init

You may not see anything once this completes, but enable the "show hidden files" option in your file manager, and you'll see there's a new .git folder (as shown in the above image). This is where git keeps all its info so it's out of your way.
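
If you'd rather try this somewhere harmless first, here's a minimal sketch. It assumes a throwaway folder created with mktemp (purely illustrative; in practice you'd run "git init" inside your own project folder), initializes it, and confirms the hidden .git folder is there.

```shell
# Create a throwaway folder to stand in for your project (hypothetical location).
project=$(mktemp -d)
cd "$project"

# Turn the folder into a Git repository (-q keeps the output quiet).
git init -q

# The repository data lives in the hidden .git folder.
ls -d .git
```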

2. Add and Commit Your First File

The next step is to create some content (files) in your project. These can be any type of file, as most programming projects are made up of code (text) and assets like graphics (binary). Once created, do the following at the command line while still in your project directory:

        git status

The first step to a commit is to "stage" the new or updated items. The status output lists anything that isn't staged yet: brand-new files show up as "untracked," and edited files as "changes not staged for commit." You can stage everything in your directory (recursively, i.e. to include sub-folders and their files) with this command:

        git add .

The "add" subcommand stages files, and the period refers to the current directory. It's basically saying "add all files to my commit." Now to actually make the commit, type the following:

        git commit -m "WHOOP, my first commit!"

Now when you check the status again, there should be no waiting changes. To see the commit itself, check the git log, using graph mode, with listing cut to one line, and decorated for readability:

        git log --all --decorate --oneline --graph

This will show you a nice timeline of your commit, with the most recent on top.
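
Put together, the whole step can be sketched like this. The scratch folder, the file name article.txt, and the author name and email are all placeholders made up for the example; the local identity is only set so the commit works on a machine where Git has never been configured.

```shell
set -e
# Throwaway folder standing in for your project (hypothetical).
project=$(mktemp -d)
cd "$project"
git init -q

# Placeholder identity, stored only in this repository.
git config user.name "Example Author"
git config user.email "author@example.com"

# Create some content (hypothetical file name).
echo "Manage Your File Versioning With Git" > article.txt

git status --short      # the new file is listed as untracked ("??")
git add .               # stage everything in this folder and below
git commit -q -m "WHOOP, my first commit!"
git status --short      # prints nothing: no waiting changes

# One-line, decorated graph of the history so far.
git log --all --decorate --oneline --graph
```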

3. Create a Branch to Experiment

As you're working, you may want to go off in a particular direction, not knowing if it's going to work out. For this you'll want to create a branch with the following command:

        git branch experiment1
        git checkout experiment1

The second command switches you to the "experiment1" branch. You can switch back by replacing it with "master." As you start working on your text, note the difference between the same file from each branch. First, the "experiment1" branch with your new text:

Compare this to the original:

Now after working for a while (ending with a new commit), the best you could come up with is "Lorem ipsum dolor sit amet...?" What is that? It's nonsense, and should be stricken from your project immediately. Switch back to "master" first, since Git won't delete the branch you're currently on, then trash your branch with the following commands ("-D" for force delete):

        git checkout master
        git branch -D experiment1
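
Here's a small sketch of why the force flag matters, again in a scratch repository with made-up file contents and a placeholder identity. The lowercase "-d" declines to delete a branch whose commits were never merged, while "-D" deletes it regardless.

```shell
set -e
project=$(mktemp -d)                         # throwaway folder (hypothetical)
cd "$project"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch "master", as in this article
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
echo "First draft." > article.txt
git add .
git commit -q -m "First draft"

# Go off on an experiment and commit something regrettable.
git branch experiment1
git checkout -q experiment1
echo "Lorem ipsum dolor sit amet..." >> article.txt
git add .
git commit -q -m "Lorem ipsum"

# Back on master, try the polite delete, then the forced one.
git checkout -q master
git branch -d experiment1 || echo "refused: experiment1 is not fully merged"
git branch -D experiment1
git branch --list                            # only master is left
```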

Now start another branch and add something intelligent, along with some images:

        git branch experiment2
        git checkout experiment2
        git add .
        git commit -m "More text, added images"
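
To see how the same file differs between two branches without switching back and forth, something like the following sketch should do. The file name and its contents are invented for the example, as is the placeholder identity.

```shell
set -e
project=$(mktemp -d)                         # throwaway folder (hypothetical)
cd "$project"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch "master", as in this article
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
echo "First draft." > article.txt
git add .
git commit -q -m "First draft"

# Branch off and add something intelligent.
git branch experiment2
git checkout -q experiment2
echo "More text." >> article.txt
git add .
git commit -q -m "More text"

# Line-by-line difference of the file between the two branches.
git diff master experiment2 -- article.txt

# Print the file exactly as it exists on master, untouched by the experiment.
git show master:article.txt
```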

4. Merge Your Changes

Finally, once you're happy with the changes in your experimental branch, switch back to the "master" branch and merge them in. Provided you haven't been doing a lot of hopping back and forth between the two, this will result in all your new changes being applied to "master." The following at the command line will combine them for you:

        git checkout master
        git merge experiment2

Now you'll have a revision that combines the latest in "master" with the latest in "experiment2." At this point you can get rid of the experiment (it was successful after all) with the following:

        git branch -d experiment2

Note that deleting a merged branch only removes the branch name. The commits you made within that branch stay in the history of "master."
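
The merge and cleanup can be tried end to end with a sketch like this one, using a scratch repository, an invented file, and a placeholder identity. The final "git log" lets you check for yourself which commits are on "master" once the branch is gone.

```shell
set -e
project=$(mktemp -d)                         # throwaway folder (hypothetical)
cd "$project"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch "master", as in this article
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
echo "First draft." > article.txt
git add .
git commit -q -m "First draft"

# Do some work on a branch.
git branch experiment2
git checkout -q experiment2
echo "More text." >> article.txt
git add .
git commit -q -m "More text"

# Switch back, merge, and remove the branch.
git checkout -q master
git merge -q experiment2
git branch -d experiment2

# History of master after the cleanup.
git log --oneline
```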

5. Push to a Safe Place

Lastly, Git is a distributed system, which means you can have multiple copies of your repository and sync them up. For example, create a new repository on a server you can SSH into with this command:

        git init --bare

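The "--bare" flag creates a repository that holds only the version history, with no working copy of the files, so you can't edit anything in it directly; it exists to be pushed to and pulled from. Then you can set that as a remote copy of your local repository with the following (the first command makes a plain "git push" send every local branch that has a branch of the same name on the remote):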

        git config push.default matching
        git remote add central ssh://[username]@[URL]/path/to/repository
        git push --set-upstream central master
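
If you don't have a server handy, you can rehearse this on one machine. The sketch below is an assumption-laden stand-in: a second local folder plays the part of the server, so the remote is a plain path rather than an ssh:// URL, and the file and identity are placeholders.

```shell
set -e
# A local folder pretending to be the repository on the server (hypothetical).
central=$(mktemp -d)
git init -q --bare "$central"

# Your working repository with one commit in it.
project=$(mktemp -d)
cd "$project"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch "master", as in this article
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
echo "First draft." > article.txt
git add .
git commit -q -m "First draft"

# Register the bare repository as a remote named "central" and push to it.
git remote add central "$central"
git push -q --set-upstream central master

# List the branches the remote now holds.
git ls-remote --heads central
```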

Set Up for Versioning and Backup With Git

With the above setup, you can follow a simple process to keep all the work you do on a project not only versioned, but backed up as well.

Make changes to project files.
Issue "git add ." to add all changes to a commit.
Issue "git commit -m [some message]" to commit the changes to your local repository. Repeat from the start.
Issue "git push" to send your commits to a remote repository at regular intervals.
Issue "git clone ssh://[username]@[URL]/path/to/repository" from some other machine to get a full copy of the repository, history included.
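
The whole cycle above can be rehearsed on a single machine with a sketch like this. It assumes local temp folders standing in for both the server and the "other machine," plus an invented file and placeholder identity; with a real server you'd use the ssh:// address instead of a path.

```shell
set -e
# Stand-in for the server-side repository (hypothetical local folder).
central=$(mktemp -d)
git init -q --bare "$central"
git -C "$central" symbolic-ref HEAD refs/heads/master   # default branch for clones

# Your working repository.
project=$(mktemp -d)
cd "$project"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch "master", as in this article
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git remote add central "$central"

# Change, add, commit... and repeat.
echo "First draft." > article.txt
git add .
git commit -q -m "First draft"
echo "Second paragraph." >> article.txt
git add .
git commit -q -m "Add second paragraph"

# Push the accumulated commits to the remote.
git push -q --set-upstream central master

# Clone from "some other machine" (here, just another folder).
copy=$(mktemp -d)
git clone -q "$central" "$copy"
git -C "$copy" log --oneline
```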

Are the benefits of version control systems something that interests you? Or are the non-development backup applications enough for your needs? Let us know your thoughts in the comments below!