POSTS

Workflow with Git

I’ve been toying with my Git workflow the past year at Continuum and have come up with a good workflow for handling semantically versioned software inside Git. This post is my attempt to catalog what I’m doing.

Here’s the TL;DR version:

  • master is always releases that are tagged
  • Code gets merged back in to develop before master, all work happens in feature branches off of develop
  • Bug fixes are handled in branches created from tags and merged directly back in to master, then master is merged to develop.

That’s the high level overview. Below is that information in more depth.

master of code

The master branch always contains the latest released code. At any time, you can checkout that branch, build it, install it, and know that it was the same code you would have gotten had you installed it via npm, pypi, or conda.

Merges into master are always done with --no-ff and --no-commit. The --no-ff ensures a merge commit so you can revert the commit if you ever need to. Using --no-commit gives you a chance to adjust the version numbers in the appropriate meta data files (conda recipe, setup.py, package.json, and so on) to reflect the new version before committing. For most of my commit releases, I’m simply removing the alpha suffix from the version number.

There should only be one commit in the repository for any given version number and every commit that’s in master is considered to be released. Keep in mind, that means you can’t use GitHub’s built-in Merge Pull Request functionality for releases, but that’s ok by me. You have to go to the command line to tag anyhow.

With the appropriate changes for versions, the next step is to create the commit and then tag it as vX.Y.Z immediately. From there, you build the packages and upload them or kick off your deployment tools and the code with the new version is distributed.

Managing Development with develop

Now you need to start working on the next feature release. All work happens in the develop branch and it should have a new version number. The first thing you should do is merge master in, then bump the version number to the next minor release with a suffix of some sort. I use alpha, but you can change that as needed depending on your language / tools.

For example, I just released v0.8.0 of an internal tool for testing yesterday (no, it’s not being used in production yet, thus the 0 major version). Immediately after tagging the new version, I checked out develop, merged master into it via a fast-forward merge, then bumped the version number to v0.9.0alpha. Now, every commit from that point forward will be the next version with the alpha suffix so I can immediately see that it was built from the repository.

Managing Branches

Everything is developed in branches. New features, refactoring, code cleanup, and so on happens off of the develop branch, bug fixes happen in branches created directly from the tagged release that the bug fix is being applied to. Let’s deal with feature branches first, they’re more fun.

I’ve gotten into the happen of adding prefixes to my branch names. New features have feature/ tacked on at the start, refactor/ is used whenever the branch is solely based on refactoring code, and fix/ is used when I’m fixing something. The prefixes provide a couple of benefits:

  • They communicate the intent of the branch to other developers. Reviewing a new feature requires a slightly different mindset than reviewing a set of changes meant solely to refactor code.
  • They help sort branches. With enough people working on a code base, we’ll end up with a bunch of different types of changes in-flight at any given time. Having prefixes lets me quickly sort what’s happening, where it’s happening, and prioritize what I should be looking at. I generally don’t want any fix/ branches sitting around for very long.

Some people like having the developer name in the branch as well to provide a namespace. I can understand this approach, but I think its wrong. First, Git is distributed, so if you truly need a namespace for your code to live where it doesn’t interact with other’s code, create a new repository (or fork if you’re on GitHub).

The second, and much more important, reason I don’t like using names in branches is that they promote code ownership. I’m all for taking ownership of the codebase and particularly your changes. It’s part of a being a professional: own up to the code you created and all its flaws. What I’m not for is fiefdoms in a codebase.

I worked at one company where I found a bug in the database interaction from the calendar module. I fixed the bug in MySQL, but didn’t have the know-how to fix the bug in the other databases. I talked to the engineering manager and was directed to the developer that owned the calendar. I explained the bug, my fix, and what I thought was needed for the other databases to work and they were to fix it. When I left the company six months later, my fix still wasn’t applied and none of the other databases had been fixed. All because the person who owned the calendar code didn’t bother to follow through.

Having a branch called tswicegood/fix/new-calendar-query gives the impression that I now own the new calendar fix. Removing the signature from that is a small step toward increasing the team ownership of a code base and removing the temptation to think of that feature as your own.

Managing Bugfixes

So what about bugs? You want the bug fix to originate as close to the originally release code as possible. To do this, create the branch directly from the tag, bump the version number, then work on your fix. For example, let’s say you need to find a bug in v1.2.0 that you need to fix.

$ git checkout -b v1.2.1-prep v1.2.0
... adjust version number to v1.2.1alpha, then commit 

The -b v1.2.1-prep tells Git to create a branch with that name, then check it out. The v1.2.0 at the end tells Git to use that as the starting point for the branch. The next commit adjusts the version number so anything you build from this branch is going to be the alpha version of the bug fix. With that bookkeeping out of the way, you’re ready to fix the code.

For projects that have a robust test suite (which unfortunately isn’t all of them, even mine), the very next commit should be a failing test case by itself. Even when you know the fix to make the test pass, you should create this commit so there’s a single point in the history that you and other developers can check out and run the tests to see the failure. The next commit then shows the actual code that makes the test pass again.

Once the fix has been tested and is ready for release it’s time to merge back in to master. You should do this with --no-ff, and --no-commit and remove the alpha suffix before committing just like making a feature release.

Once you’ve merged and tagged the code, you need to get develop up-to-date with the bug fix. Since master and develop have now diverged — remember, develop has at least one commit bumping the version number — you have to deal with a merge conflict.

Hopefully, the merge conflict is limited to the version number. If that’s the case, you can just tell git merge to ignore those changes by with this command:

$ git checkout develop
$ git merge -X ours master

The -X command tells git merge which strategy option to use when merging, and using ours tells it that the code in the branch you’re merging into wins. You need to be careful with this, however. It means that any real conflicts would be swallowed up. Hopefully you know the changes well enough to realize if there’s a larger conflict, but if for some reason you don’t know, you can always try this approach:

$ git merge master
… ensure that the only conflicts are around the version 
… numbers and that the develop branch code should be used
$ git reset --hard ORIG_HEAD
$ git merge -X ours master

You’ll have to manage any merge conflicts manually (or use git mergetool) if the conflicts are larger than the version number change. If you do confirm that you don’t need any of the conflicted changes, you can use git reset --hard ORIG_HEAD to reset the working tree back to its pre-merge state, then the git merge -X ours master to pull the changes in ignoring the conflicts from master.

On develop versus master

I’ve gone back and forth on this. My preference is to release often. Sometimes multiple times a day. In that case, master is just a quick staging ground. Create a branch, bump the version, write one feature, merge it, bump the version number, rinse, then repeat.

There are a few problems with this approach. First, not every team or for that matter project can work that way. Sometimes the code needs more testing across multiple platforms or configurations. Sometime’s there’s an integration test suite that takes awhile to run. Sometimes releases need to be timed to coincide with scheduled downtime giving you time to implement a few features while waiting for your release window.

Second, it doesn’t scale. One branch that merges one feature is fine, but if you have a team of developers working on a project you probably have multiple things being worked on in parallel. Having them all branch off master, all bump their version number, and all coordinate for an octopus merge (or merge and release separately) is a nightmare.

Having everyone branch and merge off of develop provides a base that keeps in sync with the rest of your code base. Your feature branch exists by itself, and all it needs to do to stay in sync is occasionally merge develop.

Compared to git-flow

This is very similar to the workflow called git-flow. There are a few differences.

If my memory serves, it used to call for branch names with the author’s name in it (a re-reading of it now doesn’t show that though). That’s what remote repositories are for, so I don’t want to use that.

Correction, nvie just confirmed that it’s never been there, so one of my biggest gripes with it wasn’t founded. Oops. :-/

Next, hot fixes or bug fixes in git-flow are merged to master and develop instead of only master. I want the versions going through master then back out to develop. To me, it’s a cleaner conceptual model.

Versions, a thing I’ve written about, are important. I want develop to be installable, but I don’t want it confused with any released version. There should only be one commit, a tagged commit at that, in each repository that can be built for any given version.

I don’t call out release branches in my description because my hope is that they aren’t necessary. Of course, if your project has a long QA cycle that’s independent of development or you’re trying to chase down a stray bug or two before a release, then a release branch is great, I just don’t make them required.

In Closing

The most important thing is to create some process to how code moves through your repository, document it, and stick to it. Everyone always committing directly to master is not sustainable. It also makes it much harder to revert changes if something makes it in by accident as you have to go find all the relevant commits instead of reverting one merge commit.

Worst than a free-for-all in master is the hybrid. Committing some of the time directly to master and other times to a feature branch means there’s no pattern to how your code is used. What’s the threshold for creating a feature branch? Is it based on how big the feature is, or how long it’s going to take? Answering these questions distracts you and future contributors. Providing a solid pattern of how contributions flow through your repository is an important step in making your project more accessible to fellow contributors regardless of whether those are in the open-source community or an office down the hall.

Some of the things outlined here might seem like a lot of overhead, but in the end they save you time. Most importantly, they’ll scale beyond just you.