Workflow with Git
I’ve been toying with my Git workflow the past year at Continuum and have come up with a good workflow for handling semantically versioned software inside Git. This post is my attempt to catalog what I’m doing.
Here’s the TL;DR version:
- master is always releases that are tagged
- Code gets merged back in to develop before master, all work happens in feature branches off of develop
- Bug fixes are handled in branches created from tags and merged directly back in to master, then master is merged to develop.
That’s the high level overview. Below is that information in more depth.
master of code
The master branch always contains the latest released code. At any time, you can check out that branch, build it, install it, and know that it is the same code you would have gotten had you installed it via npm, PyPI, or conda.
Merges into master are always done with --no-ff and --no-commit. The --no-ff ensures a merge commit so you can revert the commit if you ever need to. Using --no-commit gives you a chance to adjust the version numbers in the appropriate metadata files (conda recipe, setup.py, package.json, and so on) to reflect the new version before committing. For most of my releases, I’m simply removing the alpha suffix from the version number.
There should only be one commit in the repository for any given version number, and every commit that’s in master is considered to be released. Keep in mind that this means you can’t use GitHub’s built-in Merge Pull Request functionality for releases, but that’s OK by me. You have to go to the command line to tag anyhow.
With the appropriate changes for versions, the next step is to create the commit and then tag it as vX.Y.Z immediately. From there, you build the packages and upload them or kick off your deployment tools, and the code with the new version is distributed.
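Here is a minimal sketch of that release sequence, run in a throwaway repository so it can be tried safely. The VERSION file, the version numbers, and the commit messages are assumptions made for illustration; a real project would edit setup.py, package.json, or a conda recipe instead.

```shell
# Throwaway repository; the VERSION file and version numbers are hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master regardless of local defaults
git config user.name "Example Dev"
git config user.email "dev@example.com"

# A previous release on master, then some work on develop.
echo "0.1.0" > VERSION
git add VERSION
git commit -q -m "Release v0.1.0"
git tag v0.1.0
git checkout -q -b develop
echo "0.2.0alpha" > VERSION
echo "new feature" > feature.txt
git add VERSION feature.txt
git commit -q -m "Add a feature for 0.2.0"

# The release itself: merge without fast-forwarding and stop before committing.
git checkout -q master
git merge --no-ff --no-commit develop
echo "0.2.0" > VERSION        # drop the alpha suffix while the merge is still open
git add VERSION
git commit -q -m "Release v0.2.0"
git tag v0.2.0                # tag immediately after committing
```

The release commit ends up with two parents (the previous release and the tip of develop), which is what makes it revertable as a single unit.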
Managing Development with develop
Now you need to start working on the next feature release. All work happens in the develop branch and it should have a new version number. The first thing you should do is merge master in, then bump the version number to the next minor release with a suffix of some sort. I use alpha, but you can change that as needed depending on your language / tools.
For example, I just released v0.8.0 of an internal tool for testing yesterday (no, it’s not being used in production yet, thus the 0 major version). Immediately after tagging the new version, I checked out develop, merged master into it via a fast-forward merge, then bumped the version number to v0.9.0alpha. Now, every commit from that point forward will be the next version with the alpha suffix so I can immediately see that it was built from the repository.
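As a rough sketch, the steps after tagging might look like the following. The setup lines only recreate a freshly tagged v0.8.0 in a throwaway repository; the VERSION file is a hypothetical stand-in for whatever metadata file your tools use.

```shell
# Setup: recreate a just-tagged v0.8.0 release in a throwaway repository.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "0.7.0" > VERSION
git add VERSION
git commit -q -m "Release v0.7.0"
git tag v0.7.0
git checkout -q -b develop
echo "0.8.0alpha" > VERSION
git commit -q -am "Start 0.8.0 development"
git checkout -q master
git merge -q --no-ff --no-commit develop
echo "0.8.0" > VERSION
git commit -q -am "Release v0.8.0"
git tag v0.8.0

# Start the next cycle: develop fast-forwards to the release commit,
# then the very next commit bumps the version with the alpha suffix.
git checkout -q develop
git merge -q --ff-only master
echo "0.9.0alpha" > VERSION
git commit -q -am "Bump version to 0.9.0alpha"
```

Using --ff-only is a choice made for this sketch: it makes the merge fail loudly if develop has picked up commits that master does not have, instead of silently creating a merge commit.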
Managing Branches
Everything is developed in branches. New features, refactoring, code cleanup, and so on happen off of the develop branch; bug fixes happen in branches created directly from the tagged release that the bug fix is being applied to. Let’s deal with feature branches first; they’re more fun.
I’ve gotten into the habit of adding prefixes to my branch names. New features have feature/ tacked on at the start, refactor/ is used whenever the branch is solely based on refactoring code, and fix/ is used when I’m fixing something. The prefixes provide a couple of benefits:
- They communicate the intent of the branch to other developers. Reviewing a new feature requires a slightly different mindset than reviewing a set of changes meant solely to refactor code.
- They help sort branches. With enough people working on a code base, we’ll end up with a bunch of different types of changes in-flight at any given time. Having prefixes lets me quickly sort what’s happening, where it’s happening, and prioritize what I should be looking at. I generally don’t want any fix/ branches sitting around for very long.
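One way the prefixes pay off in practice is filtering with git branch --list, which accepts shell-style patterns. The branch names below are made up for the example (apart from the calendar fix mentioned later in this post), and the repository is a throwaway one.

```shell
# Throwaway repository with a few hypothetical in-flight branches.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"
git branch develop
git branch feature/calendar-export develop
git branch refactor/query-builder develop
git branch fix/new-calendar-query

# Show only the fixes, which are the branches that should not linger.
git branch --list 'fix/*'

# Show everything branched for feature work or refactoring.
git branch --list 'feature/*' 'refactor/*'
```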
Some people like having the developer name in the branch as well to provide a namespace. I can understand this approach, but I think it’s wrong. First, Git is distributed, so if you truly need a namespace for your code to live where it doesn’t interact with others’ code, create a new repository (or fork if you’re on GitHub).
The second, and much more important, reason I don’t like using names in branches is that they promote code ownership. I’m all for taking ownership of the codebase and particularly your changes. It’s part of being a professional: own up to the code you created and all its flaws. What I’m not for is fiefdoms in a codebase.
I worked at one company where I found a bug in the database interaction from the calendar module. I fixed the bug in MySQL, but didn’t have the know-how to fix the bug in the other databases. I talked to the engineering manager and was directed to the developer that owned the calendar. I explained the bug, my fix, and what I thought was needed for the other databases to work and they were to fix it. When I left the company six months later, my fix still wasn’t applied and none of the other databases had been fixed. All because the person who owned the calendar code didn’t bother to follow through.
Having a branch called tswicegood/fix/new-calendar-query gives the impression that I now own the new calendar fix. Removing the signature from that is a small step toward increasing the team ownership of a code base and removing the temptation to think of that feature as your own.
Managing Bugfixes
So what about bugs? You want the bug fix to originate as close to the originally released code as possible. To do this, create the branch directly from the tag, bump the version number, then work on your fix. For example, let’s say you’ve found a bug in v1.2.0 that you need to fix.
$ git checkout -b v1.2.1-prep v1.2.0
... adjust version number to v1.2.1alpha, then commit
The -b v1.2.1-prep tells Git to create a branch with that name, then check it out. The v1.2.0 at the end tells Git to use that as the starting point for the branch. The next commit adjusts the version number so anything you build from this branch is going to be the alpha version of the bug fix. With that bookkeeping out of the way, you’re ready to fix the code.
For projects that have a robust test suite (which unfortunately isn’t all of them, even mine), the very next commit should be a failing test case by itself. Even when you know the fix to make the test pass, you should create this commit so there’s a single point in the history that you and other developers can check out and run the tests to see the failure. The next commit then shows the actual code that makes the test pass again.
Once the fix has been tested and is ready for release, it’s time to merge back in to master. You should do this with --no-ff and --no-commit, and remove the alpha suffix before committing, just like making a feature release.
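Put together, a bug-fix release might look something like this sketch. It assumes master is still sitting at the v1.2.0 tag when the fix is merged; the VERSION and app.txt files are hypothetical stand-ins for real project files.

```shell
# Throwaway repository with a released v1.2.0 that contains a bug.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "1.2.0" > VERSION
echo "buggy" > app.txt
git add VERSION app.txt
git commit -q -m "Release v1.2.0"
git tag v1.2.0

# Branch from the tag and bump the version before touching any code.
git checkout -q -b v1.2.1-prep v1.2.0
echo "1.2.1alpha" > VERSION
git commit -q -am "Bump version to 1.2.1alpha"
echo "fixed" > app.txt
git commit -q -am "Fix the bug"

# Release the fix the same way as a feature release.
git checkout -q master
git merge -q --no-ff --no-commit v1.2.1-prep
echo "1.2.1" > VERSION        # remove the alpha suffix before committing
git add VERSION
git commit -q -m "Release v1.2.1"
git tag v1.2.1
```

Even though Git could fast-forward master here, --no-ff still forces a merge commit, so the release remains a single commit that carries the final version number.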
Once you’ve merged and tagged the code, you need to get develop up-to-date with the bug fix. Since master and develop have now diverged (remember, develop has at least one commit bumping the version number), you have to deal with a merge conflict.
Hopefully, the merge conflict is limited to the version number. If that’s the case, you can just tell git merge to ignore those changes with this command:
$ git checkout develop
$ git merge -X ours master
The -X option tells git merge which strategy option to use when merging, and using ours tells it that, wherever the two sides conflict, the code in the branch you’re merging into wins.
You need to be careful with this, however. It means that any real conflicts
would be swallowed up. Hopefully you know the changes well enough to realize
if there’s a larger conflict, but if for some reason you don’t know, you can
always try this approach:
$ git merge master
… ensure that the only conflicts are around the version
… numbers and that the develop branch code should be used
$ git reset --hard ORIG_HEAD
$ git merge -X ours master
You’ll have to manage any merge conflicts manually (or use git mergetool) if the conflicts are larger than the version number change. If you do confirm that you don’t need any of the conflicted changes, you can use git reset --hard ORIG_HEAD to reset the working tree back to its pre-merge state, then run git merge -X ours master to pull the changes in while ignoring the conflicting changes from master.
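The check-then-merge sequence can be sketched end to end like this. Two things here are my own substitutions rather than part of the steps above: git diff --name-only --diff-filter=U is used to list the conflicted files, and git merge --abort stands in for git reset --hard ORIG_HEAD to back out of the trial merge. The files and version numbers are hypothetical.

```shell
# Throwaway repository: develop is on 1.3.0alpha, master has just released v1.2.1.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "1.2.0" > VERSION
echo "buggy" > app.txt
git add VERSION app.txt
git commit -q -m "Release v1.2.0"
git tag v1.2.0
git checkout -q -b develop
echo "1.3.0alpha" > VERSION
git commit -q -am "Bump version to 1.3.0alpha"
git checkout -q master
echo "1.2.1" > VERSION
echo "fixed" > app.txt
git commit -q -am "Release v1.2.1"   # stand-in for the tagged bug-fix merge
git tag v1.2.1

# Trial merge: expected to stop so the conflicts can be inspected.
git checkout -q develop
git merge master || echo "merge stopped on conflicts, as expected"
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted files: $conflicted"

# Only the version file conflicts, so back out and let develop win those hunks.
git merge --abort
git merge -q --no-edit -X ours master
```

After the final merge, develop keeps its 1.3.0alpha version number but still picks up the fix from master, because -X ours only applies to the hunks that actually conflict.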
On develop versus master
I’ve gone back and forth on this. My preference is to release often. Sometimes multiple times a day. In that case, master is just a quick staging ground. Create a branch, bump the version, write one feature, merge it, bump the version number, rinse, then repeat.
There are a few problems with this approach. First, not every team, or for that matter every project, can work that way. Sometimes the code needs more testing across multiple platforms or configurations. Sometimes there’s an integration test suite that takes a while to run. Sometimes releases need to be timed to coincide with scheduled downtime, giving you time to implement a few features while waiting for your release window.
Second, it doesn’t scale. One branch that merges one feature is fine, but if you have a team of developers working on a project you probably have multiple things being worked on in parallel. Having them all branch off master, all bump their version number, and all coordinate for an octopus merge (or merge and release separately) is a nightmare.
Having everyone branch and merge off of develop provides a base that keeps in sync with the rest of your code base. Your feature branch exists by itself, and all it needs to do to stay in sync is occasionally merge develop.
Compared to git-flow
This is very similar to the workflow called git-flow. There are a few differences.
If my memory serves, it used to call for branch names with the author’s name in them (a re-reading of it now doesn’t show that though). That’s what remote repositories are for, so I don’t want to use that.
Correction: nvie just confirmed that it’s never been there, so one of my biggest gripes with it wasn’t founded. Oops. :-/
Next, hot fixes or bug fixes in git-flow are merged to master and develop instead of only master. I want the versions going through master then back out to develop. To me, it’s a cleaner conceptual model.
Versions, a thing I’ve written about, are important. I want develop to be installable, but I don’t want it confused with any released version. There should only be one commit, a tagged commit at that, in each repository that can be built for any given version.
I don’t call out release branches in my description because my hope is that they aren’t necessary. Of course, if your project has a long QA cycle that’s independent of development or you’re trying to chase down a stray bug or two before a release, then a release branch is great; I just don’t make them required.
In Closing
The most important thing is to create some process for how code moves through your repository, document it, and stick to it. Everyone always committing directly to master is not sustainable. It also makes it much harder to revert changes if something makes it in by accident as you have to go find all the relevant commits instead of reverting one merge commit.
Worse than a free-for-all in master is the hybrid. Committing some of the time directly to master and other times to a feature branch means there’s no pattern to how your code is used. What’s the threshold for creating a feature branch? Is it based on how big the feature is, or how long it’s going to take? Answering these questions distracts you and future contributors. Providing a solid pattern of how contributions flow through your repository is an important step in making your project more accessible to fellow contributors, regardless of whether those are in the open-source community or an office down the hall.
Some of the things outlined here might seem like a lot of overhead, but in the end they save you time. Most importantly, they’ll scale beyond just you.