About a year and a half ago I started looking for my next thing in a post-Tribune world, and my first email was to my friend Peter Wang. A few months after we closed down Quickie Pickie talking about the future of Continuum Analytics and data science, I joined as the Web and UX Architect. During my time there I’ve had the opportunity to contribute to almost every product with a UI that the company ships. Tools like Conda and Bokeh are changing the way people deal with packaging and visualization. Under Peter and Travis’ leadership I’m sure the brain trust assembled at Continuum will continue to redefine the space, but an opportunity has come up that I can’t pass up.
I was once asked in an interview to give advice to people starting in data journalism. I said, “become an expert, then start over.” I’m taking my own advice. I’m not starting over completely, but I am stepping out of my comfort zone. Starting at the end of June, I’m leaving the world of programming and design to become the Campus Director of The Iron Yard in Austin.
The team at TIY is full of some great people (including my good friend SamKap) and is doing something really important: providing an alternate route to becoming a professional programmer or designer. To say I’m stoked is an understatement. I’m sure I’ll have plenty to say over the coming months, but for now I’ll leave it at this: see ya in Austin in a couple weeks!
I’ve been toying with my Git workflow the past year at Continuum and have
come up with a good workflow for handling semantically versioned software
inside Git. This post is my attempt to catalog what I’m doing.
Here’s the TL;DR version:
master always contains tagged releases
Code gets merged back into develop before master; all work happens in
feature branches off of develop
Bug fixes are handled in branches created from tags and merged directly back
into master, then master is merged to develop.
That’s the high level overview. Below is that information in more depth.
master of code
The master branch always contains the latest released code. At any time,
you can checkout that branch, build it, install it, and know that it was the
same code you would have gotten had you installed it via npm, pypi, or conda.
Merges into master are always done with --no-ff and --no-commit. The
--no-ff ensures a merge commit so you can revert the commit if you ever need
to. Using --no-commit gives you a chance to adjust the version numbers in
the appropriate metadata files (conda recipe, setup.py, package.json, and
so on) to reflect the new version before committing. For most of my
releases, I’m simply removing the alpha suffix from the version number.
There should only be one commit in the repository for any given version
number, and every commit that’s in master is considered to be released. Keep
in mind that this means you can’t use GitHub’s built-in Merge Pull Request
functionality for releases, but that’s ok by me. You have to go to the command
line to tag anyhow.
With the appropriate changes for versions, the next step is to create the
commit and then tag it as vX.Y.Z immediately. From there, you build the
packages and upload them or kick off your deployment tools and the code with
the new version is distributed.
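The version bookkeeping in that release step is easy to get wrong by hand. Here is a minimal Python sketch of it, assuming develop carries an "alpha" suffix like 0.8.0alpha. release_version and tag_name are hypothetical helpers, not part of Git or any packaging tool, and they only compute strings; you still edit setup.py, package.json, or the conda recipe yourself.

```python
import re

# Assumption: develop carries an "alpha" suffix, e.g. "0.8.0alpha".
PRE_RELEASE_SUFFIX = "alpha"


def release_version(develop_version: str) -> str:
    """Return the version to commit on master: the develop version minus its suffix."""
    if develop_version.endswith(PRE_RELEASE_SUFFIX):
        return develop_version[: -len(PRE_RELEASE_SUFFIX)]
    return develop_version


def tag_name(version: str) -> str:
    """Build the vX.Y.Z tag, refusing anything that still looks like a pre-release."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"not a release version: {version!r}")
    return f"v{version}"
```

For example, tag_name(release_version("0.8.0alpha")) gives "v0.8.0", while tag_name("0.8.0alpha") raises, which is a cheap guard against tagging a commit that still has the suffix.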
Managing Development with develop
Now you need to start working on the next feature release. All work happens in
the develop branch and it should have a new version number. The first thing
you should do is merge master in, then bump the version number to the next
minor release with a suffix of some sort. I use alpha, but you can change
that as needed depending on your language / tools.
For example, I just released v0.8.0 of an internal tool for testing yesterday
(no, it’s not being used in production yet, thus the 0 major version).
Immediately after tagging the new version, I checked out develop, merged
master into it via a fast-forward merge, then bumped the version number to
v0.9.0alpha. Now, every commit from that point forward will be the next
version with the alpha suffix, so I can immediately see that it was built from develop.
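The post-release bump on develop can be sketched the same way. next_minor_alpha is a hypothetical helper; it assumes plain X.Y.Z versions and returns the number without the leading v, which is how it would usually appear in setup.py or package.json.

```python
def next_minor_alpha(released: str, suffix: str = "alpha") -> str:
    """Turn a just-released 'X.Y.Z' (or tag 'vX.Y.Z') into develop's 'X.(Y+1).0<suffix>'."""
    major, minor, _bugfix = (int(part) for part in released.lstrip("v").split("."))
    return f"{major}.{minor + 1}.0{suffix}"
```

With the example above, next_minor_alpha("v0.8.0") returns "0.9.0alpha".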
Everything is developed in branches. New features, refactoring, code cleanup,
and so on happen off of the develop branch; bug fixes happen in branches
created directly from the tagged release that the bug fix is being applied to.
Let’s deal with feature branches first; they’re more fun.
I’ve gotten into the habit of adding prefixes to my branch names. New
features have feature/ tacked on at the start, refactor/ is used whenever
the branch is solely based on refactoring code, and fix/ is used when I’m
fixing something. The prefixes provide a couple of benefits:
They communicate the intent of the branch to other developers. Reviewing a
new feature requires a slightly different mindset than reviewing a set of
changes meant solely to refactor code.
They help sort branches. With enough people working on a code base, we’ll
end up with a bunch of different types of changes in-flight at any given time.
Having prefixes lets me quickly sort what’s happening, where it’s happening,
and prioritize what I should be looking at. I generally don’t want any fix/
branches sitting around for very long.
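As a rough illustration of how the prefixes help with sorting, this hypothetical helper groups branch names (for example, the output of git branch --format='%(refname:short)') by prefix. Anything without one of the three prefixes lands in an "other" bucket.

```python
from collections import defaultdict

PREFIXES = ("feature/", "refactor/", "fix/")


def group_branches(names):
    """Map 'feature', 'refactor', 'fix', or 'other' to the branch names in each group."""
    groups = defaultdict(list)
    for name in names:
        prefix = next((p for p in PREFIXES if name.startswith(p)), "other")
        groups[prefix.rstrip("/")].append(name)
    return dict(groups)
```

Looking at group_branches(...)["fix"] first is one way to act on "I don’t want any fix/ branches sitting around".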
Some people like having the developer name in the branch as well to provide a
namespace. I can understand this approach, but I think it’s wrong. First, Git
is distributed, so if you truly need a namespace for your code to live where it
doesn’t interact with others’ code, create a new repository (or a fork).
The second, and much more important, reason I don’t like using names in
branches is that they promote code ownership. I’m all for taking ownership of
the codebase and particularly your changes. It’s part of being a
professional: own up to the code you created and all its flaws. What I’m not
for is fiefdoms in a codebase.
I worked at one company where I found a bug in the database interaction from
the calendar module. I fixed the bug in MySQL, but didn’t have the know-how to
fix the bug in the other databases. I talked to the engineering manager and
was directed to the developer that owned the calendar. I explained the bug, my
fix, and what I thought was needed for the other databases to work, and they
were supposed to fix it. When I left the company six months later, my fix still wasn’t
applied and none of the other databases had been fixed. All because the person
who owned the calendar code didn’t bother to follow through.
Having a branch called tswicegood/fix/new-calendar-query gives the impression
that I now own the new calendar fix. Removing the signature from that is a
small step toward increasing the team ownership of a code base and removing the
temptation to think of that feature as your own.
So what about bugs? You want the bug fix to originate as close to the
originally released code as possible. To do this, create the branch directly
from the tag, bump the version number, then work on your fix. For example,
let’s say you find a bug in v1.2.0 that you need to fix:
$ git checkout -b v1.2.1-prep v1.2.0
... adjust version number to v1.2.1alpha, then commit
The -b v1.2.1-prep tells Git to create a branch with that name, then check it
out. The v1.2.0 at the end tells Git to use that as the starting point for
the branch. The next commit adjusts the version number so anything you build
from this branch is going to be the alpha version of the bug fix. With that
bookkeeping out of the way, you’re ready to fix the code.
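The naming in that example follows a simple rule: take the tag, bump the bugfix number, and add -prep for the branch or alpha for the version. A minimal sketch of that rule, with bugfix_prep as a hypothetical helper and tags assumed to be plain vX.Y.Z (the returned version leaves off the v, as it would appear in a metadata file):

```python
from typing import Tuple


def bugfix_prep(tag: str, suffix: str = "alpha") -> Tuple[str, str]:
    """From a release tag like 'v1.2.0', return the prep branch name and its alpha version."""
    major, minor, bugfix = (int(part) for part in tag.lstrip("v").split("."))
    next_version = f"{major}.{minor}.{bugfix + 1}"
    return f"v{next_version}-prep", f"{next_version}{suffix}"
```

bugfix_prep("v1.2.0") gives ("v1.2.1-prep", "1.2.1alpha"), matching the branch name passed to git checkout -b above.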
For projects that have a robust test suite (which unfortunately isn’t all of
them, mine included), the very next commit should be a failing test case by itself.
Even when you know the fix to make the test pass, you should create this commit
so there’s a single point in the history that you and other developers can
check out and run the tests to see the failure. The next commit then shows the
actual code that makes the test pass again.
Once the fix has been tested and is ready for release, it’s time to merge back
into master. You should do this with --no-ff and --no-commit and
remove the alpha suffix before committing, just like making a feature release.
Once you’ve merged and tagged the code, you need to get develop up-to-date
with the bug fix. Since master and develop have now diverged — remember,
develop has at least one commit bumping the version number — you have to deal
with a merge conflict.
Hopefully, the merge conflict is limited to the version number. If that’s the
case, you can just tell git merge to keep develop’s side of those conflicts with this:
$ git checkout develop
$ git merge -X ours master
The -X option passes a strategy option to git merge, and ours tells it that,
wherever the two sides conflict, the code in the branch you’re merging into wins.
Changes from master that don’t conflict are still merged in.
You need to be careful with this, however. It means that any real conflicts
would be swallowed up. Hopefully you know the changes well enough to realize
if there’s a larger conflict, but if for some reason you don’t know, you can
always try this approach:
$ git merge master
... ensure that the only conflicts are around the version numbers and that the develop branch code should be used
$ git reset --hard ORIG_HEAD
$ git merge -X ours master
You’ll have to manage any merge conflicts manually (or use git mergetool) if
the conflicts are larger than the version number change. If you do confirm
that you don’t need any of the conflicted changes, you can use git reset
--hard ORIG_HEAD to reset the working tree back to its pre-merge state, then
run git merge -X ours master to pull the changes in, resolving the conflicts in develop’s favor.
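Before reaching for -X ours, it can help to check each conflicted file (git diff --name-only --diff-filter=U lists them) mechanically. This is a hypothetical checker, not a Git feature: it walks the conflict markers and reports whether every conflicted line looks like a version assignment. The version-line pattern is an assumption covering setup.py, package.json, and conda meta.yaml styles; adjust it for your own metadata files.

```python
import re

# Assumption: a "version line" looks like version='0.9.0alpha' (setup.py),
# "version": "1.2.1" (package.json), or version: "1.2.1" (meta.yaml).
VERSION_LINE = re.compile(r"""version["']?\s*[=:]\s*["']?\d+\.\d+\.\d+""", re.IGNORECASE)


def conflicts_only_touch_versions(text: str) -> bool:
    """True if every non-blank line inside conflict markers is a version line."""
    in_conflict = False
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            in_conflict = True
        elif line.startswith(">>>>>>>"):
            in_conflict = False
        elif line.startswith(("=======", "|||||||")):
            continue  # side separator or diff3 base marker
        elif in_conflict and line.strip() and not VERSION_LINE.search(line):
            return False
    return True
```

If it returns False for any file, resolve that file by hand (or with git mergetool) instead of using -X ours.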
On develop versus master
I’ve gone back and forth on this. My preference is to release often.
Sometimes multiple times a day. In that case, master is just a quick staging
ground. Create a branch, bump the version, write one feature, merge it, bump
the version number, rinse, then repeat.
There are a few problems with this approach. First, not every team, or for that
matter every project, can work that way. Sometimes the code needs more testing across
multiple platforms or configurations. Sometimes there’s an integration test
suite that takes a while to run. Sometimes releases need to be timed to
coincide with scheduled downtime giving you time to implement a few features
while waiting for your release window.
Second, it doesn’t scale. One branch that merges one feature is fine, but if
you have a team of developers working on a project you probably have multiple
things being worked on in parallel. Having them all branch off master, all
bump their version number, and all coordinate for an octopus merge (or merge
and release separately) is a nightmare.
Having everyone branch and merge off of develop provides a base that keeps in
sync with the rest of your code base. Your feature branch exists by itself,
and all it needs to do to stay in sync is occasionally merge develop.
Compared to git-flow
This is very similar to the workflow called git-flow. There are a few
differences, though.
If my memory serves, it used to call for branch names with the author’s name
in it (a re-reading of it now doesn’t show that though). That’s what remote
repositories are for, so I don’t want to use that.
Correction: nvie just confirmed that it’s never been there, so one of my
biggest gripes with it was unfounded. Oops. :-/
Next, hot fixes or bug fixes in git-flow are merged to master and develop
instead of only master. I want the versions going through master then back
out to develop. To me, it’s a cleaner conceptual model.
Versions, a thing I’ve written about, are important. I want
develop to be installable, but I don’t want it confused with any released
version. There should only be one commit, a tagged commit at that, in each
repository that can be built for any given version.
I don’t call out release branches in my description because my hope is that
they aren’t necessary. Of course, if your project has a long QA cycle that’s
independent of development or you’re trying to chase down a stray bug or two
before a release, then a release branch is great, I just don’t make them
The most important thing is to create some process for how code moves through
your repository, document it, and stick to it. Everyone always committing
directly to master is not sustainable. It also makes it much harder to
revert changes if something makes it in by accident as you have to go find all
the relevant commits instead of reverting one merge commit.
Worse than a free-for-all in master is the hybrid. Committing some of the
time directly to master and other times to a feature branch means there’s no
pattern to how your code is used. What’s the threshold for creating a feature
branch? Is it based on how big the feature is, or how long it’s going to take?
Answering these questions distracts you and future contributors. Providing a
solid pattern of how contributions flow through your repository is an important
step in making your project more accessible to fellow contributors regardless
of whether those are in the open-source community or an office down the hall.
Some of the things outlined here might seem like a lot of overhead, but in the
end they save you time. Most importantly, they’ll scale beyond just you.
What version of Chrome are you using? Beyond the major version number, what
version of your operating system are you on? If you deploy to Linux,
what version is your Linux kernel?
My answer to those questions: I don’t know. Or didn’t. I just checked and I’m
on version 42.0.2311.39 beta for Chrome, 10.10.2 for OS X, and
3.16.7-tinycore64 for my Docker VM I use for testing images. My life isn’t
better for knowing that information, though.
The same is true for most of the software you create. The version number
doesn’t matter, but to this day software developers don’t want to mark their
software as version 1.0. 1.0 carries a lot of weight. To a lot of developers
it means you’re done. It means you’re confident in it. It means things aren’t
going to drastically change.
The Python community is afraid of 1.0. The only explanation I can come up with
is that it’s the largest case of collective impostor syndrome I’ve ever seen.
Don’t believe me? There are 61,564 Python packages that have been released
according to this page. Of those, 40,489 have a version number that
begins with 0. That’s two-thirds of the packages whose version numbers
tell me nothing.
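The two-thirds figure is simple arithmetic: 40,489 / 61,564 is about 0.66. A small sketch of the same calculation over a list of version strings, assuming you already have them (collecting them from PyPI isn't shown, and zero_major_share is a hypothetical name):

```python
def zero_major_share(versions):
    """Fraction of version strings whose major component is 0."""
    if not versions:
        return 0.0
    zero_major = sum(1 for v in versions if v.lstrip("v").split(".")[0] == "0")
    return zero_major / len(versions)
```

For instance, zero_major_share(["0.11", "0.10.1", "1.0.0", "2.3"]) is 0.5; note that it says nothing about which of the 0.x packages are actually stable.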
For example, is virtual-touchpad more stable than Werkzeug? The former
is at version 0.11 while the latter is only at 0.10.1. Of course, Werkzeug is
almost certainly more stable. The download numbers seem to tell me that with
its more than 20,000 downloads in the last day. Werkzeug runs a huge chunk of
the web that’s powered by Python. Flask doesn’t exist without it.
Statements like the one in the previous paragraph that begin with “of course”,
however, are only obvious with the correct reference. If you’re coming from
a world outside of the Python community, you don’t have that reference.
Versions begin at 0.x. Anything in 0.x hasn’t been deployed anywhere and
you’re still turning it into something useful. You make no guarantees about
The first code that’s used in production is 1.0.0. Production means it’s
being used and not just written.
Versions follow Major.Minor.Bugfix.
Major version numbers are for backward-incompatible changes. If this number
changes, the API has changed and it means that code written against the old
version won’t work with the new version in at least one case.
Minor versions are for new features. Nothing should break between versions
1.0.0 and 1.1.0 or 1.101.0.
Bugfix versions are for bugfixes. No new features are added here, just
corrections to code to make sure it does what it’s supposed to.
It’s really that simple. When I install your software package at version 1.2.0
I know that I can run anything before version 2.0.0 and it should all continue to work.
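Those rules are mechanical enough to write down. Below is a minimal sketch, assuming strict three-part versions; Version and is_compatible are hypothetical names, not a packaging library's API. Treating every 0.x release as compatible only with itself is one reading of the "no guarantees" rule for 0.x, and requiring candidate >= installed reflects that code written against 1.2.0 may use features added in 1.2.0.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    bugfix: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, bugfix = (int(part) for part in text.lstrip("v").split("."))
        return cls(major, minor, bugfix)

    def bump(self, kind: str) -> "Version":
        if kind == "major":  # incompatible API change
            return Version(self.major + 1, 0, 0)
        if kind == "minor":  # new, backward-compatible feature
            return Version(self.major, self.minor + 1, 0)
        if kind == "bugfix":  # corrections only
            return Version(self.major, self.minor, self.bugfix + 1)
        raise ValueError(f"unknown bump kind: {kind!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.bugfix}"


def is_compatible(installed: Version, candidate: Version) -> bool:
    """Can code written against `installed` run on `candidate`?"""
    if installed.major == 0:
        return candidate == installed  # 0.x: no guarantees
    # Same major version, and not older than what the code was written against.
    return candidate.major == installed.major and candidate >= installed
```

So is_compatible(Version.parse("1.2.0"), Version.parse("1.101.0")) is True, while 2.0.0 is not.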
There are some devils hiding in the details. For example, how many back
versions do you support? If you find a bug in version 1.3.0 that was present
all the way back to 1.0.0, do you patch versions 1.0.x, 1.1.x, and 1.2.x as
well? Does each new feature mean a minor version bump?
That’s up to you as a maintainer. There are no right answers to those
questions: the main point is to make sure that code that works in one release
doesn’t break in the next. If it does, and sometimes it needs to, bump the
major version number.
Also, it’s ok to break. SemVer gives you the opportunity to convey to the
users of your code that something needed to change in ways that weren’t compatible
with the previous code.
To the Python Community
Please consider adopting SemVer. What’s stopping you? Is it because you don’t
think your code is ready to be called 1.0? I promise you, it is. It’s
All I want is for you to quit worrying about getting it perfect. Get it close
to right, make it so people can use it. Then release it. If you get something
wrong or need to fundamentally change the API, do it, but bump the major
version number so everyone knows at what point their code might not just work™.
Software is just that: soft. It can, and should change. Don’t be afraid of
v1.0 or v2.0 or v20.0.
This past fall a (new) good friend offered to marry Brandi and me as we traveled
to Terlingua to share our vows with each other, our families, and close
friends. As Sharron prepared, she asked for a favorite author or two of each
of ours so she could find a quote to use at the ceremony.
Few things will make you question your reading habits more than
marrying a professional writer and being asked who your favorite author is. I
read a ton, but have had very few authors who are my go-to when looking for
inspiration. I’m also horrible with specifics. I remember general themes, but
things like names don’t stick with me. Since I drew a blank on inspiring
writers, I went with my gut: Terry Pratchett.
Regardless of where I’ve been in life the past handful of years after
discovering him, I’ve reached for Terry Pratchett’s books as my release of the
previous day’s activities. It’s been the thing that lets the energy expended
or pent up during the day relax into a soothing sleep. His humor and view of
the world are calming.
I told Sharron that Pratchett was my favorite author, not expecting her to find
much of anything. His humor is great, I knew that. But something that would
fit in a wedding? That’s a different story.
The day before the wedding, we arrived and she told us what she had found:
Why do you go away? So that you can come back. So that you can see the place
you came from with new eyes and extra colors. And the people there see you
differently, too. Coming back to where you started is not the same as never leaving.
Having just left Austin, having just left my family and friends, having grown
up as a rolling stone, and having returned to a place dear to my heart for this
special occasion, this quote carried special meaning for me.
It’s been with me ever since, and even more so this last 24 hours. #RIPTerryPratchett
Writing usable, functioning code can be hard enough. Now imagine writing code
that is flexible enough for other developers to extend without
simply copy-n-pasting your source code and making their own modifications. That
can be rough. There are some patterns that you occasionally find in frameworks
like Django, however, that I haven’t seen documented. This morning, I
contributed a bugfix to Werkzeug based on a pattern I’ve seen before.
I’m calling it the kwargs helper method.
You have a method that returns an object or the result of a function, both of
which are variable. Through other parts of your code, other developers can
change what your function will return. Examples of this include Django’s
ListView.get_queryset, and Werkzeug’s Rule.empty (as of v0.10).
You need to allow other developers to control what gets passed into the objects
and functions as they’re called. Without such a mechanism, developers are
forced to override the entire method and in the worst case re-implement part of
your code. I want you to stop that.
Example of Problem Code
Here’s a contrived example of the code in question.
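A minimal Sheep consistent with the description below, with a single hypothetical name attribute and a clone method that builds the copy through type(self):

```python
class Sheep(object):
    def __init__(self, name=None):
        self.name = name  # hypothetical attribute, just to have something to copy

    def clone(self):
        # type(self) is Sheep here, but it is the subclass when called on one.
        return type(self)(name=self.name)
```

Cloning a plain Sheep works fine; the trouble only starts when a subclass adds its own constructor arguments.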
Note the type(self) call here. That returns Sheep in this example, but when
clone is called on a subclass it returns the subclass. So when we create a
BionicSheep like the one below, we have a problem:
class BionicSheep(Sheep):
    def __init__(self, turbo_legs=None, **kwargs):
        self.turbo_legs = turbo_legs
        super(BionicSheep, self).__init__(**kwargs)
    # what to do about cloning?
At this point, BionicSheep is broken if you try to clone it. The clone
method won’t pass in the turbo_legs value. You now have two options:
copy-n-paste the whole clone method to remove Sheep.clone from the equation
entirely or call super to get the result of Sheep.clone, then add your own
values and duplicate the assignment in __init__. The latter option isn’t
horrible in this case, but if __init__ provided different functionality based
on that kwarg you would be forced to copy-n-paste clone and provide your own implementation.
The solution is to provide a helper method that provides the kwargs outside of
the actual function. I’m calling this pattern the kwargs helper method. This
provides a granular hook for other developers to change the arguments that are
provided without having to override the main method and possibly duplicate code.
To use this pattern, modify Sheep.clone so that it gets the keyword arguments for the new instance from a separate helper method.
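Applied to the Sheep example, the pattern might look like the following sketch. get_clone_kwargs is a hypothetical name for the helper; the point is that clone asks a separate, overridable method for its keyword arguments, so BionicSheep only has to add turbo_legs instead of re-implementing clone.

```python
class Sheep(object):
    def __init__(self, name=None):
        self.name = name

    def get_clone_kwargs(self):
        """Kwargs helper: everything clone() passes to the new instance."""
        return {"name": self.name}

    def clone(self):
        return type(self)(**self.get_clone_kwargs())


class BionicSheep(Sheep):
    def __init__(self, turbo_legs=None, **kwargs):
        self.turbo_legs = turbo_legs
        super(BionicSheep, self).__init__(**kwargs)

    def get_clone_kwargs(self):
        # Extend the parent's kwargs rather than duplicating clone().
        kwargs = super(BionicSheep, self).get_clone_kwargs()
        kwargs["turbo_legs"] = self.turbo_legs
        return kwargs
```

BionicSheep(turbo_legs=4, name="Dolly").clone() now keeps both values, and any future subclass can hook in at the same spot.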
I started by saying that writing functioning code is hard. Making sure your
code is extensible for every other developer to use is ten times harder. Think
not only about what each line of code in your codebase does, but also how
it’s used and extended. I promise, some developer somewhere is going to want to
change just about every line of your code. Be nice and make it easy on them.
This morning I read an article on what the ideal operating system should
look like. I devoured all three parts, and it got me thinking
about my thought process and how I approach development. This post is a loose
collection of those thoughts.
One thing that I’ve discovered about my thought process is how I approach
problems. Too many times, it’s easiest to start from where I am right now and
how I can modify the existing tool / code / product to do what I need. This
provides a good starting point for context of what’s immediately possible, but
not for solving the problem.
For example, let’s consider the text editor. The main purpose of a text editor
is writing things down. You want to be extremely good at that if you’re going
to be an editor that people want to use. Based on this description you can
build an editor that’s a joy to use and makes the process of getting
information into the editor easy and intuitive. There’s a problem with it:
what happens when a user is done with a new document that they’ve created? My
original description did not include anything about saving or exporting the
documents that are created.
Realizing that you’ve left saving out as a feature, you might write up a
job story that looks something like this:
When writing a story I want to ensure that it’s been saved so that I can
share the saved document with other people.
If you start from where you are, you might think to add a Save feature and tie
that to a menu item, a keyboard shortcut, and maybe even a toolbar to provide
multiple options to your user. This is a valid concern, but it overlooks one
key thing. The user doesn’t care about saving, they just want it saved.
The user’s job is to write, not to save something. Explicitly saving
something is a task. Users aren’t interested in performing a task unless they
have to. Auto-save is what the user needs. At this point in the process the
only thing they need to know is that their work is saved. Instead of focusing
on the job at hand and how this feature supports that job, adding a Save
feature focuses on the task.
I’ve fallen victim to thinking that focuses on the task instead of focusing on
the overall job, but I guard against it now. This causes me to think
differently than a lot of developers: rather than focus on fixing one
particular thing, I focus on what the underlying (or overarching?) problem or
job is. This means I talk past people sometimes because I forget that we’re
talking about different things.
How to fix a problem
On a recent open source project that I work on, I opened a pull request that
introduces a new higher level concept to the project in the service of fixing
one discrete bug. To me, the discrete bug was a manifestation of the lack of
that higher level structure. Without that common vocabulary, different parts
of the code were touched by different developers at different times and there
was a discrepancy between how the concept was represented.
To me, that larger problem was what needed fixing. To other developers, the
bug needed fixing. Thinking about that larger problem, I tackled that and
fixed the bug. Another developer on the project focused on the explicit
problem and added the one-line fix to that code path that solved that one bug
that manifested itself. On the surface, the one-line fix seems simpler because
less code was involved (my fix was a little more than 30 lines). The one-line
solution was only simpler when viewed as the task “fix this bug” not “fix the
problem that gave rise to this bug.”
To be fair, both are legitimate ways to approach the problem. The one-line fix
that focuses on the task at hand fixes the bug and avoids possible
over-engineering that might happen by thinking about the bigger picture. It
also runs the risk of having the same problem solved in different ways
throughout the code base as each “just one-line” fix adds another branch into
the complexity of the program.
Thinking like a developer vs like a designer
This all ties back to the story that started this post because of the way
the problem was approached. Most developers I know would balk at the idea of
creating an operating system, then starting by removing the file system and
applications. “But where will I store my files and how will I access them?!” I
hear them all exclaim at once. Most designers I know would hear that idea,
think for a second, then say “ok, so what replaces it?” followed closely by
“and what was the user trying to do when they accessed those files?”
Designers tend to think in terms of solutions to general problems. Developers
tend to think in terms of solutions to explicit problems. This is still a
nascent revelation to me, but it starts to explain why I’ve always felt
slightly out of place in the development world.
It’s also making me question my description: am I still a developer with a bit
of design knowledge or a designer that happens to program?
While listening to @pragdave talk about Elixir’s pipes, I heard him describe
how these two styles, while fundamentally the same, have vastly different
Try to explain that line of code to someone who doesn’t program. You start by
telling them to just skip over everything until they hit the center, that’s the
starting point. Then, you work your way back out, with each new function adding
one more layer of functionality.
As programmers, we’ve taught ourselves how to read that way, but it isn’t
natural. Consider this pseudo code:
"cat" | list | sorted | join
This code only requires that you explain what | does; then it goes naturally
from one step to the next, and the final result should be the joined string.
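Python has no pipe operator for functions, but a tiny helper gives the same left-to-right reading. pipe is a hypothetical helper built on functools.reduce; the nested version shows the inside-out reading for comparison. Both produce "act".

```python
from functools import reduce


def pipe(value, *steps):
    """Feed value through each step left to right, like "cat" | list | sorted | join."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Inside-out: start at "cat" in the middle and work your way back out.
nested = "".join(sorted(list("cat")))

# Left-to-right: each step reads in the order it runs.
piped = pipe("cat", list, sorted, "".join)
```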
Seeing that code example got me thinking about some of the discussions I’ve had
with new programmers as I explain how Django works. I start explaining the
view, to which I’m almost always asked “ok, how does the request know what view
to execute?” I follow this up by moving over to URL route configuration.
After that’s explained, I’m asked “ok, so how do requests come in and get
passed through that?” And this goes on, until we’re standing on top of 20
turtles looking down at the simple Hello World we wrote.
In that vein, what would a web framework look like that started with the
premise that a regular non-programmer should be able to read it? Here’s an
So, you define an application function that takes a request; that request
is then run through a get function with a route, and if that matches, it is
passed off to a final function that does something to generate a response.
To that end, I’ve hacked up this simple script that uses Werkzeug to
do a simple dispatch. The implementation is a little odd and would need to be
cleaned up to actually be useful, but I think I could be on to something. Just
imagine this syntax:
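A stdlib-only sketch of what that syntax could look like (it does not use Werkzeug and is not the original script). application, get, require_login, and display_admin are names taken from the surrounding text; Request, first, mount, and hello are hypothetical. Each handler takes a request and returns a response string, or None when it doesn't apply so the next handler can try.

```python
from typing import Callable, NamedTuple, Optional


class Request(NamedTuple):
    method: str
    path: str
    user: Optional[str] = None


Handler = Callable[[Request], Optional[str]]


def get(route: str, handler: Handler) -> Handler:
    """Pass GET requests for exactly `route` to handler; ignore everything else."""
    def dispatch(request: Request) -> Optional[str]:
        if request.method == "GET" and request.path == route:
            return handler(request)
        return None
    return dispatch


def first(*handlers: Handler) -> Handler:
    """Try each handler in order and return the first response."""
    def dispatch(request: Request) -> Optional[str]:
        for handler in handlers:
            response = handler(request)
            if response is not None:
                return response
        return None
    return dispatch


def mount(prefix: str, app: Handler) -> Handler:
    """Run a whole sub-application under prefix, with the prefix stripped."""
    def dispatch(request: Request) -> Optional[str]:
        if request.path == prefix or request.path.startswith(prefix + "/"):
            rest = request.path[len(prefix):] or "/"
            return app(request._replace(path=rest))
        return None
    return dispatch


def require_login(app: Handler) -> Handler:
    def dispatch(request: Request) -> Optional[str]:
        if request.user is None:
            return "Please log in"  # return early; app never runs
        return app(request)
    return dispatch


def hello(request: Request) -> str:
    return "Hello World"


# display_admin is itself an application, "mounted" on /admin below.
display_admin = first(
    get("/", lambda request: "Admin home"),
    get("/users", lambda request: f"Users (viewed by {request.user})"),
)

application = first(
    get("/", hello),
    mount("/admin", require_login(display_admin)),
)
```

application(Request("GET", "/admin/users", user="travis")) reaches display_admin with a path of /users, while the same request without a user stops at require_login.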
At this point, require_login can return early if you’re not logged in, and
display_admin could repeat the entire application style and be “mounted” on
top of the /admin route and respond to request.path that is slightly
For over a year, I’ve been told I should check out Docker. Chris Chang and
Noah Seger at the Tribune were both big proponents. They were so excited that I
always felt like I was missing something since I didn’t get it, but I hadn’t
had the time to really dig into it until the last few weeks.
After my initial glance at it, I couldn’t see how it was better than or different from
using Vagrant and a virtual machine. Over the last few weeks I’ve started
dipping my toes in the Docker waters and now I’m starting to understand what
the big deal is about.
Docker versus VM
I’ve been a longtime fan of Vagrant as a way to quickly orchestrate virtual
machines. That fits my brain. It’s a server that’s run like any other box,
just not on existing hardware. Docker goes a different route by being more
about applications, regardless of the underlying OS. For example, let’s talk
about my npm-cache.
Using this blog post as a base, I wanted to create an easily
deployable nginx instance that would serve as a cache for npmjs.org. The
normal route for this is to get nginx installed on a server and set it up with
the right configuration. You could also add it to an existing nginx server if
you have one running.
Docker views something like this npm-cache less as the pieces of that
infrastructure (nginx and the server it’s on) and more as an application unto
itself with an endpoint that you need to hit. It’s a subtle shift, but
important in a service-oriented world.
Docker has been described as Git for deployment, and there’s a
reason. Each step of a deployment is a commit unto itself that can be shared
and re-orchestrated into something bigger. For example, to start my
npm-cache, I started by using the official nginx container.
The nginx container can be configured by extending it and providing your own
configuration. I used the configuration from yammer, created a few empty
directories that are needed for the cache to work, then I was almost ready
to go. The configuration needed to know how to handle rewriting the responses
to point to the caching server.
Parameterizing a Container
This is where things got a little tricky for me as a Docker newbie. nginx
rewrites the responses from npm and replaces registry.npmjs.org with your
own host information. Starting the container I would know that information,
but inside the running container, where the information was needed, I wouldn’t
know unless I had a way to pass it in.
I managed this by creating a simple script called runner that checks for two
environment variables to be passed in: the required PORT and the optional
HOST value. HOST is optional because I know what it is for
boot2docker (what I use locally). PORT is required because you have to
tell Docker to bind to a specific port so you can control what nginx uses.
My runner script outputs information about whether those values are
available, exiting if PORT isn’t, modifies the /etc/nginx.conf file, then
starts nginx. The whole thing is less than 20 lines of code and could
probably be made shorter.
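A runner along those lines might look like this in Python; the original script's language and contents aren't shown, so treat this as a sketch. The __HOST__ and __PORT__ placeholders in the config and the DEFAULT_HOST value are assumptions. nginx -g 'daemon off;' keeps nginx in the foreground so the container stays up, and the image's entrypoint would call main().

```python
import os
import sys

# Path as written in the post; adjust to wherever your image keeps its config.
CONFIG_PATH = "/etc/nginx.conf"

# Hypothetical default used when HOST isn't passed in (e.g. a fixed local
# boot2docker address); replace with whatever your setup uses.
DEFAULT_HOST = "localhost"


def render_config(template: str, host: str, port: str) -> str:
    """Swap the (hypothetical) __HOST__/__PORT__ placeholders for real values."""
    return template.replace("__HOST__", host).replace("__PORT__", port)


def main() -> None:
    port = os.environ.get("PORT")  # required
    host = os.environ.get("HOST", DEFAULT_HOST)  # optional
    print(f"PORT={port or '(missing)'} HOST={host}")
    if not port:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("PORT is required, e.g. docker run -e PORT=8080 ...")

    with open(CONFIG_PATH) as f:
        config = render_config(f.read(), host, port)
    # Each container starts from the image's untouched file, so the
    # in-place rewrite happens once per container.
    with open(CONFIG_PATH, "w") as f:
        f.write(config)

    # Replace this process with nginx, kept in the foreground for Docker.
    os.execvp("nginx", ["nginx", "-g", "daemon off;"])
```

Passing the values in with docker run -e PORT=... -e HOST=... keeps the image itself generic.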
Deploying with Docker
I got all of this running locally, but then the thought occurred to me that
this shouldn’t be that hard to get running in the cloud. We use
Digital Ocean a lot at Continuum, so I decided to see what support
they have for Docker out-of-the-box. Turns out, you can
launch a server with Docker already configured and ready to run.
With that, deploying is ridiculously easy. I started a small box with Docker
installed, then used ssh to connect to the box, and ran the following
That's it! Including the network I/O to download the npm-cache, I spent less
than five minutes from start to finish getting this deployed on a remote server.
The best part: I can now use that server to deploy other infrastructure too!
Making deployment of a piece of infrastructure this easy is not a simple
problem. I’m sure there are all sorts of edge cases that I haven’t hit yet,
but kudos to the Docker team for making this so easy.
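The exact command isn't reproduced above, but passing PORT and HOST into a container generally comes down to a single docker run with -e flags. Here's a hypothetical Python sketch that builds such an invocation. The image name, the container name, and the assumption that nginx listens on the same PORT inside the container are all illustrative, not the command I actually ran.

```python
import shlex
import subprocess


def docker_run_args(image, port, host=None, name="npm-cache"):
    """Build a `docker run` argv that passes PORT (and HOST) to a container."""
    args = [
        "docker", "run", "-d",
        "--name", name,
        # Assumption: nginx listens on PORT inside the container, so the
        # same port is published on the host.
        "-p", "%d:%d" % (port, port),
        "-e", "PORT=%d" % port,
    ]
    if host:
        args += ["-e", "HOST=%s" % host]
    args.append(image)  # the image name must come after the options
    return args


def deploy(image, port, host=None):
    """Print and run the command; raises CalledProcessError on failure."""
    args = docker_run_args(image, port, host)
    print("Running: " + shlex.join(args))
    subprocess.run(args, check=True)
```

For example, deploy("example/npm-cache", 8080, host="203.0.113.10") would start the hypothetical image detached with both variables set.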
… we must begin by understanding that every place is given its character by
certain patterns of events that keep on happening there.
The above quote comes from the opening chapter of one of my favorite books of
all time, The Timeless Way of Building by Christopher Alexander. Alexander is
famed in programming circles as the author of A Pattern Language, which set
the stage for programming design patterns nearly two decades before the Gang
of Four wrote Design Patterns.
The Timeless Way is the lesser known of his two-volume set. It sets up his
pattern book by explaining why patterns are important. It is a more thorough
explanation of quality than Zen and the Art of Motorcycle Maintenance, minus
the personal account of a descent into madness, and it looks at quality
through the lens of architecture and places. It is on my list of must-read
books for anyone who takes themselves seriously as a programmer.
If you've ever had one of my code reviews, you've probably seen something like this:
All functions need two \n characters between them
Or this gem:
Syntax of 'key' : 'value' in dictionaries will raise a flag on pyflakes.
Best to avoid.
Both of these come from a commit message this past week for some simple
cleanup (code gardening, if you will). My change didn't affect what the code
did at all, but it did make the code more idiomatic Python. Pythonistas take
so much pride in a certain style that there's even a term for it: Pythonic.
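To make the first review comment concrete: PEP 8 asks for two blank lines around top-level function and class definitions. This is a simplified, hypothetical checker built on Python's ast module (the function name is mine, not from any tool). pep8 and flake8 do this properly, including comments and nested definitions, which this sketch ignores.

```python
import ast


def check_blank_lines(source):
    """Return (line, name, blanks) for top-level defs and classes that
    are not preceded by exactly two blank lines."""
    lines = source.splitlines()
    problems = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                 ast.ClassDef)):
            continue
        # A decorator, if present, is where the definition visually begins.
        first = min([d.lineno for d in node.decorator_list] + [node.lineno])
        i = first - 2  # 0-based index of the line just above
        blanks = 0
        while i >= 0 and not lines[i].strip():
            blanks += 1
            i -= 1
        # Skip definitions with nothing but blank lines above them.
        if i >= 0 and blanks != 2:
            problems.append((first, node.name, blanks))
    return problems
```

Run against a module where a function sits directly under its imports, it reports that function's line with zero blank lines found.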
The importance of these small changes is summed up in the opening quote from
this post. To paraphrase:
Things keep happening the way they happen.
By focusing on producing clean, readable, simple, uncomplicated code, you
create an environment where more clean, readable, simple, uncomplicated code
gets written.
Tools I Use
You can stop here if you're not interested in specific tools; otherwise, here
are a few things I use to help keep my code clean.
The editor I use the majority of the time is Sublime Text 3 (though I will
always have a soft spot in my heart for Vim). I start with these
language-specific settings for Python, which you can use by opening a .py
file, going to Sublime Text 3 > Preferences > Settings - More > Syntax
Specific - User, and copying this JSON blob into that file.
Beyond some basic settings that use spaces instead of tabs and set the tab
size correctly, the most important part of those settings is the rulers: two
lines displayed at characters 72 and 80 in every Python file I open.
Docblock comments in Python are supposed to stay within 72 characters per
line. This allows the docblock to be displayed indented in Python's built-in
help without wrapping to the next line. I try hard to ensure every docblock I
write stops before I hit that mark. The second line, at 80 characters, shows
the point where my Python code needs to stop.
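Here's a rough sketch of those two rulers expressed as a check, using the ast module to find docstrings. The limits follow PEP 8 (72 characters for docstrings, 79 for code, i.e. stopping before the 80 ruler). The function names are hypothetical, and real linters also apply the 72-character limit to comments, which this sketch skips.

```python
import ast

DOCSTRING_LIMIT = 72  # PEP 8 limit for docstrings
CODE_LIMIT = 79       # PEP 8 limit for code: stop before column 80


def docstring_line_numbers(tree):
    """Return the set of 1-based line numbers occupied by docstrings."""
    numbers = set()
    for node in ast.walk(tree):
        if not isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef,
                                 ast.AsyncFunctionDef)):
            continue
        body = node.body
        # A docstring is a string literal as the first statement.
        if (body and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)):
            numbers.update(range(body[0].lineno, body[0].end_lineno + 1))
    return numbers


def long_lines(source):
    """Return (line, length, limit) for each line past its ruler."""
    docstrings = docstring_line_numbers(ast.parse(source))
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        limit = DOCSTRING_LIMIT if number in docstrings else CODE_LIMIT
        if len(line) > limit:
            problems.append((number, len(line), limit))
    return problems
```

A one-line docstring that runs to 80 characters gets flagged against the 72 limit, while the same length of plain code only trips the 79 limit.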
I know many developers think that the 80-character limit is too restrictive.
"I have a big monitor," I hear you say. The optimal length for a line of text
is around 60 characters. Going much beyond that makes it harder for the human
brain to process what it's seeing without scanning back and forth. Plus,
enlarge your code enough that someone at a meet-up sitting 20 feet from the
screen can read it, then see how your 120-character lines hold up.
There's an even more practical consideration when thinking about line length.
Forcing this constraint on yourself makes you think really hard about the
most effective use of those characters. Is that line really best expressed
with an 80-character string in the middle, or can that string be hidden
behind a variable? Do all of those and conditions in your if statement
make your code more readable, or would an intent-revealing function help this
code? Constraints, even annoying ones, can really help hone your code design.
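A small, hypothetical illustration of the if statement point. The User class and its fields are invented for the example; the idea is only that naming each clause turns the condition into a statement of intent and keeps lines short.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical model used only for this example."""
    is_active: bool
    email_verified: bool
    plan: str
    seats_used: int
    seat_limit: int


def can_invite_before(user):
    # Before: four conditions the reader has to decode on the spot.
    if (user.is_active and user.email_verified
            and user.plan != "free" and user.seats_used < user.seat_limit):
        return True
    return False


def has_paid_plan(user):
    return user.plan != "free"


def has_open_seat(user):
    return user.seats_used < user.seat_limit


def can_invite(user):
    # After: each clause has a name that says what it means.
    return (user.is_active and user.email_verified
            and has_paid_plan(user) and has_open_seat(user))
```

Both versions return the same answers; the second one simply reads like the rule it implements.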
Next up, I use the Python Flake8 Lint plugin. This tool scans your code using
pyflakes and flags errors for you. Out of the box, it can be a little
annoying (especially when you're learning pep8's rules). It displays a pop-up
when you save your file that lists all the places your code has errors.
This is really useful on your own projects, as it makes you pay attention
to whether your code raises these errors. But when you're working with other
developers' code, you might want to reduce the chattiness.
You can tweak the settings under Preferences > Package Settings > Python
Flake8 Lint > Settings - User. Here are the settings I use for it:
// run flake8 lint on file saving
// run flake8 lint on file loading
// popup a dialog of detected conditions?
// show a mark in the gutter on all lines with errors/warnings:
// - "dot", "circle" or "bookmark" to show marks
// - "" (empty string) to do not show marks
This adds a mark to the gutter on each line that has an error, suppresses the
popup, and makes sure that pyflakes runs when I open a file so I can see the
errors immediately. To see the actual error, I move my cursor to a marked
line, and the plugin displays the error message in the status bar.
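I don't know how Python Flake8 Lint is implemented internally, so treat this as a hypothetical sketch of the idea rather than the plugin's code: parse flake8's default path:row:col: code message report, group the messages by line (one gutter mark per line), and look up the message for the line under the cursor. The function names are invented for the example.

```python
import re
from collections import defaultdict

# flake8's default report format: path:row:col: CODE message
REPORT_LINE = re.compile(
    r"^(?P<path>.+?):(?P<row>\d+):(?P<col>\d+): "
    r"(?P<code>[A-Z]+\d+) (?P<text>.*)$"
)


def group_by_line(report):
    """Map each reported line number to its list of messages."""
    marks = defaultdict(list)
    for line in report.splitlines():
        match = REPORT_LINE.match(line)
        if match:
            message = "%s %s" % (match["code"], match["text"])
            marks[int(match["row"])].append(message)
    return dict(marks)


def status_message(marks, cursor_row):
    """Text for the status bar when the cursor sits on cursor_row."""
    return "; ".join(marks.get(cursor_row, []))
```

The keys of the returned dictionary are the lines that would get a gutter mark; everything else stays quiet until the cursor lands there.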
These might seem like draconian tools that get in the way of coding quickly.
Coding fast and coding sloppy are not synonymous. Spend a little time working
within these constraints and your fellow developers will thank you.
Plus, you’ll be making sure that the code you write helps to create a better
codebase by increasing the quality of the patterns that keep happening there.
I get asked a lot where to start if you're looking at Python for back-end web work. A lot of people look at Django and Flask and feel that Flask is where they should start. It's nice and small, very simple, and after all they're not doing anything big and complicated, so why start with a big, complicated framework?
This reminds me of something that happens in the running world. People get started running, then either a) read Born to Run, or b) hear someone talking about the benefits of so-called barefoot running. (For the record, I've only seen a few people actually run barefoot. Most run in minimalist shoes like Vibram FiveFingers™.)
There are many benefits to running in minimal shoes. Proponents point to studies that show lower injury rates among barefoot runners. They talk about our natural instinct to run and how the modern shoe, with all of its support and cushioning, is actually doing more harm than good.
The next part of their pitch is ignored by many of the so-called Born-to-Runners: it takes a lot of practice to get to the point where you can run a 10K, much less an ultramarathon, in minimal shoes. You practically have to start over and build up slowly. There is a huge payoff, but it takes time. Otherwise, you're more likely to injure yourself.
I'm speaking from experience. I didn't read Born to Run, but I know the claims. When I started running a few years ago, I switched back and forth between a pair of minimal running shoes and a pair of FiveFingers™. I figured that since I was just starting out, I wouldn't have any bad habits to break.
There was one snag in my plan: I wasn't ready for them. I hadn't built up the running-specific muscles. My form wasn't there yet. I quickly started having plantar fasciitis issues. They weren't debilitating, but they were enough to make me take a week off to rest and work on stretching. The pain flared right back up as soon as I started running again. I had a half marathon a few months out, so something had to give. A trip to the running store and about $100 later, I had a pair of running shoes that felt like pillows on my feet, and a week later the pain was completely gone.
The same thing applies to web frameworks. It might seem like a good idea to stick with frameworks that let you write an app in one file, or ones that don't do everything. But the bigger frameworks are built on top of a lot of hard-won lessons.
When you're starting out, you don't know what a properly factored web application looks like (yet). You don't know where to draw the line between your model and controller layers (yet). You don't really know the trade-offs involved in choosing a relational database versus a NoSQL database. And that's OK. Micro frameworks assume you do, though. They give you a lot (or a little, depending on how you look at it) of rope, and it's really easy to end up with your app looking an awful lot like a noose.
So skip the minimalist option when you're starting out, whether that's shoes or web frameworks. Build on the experience of others, then start stripping away those layers once you've got a solid base.