POSTS
Using Basketweaver with GitHub
Last month I blogged about using Travis CI with Armstrong. Things have been going along fine until the last few weeks. Tests were failing due to network timeouts while talking to PyPI. Never one to take failing tests lightly, I set out to fix it.
From local testing, it appeared that there was some sort of selective filtering happening at the server level on PyPI that was causing our tests to fail. All of our tests in the CI environment follow these tests:
- Install all of the development requirements with
pip install -r requirements/dev.txt
- Install the local package
- Execute the tests using
fab test
I could follow these steps to the letter locally in a fresh virtualenv, but the second they hit the Travis-CI server they would time out while trying to install everything. We’ve seen similar behavior at the Tribune when we roll out new servers. PyPI appears to be up, but installs fail due to timeouts.
Once I confirmed this, I started looking at alternatives to pypi.python.org as our main index for testing. My initial thought was to have a dynamic server that would act as a proxy to PyPI and cache everything locally. This requires the least amount of work long-term—assuming the server stays up. The problem was that nothing worked quite the way I wanted. The closest I found was collective.eggproxy. It felt a little odd and wasn’t very configurable without going the Paster route, so I decided to fall back on basketweaver.
Basketweaver builds a static index suitable for using with pip via the
--index-url
option. It takes a directory of files, then generates the HTML
that pip can scrape to determine if the package exists. This HTML can be
hosted anywhere that can serve a static HTML page, such as GitHub Pages.
Working with GitHub
There’s a few hoops to jump through when deploying to GitHub Pages. First,
make sure you include an empty .nojekyll
file. GitHub assumes everything you
want to publish is in Jekyll, but this file tells GitHub to not parse your
files.
Next, and I can’t count the number of times I’ve done this, GitHub Pages
doesn’t give you directory indexes. Basketweaver generates its index in the
/index/
directory so you can’t hit the plain GitHub Pages URL and expect to
see anything more than an error message. Make sure to add the /index/
after
your GitHub Pages URL to view the it once you’ve published your changes.
The next thing I do is rework where basketweaver looks for files to build the
indexes. I really don’t want to look at a full directory of files at my root
directory, instead I want all of the files stored in the creatively named
./files/
directory. Basketweaver installs a file called makeindex
which I
can never remember, so I created a run.py file that remembers it for me.
The last thing to do is to use the newly created index when installing packages. For Armstrong, we do this with:
pip install -i http://armstrong.github.com/pypi.armstrongcms.org/index/ \
-r requirements.txt
I haven’t gone to the trouble of setting up a CNAME
for
pypi.armstrongcms.org
yet, so we’re using the main github.com-based address.
There’s one final gotcha: PyPI uses routing that treats
http://pypi.python.org/pypi/South/ and
http://pypi.python.org/pypi/south/ as the same URL. That’s why
pip install Django
and pip install django
both work even though the former
is the correct package name. The URL spec is ambigous as to whether this is
correct, but most web servers are case sensitive, including GitHub Pages.
This will get you if you have dependencies on packages that don’t use all
lowercase names, such as South, Fabric, or Django. All three of these are
dependencies of Armstrong. The fix is to make sure that your
install_requires
and requirements files have the correct case. The easiest
way to determine this is to look at the output of pip freeze
and make sure
you’re using the same package name as it generates.
Conclusion
At the end of the day, this keeps our tests from being held hostage whenever PyPI goes on the fritz or starts randomly filtering requests as it seemed to do this past week. All that said, we’re still borrowing other people’s infrastructure. GitHub had a little blip while I was writing this post, underlining that you get what you pay for.
While you can use Basketweaver and GitHub to create a mirror of sorts for your packages, make sure you control the infrastructure if its mission critial that everything always stay up. That, or pay for it so there’s someone to call when it goes down.