Code as Craft

Did you 'try' it before you committed?

At Etsy, we deploy often and from head: every commit goes to trunk and must be ready for production immediately. This makes it very important to test your code before committing; otherwise, you hold up everyone else from committing and deploying, since trunk always needs to be clean.

Over a year ago, everyone had at least one common gripe: you could not run the entire test suite on a developer virtual machine (VM), and even if you thought you could, you expected it to take at least 3 hours.

Fortunately, on the Continuous Integration (CI) cluster the tests would take roughly 30 minutes. But really, the chances that the tests would pass were so low that 'rebuilding reds' (aka rebuilding a test failure to make sure it was not just a flaky test) and re-running the test once there was a 'fix' would actually turn that 30 minutes into an hour, hour and a half, two hours, or more.

Why were the chances of the tests passing so low on CI? Because almost no one would ever run the full test suite before the integration step.

The problem was: developers could not run tests in a reasonable amount of time with the resources they were given, so they would not run the tests until they had access to the shared resources that had the 'Oomph' to run the tests in a more reasonable time.

So one day, someone told one of the developers that he could test on one of the behemoths in the CI cluster. That developer would SSH into the machine, edit some code, run the tests on the much more robust hardware, and PROFIT! But then along came another developer, who began to do the same on that machine. Then another came, and most likely a couple more. There were maybe a handful of developers, and the machine was specced high enough to handle the load, but that was not the issue.

The problem was: developers were testing, but they had no way to coordinate who was using the machine at any given time. Also, the test suite was architected around a lot of shared fixtures, which would inevitably cause collisions during concurrent test runs. Sigh...

While this situation was brewing, I was working with one developer on how to get his changes onto the machine in the CI cluster. We figured out which options we needed for svn diff to produce the patch file we wanted. Then we figured out which options we needed for patch to apply that patch file to the svn working copy on the CI machine. Eureka!

Here’s the simple solution that lets developers use the awesome resources in the CI cluster without stepping on each other’s toes:

  1. Create a new Jenkins Freestyle Project (or copy an existing one) and

    • Select Parameterized Build
      • File Parameter: patch.diff
      • String Parameter: username
    • Set up the SCM as usual
    • Add an Execute Shell build step: Apply the patch.diff
    • Use $username in the recipient list of the e-mail publisher
  2. Write a short bash script that
    • Creates a patch
    • Sends a cURL request with the patch and $USER to start a build of the Jenkins job

The original script looked something like this:


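#!/bin/bash
# Each argument names a job suffix: argument "foo" triggers the job try-foo.
HUDSON="ci.example.com"
LOCATION="/home/$USER/working_copy"
PATCH='patch.diff'

cd "$LOCATION" || exit 1
# Capture all uncommitted changes in the working copy as a patch.
svn diff > "$PATCH"

# The parameter names must match the ones defined on the Jenkins job.
file_param="{'name': 'patch.diff', 'file': 'file0'}"
user_param="{'name': 'executor', 'value': '$USER'}"

# Start one parameterized build per job suffix given on the command line.
for job in "$@"; do
    curl -F "file0=@$LOCATION/$PATCH" \
         -F "json={'parameter': [$file_param, $user_param]}" \
         -F Submit=Build \
         "http://$HUDSON/job/try-$job/build"
done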
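
On the Jenkins side, the Execute Shell build step from step 1 could be sketched roughly as follows. This is a hypothetical reconstruction, not the step we actually ran: the function name apply_try_patch and the choice of the -p0, -f, and --dry-run options are assumptions. They rest on the patch having been made with svn diff from the root of the working copy, so that its paths are relative to that root.

```shell
#!/bin/bash
# Hypothetical sketch of the "Apply the patch.diff" build step.
# Assumption: the patch came from `svn diff` run at the working copy root,
# so its paths are relative to that root and need no stripping (-p0).

apply_try_patch() {
    local workspace="$1"   # checkout directory (Jenkins exports it as $WORKSPACE)
    local patch_file="$2"  # the uploaded File Parameter

    # An empty or missing patch means there is nothing to try.
    if [ ! -s "$patch_file" ]; then
        echo "patch is missing or empty; nothing to try" >&2
        return 1
    fi

    # Dry run first, so a patch that does not apply cleanly fails the build
    # without leaving a half-patched checkout behind.
    # -f keeps patch from prompting; a CI job has no one to answer.
    if ! patch -p0 -f --dry-run -d "$workspace" < "$patch_file" > /dev/null; then
        echo "patch does not apply cleanly to this checkout" >&2
        return 1
    fi

    # The dry run succeeded, so apply the patch for real.
    patch -p0 -f -d "$workspace" < "$patch_file"
}
```

In the build step itself, the call would be something like apply_try_patch "$WORKSPACE" "$WORKSPACE/patch.diff", relying on Jenkins placing the uploaded File Parameter at that path inside the workspace. A non-zero return fails the build, so the developer gets the failure e-mail before any tests run against a broken checkout.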

We called this new service try. The day we introduced try to the team, the number of deploys went from maybe a handful a day to more than 20 deploys a day, and we have not really looked back.

Every new deployer at Etsy deploys code on his or her first day, and each one of them is told by someone, "Make sure you use try before you commit."

try has evolved with the rest of our CI infrastructure. If you read Divide and Concur, you can probably imagine how we handle so many test jobs. try has also been our guinea pig for integrating Jenkins and Deployinator. We will save the details of all of this for another post.

In the meantime, please take a gander at other try implementations:

  • https://wiki.mozilla.org/ReleaseEngineering/TryServer
  • http://buildbot.net/buildbot/docs/0.8.4/try.html#try