
Code as Craft


Nagios, Sleep Data, and You


Gettin' Shuteye

Ian Malpass once commented that "[i]f Engineering at Etsy has a religion, it's the Church of Graphs."  And I believe!  Before I lay me down to sleep during an on-call shift, I say a little prayer that should something break, there's a graph somewhere I can reference.  Lately, a few of us in Operations have begun tracking our sleep data via Jawbone UPs.  After a few months of this, we got to wondering how this information could be useful in the context of Operations.  Sleep is important, and being on call can lead to interrupted sleep.  Even worse, after being woken up, the amount of time it takes to return to sleep varies by person and situation.  So, we thought, "why not graph the effect of being on call against our sleep data?"

Gathering and Visualizing Data

We already visualize code deploys against the myriad graphs we generate, to lend context to whatever we're measuring.  We use Nagios to alert us to system and service issues.  Since Nagios writes consistent entries to a log file, it was a simple matter to write a Logster parser to ship metrics to Graphite when a host or service event pages out to an operations engineer.  Those data points can then be displayed as "deploy lines" against our sleep data.
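To make the idea concrete, here is a minimal standalone sketch in Python of pulling notification events out of a Nagios log and formatting them for Graphite's plaintext protocol (`metric value timestamp`). It is not our actual Logster parser and does not use Logster's parser interface. The metric prefix `nagios.pages` and the function names are hypothetical, and it assumes the standard `[epoch] HOST NOTIFICATION: ...` and `[epoch] SERVICE NOTIFICATION: ...` log entries, where the first semicolon-separated field is the contact.

```python
import re

# Assumed Nagios log entries for notifications:
#   [1357000000] SERVICE NOTIFICATION: contact;host;service;STATE;command;output
#   [1357000000] HOST NOTIFICATION: contact;host;STATE;command;output
NOTIFICATION_RE = re.compile(
    r"^\[(?P<ts>\d+)\] (?P<kind>HOST|SERVICE) NOTIFICATION: (?P<fields>.*)$"
)


def parse_notification(line):
    """Return (timestamp, contact, kind) for a notification line, else None."""
    match = NOTIFICATION_RE.match(line.strip())
    if match is None:
        return None
    # The contact is the first semicolon-separated field in both entry types.
    contact = match.group("fields").split(";", 1)[0]
    return int(match.group("ts")), contact, match.group("kind").lower()


def to_graphite(lines, prefix="nagios.pages"):
    """Format one Graphite plaintext line per notification (prefix is hypothetical)."""
    out = []
    for line in lines:
        parsed = parse_notification(line)
        if parsed is None:
            continue  # not a notification entry; ignore it
        ts, contact, kind = parsed
        # Dots separate Graphite path components, so sanitize the contact name.
        safe_contact = re.sub(r"[^A-Za-z0-9_-]", "_", contact)
        out.append(f"{prefix}.{safe_contact}.{kind} 1 {ts}")
    return out
```

Each resulting line marks a single page at the time Nagios logged it, which is the shape of data you need in order to draw vertical event lines on another graph.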

For the sleep data, we used, and extended, Aaron Parecki's 'jawbone-up' gem.  Jon Cowie's handy 'jawboneup_to_graphite' script uses it to gather sleep data (summary and detail information) on a daily basis.  Those data are then displayed on personal dashboards (using Etsy's Dashboard project).
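As a rough illustration of the shipping step, here is a small Python sketch that turns a sleep summary into Graphite plaintext lines and sends them over a socket. This is not the 'jawboneup_to_graphite' script (which builds on the Ruby gem). The summary field names, the `sleep.hypothetical_user` prefix, and the host are all made up for the example. The only thing taken as given is Graphite's plaintext protocol, which by default listens on port 2003.

```python
import socket


def sleep_summary_to_graphite(summary, timestamp, prefix="sleep.hypothetical_user"):
    """Format numeric summary fields as Graphite plaintext lines.

    `summary` is a hypothetical dict of metric name -> number; non-numeric
    values are skipped.
    """
    lines = []
    for name in sorted(summary):
        value = summary[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        lines.append(f"{prefix}.summary.{name} {value} {int(timestamp)}")
    return lines


def send_to_graphite(lines, host="localhost", port=2003):
    """Send newline-terminated lines to a Graphite plaintext listener."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

Run once a day (from cron, say), something along these lines would be enough to get one night's numbers onto a dashboard.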

Results

We've only just begun to collect and display this information.  As we learn more, we'll be sure to share our findings.  In the meantime, here are examples from recent on-call shifts.

This engineer appeared to get some sleep!
Here, the engineer was alerted to a service in the critical state in the wee hours of the morning. From this graph we can tell that he was able to address the issue fairly quickly, and most importantly, get back to sleep fast.

NOTE:  Jawbone recently opened up their API.  Join the party and help build awesome apps and tooling around this device!