Visualizing "Productivity" with Elasticsearch, Logstash and D3


With data visualizations you transform a bunch of numbers and labels into a story. You prove a point. You might even learn something you wouldn't have otherwise. That's why I decided I'd like to make it easy for myself to put visualizations and illustrations into my blog posts.

I'm not a graphic artist by any means, but I've gotten fairly familiar with JavaScript and SVG while helping build SuperTCP, so I figured it couldn't be too hard to start adding some simple animations and graphics to my little blog. That led to this post, which gives you a rare look into the life of my keyboard. That's right, I've decided to visualize my key combos and application usage. So first, here is what I found:

My top key combinations

Okay, so surprisingly the thing I do more than anything else on my computer is cycle right through my iTerm tabs. I'm not smart enough to cycle left through them, so I probably end up wrapping around fairly often (3% more right-shifts than left). Then it's copy/paste, refresh (probably mostly from writing this darn blog) and undo (if only life had ⌘-Z...). I also use ALT-SPACE a lot to bring up Alfred, because it's awesome. Typing "it" gets me iTerm from the Alfred launcher once it's in front.

Key combos per 30 minutes

Here is a graph of the number of key combos per half hour by time of day. You can see essentially when I'm sleeping, and when I take a break at work:

Unique applications per 30 minutes

I thought it would be interesting to see if there are any trends or patterns in the amount of context switching I'm doing at different times of the day. Here is the number of unique applications I used in 30-minute intervals over the last few days:

Most of my time is spent switching between several applications, at least within a 30 minute interval.

Top applications

No surprises here. By far I spend most of my time in Google Chrome, especially while writing this post (I was both coding and testing primarily in Chrome).

Okay, so now let's get into how I captured and drilled through the data...

Data Capture and Visualization

Getting some data

First things first, we need some data. I thought it would be interesting (as well as relatively easy) to visualize something day-to-day, so I decided to start capturing my keystrokes and active applications over time. I was curious which keys I press most often at particular times of the day. I figured it could also be cool to compare that to the applications I most often have in the foreground, so here is what I used:

Data storage: Elasticsearch

Someday I'll write a deeper post on Elasticsearch, but for this blog I basically used a completely unconfigured Elasticsearch 1.5.2 install. If you're using Homebrew you can just brew install elasticsearch to get it.

Data investigation: Kibana

In the end my goal was to get some pretty visualizations into my blog, but to make sure I had the data formatted correctly and to figure out which views were interesting I used Kibana. Kibana also makes it easy to create a table of data from Elasticsearch based on a set of filters and aggregations that you can export for use elsewhere (like on my blog). It also lets you view the requests that were made for the individual visualizations you build, which I took and tweaked to get the final datasets for this blog.

Again, basically no configuration, just visualization building once it's running.
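As a sketch of what those tweaked requests look like, here is roughly the shape of an aggregation query for unique applications per half hour (the index pattern and the @timestamp and app field names here are assumptions based on Logstash defaults):

```json
{
  "size": 0,
  "aggs": {
    "per_half_hour": {
      "date_histogram": { "field": "@timestamp", "interval": "30m" },
      "aggs": {
        "unique_apps": { "cardinality": { "field": "app" } }
      }
    }
  }
}
```

POSTing that to the Logstash index's _search endpoint returns one bucket per 30-minute window with an approximate distinct-application count, which is exactly the data behind a chart like the one above.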

Data ingestion: Logstash

Okay, now that we've got a search index and a generic visualization engine, let's feed them some data! Logstash is awesome at this. Once you know what you're doing you can get it ingesting just about any type of data in just a few minutes. I configured Logstash with a default Elasticsearch output, and a bunch of inputs and filters I'll go into next.
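The skeleton of that configuration looks something like this (a sketch for the Logstash 1.x config syntax; the log file path is hypothetical):

```conf
input {
  file {
    path => "/var/log/keylog.txt"   # hypothetical path to the keylog output
  }
}
output {
  elasticsearch {
    host => "localhost"             # default Elasticsearch on the same box
  }
}
```

Each line that appears in the watched file becomes one timestamped event in Elasticsearch, which is all the structure the visualizations below need.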

Keystroke logging: logKext

To capture keystroke data on Mac OS I forked logKext and made a few minor changes to get the keylog data into a format that was easy to ingest with Logstash. The main thing I focused on was getting key combinations, not just raw characters, which meant reworking the format a bit.

Then I just configured the Logstash file input and I was ready to ingest key data. Because Logstash logs every line as an event by default, and I had changed logKext to break things up with newline delimiters, this gives me my key combinations per minute and an idea of which keys and combos I use most often.
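To make the idea concrete, here is a small sketch of parsing and counting combos from such a file. The "TIMESTAMP key+key" line format used here is a hypothetical example for illustration, not logKext's actual output format:

```python
# Sketch: parse and count key combos from a newline-delimited keylog.
# The line format ("TIMESTAMP key+key+key") is a made-up example.
from collections import Counter

def parse_keylog_line(line):
    """Split one log line into (timestamp, [keys])."""
    timestamp, combo = line.strip().split(" ", 1)
    return timestamp, combo.split("+")

def top_combos(lines, n=5):
    """Return the n most common key combos across all lines."""
    counts = Counter("+".join(parse_keylog_line(l)[1]) for l in lines)
    return counts.most_common(n)
```

In practice Elasticsearch's terms aggregation does this counting for you; this is just the same logic spelled out.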

Active application logging: Python+Bash

It turns out that if you have Python you can get the active application pretty easily, just a couple of lines.
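Something along these lines, using the pyobjc AppKit bindings (an assumption on my part that pyobjc is installed; the snippet returns None on other platforms so it degrades gracefully):

```python
# Sketch: get the frontmost application name on Mac OS via pyobjc.
try:
    from AppKit import NSWorkspace  # macOS only; part of the pyobjc bindings
except ImportError:
    NSWorkspace = None

def frontmost_app():
    """Return the localized name of the frontmost application, or None."""
    if NSWorkspace is None:
        return None
    app = NSWorkspace.sharedWorkspace().frontmostApplication()
    return app.localizedName()

if __name__ == "__main__":
    print(frontmost_app())
```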

Then I just call that in a loop, and use netcat to get the data into Logstash as seen here.
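The loop can be sketched roughly like this (assuming a hypothetical get_app.py script that prints the frontmost application name, and a Logstash tcp input listening on port 5000 — both names and the port are illustrative, not my exact setup):

```sh
# Sketch: poll the frontmost app ~10x/second and pipe it to Logstash over TCP.
# Pair this with a Logstash input like: input { tcp { port => 5000 } }
while true; do
  python get_app.py
  sleep 0.1
done | nc localhost 5000
```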

I wanted to leave the application field "un-analyzed" in Elasticsearch so it wouldn't split the fields up. Instead of spending time developing a custom index template (which is the right way of doing this) I just used mutate { add_field => { "app" => "%{message}" } } in my Logstash filter configuration to copy the message into a new field.
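For reference, the template route would look roughly like this sketch, registered with a PUT to Elasticsearch's _template endpoint (the template name, index pattern, and mapping here are illustrative, using the Elasticsearch 1.x mapping syntax):

```json
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "app": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```

With that in place the app field would arrive un-analyzed without any filter tricks.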

Now I have an "app" field in my index that contains the non-analyzed application name, every 10th of a second or so. The raw counts should give me a pretty good estimate of where I spend my time.

Visualize it!

D3.js seems to be the standard for JavaScript visualization, so I figured it was a good place to start. I ended up using C3 for line/bar charts and d3pie for pie charts (although I'm not really a fan of d3pie; it's pretty finicky and has a few bugs).
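What makes C3 nice is that a chart is just a config object. A minimal timeseries chart looks something like this sketch (the element id and data points are made up, and it assumes the c3 and d3 scripts are loaded on the page):

```js
// Sketch: a minimal C3 timeseries line chart.
var chart = c3.generate({
    bindto: '#combos-per-half-hour',   // made-up element id
    data: {
        x: 'time',
        columns: [
            ['time', '2015-05-20 09:00', '2015-05-20 09:30', '2015-05-20 10:00'],
            ['combos', 120, 340, 210]  // made-up sample counts
        ],
        xFormat: '%Y-%m-%d %H:%M'
    },
    axis: { x: { type: 'timeseries', tick: { format: '%H:%M' } } }
});
```

Swapping in the rows exported from Kibana is basically all it takes to turn a query result into one of the charts above.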

In Summary

Although the data I've put here doesn't mean much (and the data itself isn't perfectly accurate), I was surprised how easy it was to use the ELK stack to ingest and visualize a dataset. I'm looking forward to finding interesting things to analyze with this setup in the future. The ability to ingest and view data like this is really handy... any recommended data projects I should take on?
