Hackathons – Software Improv

There’s lots of great content floating around about how to be a better programmer; from books and blogs to talks and tutorials, everyone seems to have their own ideas on what it takes to get to the proverbial next level. One common theme which comes out is that it simply takes time. Time to learn the fundamentals, time to learn best practices and engineering principles, time to learn design patterns, time to learn your tools, time time time.

Time sucks though – especially once you reach industry. With ever-encroaching deadlines, customer escalations, mounting technical debt, and that silly real-life stuff, few people have or set aside the time to study up on their design principles or dive into and review existing system architectures unless it’s directly on the critical path of the crisis of the day. That’s why I think hackathons are so great – they specifically limit the amount of time you can spend! Nearly anyone can scrounge up a spare Saturday if they really want to.

Now before we jump into why hackathons are so great for improving your software engineering judgement, we need to first take off our analytical software hats and slide over into the wild world of musical improvisation. I’m a firm believer in the benefits of cross-pollination – the ideas and techniques that activities both inside and outside your own discipline can provide.

So what’s musical improvisation all about? To put it as vaguely as possible, it’s about expression. Depending on the style, more or less structure is imposed, although typically a general chord structure is all that’s provided as a guide to the improviser. The piece might frame the solo to give an overall flow and style, but ultimately leaves it to the soloist to take the piece where they feel it should go – literally on the spot, wherever their inner muse takes them. Naturally, great improv requires a level of familiarity with musical fundamentals – basic music theory knowledge and instrumental technique – but it relies even more on a stiff upper lip and a willingness to see what happens. Getting good at improv means performing lots and lots of really bad improv solos; I’m talking crash-and-burn-type solos. Ask any musician with any sort of improvisation experience, and they’ll regale you with their tales of shitty improv: times they crashed and burned, and then rolled over and burned some more. And the best part? The audience probably didn’t even notice. The great thing about jazz improv in particular is that there’s no such thing as a wrong note in jazz – just abundant artistic license and confidence to boot.

OK, so how does this all tie in to hackathons and being a better programmer? Well, hackathons embody much of the improv spirit. They start with a basic structure – maybe a theme, a set of constraints, or a vague idea. Combine that with a very restricted time limit to necessitate that willingness to see what happens, and you have Software Improv. Maybe you have a vague idea for a better way to communicate with friends. Maybe you just tried out the Oculus Rift and want to hack together some VR magic. Whatever your idea is, a few engineers can probably throw together something vaguely resembling it within a day, and that’s where the beauty of Software Improv comes in.

So often, software engineers fall into analysis paralysis, trying to design for and solve the world’s problems before they even know what it is they’re really trying to build. This isn’t necessarily the fault of any one group in particular, but is a natural consequence of working in a vague and changing world where final development costs are high and engineering momentum is a nontrivial force. This is where hackathons and prototypes provide so much value. They allow engineers to explore the space, throw ideas around, and see what holds up. They allow engineers to make mistakes and throw them away, to see their code crash and burn and get a sense of what does and doesn’t work. Design principles and best practices can only get you so far – after that it’s all about experience and improvisation.

Ultimately, the Software Improv approach will also only get you so far. Hacky demos and prototypes leave out much of the hard work of finishing and polishing a real product, but they’re a fantastic way to get a leg up on all that time investment necessary to become a better programmer. So much of engineering judgement is based on experience and intuition, and until you’ve gained that experience building things and seeing what floats and what sinks, you’ll never quite reach that next level.

A glimpse of undefined behavior in C

A few weeks ago a coworker of mine stopped by my desk with a coding question.  We’d recently been quizzing each other over our C knowledge, so I smiled and braced myself for the hell which was clearly to come.

He wrote a few lines of code up on the whiteboard and asked what the program output:

#include <stdio.h>

int main(){
    int i = 0;
    int a[] = {10,20,30};

    int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
    printf("%d\n", r); 
    return 0;
}

It seemed pretty straightforward. I explained the order of operator precedence – postfix binds before multiplication, which binds before addition – and that multiplication and addition associate left to right, so I grabbed the marker and started writing out the arithmetic.

int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
//    =    a[0]    + 2 * a[1]  + 3 * a[2];
//    =     10     +     40    +    90;
//    = 140

After I smugly scribbled down the answer, my coworker replied with a simple “nope”. I thought for a few minutes and was stumped. I couldn’t quite remember the postfix operators’ associativity. I knew that wouldn’t even change the order of evaluation here, since associativity rules only apply between operators of the same precedence level, but I thought I’d give it a shot and walk through the arithmetic as if the postfix operators were all evaluated right to left.

int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
//    =    a[2]    + 2 * a[1]  + 3 * a[0];
//    =     30     +     40    +    30;
//    = 100

Again, my coworker replied that it was still wrong. At this point I threw up my hands and asked what the answer was. It turned out that this small sample had been reduced from a larger code snippet he’d put together. To double-check his question, he’d compiled and run the larger sample, but was surprised to see it not behave as he’d expected. After reducing the unexpected behavior to this illuminating example, he’d compiled it with gcc 4.7.3, and it output a startling result: “60”.

At this point I was intrigued. I remembered that, in C, the order of function argument evaluation is unspecified, so we thought maybe the postfix operators were just being evaluated in some random, non-left-to-right order. Still convinced that postfix had a higher precedence than addition and multiplication, we quickly proved to ourselves that there was no evaluation order for the i++’s such that the three elements of the array, added and multiplied together, would produce “60”.

Now I was hooked. My first thought was to look at the disassembly of the snippet and track down what was actually going on. I compiled the sample with debugging symbols, and after a quick objdump we had the source-annotated x86_64 disassembly:

Disassembly of section .text:

0000000000000000 <main>:
#include <stdio.h>

int main(){
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	48 83 ec 20          	sub    $0x20,%rsp
    int i = 0;
   8:	c7 45 e8 00 00 00 00 	movl   $0x0,-0x18(%rbp)
    int a[] = {10,20,30};
   f:	c7 45 f0 0a 00 00 00 	movl   $0xa,-0x10(%rbp)
  16:	c7 45 f4 14 00 00 00 	movl   $0x14,-0xc(%rbp)
  1d:	c7 45 f8 1e 00 00 00 	movl   $0x1e,-0x8(%rbp)
    int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
  24:	8b 45 e8             	mov    -0x18(%rbp),%eax
  27:	48 98                	cltq   
  29:	8b 54 85 f0          	mov    -0x10(%rbp,%rax,4),%edx
  2d:	8b 45 e8             	mov    -0x18(%rbp),%eax
  30:	48 98                	cltq   
  32:	8b 44 85 f0          	mov    -0x10(%rbp,%rax,4),%eax
  36:	01 c0                	add    %eax,%eax
  38:	8d 0c 02             	lea    (%rdx,%rax,1),%ecx
  3b:	8b 45 e8             	mov    -0x18(%rbp),%eax
  3e:	48 98                	cltq   
  40:	8b 54 85 f0          	mov    -0x10(%rbp,%rax,4),%edx
  44:	89 d0                	mov    %edx,%eax
  46:	01 c0                	add    %eax,%eax
  48:	01 d0                	add    %edx,%eax
  4a:	01 c8                	add    %ecx,%eax
  4c:	89 45 ec             	mov    %eax,-0x14(%rbp)
  4f:	83 45 e8 01          	addl   $0x1,-0x18(%rbp)
  53:	83 45 e8 01          	addl   $0x1,-0x18(%rbp)
  57:	83 45 e8 01          	addl   $0x1,-0x18(%rbp)
    printf("%d\n", r);
  5b:	8b 45 ec             	mov    -0x14(%rbp),%eax
  5e:	89 c6                	mov    %eax,%esi
  60:	bf 00 00 00 00       	mov    $0x0,%edi
  65:	b8 00 00 00 00       	mov    $0x0,%eax
  6a:	e8 00 00 00 00       	callq  6f <main+0x6f>
    return 0;
  6f:	b8 00 00 00 00       	mov    $0x0,%eax
}
  74:	c9                   	leaveq 
  75:	c3                   	retq 

The first few and last few instructions just set up the stack frame, initial variable values, and do the printf call and return from main, so we really only cared about instructions 0x24 through 0x57. That’s where the interesting behavior was. Let’s walk through the instructions a few at a time:

  24:	8b 45 e8             	mov    -0x18(%rbp),%eax
  27:	48 98                	cltq   
  29:	8b 54 85 f0          	mov    -0x10(%rbp,%rax,4),%edx

The first three instructions were as we expected. First, it loads the value of i (0) into register eax, sign-extends it to 64 bits, then loads a[0] into register edx. The 1 * was obviously optimized out by the compiler, but everything looked normal here. The next few started out much the same:

  2d:	8b 45 e8             	mov    -0x18(%rbp),%eax
  30:	48 98                	cltq   
  32:	8b 44 85 f0          	mov    -0x10(%rbp,%rax,4),%eax
  36:	01 c0                	add    %eax,%eax
  38:	8d 0c 02             	lea    (%rdx,%rax,1),%ecx

The first mov loads the value of i (still 0) into eax, sign-extends it to 64 bits, then loads a[0] into eax. That’s interesting – we expected the first i++ to have taken effect before this second load, but maybe the last two instructions will do some sort of assembly magic to get us to the expected result (2 * a[1]). Instead, they add eax to itself, effectively computing 2 * a[0], then add that to the previous result and store it in ecx. At this point the instructions have evaluated a[0] + 2 * a[0]. Things are starting to look a little strange, but again, maybe there’s some compiler magic going on.

  3b:	8b 45 e8             	mov    -0x18(%rbp),%eax
  3e:	48 98                	cltq   
  40:	8b 54 85 f0          	mov    -0x10(%rbp,%rax,4),%edx
  44:	89 d0                	mov    %edx,%eax

These next few instructions look pretty familiar. They load the value of i (still 0), sign-extend it to 64 bits, load a[0] into edx, then copy edx into eax. Hmm, OK, let’s take a look at a few more:

  46:	01 c0                	add    %eax,%eax
  48:	01 d0                	add    %edx,%eax
  4a:	01 c8                	add    %ecx,%eax
  4c:	89 45 ec             	mov    %eax,-0x14(%rbp)

Here, the first two adds triple a[0] (doubling eax, then adding a[0] once more), the third add folds in the previous result, and the final mov stores everything into the variable ‘r’. Now that’s wild – r now contains a[0] + 2 * a[0] + 3 * a[0] = 10 + 20 + 30. Sure enough, that’s what the program output: “60”. But what happened to those postfix operations? They’re all there at the end:

  4f:	83 45 e8 01          	addl   $0x1,-0x18(%rbp)
  53:	83 45 e8 01          	addl   $0x1,-0x18(%rbp)
  57:	83 45 e8 01          	addl   $0x1,-0x18(%rbp)

Our compiled version of the code seemed completely wrong! Why had the postfix operations been pushed down to the end, after the assignment had already happened? With my faith in reality dwindling, I decided to go straight to the source. No, not the compiler’s source – that’s just an implementation – I grabbed the C11 language specification.

The issue lay in the details of the postfix operators. In our case, we were performing a postfix increment on our array index three times within a single expression. A postfix increment evaluates to the variable’s original value; the write of the incremented value back to the variable is a side effect. Side effects are only guaranteed to have completed at sequence points (see section 5.1.2.3 of the standard for where those occur), and because our expression modifies i three times with no intervening sequence point, its behavior is undefined. The compiler is free to commit the increments whenever it likes relative to the rest of the expression – and here it chose to do all three after the loads, the arithmetic, and the store.

In the end, we both learned a bit more about the C language. It’s a well-known best practice to avoid building complex expressions out of prefix/postfix operators, and this is a great example of why.
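
For reference, the straightforward fix is to give each increment its own statement, so that every side effect is committed at the sequence point which ends the statement. A minimal rewrite of the snippet along those lines (an illustration, not the code from the whiteboard):

#include <stdio.h>

int main(){
    int i = 0;
    int a[] = {10,20,30};

    /* Each full statement ends at a sequence point, so each increment of i
       is guaranteed to be complete before the next element is read. */
    int r = 1 * a[i++];
    r += 2 * a[i++];
    r += 3 * a[i++];

    printf("%d\n", r);   /* always prints 140 */
    return 0;
}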

Pupil Tracking in Iris

Last week I gave an overview of my experience at MHacks and the Computer Vision Twitter Helmet two friends and I built.  In this post, I’ll go more in-depth into the pupil tracking system of the helmet.

We had originally envisioned the helmet running on a raspberry pi, and because of this we decided to delegate the optical character recognition portion of the system to Amazon AWS. This meant that we couldn’t just naively grab any and all frames from the outward-facing camera and extract text from them, because we wouldn’t be able to send the frames to AWS for processing quickly or cheaply enough. Instead, we had to develop a heuristic to increase the probability that a grabbed frame really did have text in it, and only when we were fairly confident of success would we send it to AWS for text extraction and processing.

The essence of the heuristic we developed was that sustained horizontal movement of the pupil meant the user was reading text and thus the front-facing frame was a good candidate for text extraction.  If you recall from my previous post, our helmet had two cameras attached to it – an eye-tracking and a front-facing camera.  With the eye-tracking camera fixed to the helmet we could simply track the pupil’s location in the camera’s view over time, and that would act as a proxy for the gaze analysis and allow us to discern a reading motion from ordinary pseudo-random eye motion.

With this framework, the main functional loop of the eye tracker essentially became:

# Camera here is a hypothetical thin wrapper around each webcam feed
eye_cam = Camera(EYE_CAM_INDEX)      # the eye-facing camera
front_cam = Camera(FRONT_CAM_INDEX)  # the front-facing camera

pupil_locations = []
while True:
    eye_frame = eye_cam.get_frame()

    pupil_locations.append(get_pupil_location(eye_frame))

    if is_horizontal(pupil_locations):
        send_to_aws(front_cam.get_frame())
        # Give our tracking a fresh slate
        pupil_locations[:] = []
    else:
        # Only keep track of pupil locations in the past few seconds
        trim_stale_locations(pupil_locations)

To find the location of the pupil, we used opencv to apply a series of filters on the eye frame to convert it to a more refined black and white frame. From that, and a few assumptions about what form the pupil now took in this new representation, we calculated the center of the pupil.

To get the black and white version of the frame we performed four operations on it (a sketch of the pipeline follows the list):

  1. Converted the frame to greyscale while ignoring the red channel.  By removing the red channel from the greyscale version, much of the distracting effect of patches of slightly-too-dark skin was removed.
  2. Smoothed the frame to further gloss over noise and increased the contrast using opencv’s histogram equalization. This helped harden the edge between the pupil/iris and the sclera/eyelids.
  3. Applied a threshold filter to floor/ceiling the pixel data to be either white or black.
  4. Applied opencv’s dilate filter.  At this point we had the black and white frame, but the eyebrow and eyelashes sometimes remained as a patchy structure which could dominate the pupil within the frame. To remove them we used opencv’s dilate functionality to erode away much of their dark-pixel mass and then smoothed the frame once more to fully remove them.
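
A rough python/opencv sketch of that pipeline; the helper name, kernel sizes, and threshold value here are illustrative guesses rather than the exact tuning we used:

import cv2
import numpy as np

def preprocess_eye_frame(frame):
    # 1. Greyscale that ignores the red channel: average just blue and green.
    b, g, _ = cv2.split(frame)
    grey = cv2.addWeighted(b, 0.5, g, 0.5, 0)

    # 2. Smooth out noise, then boost contrast with histogram equalization.
    grey = cv2.GaussianBlur(grey, (7, 7), 0)
    grey = cv2.equalizeHist(grey)

    # 3. Threshold so every pixel becomes either pure black or pure white.
    _, bw = cv2.threshold(grey, 50, 255, cv2.THRESH_BINARY)

    # 4. Dilate to chew away the dark eyelash/eyebrow pixels, then smooth again.
    bw = cv2.dilate(bw, np.ones((5, 5), np.uint8), iterations=2)
    bw = cv2.medianBlur(bw, 5)
    return bw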

The image below progressively shows each of the filters with the final step being the centroid calculation.

[Image: the eye frame after each filtering step, ending with the centroid calculation]

Once we had the black and white version of the frame, we did a two-pass centroid calculation to find the center of the pupil. First we found the centroid of all the black pixels in the frame. This worked pretty well, but despite our dilation efforts eyelashes and other features around the eye sometimes crept into the black-realm of the image and threw off the centroid. To alleviate that error, we then found the distribution of distances from the black pixels to that first center we’d just calculated. From that, we trimmed away all of the black pixels which were more than a standard deviation away from the first center and recalculated a new center. This reliably gave us the location of the pupil.
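
In code, the two-pass centroid boils down to something like the sketch below (reusing the preprocess_eye_frame helper sketched earlier; the trimming cutoff follows the description above rather than the original source):

import numpy as np

def get_pupil_location(eye_frame):
    bw = preprocess_eye_frame(eye_frame)   # black and white frame from above
    ys, xs = np.where(bw == 0)             # coordinates of the black pixels
    if xs.size == 0:
        return None

    # First pass: naive centroid of every black pixel.
    cx, cy = xs.mean(), ys.mean()

    # Second pass: drop pixels more than one standard deviation (of the
    # distance distribution) away from that first center, then recompute.
    dists = np.hypot(xs - cx, ys - cy)
    keep = dists <= dists.std()
    if keep.any():
        cx, cy = xs[keep].mean(), ys[keep].mean()
    return int(cx), int(cy)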

With a method to reliably locate the pupil, we were then able to track its motion and discern whether the user was reading text. To do that, we remembered the pupil locations over the previous second (one location every 0.05 seconds, for a total of 20). On that set of locations we calculated the Pearson correlation coefficient to determine the strength of their linearity; if it fell within certain bounds, we concluded that the user was indeed reading.
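
A sketch of that check, correlating horizontal position against the sample index as a stand-in for time; the 0.7 bound and the helper’s exact shape are illustrative rather than our original tuning:

import numpy as np

WINDOW = 20                                # one second of samples, 0.05 s apart

def is_horizontal(pupil_locations, r_min=0.7):
    if len(pupil_locations) < WINDOW:
        return False
    xs = np.array([x for x, _ in pupil_locations[-WINDOW:]], dtype=float)
    if xs.std() == 0:
        return False                       # the eye isn't moving horizontally
    ts = np.arange(WINDOW, dtype=float)    # sample index as a time proxy
    r = np.corrcoef(ts, xs)[0, 1]          # Pearson correlation coefficient
    return abs(r) > r_min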

[Image: the eye tracker in action]

The image above shows the whole system in action.  On the left is the raw eye camera frame and on the right is the processed version of it with the red dot being the current location of the pupil, the purple circles being the past 1-second’s locations, and the large red circle estimating the entire extent of the pupil.

The whole system tracked the pupil really well and reliably fired off frames to AWS when we were reading. It wasn’t perfect though: during our demo I was frequently looking back and forth between multiple people’s eyes while explaining the functionality, and that incorrectly fired off quite a few reading events. To alleviate that issue, we introduced a cooldown which limited the rate at which reading events could fire, bringing the false events under control.
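
The cooldown itself didn’t need to be anything fancy. A sketch of the idea, with a made-up 10-second window:

import time

class Cooldown:
    # Simple rate limiter for reading events; the window length is assumed.
    def __init__(self, seconds=10.0):
        self.seconds = seconds
        self.last_fired = 0.0

    def ready(self):
        now = time.time()
        if now - self.last_fired >= self.seconds:
            self.last_fired = now
            return True
        return False

In the main loop, a frame then only gets sent off to AWS when both the linearity check and cooldown.ready() pass.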

Once again, the code can be found on github.

In my next post I’ll go more in depth into the AWS framework for extracting the text and tweeting the result.

MHacks Iris

A few weeks ago, the University of Michigan hosted MHacks – “The most epic hackathon ever!”  Over 500 hackers attended the 36-hour event, drank over 1800 cans of Red Bull, and produced some impressive hacks.  MHacks was the first hackathon I attended and I, too, had a great time putting together an awesome hack.  Two of my good CS friends at Purdue also went, and we put together a computer vision helmet which would tweet what you read.


The original plan was to make a fully-contained helmet using a raspberry pi.  The setup would have had the raspberry pi plugged into a battery-powered cellphone charger, with a usb hub allowing the use of the two webcams and the wireless usb network adapter.  Unfortunately, while the raspberry pi’s onboard usb ports were able to power the webcams, the webcams crashed continually when plugged in through our usb hub.  We were still able to get the system up and running from a laptop though, and everyone who saw us demo it loved the project.

The high level workflow of the system was to have two webcams on the helmet.  One tracked the subject’s eye movements, trying to discern if they were reading.  If they were, then a frame would be grabbed from the outward facing camera and sent to Amazon AWS where any text would be extracted from it. The resulting text and image would then be tweeted.

The system was all written in python using opencv, tesseract, and the python interfaces to twitter and AWS.  The helmet had two webcams duct-taped to it, one facing the eye and the other facing outward.  Using opencv, we performed a number of transformations on the eye feed to get a black and white frame in which the pupil stood out as a black circle.  From that we were able to find the centroid of the circle and reliably track the eye positions.  With the last few seconds’ worth of eye positions, we performed a simple linear regression test to see if they trended as horizontal movement.  If so, we deduced (not always as correctly as we’d have liked) that the user was reading, and we’d grab the frame from the front-facing camera.  We’d then upload the frame to S3 and put the corresponding bucket and key information into an SQS queue.  From there an EC2 instance would pull the queue information, download the frame from S3, and run optical character recognition on it using tesseract.  Then, depending on the results of the OCR, we would tweet any text we read from the frame along with a link to the frame on S3.
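
As a rough sketch of the helmet-side handoff to AWS (written here against boto3 for clarity rather than whichever python AWS library we actually used at the time; the bucket name and queue URL are made up, and the frame is assumed to have already been saved to disk):

import json
import uuid

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

BUCKET = "iris-frames"                    # hypothetical bucket name
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/iris-jobs"

def send_to_aws(frame_path):
    # Push the raw frame to S3, then tell the EC2 worker where to find it.
    key = "frames/%s.jpg" % uuid.uuid4()
    s3.upload_file(frame_path, BUCKET, key)
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({"bucket": BUCKET, "key": key}))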

Here’s an example of our attempt to read the newspaper.  I’d say the system performed quite well, seeing as we’d thrown it all together in under 36 hours.  We still plan to mess around with the system more, improve performance, and eventually get it actually running on the raspberry pi.

The code can be found on github.

In a few upcoming posts, I’ll walk through the subsystems and how they performed.