Monthly Archives: February 2013

Pupil Tracking in Iris

Last week I gave an overview of my experience at MHacks and the Computer Vision Twitter Helmet two friends and I built.  In this post, I’ll go more in-depth into the pupil tracking system of the helmet.

We had originally envisioned the helmet to run on a raspberry pi, and because of this we decided to delegate the optical character recognition portion of the system to Amazon AWS.  This meant that we couldn’t just naively grab any and all frames from the outward facing camera and extract any text from them, because we wouldn’t be able to send the frames to AWS for processing quickly or cheaply enough.  Instead, we had to develop a heuristic to increase the probability that the frame we grabbed really did have text in it, and only when we were fairly confident of success would we send the frame to AWS for text extraction and processing.

The essence of the heuristic we developed was that sustained horizontal movement of the pupil meant the user was reading text and thus the front-facing frame was a good candidate for text extraction.  If you recall from my previous post, our helmet had two cameras attached to it – an eye-tracking and a front-facing camera.  With the eye-tracking camera fixed to the helmet we could simply track the pupil’s location in the camera’s view over time, and that would act as a proxy for the gaze analysis and allow us to discern a reading motion from ordinary pseudo-random eye motion.

With this framework, the main functional loop of the eye tracker essentially became:

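# Pseudocode: find_pupil, is_horizontal, send_to_aws and MAX_LOCATIONS are
# placeholder names for the steps described in this post.
eye_cam = feed from the eye camera
front_cam = feed from the front camera

pupil_locations = []
while True:
    eye_frame = eye_cam.get_frame()
    pupil_locations.append(find_pupil(eye_frame))

    if is_horizontal(pupil_locations):
        # Sustained horizontal motion: send the front frame off for text extraction
        send_to_aws(front_cam.get_frame())
        # Give our tracking a fresh slate
        pupil_locations[:] = []

    # Only keep track of pupil locations in the past few seconds
    pupil_locations[:] = pupil_locations[-MAX_LOCATIONS:]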

To find the location of the pupil, we used opencv to apply a series of filters on the eye frame to convert it to a more refined black and white frame. From that, and a few assumptions about what form the pupil now took in this new representation, we calculated the center of the pupil.

To get the black and white version of the frame we performed four operations on it:

  1. Converted the frame to greyscale while ignoring the red channel.  By removing the red channel from the greyscale version, much of the distracting effects of patches of slightly-too-dark skin were removed.
  2. Smoothed the frame to further gloss over noise and increased the contrast using opencv’s histogram equalization. This helped harden the edge between the pupil/iris and the sclera/eyelids.
  3. Applied a threshold filter to floor/ceiling the pixel data to be either white or black.
  4. Applied opencv’s dilate filter.  At this point we had the black and white frame, but the eyebrow and eyelashes sometimes remained as a patchy structure which could dominate the pupil within the frame. To remove them we used opencv’s dilate functionality, which grows the white regions and so erodes away much of the dark-pixel mass of those features, and then smoothed the frame once more to fully remove them.

The image below progressively shows each of the filters with the final step being the centroid calculation.

[Image: screenshot from 2013-02-26 showing the frame after each filter, ending with the centroid calculation]

Once we had the black and white version of the frame, we did a two-pass centroid calculation to find the center of the pupil. First we found the centroid of all the black pixels in the frame. This worked pretty well, but despite our dilation efforts eyelashes and other features around the eye sometimes crept into the black-realm of the image and threw off the centroid. To alleviate that error, we then found the distribution of distances from the black pixels to that first center we’d just calculated. From that, we trimmed away all of the black pixels which were more than a standard deviation away from the first center and recalculated a new center. This reliably gave us the location of the pupil.
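As a rough sketch of that two-pass calculation, assuming the black pixels have already been collected into a list of (x, y) coordinates: the function names are hypothetical, and the cutoff used here (mean distance plus one standard deviation of the distances) is only one reading of "more than a standard deviation away".

```python
import math

def centroid(pixels):
    """Mean (x, y) of a non-empty list of pixel coordinates."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)

def two_pass_centroid(black_pixels):
    """Centroid of the black pixels after trimming far-away outliers."""
    if not black_pixels:
        raise ValueError("no black pixels in frame")

    # Pass 1: centroid of every black pixel.
    first = centroid(black_pixels)

    # Distribution of distances from each black pixel to that first center.
    dists = [math.hypot(x - first[0], y - first[1]) for x, y in black_pixels]
    mean_d = sum(dists) / len(dists)
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in dists) / len(dists))

    # Pass 2: drop pixels beyond the cutoff (assumed: mean + 1 std) and
    # recompute the center from what is left.
    cutoff = mean_d + std_d
    kept = [p for p, d in zip(black_pixels, dists) if d <= cutoff]
    return centroid(kept)

# Usage: a 5x5 "pupil" blob plus a few stray "eyelash" pixels off to one side.
pupil = [(x, y) for x in range(8, 13) for y in range(8, 13)]
lashes = [(40, 10), (41, 10), (42, 10)]
print(centroid(pupil + lashes))           # first pass, pulled toward the lashes
print(two_pass_centroid(pupil + lashes))  # second pass, back on the blob
```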

With a method to reliably locate the pupil, we were then able to track its motion and discern whether the user was reading text.  To do that, we remembered the pupil locations over the previous second (one location every 0.05 seconds for a total of 20).  On that set of locations we calculated the Pearson Correlation Coefficient to determine the strength of their linearity. If it fell into certain bounds then we concluded that the user was indeed reading.
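For illustration, the correlation check might look something like the sketch below. The function names are hypothetical, and the bounds passed to `is_reading` are placeholders, since the actual bounds are not given here. One caveat worth noting: the coefficient is undefined when either coordinate has zero variance (for example a perfectly level track), so this sketch returns `None` in that case.

```python
import math

def pearson(points):
    """Pearson correlation coefficient of a list of (x, y) points.

    Returns None when fewer than two points are given or when x or y
    has zero variance, since r is undefined in those cases.
    """
    n = len(points)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    if var_x == 0 or var_y == 0:
        return None
    # Clamp to [-1, 1] to absorb floating-point rounding.
    return max(-1.0, min(1.0, cov / math.sqrt(var_x * var_y)))

def is_reading(points, low, high):
    """True when r is defined and falls within [low, high] (placeholder bounds)."""
    r = pearson(points)
    return r is not None and low <= r <= high

# Usage: 20 samples (one per 0.05 s) drifting steadily across the frame.
track = [(10 + 3 * i, 50 + 0.5 * i) for i in range(20)]
print(is_reading(track, low=0.9, high=1.0))
```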


The image above shows the whole system in action.  On the left is the raw eye camera frame and on the right is the processed version of it with the red dot being the current location of the pupil, the purple circles being the past 1-second’s locations, and the large red circle estimating the entire extent of the pupil.

The whole system tracked the pupil really well and reliably fired off frames to AWS when we were reading.  It wasn’t perfect though, as during our demo I was frequently looking between multiple people’s eyes while explaining the functionality and that incorrectly fired off quite a few reading events.  To alleviate that issue, we introduced a cooldown which limited the rate at which reading events fired, thus bringing the false events under control.
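A cooldown of this kind can be sketched in a few lines. The class name and the five-second interval are assumptions for the example, not values taken from the project.

```python
import time

class Cooldown:
    """Allow an event to fire at most once every `seconds` seconds."""

    def __init__(self, seconds):
        self.seconds = seconds
        self._last_fired = None

    def try_fire(self, now=None):
        """Return True and start the cooldown if enough time has passed."""
        now = time.monotonic() if now is None else now
        if self._last_fired is not None and now - self._last_fired < self.seconds:
            return False
        self._last_fired = now
        return True

# Usage: only send a frame for text extraction when the cooldown allows it.
reading_cooldown = Cooldown(seconds=5.0)
if reading_cooldown.try_fire():
    pass  # grab the front frame and send it off here
```

Using `time.monotonic()` rather than wall-clock time keeps the interval correct even if the system clock is adjusted mid-run.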

Once again, the code can be found on github.

In my next post I’ll go more in depth into the AWS framework for extracting the text and tweeting the result.

MHacks Iris

A few weeks ago, the University of Michigan hosted MHacks – “The most epic hackathon ever!”  Over 500 hackers attended the 36-hour event, drank over 1800 cans of Red Bull, and produced some impressive hacks.  MHacks was the first hackathon I attended and I, too, had a great time putting together an awesome hack.  Two of my good CS friends at Purdue also went, and we put together a computer vision helmet which would tweet what you read.


The original plan was to make a fully-contained helmet using a raspberry pi.  The setup would have had the raspberry pi plugged in to a battery-powered cellphone charger, and a usb hub allowing the use of the two webcams and the wireless usb network adapter.  Unfortunately, while the raspberry pi’s onboard usb ports were able to power the webcams, when plugged in through our usb hub the webcams crashed continually.  We were still able to get the system up and running through a laptop though, and everyone who saw us demo it loved the project.

The high-level workflow of the system was to have two webcams on the helmet.  One tracked the subject’s eye movements, trying to discern whether they were reading.  If they were, then a frame would be grabbed from the outward-facing camera and sent to Amazon AWS, where any text would be extracted from it. The resulting text and image would then be tweeted.

The system was all written in python using opencv, tesseract, and the python interfaces to twitter and AWS.  The helmet had two webcams duct taped to it, one facing the eye, and the other facing outward.  Using opencv, we performed a number of transformations on the eye feed to get a black and white frame where the pupil stood out as a black circle.  From that we were able to find the centroid of the circle and reliably track the eye positions.  With the last few seconds worth of eye positions, we performed a simple linear regression test to see if they trended as horizontal movement.  If so, we deduced (not always as correctly as we’d have liked) that the user was reading, and we’d grab the frame from the front facing camera.  We’d then upload the frame to S3 and put the corresponding bucket and key information into an SQS queue.  From there an EC2 instance would pull the queue information, download the frame from S3, and run optical character recognition on it using tesseract.  Then, depending on the results of the OCR, we would tweet any text we read from the frame with a link to the frame on S3.
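To make the hand-off between the helmet and the EC2 worker a little more concrete, here is a small sketch of the kind of message that could carry the bucket and key. It uses only the standard library: an in-memory `queue.Queue` stands in for SQS, and the JSON layout, function names, bucket name and key are all hypothetical.

```python
import json
import queue

def make_frame_message(bucket, key):
    """Serialize the location of an uploaded frame for the queue."""
    return json.dumps({"bucket": bucket, "key": key})

def parse_frame_message(body):
    """Recover (bucket, key) from a message body on the worker side."""
    msg = json.loads(body)
    return msg["bucket"], msg["key"]

# In-memory stand-in for the SQS queue (an assumption of this sketch).
pending = queue.Queue()

# Helmet side: after uploading the frame, enqueue where it lives.
pending.put(make_frame_message("example-frames", "frames/0001.jpg"))

# Worker side: pull the message, then download the frame and run OCR on it.
bucket, key = parse_frame_message(pending.get())
print(bucket, key)
```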

Here’s an example of our attempt to read the newspaper.  I’d say the system performed quite well, seeing as we’d thrown it all together in under 36 hours.  We still plan to mess around with the system more, improve performance, and eventually get it actually running on the raspberry pi.

The code can be found on github.

In a few upcoming posts, I’ll walk through the subsystems and how they performed.