A few weeks ago, the University of Michigan hosted MHacks – “The most epic hackathon ever!” Over 500 hackers attended the 36-hour event, drank over 1,800 cans of Red Bull, and produced some impressive hacks. MHacks was the first hackathon I attended, and I, too, had a great time putting together an awesome hack. Two of my good CS friends at Purdue also went, and together we built a computer-vision helmet that would tweet whatever you read.
The original plan was to make a fully self-contained helmet built around a Raspberry Pi. The Pi would have been plugged into a battery-powered cellphone charger, with a USB hub providing ports for the two webcams and a wireless USB network adapter. Unfortunately, while the Raspberry Pi’s onboard USB ports could power the webcams, the webcams crashed continually when connected through our USB hub. We were still able to get the system up and running on a laptop, though, and everyone who saw us demo it loved the project.
The high-level workflow was built around the two webcams on the helmet. One tracked the wearer’s eye movements, trying to discern whether they were reading. If they were, a frame was grabbed from the outward-facing camera and sent to Amazon AWS, where any text would be extracted from it. The resulting text and image would then be tweeted.
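The handoff from the helmet to the cloud worker can be illustrated with a toy version of the queue. In the real system the queue was SQS and the frames lived in S3; here a plain in-memory `queue.Queue` stands in for it, and the function names and message fields are illustrative, not our exact schema.

```python
import json
import queue

# Stand-in for SQS: the helmet side pushed small JSON messages telling
# the worker where in S3 each frame had been uploaded.
work_queue = queue.Queue()

def enqueue_frame(bucket, key):
    """Helmet side: after uploading a frame to S3, announce its location."""
    work_queue.put(json.dumps({"bucket": bucket, "key": key}))

def process_next():
    """Worker side: pull one message off the queue.

    The real EC2 worker would then download the frame from S3, run
    Tesseract on it, and tweet the result; here we just return the
    location so the handoff itself is visible.
    """
    msg = json.loads(work_queue.get_nowait())
    return msg["bucket"], msg["key"]
```

Decoupling the helmet from the OCR worker this way meant the (slow) recognition step never blocked the eye-tracking loop.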
The system was written entirely in Python using OpenCV, Tesseract, and the Python interfaces to Twitter and AWS. The helmet had two webcams duct-taped to it, one facing the eye and the other facing outward. Using OpenCV, we applied a series of transformations to the eye feed to get a black-and-white frame in which the pupil stood out as a black circle. From that we found the centroid of the circle and could reliably track the eye position. Over the last few seconds’ worth of eye positions, we fit a simple linear regression to see whether they trended as steady horizontal movement. If so, we deduced (not always as correctly as we’d have liked) that the user was reading, and we grabbed a frame from the front-facing camera.

We then uploaded that frame to S3 and put the corresponding bucket and key information into an SQS queue. An EC2 instance pulled the message from the queue, downloaded the frame from S3, and ran optical character recognition on it using Tesseract. Then, depending on the results of the OCR, we tweeted any text read from the frame along with a link to the frame on S3.
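The eye-tracking side of the pipeline can be sketched roughly as follows. This is a minimal illustration, not our actual code: the function names and thresholds are made up, the binary frame is assumed to already come out of OpenCV’s thresholding step, and the regression is done with NumPy’s `polyfit`.

```python
import numpy as np

def pupil_centroid(binary_frame):
    """Centroid of the pupil pixels in a thresholded eye frame.

    binary_frame: 2-D array with pupil pixels set to 1 and everything
    else 0 (in the real pipeline this came from OpenCV transformations).
    Returns (x, y) in pixel coordinates, or None if no pupil was found.
    """
    ys, xs = np.nonzero(binary_frame)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())

def looks_like_reading(positions, min_slope=0.5, max_residual=2.0):
    """Crude reading detector over the last few seconds of centroids.

    Fits x-position against sample index and calls it 'reading' if the
    eye drifts steadily sideways with little scatter around the fit.
    The thresholds here are illustrative, not the values we tuned.
    """
    if len(positions) < 5:
        return False
    xs = np.array([p[0] for p in positions], dtype=float)
    t = np.arange(len(xs), dtype=float)
    slope, intercept = np.polyfit(t, xs, 1)        # least-squares line
    residual = np.abs(xs - (slope * t + intercept)).mean()
    return bool(abs(slope) >= min_slope and residual <= max_residual)
```

A steady left-to-right drift of the centroid passes the test, while random jitter (fixating, looking around) produces a near-zero slope and fails it.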
Here’s an example of our attempt to read the newspaper. I’d say the system performed quite well, given that we threw it all together in under 36 hours. We still plan to play with the system more, improve its performance, and eventually get it actually running on the Raspberry Pi.
The code can be found on GitHub.
In a few upcoming posts, I’ll walk through the subsystems and how they performed.