Eyepatch: Camera-based interaction
Eyepatch is a design tool for camera-based interaction. It allows designers to extract useful information from video input without any specialized computer vision programming. The extracted data can be sent as a live stream to other rapid prototyping tools such as Flash and Visual Basic. Data is extracted from video using classifiers. In an example, a designer building an interactive whiteboard application has mounted a camera above the whiteboard. She wants to know when someone is in front of the board and whether or not they are looking at it. She can do this using some built-in classifiers.
Using the built-in face detection and background subtraction classifiers, Eyepatch shows the detected regions. After activating an output, such as XML over TCP, Eyepatch begins streaming the output over a network socket. A Telnet terminal can be opened to show the formatted output stream. This XML data can be read into Flash with a short snippet of ActionScript.
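As a rough illustration of how such a stream might be consumed, the sketch below reads XML detections over a TCP socket in Python. The host, port, and element/attribute names are illustrative assumptions, not Eyepatch's documented schema.

# Minimal sketch of a client reading Eyepatch-style XML detections over TCP.
# The host, port, and the <frame>/<region> schema below are assumptions.
import socket
import xml.etree.ElementTree as ET

HOST, PORT = "localhost", 8000  # hypothetical Eyepatch output socket

with socket.create_connection((HOST, PORT)) as sock:
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        buffer += chunk
        # Assume each detection frame arrives as one <frame>...</frame> element.
        while b"</frame>" in buffer:
            frame_xml, _, buffer = buffer.partition(b"</frame>")
            root = ET.fromstring(frame_xml + b"</frame>")
            for region in root.findall("region"):
                x, y = region.get("x"), region.get("y")
                w, h = region.get("width"), region.get("height")
                print(f"detected region at ({x}, {y}) size {w}x{h}")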
In many cases, users need to train their own classifiers that are tailored to their application and its context. They can do this by selecting examples from recorded videos and training a classifier from those examples. Eyepatch provides seven different types of classifiers.
The colour classifier: extracts a hue histogram from the captured examples. When a trained colour classifier is run on an input frame, it computes a back-projection image in which the intensity of each pixel corresponds to the probability that it comes from the hue distribution of the training set. The brightest contiguous regions of the image are then output by the classifier. This can be used to track distinctively coloured objects.
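A minimal sketch of hue-histogram back projection, the technique the colour classifier is described as using; the file names are placeholders.

# Hue-histogram back projection sketch (OpenCV). File names are placeholders.
import cv2

example = cv2.imread("example_patch.png")   # user-selected training example
frame = cv2.imread("input_frame.png")       # frame to classify

# Build a hue histogram from the example patch.
example_hsv = cv2.cvtColor(example, cv2.COLOR_BGR2HSV)
hue_hist = cv2.calcHist([example_hsv], [0], None, [180], [0, 180])
cv2.normalize(hue_hist, hue_hist, 0, 255, cv2.NORM_MINMAX)

# Back-project onto the input frame: each pixel's intensity reflects how
# likely its hue is under the example's hue distribution.
frame_hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
backproj = cv2.calcBackProject([frame_hsv], [0], hue_hist, [0, 180], scale=1)

# Keep the brightest contiguous regions as the classifier's output.
_, mask = cv2.threshold(backproj, 200, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
regions = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
print(regions)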
The brightness classifier: works in a similar fashion. It looks at the brightness distribution of the examples and finds regions of the image with similar brightness. This can be used to track laser pointers or flashlights.
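A corresponding sketch for brightness-based tracking, assuming the classifier simply keeps pixels whose intensity falls in the range observed in the training examples; the file names are placeholders.

# Brightness-range tracking sketch (e.g. a laser pointer dot).
import cv2

example = cv2.imread("laser_example.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("input_frame.png", cv2.IMREAD_GRAYSCALE)

# Learn a brightness range from the example patch, then keep pixels in range.
low, high = int(example.min()), int(example.max())
mask = ((frame >= low) & (frame <= high)).astype("uint8") * 255

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
spots = [cv2.boundingRect(c) for c in contours]
print(spots)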
The shape classifier: uses Canny edge detection followed by contour matching, using pairwise geometric histograms. In an example, the user can train a recognizer for a USB key. After the user adds it to the example set by highlighting the area, the contour of the object is extracted. When the classifier is run on an input frame, it finds any contours in the frame that closely resemble the contours in the examples. This works best against simple backgrounds and with objects that have distinctive outer contours.
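The sketch below shows Canny edge detection followed by contour matching. Eyepatch is described as using pairwise geometric histograms; as a simpler stand-in, this sketch compares contours with Hu-moment matching (cv2.matchShapes), and the file names are placeholders.

# Canny edges + contour matching sketch.
import cv2

def largest_contour(gray):
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

example = cv2.imread("usb_key_example.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("input_frame.png", cv2.IMREAD_GRAYSCALE)

template = largest_contour(example)
edges = cv2.Canny(frame, 50, 150)
candidates, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Keep contours whose shape distance to the example contour is small.
matches = [c for c in candidates
           if cv2.matchShapes(template, c, cv2.CONTOURS_MATCH_I1, 0) < 0.1]
print(len(matches), "matching contours")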
The SIFT classifier: Users can also train classifiers based on the scale-invariant feature transform (SIFT). These classifiers have better invariance to object pose, scale, and illumination. In an example, SIFT features are extracted from a highlighted bottle logo. SIFT can be more reliable than shape matching in situations where the background is cluttered and the object has less clearly defined contours. SIFT works well for recognizing specific objects, but for training recognizers for a general class of objects, such as faces or cars, Eyepatch provides the Adaboost classifier: a machine learning algorithm that builds a boosted cascade of classifiers based on simple features.
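A minimal sketch of SIFT-based matching of the kind the SIFT classifier describes, using OpenCV's SIFT implementation with a ratio test; the file names are placeholders.

# SIFT feature matching sketch (SIFT is in mainline OpenCV >= 4.4).
import cv2

example = cv2.imread("bottle_logo.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("input_frame.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(example, None)
kp2, des2 = sift.detectAndCompute(frame, None)

# Ratio-test matching: keep descriptor pairs clearly better than the runner-up.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
print(f"{len(good)} good SIFT matches")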
In an example, the user is training a car recognizer. She selects a collection of positive training examples by left-clicking and dragging the mouse, and negative examples by right-clicking. In general, the more examples added to the training set, the better the resulting classifier. Extra examples can also be loaded from collections of static images.
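For illustration, the sketch below runs a Viola-Jones-style boosted cascade detector of the kind the Adaboost classifier is described as building. The cascade file name is a placeholder for a cascade trained offline from the positive and negative examples (for instance with OpenCV's cascade-training tools).

# Running a trained boosted cascade detector. "car_cascade.xml" is hypothetical.
import cv2

cascade = cv2.CascadeClassifier("car_cascade.xml")
frame = cv2.imread("street_frame.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Scan the image at multiple scales; each hit is a candidate car region.
cars = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
for (x, y, w, h) in cars:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.png", frame)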
The motion classifier: Dynamic events can be recognized with motion classifiers, which use segmented motion-history images to characterize motion. In an example, someone is waving his hand back and forth. After the user selects a portion of the video in which the hand is moving to the right, the system detects motion to the right. The motion components of the input frame are compared against the motion directions in the examples, and regions that match are output by the classifier. When the classifier that was just trained to recognize rightward motion is run on live video, only the rightward motion appears in the classifier's output frame.
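The motion classifier is described as using segmented motion-history images. As a simpler stand-in, the sketch below uses dense optical flow to keep only regions whose dominant motion is to the right; the video file name is a placeholder.

# Rightward-motion detection sketch using dense optical flow.
import cv2
import numpy as np

cap = cv2.VideoCapture("waving_hand.avi")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    dx, dy = flow[..., 0], flow[..., 1]
    # "Rightward motion" = strong positive horizontal flow dominating vertical flow.
    rightward = (dx > 2.0) & (np.abs(dy) < np.abs(dx))
    mask = rightward.astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    regions = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 200]
    prev_gray = gray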
The gesture classifier: Sometimes instantaneous motion direction is not enough, and the user wishes to recognize patterns of motion over time. This can be accomplished with the gesture classifier, which uses blob detection followed by motion trajectory matching using the Condensation algorithm. Upon switching into gesture mode, the system quickly runs through the video, performs blob detection, and captures the motion trajectories of the detected blobs. The user can then scroll through the video and see these motion trajectories as green trails overlaid on the video. A particular motion trajectory can be added to the set of training examples by highlighting that range of video frames and clicking the "grab range" button. In an example, the user trains two different gesture types: an up-and-down motion and a circular motion. When the trained classifier is run on a sequence of input frames, the system shows the match probabilities for each gesture.
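Eyepatch matches trajectories with the Condensation algorithm; as a much simpler stand-in, the sketch below resamples blob-centroid trajectories to a fixed length and compares them with a mean point distance. The example trajectories are synthetic and purely illustrative.

# Simplified trajectory matching sketch (not Condensation).
import numpy as np

def resample(traj, n=32):
    """Resample a list of (x, y) points to n evenly spaced points along the path."""
    traj = np.asarray(traj, dtype=float)
    d = np.r_[0, np.cumsum(np.linalg.norm(np.diff(traj, axis=0), axis=1))]
    t = np.linspace(0, d[-1], n)
    return np.c_[np.interp(t, d, traj[:, 0]), np.interp(t, d, traj[:, 1])]

def score(traj, template):
    """Lower is a better match; both inputs are lists of (x, y) blob centroids."""
    a, b = resample(traj), resample(template)
    a -= a.mean(axis=0)   # translation invariance
    b -= b.mean(axis=0)
    return float(np.linalg.norm(a - b, axis=1).mean())

# Hypothetical example: compare a live trajectory against two trained gestures.
up_down = [(100, 100 + 50 * np.sin(i / 3)) for i in range(30)]
circle = [(100 + 40 * np.cos(i / 5), 100 + 40 * np.sin(i / 5)) for i in range(30)]
live = [(100, 98 + 52 * np.sin(i / 3)) for i in range(30)]
print({"up_down": score(live, up_down), "circle": score(live, circle)})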
Eyepatch uses interactive machine learning to allow designers to create, test, and refine their own customized computer vision algorithms without writing any specialized code. Its diversity of classification strategies makes it adaptable to a wide variety of camera-based applications.