Real-Time Gesture Recognition

We have developed a high-performance real-time gesture recognition system. To analyze gestures, our system builds a model of the upper body on every frame, and it does so at 30 frames/sec on a mid-size PC. The system combines new algorithms with efficient embedded software design. We classify video frames into human gestures in several steps:
The most novel algorithm in our system is the graph matching algorithm. We build a graph model of the regions in the image and then match this model against a library of poses to determine the pose in each frame.

We designed our software to run efficiently on programmable platforms; much of that work is described in our IEEE Computer article cited below. Choosing the proper algorithms, such as graph matching, is one critical element of high performance, but we also had to optimize the software at every level of abstraction to obtain true real-time, video-rate performance.

We also developed an efficient, accurate background subtraction algorithm that uses motion estimation to identify small movements in the background. This technology is being commercialized by Verificon Corporation.

Selected papers:
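The graph matching step of the gesture recognition system described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the system's actual algorithm: each image region is reduced to a hypothetical normalized centroid, a graph edge marks two touching regions, each library pose lists expected part centroids and adjacencies, and matching is a brute-force search over region-to-part assignments, which is practical only for a handful of regions. The names `match_cost`, `classify_pose`, and `EDGE_PENALTY` are illustrative.

```python
from itertools import permutations
from math import dist

# Hypothetical cost added for each model adjacency the assignment fails to preserve.
EDGE_PENALTY = 0.5


def match_cost(regions, region_edges, parts, part_edges):
    """Lowest cost of mapping model parts onto distinct observed regions.

    regions: list of (x, y) centroids; region_edges: set of frozenset({i, j}).
    parts: dict part name -> expected (x, y); part_edges: list of (part, part).
    Returns (cost, assignment); cost is inf if there are fewer regions than parts.
    """
    names = list(parts)
    best_cost, best_assign = float("inf"), None
    # Brute force: try every injective mapping of parts onto region indices.
    for chosen in permutations(range(len(regions)), len(names)):
        assign = dict(zip(names, chosen))
        # Node term: distance between observed and expected centroids.
        cost = sum(dist(regions[assign[p]], parts[p]) for p in names)
        # Edge term: penalize model adjacencies missing from the observed graph.
        for a, b in part_edges:
            if frozenset((assign[a], assign[b])) not in region_edges:
                cost += EDGE_PENALTY
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


def classify_pose(regions, region_edges, library):
    """Name of the library pose whose graph best matches the observed region graph."""
    scores = {name: match_cost(regions, region_edges, parts, edges)[0]
              for name, (parts, edges) in library.items()}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    skeleton = [("head", "torso"), ("torso", "left_arm"), ("torso", "right_arm")]
    library = {
        "arms_down": ({"head": (0.5, 0.2), "torso": (0.5, 0.5),
                       "left_arm": (0.3, 0.6), "right_arm": (0.7, 0.6)}, skeleton),
        "arms_up": ({"head": (0.5, 0.3), "torso": (0.5, 0.6),
                     "left_arm": (0.3, 0.1), "right_arm": (0.7, 0.1)}, skeleton),
    }
    regions = [(0.5, 0.3), (0.5, 0.6), (0.32, 0.12), (0.69, 0.1)]
    region_edges = {frozenset(e) for e in [(0, 1), (1, 2), (1, 3)]}
    print(classify_pose(regions, region_edges, library))  # arms_up
```

Brute force keeps the sketch short; its cost grows factorially with the number of regions, so a real-time system would need a far more selective search.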
Distributed Smart Cameras

We turned our single-camera gesture recognition system into a distributed smart camera system that performs gesture recognition across multiple cameras linked by a distributed computer network. This system does not rely on a centralized server to compare the results from multiple cameras; instead, it uses peer-to-peer computing to perform the various steps of the gesture recognition system across the network.

Distributed smart cameras make sense because data transfer is not free: it consumes bandwidth, which may be particularly scarce in wireless networks, and it consumes energy. Realistic installations of multiple-camera systems cannot depend on central servers, which makes distributed computing the practical choice for real-world multi-camera video analysis.

In turning a single-camera system into a distributed system, we must solve two control problems. First, when a person stands between two cameras, the processing nodes associated with those cameras must be able to share information. To reduce bandwidth, they should not pass raw video; instead, they should transmit partial analysis results. Second, the locus of control must be passed from node to node as the target moves through the area covered by the cameras. The locus of control determines which node fuses data when necessary and performs the final recognition steps.

To distribute control, we must be able to split the single-camera algorithm into parts, some of which are always done locally and others that may be done remotely. Region finding and ellipse fitting are two logical points at which to split our gesture recognition system: their outputs are compact, yet still local enough to avoid major problems during fusion. Our methodology can be used to turn other single-camera algorithms into distributed systems. This work is part of our NSF ITR project with the University of Maryland.

Selected papers:
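The split described for the distributed smart camera system, in which nodes exchange fitted ellipses rather than raw video and hand off the locus of control, can be sketched as follows. This is a hypothetical illustration, not the system's actual protocol: the 20-byte ellipse encoding, the `CameraNode` and `choose_controller` names, and the rule that control goes to the node seeing the most target regions are all assumptions. For scale, a raw 320×240 8-bit grayscale frame (an assumed resolution) is 76,800 bytes, while one ellipse in this encoding is 20 bytes.

```python
import struct
from dataclasses import dataclass, field

FMT = "<5f"  # hypothetical wire format: five little-endian float32 values, 20 bytes


@dataclass(frozen=True)
class Ellipse:
    """Compact summary of one fitted region (field choice is an assumption)."""
    cx: float
    cy: float
    major: float
    minor: float
    angle: float

    def to_bytes(self) -> bytes:
        return struct.pack(FMT, self.cx, self.cy, self.major, self.minor, self.angle)


@dataclass
class CameraNode:
    name: str
    # Output of the always-local steps (region finding, ellipse fitting).
    local_ellipses: list = field(default_factory=list)
    # Partial results received from peers, keyed by sender name.
    received: dict = field(default_factory=dict)

    def send_to(self, peer: "CameraNode") -> int:
        """Send ellipses, not raw video, to a peer; return the payload size in bytes."""
        payload = b"".join(e.to_bytes() for e in self.local_ellipses)
        peer.received[self.name] = [Ellipse(*v) for v in struct.iter_unpack(FMT, payload)]
        return len(payload)

    def fused_view(self) -> list:
        """All ellipses available to this node for the final recognition steps."""
        return self.local_ellipses + [e for es in self.received.values() for e in es]


def choose_controller(nodes):
    """Hypothetical handoff rule: control goes to the node seeing the most target regions."""
    return max(nodes, key=lambda n: len(n.local_ellipses))


if __name__ == "__main__":
    a = CameraNode("cam_a", [Ellipse(0.5, 0.2, 0.1, 0.08, 0.0),
                             Ellipse(0.5, 0.5, 0.3, 0.15, 1.57)])
    b = CameraNode("cam_b", [Ellipse(0.4, 0.3, 0.12, 0.05, 0.7)])
    ctrl = choose_controller([a, b])
    sent = b.send_to(ctrl)
    print(ctrl.name, sent, len(ctrl.fused_view()))  # cam_a 20 3
```

Choosing the controller by region count is about the simplest possible handoff rule; a practical system would likely add hysteresis so that control does not oscillate while a person stands between two cameras.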
Smart Audio

We're starting to work on distributed smart audio networks. Stay tuned for details.