Vision-Based Gesture Recognition: An Ideal Human Interface for Industrial Control Applications


By Brian Dipert
Embedded Vision Alliance
Senior Analyst

This article was originally published in Digi-Key's Microcontroller TechZone. An excerpt of it is reprinted here with the permission of Digi-Key.

Embedded vision, the evolution and extrapolation of computer-based vision systems that process and interpret meaning from still and video images, is poised to be the next big technology success story. Consider, for example, the image sensors and processors now commonly found in cellular phones, tablets, laptop computers and dedicated computer displays. Originally intended for video conferencing and photography, they are now being harnessed for additional applications, such as augmented reality.

Similarly, consider the burgeoning popularity of consumer surveillance systems, driven by steady improvements in cameras and their subsystems, as well as the increasingly user-friendly associated surveillance software and services. Also, as anyone who has recently shopped for an automobile already knows, image sensors are increasingly found in numerous locations around a vehicle, leveraged for parking assistance, rear-view safety, impending-collision alert, lane-departure warning, and other functions.

The same robust-featured and cost-effective image sensors, processors, memory devices, I/O transceivers, and other ICs used in the earlier-mentioned systems are equally available to developers of vision-inclusive industrial automation applications. Gesture-based human interfaces are ideal in many respects, and therefore increasingly common, in such environments. For one thing, they are immediately intuitive; why click on a mouse, or a button, or even slide your finger across a touch screen to flip pages or move within a menu page, when you can instead just sweep your hand through the air?

A gesture-based UI also dispenses with the environmental restrictions that often hamper a touch-based interface: water and other fluids, non-conductive gloves, dirt, germs, and so on. However, a first-generation motion implementation such as that utilized by the Nintendo® Wii™ game console system has limitations of its own. An easy-to-lose, breakable, in-hand controller is required to implement the scheme. Additionally, the interface between the controller and the system, usually implemented via Bluetooth®, ZigBee® or some other RF wireless technology, is (like a touchscreen interface) vulnerable to functional degradation due to environmental EMI.

Instead, consider an image sensor-inclusive design. Vision-based gesture interfaces use the human body as the controller rather than a dedicated piece of extra hardware, interpreting hand, arm, and other body movements. They are comparatively EMI-immune; all that you need to ensure is sufficient operator-to-equipment distance along with adequate ambient lighting. In addition to gesture-based control, and as with the earlier-mentioned computers and cell phones, you can use facial recognition technology not only to "unlock" the system in response to the presence of a valid operator's visage but also to custom-configure the system on the fly for any particular operator, logging into a specific user account, for example. Vision-based interfaces can also offer a more extensive suite of user control options than does a coarser-grained accelerometer- or gyroscope-based motion interface.
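
As a rough illustration of how hand movements might be interpreted, the following minimal Python sketch classifies a sweep of the hand from a sequence of hand-centroid positions. It assumes that some upstream tracker (not shown, and not described in this article) already supplies those positions in pixel coordinates; the function name classify_swipe and the min_travel threshold are hypothetical choices made for this example, not part of any particular product or library.

```python
from typing import List, Optional, Tuple

# (x, y) hand centroid in pixels, origin at the top-left of the image.
# Assumption: an upstream hand tracker produces these points per frame.
Point = Tuple[float, float]


def classify_swipe(track: List[Point], min_travel: float = 80.0) -> Optional[str]:
    """Classify a centroid track as 'left', 'right', 'up', 'down', or None.

    min_travel is a hypothetical threshold (in pixels) below which the
    movement is treated as noise rather than a deliberate gesture.
    """
    if len(track) < 2:
        return None

    # Net displacement between the first and last observed positions.
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]

    # Ignore movements too small to count as a deliberate sweep.
    if max(abs(dx), abs(dy)) < min_travel:
        return None

    # The dominant axis decides the gesture; image y grows downward.
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


if __name__ == "__main__":
    print(classify_swipe([(100, 200), (180, 205), (260, 198)]))
```

A real system would likely also consider gesture speed, track smoothness, and operator distance before acting on a result, particularly where an accidental command could affect equipment.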

A Kinect case study

If your system employs a dual-image-sensor (i.e., stereo or 3-D) arrangement, your range of available gestures becomes even richer, encompassing not only horizontal and vertical movements but also depth discernment. Stereo sensor setups also enable facial recognition software to more accurately distinguish between a real-life human being and a photograph of a person. Microsoft® took a different approach, called structured light, to discern depth with the Kinect peripheral for the Xbox® 360 (see Figure 1).
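
To make the stereo case concrete, the sketch below applies the standard pinhole stereo relation Z = f * B / d, where f is the focal length in pixels, B is the baseline (the distance between the two sensors), and d is the disparity (the horizontal pixel shift of the same feature between the left and right images). It assumes rectified, calibrated cameras; the numeric values are illustrative assumptions only and do not describe Kinect, which uses structured light rather than stereo disparity.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Return depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative
        # values indicate a matching error.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical values: 700-pixel focal length, 6 cm baseline.
    for d in (70.0, 35.0, 14.0):
        z = depth_from_disparity(700.0, 0.06, d)
        print(f"disparity {d:5.1f} px -> depth {z:.2f} m")
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for a distant operator than for a nearby one, which is one reason operator-to-equipment distance is a design consideration.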

Figure 1: Microsoft's Kinect peripheral for the Xbox 360 game console, a well-known embedded vision success story (a), combines both monochrome and Bayer-patterned full color image sensors, along with an infrared transmitter for structured light depth discernment (b). Further dissection by iFixit revealed additional component details (c). (Courtesy Microsoft and iFixit, respectively).

Kinect is one of the best-known embedded vision examples, having sold eight million units in the first 60 days after its early November 2010 launch. It is not currently an industrial automation device, at least officially, although hackers' efforts have notably broadened its usefulness beyond its game console origins. Microsoft plans to unveil an official SDK for the Windows® 7 operating system this year, along with a PC-optimized product variant. Regardless, the design trade-offs and decisions made by Microsoft are instructive to others developing vision-based user interface hardware and software.

For the remainder of this article, please visit Digi-Key's website.
