
Computer Vision Evolves Towards Ubiquity


This column was originally published at Vision Systems Design's website. It is reprinted here with the permission of PennWell.

For most of its history, computer vision was a topic of academic research, gaining its first sizable commercial success in factory automation applications, where it has become an essential technology. Nevertheless, vision has remained a niche technology – one that most people do not directly use or interact with on a daily basis.

Key to understanding how computer vision will further evolve is the realization that it is an enabling technology, not an end in itself. As with other technologies such as speech recognition, vision will eventually become ubiquitous and, in the process, "invisible." As with other technologies that have become success stories, improvements in computer vision enable new applications. Some of these applications succeed, and that success encourages further industry investment in the underlying technology.

Now, however, thanks to the emergence of cost-effective processors, image sensors and other semiconductor devices, along with robust algorithms, computer vision can be incorporated into a range of systems – including cost-, size- and power-constrained devices. The Embedded Vision Alliance, a worldwide organization of computer vision hardware, software and service providers founded in 2011, uses the term "embedded vision" to refer to this growing use of computer vision technology in a range of embedded systems, mobile devices, PCs and the cloud.

Algorithms have been developed and refined for decades and are widely available in both proprietary suites and open-source collections such as OpenCV. These algorithms often work well, especially when applied to the specific tasks for which they were originally designed. However, classical computer vision algorithms are challenged by numerous real-life factors. Many potential applications are plagued by infinitely varying inputs, which, combined with the lack of an underlying theoretical model of visual perception, leads to the need for exhaustive experimentation to create robust solutions. Uncontrolled environmental conditions – lighting, orientation, motion and occlusion – translate into ambiguity, leading to complex, multi-layered algorithms.
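
To make that tuning burden concrete, consider a classical edge-and-contour pipeline written against OpenCV's Python bindings (a minimal sketch; the image file name and the Canny thresholds are illustrative placeholders, not values from any real application). The hand-picked thresholds work for one lighting setup and typically need re-tuning when conditions change:

```python
import cv2

# Load a captured frame; the file name is a placeholder.
image = cv2.imread("inspection_frame.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Smooth to suppress sensor noise before edge detection.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# These Canny thresholds are hand-tuned for one lighting setup;
# a change in illumination often forces re-tuning them.
edges = cv2.Canny(blurred, 50, 150)

# Extract candidate object outlines from the edge map
# (OpenCV 4.x return signature).
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
print(f"{len(contours)} candidate outlines found")
```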

Deep neural networks promise to help. Originally developed for object classification tasks, their use has now expanded to include detection, segmentation and other vision functions. While an underlying, all-encompassing theory of visual perception is still lacking, deep learning provides more general solutions to a variety of computer vision problems. In the past, large numbers of programmers were required to hand-code image-processing algorithms. In the emerging deep learning era, by contrast, large amounts of data are required to train algorithms to classify and identify objects. One thing that has not changed is the need for plenty of runtime compute horsepower to execute vision algorithms. Now, however, plenty of compute horsepower is also needed for the pre-deployment training that deep learning requires.
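
As a sketch of the deployment side of that shift (assuming PyTorch and torchvision, which the column does not name; the image path is a placeholder), inference with a pretrained classifier takes only a few lines of code. The expensive part, training on a large labeled dataset, happened offline beforehand:

```python
import torch
from PIL import Image
from torchvision import models

# Load a classifier pretrained on ImageNet; the heavy compute
# was spent offline, during training on a large labeled dataset.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

# The weights object bundles the matching preprocessing steps.
preprocess = weights.transforms()

image = Image.open("inspection_frame.png").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    logits = model(batch)

probs = torch.softmax(logits, dim=1)
top = probs.argmax(dim=1).item()
print(weights.meta["categories"][top], float(probs[0, top]))
```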

For many future vision applications, algorithms will likely converge around various (and in some cases multiple) deep neural networks. Classical vision algorithms will not disappear, but they will likely converge on a smaller range of functions for specific tasks, and the processing architectures that run them will evolve accordingly. Much industry debate centers on whether "local" or "cloud" processing will dominate. In an era of increasingly pervasive and fast network connectivity, the most common answer will be "both." Both local and cloud processors will become increasingly heterogeneous, harnessing combinations of CPUs, GPUs, DSPs, FPGAs, and specialized imaging, vision and neural network co-processors.

Thankfully, APIs such as OpenCL enable the efficient use of such heterogeneous processors. Even higher-level APIs, such as OpenVX, promise to further abstract both the processors used and the underlying algorithms. Enabled by these higher levels of abstraction, the focus of vision software development will shift from implementation to integration, enabling a larger number of applications and helping computer vision become both ubiquitous and "invisible." In the process, vision will create value both for technology suppliers and for the implementers who leverage the technology in their applications.
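
To illustrate the portability such APIs provide, here is a minimal sketch using PyOpenCL (an assumption for illustration; the column names only the OpenCL standard itself). The host code runs a simple thresholding kernel on whatever OpenCL device the runtime exposes, whether CPU, GPU or accelerator, with no device-specific changes:

```python
import numpy as np
import pyopencl as cl

# Pick whatever OpenCL device is available; the same host code
# runs unchanged on a CPU, GPU or other accelerator.
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# A trivial vision kernel: binarize pixels at a fixed threshold.
program = cl.Program(ctx, """
__kernel void threshold(__global const uchar *src,
                        __global uchar *dst,
                        const uchar level) {
    int i = get_global_id(0);
    dst[i] = src[i] > level ? 255 : 0;
}
""").build()

# Synthetic 640x480 grayscale frame, for illustration only.
pixels = np.random.randint(0, 256, size=640 * 480, dtype=np.uint8)
mf = cl.mem_flags
src_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=pixels)
dst_buf = cl.Buffer(ctx, mf.WRITE_ONLY, pixels.nbytes)

program.threshold(queue, (pixels.size,), None,
                  src_buf, dst_buf, np.uint8(128))

result = np.empty_like(pixels)
cl.enqueue_copy(queue, result, dst_buf)
```

OpenVX takes the abstraction a step further, expressing whole vision pipelines as graphs that the runtime can map onto heterogeneous hardware.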

Jeff Bier
Founder, Embedded Vision Alliance
Co-founder and President, BDTI (Berkeley Design Technology, Inc.)
