Processors for Embedded Vision


This technology category includes any device that executes vision algorithms or vision system control software. The following diagram shows a typical computer vision pipeline; processors are often optimized for the compute-intensive portions of the software workload.

[Figure: typical embedded vision pipeline]

The following examples represent distinctly different types of processor architectures for embedded vision, and each has advantages and trade-offs that depend on the workload. For this reason, many devices combine multiple processor types into a heterogeneous computing environment, often integrated into a single semiconductor component. In addition, a processor can be accelerated by dedicated hardware that improves performance on computer vision algorithms.

General-purpose CPUs

While computer vision algorithms can run on most general-purpose CPUs, desktop processors may not meet the design constraints of some systems. However, x86 processors and system boards can leverage the PC infrastructure for low-cost hardware and broadly supported software development tools. Several Alliance Member companies also offer devices that integrate a RISC CPU core. A general-purpose CPU is best suited for heuristics, complex decision-making, network access, user interface, storage management, and overall control. A general-purpose CPU may be paired with a vision-specialized device for better performance on pixel-level processing.

Graphics Processing Units

High-performance GPUs deliver massive amounts of parallel computing potential, and graphics processors can be used to accelerate the portions of the computer vision pipeline that perform parallel processing on pixel data. While general-purpose computing on GPUs (GPGPU) has primarily been used for high-performance computing (HPC), even mobile graphics processors and integrated graphics cores are gaining GPGPU capability, which brings GPU acceleration within the power constraints of a wider range of vision applications. In designs that require 3D processing in addition to embedded vision, a GPU will already be part of the system and can be used to assist a general-purpose CPU with many computer vision algorithms. Many x86-based embedded systems pair the CPU with a discrete GPGPU-capable GPU.
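The property that makes pixel-level work a good fit for a GPU is that each output pixel can be computed independently of its neighbors' results. The following is a minimal CPU-side sketch of that property in Python, not GPU code: the function names, the threshold value, and the worker count are illustrative assumptions, and a thread pool merely stands in for the many parallel lanes a GPU would provide.

```python
from concurrent.futures import ThreadPoolExecutor


def threshold_row(row, cutoff=128):
    """Binarize one row; each output pixel depends only on its own input pixel."""
    return [255 if pixel >= cutoff else 0 for pixel in row]


def threshold_image(image, cutoff=128, workers=4):
    """Binarize every row of a grayscale image given as a list of rows.

    Rows share no state, so they can be processed in any order or at the
    same time. pool.map returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: threshold_row(row, cutoff), image))


# Tiny hypothetical 2x3 grayscale image.
image = [
    [10, 200, 128],
    [127, 255, 0],
]
binary = threshold_image(image)
print(binary)  # [[0, 255, 255], [0, 255, 0]]
```

Operations with this shape (thresholding, color conversion, per-pixel arithmetic) are the ones most readily moved from a CPU to a GPU, because no synchronization between pixels is needed.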

Digital Signal Processors

DSPs are very efficient for processing streaming data, since the bus and memory architecture are optimized to process high-speed data as it traverses the system. This architecture makes DSPs an excellent solution for processing image pixel data as it streams from a sensor source. Many DSPs for vision have been enhanced with coprocessors that are optimized for processing video inputs and accelerating computer vision algorithms. The specialized nature of DSPs makes these devices inefficient for processing general-purpose software workloads, so DSPs are usually paired with a RISC processor to create a heterogeneous computing environment that offers the best of both worlds.
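To make the streaming idea concrete, here is a small Python sketch of a finite impulse response (FIR) filter that consumes samples one at a time and keeps only a short window in memory, which is the access pattern DSP memory architectures are built around. It is an illustration only: the function name, the tap values, and the sample data are assumptions, and real DSP code would use vendor intrinsics or libraries instead.

```python
from collections import deque


def fir_stream(samples, taps):
    """Yield one filtered value per input sample once the window is full.

    Only len(taps) samples are held at any time, so memory use does not
    grow with the length of the stream.
    """
    window = deque(maxlen=len(taps))
    for sample in samples:
        window.append(sample)
        if len(window) == len(taps):
            # One multiply-accumulate per tap; taps[0] pairs with the newest sample.
            yield sum(t * x for t, x in zip(taps, reversed(window)))


# Hypothetical scan line of pixel intensities and a 1-2-1 smoothing kernel.
line = [0, 0, 8, 8, 8, 0]
smoothed = list(fir_stream(line, taps=[1, 2, 1]))
print(smoothed)  # [8, 24, 32, 24]
```

Because the filter is a generator, it can be fed directly from a sensor readout loop without buffering a whole frame first.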

Field Programmable Gate Arrays (FPGAs)

Instead of incurring the high cost and long lead times of a custom ASIC to accelerate computer vision systems, designers can use an FPGA as a reprogrammable solution for hardware acceleration. With millions of programmable gates, hundreds of I/O pins, and compute performance in the trillions of multiply-accumulates/sec (tera-MACs), high-end FPGAs offer the potential for the highest performance in a vision system. Unlike a CPU, which has to time-slice or multi-thread tasks as they compete for compute resources, an FPGA can simultaneously accelerate multiple portions of a computer vision pipeline. Because the parallel nature of FPGAs offers such an advantage for accelerating computer vision, many of the algorithms are available as optimized libraries from semiconductor vendors. These computer vision libraries also include preconfigured interface blocks for connecting to other vision devices, such as IP cameras.
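For a sense of scale, the multiply-accumulate rate a vision kernel needs can be estimated with simple arithmetic. The Python sketch below counts the MACs per second for one square convolution applied to every pixel of every frame; the 1080p, 60 frames/sec, 3x3 workload is an assumed example rather than a figure from any vendor, and the function name is hypothetical.

```python
def macs_per_second(width, height, fps, kernel_size, channels=1):
    """MACs/sec for one kernel_size x kernel_size convolution over every pixel.

    Each output pixel needs kernel_size**2 multiply-accumulates per channel.
    Border handling and memory traffic are ignored.
    """
    macs_per_pixel = kernel_size * kernel_size * channels
    return width * height * fps * macs_per_pixel


# Assumed workload: single-channel 1920x1080 video at 60 frames/sec, 3x3 kernel.
rate = macs_per_second(1920, 1080, 60, kernel_size=3)
print(f"{rate:,} MACs/sec ({rate / 1e9:.2f} giga-MACs/sec)")
# 1,119,744,000 MACs/sec (1.12 giga-MACs/sec)
```

Under these assumptions a single small filter needs roughly 1.1 giga-MACs/sec, so a tera-MAC budget leaves room for many such stages, or for much larger kernels, running at the same time.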

Vision-Specific Processors and Cores

Application-specific standard products (ASSPs) are specialized, highly integrated chips tailored for specific applications or application sets. ASSPs may incorporate a CPU, or use a separate CPU chip. By virtue of their specialization, ASSPs for vision processing typically deliver superior cost- and energy-efficiency compared with other types of processing solutions. Among other techniques, ASSPs deliver this efficiency through the use of specialized coprocessors and accelerators. And, because ASSPs are by definition focused on a specific application, they are usually provided with extensive associated software. This same specialization, however, means that an ASSP designed for vision is typically not suitable for other applications. ASSPs’ unique architectures can also make programming them more difficult than with other kinds of processors; some ASSPs are not user-programmable.

New Arm IP Delivers True Digital Immersion For the 5G Era

News Highlights: Arm Cortex-A78 CPU: Transforming next-generation smartphone experiences with 20% sustained performance gains; Arm Cortex-X Custom program: New program offers custom Cortex solutions and delivers the Arm Cortex-X1 with a 30% increase in peak performance; Arm Mali-G78 GPU: The highest-performing Mali GPU based on Valhall architecture with a 25% increase in performance; Arm Ethos-N78 NPU:

Upcoming Virtual Seminar Series Discusses Smart Devices That Understand and Seamlessly Communicate With the World Around Them

On June 2 (Communication), 3 (Connectivity and Sensing), 9 (AI and Vision), and 10 (Wireless Audio), Edge AI and Vision Alliance Member company CEVA will deliver a four-part virtual seminar series focusing on how to create smart devices that understand the world around them and seamlessly communicate. Also included in the event, and available in

Unified Programming Model Critical to Uncompromised Application Performance, Saves Time and Money, Study Finds

What’s New: New computing accelerators are rapidly emerging, and organizations need to examine time and financial considerations associated with developing performance-sensitive applications that can run on both new and existing computing platforms. Commissioned by Intel, a recent research report from J.Gold Associates, “oneAPI: Software Abstraction for a Heterogeneous Computing World,” discusses the importance of application

MediaTek’s New Dimensity 820 Chip Brings Incredible 5G Experiences to Smartphones

The Dimensity 820 delivers ultra-fast 5G speeds, impressive power-efficiency and seamless connectivity TAIPEI, Taiwan, May 18, 2020 /PRNewswire/ — MediaTek today announced the Dimensity 820 system-on-chip (SoC) which is optimized for premium user experiences. The Dimensity 820 delivers ultra-fast 5G speeds, and is feature-packed with MediaTek’s latest multimedia, AI and imaging innovations. “Our Dimensity 1000

NVIDIA’s New Ampere Data Center GPU in Full Production

New NVIDIA A100 GPU Boosts AI Training and Inference up to 20x; NVIDIA’s First Elastic, Multi-Instance GPU Unifies Data Analytics, Training and Inference; Adopted by World’s Top Cloud Providers and Server Makers SANTA CLARA, Calif., May 14, 2020 (GLOBE NEWSWIRE) — NVIDIA today announced that the first GPU based on the NVIDIA® Ampere architecture, the

NVIDIA Releases Jetson Xavier NX Developer Kit with Cloud-Native Support

Cloud-Native Support Comes to Entire Jetson Platform Lineup, Making It Easier to Build, Deploy and Manage AI at the Edge Thursday, May 14, 2020—GTC 2020—NVIDIA today announced availability of the NVIDIA® Jetson Xavier™ NX developer kit with cloud-native support — and the extension of this support to the entire NVIDIA Jetson™ edge computing lineup for

Foxconn Partners with Socionext and Hailo to Launch Next-Generation AI Processing Solution for Video Analytics at the “Edge”

Foxconn’s Enhanced “BOXiedge” Edge Computing Solution Offers Market-leading Energy-efficiency for Standalone AI Inference Nodes [Taipei, Taiwan, Yokohama, Japan, and Tel Aviv, Israel. May 12, 2020] – Foxconn, a global leader in smart manufacturing, is joining Socionext, a major provider of advanced SoC solutions for video and imaging systems, and leading artificial intelligence (AI) chipmaker Hailo

Qualcomm Addresses Growing Demand for 5G by Announcing New Snapdragon 768G Mobile Platform

Snapdragon 768G is Designed to Deliver Immersive Gaming Experiences and Truly Global 5G May 10, 2020 – San Diego – Qualcomm Technologies, Inc. announced the Qualcomm® Snapdragon™ 768G Mobile Platform, a follow-on to the Snapdragon 765G. Snapdragon 768G is designed to bring next-level performance that enables smart, immersive gaming experiences with the integration of truly

MediaTek Unveils 5G-Integrated Dimensity 1000+ Chip for Smartphones

Dimensity 1000+ packs 5G connectivity and gaming, video and power-saving technology enhancements for flagship-grade user experiences HSINCHU, Taiwan – May 7, 2020 – MediaTek today announced enhancements to its Dimensity 5G chipset family with the Dimensity 1000+, an enhanced 5G-integrated chip with a number of leading technologies and upgrades for gaming, video and power-efficiency. Dimensity 1000+ is based

Khronos Group Releases OpenCL 3.0

Provisional Specifications publicly available today for industry feedback Enhanced deployment flexibility sets stage for new pervasively available core functionality IWOCL – April 27, 2020 – 6:00 AM GMT – Today, The Khronos® Group, an open consortium of industry-leading companies creating advanced interoperability standards, publicly releases the OpenCL™ 3.0 Provisional Specifications. OpenCL 3.0 realigns the OpenCL

Codeplay Implements MKL-BLAS for NVIDIA GPUs Using SYCL and DPC++

This blog post was originally published at Codeplay Software’s website. It is reprinted here with the permission of Codeplay Software. Software developers are looking more than ever at how they can accelerate their applications without having to write optimized processor specific code. SYCL is the industry standard for C++ acceleration, giving developers a platform to

Chips&Media Now Reveals c.WAVE120 – New Generation of Super-Resolution HW IP

Deep learning-based neural network SR IP Capable of processing 8K 60fps output images at 550MHz High performance and low power consumption SEOUL, April 23rd, 2020 – Chips&Media, the leading global hardware IP provider, today announced the launch of c.WAVE120, which is a deep learning-based neural network, super-resolution IP that upscales the low-resolution data into high-resolution

Speeding Up Deep Learning Inference Using TensorRT

This article was originally published at NVIDIA’s website. It is reprinted here with the permission of NVIDIA. This is an updated version of How to Speed Up Deep Learning Inference Using TensorRT. This version starts from a PyTorch model instead of the ONNX model, upgrades the sample application to use…

Accelerating WinML and NVIDIA Tensor Cores

This article was originally published at NVIDIA’s website. It is reprinted here with the permission of NVIDIA. Every year, clever researchers introduce ever more complex and interesting deep learning models to the world. There is of course a big difference between a model that works as a nice demo in…

CEVA Announces Industry’s First High Performance Sensor Hub DSP Architecture

SensPro™ family serves as hub for processing and fusing of data from multiple sensors including camera, Radar, LiDAR, Time-of-Flight, microphones and inertial measurement units Highly-configurable and self-contained architecture brings together scalar and parallel processing for floating point and integer data types, as well as deep learning training and inferencing MOUNTAIN VIEW, Calif., – April 7,

BrainChip Introduces Company’s Event-Based Neural-Network IP and NSoC Device at Linley Processor Virtual Conference

AKD1000 is the first event-based processor for Edge AI with ultra-low power consumption and continuous learning APRIL 2, 2020–SAN FRANCISCO–(BUSINESS WIRE)– BrainChip Holdings Ltd. (ASX: BRN), a leading provider of ultra-low power, high performance edge AI technology, today announced that it will be introducing its AKD1000 to audiences at the Linley Fall Processor Virtual Conference

Speeding Up Deep Learning Inference Using TensorFlow, ONNX, and TensorRT

This article was originally published at NVIDIA’s website. It is reprinted here with the permission of NVIDIA. Starting with TensorRT 7.0, the Universal Framework Format (UFF) is being deprecated. In this post, you learn how to deploy TensorFlow trained deep learning models using the new TensorFlow-ONNX-TensorRT workflow. Figure 1 shows…

Application Processor Unit (APU) Quarterly Market Monitor

Application processor: All-in-one solution for the computing challenges of the next decade MARKET DYNAMICS: 2019 APU market closed with total revenue of $31B. Seasonally weak Q1-20 expected to remain above $7B even as COVID-19 stresses the supply chain. Cost & ASP declines at ~20% per year through 2021; slowing to ~10% per year for 2022+.



