Should deep learning-based computer vision processing take place in the cloud, at the edge, or both? This seemingly simple question has a complicated answer: "it depends." This article provides perspectives on the various factors you should consider, and with what priorities, when making this implementation decision for your particular project's requirements.
THe Edge-Centric Inference Evolution
Neural network acceleration is moving to the edge at a rapid pace. What used to require racks of cloud-based compute to run neural network inference can now operate directly on embedded devices such as cameras, smartphones and within vehicles. With the evolution of dedicated hardware, thereby bringing greater compute power per watt and per dollar, networks are increasingly being optimized to run predominantly-to-completely at the edge. The resultant edge-based inference delivers numerous benefits, such as reduced latency and increased privacy (by only passing relevant metadata information on to the cloud), along with low power consumption and bandwidth requirements.
Markets such as smart camera surveillance, advanced driver-assistance systems (ADAS), autonomous vehicles and mobile devices all have a need for data-driven decision-making based on inputs from real-time visual events. An autonomous vehicle, for example, contains multiple cameras for computer vision, object recognition, lane warning, driver monitoring for fatigue and other functions, as well as additional sensors (thermal imaging, radar and LiDAR for example) for fusion purposes. Processing each of these sensors' data at the edge, at least to a degree, minimizes the bandwidth required to move data around the vehicle.
"Smart Cities" are one of many examples of deep learning-based computer vision applications that benefit from an edge-centric processing approach that also harnesses the cloud.
Recently the Financial Times, McKinsey and Deloitte (among others) have published reports and articles concluding that AI is moving to the edge, creating pervasive or ambient AI. They see the future of AI as being at the edge, on embedded devices and with limitless applications. More generally, the high-level advantages and potential downsides of edge-based vision processing include the following:
- Small but powerful, reaching if not exceeding the performance of a cloud-based alternative
- High performance per area, therefore lower silicon cost
- Low incremental cost for compute density, thereby encouraging implementation ubiquity even in extremely cost-sensitive applications such as IoT devices
- Does not necessarily require any cloud connectivity, making the approach ideal for devices with low power consumption requirements, as well as those with low bandwidth and/or erratic connectivity capabilities (even completely disconnected standalone devices)
- Models must be explicitly compiled for the target hardware
- The edge approach is not suitable for initial training
- May still require significant power consumption depending on configuration
- Limited flexibility for task-specific solutions
Neural network inference originally ran solely on CPUs. As a next step, some implementations have used DSPs, FPGAs or GPUs as accelerators. Although GPU-based acceleration, for example, can be 10 or more times faster than a CPU, a dedicated hardware-based neural network accelerator can be exponentially faster than either a CPU or a GPU (Figure 2).
Figure 2. A dedicated hardware-based neural network accelerator can be exponentially faster than a GPU.
The fundamental advantage of dedicated IP for edge devices lies in its compute density, exponentially more so than with other less specialized offerings. Neural networks have high bandwidth and computation requirements, but neural network processor cores' designs allow for optimal power, performance and area (PPA), thereby maximizing the number of inferences per second with minimal silicon area.
Regardless of the processing architecture(s) chosen, deploying neural networks at the edge requires an understanding of the hardware on which they will run and the software frameworks that need to be extended, converted or designed for any available edge-based deep learning accelerators (Figure 3). Converting to a fixed-point format, for example, offers a reduction in model sizes, compute and bandwidth, often with little to no corresponding degradation in accuracy. Google's 8-bit quantized format found in Google’s TensorFlow Lite and Android is an example.
Figure 3. Efficiently deploying neural networks at the edge requires both an understanding of the hardware on which they will run and the software and data that will require optimization.
The Role of The Cloud
The cloud plays a complementary role to embedded edge processing – indeed they are symbiotic. Training, validation and retraining often run best in the cloud as these are "big tin" tasks requiring thousands of processing units. Meanwhile, the edge implementation can run a network that is task-specific, or multiple devices can run a series of neural networks – for sensor fusion, for example. So, it’s not “either or”, it’s “both and”.
That being said, whenever there are challenges to be overcome regarding latency, transmission, security or cost, edge devices can help. And by running inference on a device with a neural network accelerator smaller than a pinhead, the resultant small SoCs have big applications across the entire world of markets, including but not limited to security, retail, connected homes, education, agriculture and health.
Looking ahead, requirements for edge device capability are likely to increase. This increase will be paced by the rollout of 5G cellular data services, enabling these devices to ‘phone home’ to update their neural network models and transfer important findings to the central computer – a vital function for effective ambient/pervasive computing. Performance, power consumption and memory implications are all focuses of ongoing work, with numerous companies vying to produce the optimal device at the lowest possible cost.
We are on the edge of a future in which devices "see" and (importantly) "recognise" – as a step forward towards a data-driven future. The fourth industrial revolution won’t just be in the datacenters, it will also be in the streets, the fields, the shopping malls and the factories, in devices, robots and vehicles. It will be all-pervasive and all but invisible – and it will be ubiquitous.
Senior Director, PowerVR AI, Imagination Technologies