This article excerpt is published in full form at Prophesee's website. It is reprinted here with the permission of Prophesee.
Event-based sensing is a new paradigm in imaging technology inspired by human biology. It promises to enable a smarter and safer world by improving the ability of machines to sense their environments and make intelligent decisions about what they see.
This technology will take over from the frame-based approach used by traditional film, digital and mobile-phone cameras in many machine-vision applications. Event-based solutions will make machine vision more accessible and more effective, enabling a world in which production lines run more quickly, humans and machines work together more safely, autonomous vehicles are accident-free, and drones intuitively avoid collision at high speeds.
Among the advantages of event-based sensing are:
- Fast sensing at low cost.
- Ability to operate in poor and/or variable lighting.
- Low computational, memory and communications overhead.
- Low energy use.
These advantages make event-based sensing well suited to:
- Fast detection enabling faster automatic emergency braking.
- Adaptive sensing in rapidly changing lighting conditions and improved light flicker mitigation.
- Use of more cameras for increased redundancy and safety, made possible by lower costs and computational overhead.
- Fast object recognition enabling higher throughputs of production lines and a shift from discrete to continuous processes.
- Fast visual feedback loops for robotics systems, to enable safer co-working between robots and humans.
- Complex multi-object tracking that is resilient to occlusions and so enables improved operational safety and optimization.
- Low-power ‘always on’ features to enable next-generation user interfaces on smart mobile devices.
- Low-power simultaneous localization and mapping for AR/VR devices.
- Fast change detection that highlights regions of interest in surveillance and monitoring scenes, reducing the amount of data that need to be analysed further.
The Strengths and Weaknesses of Conventional Imaging
We have been using conventional imaging strategies for decades, so why shift to an event-based approach now?
Conventional imaging uses a frame-based approach, in which all the pixels in a sensor measure the light falling upon them at the same time, and report their values to the support circuitry.
Do this once and, in the right lighting conditions, you get a good-quality still image. Do it more rapidly and you can fool the human brain into thinking that the sequence of still images with which it is presented is actually continuous movement.
This approach works well for humans, but is not ideal for machine-vision applications. One reason is that when a conventional camera observes a dynamic scene, the single frame rate it applies to all its pixels is likely to be wrong for some parts of the image: too slow for fast-moving regions and needlessly fast for static ones.
Machine-vision systems are therefore burdened with processing large amounts of useless or bad data, using expensive, power-hungry processors, high-bandwidth communications links and memory, to no useful effect.
This brute-force approach works, within limits, for some current applications, but may not be suitable for new vision tasks that need to understand scenes in real time, or in environments with limited power, bandwidth and computing resources.
Consider a scene with a fast-moving object in front of a static background, such as a golfer addressing the ball.
When such a scene is acquired with a conventional video camera, important information about the fast movement of the arm, the club and the ball may be lost (as in the third frame) because the scene is not being sampled quickly enough. Meanwhile, static parts of the image, such as the tree, are oversampled, generating data that contain no new information.
The universal frame rate of conventional sensors means that information is often lost from areas of a scene that have been undersampled.
[Figure: a conventional camera oversamples the static tree and grass, and undersamples the fast-moving golf club and ball.]
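To put rough numbers on the undersampling problem, the sketch below computes how far an object moves between consecutive samples at several sampling rates. The 40 m/s club-head speed is an illustrative assumption rather than a figure from this article, and `travel_per_frame` is a hypothetical helper name.

```python
def travel_per_frame(speed_m_per_s: float, frames_per_s: float) -> float:
    """Distance (in metres) an object covers between two consecutive frames."""
    return speed_m_per_s / frames_per_s


# Assumed, illustrative speed for a fast-moving club head (not from the article).
CLUB_HEAD_SPEED_M_PER_S = 40.0

for fps in (24, 30, 60, 1000):
    gap_cm = travel_per_frame(CLUB_HEAD_SPEED_M_PER_S, fps) * 100
    print(f"{fps:>5} frames/s: club head moves {gap_cm:.1f} cm between samples")
```

Under these assumptions the club head travels more than a metre between samples at 30 frames/s, while the static background is re-sampled on every frame with nothing new to report.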
The Event-Based Alternative
The development of event-based sensing has been inspired by our evolving understanding of how the human vision system works.
Our vision gives us a huge evolutionary advantage, at the cost of sustaining a brain powerful enough to interpret the vast amount of data it produces. Evolution’s frugal nature led to the emergence of shortcuts in the visual-processing centres of our brains to cope with this data deluge.
The photoreceptors in our eyes only report back to the brain when they detect a change in some feature of the visual scene, such as its contrast or luminance. Evolutionarily, it is far more important for us to be able to concentrate on the movement of a predator within a scene than to take repeated, indiscriminate inventories of the scene’s every detail.
Recent research on the human ability to recognise objects suggests that we can gather useful data from a scene that is changing at rates of up to 1000 times a second, a far higher rate than the 24, 30 or 60 frames/s that we use to represent movement on television or in movies. A huge amount of useful information is encoded in these changes, which most fixed-frame-rate cameras obscure because of their low sampling rates.
Event-based sensing doesn’t use a fixed frame rate. Instead, each pixel reports what it sees only when it senses a significant change in its field of view.
This approach reduces the amount of redundant data transmitted by the sensor, saving processing power, bandwidth, memory and energy. It enables sensors to be built with a much higher dynamic range than is usual, because each pixel automatically adapts to the incident light. And it allows relatively low-cost sensors to record events that would otherwise require conventional cameras running at up to tens of thousands of frames/s.
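The per-pixel behaviour described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model of any particular sensor: it assumes each pixel compares the logarithm of its current intensity with its own stored reference and reports an event when the difference reaches a threshold. The function name `frames_to_events`, the threshold value and the use of discrete time steps are all illustrative choices; a real event-based pixel reacts asynchronously rather than at fixed steps.

```python
import math
from typing import List, Tuple

# (time step, x, y, polarity): polarity is +1 for brighter, -1 for darker.
Event = Tuple[int, int, int, int]


def frames_to_events(frames: List[List[List[float]]],
                     threshold: float = 0.2) -> List[Event]:
    """Emit an event whenever a pixel's log-intensity differs from its own
    stored reference by at least `threshold`; unchanged pixels emit nothing.
    Intensities must be strictly positive."""
    # Each pixel keeps its own reference level, taken from the first sample.
    reference = [[math.log(v) for v in row] for row in frames[0]]
    events: List[Event] = []
    for t, frame in enumerate(frames[1:], start=1):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                delta = math.log(value) - reference[y][x]
                if abs(delta) >= threshold:
                    events.append((t, x, y, 1 if delta > 0 else -1))
                    reference[y][x] = math.log(value)  # re-arm at the new level
    return events


def make_frame(bright_x: int) -> List[List[float]]:
    """Toy 4x4 scene: static background with one bright pixel on row 1."""
    frame = [[10.0] * 4 for _ in range(4)]
    frame[1][bright_x] = 100.0
    return frame


frames = [make_frame(x) for x in range(4)]  # the bright pixel moves right
events = frames_to_events(frames)
pixel_reads = len(frames) * 4 * 4
print(f"frame-based readout: {pixel_reads} pixel values; "
      f"event-based: {len(events)} events")
```

In this toy scene only the pixels the bright spot enters or leaves report anything, so the static background contributes no data at all. Comparing log-intensities rather than raw values is one way to make the threshold a relative contrast change, so the same rule applies in dim and bright regions.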
The Prophesee event-based sensor has further advantages. Its output is a time-continuous data stream that represents a visual event as a sequence of addresses for each pixel that senses it, and the exposures measured by each pixel at that time. This spatio-temporal data stream provides a more direct way of representing change in the event sensor’s field of view than inferring it from frame-to-frame comparisons of a standard sensor’s output.
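As an illustration of working directly on such a stream, the sketch below represents each event as a timestamp, a pixel address and an exposure value, then finds the region that changed within a time window without comparing any frames. The `Event` field names, the microsecond unit and the `changed_region` helper are assumptions made for this example; they do not describe Prophesee's actual data format or software.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Event:
    t_us: int        # timestamp in microseconds (assumed unit)
    x: int           # pixel column address
    y: int           # pixel row address
    exposure: float  # exposure reported by the pixel (arbitrary units)


def changed_region(events: Iterable[Event], t_start_us: int, t_end_us: int
                   ) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (x_min, y_min, x_max, y_max) of the pixels that reported
    an event in [t_start_us, t_end_us), or None if nothing changed."""
    xs, ys = [], []
    for ev in events:
        if t_start_us <= ev.t_us < t_end_us:
            xs.append(ev.x)
            ys.append(ev.y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical stream: three nearby events, then a later one elsewhere.
stream = [
    Event(t_us=100, x=12, y=40, exposure=0.8),
    Event(t_us=350, x=13, y=41, exposure=0.7),
    Event(t_us=900, x=15, y=41, exposure=0.9),
    Event(t_us=5200, x=60, y=10, exposure=0.3),
]
print(changed_region(stream, 0, 1000))     # (12, 40, 15, 41)
print(changed_region(stream, 1000, 5000))  # None
```

Because every event already carries its own address and time, the region of change falls out of a single pass over the stream, and a quiet window costs nothing to process.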
These characteristics create opportunities to rethink today’s imaging and machine-vision tasks, and to address emerging computer-vision strategies, such as machine learning, in a new way.
For the remainder of this article, please visit Prophesee's website.