This article was originally published at Tryolabs’ website. It is reprinted here with the permission of Tryolabs.
In the past few years, video analytics, also known as video content analysis or intelligent video analytics, has attracted increasing interest from both industry and the academic world. Thanks to the enormous advances made in deep learning, video analytics has introduced the automation of tasks that were once the exclusive purview of humans.
Recent improvements in video analytics have been a game-changer, ranging from applications that count people at events, to automatic license plate recognition, along with other more well-known scenarios such as facial recognition or smart parking.
CCTV surveillance camera detecting vehicles in real-time in order to recognize specific events, such as car accidents, and trigger alerts accordingly.
This kind of technology looks great, but how does it work and how can it benefit your business?
In this guide, you’ll discover the basic concept of video analytics, how it’s used in the real world to automate processes and gain valuable insights, and what you should consider when implementing an intelligent video analytics solution in your organization.
What is intelligent video analytics?
The main goal of video analytics is to automatically recognize temporal and spatial events in videos. A person who moves suspiciously, traffic signs that are not obeyed, the sudden appearance of flames and smoke; these are just a few examples of what a video analytics solution can detect.
Real-time video analytics and video mining
Usually, these systems perform real-time monitoring in which objects, object attributes, movement patterns, or behavior related to the monitored environment are detected. However, video analytics can also be used to analyze historical data to mine insights. This forensic analysis task can detect trends and patterns that answer business questions such as:
- When is customer presence at its peak in my store and what is their age distribution?
- How many times is a red light run, and what are the specific license plates of the vehicles doing it?
Some known applications
Some applications in the field of video analytics are widely known to the general public. One such example is video surveillance, a task that has existed for approximately 50 years. In principle, the idea is simple: install cameras strategically to allow human operators to control what happens in a room, area, or public space.
In practice, however, it is a task that is far from simple. An operator is usually responsible for more than one camera and, as several studies have shown, upping the number of cameras to be monitored adversely affects the operator’s performance. In other words, even if a large amount of hardware is available and generating signals, a bottleneck is formed when it is time to process those signals due to human limitations.
Video analysis software can contribute in a major way by providing a means of accurately dealing with large volumes of information.
Video analytics with deep learning
Machine learning and, in particular, the spectacular development of deep learning approaches have revolutionized video analytics.
The use of Deep Neural Networks (DNNs) has made it possible to train video analysis systems that mimic human behavior, resulting in a paradigm shift. It started with systems based on classic computer vision techniques (e.g. triggering an alert if the camera image gets too dark or changes drastically) and moved to systems capable of identifying specific objects in an image and tracking their path.
Bicycle detection with the deep learning toolkit Luminoth.
For example, Optical Character Recognition (OCR) has been used for decades to extract text from images. In principle, it could suffice to apply OCR algorithms directly to an image of a license plate to discern its number. In the previous paradigm, this might work if the camera was positioned in such a way that, at the time of executing the OCR, we were certain that we were filming a license plate.
A real-world application of this would be the recognition of license plates at parking facilities, where the camera is located near the gates and could film the license plate when the car stops. However, running OCR constantly on images from a traffic camera is not reliable: if the OCR returns a result, how can we be sure that it really corresponds to a license plate?
In the new paradigm, models based on deep learning are able to identify the exact area of an image in which license plates appear. With this information, OCR is applied only to the exact region in question, leading to reliable results.
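The two-stage idea described above (find the plate first, then read it) can be sketched in a few lines. This is only an illustration under stated assumptions: `detect_plates` and `run_ocr` are hypothetical callables standing in for a trained detector and an OCR engine, and the image is assumed to be a plain list of pixel rows rather than any particular library's array type.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max edges exclusive
Image = List[Sequence[int]]      # assumed layout: list of rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image inside the box, clamped to the image bounds."""
    x_min, y_min, x_max, y_max = box
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


def read_plates(
    image: Image,
    detect_plates: Callable[[Image], List[Box]],  # hypothetical detector
    run_ocr: Callable[[Image], str],              # hypothetical OCR engine
) -> List[str]:
    """Run OCR only on the regions the detector marks as license plates."""
    texts = []
    for box in detect_plates(image):
        region = crop(image, box)
        if region and region[0]:  # skip boxes that fall outside the image
            texts.append(run_ocr(region))
    return texts
```

Because OCR never sees anything but detected plate regions, a result can be attributed to a plate rather than to a billboard or a street sign that happens to be in the frame.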
Historically, healthcare institutions have invested large amounts of money in video surveillance solutions to ensure the safety of their patients, staff, and visitors, at levels that are often regulated by strict legislation. Theft, infant abduction, and drug diversion are some of the most common problems addressed by surveillance systems.
In addition to facilitating surveillance tasks, video analytics allows us to go further, by exploiting the data collected in order to achieve business goals. For example, a video analytics solution could detect when a patient has not been checked on according to their needs and alert the staff. Analysis of patient and visitor traffic can be extremely valuable in determining ways to shorten wait times, while ensuring clear access to the emergency area.
At-home monitoring of older adults or people with health issues is another example of an application that provides great value. For instance, falls are a major cause of injury and death in older persons. Although personal medical devices can detect falls, they must be worn and are frequently disregarded by the consumer. A video analytics solution can process the signals of home cameras to detect in real time if a person has fallen. With proper setup, such a system could also determine if a person took a given medication when they were supposed to, for instance.
Mental healthcare is another area in which video analytics can make significant contributions. Systems that analyze facial expressions, body posture, and gaze can be developed to assist clinicians in the evaluation of patients. Such a system is able to detect emotions from body language and micro-expressions, offering clinicians objective information that can confirm their hypotheses or give them new clues.
The University at Buffalo developed a smartphone application designed to help detect autism spectrum disorder (ASD) in children. Using only the smartphone camera, the app tracks facial expression and gaze attention of a child looking at pictures of social scenes (showing multiple people). The app monitors the eye movements and can accurately detect children with ASD since their eye movements are different from those of a person without autism.
Smart cities / Transportation
Video analytics has proven to be a tremendous help in the area of transport, aiding in the development of smart cities.
An increase in traffic, especially in urban areas, can result in an increase in accidents and traffic jams if adequate traffic management measures are not taken. Intelligent video analysis solutions can play a key role in this scenario.
Traffic analysis can be used to dynamically adjust traffic light control systems and to monitor traffic jams. It can also be useful in detecting dangerous situations in real time, such as a vehicle stopped in an unauthorized space on the highway, someone driving in the wrong direction, a vehicle moving erratically, or vehicles that have been in an accident. In the case of an accident, these systems are helpful in collecting evidence in case of litigation.
Vehicle counting, or differentiating between cars, trucks, buses, taxis, and so on, generates high-value statistics used to obtain insights about traffic. Installing speed cameras allows for precise control of drivers en masse. Automatic license plate recognition identifies cars that commit an infraction or, thanks to real-time searching, spots a vehicle that has been stolen or used in a crime.
Real-time parking spot detection.
Instead of using sensors in each parking space, a smart parking system based on video analytics helps drivers find a vacant spot by analyzing images from security cameras.
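One simple way to picture this is sketched below. It assumes that each parking spot has been marked once as a rectangle in the camera image, and that some detector (not shown) returns vehicle bounding boxes for each frame. The function names and the 50% overlap threshold are illustrative choices, not part of any specific product.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def overlap_fraction(spot: Box, vehicle: Box) -> float:
    """Fraction of the spot's area that is covered by the vehicle box."""
    ix = max(0.0, min(spot[2], vehicle[2]) - max(spot[0], vehicle[0]))
    iy = max(0.0, min(spot[3], vehicle[3]) - max(spot[1], vehicle[1]))
    spot_area = (spot[2] - spot[0]) * (spot[3] - spot[1])
    return (ix * iy) / spot_area if spot_area > 0 else 0.0


def vacant_spots(
    spots: Dict[str, Box], vehicles: List[Box], min_overlap: float = 0.5
) -> List[str]:
    """Names of spots that no detected vehicle covers by at least min_overlap."""
    return sorted(
        name
        for name, spot in spots.items()
        if all(overlap_fraction(spot, v) < min_overlap for v in vehicles)
    )
```

A real deployment would also have to deal with occlusion and camera perspective, which this sketch ignores.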
These are just some examples of the contributions that video analysis technology can make to build safer cities that are more pleasant to live in.
New York City offers a great example of video analytics used to solve real-world problems. In order to better understand major traffic events, the New York City Department of Transportation used video analytics and machine learning to detect traffic jams, weather patterns, parking violations and more. The cameras capture the activities, process them and send real-time alerts to city officials.
The use of machine learning, and video analytics in particular, in the retail sector has been one of the most important technological trends in recent years.
Brick and mortar retailers can use video analytics to understand who their customers are and how they behave.
State-of-the-art algorithms are able to recognize faces and determine people’s key characteristics such as gender and age. These algorithms can also track customers’ journeys through stores and analyze navigational routes to detect walking patterns. Adding in the detection of direction of gaze, retailers can identify how long a customer looks at a certain product and finally answer a crucial question: where is the best place to put items in order to maximize sales and improve customer experience?
Demo of storefront attention time.
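As a rough illustration of how attention time could be derived from tracking output, the sketch below sums the time a tracked person spends inside a rectangular zone, for example the area in front of a shelf. It uses position as a simple stand-in for gaze direction, and the track format of `(timestamp, x, y)` tuples is an assumption made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]        # (timestamp_seconds, x, y)
Zone = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def in_zone(x: float, y: float, zone: Zone) -> bool:
    """True if the point lies inside the zone, edges included."""
    x_min, y_min, x_max, y_max = zone
    return x_min <= x <= x_max and y_min <= y <= y_max


def dwell_time(track: List[Point], zone: Zone) -> float:
    """Total seconds between consecutive observations that are both in the zone."""
    total = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        if in_zone(x0, y0, zone) and in_zone(x1, y1, zone):
            total += t1 - t0
    return total
```

Aggregating this value per product zone over many visitors gives a first approximation of which displays hold attention the longest.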
A lot of actionable information can be gathered with a video analytics solution, such as the number of customers, customers’ characteristics, duration of visit, and walking patterns. All of this data can be analyzed while taking into account its temporal nature, in order to optimize the organization of the store according to the day of the week, the seasons of the year, or holidays. In this way, a retailer can get an extremely accurate sense of who their customers are, when they visit their store, and how they behave once inside.
Video analytics is also great for developing anti-theft mechanisms. For instance, face recognition algorithms can be trained to spot known shoplifters, and other models can detect in real time a person hiding an item in their backpack.
What is more, information extracted from video analytics can serve as input data for training machine learning models, which aim to solve larger challenges. As an example, walking patterns and the number of people in the store can be useful information to add to machine learning powered solutions for demand forecasting, price optimization and inventory forecasting.
Marine Layer is a clothing retailer headquartered in San Francisco that deployed an intelligent video analytics solution to gain insights about customer traffic in their stores. The system they implemented automatically counts store visitors and reports the traffic per hour or for a given day. While the company was estimating these numbers prior to the implementation of the video analytics solution, it now has 100% certainty about them and saves the time it used to spend analyzing traffic manually.
Video surveillance is an old task of the security domain. However, from the time that systems were monitored exclusively by humans to current solutions based on video analytics, much water has passed under the bridge.
Facial and license plate recognition (LPR) techniques can be used to identify people and vehicles in real-time and make appropriate decisions. For instance, it’s possible to search for a suspect both in real-time and in stored video footage, or to recognize authorized personnel and grant access to a secured facility.
Crowd management is another key function of security systems. Cutting edge video analysis tools can make a big difference in places such as shopping malls, hospitals, stadiums, and airports. These tools can provide an estimated crowd count in real time and trigger alerts when a threshold is reached or surpassed. They can also analyze crowd flow to detect movement in unwanted or prohibited directions.
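Both checks mentioned above are easy to express once people are being counted and tracked. The following sketch assumes that a tracker (not shown) provides each person's positions as `(x, y)` pairs and that the allowed direction of movement is given as a vector. The function names and the `min_distance` parameter are illustrative.

```python
import math
from typing import List, Tuple


def crowd_alert(count: int, threshold: int) -> bool:
    """Alert when the crowd count reaches or surpasses the threshold."""
    return count >= threshold


def moves_against_flow(
    track: List[Tuple[float, float]],
    allowed_direction: Tuple[float, float],
    min_distance: float = 1.0,
) -> bool:
    """True if the overall displacement points against the allowed direction."""
    if len(track) < 2:
        return False
    (x0, y0), (x1, y1) = track[0], track[-1]
    dx, dy = x1 - x0, y1 - y0
    if math.hypot(dx, dy) < min_distance:
        return False  # barely moved: ignore jitter from the detector
    # A negative dot product means the movement opposes the allowed direction.
    return dx * allowed_direction[0] + dy * allowed_direction[1] < 0
```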
Real-time people detection.
In the video above, a surveillance system was trained to recognize people in real-time. This lays the groundwork for obtaining other results. The most immediate: a count of the number of people passing by daily. More advanced goals, based on historical data, might be to determine the “normal” flow of people according to the day of the week and time of day, and generate alerts in case of unusual traffic. If the monitored area is pedestrian-only, the system could be trained to detect unauthorized objects such as motorcycles or cars and, again, trigger some kind of alert.
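To make the idea of a “normal” flow concrete, here is a minimal sketch that builds a baseline per weekday and hour from historical counts and flags values that deviate strongly from it. The three-standard-deviation rule and the `min_std` floor are assumptions chosen for illustration. A production system would likely use a more robust model.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple

Key = Tuple[int, int]  # (weekday 0-6, hour 0-23)


def build_baseline(
    history: Iterable[Tuple[int, int, int]]
) -> Dict[Key, Tuple[float, float]]:
    """Map (weekday, hour) to the (mean, standard deviation) of past counts."""
    buckets = defaultdict(list)
    for weekday, hour, count in history:
        buckets[(weekday, hour)].append(count)
    return {key: (mean(values), pstdev(values)) for key, values in buckets.items()}


def is_unusual(
    baseline: Dict[Key, Tuple[float, float]],
    weekday: int,
    hour: int,
    count: int,
    num_std: float = 3.0,
    min_std: float = 1.0,
) -> bool:
    """True if the count is more than num_std deviations from the baseline."""
    if (weekday, hour) not in baseline:
        return False  # no history for this slot, so nothing to compare against
    average, std = baseline[(weekday, hour)]
    return abs(count - average) > num_std * max(std, min_std)
```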
This is one of the great advantages of these approaches: video content analysis systems can be trained to detect specific events, sometimes with a high degree of sophistication. One such example is to detect fires as soon as possible. Or, in the case of airports, to raise an alert when someone enters a forbidden area or walks against the direction intended for passengers. Another great use case is the real-time detection of unattended baggage in a public space.
As for classic tasks such as intruder detection, they can be performed robustly, thanks to algorithms that can filter out motion caused by wind, rain, snow, or animals.
The functionality offered by intelligent video analysis grows day by day in the security domain, and this is a trend that will continue in the future.
The Danish football club Brondby was the first soccer club to officially introduce facial recognition technology in 2019 to improve safety on matchdays at its stadium. The system identifies people who are banned from attending games and enables staff to prevent them from entering the stadium.
How does video analytics work?
Let’s take a look at a general scheme of how a video analytics solution works. Depending on the particular use case, the architecture of a solution may vary, but the scheme remains the same.
Video content analysis can be done in two different ways: in real time, by configuring the system to trigger alerts for specific events and incidents that unfold in the moment, or in post processing, by performing advanced searches to facilitate forensic analysis tasks.
Feeding the system
The data being analyzed can come from various streaming video sources. The most common are CCTV cameras, traffic cameras and online video feeds. However, any video source that uses the appropriate protocol (e.g. RTSP: real-time streaming protocol or HTTP) can generally be integrated into the solution.
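As a hedged illustration of how such a source might be consumed, the sketch below reads frames with OpenCV's `cv2.VideoCapture`, which accepts file paths as well as stream URLs when OpenCV is built with a suitable backend. The helper names are our own, and sampling every n-th frame is just one simple way to keep the processing load manageable.

```python
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def read_stream(url: str) -> Iterator:
    """Yield frames from a video source until it ends or a read fails."""
    import cv2  # imported lazily so the sampling helper works without OpenCV

    capture = cv2.VideoCapture(url)
    try:
        if not capture.isOpened():
            raise RuntimeError(f"Could not open video source: {url}")
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            yield frame
    finally:
        capture.release()


def every_nth(frames: Iterable[Frame], n: int) -> Iterator[Frame]:
    """Keep one frame out of every n to reduce the processing load."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for index, frame in enumerate(frames):
        if index % n == 0:
            yield frame
```

With a placeholder address such as `rtsp://camera.example/stream`, the two helpers compose as `every_nth(read_stream(url), 5)`, which hands every fifth frame to whatever analysis comes next.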
A key goal is coverage: we need a clear view, from various angles, of the entire area where the events being monitored might occur. Remember, more data is better, provided that it can be processed.
Central processing vs edge processing
Video analysis software can be run centrally on servers that are generally located in the monitoring station, which is known as central processing. Or, it can be embedded in the cameras themselves, a strategy known as edge processing.
The choice of cameras should be carefully considered when designing a solution. A lot of legacy software was developed with central processing capabilities only. In recent years, though, it is not uncommon to come across hybrid solutions. In fact, a good practice is to concentrate, whenever possible, real-time processing on cameras and forensic analysis functionalities on the central server.
With a hybrid approach, the processing performed by the cameras reduces the data being processed by the central servers, which otherwise could require extensive processing capabilities and bandwidth as the number of cameras increases. In addition, it is possible to configure the software to only send data about suspicious events to the server over the network, reducing network traffic and the need for storage.
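The bandwidth saving comes from sending small event messages instead of video. The sketch below illustrates that filtering step on the camera side. The detection format (dictionaries with `label` and `score`), the message fields, and the 0.6 score threshold are all assumptions made for this example.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Set


def edge_events(
    frames_detections: Iterable[List[Dict[str, Any]]],
    camera_id: str,
    watched_labels: Set[str],
    min_score: float = 0.6,
) -> Iterator[str]:
    """Yield one compact JSON message per frame that contains a watched event.

    Frames without a confident detection of a watched label produce no
    message, so nothing is sent over the network for them.
    """
    for frame_index, detections in enumerate(frames_detections):
        hits = [
            d
            for d in detections
            if d["label"] in watched_labels and d["score"] >= min_score
        ]
        if hits:
            yield json.dumps(
                {"camera": camera_id, "frame": frame_index, "detections": hits},
                sort_keys=True,
            )
```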
Meanwhile, centralizing the data for forensic analysis allows for multiple search and analysis tools to be used, from general algorithms to ad-hoc implementations, all utilizing different sets of parameters that help to balance the noise and silence in the results obtained. Essentially, you can plug in your own algorithms to get the desired results, which makes this a particularly flexible and attractive scheme.
Defining scenarios and training models
Once the physical architecture is planned for and installed, it is necessary to define the scenarios on which you want to focus and then train the models that are going to detect the target events.
Vehicle crashes? Crowd flow? Facial recognition at a retail store to recognize known shoplifters? Each scenario leads to a series of basic tasks that the system must know how to perform.
An example: detect vehicles, optionally recognize their type (e.g. motorcycle, car, truck), track their trajectory frame by frame, and then study the evolution of those paths to detect a possible crash.
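The last step, studying how a path evolves, can be illustrated with a deliberately simple heuristic: flag a trajectory whose speed drops sharply from one measurement to the next. This is only a sketch of the idea, not a crash detector. The track format of `(timestamp, x, y)` tuples and the `max_drop` threshold are assumptions, and a real system would combine several signals.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (timestamp_seconds, x, y)


def speeds(track: List[Point]) -> List[float]:
    """Speed between each pair of consecutive observations."""
    result = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt > 0:  # skip duplicated or out-of-order timestamps
            result.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return result


def has_abrupt_stop(track: List[Point], max_drop: float) -> bool:
    """True if speed falls by more than max_drop between consecutive steps."""
    values = speeds(track)
    return any(
        before - after > max_drop for before, after in zip(values, values[1:])
    )
```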
The most frequent, basic tasks in video analytics are:
- Image classification: select the category of an image from among a set of predetermined categories (e.g. car, person, horse, scissors, statue).
- Localization: locate an object in an image (generally involves drawing a bounding box around the object).
- Object detection: locate and categorize an object in an image.
- Object identification: given a target object, identify all of its instances in an image (e.g. find all soccer players in the image).
- Object tracking: track an object that moves over time in a video.
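To show how detection and tracking relate, here is a minimal sketch of tracking built on top of per-frame detections. It matches boxes between frames by intersection over union (IoU) with a greedy assignment. The class name and the 0.3 threshold are illustrative. For brevity, a track is dropped as soon as it goes unmatched, which a practical tracker would not do.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, between 0 and 1."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Give each detection the id of the best-overlapping previous box."""

    def __init__(self, min_iou: float = 0.3) -> None:
        self.min_iou = min_iou
        self.tracks: Dict[int, Box] = {}
        self._next_id = 0

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        # Score every (track, detection) pair, best overlap first.
        candidates = sorted(
            (
                (iou(box, det), track_id, index)
                for track_id, box in self.tracks.items()
                for index, det in enumerate(detections)
            ),
            reverse=True,
        )
        assigned: Dict[int, Box] = {}
        used_tracks, used_detections = set(), set()
        for score, track_id, index in candidates:
            if score < self.min_iou:
                break  # remaining pairs overlap even less
            if track_id in used_tracks or index in used_detections:
                continue
            assigned[track_id] = detections[index]
            used_tracks.add(track_id)
            used_detections.add(index)
        # Unmatched detections start new tracks.
        for index, det in enumerate(detections):
            if index not in used_detections:
                assigned[self._next_id] = det
                self._next_id += 1
        self.tracks = assigned  # simplification: unmatched tracks are dropped
        return dict(assigned)
```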
To know more about the basic tasks performed and the types of algorithms that are used to develop video analysis software, we recommend you read this introductory guide to computer vision.
Example of image classification.
Training models from scratch requires considerable effort. Luckily, there are a fair number of resources available that make this a less burdensome task.
Image datasets such as ImageNet or Microsoft Common Objects in Context (COCO) are key resources that simplify the training of new models.
There are several pre-trained models available for tasks such as image classification, object detection, and facial recognition, which, thanks to transfer learning techniques, allow for the adaptation (fine tuning) of a model for a given use case. This is much less expensive than a complete training.
Finally, the community has published a growing number of open source projects in recent years to facilitate the building of custom video analysis systems. Relying on computer vision libraries, such as the ones presented below, greatly helps build solutions faster and with more accuracy.
In virtually all cases, a human is needed to monitor the alerts generated by a video analysis system and decide what should be done, if anything. In this sense, these systems act as valuable support for operators, helping them to detect events that might otherwise be overlooked or take a long time to detect manually.
Open source projects
There’s no well-established library for video analytics at the moment. The ones that exist are usually some implementation of a research paper, so they tend to be hard to use in a practical context. In other cases, the libraries are meant to be easy to use but perform poorly.
The best option is to hunt for object-tracking or pose-tracking libraries and create something custom.
At Tryolabs, we use image-level algorithms like object detection and pose estimation to perform video analytics, then add our own tracking algorithm layer over them and proceed from there.
The Open Source Computer Vision Library (OpenCV) is the most well-known computer vision library. It contains a comprehensive set of machine learning algorithms to perform common tasks such as image classification, face recognition, and object detection and tracking. It is widely used by companies and research groups, as it can be used via its native C++ interface, or through Java and Python wrappers.
Since it is a general computer vision library, it is possible to implement a video analysis system with OpenCV. However, as it is not a specialized video analytics library, it may be more interesting to turn to other available libraries (depending on the use case).
As mentioned above, we’ve built our own tool at Tryolabs to perform video analytics. Luminoth is an object-detection library built in Python using TensorFlow. We’ve employed it along with OpenCV for video analytics cases such as storefront and crowd-flow analyses in the retail sector.
Demo of pose estimation and tracking at Times Square with Luminoth.
At the moment, we are working on version 2.0 of the library, which is based on PyTorch, and adds support for human pose estimation and instance segmentation. We are also planning on adding video support and tracking to the library soon.
In 2019, Facebook AI Research open-sourced Detectron2, a PyTorch rewrite of its well-known Caffe2-based library Detectron.
The library focuses on object detection, segmentation and pose estimation. The advantage of Detectron is its ability to not only draw bounding boxes around individual objects, but to create pixel level masks, which define the boundaries of an object.
Identifying the object boundaries with Detectron.
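The difference between a bounding box and a pixel-level mask can be seen with a tiny example. The sketch below is independent of Detectron and assumes the mask is a plain list of rows containing 0 or 1. It derives the tightest box from a mask and counts the pixels that actually belong to the object.

```python
from typing import List, Optional, Tuple


def mask_to_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Tightest (x_min, y_min, x_max, y_max) around the mask, edges inclusive."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: there is no object to enclose
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


def mask_area(mask: List[List[int]]) -> int:
    """Number of pixels that belong to the object."""
    return sum(1 for row in mask for value in row if value)
```

For an L-shaped object, the box covers more pixels than the object itself, which is exactly the information a mask preserves and a box loses.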
Detectron is a great library for research and to train custom computer vision models.
Video analytics solutions
There is a plethora of off-the-shelf solutions in video analytics, from classic security systems to more complex scenarios such as smart home or healthcare applications.
If your use case is satisfied by one of these standard solutions, they may be an option for you. Be aware that, in general, some kind of adaptation or parameterization of the software has to be done and these solutions only allow customization to a certain degree.
However, most companies aim to gain specific insights to reach individual goals with a video analytics solution, which requires more optimized software. In this case, the ideal solution is to turn to a company specializing in video analytics services, such as Tryolabs. A custom solution is likely to be more accurate and can address unusual or extremely particular use cases.
Video analytics solutions are invaluable in helping us in our daily tasks. There are a vast number of sectors that can benefit from this technology, especially as the complexity of potential applications has been growing in recent years.
From smart cities, to security controls in hospitals and airports, to people tracking for retail and shopping centers, the field of video analytics enables processes that are simultaneously more effective and less tedious for humans, and less expensive for companies.
We hope you enjoyed this post, and that you gained a better understanding of what video analytics is all about, how it works, and how you can leverage it in your organization in order to automate processes and gain valuable insights to make better decisions.
Here at Tryolabs, we have been developing machine learning solutions since 2010. Partnering with companies in different industries lets us better understand their challenges and how they can use data to drive business results. Please don’t hesitate to drop us a line if you have any questions or comments.