What Is Object Detection?

This article was originally published at MathWorks’ website. It is reprinted here with the permission of MathWorks.

3 Things You Need to Know

Object detection is a computer vision technique for locating instances of objects in images or videos. Object detection algorithms typically leverage machine learning or deep learning to produce meaningful results. When humans look at images or video, we can recognize and locate objects of interest within a matter of moments. The goal of object detection is to replicate this intelligence using a computer.

Why Object Detection Matters

Object detection is a key technology behind advanced driver assistance systems (ADAS) that enable cars to detect driving lanes or perform pedestrian detection to improve road safety. Object detection is also useful in applications such as video surveillance or image retrieval systems.

How It Works

Object Detection Using Deep Learning

You can use a variety of techniques to perform object detection. Popular deep learning–based approaches using convolutional neural networks (CNNs), such as R-CNN and YOLO v2, automatically learn to detect objects within images.

You can choose from two key approaches to get started with object detection using deep learning:

Create and train a custom object detector. To train a custom object detector from scratch, you need to design a network architecture to learn the features for the objects of interest. You also need to compile a very large set of labeled data to train the CNN. The results of a custom object detector can be remarkable. That said, you need to define the layers of the CNN yourself and train its weights from scratch, which requires a lot of time and training data.
Use a pretrained object detector. Many object detection workflows using deep learning leverage transfer learning, an approach that enables you to start with a pretrained network and then fine-tune it for your application. This method can provide faster results because the object detectors have already been trained on thousands, or even millions, of images.

Detecting a stop sign using a pretrained R-CNN.

Whether you create a custom object detector or use a pretrained one, you will need to decide what type of object detection network you want to use: a two-stage network or a single-stage network.

Two-Stage Networks

The initial stage of two-stage networks, such as R-CNN and its variants, identifies region proposals, or subsets of the image that might contain an object. The second stage classifies the objects within the region proposals. Two-stage networks can achieve very accurate object detection results; however, they are typically slower than single-stage networks.
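
The two-stage flow described above can be sketched in a few lines of Python. This is only a toy illustration, not how R-CNN is implemented: the brightness heuristic in `propose_regions` and the `toy_classifier` function are hypothetical stand-ins for the learned components of a real network, and the image is a small hand-made grid.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (row, col, height, width)
Image = List[List[float]]


def mean_intensity(image: Image, box: Box) -> float:
    """Average pixel value inside a box."""
    r, c, h, w = box
    total = sum(image[i][j] for i in range(r, r + h) for j in range(c, c + w))
    return total / (h * w)


def propose_regions(image: Image, size: int, stride: int, min_mean: float) -> List[Box]:
    """Stage 1 (toy stand-in): keep sliding windows that might contain an object.

    Assumption: a window is "interesting" if its mean intensity is high enough.
    """
    rows, cols = len(image), len(image[0])
    proposals = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            box = (r, c, size, size)
            if mean_intensity(image, box) >= min_mean:
                proposals.append(box)
    return proposals


def classify_regions(image: Image, proposals: List[Box],
                     classifier: Callable[[Image, Box], str]) -> List[Tuple[Box, str]]:
    """Stage 2: assign a label to each region proposal."""
    return [(box, classifier(image, box)) for box in proposals]


def toy_classifier(image: Image, box: Box) -> str:
    """Hypothetical classifier: a real detector would use a trained CNN here."""
    return "bright object" if mean_intensity(image, box) > 0.9 else "background"


# A 6x6 dark image with a 2x2 bright square in the middle.
image = [[0.0] * 6 for _ in range(6)]
for i in range(2, 4):
    for j in range(2, 4):
        image[i][j] = 1.0

proposals = propose_regions(image, size=2, stride=1, min_mean=0.5)
detections = classify_regions(image, proposals, toy_classifier)
for box, label in detections:
    print(box, label)
```

Running the classifier only on the proposals, rather than on every possible window, is what separates the two stages; it is also why the overall pipeline involves two passes over candidate regions.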

High-level architecture of R-CNN (top) and Fast R-CNN (bottom) object detection.

Single-Stage Networks

In single-stage networks, such as YOLO v2, the CNN produces network predictions for regions across the entire image using anchor boxes, and the predictions are decoded to generate the final bounding boxes for the objects. Single-stage networks can be much faster than two-stage networks, but they may not reach the same level of accuracy, especially for scenes containing small objects.
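
As a rough sketch of what "decoding" a prediction means, the Python snippet below turns one raw network output into a bounding box using the parametrization commonly described for YOLO v2: a sigmoid offset within the grid cell and an exponential scaling of the anchor box. The function name `decode_prediction`, the 13x13 grid, and the anchor size are illustrative assumptions; a particular implementation may differ in details.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_prediction(
    t: Tuple[float, float, float, float],  # raw outputs (tx, ty, tw, th)
    cell: Tuple[int, int],                 # grid cell (column, row) of the prediction
    anchor: Tuple[float, float],           # anchor (width, height) in grid-cell units
    grid_size: int,                        # number of cells along each image side
) -> Tuple[float, float, float, float]:
    """Return (center_x, center_y, width, height) as fractions of the image size."""
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    # The sigmoid keeps the predicted center inside its own grid cell.
    center_x = (cx + sigmoid(tx)) / grid_size
    center_y = (cy + sigmoid(ty)) / grid_size
    # The exponential rescales the anchor box rather than predicting a size directly.
    width = pw * math.exp(tw) / grid_size
    height = ph * math.exp(th) / grid_size
    return center_x, center_y, width, height


# All-zero outputs place the box at the cell center with the anchor's own size.
box = decode_prediction((0.0, 0.0, 0.0, 0.0), cell=(3, 4), anchor=(2.0, 3.0), grid_size=13)
print(box)
```

Because every grid cell and anchor is decoded in a single pass over the network output, no separate region proposal step is needed.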

Overview of YOLO v2 object detection.

Object Detection Using Machine Learning

Machine learning techniques are also commonly used for object detection, and they offer different approaches than deep learning. Common machine learning techniques include:

Aggregate channel features (ACF)
SVM classification using histograms of oriented gradient (HOG) features
The Viola-Jones algorithm for human face or upper body detection

Tracking pedestrians using an ACF object detection algorithm.

As with deep learning–based approaches, you can choose to start with a pretrained object detector or create a custom object detector to suit your application. When using machine learning, you need to manually select the identifying features for an object, whereas a deep learning–based workflow selects features automatically.
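
To make the idea of a hand-selected feature concrete, here is a simplified Python sketch of the core of a HOG descriptor: a histogram of gradient orientations for one image cell, weighted by gradient magnitude. It is a minimal illustration under stated assumptions (central differences, nine unsigned orientation bins, no interpolation between bins, no block normalization), not a complete HOG implementation, and the function name `hog_cell_histogram` is hypothetical.

```python
import math
from typing import List


def hog_cell_histogram(cell: List[List[float]], num_bins: int = 9) -> List[float]:
    """Histogram of unsigned gradient orientations (0 to 180 degrees) for one cell."""
    rows, cols = len(cell), len(cell[0])
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    # Only interior pixels are used so that central differences are defined.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against an angle that rounds to exactly 180.
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist


# Intensity increasing left to right: all gradient energy lands in the 0-degree bin.
horizontal_ramp = [[float(c) for c in range(8)] for _ in range(8)]
# Intensity increasing top to bottom: all gradient energy lands in the 90-degree bin.
vertical_ramp = [[float(r) for _ in range(8)] for r in range(8)]

print(hog_cell_histogram(horizontal_ramp))
print(hog_cell_histogram(vertical_ramp))
```

In a HOG plus SVM detector, histograms like these are computed over many cells, concatenated into a feature vector, and passed to the classifier; the choice of this feature is made by the designer rather than learned from data.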

Machine Learning vs. Deep Learning for Object Detection

The best approach for object detection depends on your application and the problem you’re trying to solve. The main considerations when choosing between machine learning and deep learning are whether you have a powerful GPU and whether you have lots of labeled training images. If the answer to either of these questions is no, a machine learning approach might be the better choice. Deep learning techniques tend to work better when you have more images, and GPUs decrease the time needed to train the model.

Other Object Detection Methods

In addition to deep learning– and machine learning–based object detection, there are several other common techniques that may be sufficient depending on your application, such as:

Image segmentation and blob analysis, which uses simple object properties such as size, shape, or color
Feature-based object detection, which uses feature extraction, matching, and RANSAC to estimate the location of an object
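
Blob analysis in particular needs no training at all. The Python sketch below assumes the image has already been segmented into a binary mask (for example by thresholding on color), groups foreground pixels into connected regions, and keeps the regions that pass a simple size test. The function name `find_blobs`, the 4-connectivity rule, and the `min_area` parameter are illustrative choices, not a reference implementation.

```python
from collections import deque
from typing import Dict, List


def find_blobs(mask: List[List[int]], min_area: int = 1) -> List[Dict]:
    """Return area and bounding box for each 4-connected foreground region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects every pixel of this blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Filter on a simple object property: the blob's size in pixels.
            if len(pixels) >= min_area:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                blobs.append({
                    "area": len(pixels),
                    "bbox": (min(ys), min(xs), max(ys), max(xs)),  # top, left, bottom, right
                })
    return blobs


mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]

all_blobs = find_blobs(mask)
large_blobs = find_blobs(mask, min_area=2)
print(all_blobs)
print(large_blobs)
```

Raising `min_area` discards the isolated single-pixel regions, which is the kind of size-based filtering that makes this approach workable when the objects of interest are easy to separate from the background.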

Object detection in a cluttered scene using point feature matching.

Object Detection with MATLAB

With just a few lines of MATLAB® code, you can build machine learning and deep learning models for object detection without having to be an expert.

Automatically Label Training Images with Apps

MATLAB provides interactive apps to both prepare training data and customize convolutional neural networks. Labeling the training images for object detectors is tedious, and it can take a significant amount of time to get enough training data to create a performant object detector. The Image Labeler app lets you interactively label objects within a collection of images and provides built-in algorithms to automatically label your ground-truth data. For automated driving applications, you can use the Ground Truth Labeler app, and for video processing workflows, you can use the Video Labeler app.

Interactively Create Object Detection Algorithms and Interoperate Between Frameworks

Customizing an existing CNN or creating one from scratch can be prone to architectural problems that can waste valuable training time. The Deep Network Designer app enables you to interactively build, edit, and visualize deep learning networks while also providing an analysis tool to check for architectural issues before training the network.

With MATLAB, you can interoperate with networks and network architectures from frameworks like TensorFlow™-Keras, PyTorch, and Caffe2 using ONNX™ (Open Neural Network Exchange) import and export capabilities.

Import from and export to ONNX.

Automatically Generate Optimized Code for Deployment

After creating your algorithms with MATLAB, you can leverage automated workflows to generate TensorRT or CUDA® code with GPU Coder™ to perform hardware-in-the-loop testing. The generated code can be integrated with existing projects and can be used to verify object detection algorithms on desktop GPUs or embedded GPUs such as the NVIDIA® Jetson or NVIDIA Drive platform.