Object Recognition: 3 Things You Need to Know

This article was originally published at MathWorks’ website. It is reprinted here with the permission of MathWorks.

What Is Object Recognition?

Object recognition is a computer vision technique for identifying objects in images or videos. Object recognition is a key output of deep learning and machine learning algorithms. When humans look at a photograph or watch a video, we can readily spot people, objects, scenes, and visual details. The goal is to teach a computer to do what comes naturally to humans: to gain a level of understanding of what an image contains.

Figure 1: Using object recognition to identify different categories of objects.

Object recognition is a key technology behind driverless cars, enabling them to recognize a stop sign or to distinguish a pedestrian from a lamppost. It is also useful in a variety of applications such as disease identification in bioimaging, industrial inspection, and robotic vision.

Object Recognition vs. Object Detection

Object detection and object recognition are similar techniques for identifying objects, but they vary in their execution. Object detection is the process of finding instances of objects in images. In the case of deep learning, object detection is a subset of object recognition, where the object is not only identified but also located in an image. This allows for multiple objects to be identified and located within the same image.

Figure 2: Object recognition (left) and object detection (right).
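The difference shows up in the shape of the result. The following is a minimal Python sketch, not taken from the original article: `classify_patch` is a hypothetical toy classifier that labels a region by mean brightness, standing in for a trained model. Recognition returns one label for the whole image, while detection returns a label plus a location for each object found.

```python
def classify_patch(patch):
    """Toy classifier (assumption): label a region by its mean brightness."""
    pixels = [value for row in patch for value in row]
    mean = sum(pixels) / len(pixels)
    return "object" if mean >= 0.5 else "background"


def recognize(image):
    """Recognition: a single label for the whole image."""
    return classify_patch(image)


def detect(image, window=2):
    """Detection: slide a window over the image and return a
    (label, (row, col, height, width)) entry for every hit."""
    detections = []
    for r in range(0, len(image) - window + 1, window):
        for c in range(0, len(image[0]) - window + 1, window):
            patch = [row[c:c + window] for row in image[r:r + window]]
            label = classify_patch(patch)
            if label != "background":
                detections.append((label, (r, c, window, window)))
    return detections


# A 4x4 image containing two bright 2x2 objects.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
whole_image_label = recognize(image)  # one label, no location
boxes = detect(image)                 # one labeled box per object
```

Real detectors use learned classifiers and more careful window or region proposals, but the output contract is the same: a list of labeled bounding boxes rather than a single label.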

How Object Recognition Works

You can use a variety of approaches for object recognition. Recently, techniques in machine learning and deep learning have become popular approaches to object recognition problems. Both techniques learn to identify objects in images, but they differ in their execution.

Figure 3: Machine learning and deep learning techniques for object recognition.

The following section explains the differences between machine learning and deep learning for object recognition, and it shows how to implement both techniques.

Object Recognition Techniques

Object Recognition Using Deep Learning

Deep learning techniques have become a popular method for doing object recognition. Deep learning models such as convolutional neural networks, or CNNs, are used to automatically learn an object’s inherent features in order to identify that object. For example, a CNN can learn to identify differences between cats and dogs by analyzing thousands of training images and learning the features that make cats and dogs different.
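To make "learning features" more concrete, the sketch below shows the operation a single convolutional filter performs. It is an illustrative Python example under stated assumptions: the kernel values are hand-picked here, whereas a CNN would learn them from training images, and `convolve2d` and `relu` are names introduced for this example only.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, value) for value in row] for row in feature_map]


# 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel (assumption) that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = relu(convolve2d(image, vertical_edge))
# feature_map is [[0, 2, 0], [0, 2, 0], [0, 2, 0]]: the filter fires only at the edge.
```

A trained network stacks many such filters, so later layers can respond to combinations of edges, textures, and eventually object parts.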

There are two approaches to performing object recognition using deep learning:

Training a model from scratch: To train a deep network from scratch, you gather a very large labeled dataset and design a network architecture that will learn the features and build the model. The results can be impressive, but this approach requires a large amount of training data, and you need to set up the layers and weights in the CNN.
Using a pretrained deep learning model: Most deep learning applications use the transfer learning approach, a process that involves fine-tuning a pretrained model. You start with an existing network, such as AlexNet or GoogLeNet, and feed in new data containing previously unknown classes. This method is less time-consuming and can provide a faster outcome because the model has already been trained on thousands or millions of images.
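The second approach can be sketched in a few lines. This is a toy Python illustration, not a real network: `pretrained_features` is a hypothetical hand-written function standing in for the frozen layers of a pretrained model such as AlexNet, and only a small new output layer (a logistic regression) is trained on the new classes. It shows the feature-extraction flavor of transfer learning; fine-tuning would also update some of the pretrained layers.

```python
import math


def pretrained_features(image):
    """Stand-in for frozen pretrained layers (hypothetical, hand-written).
    image is a 2x2 patch flattened as
    [top_left, top_right, bottom_left, bottom_right]."""
    tl, tr, bl, br = image
    mean = (tl + tr + bl + br) / 4
    left_minus_right = ((tl + bl) - (tr + br)) / 2
    top_minus_bottom = ((tl + tr) - (bl + br)) / 2
    return [mean, left_minus_right, top_minus_bottom]


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def train_head(features, labels, epochs=200, lr=0.5):
    """Train only the new output layer by batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for i, v in enumerate(x):
                grad_w[i] += error * v / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict(weights, bias, image):
    x = pretrained_features(image)
    score = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return 1 if score >= 0.5 else 0


# New task with classes the "pretrained" part never saw:
# 1 = bright on the left, 0 = bright on the right.
new_images = [[1, 0, 1, 0], [0, 1, 0, 1], [0.9, 0.1, 0.8, 0.2], [0.2, 0.9, 0.1, 0.8]]
new_labels = [1, 0, 1, 0]
features = [pretrained_features(img) for img in new_images]
weights, bias = train_head(features, new_labels)
predictions = [predict(weights, bias, img) for img in new_images]
```

Because the feature extractor is reused as-is, only a handful of parameters are trained, which is why this approach needs far less data and time than training from scratch.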

Deep learning offers a high level of accuracy but requires a large amount of data to make accurate predictions.

Figure 4: Deep learning application showing object recognition of restaurant food.

Object Recognition Using Machine Learning

Machine learning techniques are also popular for object recognition and offer different approaches than deep learning. Common examples of machine learning techniques are:

HOG feature extraction with an SVM machine learning model
Bag-of-words models with features such as SURF and MSER
The Viola-Jones algorithm, which can be used to recognize a variety of objects, including faces and upper bodies

Machine Learning Workflow

To perform object recognition using a standard machine learning approach, you start with a collection of images (or video), and select the relevant features in each image. For example, a feature extraction algorithm might extract edge or corner features that can be used to differentiate between classes in your data.

These features are fed to a machine learning model, which learns to separate them into distinct categories and then uses that information to analyze and classify new objects.

You can use a variety of machine learning algorithms and feature extraction methods, which offer many combinations to create an accurate object recognition model.

Figure 5: Machine learning workflow for object recognition.
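The workflow above can be sketched end to end in Python. This is a simplified illustration under stated assumptions: `orientation_histogram` is a reduced, HOG-style descriptor (one histogram for the whole image, with no cells or block normalization), and a nearest-centroid classifier is swapped in for the SVM to keep the example dependency-free. All function names are introduced for this example.

```python
import math


def orientation_histogram(image, bins=4):
    """Simplified HOG-style feature: histogram of unsigned gradient
    orientations over the image, weighted by gradient magnitude."""
    hist = [0.0] * bins
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def train_centroids(features, labels):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, vec):
    """Assign the label of the closest class centroid."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def vertical_edge(split, size=5):
    return [[0 if c < split else 1 for c in range(size)] for _ in range(size)]


def horizontal_edge(split, size=5):
    return [[0 if r < split else 1 for _ in range(size)] for r in range(size)]


# Step 1: labeled images. Step 2: feature extraction. Step 3: train a model.
train_images = [vertical_edge(1), vertical_edge(2), horizontal_edge(1), horizontal_edge(2)]
train_labels = ["vertical", "vertical", "horizontal", "horizontal"]
centroids = train_centroids([orientation_histogram(i) for i in train_images], train_labels)

# Step 4: classify images the model has not seen.
prediction_a = classify(centroids, orientation_histogram(vertical_edge(3)))
prediction_b = classify(centroids, orientation_histogram(horizontal_edge(3)))
```

The design point is that the feature extractor and the classifier are independent pieces, so either one can be replaced (for example, SURF features or an SVM) without changing the rest of the pipeline.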

Using machine learning for object recognition offers the flexibility to choose the best combination of features and classifiers for learning. It can achieve accurate results with minimal data.

Machine Learning vs. Deep Learning for Object Recognition

Determining the best approach for object recognition depends on your application and the problem you’re trying to solve. In many cases, machine learning can be an effective technique, especially if you know which features or characteristics of the image are the best ones to use to differentiate classes of objects.

The main considerations to keep in mind when choosing between machine learning and deep learning are whether you have a powerful GPU and whether you have lots of labeled training images. If the answer to either of these questions is no, a machine learning approach might be the best choice. Deep learning techniques tend to work better with more images, and a GPU helps to decrease the time needed to train the model.

Figure 6: Key factors for choosing between deep learning and machine learning.

Other Object Recognition Methods

Depending on the application, more basic approaches to object recognition may be sufficient:

Template matching – which uses a small image, or template, to find matching regions in a larger image
Image segmentation and blob analysis – which uses simple object properties, such as size, color, or shape
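As an illustration of the first method, here is a minimal Python sketch of template matching. It assumes a sum-of-squared-differences score, which is one common choice among several (normalized cross-correlation is another); `match_template` is a name introduced for this example.

```python
def match_template(image, template):
    """Return ((row, col), score) of the best match, where score is the
    sum of squared differences (lower is better, 0 is a perfect match)."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, float("inf")
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 1],
    [1, 9],
]
position, score = match_template(image, template)
# position is (1, 2) and score is 0: the template appears exactly there.
```

No training data is involved; the template itself is the model, which is why this method works best when the object's appearance, scale, and orientation change little.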

Typically, if an object can be recognized using a simple approach like image segmentation, it’s best to start with that approach. It can provide a robust solution without requiring hundreds or thousands of training images or an overly complicated model.
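A simple segmentation and blob analysis pipeline might look like the following Python sketch. It assumes thresholding as the segmentation step and 4-connected regions as blobs; `find_blobs`, the threshold, and the area filter are illustrative choices, not part of the original article.

```python
from collections import deque


def find_blobs(image, threshold=1):
    """Segment by thresholding, then group 4-connected foreground pixels
    into blobs and measure each blob's area and bounding box."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r0 in range(rows):
        for c0 in range(cols):
            if image[r0][c0] < threshold or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and image[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rs = [p[0] for p in pixels]
            cs = [p[1] for p in pixels]
            blobs.append({
                "area": len(pixels),
                "bbox": (min(rs), min(cs), max(rs), max(cs)),  # inclusive corners
            })
    return blobs


image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
blobs = find_blobs(image)
# Recognize "large" objects using a simple property: area of at least 4 pixels.
large = [b for b in blobs if b["area"] >= 4]
```

Filtering on area, bounding-box shape, or color is often enough when objects are well separated from the background, and it needs no training images at all.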

Object Recognition with MATLAB

Deep Learning and Machine Learning

With just a few lines of MATLAB® code, you can build machine learning and deep learning models for object recognition without having to be an expert.

Using MATLAB for object recognition enables you to be successful in less time because it lets you:

Use your domain expertise and learn data science with MATLAB: You can use MATLAB to learn and gain expertise in the areas of machine learning and deep learning. MATLAB makes learning about these fields practical and accessible. In addition, MATLAB enables domain experts to create object recognition models – instead of handing the task over to data scientists who may not know your industry or application.
Use apps to label data and build models: MATLAB lets you build machine learning and deep learning models with minimal code. With the Classification Learner app, you can quickly build machine learning models and compare different machine learning algorithms without writing code.
Using the Image Labeler app, you can interactively label objects within images and automate ground truth labeling within videos for training and testing deep learning models. This interactive and automated approach can lead to better results in less time.
Integrate object recognition in a single workflow: MATLAB can unify multiple domains in a single workflow. With MATLAB, you can do your thinking and programming in one environment. It offers tools and functions for deep learning and machine learning, and also for a range of domains that feed into these algorithms, such as robotics, computer vision, and data analytics.

MATLAB automates deploying your models on enterprise systems, clusters, clouds, and embedded devices.
