This blog post was originally published by Teleidoscope. It is reprinted here with the permission of Teleidoscope.

Tracking: To follow a uniquely identified region of interest in a sequence of images.

Think about the first 15 minutes of your morning. You open your eyes and get out of bed. Maybe you’re making a cup of coffee or some breakfast, or getting started with your morning exercise.

Whatever this morning routine looks like for you, it’s certain you’ve had to navigate the world in a way that is extremely difficult for computers to replicate or understand. You know what room you’re in, and where you are in that room. You know which of the items in your bathroom is a toothbrush, and which is the toothpaste. Your eyes can even track the toothbrush in real time as you move it towards your mouth. This is crazy stuff!

Our brains do this every day, and while tracking is conceptually a very simple idea, it remains one of the most challenging problems in computer vision. With fast, high-resolution cameras becoming the new norm and technologies like autonomous vehicles and augmented reality taking hold, tracking is becoming even more important.

When people mention tracking, they’re usually referring to one of the three methods described below.

Continuous Detection Based Tracking

Using pre-trained data to search an image for regions containing a specific type of object.

This is one of the most widely used methods of tracking an object and is the same approach used when your phone highlights faces while taking a photo.

As the name implies, this method runs a detector on every frame, looking for a specific object. This can be very computationally expensive because detectors generally don’t preserve information from previous frames and have to search the entire image to find new object locations. This also means that if the tracked object suddenly changes in appearance and the detector doesn’t recognize it, the identity of the tracked object may be lost and re-assigned by the detector.
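To make that cost concrete, here is a toy, pure-NumPy sketch (not any particular production detector): detection is modeled as an exhaustive sum-of-squared-differences template search over the whole frame, repeated independently for every frame. The `detect` helper, the frame sizes, and the bright-patch "object" are all illustrative assumptions.

```python
import numpy as np

def detect(frame, template):
    # "Detector" step: exhaustively scan the WHOLE frame for the best
    # SSD match. No information from previous frames is reused, so every
    # frame pays the full-image search cost.
    fh, fw = frame.shape
    th, tw = template.shape
    best_pos, best_err = None, np.inf
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            err = np.sum((frame[y:y+th, x:x+tw] - template) ** 2)
            if err < best_err:
                best_err, best_pos = err, (x, y)
    return best_pos

# Toy two-frame "video": a bright 3x3 patch moving right by 2 pixels.
rng = np.random.default_rng(0)
template = np.full((3, 3), 255.0)
frames = []
for x0 in (4, 6):
    f = rng.uniform(0, 50, (20, 20))
    f[8:11, x0:x0 + 3] = template
    frames.append(f)

# Continuous detection: the detector runs independently on every frame.
positions = [detect(f, template) for f in frames]
print(positions)  # → [(4, 8), (6, 8)]
```

Even on this 20×20 toy frame the detector evaluates every possible window on every frame; scale the same loop up to a high-resolution stream and the gap between desired and actual FPS follows.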

This approach can be appropriate in situations where the identity of an object is not important over long periods, the object is likely to be in the scene, the object can be ignored when occluded, or the tracking duration is short.

In the video below, notice the lag and difference in actual vs desired FPS.

Continuous Detection-based Tracking — Run On CPU

Camera Position Tracking (AKA world tracking)

Using pixel movement between consecutive images to estimate how the camera is moving. This is often combined with an Inertial Measurement Unit (IMU) to perform Visual Inertial Odometry (VIO).

This is the approach used in many augmented reality applications, and is how you are able to move around digital objects. Augmented reality devices make a map of the space around you and can then understand where you are in relation to that space.

Unlike the other tracking methods that report an object’s location in an image, this method reports the camera location relative to how it’s moving in the world. An object’s location is just the position the camera was in when the object was detected. This means that you can move around an object, but you can’t have the object move around you without having an external detector or tracker that knows where the object is.

This is because this approach works by extracting interesting points all over the image, measuring how they move between frames, and then fusing that data with the output of an IMU. This provides an extremely accurate and fast measurement of how the camera has moved relative to where it started.
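A heavily simplified, pure-NumPy illustration of the "measure how pixels move between frames" step: phase correlation recovers a single global image translation between two frames. Real systems track sparse feature points and fuse the result with IMU readings; the function name, frame sizes, and synthetic "camera pan" below are assumptions made for the sketch.

```python
import numpy as np

def estimate_camera_shift(prev, curr):
    # Phase correlation: compare the two frames in the frequency domain
    # and read the dominant global translation off the correlation peak.
    cross = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    cross /= np.abs(cross) + 1e-12            # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = prev.shape
    if dy > h // 2:                           # unwrap circular shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dx), int(dy)

rng = np.random.default_rng(1)
prev = rng.uniform(size=(64, 64))                 # a feature-rich frame
curr = np.roll(prev, shift=(2, 3), axis=(0, 1))   # camera pans by (3, 2)
shift = estimate_camera_shift(prev, curr)
print(shift)  # → (3, 2)
```

Note that the random frame stands in for a "feature-rich environment": on a flat, textureless image the correlation peak flattens out and the estimate degrades, which mirrors the feature-richness requirement above.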

This approach is best when you need extremely accurate tracking of arbitrary positions in space relative to the camera’s movement around a stationary object, are not moving at high speeds, and are in a feature-rich environment.

This tracking method is not well suited for tracking moving objects.

Example of Camera Position Tracking

Region of Interest Tracking

Using the information about the object as it appears in one frame to estimate its position in the next frame.

This method works by using information from previous frames to estimate where the object is likely to be in the current frame. This allows the algorithm to only search a small area instead of searching the full image like a detector.

Position-estimating algorithms are much less computationally expensive, more robust to appearance changes than a detector, and do not require large amounts of data to train. They can be initialized with a single patch from a live video and track it through successive frames.
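A minimal sketch of that idea in pure NumPy, with SSD template matching standing in for a real appearance model (the helper name, search radius, and toy frames are all assumptions): the tracker is initialized from a single patch and then searches only a small window around the last known position.

```python
import numpy as np

def track_roi(frame, template, last_xy, search_radius=4):
    # Search ONLY a small window around the last known position,
    # instead of the whole frame like a detector would.
    fh, fw = frame.shape
    th, tw = template.shape
    lx, ly = last_xy
    best_pos, best_err = last_xy, np.inf
    for y in range(max(0, ly - search_radius), min(fh - th, ly + search_radius) + 1):
        for x in range(max(0, lx - search_radius), min(fw - tw, lx + search_radius) + 1):
            err = np.sum((frame[y:y+th, x:x+tw] - template) ** 2)
            if err < best_err:
                best_err, best_pos = err, (x, y)
    return best_pos

# Toy video: a bright 3x3 patch drifting right by 2 pixels per frame.
rng = np.random.default_rng(2)
frames = []
for x0 in (4, 6, 8):
    f = rng.uniform(0, 50, (20, 20))
    f[8:11, x0:x0 + 3] = 255.0
    frames.append(f)

# Initialize from a single patch in the first frame -- no training data.
pos = (4, 8)
template = frames[0][8:11, 4:7].copy()
track = []
for f in frames[1:]:
    pos = track_roi(f, template, pos)
    track.append(pos)
print(track)  # → [(6, 8), (8, 8)]
```

Compared with the full-frame detector, each update here evaluates only a (2·radius+1)² window, which is why this family of trackers can keep up with live video on modest hardware.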

This approach is best when you need to track arbitrary moving/stationary objects or regions of an image without requiring prior training data.

Region of Interest Tracking Running On CPU

Camera Position Tracking is becoming more commonplace. We’re now used to seeing machines accurately track their own movement around objects in the environment. The next logical step is to focus on building robust solutions that allow machines to track arbitrary objects moving around them.

How would you track something arbitrary without any pre-trained data?

This is exactly what Region of Interest Tracking aims to solve. It’s a problem that has existed for a few decades and still has a long way to go, but has become very relevant recently due to requirements in fields such as augmented reality, security, and autonomous transportation.

What if you want to track something specific like a key, a boat, or a face?

Some applications try to solve this by running a detector periodically and adjusting based on the camera location at the time of detection. This is fine if the object doesn’t move much, but it doesn’t work well with continuous motion. Continuous Detection Based Tracking is too slow and computationally expensive for many applications.

The optimal solution is a hybrid: run a detector once at the beginning of the session and pass the object’s location to a robust Region of Interest Tracker. This can also be combined with a Camera Position Tracker if one is available. Ultimately, this solution relies heavily on a good Region of Interest Tracker.
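A toy end-to-end sketch of that hybrid in pure NumPy (SSD matching throughout; every name, parameter, and frame here is illustrative, not Teleidoscope's TIO): detect once over the full frame, then hand the position to a cheap local ROI search for all later frames.

```python
import numpy as np

def ssd_search(frame, template, xs, ys):
    # Best SSD match for the template over the given candidate positions.
    th, tw = template.shape
    best_pos, best_err = None, np.inf
    for y in ys:
        for x in xs:
            err = np.sum((frame[y:y+th, x:x+tw] - template) ** 2)
            if err < best_err:
                best_err, best_pos = err, (x, y)
    return best_pos

def detect(frame, template):
    # Expensive full-frame search: run ONCE to bootstrap the tracker.
    fh, fw = frame.shape
    th, tw = template.shape
    return ssd_search(frame, template, range(fw - tw + 1), range(fh - th + 1))

def track_roi(frame, template, last_xy, r=4):
    # Cheap local search around the last known position.
    fh, fw = frame.shape
    th, tw = template.shape
    lx, ly = last_xy
    return ssd_search(frame, template,
                      range(max(0, lx - r), min(fw - tw, lx + r) + 1),
                      range(max(0, ly - r), min(fh - th, ly + r) + 1))

# Toy video: a bright 3x3 patch moving right by 2 pixels per frame.
rng = np.random.default_rng(3)
template = np.full((3, 3), 255.0)
frames = []
for x0 in (4, 6, 8):
    f = rng.uniform(0, 50, (20, 20))
    f[8:11, x0:x0 + 3] = template
    frames.append(f)

# Hybrid pipeline: one detection, then ROI tracking on every later frame.
pos = detect(frames[0], template)
positions = [pos]
for f in frames[1:]:
    pos = track_roi(f, template, pos)
    positions.append(pos)
print(positions)  # → [(4, 8), (6, 8), (8, 8)]
```

The full-frame cost is paid exactly once; every subsequent frame only pays for the small window, which is the efficiency argument behind the hybrid design.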

Our tracking solution (TIO)

At Teleidoscope we recognize how important a good Region of Interest Tracker is to the success of applications looking to track moving objects, whether they’re specific or arbitrary. With this in mind, we’ve focused our time and effort on building an extremely fast and robust Region of Interest Tracker called TIO that can handle difficult situations that others cannot.

Comparing Our Tracker to Other State Of the Art Trackers

Know someone who needs a robust Region of Interest tracker, or a specific object tracking solution? Please feel free to reach out to me at matt@teleidoscope.com.

Matt Rabinovitch
Founder and CEO, Teleidoscope
