This blog post was originally published by Teleidoscope. It is reprinted here with the permission of Teleidoscope.
In our previous post we discussed the various types of computer vision based tracking. At Teleidoscope we’ve dedicated significant time and effort to building a fast and robust Region of Interest (ROI) tracker, which we license as is and also use as the basis for custom solutions tailored to client needs.
This post examines and compares state-of-the-art ROI trackers, including our own.
Recap: Region of Interest Tracking
ROI trackers are used for tracking arbitrary moving objects or regions of an image. This can be done with or without prior training data.
For example, a consumer drone might track a surfer, or a gimbal might help a camera follow a person.
TIO — Tracking a surfer
ROI tracking works by using information from previous frames to estimate where the object is likely to be in the current frame. The algorithm then only has to search a small area, instead of searching the full image as a traditional detector-based approach would.
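To make the "small search area" idea concrete, here is a minimal sketch that grows the previous bounding box by a padding factor and clips it to the frame. The `Box` type, the `search_window` function and the `padding` parameter are illustrative assumptions, not part of any tracker discussed in this post.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def search_window(prev: Box, frame_w: int, frame_h: int, padding: float = 1.0) -> Box:
    """Region to search in the current frame.

    The previous box is grown by `padding` times its own size on every
    side, then clipped so it never leaves the frame.
    """
    pad_x = int(prev.w * padding)
    pad_y = int(prev.h * padding)
    x0 = max(0, prev.x - pad_x)
    y0 = max(0, prev.y - pad_y)
    x1 = min(frame_w, prev.x + prev.w + pad_x)
    y1 = min(frame_h, prev.y + prev.h + pad_y)
    return Box(x0, y0, x1 - x0, y1 - y0)


# Hypothetical numbers: a 40x60 object in a 1920x1080 frame.
window = search_window(Box(100, 80, 40, 60), 1920, 1080)
fraction_searched = (window.w * window.h) / (1920 * 1080)
```

With these made-up numbers the window covers roughly 1% of the frame, which is where the speed advantage over a full-image detector comes from.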
Comparisons
There are many ROI tracking solutions out there, but most don’t work well in everyday situations. Below we have benchmarked and compared our tracker against three others that are considered state of the art. We will start with the video comparisons, then go into the details, pros and cons of each.
Comparing our tracker (TIO) to others considered state of the art
CSRT
This tracker works by training a correlation filter with compressed features (HoG and Colornames). The filter is then used to search the area around the last known position of the object in successive frames.
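As a rough illustration of how a correlation filter locates an object, the sketch below trains a filter on a single grayscale patch in the Fourier domain and reads the object's shift from the peak of the response. This is a simplified MOSSE-style filter, not CSRT itself: it assumes raw pixel values instead of HoG and Colornames features, and it leaves out windowing, scale estimation and the reliability maps a real implementation would use. The function names and the `sigma` and `lam` parameters are hypothetical.

```python
import numpy as np


def gaussian_response(shape, sigma=2.0):
    """Desired filter output: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))


def train_filter(patch, sigma=2.0, lam=1e-3):
    """Closed-form filter from one patch; `lam` regularises the division."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(gaussian_response(patch.shape, sigma))
    return (G * np.conj(F)) / (F * np.conj(F) + lam)


def locate(filter_hat, patch):
    """Return (dy, dx): how far the object moved from the patch centre."""
    response = np.real(np.fft.ifft2(filter_hat * np.fft.fft2(patch)))
    peak_y, peak_x = np.unravel_index(np.argmax(response), response.shape)
    h, w = patch.shape
    return int(peak_y) - h // 2, int(peak_x) - w // 2


# Synthetic check: shift a random "object" patch and recover the shift.
rng = np.random.default_rng(0)
template = rng.random((32, 32))
filter_hat = train_filter(template)
moved = np.roll(template, shift=(3, -5), axis=(0, 1))
dy, dx = locate(filter_hat, moved)
```

Because the whole search is a pair of FFTs over a small patch, the cost per frame depends on the patch size rather than on the full image size.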
Pros:
– Slower but more accurate than KCF
– Robust to unpredictable motion
– Trained on a single image patch
– Can recover from failures when the object hasn’t moved much
– Can tolerate intermittent frame drops
– Reports unrecoverable failures
– Adapts to scale, deformation and rotation
– Manually adjustable parameters
Cons:
– Does not recover well from failures due to full occlusion
– Latches onto surrounding regions when partially occluded, resulting in drift
– Does not recover when the object changes while out of view
– Does not recover from multiple consecutive failures
– Does not incorporate motion into estimation
KCF
This tracker works by training a filter with patches containing the object as well as nearby patches that do not. This allows the tracker to search the area around the previous position and exploit the fact that nearby patches are likely to contain the object.
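One way to see why training on many nearby patches can still be fast: if the shifted patches are modelled as cyclic shifts of one base patch, ridge regression over all of them has a closed form in the Fourier domain. The sketch below checks this on a 1-D signal with a linear kernel. It is an assumption-laden toy, not the KCF implementation; all function names and the `lam` and `sigma` parameters are illustrative.

```python
import numpy as np


def shifted_samples(x):
    """Stack every cyclic shift of x as one training row."""
    return np.stack([np.roll(x, i) for i in range(len(x))])


def peaked_target(n, sigma=1.5):
    """Label per shift: near 1 at zero shift, falling toward 0 elsewhere."""
    shifts = np.arange(n)
    dist = np.minimum(shifts, n - shifts)  # cyclic distance from zero shift
    return np.exp(-dist ** 2 / (2 * sigma ** 2))


def ridge_direct(x, y, lam):
    """Ridge regression on the explicit matrix of shifted samples."""
    X = shifted_samples(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(x)), X.T @ y)


def ridge_fourier(x, y, lam):
    """Same solution using only FFTs, with no explicit sample matrix."""
    x_hat = np.fft.fft(x)
    y_hat = np.fft.fft(y)
    w_hat = x_hat * y_hat / (x_hat * np.conj(x_hat) + lam)
    return np.real(np.fft.ifft(w_hat))


rng = np.random.default_rng(1)
x = rng.random(16)
y = peaked_target(16)
w_slow = ridge_direct(x, y, lam=0.1)
w_fast = ridge_fourier(x, y, lam=0.1)
same_solution = np.allclose(w_slow, w_fast)
```

The direct solve grows cubically with the patch size, while the Fourier version needs only a few FFTs, which is consistent with the speed figures listed below.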
Pros:
– 1.5–2x faster than CSRT and ~10x faster than TLD
– Adapts to scale and rotation
– Trained on a single image patch
– Aggressive failure reporting
– Manually adjustable parameters
– Supports custom feature extractor
Cons:
– Does not recover from failures well
– Does not recover when the object changes while out of view
– Does not recover from multiple consecutive failures
– Does not incorporate motion into estimation
TLD
This tracker works by training a classifier that is used to re-detect the object and correct tracking errors.
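The re-detection idea can be sketched as a per-frame control loop: trust the frame-to-frame tracker while its confidence is high, and fall back to a full-image detector when it is not. This is a simplified illustration that omits the online learning step entirely. The `track` and `detect` callables, the confidence scale and the `min_conf` threshold are all hypothetical.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]   # (x, y, w, h) in pixels
Estimate = Tuple[Box, float]      # (box, confidence between 0 and 1)


def track_or_redetect(
    frame: object,
    prev_box: Optional[Box],
    track: Callable[[object, Box], Optional[Estimate]],
    detect: Callable[[object], Optional[Estimate]],
    min_conf: float = 0.5,
) -> Optional[Box]:
    """Return the object's box for this frame, or None if it is lost."""
    # Cheap path: local tracking around the previous position.
    if prev_box is not None:
        tracked = track(frame, prev_box)
        if tracked is not None and tracked[1] >= min_conf:
            return tracked[0]
    # Expensive path: search the entire image with the learned classifier.
    detected = detect(frame)
    if detected is not None and detected[1] >= min_conf:
        return detected[0]
    return None
```

A caller would feed the returned box back in as `prev_box` on the next frame. Lowering `min_conf` trades fewer reported failures for more false positives.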
Pros:
– Recovers from full occlusion
– Trained on a single image patch
– Adapts to scale and deformation
– Searches the entire image on failures, making it good for ‘general location’ reporting
Cons:
– Very frequent false positives
– Very unstable scale estimation
– Does not report failures well
– Very slow comparatively (60–100 ms)
TIO (ours)
This tracker works by learning the texture, color, shape and surroundings of an object as they change over time.
Pros:
– Recovers from full occlusion
– Long term tracking
– Remains stable under partial occlusion
– Adapts to scale, deformation and rotation
– Supports manual or automatic ideal parameter tuning
– Supports manual feature selection
– Is able to recover from changes that occur to an object while it is out of view
– Trained on a single image patch
– Reports when it is searching for a lost object after a soft failure
– Supports multiple object initialization and tracking
– Auto-detects drift or learning errors and corrects if needed
– Can run in realtime on mobile hardware
– Can take advantage of depth sensors or IMUs to improve accuracy
– Robust to severe frame drops and long term target invisibility
– Optimized to support high FPS and high resolution cameras
– Can take advantage of camera focus and exposure events if available
– Can be initialized on moving objects
Cons:
– Large objects that take up more than 40% of the image may result in a drop in performance
TIO provides the accuracy of CSRT and the speed of KCF, and it recovers from failures better than TLD without the constant false positives. Know someone who needs a robust tracking solution? Please reach out to me at [email protected].
Written with the help of our lead computer vision engineer Eric Lundquist
Matt Rabinovitch
Founder and CEO, Teleidoscope