“Data Collection in the Wild,” a Presentation from BMW Group

Vladimir Haltakov, Self-Driving Car Engineer at BMW Group, presents the “Data Collection in the Wild” tutorial at the May 2021 Embedded Vision Summit.

In scientific papers, computer vision models are usually evaluated on well-defined training and test datasets. In practice, however, collecting high-quality data that accurately represents the real world is a challenging problem. A model developed on a non-representative dataset can score high accuracy during testing, yet perform poorly when deployed in the real world.

In this presentation, Haltakov discusses the challenges, common pitfalls and possible solutions for creating datasets for real-world problems. He also explains how to avoid typical biases while curating data, and takes a close look at imbalanced distributions and techniques for handling them. Finally, he covers strategies for detecting and dealing with model drift after a model is deployed in production.
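
As a rough illustration of one common way to handle an imbalanced class distribution, the sketch below computes inverse-frequency class weights, so that rare classes count more in a training loss. It is not taken from the presentation; the function name `inverse_frequency_weights` and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to its frequency.

    Uses the common "balanced" heuristic:
        weight = n_samples / (n_classes * class_count)
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical dataset: 90 images labeled "car", 10 labeled "pedestrian".
labels = ["car"] * 90 + ["pedestrian"] * 10
weights = inverse_frequency_weights(labels)

for cls, weight in weights.items():
    print(f"{cls}: {weight:.3f}")
```

With these weights the rare class contributes as much to a weighted loss in total as the frequent one. Whether reweighting, resampling, or targeted data collection is the better remedy depends on the problem.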

See here for a PDF of the slides.
