“Data Versioning: Towards Reproducibility in Machine Learning,” a Presentation from Tryolabs

Nicolás Eiris, Machine Learning Engineer at Tryolabs, presents the “Data Versioning: Towards Reproducibility in Machine Learning” tutorial at the May 2022 Embedded Vision Summit.

Even in 2022, reproducibility remains a major pain point in most data science workflows. Version control is a critical element of reproducibility. Unfortunately, machine learning notoriously lacks standards for version control, so developers typically resort to crafting ad hoc workflows, and they frequently reinvent the wheel because they are unaware of existing solutions.

In this talk, Eiris introduces DVC, short for “Data Version Control,” an open-source tool that Tryolabs has found can significantly alleviate the pain of reproducibility in data science workflows. He covers the motivation for such a tool, digs into its main features, and makes the case for integrating it into your next project. Everything is illustrated through a real-world example of an end-to-end ML pipeline.

A PDF of the slides is also available.



