“Depth Estimation from Monocular Images Using Geometric Foundation Models,” a Presentation from Toyota Research Institute

Rareș Ambruș, Senior Manager for Large Behavior Models at Toyota Research Institute, presents the “Depth Estimation from Monocular Images Using Geometric Foundation Models” tutorial at the May 2025 Embedded Vision Summit.

In this presentation, Ambruș looks at recent advances in depth estimation from images. He first focuses on estimating metric depth from monocular camera images across different domains and camera parameters.
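To illustrate why camera parameters matter for metric depth, here is a minimal sketch (not taken from the presentation) that lifts a metric depth map to 3D points under a standard pinhole camera model. The function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions, not names used by the speaker.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift a metric depth map to 3D points in the camera frame.

    Assumes an ideal pinhole camera (no lens distortion), with depth[v][u]
    holding the distance along the optical axis, in meters, at pixel (u, v).
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth map where every pixel is 2 m away.
depth_map = [[2.0, 2.0], [2.0, 2.0]]
narrow = backproject(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
wide = backproject(depth_map, fx=0.5, fy=0.5, cx=0.5, cy=0.5)
```

With the same depth values, halving the focal length doubles the lateral extent of the recovered points, which is one reason a model that predicts metric depth has to account for the camera that produced the image.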

Next, Ambruș looks at extensions to the multi-view setting and covers an efficient diffusion-based architecture capable of encoding hundreds of images and rendering depth and RGB images from novel viewpoints. Throughout the presentation, he focuses on the interplay between architectural inductive bias, training data, and optimization objectives, and on their combined effect on building geometric foundation models that estimate 3D structure from images.

See here for a PDF of the slides.

