Rareș Ambruș, Senior Manager for Large Behavior Models at Toyota Research Institute, presents the “Depth Estimation from Monocular Images Using Geometric Foundation Models” tutorial at the May 2025 Embedded Vision Summit.
In this presentation, Ambruș looks at recent advances in depth estimation from images. He first focuses on estimating metric depth from monocular camera images captured in different domains and with different camera parameters.
Next, Ambruș looks at extensions to the multi-view setting and covers an efficient diffusion-based architecture capable of encoding hundreds of images and rendering depth and RGB images from novel viewpoints. Throughout the presentation, he focuses on the interplay between architectural inductive bias, training data and optimization objectives, and on their combined effect on building geometric foundation models that estimate 3D structure from images.
See here for a PDF of the slides.

