Arun Kumar, Perception Engineer at Nemo @ Ridecell, presents the “Cost-Effective Approach to Modeling Object Interactions on the Edge” tutorial at the May 2022 Embedded Vision Summit.
Determining bird’s eye view (BEV) object positions and tracks, and modeling the interactions among objects, are vital for many applications, from understanding human interactions in security settings to analyzing road object interactions in automotive use cases. With traditional methods, this is extremely challenging and expensive because of the dense supervision required during training. In this presentation, Kumar introduces a weakly supervised end-to-end computer vision pipeline for modeling object interactions in 3D.
Nemo @ Ridecell’s architecture trains a unified network in a weakly supervised manner to estimate 3D object positions by jointly learning to regress 2D object detections and the scene’s depth in a single feed-forward CNN pass, and then to model object tracks. The method learns to represent each object as a BEV point, without requiring 3D or BEV annotations for training and without supplemental sensor data (e.g., LiDAR). Nemo @ Ridecell achieves results comparable to the state of the art while significantly reducing development costs and computation requirements.
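The talk does not publish implementation details, but the core geometric step it describes, turning a 2D detection plus a regressed depth into a BEV point, can be sketched as follows. This is a minimal illustration under standard pinhole-camera assumptions; the function name, box format, and choice of the box center as the reference pixel are hypothetical, not taken from the presentation.

```python
import numpy as np

def bev_point_from_detection(bbox, depth, K):
    """Lift a 2D detection to a bird's-eye-view (BEV) point.

    bbox  : (x1, y1, x2, y2) box in pixels (hypothetical format)
    depth : regressed depth in meters at the box's reference pixel
    K     : 3x3 camera intrinsics matrix
    Returns (x, z): lateral offset and forward distance on the ground plane.
    """
    # Use the box center as the object's reference pixel (an assumption;
    # a real pipeline might use the bottom-center or a learned keypoint).
    u = 0.5 * (bbox[0] + bbox[2])
    v = 0.5 * (bbox[1] + bbox[3])
    # Back-project the pixel ray through the inverse intrinsics,
    # then scale it by the regressed depth to get a 3D camera-frame point.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    X, Y, Z = depth * ray
    # BEV keeps only the ground-plane coordinates (x = right, z = forward).
    return float(X), float(Z)

# Example with illustrative intrinsics (focal length 1000 px, 1920x1080 image).
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])
x, z = bev_point_from_detection((1110, 490, 1210, 590), 10.0, K)
```

A detection centered 200 px right of the principal point at 10 m depth maps to a BEV point roughly 2 m to the right and 10 m ahead; object tracks can then be modeled over these per-frame BEV points.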
See here for a PDF of the slides.