2024 Embedded Vision Summit Showcase: Keynote Presentation

Check out the keynote presentation “Learning to Understand Our Multimodal World with Minimal Supervision” at the upcoming 2024 Embedded Vision Summit, taking place May 21-23 in Santa Clara, California!

The field of computer vision is undergoing another profound change. Recently, “generalist” models have emerged that can solve a variety of visual perception tasks. Also known as foundation models, they are trained on huge internet-scale unlabeled or weakly labeled data and can adapt to new tasks without any additional supervision or with just a small number of manually labeled samples. Moreover, some are multimodal: they understand both language and images and can support other perceptual modes as well.

In our 2024 Keynote, Professor Yong Jae Lee from the University of Wisconsin-Madison will present recent groundbreaking research on creating intelligent systems that can learn to understand our multimodal world with minimal human supervision. He will focus on systems that can understand images and text, and also touch upon those that utilize video, audio and LiDAR. Since training foundation models from scratch can be prohibitively expensive, Yong Jae will discuss how to efficiently repurpose existing foundation models for use in application-specific tasks. He will also discuss how these models can be used for image generation and, in turn, for detecting AI-generated images. He’ll conclude by highlighting key remaining challenges and promising research directions.

Join us to learn how emerging techniques will address today’s neural network training bottlenecks, facilitate new types of multimodal machine perception and enable countless new applications.

Yong Jae Lee is an Associate Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. His research interests are in computer vision and machine learning, with a focus on robust visual recognition systems that learn to understand the visual world with minimal human supervision. Before joining UW-Madison in 2021, he spent one year as an AI Visiting Faculty at Cruise and six years as an Assistant and then Associate Professor at UC Davis. He received his PhD from the University of Texas at Austin in 2012 and was a postdoc at Carnegie Mellon University (2012-2013) and UC Berkeley (2013-2014).

Professor Lee is co-author of the widely cited paper “Visual Instruction Tuning,” which proposes LLaVA (large language and vision assistant), an end-to-end trained large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding. He is also co-author of “Segment Everything Everywhere All at Once,” which proposes a novel decoding mechanism enabling diverse prompting for all types of segmentation tasks.

Professor Lee is a recipient of the ARO Young Investigator Program Award (2017), UC Davis Hellman Fellowship (2017), NSF CAREER Award (2018), AWS Machine Learning Research Award (2018 and 2019), Adobe Data Science Research Award (2019 and 2022), UC Davis College of Engineering Outstanding Junior Faculty Award (2019), Sony Focused Research Award (2020 and 2023) and UW-Madison SACM Student Choice Professor of the Year Award (2022). He and his collaborators received the Most Innovative Award at the COCO Object Detection Challenge, ICCV 2019 and the Best Paper Award at BMVC 2020.

The Summit is the premier conference for innovators incorporating computer vision and edge AI in products. It attracts a global audience of technology professionals from companies developing computer vision and edge AI-enabled products including embedded systems, cloud solutions and mobile applications. Visit the Summit website for more information, and then register today! Don’t forget to also pass the word on to your colleagues. We look forward to seeing you there!
