“The Future of Visual AI: Efficient Multimodal Intelligence,” a Keynote Presentation from Trevor Darrell

Trevor Darrell, Professor at the University of California, Berkeley, delivers the keynote talk “The Future of Visual AI: Efficient Multimodal Intelligence” at the May 2025 Embedded Vision Summit.

AI is on the cusp of a revolution, driven by the convergence of several breakthroughs. One of the most significant of these advances is the development of large language models (LLMs) that can reason like humans, enabling them to make decisions and take actions based on complex, nuanced inputs. Another is the integration of natural language processing and computer vision through vision-language models (VLMs). In this keynote talk, Darrell shares his perspective on the current state and trajectory of research advancing machine intelligence. He presents highlights of his group’s groundbreaking work, including methods for training vision models when labeled data is unavailable and techniques that enable robots to determine appropriate actions in novel situations.

Particularly relevant to edge applications, much of Darrell’s work aims to overcome obstacles—such as massive memory and compute requirements—that limit the practical applications of state-of-the-art models. For example, he discusses approaches to making VLMs smaller and more efficient while retaining accuracy. He also shows how LLMs can be used as visual reasoning coordinators, overseeing multiple task-specific models to achieve superior performance. Finally, Darrell demonstrates how multimodal AI, visual perception and prompt-tuned reasoning are enabling consumers to use visual intelligence at home while preserving privacy.

See here for a PDF of the slides.

