“Bridging Vision and Language: Designing, Training and Deploying Multimodal Large Language Models,” a Presentation from Meta Reality Labs

Adel Ahmadyan, Staff Engineer at Meta Reality Labs, presents the “Bridging Vision and Language: Designing, Training and Deploying Multimodal Large Language Models” tutorial at the May 2024 Embedded Vision Summit.

In this talk, Ahmadyan explores the use of multimodal large language models in real-world edge applications. He begins by explaining how these large multimodal models (LMMs) work, highlighting their key components and giving special attention to how LMMs fuse understanding across the vision and language domains.
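
As a concrete illustration of that fusion idea (not necessarily the specific architecture covered in the talk), the sketch below follows the common LLaVA-style pattern: a projection layer maps vision-encoder features into the language model's token-embedding space, and the projected "visual tokens" are concatenated with the text embeddings before being fed to the LLM. The class name, dimensions, and layer choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VisionLanguageFusion(nn.Module):
    """Illustrative LLaVA-style fusion: project vision features into the
    language model's token-embedding space, then prepend them to the
    text-token embeddings as extra "visual tokens"."""

    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        # Placeholder projector; a real LMM pairs a pretrained vision encoder
        # (e.g., a ViT) with a pretrained LLM and trains this bridge between them.
        self.projector = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features, text_embeddings):
        # image_features:  (batch, num_patches, vision_dim) from a vision encoder
        # text_embeddings: (batch, num_tokens, llm_dim) from the LLM's embedding table
        visual_tokens = self.projector(image_features)
        # The fused sequence is consumed by the LLM as one token stream.
        return torch.cat([visual_tokens, text_embeddings], dim=1)

fusion = VisionLanguageFusion()
img = torch.randn(1, 256, 1024)   # e.g., 256 image patches
txt = torch.randn(1, 32, 4096)    # e.g., 32 text tokens
fused = fusion(img, txt)          # (1, 288, 4096) sequence for the LLM
print(fused.shape)
```
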

Next, Ahmadyan discusses the process of training LMMs and the types of data needed to tune them for specific tasks. Finally, he highlights some of the key challenges of deploying LMMs on resource-constrained edge devices and shares techniques for overcoming these challenges.
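
The summary does not name the specific deployment techniques covered, but post-training quantization is one representative way an LMM's memory footprint can be cut for edge hardware. The minimal sketch below applies PyTorch's dynamic int8 quantization to a toy stand-in for a language-model block; the module and its sizes are placeholder assumptions, not the model from the talk.

```python
import torch
import torch.nn as nn

# Toy stand-in for an LMM's language-model block; a real deployment would
# quantize the full pretrained model instead.
model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
)

# Post-training dynamic quantization: weights are stored as int8 and
# activations are quantized on the fly, shrinking the quantized layers'
# memory footprint roughly 4x, which matters on memory-constrained devices.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)
```
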

See here for a PDF of the slides.

