“How Large Language Models Are Impacting Computer Vision,” a Presentation from Voxel51

Jacob Marks, Senior ML Engineer and Researcher at Voxel51, presents the “How Large Language Models Are Impacting Computer Vision” tutorial at the May 2024 Embedded Vision Summit.

Large language models (LLMs) are revolutionizing the way we interact with computers and the world around us. However, in order to truly understand the world, LLM-powered agents need to be able to see.

Will models in production be multimodal, or will text-only LLMs leverage purpose-built vision models as tools? Where do techniques like multimodal retrieval-augmented generation (RAG) fit in? In this talk, Marks gives an overview of key LLM-centered projects that are reshaping the field of computer vision and discusses where we are headed in a multimodal world.
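As a hedged illustration of the "vision models as tools" pattern the talk raises (a sketch, not material from the presentation itself): a text-only LLM can answer questions about an image by delegating perception to a purpose-built vision model, such as an off-the-shelf captioner, and reasoning over the returned text. The `ask_llm` function below is a placeholder for whichever LLM API you use, and the captioning model is just one example choice.

```python
# Illustrative sketch of "vision model as a tool" for a text-only LLM.
# Assumes the Hugging Face transformers library; ask_llm is a placeholder
# for an actual LLM API call, not a real library function.

from transformers import pipeline

# Purpose-built vision model: an off-the-shelf image captioner.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def describe_image(image_path: str) -> str:
    """Run the vision model and return a text description the LLM can reason over."""
    result = captioner(image_path)
    return result[0]["generated_text"]

def ask_llm(prompt: str) -> str:
    """Placeholder: swap in your preferred text-only LLM provider here."""
    raise NotImplementedError("Wire up your LLM API of choice.")

def answer_about_image(question: str, image_path: str) -> str:
    # The LLM never sees pixels -- only the caption produced by the vision tool.
    caption = describe_image(image_path)
    prompt = f"Image description: {caption}\n\nQuestion: {question}\nAnswer:"
    return ask_llm(prompt)
```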

See here for a PDF of the slides.
