Jacob Marks, Senior ML Engineer and Researcher at Voxel51, presents the “How Large Language Models Are Impacting Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
Large language models (LLMs) are revolutionizing the way we interact with computers and the world around us. However, to truly understand the world, LLM-powered agents need to be able to see.
Will models in production be multimodal, or will text-only LLMs leverage purpose-built vision models as tools? Where do techniques like multimodal retrieval-augmented generation (RAG) fit in? In this talk, Marks gives an overview of key LLM-centered projects that are reshaping the field of computer vision and discusses where we are headed in a multimodal world.
See here for a PDF of the slides.