“Customizing Vision-language Models for Real-world Applications,” a Presentation from NVIDIA

Monika Jhuria, Technical Marketing Engineer at NVIDIA, presents the “Customizing Vision-language Models for Real-world Applications” tutorial at the May 2025 Embedded Vision Summit.

Vision-language models (VLMs) have the potential to revolutionize various applications, and their performance can be improved through fine-tuning and customization. In this presentation, Jhuria explores VLM customization and shares insights on domain adaptation for VLMs. She discusses the factors to consider when fine-tuning a VLM, including dataset requirements and the resources available to developers.

Jhuria explores two key approaches to customization: VLM fine-tuning, which ranges from memory-efficient methods such as low-rank adaptation (LoRA) to full fine-tuning, and retrieval-augmented generation (RAG) for enhanced adaptability. Finally, she discusses metrics for validating the performance of VLMs and best practices for testing domain-adapted VLMs in real-world applications. You will gain a practical understanding of VLM fine-tuning and customization and will be equipped to make informed decisions about how to unlock the full potential of these models in your own projects.
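To give a sense of why LoRA is memory-efficient, here is a minimal NumPy sketch of the core idea: the pretrained weight matrix is frozen, and the update is factored into two small low-rank matrices. All names, shapes, and hyperparameter values below are illustrative assumptions, not taken from the presentation.

```python
import numpy as np

class LoRALinear:
    """Toy linear layer with a LoRA-style low-rank update.

    The frozen pretrained weight W (d_out x d_in) is left untouched;
    training would only adjust A (r x d_in) and B (d_out x r), so the
    effective weight is W + (alpha / r) * B @ A. The rank r and scaling
    alpha are illustrative choices here.
    """

    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small init
        self.B = np.zeros((d_out, r))  # trainable, zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # Base path plus scaled low-rank correction.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        # Only A and B would receive gradient updates.
        return self.A.size + self.B.size

layer = LoRALinear(d_in=512, d_out=512, r=4)
x = np.ones(512)
# With B initialized to zero, the adapted layer reproduces the frozen base.
assert np.allclose(layer.forward(x), layer.W @ x)
print(layer.trainable_params(), 512 * 512)  # 4096 trainable vs. 262144 full
```

The parameter count shows the appeal: at rank 4, only 4,096 values are trained instead of the full 262,144-entry matrix, which is why LoRA fits on far smaller GPUs than full fine-tuning.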

See here for a PDF of the slides.

