“Customizing Vision-language Models for Real-world Applications,” a Presentation from NVIDIA

Monika Jhuria, Technical Marketing Engineer at NVIDIA, presents the “Customizing Vision-language Models for Real-world Applications” tutorial at the May 2025 Embedded Vision Summit.

Vision-language models (VLMs) have the potential to revolutionize various applications, and their performance can be improved through fine-tuning and customization. In this presentation, Jhuria explores VLM customization and shares insights on domain adaptation for VLMs. She discusses the factors to consider when fine-tuning a VLM, including dataset requirements and the resources available to developers.

Jhuria explores two key approaches to customization: VLM fine-tuning, which ranges from memory-efficient methods such as low-rank adaptation (LoRA) to full fine-tuning, and retrieval-augmented generation (RAG) for enhanced adaptability. Finally, she discusses metrics for validating the performance of VLMs and best practices for testing domain-adapted VLMs in real-world applications. You will gain a practical understanding of VLM fine-tuning and customization and will be equipped to make informed decisions about how to unlock the full potential of these models in your own projects.
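To give a sense of why LoRA is memory-efficient, here is a minimal NumPy sketch of the core idea: the pretrained weight matrix is frozen, and the update is factored into two small low-rank matrices. All names, shapes, and hyperparameter values below are illustrative assumptions, not taken from the presentation.

```python
import numpy as np

class LoRALinear:
    """Toy linear layer with a LoRA-style low-rank update.

    The frozen pretrained weight W (d_out x d_in) is left untouched;
    training would only adjust A (r x d_in) and B (d_out x r), so the
    effective weight is W + (alpha / r) * B @ A. The rank r and scaling
    alpha are illustrative choices here.
    """

    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small init
        self.B = np.zeros((d_out, r))  # trainable, zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # Base path plus scaled low-rank correction.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        # Only A and B would receive gradient updates.
        return self.A.size + self.B.size

layer = LoRALinear(d_in=512, d_out=512, r=4)
x = np.ones(512)
# With B initialized to zero, the adapted layer reproduces the frozen base.
assert np.allclose(layer.forward(x), layer.W @ x)
print(layer.trainable_params(), 512 * 512)  # 4096 trainable vs. 262144 full
```

The parameter count shows the appeal: at rank 4, only 4,096 values are trained instead of the full 262,144-entry matrix, which is why LoRA fits on far smaller GPUs than full fine-tuning.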

See here for a PDF of the slides.

