“Vision-language Models on the Edge,” a Presentation from Hugging Face

September 22, 2025

Cyril Zakka, Health Lead at Hugging Face, presents the “Vision-language Models on the Edge” tutorial at the May 2025 Embedded Vision Summit.

In this presentation, Zakka provides an overview of vision-language models (VLMs) and their deployment on edge devices using Hugging Face’s recently released SmolVLM as an example. He examines the training process of VLMs, including data preparation, alignment techniques and optimization methods necessary for embedding visual understanding capabilities within resource-constrained environments.

Zakka explains practical evaluation approaches, emphasizing how to benchmark these models beyond accuracy metrics to ensure real-world viability. To illustrate how these concepts play out in practice, he shares data from recent work implementing SmolVLM on an edge device.

See here for a PDF of the slides.
