Edge AI and Vision Insights: April 23, 2025

LETTER FROM THE EDITOR

Dear Colleague,

The Edge AI and Vision Product of the Year Awards celebrate the innovation and achievement of the industry’s leading companies that are enabling and developing products incorporating edge AI and computer vision technologies.

This year’s award winners were announced on Monday, and an awards ceremony will take place on-stage during the Embedded Vision Summit, May 20-22 in Santa Clara, California. Congratulations to the winners!

Watch the winners accept their awards at the Summit—use code SUMMIT25-NL for 15% off your pass when you register online!

Brian Dipert
Editor-In-Chief, Edge AI and Vision Alliance

VISION-LANGUAGE MODELS

Your Next Computer Vision Model Might be an LLM: Generative AI and the Move From Large Language Models to Vision Language Models

The past decade has seen incredible progress in practical computer vision. Thanks to deep learning, computer vision is dramatically more robust and accessible, and has enabled compelling capabilities in thousands of applications, from automotive safety to healthcare. But today’s widely used deep learning techniques suffer from serious limitations. Often, they struggle when confronted with ambiguity (e.g., are those people fighting or dancing?) or with challenging imaging conditions (e.g., is that shadow in the fog a person or a shrub?). And, for many product developers, computer vision remains out of reach due to the cost and complexity of obtaining the necessary training data, or due to lack of necessary technical skills.

Surprisingly, recent advances in large language models (and their close cousins, vision language models, which comprehend both images and text) hold the key to overcoming these challenges. In this video from a recent expert-led online symposium, you will learn:

  • What are vision language models, and how do they combine language and vision to create a unified representation?
  • What enables vision language models to generalize more effectively compared with classical vision models?
  • How can product developers leverage vision language models to reduce the need for training data for new vision applications?

The symposium features brief presentations from three expert speakers—István Fehérvári, Director of Data and ML at BenchSci; Carter Maslan, CEO of Camio; and Jeff Bier, Founder of the Edge AI and Vision Alliance and President of BDTI—followed by a Q&A session. (Want to learn more about vision language models and other multimodal large language models? Attend the 2025 Embedded Vision Summit! Check out the program here.)
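
To make the third question above concrete, consider zero-shot classification: an image-text model can label an image from text prompts alone, with no task-specific training set. Below is a minimal sketch using the openly available CLIP checkpoint via Hugging Face transformers; the checkpoint name, image file, and label prompts are illustrative assumptions, not details from the symposium.

```python
# Zero-shot image classification with an image-text model (CLIP).
# Illustrative sketch only; checkpoint, image path, and labels are assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["people dancing", "people fighting"]  # the "ambiguity" example from above
image = Image.open("scene.jpg")                 # hypothetical input image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.2f}")
```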

Qwen2-VL, An Expert Vision-language Model for Video Understanding

Qwen2-VL, an advanced open-source vision language model built on the Qwen2 open-source LLM foundation, sets new benchmarks in image comprehension across varied resolutions and aspect ratios, while also effectively tackling extended video content. Qwen2-VL excels on many fronts, and this in-depth technical article from Tenyks explores the model’s innovative features and its potential applications for video queries and understanding. (Interested in learning more about both open- and closed-source vision language model options? Check out the 2025 Embedded Vision Summit program here, and then register today!)
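
As a rough illustration of how such a model can be queried about a video clip, here is a sketch following the usage pattern published on the Qwen2-VL Hugging Face model card; the video path and prompt are assumptions, and exact APIs may vary with your transformers and qwen-vl-utils versions.

```python
# Asking Qwen2-VL a question about a video clip.
# Sketch based on the model's published Hugging Face usage; paths and prompts are assumptions.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "file:///path/to/clip.mp4"},  # hypothetical clip
        {"type": "text", "text": "Summarize what happens in this video."},
    ],
}]

# Build the chat prompt and extract the visual inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)
```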

VISION IN ROBOTICS

Advancing Robot Mobility and Whole-body Control with Novel Workflows and AI Foundation Models

Developing robust robots presents significant challenges, such as:

  • Data scarcity: Generating diverse, real-world training data for AI models.
  • Adaptability: Ensuring solutions generalize across varied robot types and environments, and adapt to dynamic, unpredictable settings.
  • Integration: Effectively combining mobility, manipulation, control, and reasoning.

NVIDIA addresses these challenges through an approach that combines cutting-edge research with engineering workflows, tested on its various AI and robotics platforms. The resulting models, policies and datasets serve as customizable references for the research and developer community to adapt to specific robotics needs.

In this premiere edition of the NVIDIA Robotics Research and Development Digest, you’ll learn about the following robot mobility and whole-body control workflows and models, and how they address key robot navigation, mobility, and control challenges:

  • MobilityGen: A simulation-based workflow that uses Isaac Sim to rapidly generate large synthetic motion datasets for building models for robots across different embodiments and environments, as well as for testing how robots navigate new environments, reducing cost and time compared to real-world data collection.
  • COMPASS (Cross-embOdiment Mobility Policy via ResiduAl RL and Skill Synthesis): A workflow for developing cross-embodiment mobility policies, facilitating fine-tuning using Isaac Lab, and zero-shot sim-to-real deployment.
  • HOVER (Humanoid Versatile Controller): A workflow and a unified whole-body control generalist policy for diverse control modes in humanoid robots in Isaac Lab.
  • ReMEmbR (a Retrieval-augmented Memory for Embodied Robots): A workflow that enables robots to reason and take mobility actions using LLMs, VLMs and RAG (retrieval-augmented generation); a generic sketch of the RAG pattern appears after this list.
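
ReMEmbR’s retrieval-augmented memory is NVIDIA-specific, but the underlying RAG pattern is easy to sketch. The toy example below is library-agnostic, uses a placeholder embedding function and made-up memory entries, and is not NVIDIA’s implementation; it only shows the core loop: embed past observations, retrieve the most relevant ones for a question, and hand them to an LLM as context.

```python
# Toy retrieval-augmented memory for a robot: embed observations, retrieve by
# similarity, and build an LLM prompt. Entries and embeddings are made up; this
# is a generic RAG sketch, not NVIDIA's ReMEmbR implementation.
import zlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Placeholder embedding: in practice, call a real text-embedding model."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Timestamped observations the robot logged while navigating (hypothetical).
memory = [
    "t=012s: saw a forklift parked near loading dock B",
    "t=045s: aisle 3 blocked by pallets",
    "t=090s: charging station free in room 2",
]
memory_vecs = np.stack([embed(m) for m in memory])

question = "Where can I recharge right now?"
scores = memory_vecs @ embed(question)            # cosine similarity (unit vectors)
top = [memory[i] for i in np.argsort(scores)[::-1][:2]]  # two best-matching memories

prompt = "Relevant memories:\n" + "\n".join(top) + f"\n\nQuestion: {question}\nAnswer:"
print(prompt)  # in a real system, send this prompt to an LLM to plan the next action
```

With the placeholder embedding the retrieval is arbitrary; a real text-embedding model is what makes the retrieved memories semantically relevant.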

(For more insights on developing robust vision-enabled robots, attend the 2025 Embedded Vision Summit! Check out the program here.)

Optimizing Edge AI for Effective Real-time Decision Making

By enabling devices to process data locally, edge AI empowers robots to make autonomous decisions in critical situations. As robotics continues to evolve, the integration of edge computing and artificial intelligence is redefining real-time decision-making across industries. This analysis from Geisel Software explores the technical advantages of edge AI that are driving this transformation, such as:

  • Reduced latency (see the arithmetic sketch after this list)
  • Bandwidth conservation
  • Improved reliability
  • Enhanced data security and privacy
  • Reduced power consumption
  • Real-time performance
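
To put the latency point in rough numbers (the figures below are illustrative assumptions, not from the Geisel Software analysis): at 30 frames per second a robot has about 33 ms per frame, so a cloud round trip of 60-100 ms alone can exceed the budget, while on-device inference in the tens of milliseconds leaves headroom for control.

```python
# Rough frame-budget arithmetic for edge vs. cloud inference (illustrative numbers only).
FPS = 30
frame_budget_ms = 1000 / FPS        # ~33.3 ms available per frame at 30 fps

cloud_round_trip_ms = 80            # assumed network round trip to a cloud endpoint
cloud_inference_ms = 15             # assumed server-side inference time
edge_inference_ms = 18              # assumed on-device inference time

cloud_total_ms = cloud_round_trip_ms + cloud_inference_ms
print(f"Frame budget: {frame_budget_ms:.1f} ms")
print(f"Cloud path:   {cloud_total_ms} ms -> "
      f"{'misses' if cloud_total_ms > frame_budget_ms else 'meets'} budget")
print(f"Edge path:    {edge_inference_ms} ms -> "
      f"{'misses' if edge_inference_ms > frame_budget_ms else 'meets'} budget")
```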

(To learn more about the benefits of edge AI in robotics and other applications, attend the 2025 Embedded Vision Summit! Peruse the full program here, and then register today!)

UPCOMING INDUSTRY EVENTS

Embedded Vision Summit: May 20-22, 2025, Santa Clara, California

More Events

FEATURED NEWS

Vision Components’ MIPI Bricks is a Modular System for Plug and Play Embedded Vision

Attend Jeff Bier’s Presentation and Discussion at Andes Technology’s Upcoming RISC-V CON Silicon Valley Event

NVIDIA Announces the DGX Spark and DGX Station Personal AI Computers

VeriSilicon Unveils VC9000D_LCEVC, a High-efficiency LCEVC Video Decoder Supporting 8K Ultra HD

STMicroelectronics Releases STM32MP23 Microprocessors Equipped for Performance and Economy, Extends Support for OpenSTLinux Releases

More News

EMBEDDED VISION SUMMIT SPONSOR SHOWCASE

Attend the Embedded Vision Summit to meet these and other leading computer vision and edge AI technology suppliers!



Network Optix

Network Optix is revolutionizing the computer vision landscape with an open development platform that’s far more than just IP software. Nx Enterprise Video Operating System (EVOS) is a video-powered, data-driven operational management system for any and every type of organization. An infinite-scale, closed-loop, self-learning business operational intelligence and execution platform. An operating system for every vertical market. Just add video.



Qualcomm

Qualcomm is enabling a world where everyone and everything can be intelligently connected. Our one technology road map allows us to efficiently scale the technologies that launched the mobile revolution—including advanced connectivity, high-performance, low-power compute, on-device intelligence and more—to the next generation of connected smart devices across industries. Innovations from Qualcomm and our family of Snapdragon platforms will help enable cloud-edge convergence, transform industries, accelerate the digital economy and revolutionize how we experience the world, for the greater good.



EMBEDDED VISION SUMMIT PARTNER SHOWCASE



All About Circuits

All About Circuits is the premier digital publication and community forum for electronics engineers and technologists. Our rich offerings include technical articles, industry news, textbooks, videos, peer discussions, and more—all crafted for career engineers and students. We are the go-to site for the latest news on processors, sensors, analog components, and other electronics aimed at applications in consumer devices, automotive, IoT, wireless, and AI.



Inspect

inspect is the leading international magazine for applied machine vision and optical metrology, with editions specifically for the German, European and North American markets. Inspect America is published four times a year in digital format and is aimed at machine vision users and integrators in North America. Every issue centers on the latest technologies, market trends and new products, and reaches 135,000 machine vision users and integrators in North America. The May issue is also sent to the recipients of the newsletter of our media partner, the Edge AI and Vision Alliance. For 25 years, inspect has been providing competent and comprehensive information on all important topics from the world of machine vision and thus offers the ideal platform for your content and advertising activities.



Vision Spectra

Vision Spectra magazine covers the latest innovations that are transforming today’s manufacturing landscape: neural networks, 3D sensing, embedded vision and more. Each quarterly issue includes rich content on a range of topics, with an in-depth look at how vision technologies are transforming industries from food and beverage to automotive and beyond, with integrators, designers, and end users in mind. Visit www.vision-spectra.com to subscribe for free.

Here you’ll find a wealth of practical technical insights and expert advice to help you bring AI and visual intelligence into your products without flying blind.

Contact

Address

Berkeley Design Technology, Inc.
PO Box #4446
Walnut Creek, CA 94596

Phone
+1 (925) 954-1411