OPTIMIZING DEEP LEARNING MODEL EFFICIENCY |
Quantization Techniques for Efficient Deployment of Large Language Models: A Comprehensive Review The deployment of large language models (LLMs) in resource-constrained environments is challenging due to these models’ significant computational and memory demands. To address this challenge, various quantization techniques have been proposed to reduce a model’s resource requirements while maintaining its accuracy. This 2025 Embedded Vision Summit talk from Dwith Chenna, MTS Product Engineer for AI Inference at AMD, provides a comprehensive review of post-training quantization (PTQ) methods, highlighting their trade-offs and applications in LLMs. Chenna explains quantization techniques such as GPTQ, activation-aware weight quantization (AWQ) and SmoothQuant, and evaluates their performance on popular LLMs like the Open Pre-trained Transformer (OPT) language model series and Meta’s Llama-2 LLM. His results demonstrate that these techniques can significantly reduce these models’ size and computational requirements while maintaining their accuracy, making them suitable for deployment in edge environments.
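As background on what PTQ does at its core, here is a minimal sketch of symmetric per-tensor int8 weight quantization in plain NumPy. This is an illustrative toy, not Chenna's code; the methods named above (GPTQ, AWQ, SmoothQuant) layer calibration and error compensation on top of this basic round-to-scale step.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 post-training quantization.

    Returns quantized weights plus the scale needed to
    approximately recover them (w ~= q * scale).
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float32 for comparison."""
    return q.astype(np.float32) * scale

# Quantize random stand-in "weights" and measure reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_int8(w)
err = float(np.abs(w - dequantize(q, s)).max())
print(q.dtype, w.nbytes // q.nbytes, round(err, 4))  # int8, 4x smaller
```

The 4x storage reduction comes from replacing 32-bit floats with 8-bit integers; the reported error is the worst-case per-weight rounding error for this tensor.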
|
Introduction to Shrinking Models with Quantization-aware Training and Post-training Quantization In this 2025 Embedded Vision Summit presentation, Robert Cimpeanu, Machine Learning Software Engineer at NXP Semiconductors, introduces two neural network quantization techniques, quantization-aware training (QAT) and post-training quantization (PTQ), and explains when to use each. He discusses what efficient implementation of each requires: for example, QAT requires preparing models through layer fusion and graph optimization, while PTQ requires a suitable calibration dataset. Cimpeanu highlights the advantages and limitations of each approach and explores model architectures that benefit from QAT and PTQ. He also presents strategies for combining these techniques and introduces tools such as Brevitas that enable quantization, demonstrating how to optimize neural networks for improved performance and efficiency.
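The key mechanism in QAT is the "fake-quantize" node inserted into the training graph so the forward pass experiences quantization error during training. Below is a minimal NumPy sketch of that node's forward behavior (illustrative only, not Brevitas code; in real QAT, gradients bypass the non-differentiable rounding via a straight-through estimator):

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Quantize-then-dequantize ("fake quant") forward pass.

    The output stays in floating point but only takes on the
    discrete values an int8 kernel would see, so the network
    learns weights that tolerate quantization.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax  # assumes x is not all zeros
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

x = np.linspace(-1.0, 1.0, 5)
print(fake_quantize(x))  # values snapped to the int8 grid
```

PTQ applies the same rounding once after training; QAT's advantage is that the loss function sees this rounding at every step, letting the optimizer compensate.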
|
OPEN TOOLSETS FOR SOFTWARE DEVELOPMENT |
The New OpenCV 5.0: Added Features, Performance Improvements and Future Directions In this 2025 Embedded Vision Summit presentation, Satya Mallick, CEO of OpenCV.org, delves into the latest version of OpenCV, the world’s most popular open-source computer vision library. He highlights the major innovations and improvements in OpenCV 5.0, including a revamped DNN inference engine with integrated format parsers, better support for popular DNN architectures and an improved model zoo. He also discusses the enhanced hardware acceleration layer with improved GPU acceleration and cross-platform support, as well as the introduction of FP16 support and universal intrinsics. Mallick also touches on the improved documentation, samples and developer experience, which make it easier for users to leverage the power of OpenCV. Finally, he provides a glimpse into the future road map for OpenCV, including support for embodied AI, 3D representations and the introduction of OpenCV Enterprise, a hardened version of the library with optimized binaries and 24-hour support. |
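As a rough illustration of what FP16 support buys (a plain NumPy sketch of the storage/precision trade-off, not OpenCV API code): casting FP32 weights to FP16 halves storage at the cost of roughly three decimal digits of precision, which is usually tolerable for inference.

```python
import numpy as np

# 1024 float32 values occupy 4096 bytes; the float16 copy occupies 2048.
w32 = np.random.default_rng(1).standard_normal(1024).astype(np.float32)
w16 = w32.astype(np.float16)

print(w32.nbytes, w16.nbytes)  # 4096 2048
# Worst-case absolute round-trip error stays on the order of 1e-3 here.
print(float(np.abs(w32 - w16.astype(np.float32)).max()))
```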
Deploying Accelerated ML and AI: The Role of Khronos Open Standards Accelerating machine learning and AI workloads often requires specialized hardware, but managing compatibility across diverse platforms can be challenging. The Khronos Group provides the industry’s only suite of nonproprietary API and language standards for ML and AI acceleration, enabling developers to build flexible, portable and scalable solutions. In this 2025 Embedded Vision Summit talk, Neil Trevett, President of the Khronos Group and Vice President of Developer Ecosystems at NVIDIA, explores how Khronos APIs such as Vulkan, OpenCL, SYCL and OpenVX empower developers to optimize AI workloads across GPUs, NPUs and FPGAs while ensuring portability across vendors. He also highlights recent advancements like the open-source Slang shading language for neural computation and the evolving OpenVX framework for cross-vendor inferencing. You’ll discover how Khronos is shaping the future of AI and ML acceleration by fostering open ecosystems that reduce deployment complexity, enhance hardware interoperability and lower development costs, including in embedded and safety-critical systems. |
UPCOMING INDUSTRY EVENTS |
Embedded Vision Summit: May 11-13, 2026, Santa Clara, California |
FEATURED NEWS |
Qualcomm’s Snapdragon 8 Elite Gen 5 Mobile System-on-a-chip Establishes New Consumer Experiences and Sets New Industry Benchmarks
NVIDIA and Intel to Develop AI Infrastructure and Personal Computing Products
e-con Systems Expands Camera Support for Renesas’ New RZ/G3E, Enabling Reliable Edge AI Vision Solutions
FRAMOS Extends D400e 3D Camera Support to NVIDIA’s Holoscan Sensor Bridge
Andes Technology’s AutoOpTune Applies Genetic Algorithms to Accelerate RISC-V Software Optimization |
EDGE AI AND VISION PRODUCT OF THE YEAR WINNER SHOWCASE |
Visidon Real-time Video Noise Reduction (Best Edge AI Software or Algorithm) Visidon’s Real-time Video Noise Reduction is the 2025 Edge AI and Vision Product of the Year Award Winner in the Edge AI Software or Algorithm category. Visidon’s AI-powered Video Noise Reduction technology significantly enhances low-light video quality for surveillance and security applications by addressing key shortcomings of traditional ISP-based methods. Conventional noise reduction often compromises essential image details, resulting in blurred footage that can obscure critical information, particularly in extremely low-light conditions. In contrast, Visidon’s advanced convolutional neural network (CNN) technology effectively overcomes these limitations, delivering clarity and detail even in environments as dark as 0.01 lux.

Visidon employs a purpose-built, AI-optimized algorithm specifically designed for surveillance applications. This approach enhances object detection accuracy, improving recognition rates by up to 50% compared to footage processed with traditional ISP noise reduction. With real-time performance as a priority, the technology utilizes high-efficiency neural processing units (NPUs) in smart cameras, allowing for seamless and power-efficient deployment on embedded devices. This capability ensures high-quality noise reduction without compromising energy consumption or device compatibility.

Unlike other solutions that focus on producing visually appealing outputs for consumer devices, Visidon’s technology prioritizes optimization for machine vision. By preserving crucial details essential for analytics and object recognition, it delivers unmatched precision for demanding applications such as smart surveillance. With the growing integration of high-performance NPUs in smart surveillance cameras, Visidon’s CNN-based approach can be implemented effortlessly, resulting in a transformative improvement in video quality during low-light scenarios.
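The detail-versus-noise trade-off described above can be seen in a toy sketch (plain NumPy, purely illustrative, not Visidon's algorithm): a 3x3 box blur, standing in for conventional ISP-style spatial averaging, suppresses noise on a flat patch but smears a sharp edge, which is exactly the loss of detail a learned denoiser aims to avoid.

```python
import numpy as np

def box_blur(img, k=3):
    """Naive k x k spatial averaging, a stand-in for traditional
    ISP noise reduction (illustrative only)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(0)
flat = rng.normal(0.5, 0.2, (32, 32))  # flat grey patch with sensor noise
edge = np.zeros((32, 32))
edge[:, 16:] = 1.0                     # a sharp vertical edge

# Averaging suppresses noise on the flat patch...
print(round(float(flat.std()), 3), round(float(box_blur(flat).std()), 3))
# ...but smears the edge: the hard 0 -> 1 step becomes a ramp.
print([round(float(v), 2) for v in box_blur(edge)[0, 14:18]])  # [0.0, 0.33, 0.67, 1.0]
```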
Please see here for more information on Visidon’s Real-time Video Noise Reduction. The Edge AI and Vision Product of the Year Awards celebrate the innovation of the industry’s leading companies that are developing and enabling the next generation of edge AI and computer vision products. Winning a Product of the Year award recognizes a company’s leadership in edge AI and computer vision as evaluated by independent industry experts. |