Bringing Edge AI Performance to PyTorch Developers with ExecuTorch 1.0

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

  • ExecuTorch 1.0, an open-source solution for training and inference on the edge, is now available to all developers

  • Qualcomm Technologies contributed to the ExecuTorch repository, enabling developers to access the Qualcomm® Hexagon™ NPU directly

  • This streamlines the developer workflow and unlocks the benefits of local AI inference, from personalization to performance, privacy, and reduced reliance on cloud infrastructure

The ExecuTorch Team has announced the general availability of ExecuTorch 1.0, enabling developers to create seamless, high-performance edge AI experiences.

We are excited to share that Qualcomm has contributed directly to the ExecuTorch repository with the ExecuTorch delegate for the Qualcomm Hexagon NPU. The delegate gives developers access to the NPU, the hardware block in our chips designed specifically for high-performance, power-efficient on-device inference. Developers can now offload AI/ML and generative AI (Gen AI) inference directly to the Hexagon NPU while streamlining their development workflow.

This effort builds on our long-term collaboration with the PyTorch Edge team to bring edge AI to every developer and address the challenges of deploying AI and Gen AI on resource-constrained edge devices.

Streamlined development workflow

With ExecuTorch 1.0, developers can port any model, from large language models (LLMs) and vision-language models (VLMs) to image detection and other AI/ML models, and their apps across various computing platforms, while using the same toolchains for model authoring, conversion, debugging, and deployment.

The lightweight runtime leverages the device's full hardware capabilities, including CPUs, GPUs, and NPUs, enabling experiences that can tap into the device's context for more personalization, faster and more power-efficient inference, and reduced cloud inference costs.

On-device AI performance on billions of devices

With the inclusion of the ExecuTorch Delegate for Hexagon NPU, developers can deploy or port AI-powered apps to billions of devices powered by Qualcomm hardware – including mobile phones, PCs, AI smart glasses, cars, and IoT devices.

Tapping into the power of the Hexagon NPU unlocks not just performance and power-efficiency gains but a range of further benefits that respond to growing demand from consumers and industries:

  • Keep data on the device for privacy and personalization, with access to contextual data
  • Reduce reliance on cloud computing, opening the door to offline use cases and more accessible products worldwide
  • Improve real-time responsiveness with better latency and throughput. For example, running large language models on-device on the Hexagon NPU instead of the CPU delivers 30-75% faster load times and 2-4x faster token rates; for traditional models, throughput is up to 92% faster and the memory footprint is up to 47% smaller.*

Model coverage

Model coverage spans traditional AI use cases such as object detection, image recognition, depth estimation, OCR, ASR, and segmentation, as well as on-device text and multimodal LLMs, for example:

  • Llama-3.2-3B-instruct
  • Roberta: FacebookAI’s xlm-roberta-base
  • Gemma3: Gemma-3-1b
  • Qwen3: Qwen3-1.7B
  • Phi4: Phi-4-mini-instruct
  • Whisper: OpenAI’s Whisper
  • SmolLM3-3B

Get started now!

Learn more here:

Qualcomm AI Stack

The potential for on-device intelligence is growing rapidly thanks to optimizations across the AI stack, from model development to deployment. The ExecuTorch delegate for the Hexagon NPU adds to our existing portfolio, which already supports TFLite and ONNX.

Learn more at: Qualcomm AI Stack | Unified AI Software Portfolio | Qualcomm

Felix Baum
Sr. Director, Product Management, Qualcomm Technologies, Inc.

Charlotte Mallo
Product Marketing Manager, Qualcomm Technologies, Inc.

*Compared with the same models running on the CPU on Snapdragon 8 Elite Gen 4.
