OpenAI’s gpt-oss-20b: Its First Open-source Reasoning Model to Run on Devices with Snapdragon

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

At Qualcomm Technologies, we’ve long believed that AI assistants will be ubiquitous, personal and on-device.

Today, we’re excited to share a major milestone in that journey: OpenAI has open-sourced gpt-oss-20b, its first chain-of-thought reasoning model that runs directly on devices with flagship Snapdragon processors. OpenAI’s sophisticated models have previously been confined to the cloud; this marks the first time the company has made one of its models available for on-device inference.

Through early access to the model and integration testing with the Qualcomm AI Engine and the Qualcomm AI Stack, we have seen that this 20-billion-parameter model is remarkably capable, enabling chain-of-thought reasoning entirely on-device.

We see this moment as a turning point: a glimpse into the future of AI where even rich assistant-style reasoning will be local. It also shows the maturity of the AI ecosystem, where open-source innovation from leaders like OpenAI can be harnessed in real time by partners and developers utilizing Snapdragon processors. OpenAI’s gpt-oss-20b will enable devices to leverage on-device inference, offering privacy and latency benefits while complementing cloud solutions via AI agents.

Developers will be able to access gpt-oss-20b and leverage its capabilities on devices with Snapdragon through popular platforms like Hugging Face and Ollama, with more details on deployment available soon on the Qualcomm AI Hub.
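As a rough illustration of the Hugging Face route, the sketch below loads the model with the Transformers library and runs a single chat turn. It assumes the model is published under the Hugging Face id "openai/gpt-oss-20b" and that enough memory is available on the target device; Snapdragon-specific acceleration paths through the Qualcomm AI Stack are not shown here.

```python
# Minimal sketch: running gpt-oss-20b via Hugging Face Transformers.
# Assumes the published model id "openai/gpt-oss-20b" and sufficient memory;
# device-specific acceleration (e.g. via the Qualcomm AI Stack) is not covered.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # let Transformers pick an appropriate dtype
    device_map="auto",    # place the model on whatever accelerator is available
)

messages = [
    {"role": "user",
     "content": "Summarize why on-device inference helps with privacy and latency."},
]

outputs = pipe(messages, max_new_tokens=256)
# The last entry in the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```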

By integrating Ollama’s lightweight, open-source LLM serving framework with powerful Snapdragon platforms, developers and enterprises can run gpt-oss-20b directly on devices with Snapdragon compute platforms, with web search and several other features available by default out of the box. Users can also try Ollama’s turbo mode to access additional model functionality.
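For the Ollama route, a minimal sketch is shown below: it sends one chat request to a locally running Ollama server over its REST API. It assumes Ollama is installed and listening on its default port (11434), and that the model has already been pulled locally; the tag "gpt-oss:20b" is an assumption here.

```python
# Minimal sketch: chatting with gpt-oss-20b through a local Ollama server.
# Assumes Ollama is running on its default port and the model has been pulled;
# the model tag "gpt-oss:20b" is assumed.
import json
import urllib.request

request_body = json.dumps({
    "model": "gpt-oss:20b",
    "messages": [
        {"role": "user",
         "content": "Explain chain-of-thought reasoning in two sentences."}
    ],
    "stream": False,  # return a single JSON response instead of a token stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=request_body,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

# The assistant's reply text lives under message.content in Ollama's chat response.
print(reply["message"]["content"])
```

Under these assumptions, pulling the model first and then running this script returns the model’s reply entirely from the local device, with no cloud round trip.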

Over the next few years, as mobile memory footprints continue to grow and software stacks get even more efficient, we believe that on-device AI capability will increase rapidly, opening the door to private, low-latency, personalized agentic experiences.

Download gpt-oss-20b from Ollama

Sachin Deshpande
Sr. Director, Business Development, Qualcomm Technologies, Inc.

Vinesh Sukumar
VP, Product Management of AI/GenAI, Qualcomm Technologies, Inc.
