SqueezeBits Demonstration of On-Device LLM Inference: Running a 2.4B-Parameter Model on the iPhone 14 Pro

Taesu Kim, CTO of SqueezeBits, demonstrates the company’s latest edge AI and vision technologies and products at the 2025 Embedded Vision Summit. Specifically, Kim demonstrates a 2.4-billion-parameter large language model (LLM) running entirely on an iPhone 14 Pro without server connectivity.

The device operates in airplane mode, highlighting on-device inference using a hybrid approach that leverages the Apple Neural Engine and GPU for optimal performance. This setup enables fast, efficient language generation directly on resource-constrained devices, opening new possibilities for private, low-latency generative AI applications at the edge. Learn more about LLM optimization and deployment at https://squeezebits.com.
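The hybrid Neural Engine/GPU execution described above maps onto Core ML's compute-unit configuration. The sketch below is a minimal, hypothetical configuration fragment, not SqueezeBits' actual deployment code; the model file name `LLM_2_4B` is a placeholder.

```swift
import CoreML

// Minimal sketch: ask Core ML to schedule work across the Apple Neural
// Engine, GPU, and CPU, matching the hybrid approach described above.
let config = MLModelConfiguration()
config.computeUnits = .all  // ANE + GPU + CPU; use .cpuAndNeuralEngine to exclude the GPU

// Load a compiled model with that configuration.
// "LLM_2_4B" is a hypothetical placeholder name for illustration.
if let modelURL = Bundle.main.url(forResource: "LLM_2_4B", withExtension: "mlmodelc") {
    let model = try MLModel(contentsOf: modelURL, configuration: config)
    // Inference calls on `model` now run fully on-device, with no network access required.
}
```

Because `computeUnits` is a hint rather than a hard assignment, Core ML decides per-layer where each operation runs; achieving the kind of performance shown in the demo typically also depends on model-level optimizations such as quantization.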

