SqueezeBits Demonstration of On-Device LLM Inference: Running a 2.4B-Parameter Model on the iPhone 14 Pro

Taesu Kim, CTO of SqueezeBits, demonstrates the company’s latest edge AI and vision technologies and products at the 2025 Embedded Vision Summit. Specifically, Kim demonstrates a 2.4-billion-parameter large language model (LLM) running entirely on an iPhone 14 Pro without server connectivity.

The device operates in airplane mode, highlighting on-device inference using a hybrid approach that leverages the Apple Neural Engine and GPU for optimal performance. This setup enables fast, efficient language generation directly on resource-constrained devices, opening new possibilities for private, low-latency generative AI applications at the edge. Learn more about LLM optimization and deployment at https://squeezebits.com.
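The hybrid Neural Engine/GPU execution described above maps onto Core ML's compute-unit configuration. The sketch below is a minimal, hypothetical configuration fragment, not SqueezeBits' actual deployment code; the model file name `LLM_2_4B` is a placeholder.

```swift
import CoreML

// Minimal sketch: ask Core ML to schedule work across the Apple Neural
// Engine, GPU, and CPU, matching the hybrid approach described above.
let config = MLModelConfiguration()
config.computeUnits = .all  // ANE + GPU + CPU; use .cpuAndNeuralEngine to exclude the GPU

// Load a compiled model with that configuration.
// "LLM_2_4B" is a hypothetical placeholder name for illustration.
if let modelURL = Bundle.main.url(forResource: "LLM_2_4B", withExtension: "mlmodelc") {
    let model = try MLModel(contentsOf: modelURL, configuration: config)
    // Inference calls on `model` now run fully on-device, with no network access required.
}
```

Because `computeUnits` is a hint rather than a hard assignment, Core ML decides per-layer where each operation runs; achieving the kind of performance shown in the demo typically also depends on model-level optimizations such as quantization.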

