Taesu Kim, CTO of SqueezeBits, demonstrates the company's latest edge AI and vision technologies and products at the 2025 Embedded Vision Summit. In the demo, Kim shows a 2.4-billion-parameter large language model (LLM) running entirely on an iPhone 14 Pro, with no server connectivity.
The device operates in airplane mode, highlighting on-device inference using a hybrid approach that leverages the Apple Neural Engine and GPU for optimal performance. This setup enables fast, efficient language generation directly on resource-constrained devices, opening new possibilities for private, low-latency generative AI applications at the edge. Learn more about LLM optimization and deployment at https://squeezebits.com.
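On Apple platforms, choosing which compute units a model may use (Neural Engine, GPU, CPU) is typically done through Core ML's `MLModelConfiguration`. The sketch below is illustrative only, not SqueezeBits' actual deployment code: the model file name `LLMBlock.mlmodelc` is a hypothetical placeholder, and real on-device LLM pipelines add tokenization, KV caching, and per-layer unit assignment on top of this.

```swift
import CoreML

// Minimal sketch: load a compiled Core ML model and let the runtime
// dispatch work across the Neural Engine, GPU, and CPU.
// "LLMBlock.mlmodelc" is a hypothetical model bundle for illustration.
func loadModel() throws -> MLModel {
    let config = MLModelConfiguration()
    // .all permits ANE + GPU + CPU; Core ML picks the best unit per layer.
    // Use .cpuAndNeuralEngine or .cpuAndGPU to constrain placement.
    config.computeUnits = .all

    let url = Bundle.main.url(forResource: "LLMBlock",
                              withExtension: "mlmodelc")!
    return try MLModel(contentsOf: url, configuration: config)
}
```

In a hybrid setup like the one demonstrated, compute-heavy transformer blocks can be steered to the Neural Engine while operations it does not support fall back to the GPU or CPU, which is what the per-configuration `computeUnits` setting controls.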