SqueezeBits is a leading startup specializing in AI optimization and model compression, enabling more efficient AI services without compromising performance. Our expertise lies in making AI models smaller and faster while optimizing them for a wide range of hardware platforms. Our flagship product, ‘OwLite’, is an easy-to-use quantization toolkit that empowers developers and engineers to compress AI models and profile their performance across various hardware platforms. Another product, ‘Fits on Chips’, provides innovative solutions for deploying large language models (LLMs) in edge environments, setting us apart as a pioneer in LLM optimization and serving technologies. Beyond offering customized consulting services, SqueezeBits excels in quantization and low-level hardware optimization, helping clients maximize the performance and efficiency of their AI models. We are committed to driving global AI innovation by delivering best-in-class solutions for any deployment environment.

Recent Content by Company

OwLite Meets Qualcomm Neural Network: Unlocking On-device AI Performance
This blog post was originally published at SqueezeBits’ website. It is reprinted here with the permission of SqueezeBits. At SqueezeBits, we have been empowering developers to efficiently deploy complex AI models while minimizing performance trade-offs with the OwLite toolkit. With OwLite v2.5, we’re excited to announce official support for Qualcomm Neural Network (QNN) through seamless integration […]

SqueezeBits Demonstration of On-device LLM Inference, Running a 2.4B Parameter Model on the iPhone 14 Pro
Taesu Kim, CTO of SqueezeBits, demonstrates the company’s latest edge AI and vision technologies and products at the 2025 Embedded Vision Summit. Specifically, Kim demonstrates a 2.4-billion-parameter large language model (LLM) running entirely on an iPhone 14 Pro without server connectivity. The device operates in airplane mode, highlighting on-device inference using a hybrid approach that […]

“Bridging the Gap: Streamlining the Process of Deploying AI onto Processors,” a Presentation from SqueezeBits
Taesu Kim, Chief Technology Officer at SqueezeBits, presents the “Bridging the Gap: Streamlining the Process of Deploying AI onto Processors” tutorial at the May 2025 Embedded Vision Summit. Large language models (LLMs) often demand hand-coded conversion scripts for deployment on each distinct processor-specific software stack—a process that’s time-consuming and prone… […]

