This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.
8-bit integer models using the AI Model Efficiency Toolkit
Making neural network models smaller is crucial for the widespread deployment of AI. Qualcomm AI Research has been developing state-of-the-art quantization techniques that enable power-efficient fixed-point inference while preserving model accuracy. Examples include Data Free Quantization (DFQ) and AdaRound, post-training techniques that achieve accurate 8-bit quantization with little or no data.
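To make "8-bit quantization" concrete, here is a minimal sketch of uniform affine quantization in plain Python. It is not AIMET's API and it is neither DFQ nor AdaRound: it uses simple round-to-nearest, the baseline that such techniques aim to improve on. The function name `quantize_dequantize` and the sample weights are assumptions made up for illustration.

```python
def quantize_dequantize(values, num_bits=8):
    """Round FP32 values onto a uniform integer grid, then map them back.

    Returns (dequantized_values, scale, zero_point).
    """
    qmin, qmax = 0, 2 ** num_bits - 1  # 0..255 for 8 bits
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:  # all inputs are zero; any positive scale works
        scale = 1.0
    zero_point = round(-lo / scale)  # integer that represents 0.0

    dequantized = []
    for v in values:
        q = round(v / scale) + zero_point  # nearest grid point
        q = max(qmin, min(qmax, q))        # clamp to the integer range
        dequantized.append((q - zero_point) * scale)
    return dequantized, scale, zero_point


# Hypothetical FP32 weights, chosen only for illustration.
weights = [-1.0, -0.25, 0.0, 0.4, 1.5]
approx, scale, zero_point = quantize_dequantize(weights)
max_error = max(abs(a - w) for a, w in zip(approx, weights))
print(f"scale={scale:.5f} zero_point={zero_point} max_error={max_error:.5f}")
```

In this sketch, each value that is not clamped lands within half a grid step (`scale / 2`) of its original. How that small per-weight error accumulates through a network is what determines whether a quantized model keeps its accuracy.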
To make this research more accessible and contribute to the open-source community, Qualcomm Innovation Center (QuIC) launched the AI Model Efficiency Toolkit (AIMET) on GitHub in May 2020. AIMET’s goal is to enable power-efficient integer inference by providing a simple library plugin that AI developers can use to achieve state-of-the-art model efficiency. The AIMET project is flourishing, with regularly updated quantization techniques based on work from Qualcomm AI Research and active use by the broader AI community, including multiple mobile OEMs, ISVs, and researchers in academia.
Leading quantization research is quickly being open sourced.
QuIC is now taking it a step further by contributing a collection of popular pre-trained models optimized for 8-bit inference to GitHub in the form of the “AIMET Model Zoo.” Together with the models, AIMET Model Zoo also provides recipes for quantizing popular 32-bit floating point (FP32) models to 8-bit integer (INT8) models with little loss in accuracy. The tested and verified recipes include a script that optimizes TensorFlow or PyTorch models across a broad range of categories, from image classification, object detection, semantic segmentation, and pose estimation to super resolution and speech recognition.
AIMET Model Zoo provides 8-bit quantized models for a variety of categories.
This gives researchers and developers direct access to highly accurate quantized models, saving them time in achieving performance benefits such as reduced energy consumption, latency, and memory requirements for on-target inference. For example, imagine you are a developer who wants to do semantic segmentation for image beautification or autonomous driving use cases using the DeepLabv3+ model. AIMET Model Zoo provides an optimized DeepLabv3+ model produced with the DFQ and Quantization Aware Training (QAT) features from AIMET. The corresponding AIMET Model Zoo recipe points to this optimized model and provides the proper calls to the AIMET library to run INT8 simulation and assess performance. The AIMET quantized version has a Mean Intersection over Union (mIoU) score of 72.08%, within 0.24 percentage points of the 72.32% achieved by the original FP32 model. The image below shows how the quantized model in AIMET Model Zoo produces accurate semantic segmentation.
Side-by-side comparison of the FP32 model, the 8-bit quantized AIMET model, and an 8-bit quantized baseline model for DeepLabv3+ semantic segmentation. AIMET quantization produces accurate segmentation, while the baseline quantization method does not.
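For readers unfamiliar with the mIoU metric used in the DeepLabv3+ comparison, the following is a minimal sketch of how it can be computed from per-pixel class labels. It is not the AIMET Model Zoo evaluation script. The function name `mean_iou`, the choice to skip classes absent from both inputs, and the tiny label lists are assumptions made up for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes.

    pred and target are flat, equal-length sequences of integer class labels,
    one per pixel. Classes absent from both are skipped.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth say class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Hypothetical 6-pixel "image" with 3 classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.2%}")
```

Here the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Running the same metric on the outputs of an FP32 model and its quantized counterpart is one way to obtain a comparison like the 72.32% versus 72.08% figures quoted above.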
This is one example. The AIMET Model Zoo has many INT8 quantized neural network models that provide inference accuracy comparable to FP32 models. With this initial contribution of 14 INT8 models to AIMET Model Zoo, we are lowering the hurdles for the ecosystem to use quantized models in AI workloads, moving toward making power-efficient fixed-point inference ubiquitous. You can get the best of both worlds: the high accuracy of a floating-point model and the efficiency of an 8-bit integer model.
Engineer, Principal/Manager, Qualcomm Technologies