SplitQuant: Layer Splitting for Low-bit Neural Network Quantization for Edge AI Devices
This blog post was originally published at Nota AI’s website. It is reprinted here with the permission of Nota AI. This study proposes an AI model preprocessing method that improves quantization accuracy on edge AI devices that, because of their limitations, do not support advanced quantization methods. By splitting layers based on parameter clustering, the […]