Hoon Choi, Senior Director of Design Engineering at Lattice Semiconductor, presents the “Deep Quantization for Energy Efficient Inference at the Edge” tutorial at the May 2018 Embedded Vision Summit.
Intelligence at the edge differs from intelligence in the cloud in its requirements for energy, cost, accuracy and latency. Because edge devices have limited battery power and cooling, their energy consumption is tightly constrained. In addition, low cost and small size requirements make it hard to use packages with large numbers of pins, which limits the bandwidth to the DRAM chips commonly used to store neural network data. Despite these limitations, most applications require real-time operation. To meet these constraints, the industry has developed networks that rely heavily on deep quantization.
In this talk, Choi shows how to use deep quantization in real applications without degrading accuracy. Specifically, he explains how to apply a different quantization to each layer of a deep neural network and how to combine deeply layered neural networks with deep quantization. He also explains how to use this deep quantization approach with recent lightweight networks.
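To make the idea of per-layer quantization concrete, here is a minimal sketch in Python. It is not the method presented in the talk, whose details are not given in this summary. It assumes a generic symmetric uniform quantizer, and the layer names, bit widths and weight values are all hypothetical, chosen only for illustration.

```python
def quantize(values, bits):
    """Symmetric uniform quantization of a list of floats to signed integer codes."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1              # largest signed code, e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / levels if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-levels, min(levels, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def quantize_model(weights, layer_bits):
    """Quantize each layer with its own bit width (both dicts are keyed by layer name)."""
    return {name: quantize(w, layer_bits[name]) for name, w in weights.items()}


# Hypothetical layers and bit widths: each layer gets a different precision.
layer_bits = {"conv1": 8, "conv2": 4, "fc": 2}
weights = {
    "conv1": [0.5, -0.25, 0.125, -1.0],
    "conv2": [0.3, -0.7, 0.05, 0.6],
    "fc": [0.9, -0.1, 0.4, -0.9],
}

quantized = quantize_model(weights, layer_bits)
for name, (codes, scale) in quantized.items():
    print(name, layer_bits[name], "bits:", codes, "scale:", scale)
```

Lower bit widths shrink the storage and memory bandwidth a layer needs, at the cost of a coarser approximation of its weights. In practice, choosing the bit width for each layer and retraining to recover accuracy are separate steps that this sketch does not cover.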