Gian Marco Iodice, Software Engineer at ARM, presents the "Using SGEMM and FFTs to Accelerate Deep Learning" tutorial at the May 2016 Embedded Vision Summit.
Matrix Multiplication and the Fast Fourier Transform are numerical foundation stones for a wide range of scientific algorithms. With the emergence of deep learning, they are becoming even more important, particularly as use cases extend into mobile and embedded devices. In this presentation, lodice discusses and analyzes how these two key, computationally-intensive algorithms can be used to gain significant performance improvements for convolutional neural network (CNN) implementations.
After a brief introduction to the nature of CNN computations, Iodice explores the use of GEMM (General Matrix Multiplication) and mixed-radix FFTs to accelerate 3D convolution. He shows examples of OpenCL implementations of these functions and highlights their advantages, limitations and trade-offs. Central to the techniques explored is an emphasis on cache-efficient memory accesses and the crucial role of reduced-precision data types.