TensorFlow Lite for Microcontrollers: Recent Developments
TensorFlow Lite Micro (TFLM) is a generic inference framework designed to run TensorFlow models on digital signal processors (DSPs), microcontrollers and other embedded targets with small memory footprints and very low power usage. TFLM aims to be easily portable to a wide range of embedded targets, from those running an RTOS to bare-metal code. TFLM leverages the model optimization tools of the TensorFlow ecosystem and adds embedded-specific optimizations to reduce memory footprint. It also integrates a number of community-contributed, hardware-specific optimized kernel implementations. In this talk, David Davis, Senior Embedded Software Engineer, and John Withers, Automation and Systems Engineer, both of BDTI, review the collaboration between BDTI and Google over the past year, including the porting of nearly two dozen operators from TensorFlow Lite to TFLM, the creation of a separate Arduino examples repository, improved testing and documentation of both the Arduino and Colab training examples, and the transition of TFLM’s open-source CI framework to GitHub Actions.
Arm Cortex-M Series Processors Spark a New Era of Use Cases, Enabling Low-cost, Low-power Computer Vision and Machine Learning
The Arm Cortex-M processor family of microcontrollers is designed and optimized for cost- and energy-efficient devices, and can be found in a variety of applications such as internet of things devices, industrial equipment and consumer electronics. As awareness has grown of the benefits of incorporating visual intelligence into systems, interest has increased in implementing both classical computer vision and machine learning in cost- and energy-constrained devices such as video doorbells. In this talk, Stephen Su, Senior Product Manager at Arm, explores the computer vision and machine learning capabilities of Arm’s ultra-low-power Cortex-M0+ processor and of the Cortex-M4 and Cortex-M7 processors with DSP instruction set extensions, as well as the latest Cortex-M55, which delivers a major step up in performance and efficiency via the M-Profile Vector Extension, implemented as Arm Helium technology. He also showcases how Arm is empowering the entire CV/ML ecosystem, including silicon partners, ISVs, OSVs, developers and OEMs, with reference software and development tools.
DEVELOPMENT TOOLS AND TECHNIQUES
Is Your AI Data Pre-processing Fast Enough? Speed It Up Using rocAL
AMD’s rocAL (ROCm Augmentation Library) is an open-source library for decoding and augmenting images, video and audio to accelerate the loading and preprocessing of data for machine learning applications. In this talk, Rajy Rawther, PMTS Software Architect at AMD, presents the key components of the data loading path and shows how to combine them efficiently to maximize the performance of your machine learning workload. With rocAL, you can create flexible hybrid pipelines that can be executed on a CPU or GPU for load balancing. After introducing rocAL and its key features, Rawther dives deep into use cases focused on training DNNs for MLPerf benchmarks. You’ll learn how rocAL can significantly speed up your data loading pipeline for both training and inference.
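The core idea behind such pipelines is to overlap decoding and augmentation of the next batch with consumption of the current one. rocAL implements this natively in C++/HIP on CPU or GPU; as a language-agnostic illustration of the underlying prefetching pattern only (a hypothetical sketch using standard-library primitives, not the rocAL API), a background producer thread can fill a bounded queue of prepared batches:

```python
# Hypothetical sketch of pipelined data loading: a background thread
# prepares (decodes/augments) the next batch while the consumer works
# on the current one. None of the names below come from the rocAL API.
import queue
import threading

def decode_and_augment(sample):
    # Stand-in for JPEG decode + augmentation (flip, crop, etc.).
    return sample * 2  # trivial "augmentation" for illustration

def prefetching_loader(samples, batch_size=4, depth=2):
    q = queue.Queue(maxsize=depth)  # bounded queue caps memory use
    SENTINEL = object()             # marks end of the data stream

    def producer():
        batch = []
        for s in samples:
            batch.append(decode_and_augment(s))
            if len(batch) == batch_size:
                q.put(batch)        # blocks if consumer falls behind
                batch = []
        if batch:
            q.put(batch)            # flush the final partial batch
        q.put(SENTINEL)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is SENTINEL:
            return
        yield batch

if __name__ == "__main__":
    for batch in prefetching_loader(range(10), batch_size=4):
        print(batch)  # prints [0, 2, 4, 6], [8, 10, 12, 14], [16, 18]
```

The bounded queue depth is the knob that trades memory for pipeline slack; a real hybrid pipeline would additionally dispatch `decode_and_augment` to CPU or GPU workers for load balancing, as the talk describes.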
Empower Your Edge Device Using NetsPresso – No AI Engineer Required
Despite the outstanding achievements of deep learning models, commercializing deep learning applications remains challenging. AI implementation experts are rare, and AI models for production must meet demanding requirements, processing high-bandwidth data in real time with limited memory and processing resources, while preserving privacy. In this talk, Tae-Ho Kim, Co-founder and CTO of Nota AI, introduces NetsPresso, a hardware-aware MLOps tool for creating production-level AI models. Different NetsPresso modules support each step of developing a deployable AI model, including data preparation, model selection, optimization and deployment. Kim illustrates NetsPresso’s capabilities via use cases in intelligent transportation systems and driver monitoring systems.
EDGE AI AND VISION PRODUCT OF THE YEAR WINNER SHOWCASE
Blaize Pathfinder P1600 Embedded System on Module (Best Edge AI Processor)
Blaize’s Pathfinder P1600 Embedded System on Module (SoM) is the 2022 Edge AI and Vision Product of the Year Award winner in the Edge AI Processors category. Based on the Blaize Graph Streaming Processor (GSP) architecture, the P1600 delivers high processing performance at low power with high system utilization, making it well suited to AI inference workloads in edge applications. Smaller than a credit card, the P1600 operates with 50x lower memory bandwidth, 10x lower latency and 30x better efficiency than legacy GPUs while delivering 16 TOPS at 7 W, opening doors to previously unfeasible AI inference solutions for edge vision use cases, including in-camera and in-machine processing at the sensor edge as well as network edge equipment. The Pathfinder platform is 100% programmable via the Blaize Picasso SDK, a comprehensive software environment that accelerates AI development cycles. The SDK is based on open standards (OpenCL and OpenVX) and supports ML frameworks such as TensorFlow, PyTorch, Caffe2 and ONNX, permitting complete end-to-end applications to be built with greater transparency, flexibility and portability.
Please see here for more information on Blaize’s Pathfinder P1600 Embedded SoM. The Edge AI and Vision Product of the Year Awards celebrate the innovation of the industry’s leading companies that are developing and enabling the next generation of edge AI and computer vision products. Winning a Product of the Year award recognizes a company’s leadership in edge AI and computer vision as evaluated by independent industry experts.