Pete Warden, Google Research Engineer and technical lead on the company's mobile/embedded TensorFlow team, is a long-time advocate of the Embedded Vision Alliance. Warden has delivered presentations at both the 2016 ("TensorFlow: Enabling Mobile and Embedded Machine Intelligence") and 2017 ("Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power DSP") Embedded Vision Summits, along with other events, and has advised the Alliance on its series of full-day, hands-on technical training courses, "Deep Learning for Computer Vision with TensorFlow".
Warden was formerly the CTO of Jetpac, acquired by Google in 2014 for its deep learning technology optimized to run on mobile and embedded devices. He's also previously worked at Apple on GPU optimizations for image processing, and has written several books on data processing for O'Reilly. Warden has another O'Reilly-published book, "Building Mobile Apps with TensorFlow," coming out soon; the Alliance recently spoke with him about it.
You're a perpetually busy guy. What's the motivation to add a book to your already-formidable to-do list?
Part of the reason I'm so busy is that converting TensorFlow models to run on mobile and embedded platforms is a complicated process that many people need help with, so I'm hoping that improving the documentation will actually give me some time back in the long run!
What's the reason for particular focus on mobile application implementations of TensorFlow?
I'm incredibly excited by the possibilities of deep learning on mobile devices. I think we've barely scratched the surface of what's possible so far, and I think once more mobile developers understand the basics of how to deploy models on their devices, we'll see some amazing applications emerge.
When's the book scheduled to be available, how can folks get it, and how much will it cost?
The date is TBD (hopefully within the next couple of weeks), but it will be available free on the O'Reilly site. [Editor note: I've reviewed a draft of the book and can concur that it's well along on the path to publication]
Who's your intended audience for the book, and why should they read it?
The guide is aimed at developers who have a TensorFlow model successfully working in a desktop environment and who want to integrate it into a mobile application.
Please explain what TensorFlow is. In particular, what's a "Tensor"?
TensorFlow is a framework that lets you define complicated math operations on large arrays of values. These large arrays are known as tensors, and the most common use of the framework is to train and run machine learning models.
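[Editor note: the following sketch is ours, not Warden's, and it deliberately avoids the TensorFlow API. It models tensors as plain nested Python lists to show what "large arrays of values" and "math operations" on them mean. The helper names `shape` and `matmul` are hypothetical and exist only for this illustration.]

```python
def shape(tensor):
    """Return the dimensions of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        if not tensor:          # stop at an empty dimension
            break
        tensor = tensor[0]
    return tuple(dims)


def matmul(a, b):
    """Multiply two rank-2 tensors (matrices) stored as nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


scalar = 3.0                                   # rank 0: a single value
vector = [1.0, 2.0, 3.0]                       # rank 1: shape (3,)
matrix = [[1.0, 2.0], [3.0, 4.0]]              # rank 2: shape (2, 2)
# rank 3: a tiny 2x4 "image" with 3 color channels per pixel
image = [[[0, 0, 0] for _ in range(4)] for _ in range(2)]

identity = [[1.0, 0.0], [0.0, 1.0]]
product = matmul(matrix, identity)             # a math operation on tensors
```

A framework such as TensorFlow performs the same kind of operation, but on far larger arrays and with optimized implementations for the target hardware.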
What is TensorFlow useful for on mobile platforms?
Traditionally, deep learning has been associated with data centers and giant clusters of high-powered GPU machines. The key driver behind mobile adoption of the technique is that it can be very expensive and time-consuming to send all of the data a device has access to across a network connection.
Running deep learning on the device also makes it possible to deliver very interactive applications, in a way that’s not available when you have to wait for a network round-trip. Here are some common use cases I've seen:
It can be very useful for a mobile app to be able to make sense of a camera image. If your users are taking photos, recognizing what’s in those photos can help you apply appropriate filters, or label them so they’re easily findable. Image recognition is important for embedded applications, too, since you can use image sensors to detect all sorts of interesting conditions, whether it’s spotting endangered animals in the wild, or reporting how late your train is running. TensorFlow comes with several examples of how to recognize types of objects inside images, along with a variety of different pre-trained models, and they can all be run on mobile devices. I recommend starting with the “TensorFlow for Poets” codelab.
There are a lot of interesting applications that can be built with a speech-driven interface, and many require on-device processing. Most of the time a user isn’t giving commands, so streaming audio continuously to a remote server is a waste of bandwidth—you’d mostly record silence or background noises. To solve this problem, it’s common to have a small neural network running on-device, listening for a particular keyword. When that keyword is spotted, the rest of the conversation can be transmitted over to the server for further processing if more computing power is needed.
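[Editor note: the keyword-spotting pattern described above can be sketched as follows. This is our illustration under simplifying assumptions, not code from the book: audio "frames" are represented as strings, `keyword_score` is a hypothetical stand-in for the small on-device neural network, and `WAKE_WORD`, `threshold`, and `window` are made-up parameters.]

```python
WAKE_WORD = "hey device"   # hypothetical keyword for this sketch


def keyword_score(frame):
    """Stand-in for a small on-device model: returns a confidence in [0, 1]."""
    return 1.0 if frame == WAKE_WORD else 0.0


def frames_to_upload(frames, threshold=0.5, window=3):
    """Return only the frames that follow a detected keyword.

    Everything heard before the keyword stays on the device; after a
    detection, the next `window` frames are collected for the server.
    """
    uploads = []
    remaining = 0
    for frame in frames:
        if remaining > 0:
            uploads.append(frame)      # part of the user's request
            remaining -= 1
        elif keyword_score(frame) >= threshold:
            remaining = window         # keyword spotted: start forwarding
    return uploads


stream = ["silence", "noise", "hey device", "turn", "on", "lights",
          "silence", "silence"]
sent = frames_to_upload(stream)
```

In this toy stream, only three of the eight frames would leave the device, which is the bandwidth saving the answer above describes.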
Translating from one language to another quickly and accurately, even if you don’t have a network connection, is an important use case. Deep networks are very effective at this sort of task, and you can find descriptions of a lot of different models in the literature. Often these are sequence-to-sequence recurrent models in which you’re able to run a single graph to do the whole translation, without needing to run separate parsing stages. Google Translate’s live camera view is a great example of how effective interactive on-device detection of text can be. You can also see this tutorial on building your own translation model in TensorFlow.
What are some of the biggest benefits, and biggest challenges, to building a mobile app with TensorFlow?
The biggest benefits come from the new kinds of applications you can build, once your mobile and embedded devices understand the world around them. You can think of it as giving your sensors superpowers, so that cameras can understand what's in pictures, microphones make sense of speech, and accelerometers can understand whether a user is walking, sitting, or driving.
The hardest challenges come from how young the whole field is. There's never enough documentation, knowledge, or example code demonstrating how to solve practical problems. This is improving (and I hope TensorFlow can be an important part of that), but you will need to be prepared to research and learn to get good results.
How does the "cloud" fit in with a mobile implementation of TensorFlow?
A common pattern I see is that some work will be done on-device, but will trigger a more in-depth cloud process once some conditions are met. For example, Android phones listen out for the "OK Google" phrase using an on-device model, but then send the subsequent speech to the cloud to be analyzed. In this way, we actually expect that running models on device will mean more usage of cloud services, since it will make sense to send a curated set of interesting data over the network.
Is TensorFlow on mobile applicable only to inference, or can it also be used for training?
Training is the process of building a machine learning model by supplying large amounts of example data and tweaking the parameters (known as weights) contained in the model so that it produces good answers. Inference means taking a model that's already been trained elsewhere, and just running the prediction step on new data.
Most current applications for TensorFlow on mobile focus on inference, since that's less computationally intensive, but it's definitely possible to perform training using the framework. For some ideas on what it could be used for, I'd recommend checking out the research on Federated Learning.
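[Editor note: to make the distinction between training and inference concrete, here is a minimal sketch of our own, using a one-weight model and plain Python rather than TensorFlow. The function names, learning rate, and example data are assumptions chosen for illustration.]

```python
def predict(weight, x):
    """Inference: run the prediction step with a fixed, already-trained weight."""
    return weight * x


def train(examples, learning_rate=0.01, epochs=200):
    """Training: repeatedly tweak the weight so predictions match the examples."""
    weight = 0.0
    for _ in range(epochs):
        for x, y in examples:
            error = predict(weight, x) - y
            # Gradient of 0.5 * error**2 with respect to the weight is error * x.
            weight -= learning_rate * error * x
    return weight


# Example data generated by the rule y = 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

trained_weight = train(examples)               # many passes over the data
prediction = predict(trained_weight, 10.0)     # a single multiply
```

Even in this toy case, training loops over the data hundreds of times while inference is one multiplication, which hints at why inference is the less computationally intensive step.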
What operating system platforms are applicable for developing (i.e. building), and for deploying, a mobile TensorFlow application?
We support Linux, Windows, and macOS for compiling mobile applications, and our key platforms for deploying are Android, iOS, and Raspberry Pi. We are keen to encourage other platforms too, though. For example, we collaborate with Qualcomm on HVX DSP support, and we aim to make the framework portable to as many devices as possible.
What are the minimum mobile platform hardware requirements, and do these parameters depend on whether you're doing only inference, only training, or both?
One nice property of deep learning models is that they can often be scaled to fit device capacities by trading off accuracy for file size and compute requirements. We've seen useful audio NN models as small as 12KB, for example! On the vision side, MobileNet shows how you can shrink an image recognition model down to just 500KB in size with 28 million FLOPs and still produce useful results.
How can model size be optimized for available storage and memory capacity, balanced against desired accuracy, responsiveness, and other factors?
There is a set of techniques, ranging from designing models that can trade off accuracy for size at training time, like MobileNet, to compressing the weights of already-trained models down to eight bits. A lot of this work is still very new though, so I've tried to identify a few of the more reliable techniques in the book and focus on those.
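[Editor note: as a rough illustration of compressing trained weights to eight bits, the sketch below applies a simple min/max linear quantization in plain Python. This is our own example of the general idea and is not necessarily the scheme that TensorFlow's tools or the book use. The function names and sample weights are hypothetical.]

```python
def quantize(weights):
    """Map float weights onto integer codes 0..255 using their min/max range."""
    low, high = min(weights), max(weights)
    # Fall back to a scale of 1.0 if every weight is identical.
    scale = (high - low) / 255.0 or 1.0
    codes = [round((w - low) / scale) for w in weights]
    return codes, low, scale


def dequantize(codes, low, scale):
    """Recover approximate float weights from the 8-bit codes."""
    return [low + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]          # made-up trained weights
codes, low, scale = quantize(weights)
restored = dequantize(codes, low, scale)

# Storage estimate: 4 bytes per 32-bit float versus 1 byte per 8-bit code.
float_bytes = len(weights) * 4
quantized_bytes = len(weights) * 1
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The weights shrink to a quarter of their 32-bit floating-point size, at the cost of a rounding error of at most half a quantization step per weight. Whether that error is acceptable depends on the model, which is the accuracy trade-off mentioned above.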
When I hear "mobile" I specifically think of smartphones and tablets. To what degree are the concepts discussed in your book applicable to other embedded platform implementations, which may have similar constraints along with a similar desire to operate partially-to-completely "offline" for training and/or inference purposes?
Within Alphabet there are a lot of products that use embedded devices, including Google Home, NestCam, Chromecast, Glass, and even the always-on mode of phones using a DSP. The TensorFlow team has learned a lot from supporting these kinds of products, and so we're aware of how tight the resource constraints can be sometimes. We don't have a perfect solution for this yet, but initiatives like TensorFlow Lite are aimed at shrinking the code footprint and simplifying the build process, which are two of the biggest challenges to deploying deep learning on an embedded platform.
Stay tuned for a news post on the Alliance website when Warden's book is available. And if you're planning to develop computer vision applications using deep learning and want to understand how to use TensorFlow to do it, then don't miss one of the Embedded Vision Alliance's full-day, hands-on training classes, "Deep Learning for Computer Vision with TensorFlow". The next scheduled class takes place in Hamburg, Germany on September 7; additional training dates and worldwide locations are also planned. Learn more and register at https://tensorflow.embedded-vision.com.