This blog post was originally published at Intel’s website. It is reprinted here with the permission of Intel.
Back in 2018, Intel launched the Intel® Distribution of OpenVINO™ toolkit. Since then, it has been widely adopted by partners and developers to deploy AI-powered applications across industries, from self-checkout kiosks to medical imaging to industrial robotics. The toolkit’s popularity stems from its ease of integration and its streamlined deployment experience on Intel® architecture platforms, where it improves deep learning inference performance.
In designing the toolkit, two of our major developer experience objectives were ease of use and a streamlined out-of-the-box experience. To start, users can run their first sample within minutes of installation. This ease of deployment spans all aspects of the tool, including model import, model optimization and seamless migration between releases thanks to strict backward compatibility. Our user and developer experience experts continually evaluate all aspects of the product, and we think their efforts are visible.
At the same time, we are working to increase the functionality and features of the Intel Distribution of OpenVINO toolkit to cover more use cases. We introduced quantization for platforms with int8 support, a throughput mode for platforms with many cores, and multi-device inference for full system utilization and maximum performance. All these new features bring additional tools or settings that unleash functionality and allow fine-grained configuration. However, it can be complicated to put all the pieces of the puzzle together and optimize the performance/accuracy tradeoff for an individual application. That is why in 2019, a year after the toolkit was launched, we also launched the Deep Learning Workbench, a graphical user interface (GUI) extension that combines all existing utilities to assist users with the most common tasks, such as model downloading, accuracy checking and the newest addition, low-precision quantization, and to guide them through the development and deployment process.
Let’s walk through some of the key tools and capabilities of the Deep Learning Workbench.
1. Every development step through a streamlined user interface
In its simplest form, development using the toolkit includes the steps shown below:
Figure 1. Typical steps in development using Intel Distribution of OpenVINO Toolkit.
Each step is handled by a separate tool that exposes all of its capabilities through command-line options, giving developers advanced control (additional quantization algorithm settings, model import parameters, etc.) along with the flexibility and scalability to build and deploy as they see fit. However, several tools share similar (or identical) parameters that must be specified for each one. For example, to validate a model on a particular dataset, the dataset path and type must be provided, and the same path and type must then be provided again to the quantization tool. The Deep Learning Workbench streamlines development by presenting the whole pipeline through a seamless user interface, invoking the command-line tools to perform each task, and exposing only the capabilities that are required.
While following the development flow, the tool remembers all of the information entered in previous steps and uses it to simplify subsequent ones. For example, once users specify the model type during the model import step, they will not need to specify it again for the application that measures accuracy on their datasets. Similarly, the dataset location and type specified in the accuracy analysis step do not need to be entered again for the quantization step. When performing model quantization (one of the capabilities accessible via the Deep Learning Workbench), a quantized model is produced and its accuracy can be measured automatically without re-specifying dataset paths. In this way, the tool encapsulates all use case-related information in a single location and uses it to simplify development.
The tool not only attempts to simplify the overall development flow, but it also provides additional features in each development step that are enabled by the user interface.
2. Assistance in model import
More and more tasks can be solved with deep learning models, which leads to new architectures and growing complexity in existing ones. The toolkit keeps pace with regular updates to the Model Optimizer in the Intel Distribution of OpenVINO toolkit and to the individual hardware plugins. These necessary updates add to the complexity of the Model Optimizer tool, which can complicate choosing the right options during model import.
To relieve developers of this added complexity, the Deep Learning Workbench simplifies the workflow by leveraging model graph analysis techniques. These techniques reduce the number of parameters users must provide for the Model Optimizer to convert models into Intermediate Representation (IR) for later inference. During model import, the Workbench first analyzes the model and then attempts to identify its type, inputs, outputs and other attributes to simplify import.
For developers who wish to try models from the Open Model Zoo of the Intel Distribution of OpenVINO toolkit, we provide a convenient interface to choose from those models, and then automatically import them without specifying a single option.
Figure 2. Import of public and pre-trained models from Open Model Zoo.
3. Detailed performance and accuracy insights
The first thing a developer wants to know after importing a model is how fast and accurate it is. The Intel Distribution of OpenVINO toolkit provides tools to benchmark performance and measure accuracy, and the toolkit’s Deep Learning Workbench comes fully equipped with both.
When measuring performance, users can see not only the overall performance, but also understand how certain parts of the topology impact overall execution. The Deep Learning Workbench provides detailed timing per layer, information on fusions that happened, and a complete runtime graph. Based on this information, developers can adapt their topologies by replacing or modifying certain layers during model training to achieve higher performance. For instance, replacing heavy ELU activation with lightweight ReLU will boost performance.
Figure 3. Per-layer execution performance and detailed insights into graph optimizations.
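To make the ELU-versus-ReLU point above concrete, here is a small, self-contained Python sketch (illustrative only, not OpenVINO code) comparing the two activations. ELU requires an exponential for negative inputs, while ReLU is a single comparison that runtimes can often fuse into the preceding convolution:

```python
import math

def elu(x, alpha=1.0):
    # ELU needs an exponential for negative inputs, which is
    # comparatively expensive on most hardware.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def relu(x):
    # ReLU is a single comparison: cheap, and easy for the
    # runtime to fuse with the preceding layer.
    return x if x > 0 else 0.0

activations = [-2.0, -0.5, 0.0, 1.5]
print([round(elu(a), 3) for a in activations])  # ELU keeps small negative values
print([relu(a) for a in activations])           # ReLU clamps them to zero
```

Whether such a swap is acceptable is a training-time decision; the per-layer timings in the Workbench show where it would pay off.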
Once performance is increased, users may want to ensure that their topology is performing correctly. The Deep Learning Workbench can provide model accuracy metrics once key parameters, such as topology type, and required preprocessing are specified and a dataset is provided. The Deep Learning Workbench currently supports classification and object detection topologies with corresponding dataset types, such as ImageNet, Pascal VOC, and the latest addition, COCO. We are continuously adding more types of topologies and datasets that cover a wider spectrum of tasks to streamline development.
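Under the hood, an accuracy check boils down to running the model over an annotated dataset and comparing predictions against the ground truth. As an illustration only (toy data, not the toolkit’s Accuracy Checker API), a top-1 classification metric can be sketched as:

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose highest-scoring class matches the annotation."""
    correct = 0
    for scores, label in zip(predictions, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Toy scores for 3 samples over 4 classes, plus ground-truth labels.
preds = [[0.1, 0.7, 0.1, 0.1],
         [0.3, 0.2, 0.4, 0.1],
         [0.25, 0.25, 0.25, 0.25]]
labels = [1, 2, 3]
print(top1_accuracy(preds, labels))
```

Object detection metrics over Pascal VOC or COCO follow the same pattern but compare bounding boxes rather than class indices.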
4. Easier Low-Precision Quantization
New generations of Intel hardware include multiple options for deep learning acceleration. For example, 2nd Gen Intel® Xeon® Scalable processors offer improved int8 performance via the Intel® Deep Learning Boost feature. Intel delivers platforms with built-in agility to satisfy our customers’ compute needs for the future of AI. To unleash these next-generation capabilities, developers must first prepare their models for execution with low-precision data types, such as int8. One option within the Intel Distribution of OpenVINO toolkit is the Post-training Optimization Tool, which quantizes models to low-precision data types, such as int8, for improved performance with little degradation in accuracy. With the latest release of the toolkit, the Post-training Optimization Tool is now available in the Deep Learning Workbench for a streamlined developer experience paired with high-performance deep learning.
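For reference, the command-line Post-training Optimization Tool is driven by a JSON configuration along these lines. The file names and paths below are placeholders, and the exact schema may differ between toolkit releases:

```json
{
  "model": {
    "model_name": "my_model",
    "model": "my_model.xml",
    "weights": "my_model.bin"
  },
  "engine": {
    "config": "./accuracy_checker_config.yml"
  },
  "compression": {
    "target_device": "CPU",
    "algorithms": [
      {
        "name": "DefaultQuantization",
        "params": {
          "preset": "performance",
          "stat_subset_size": 300
        }
      }
    ]
  }
}
```

The Deep Learning Workbench fills in this kind of configuration for you from the model and dataset you have already registered.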
Using the imported model and the dataset provided during the accuracy measurement step, users can trigger quantization right from the Post-training Optimization Tool. When it completes, performance and accuracy can be measured again on the quantized model to understand the impact of int8 execution. Moreover, users can run a layer-by-layer performance comparison to determine the benefits.
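The arithmetic behind int8 quantization can be sketched in a few lines of plain Python: pick a scale that maps the largest absolute value to 127, round every value to the nearest integer, and accept a small round-trip error. This is a simplified symmetric scheme for illustration, not the Post-training Optimization Tool’s actual algorithm:

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to int8 (illustrative only)."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# The round trip loses a little precision -- the "small accuracy
# degradation" that post-training quantization trades for speed.
errors = [abs(a - b) for a, b in zip(weights, restored)]
print(q, max(errors))
```

The real tool chooses scales per layer (or per channel) from statistics gathered on a calibration subset of the dataset, which is why the Workbench asks for a dataset before quantizing.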
The performance and accuracy results for quantized and original models are assembled in a single place. Additionally, using the same model and dataset, developers can try models on CPU, integrated GPU, and other supported devices, and see all results together. This helps users make an informed decision when selecting hardware and software pairings. Moreover, to further help developers choose the best hardware for their specific workload, the new Intel® DevCloud for the Edge lets them try new Intel architectures and run edge AI workloads through remote access before purchasing hardware.
Figure 4. Performance comparison of optimized model vs. original one.
Quantization operates on models that have already been converted to Intermediate Representation (IR) by the Model Optimizer, and the quantized models it produces stay in the same IR format.
5. To batch or not to batch?
Deep learning performance is often estimated by comparing the computation a topology requires with the computation the system can deliver. However, this compute-centric approach is a simplistic way to evaluate performance. Many factors influence the final performance of a model, including input resolution, computation precision, the topology structure itself, the layer types and the target hardware architecture. Some target and topology combinations work best with an increased batch size, some with multiple parallel inference streams, and some actually suffer from those options. Predicting performance is extremely complicated; hence, we find that the traditional method of trial and error still works best.
That is why the Deep Learning Workbench not only provides detailed performance data, but also lets developers try different execution options for different combinations of models and targets. It automates multiple trials with different parameters and builds a chart of throughput and latency for each scenario. Users can select the range of parameters to sweep and see the results for all of them in a convenient manner.
Figure 5. Performance chart to assist in best execution parameters selection.
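The trade-off such a chart captures can be mimicked with a toy cost model in Python: each request pays a fixed overhead plus a per-sample cost, so larger batches raise throughput while also raising per-request latency. The numbers below are made up for illustration; real results depend on the model and device, which is exactly why the Workbench measures instead of predicting:

```python
def simulate(batch_sizes, fixed_ms=2.0, per_sample_ms=1.5):
    """Toy cost model: a fixed overhead plus a per-sample cost per request.
    Purely illustrative -- real behaviour depends on topology and device."""
    results = []
    for b in batch_sizes:
        latency_ms = fixed_ms + per_sample_ms * b   # time for one batched request
        throughput = b / (latency_ms / 1000.0)      # frames per second
        results.append((b, latency_ms, throughput))
    return results

for batch, latency, fps in simulate([1, 2, 4, 8]):
    print(f"batch={batch}: latency={latency:.1f} ms, throughput={fps:.0f} FPS")
```

In this toy model, larger batches amortize the fixed overhead and lift throughput, at the price of higher latency per request; real devices add further effects (cache behaviour, stream scheduling) that only measurement reveals.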
The Intel Distribution of OpenVINO toolkit enables a streamlined, end-to-end development experience, from taking a trained model to deploying it across Intel architecture, with tools and capabilities added to the toolkit release by release. What’s more, with the toolkit’s Deep Learning Workbench, we have further simplified this workflow with a convenient user interface and a set of customization capabilities. From model import to calibration/quantization, model optimizations and other measurements, the Deep Learning Workbench is designed to make development significantly easier. We are committed to ongoing improvements; we encourage you to give it a try and send us your feedback!
You can also view our YouTube library of Deep Learning Workbench videos here.
Principal Engineer, Internet of Things Group, Intel
Senior Software Engineer, Machine Learning Performance Group, Intel
Engineering Manager, Intel Architecture, Graphics and Software, Intel