NVIDIA Announces TensorRT 5 and TensorRT Inference Server


September 12, 2018 – At GTC Japan, NVIDIA announced the latest version of the TensorRT high-performance deep learning inference optimizer and runtime.

TensorRT 5 adds support for the new Turing architecture, along with new optimizations and INT8 APIs that achieve up to 40x faster inference over CPU-only platforms.

This latest version dramatically speeds up inference for recommender, neural machine translation, speech, and natural language processing applications.

TensorRT 5 Highlights:

  • Speeds up inference by 40x over CPUs for models such as translation using mixed precision on Turing Tensor Cores
  • Optimizes inference models with new INT8 APIs (see the sketch after this list)
  • Supports Xavier-based NVIDIA Drive platforms and the NVIDIA DLA accelerator for FP16
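
To illustrate the reduced-precision path, below is a minimal sketch using the TensorRT Python API: it parses an ONNX model, enables the FP16 and INT8 builder modes, and attaches an INT8 calibrator. The ONNX path, batch size, workspace size, and calibrator are illustrative assumptions, not details from the announcement.

    # Minimal sketch of enabling mixed precision and INT8 with the TensorRT
    # Python API. The model path and calibrator are placeholders; a real
    # calibrator implements trt.IInt8EntropyCalibrator2 over a small set of
    # representative inputs so the builder can choose per-tensor ranges.
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    def build_engine(onnx_path, calibrator=None):
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network()
        parser = trt.OnnxParser(network, TRT_LOGGER)
        with open(onnx_path, "rb") as f:
            parser.parse(f.read())          # populate the network from ONNX

        builder.max_batch_size = 8
        builder.max_workspace_size = 1 << 30     # scratch memory for tactic selection
        builder.fp16_mode = True                 # Tensor Core FP16 kernels where profitable
        if calibrator is not None:
            builder.int8_mode = True             # enable INT8 kernels
            builder.int8_calibrator = calibrator # supplies activation ranges
        return builder.build_cuda_engine(network)

The returned engine can then be serialized to disk and reloaded by the TensorRT runtime at deployment time.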

TensorRT 5 will be available to members of the NVIDIA Developer Program.

The TensorRT Inference Server is a containerized microservice that maximizes GPU utilization and runs multiple models from different frameworks concurrently on a single node. It leverages Docker and Kubernetes to integrate seamlessly into DevOps architectures.
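
As an illustration of how the server discovers models, below is a small sketch that lays out a model repository in the documented convention: one directory per model containing a config.pbtxt and numbered version subdirectories holding the serialized engines. The repository path, model name, and tensor shapes are placeholder assumptions, not values from the announcement.

    # Sketch of preparing a model repository for the TensorRT Inference Server.
    # The directory layout and config.pbtxt fields follow the server's
    # model-repository convention; the names and shapes below are illustrative.
    from pathlib import Path

    MODEL_REPO = Path("/models")      # mounted into the server container
    MODEL_NAME = "resnet50_trt"       # hypothetical model name

    CONFIG_PBTXT = """\
    name: "resnet50_trt"
    platform: "tensorrt_plan"
    max_batch_size: 8
    input [
      { name: "input", data_type: TYPE_FP32, dims: [ 3, 224, 224 ] }
    ]
    output [
      { name: "prob", data_type: TYPE_FP32, dims: [ 1000 ] }
    ]
    """

    def prepare_repository():
        """Create <repo>/<model>/config.pbtxt plus a version subdirectory
        that will hold the serialized TensorRT engine (model.plan)."""
        model_dir = MODEL_REPO / MODEL_NAME
        version_dir = model_dir / "1"
        version_dir.mkdir(parents=True, exist_ok=True)
        (model_dir / "config.pbtxt").write_text(CONFIG_PBTXT)
        # Copy the serialized engine into version_dir, e.g. version_dir / "model.plan"

    if __name__ == "__main__":
        prepare_repository()

Pointing the server container at this repository lets it load every model it finds and serve them concurrently over its HTTP and gRPC endpoints.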

