
NVIDIA boosts deep learning inferencing capabilities with updated tools and partnerships

By Vijay Anand - on 28 Mar 2018, 1:44am


NVIDIA CEO Jensen Huang: "We created an architecture that NVIDIA was willing to dedicate itself to – Compute Unified Device Architecture (CUDA)." CUDA is now a fundamental enabler of GPU computing for deep learning, which in turn has become the cornerstone of AI. (Image source: NVIDIA)

Last year we reached an important milestone: the Big Bang of modern AI. This is where systems gained the ability to learn and identify information by extracting raw data from images, sensory perception (through sensor arrays measuring speed, temperature, pressure and more), speech recognition, natural language processing, self-captioning videos, and robots and systems learning through computer vision, with deep neural networks training, inferencing, or even questioning other DNNs.

At GTC 2018, NVIDIA’s CEO Jensen Huang was quick to convey how much GPU acceleration for deep learning inferencing has gained traction and how their new tools and partnerships will expand its potential inference market to 30 million hyperscale servers worldwide.

Quick FAQ: What’s the difference between deep learning training and inferencing? In short, inferencing is where capabilities learned during deep learning training (encapsulated as a trained model) are put to work on new data.
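The training/inference split can be illustrated with a toy model (a hypothetical single-weight regression, nothing to do with NVIDIA's actual stack): the training phase fits parameters from labelled data, while the inference phase simply applies the frozen parameters to unseen inputs.

```python
def train(samples, lr=0.01, epochs=500):
    """Training: fit y = w*x by gradient descent on squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            w -= lr * (w * x - y) * x  # gradient step
    return w

def infer(w, x):
    """Inference: apply the trained model to new data."""
    return w * x

# Training data follows y = 2x; inference generalizes to unseen inputs.
model = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(infer(model, 10.0))  # close to 20.0
```

Training is the expensive, iterative loop; inference is a single cheap forward pass, which is why it is the phase deployed at scale in production.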

TensorRT version 4

Today, NVIDIA unveiled version 4 of its TensorRT software stack, which accelerates deep learning inferencing across a broad range of applications, from hyperscale datacenters to embedded and automotive GPU platforms, by rapidly optimizing, validating and deploying trained neural networks.

The updated TensorRT 4, in combination with a Tesla V100 GPU, is now up to 190x faster than a single-socket Intel Skylake Xeon Gold 6140 server when running the ResNet-50 v1 model (a deep convolutional neural network for image recognition).

To recap, TensorRT is a library created for optimizing deep learning models for real-time production deployment that delivers instant responsiveness by maximizing throughput and efficiency of deep learning applications. It takes trained neural nets – defined with 32-bit or 16-bit operations – and optimizes them for reduced precision INT8 operations.
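The core idea behind that reduced-precision step can be sketched with simple symmetric INT8 quantization (an illustrative simplification; TensorRT's actual calibration process is considerably more sophisticated): each FP32 value is mapped onto the signed 8-bit range via a per-tensor scale factor.

```python
def quantize_int8(weights):
    """Map FP32 values onto the signed 8-bit range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate FP32 values from INT8 codes plus the scale."""
    return [q * scale for q in q_weights]

weights = [0.5, -1.2, 0.03, 0.98]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each quantized value fits in one byte; approx stays close to the
# original, which is why INT8 inference loses little accuracy while
# quadrupling arithmetic density versus FP32.
```

Storing and multiplying 8-bit integers instead of 32-bit floats is what lets the GPU's integer pipelines deliver the throughput gains quoted above.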

TensorRT integration with Google’s TensorFlow 1.7

In a move to further streamline development efforts, NVIDIA and Google have combined forces to integrate TensorRT into Google’s latest TensorFlow version 1.7. If you recall, TensorFlow is Google’s open-source set of software libraries for building and running deep neural networks, and this integration makes it easier to execute deep learning inferencing applications on GPUs, which undoubtedly have the upper hand in such workloads.

According to Ian Buck, VP and GM of the Tesla Data Center Business at NVIDIA, TensorFlow 1.7 dramatically improves inferencing performance when coupled with NVIDIA’s Tesla V100. As a benchmark of sorts, the previous TensorFlow version on the previous best-in-class hardware managed to infer 300 images per second, whereas TensorFlow 1.7 on the latest hardware processes up to 26,000 images per second. He also noted that a typical CPU generally manages only 11 images per second. That's a massive speed-up.
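A quick sanity check of those quoted throughput figures puts the speed-ups in perspective:

```python
# Throughput figures quoted in the article, in images per second.
cpu_ips, prev_gpu_ips, new_gpu_ips = 11, 300, 26_000

print(f"vs previous GPU stack: {new_gpu_ips / prev_gpu_ips:.0f}x")  # ~87x
print(f"vs typical CPU:        {new_gpu_ips / cpu_ips:.0f}x")       # ~2364x
```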

"The TensorFlow team is collaborating very closely with NVIDIA to bring the best performance possible on NVIDIA GPUs to the deep learning community. TensorFlow’s integration with NVIDIA TensorRT now delivers up to 8x higher inference throughput (compared to regular GPU execution within a low latency target) on NVIDIA deep learning platforms with Volta Tensor Core technology, enabling the highest performance for GPU inference with TensorFlow." – Rajat Monga, Engineering Director at Google