
NVIDIA unveils the Tesla P100, 15-billion transistor, hyperscale datacentre GPU

By Vijay Anand - on 6 Apr 2016, 7:19am

This is the Tesla P100 - the world's first high performance computing unit for cloud servers.

Just unveiled at the 2016 GPU Technology Conference is NVIDIA's most advanced compute accelerator yet. Based on the new NVIDIA Pascal GP100 GPU and powered by new technologies, the Tesla P100 delivers the highest absolute performance for high performance computing (HPC), technical computing, deep learning, and many computationally intensive datacentre workloads. It is also targeted at web/cloud service providers and researchers.

Just as we predicted, deep learning and AI are key to NVIDIA fulfilling its vision of autonomous cars in the near future. While real-time monitoring of road conditions, aggregation of sensor data and other calculations are performed by the NVIDIA Drive PX engine, back-end training of deep neural networks can benefit from a newer GPU architecture to really accelerate the learning process and crunch computationally intensive workloads.

What are the highlights of the Tesla P100?

It is the first GPU and compute accelerator based on Pascal – a brand new GPU architecture fabricated on a 16nm FinFET process. Despite the finer process technology, the die still measures 600mm² in order to house the new GPU's 15.3 billion transistors. That's not counting the memory dies on the package, which, combined with the GPU, bring the total up to 150 billion transistors!

There are a total of 56 Streaming Multiprocessors (SMs) enabled on the Tesla P100, although the GP100 GPU core supports up to 60 SMs. That just means higher-tier products will come in due course. In total, the Tesla P100 has 3,584 FP32 (single precision) CUDA cores and half that number (1,792) of FP64 (double precision) CUDA cores. That's quite a bit more processing throughput than its predecessors. Here's a quick rundown of how it compares with other Tesla solutions:

NVIDIA Tesla GPU performance compared.
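As a quick back-of-the-envelope check, those core counts line up with Pascal's per-SM layout. Note that the 64 FP32 and 32 FP64 cores per SM used below are our assumption, drawn from NVIDIA's GP100 architecture material, not figures quoted in this article:

```python
# Tesla P100 ships with 56 of GP100's 60 SMs enabled.
ENABLED_SMS = 56

# Assumed per-SM core counts (from NVIDIA's GP100 architecture
# material, not this article): 64 FP32 cores, 32 FP64 cores.
FP32_CORES_PER_SM = 64
FP64_CORES_PER_SM = 32

fp32_cores = ENABLED_SMS * FP32_CORES_PER_SM  # 3,584, as quoted above
fp64_cores = ENABLED_SMS * FP64_CORES_PER_SM  # 1,792, i.e. half the FP32 count

print(fp32_cores, fp64_cores)
```

The same arithmetic also explains why a fully enabled 60-SM part would be a natural higher-tier product later on.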

The memory topology is completely new: stacked memory sits on the same package as the GPU to enable vastly faster data transfers than traditional GDDR memory - similar to what we've seen from AMD's High Bandwidth Memory (HBM) technology. The difference here is that NVIDIA has adopted 16GB of newer second-generation HBM (HBM2), which offers 720GB/s of bandwidth, far exceeding the first-generation technology. That, together with higher density memory dies, delivers a dramatic increase in performance. Additionally, ECC is available penalty-free.

This will greatly help improve responsiveness for high-end computing tasks like parallel computing, graphics rendering and machine learning - ideal for NVIDIA's new GPUs and their intended use.
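To put the 720GB/s figure in perspective, here's a minimal comparison sketch; the 288GB/s GDDR5 figure for the previous-generation Tesla M40 is our assumption for illustration and isn't quoted in this article:

```python
# Tesla P100's HBM2 bandwidth, per the article.
hbm2_bandwidth_gbs = 720

# Assumed for comparison (not quoted in this article): the GDDR5
# on the previous-generation Tesla M40 peaks at roughly 288 GB/s.
gddr5_bandwidth_gbs = 288

speedup = hbm2_bandwidth_gbs / gddr5_bandwidth_gbs
print(speedup)  # 2.5 -> HBM2 offers 2.5x the memory bandwidth
```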

At the same time, such an arrangement allows for building more compact GPUs with boosted throughput and efficiency. Indeed, the Tesla P100 does look far more compact than other GPU products in the past. Over on its back, the Tesla P100 uses a new chip-to-chip communication link called NVLink. It is a purpose-built high-speed GPU interconnect designed to replace the aging general-purpose PCIe bus between CPU and GPU, as well as between multiple GPUs. According to NVIDIA, it has a theoretical maximum bi-directional bandwidth of 160GB/s per GPU between peers. This is vastly superior to the data transfer ceiling of PCIe Gen 3.0, which is capped at 16GB/s. That greatly benefits applications like deep learning that have high inter-GPU communication needs. NVLink was jointly developed by NVIDIA and IBM.

Left: The Tesla P100 Accelerator (Front) with the GPU and memory dies on the same package. Right: The rear of the Tesla P100 showing off the NVLink connectivity.
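The bandwidth gap NVIDIA is touting works out as follows. The four-links-of-40GB/s breakdown below is our assumption from NVIDIA's Pascal materials; the 160GB/s and 16GB/s figures come from the article:

```python
# Assumed breakdown (from NVIDIA's Pascal materials, not this
# article): each P100 exposes four NVLink links of 40 GB/s each.
NVLINK_LINKS = 4
GBS_PER_LINK = 40

nvlink_total_gbs = NVLINK_LINKS * GBS_PER_LINK  # 160 GB/s per GPU, as quoted

PCIE_GEN3_GBS = 16  # the PCIe Gen 3.0 ceiling cited above

print(nvlink_total_gbs / PCIE_GEN3_GBS)  # 10.0 -> ten times the PCIe ceiling
```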

Another highlight is the addition of a unified memory space, which lets applications tap into the processing capabilities of both the CPU and GPU more effectively. This is because developers no longer have to explicitly allocate and copy memory between the CPU and GPU, as the unified memory space is accessible to both processors.

We'll be covering the Pascal GP100 GPU in more detail in another article, but for now, here's how the latest Tesla P100 compares with its predecessors, along with what industry partners have to say about the new accelerator:

When is it available?

As impressive as the Tesla P100 hyperscale datacentre GPU sounds, it isn't expected to reach retail and partner vendors until Q1 2017!

So what gives? The Tesla P100 is actually in production now and can ship soon. However, NVIDIA is reserving the initial supply for its DGX-1 rack-based supercomputer, which comes configured with eight Tesla P100 16GB GPUs. The DGX-1 is slated for release in June this year, and you can check out why it carries a six-figure price tag in our adjoining news piece.

Source: NVIDIA (1), (2)
