The NVIDIA DGX-2 is the world’s first 2-petaflop single server supercomputer

Last year's DGX-1 packed eight Tesla V100 GPUs, but the new DGX-2 supercomputer packs 16 Tesla V100 32GB NVLink GPU modules. How does it achieve that?

By Vijay Anand - 28 Mar 2018

The NVIDIA DGX-2 supercomputer and half its first layer of innards exposed.

Last year’s DGX-1 with its eight updated Tesla V100 NVLink GPU modules was already nearing a petaflop of computational power.

Today, NVIDIA unveils the DGX-2 supercomputer server that packs a whopping 16 Tesla V100 32GB NVLink GPU modules.

The magic sauce - NVSwitch interconnect fabric

The 2-billion transistor, 18-port full bandwidth interconnector - the NVSwitch.

Now how is that possible when the maximum normal scaling is eight GPUs over NVLink? For this, NVIDIA put together a brand new 2-billion transistor NVSwitch interconnect fabric that can flexibly connect any topology of NVLink-based GPUs and has five times higher bandwidth than the best PCIe switch available.

While multiple DGX-1 supercomputers can be interconnected over 100Gb/s InfiniBand connections to scale up processing capability, traffic between them is subject to the latency of links outside the system, since the NVLink advantage is lost when clustering multiple systems. Plus, such a cluster can't address in-memory workloads that exceed the capacity of the eight GPUs in a single DGX-1.

As such, the immediate goal of the NVSwitch was to relieve the limitations of the DGX-1, and be able to scale it up with even more processing power and in-system unified memory addressability with low latency. To do this, the NVSwitch has 18 full-bandwidth ports, which makes it far more capable of addressing more GPU connections than the six ports on a Tesla V100 GPU.

The full bidirectional bandwidth of each NVSwitch is a staggering 900GB/s. That's derived from 18 links x 25GB/s per NVLink per direction x 2 directions.
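To make that arithmetic explicit, here is a minimal Python sketch. The port count and per-link rate are the article's figures; the function and constant names are made up for illustration.

```python
# Figures quoted in the article.
NVLINK_PORTS_PER_SWITCH = 18   # full-bandwidth ports on one NVSwitch
GB_PER_S_PER_DIRECTION = 25    # bandwidth of one NVLink in one direction
DIRECTIONS = 2                 # "bidirectional" counts send and receive


def switch_bandwidth_gb_per_s(ports: int, per_direction: int,
                              directions: int = DIRECTIONS) -> int:
    """Aggregate bandwidth of one switch across all ports and directions."""
    return ports * per_direction * directions


total = switch_bandwidth_gb_per_s(NVLINK_PORTS_PER_SWITCH, GB_PER_S_PER_DIRECTION)
print(f"{total} GB/s")  # 18 * 25 * 2 = 900
```

Counting only one direction gives 450GB/s, which is why the 900GB/s figure has to be read as bidirectional.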

Here's the 2-billion transistor NVSwitch in the flesh (right), compared to the Tesla V100 GPU (left).

Inside the 2-petaflop DGX-2 supercomputer

Here's the full view of the DGX-2's top and bottom internal structure laid out. Of course, the most eye-catching part is the 16 Tesla V100 32GB NVLink GPU modules that give rise to its 2 PFLOPS of compute performance.

The DGX-2 is the first system to debut NVSwitch, and it uses no fewer than 12 NVSwitch units to connect the 16 Tesla V100 NVLink GPUs, plus the NVLink plane card that interconnects the three planes within the DGX-2 containing the GPU array and the CPUs. Here's the high-level topology of how the 12 NVSwitch units interconnect the 16 GPUs:
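As a rough cross-check on those counts, the Python sketch below does the port accounting under an assumed layout. The layout is an assumption that this article does not spell out: two identical baseboards, each with eight GPUs and six NVSwitch units, every GPU running one NVLink to each switch on its own board, and every switch running eight links to its counterpart on the other board. Only the totals (16 GPUs, 12 switches, 18 ports per switch, six ports per GPU, 25GB/s per link per direction) come from the article.

```python
GPUS, SWITCHES = 16, 12           # from the article
SWITCH_PORTS, GPU_PORTS = 18, 6   # from the article
GB_PER_S_PER_DIRECTION = 25       # from the article

BOARDS = 2                        # assumption: two identical baseboards
CROSS_LINKS_PER_SWITCH = 8        # assumption: links to the counterpart switch

gpus_per_board = GPUS // BOARDS
switches_per_board = SWITCHES // BOARDS

# Assumption: each GPU has exactly one link to every switch on its board.
links_per_gpu = switches_per_board
# Each switch therefore spends one port per local GPU, plus its cross links.
ports_used_per_switch = gpus_per_board + CROSS_LINKS_PER_SWITCH

# The assumed layout must fit within the port budgets the article gives.
assert links_per_gpu <= GPU_PORTS
assert ports_used_per_switch <= SWITCH_PORTS

# Bidirectional bandwidth per GPU and across the two boards, in GB/s.
per_gpu_gb_s = links_per_gpu * GB_PER_S_PER_DIRECTION * 2
cross_board_gb_s = (switches_per_board * CROSS_LINKS_PER_SWITCH
                    * GB_PER_S_PER_DIRECTION * 2)

print(f"links per GPU: {links_per_gpu} of {GPU_PORTS}")
print(f"ports used per switch: {ports_used_per_switch} of {SWITCH_PORTS}")
print(f"per-GPU bandwidth: {per_gpu_gb_s} GB/s")
print(f"cross-board bandwidth: {cross_board_gb_s} GB/s")
```

Under these assumptions every GPU uses all six of its NVLink ports and each switch uses 16 of its 18 ports, so the layout is at least consistent with the figures quoted above.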

With all 16 GPUs interconnected and addressable in the same system, NVSwitch has also enabled sharing a larger unified memory space, thus allowing developers to tackle seriously large datasets and extremely complex deep learning models.

In a certain sense, this also implies that NVIDIA has arguably created the world's 'largest GPU', whose cumulative stats are 81,920 CUDA cores, 2,000 TFLOPS of Tensor Core performance and 512GB of high-speed, low-latency HBM2 memory with a combined memory throughput of 14.4TB/sec!
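Those cumulative stats are simply 16 times the per-GPU figures, as the Python sketch below shows. The GPU count and 32GB of memory are from the article; the other per-GPU values (5,120 CUDA cores, 125 Tensor TFLOPS, 900GB/s memory bandwidth) are NVIDIA's published Tesla V100 specifications rather than anything stated here, and the class and function names are purely illustrative.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class GpuSpec:
    cuda_cores: int
    tensor_tflops: int
    hbm2_gb: int
    mem_bandwidth_gb_s: int


# Per-GPU figures: 32GB is from the article; the rest are assumed from
# NVIDIA's published Tesla V100 specifications.
V100_32GB = GpuSpec(cuda_cores=5120, tensor_tflops=125,
                    hbm2_gb=32, mem_bandwidth_gb_s=900)


def aggregate(spec: GpuSpec, count: int) -> GpuSpec:
    """Scale every per-GPU figure by the number of GPUs in the system."""
    return GpuSpec(*(value * count for value in astuple(spec)))


dgx2 = aggregate(V100_32GB, 16)
print(dgx2)
print(f"{dgx2.mem_bandwidth_gb_s / 1000} TB/s combined memory throughput")
```

The totals come out to 81,920 CUDA cores, 2,000 TFLOPS, 512GB and 14,400GB/s, matching the figures quoted above.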

Compute power aside, here's what else the dense DGX-2 supercomputing server consists of in this exploded diagram:-

What does the existence of the DGX-2 mean to anyone and what will it tackle in the real world?

The new DGX-2 system draws on a wide range of industry-leading advances developed by NVIDIA at all levels of the computing stack, such as the new versions of NVIDIA CUDA and TensorRT, collaboration with Google on TensorFlow, upgraded Tesla V100 GPUs with double the high-speed, low-latency HBM2 memory at 32GB, and the new NVIDIA NVSwitch to build systems with more GPUs hyperconnected to each other.

These updates have allowed the DGX-2 system to reach the two-petaflop compute performance milestone: the equivalent of 300 servers occupying 15 racks of data center space, while being 60x smaller and 24x more power efficient. This representation by NVIDIA makes the case for the DGX-2 plainly:

300 dual-processor servers can cost up to US$3 million and consume 180kW of power, not to mention a substantial amount of rack space.
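For a sense of scale, this short Python sketch breaks the quoted cluster figures down per server and compares the cluster's cost with the DGX-2's US$399,000 price mentioned later in this article. All inputs are the article's numbers; since the cluster cost is given as "up to" US$3 million, the ratio is an upper bound, and the variable names are illustrative.

```python
SERVERS = 300
CLUSTER_COST_USD = 3_000_000   # "up to" figure, so treat as an upper bound
CLUSTER_POWER_KW = 180
DGX2_PRICE_USD = 399_000       # price quoted later in the article

cost_per_server_usd = CLUSTER_COST_USD / SERVERS
power_per_server_w = CLUSTER_POWER_KW * 1000 / SERVERS
cost_ratio = CLUSTER_COST_USD / DGX2_PRICE_USD

print(f"cost per server: US${cost_per_server_usd:,.0f}")
print(f"power per server: {power_per_server_w:.0f} W")
print(f"cluster cost vs DGX-2 price: {cost_ratio:.1f}x")
```

That works out to US$10,000 and 600W per server, with the cluster costing roughly 7.5 times as much as one DGX-2.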

We'll let the image and the stats do the talking for the DGX-2.

Indeed, the DGX-2 is a powerful and power-efficient supercomputer in its own right, for it can effortlessly eclipse traditional supercomputers from a few years ago that occupy entire rooms. It is purpose-built for data scientists pushing the frontiers of deep learning research and computing, so who knows what breakthroughs the DGX-2 supercomputer will make in the near future?

Already, the combined advances made for the DGX-2 make it more than twice as powerful as the DGX-1. In fact, in certain memory-intensive workloads like training FAIRSeq (a neural network model for language translation), the new DGX-2 sees a phenomenal 10x performance increase. What took the DGX-1 supercomputer 15 days to train, the DGX-2 does in 1.5 days!

The gains made between the DGX-1V, available late last year, and today's DGX-2: a phenomenal leap in just six months!

Similarly, NVIDIA looked back to compare how far it has progressed since it first got involved in the world of deep learning in the Fermi architecture days: a 500x improvement in training AlexNet!
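The 10x figure follows directly from the two training times, and a quick Python sketch also shows how much of it goes beyond simply doubling the GPU count. The training times and GPU counts are from the article; attributing the remaining factor to the other advances (larger memory, NVSwitch, software) is an interpretation, not a measured breakdown.

```python
DGX1_DAYS, DGX2_DAYS = 15.0, 1.5   # FAIRSeq training times from the article
DGX1_GPUS, DGX2_GPUS = 8, 16       # GPU counts from the article

speedup = DGX1_DAYS / DGX2_DAYS            # overall training speedup
gpu_ratio = DGX2_GPUS / DGX1_GPUS          # gain expected from GPU count alone
beyond_gpu_count = speedup / gpu_ratio     # factor left for everything else

print(f"overall speedup: {speedup:.0f}x")
print(f"from GPU count alone: {gpu_ratio:.0f}x")
print(f"remaining factor: {beyond_gpu_count:.0f}x")
```

In other words, doubling the GPUs accounts for 2x at best, leaving a further 5x for the rest of the system and software changes on this workload.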

Regarding positioning, the US$399,000 DGX-2 joins the growing lineup of NVIDIA supercomputer offerings for data scientists along with the DGX-1 and the DGX Station.

NVIDIA's CEO felt that a system of its class should likely command a US$1.5 million price tag, but he's feeling generous enough to price it at US$399,000. It's still a big sum, and it's certainly more expensive than two DGX-1 machines, but it's a justified premium for its current positioning and the newer Tesla V100 GPUs with double the graphics memory.

NVIDIA's family of supercomputers. From left to right - the DGX-2, DGX Station and the DGX-1.
