The NVIDIA DGX-2 is the world’s first 2-petaflop single server supercomputer
10x performance uplift in just six months? NVIDIA's new DGX-2 supercomputer pulls it off!
Today, NVIDIA unveils the DGX-2 supercomputer server that packs a whopping 16 Tesla V100 32GB NVLink GPU modules.
The magic sauce - NVSwitch interconnect fabric
Now how is that possible when the maximum normal scaling is eight GPUs over NVLink? For this, NVIDIA put together a brand new 2-billion transistor NVSwitch interconnect fabric that can flexibly connect any topology of NVLink-based GPUs and has five times higher bandwidth than the best PCIe switch available.
While multiple DGX-1 supercomputers can be interconnected over 100Gb/s InfiniBand connections to scale up processing capability, such a cluster is subject to the interconnect latencies that exist outside of the system, since the NVLink advantage is lost when clustering multiple systems. Plus, it can't address in-memory workloads that exceed the capacity of the 8-GPU DGX-1.
As such, the immediate goal of the NVSwitch was to relieve the limitations of the DGX-1 and scale it up with even more processing power and low-latency, in-system unified memory addressability. To do this, the NVSwitch has 18 full-bandwidth ports, which lets it handle far more GPU connections than the six ports on a Tesla V100 GPU.
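To see why port count matters, here is a rough sketch that uses only the two figures quoted above (six NVLink ports per V100, 18 ports per NVSwitch). It is a deliberately simplified model: it assumes one link per GPU pair for a direct full mesh, and a hypothetical single-switch case with one link from each GPU to the switch. The function names are illustrative, and real systems such as the DGX-1 use partially connected topologies rather than a full mesh.

```python
NVLINK_PORTS_PER_GPU = 6   # per Tesla V100, as quoted in the article
NVSWITCH_PORTS = 18        # per NVSwitch, as quoted in the article


def links_needed_for_full_mesh(num_gpus: int) -> int:
    """Links each GPU needs to reach every peer directly (one link per peer)."""
    return num_gpus - 1


def fits_direct_full_mesh(num_gpus: int,
                          ports_per_gpu: int = NVLINK_PORTS_PER_GPU) -> bool:
    """True if every GPU has enough ports to link directly to every other GPU."""
    return links_needed_for_full_mesh(num_gpus) <= ports_per_gpu


def fits_behind_one_switch(num_gpus: int,
                           switch_ports: int = NVSWITCH_PORTS) -> bool:
    """Hypothetical single-switch case: one link from each GPU to the switch."""
    return num_gpus <= switch_ports


for n in (4, 7, 8, 16):
    print(n, fits_direct_full_mesh(n), fits_behind_one_switch(n))
```

Under these assumptions, a direct full mesh runs out of GPU ports beyond seven GPUs, while 16 GPUs would each need only a single link to reach all the others through a switch.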
Inside the 2-petaflop DGX-2 supercomputer
The DGX-2 is the first system to debut NVSwitch, and it uses no fewer than 12 NVSwitch units to connect 16 Tesla V100 NVLink GPUs, plus the NVLink plane card that interconnects the triple planes within the DGX-2 containing the GPU array and the CPUs. Here's the high-level topology of how the 12 NVSwitch units interconnect the 16 GPUs:-
With all 16 GPUs interconnected and addressable in the same system, NVSwitch also enables sharing a larger unified memory space, allowing developers to tackle seriously large datasets and extremely complex deep learning models.
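The port budget behind the 12-switch, 16-GPU arrangement can be sanity-checked with a little arithmetic. The sketch below assumes a layout this article does not spell out: two GPU baseboards, each carrying eight GPUs and six NVSwitch units, with every GPU sending one NVLink to each switch on its own board, and each switch using a matching number of ports to reach the other board. Treat those layout details and all names in the code as assumptions for illustration.

```python
# Figures quoted in the article
LINKS_PER_GPU = 6        # NVLink ports per Tesla V100
PORTS_PER_SWITCH = 18    # ports per NVSwitch

# Assumed layout (not stated in the article): two identical baseboards
BOARDS = 2
GPUS_PER_BOARD = 8
SWITCHES_PER_BOARD = 6


def port_budget() -> dict:
    """Count how many ports each switch would use under the assumed layout."""
    # Each GPU spreads its links evenly over the switches on its board.
    gpu_facing = GPUS_PER_BOARD * LINKS_PER_GPU // SWITCHES_PER_BOARD
    # Assume each switch mirrors that bandwidth towards the other board.
    board_facing = gpu_facing
    used = gpu_facing + board_facing
    return {
        "total_gpus": BOARDS * GPUS_PER_BOARD,
        "total_switches": BOARDS * SWITCHES_PER_BOARD,
        "gpu_facing_ports": gpu_facing,
        "board_facing_ports": board_facing,
        "ports_used_per_switch": used,
        "spare_ports_per_switch": PORTS_PER_SWITCH - used,
    }


budget = port_budget()
for name, value in budget.items():
    print(f"{name}: {value}")
```

If the assumed layout holds, each switch stays within its 18 ports, and the totals match the 16 GPUs and 12 NVSwitch units quoted for the DGX-2.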
In a certain sense, this also means that NVIDIA has arguably created the world's 'largest GPU', with cumulative stats of 81,920 CUDA cores, 2,000 TFLOPS of Tensor Core performance, and 512GB of high-speed, low-latency HBM2 memory with a combined memory throughput of 14.4TB/sec!
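These cumulative figures are simply per-GPU numbers multiplied by 16. As a quick check, the sketch below uses the 32GB memory size quoted earlier, together with NVIDIA's published per-GPU specifications for the NVLink Tesla V100 (5,120 CUDA cores, 125 Tensor TFLOPS, 900GB/sec of HBM2 bandwidth). The latter three are not stated in this article and are assumptions here.

```python
GPUS = 16  # Tesla V100 modules in a DGX-2, per the article

# Per-GPU figures: memory size is from the article; the rest are
# assumed from NVIDIA's published V100 (NVLink) specifications.
CUDA_CORES_PER_GPU = 5120
TENSOR_TFLOPS_PER_GPU = 125
HBM2_GB_PER_GPU = 32
HBM2_GB_PER_SEC_PER_GPU = 900


def aggregate(gpus: int = GPUS) -> dict:
    """Multiply per-GPU figures by the GPU count to get system totals."""
    return {
        "cuda_cores": gpus * CUDA_CORES_PER_GPU,
        "tensor_tflops": gpus * TENSOR_TFLOPS_PER_GPU,
        "hbm2_gb": gpus * HBM2_GB_PER_GPU,
        "memory_tb_per_sec": gpus * HBM2_GB_PER_SEC_PER_GPU / 1000,
    }


totals = aggregate()
for name, value in totals.items():
    print(f"{name}: {value}")
```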
Compute power aside, here's what else the dense DGX-2 supercomputing server consists of in this exploded diagram:-
What does the existence of the DGX-2 mean to anyone and what will it tackle in the real world?
The new DGX-2 system draws upon a wide range of industry-leading advances developed by NVIDIA at all levels of the computing stack, such as the new NVIDIA CUDA and TensorRT frameworks, collaboration with Google’s TensorFlow, upgraded Tesla V100 GPUs with double the high-speed, low-latency HBM2 memory at 32GB, and the new NVIDIA NVSwitch for building systems with more GPUs hyperconnected to each other.
These updates have allowed the DGX-2 system to reach the two-petaflop compute performance milestone, the equivalent of 300 servers occupying 15 racks of data center space, while being 60x smaller and 24x more power efficient. This representation by NVIDIA makes the DGX-2's value proposition clear:-
Indeed, the DGX-2 is a powerful and power-efficient supercomputer in its own right, for it can effortlessly eclipse traditional supercomputers from a few years ago that occupy entire rooms. It is purpose-built for data scientists pushing the frontiers of deep learning research and computing, so who knows what breakthroughs the DGX-2 supercomputer will enable in the near future?
Already, the combined advances made for the DGX-2 make it more than twice as powerful as the DGX-1. In certain memory-intensive workloads, such as training FAIRSeq (a neural network model for language translation), the new DGX-2 sees a phenomenal 10x performance increase: what took the DGX-1 supercomputer 15 days to train, the DGX-2 does in 1.5 days!
Regarding positioning, the US$399,000 DGX-2 joins the growing lineup of NVIDIA supercomputer offerings for data scientists along with the DGX-1 and the DGX Station.