NVIDIA's new Turing architecture powers the world's first ray tracing GPU
Real-time ray tracing, take two
Today at SIGGRAPH 2018 in Vancouver, Canada, NVIDIA’s CEO Jensen Huang took the stage for a keynote to unveil the company’s next-generation Quadro RTX family of GPUs, and for once, there were actually many things for the press to take note of…literally. We were bombarded with new terms and lingo that NVIDIA has coined for the industry because, well, the Quadro RTX is the world’s first Ray Tracing GPU.
But let’s step back a little bit. If RTX sounds a little familiar, that’s because it is. Back in March, NVIDIA had already kicked up a storm by showcasing real-time ray tracing in a demo during GDC 2018. Remember this Star Wars clip? Yep, that’s it. Then too, Jensen Huang was on stage waxing lyrical about the golden age of graphics. At the time, though, RTX seemed to be just an enabling technology for real-time ray tracing: a combination of software and hardware utilizing Volta-based GPUs and their deep-learning Tensor cores.
Enter Turing architecture and the Quadro RTX family
But as it turns out, Volta-based cards like the Quadro GV100 were merely a stepping stone to real-time ray tracing. RTX is now becoming a proper hardware capability, built on a brand-new architecture: Turing.
How is Turing different from Volta? Here are some quick stats.
- Still based on a 12nm FinFET process
- Up to 4,608 CUDA cores, spread across Turing Streaming Multiprocessors (SMs)
- Capable of concurrent floating point (FP) and integer (INT) execution with unified L1 cache
- Up to 576 Tensor cores
- New RT Cores for dedicated hardware accelerated ray tracing
- Native 8K DisplayPort with 8K HEVC real-time encoding
- New GDDR6 memory
- 100GB/s NVLink for dual-GPU configurations with up to 96GB of addressable memory
So, Turing actually does many new things. Firstly, it throws in another type of processing core, the RT Core, which makes the magic of real-time ray tracing happen. Since this is now a proper hardware layer, theoretically any ray tracing API, including OptiX, Microsoft’s DirectX Raytracing (DXR) and Vulkan, would be able to interface with it.
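To get a feel for the kind of work the RT Cores take over, here is a minimal CPU-side sketch of the ray/bounding-box "slab test" that sits in the inner loop of BVH traversal, the part of ray tracing Turing now accelerates in fixed-function hardware. This is purely illustrative: real applications reach the RT Cores through OptiX, DXR or Vulkan, not through code like this.

```python
# CPU sketch of the ray/AABB "slab test" used during BVH traversal --
# the kind of per-ray work an RT Core offloads from the shader cores.

def ray_aabb_hit(origin, inv_dir, box_min, box_max):
    """Return True if a ray hits the axis-aligned box (slab method).

    inv_dir holds 1/direction per axis; a zero component is passed as
    +/-inf. (NaN edge cases are ignored in this toy version.)
    """
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        t1 = (box_min[axis] - origin[axis]) * inv_dir[axis]
        t2 = (box_max[axis] - origin[axis]) * inv_dir[axis]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far

# A ray shot from the origin down the +X axis...
origin = (0.0, 0.0, 0.0)
inv_dir = (1.0, float("inf"), float("inf"))  # direction (1, 0, 0), inverted
# ...hits a box straddling the X axis between x=2 and x=4.
hit = ray_aabb_hit(origin, inv_dir, (2.0, -1.0, -1.0), (4.0, 1.0, 1.0))
```

In a real renderer this test runs millions of times per frame while walking the bounding-volume hierarchy, which is why moving it into dedicated silicon pays off.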
With the RT Cores, NVIDIA has coined a new performance metric for the Quadro RTX: GigaRays per second. And yes, it’s exactly what it reads like, billions of rays cast per second.
“Check this out. Supposedly fully real-time rendered with ray tracing. How many rays? 10 Gigarays/s. Remember Gigarays? Gigarays! #SIGGRAPH2018 #nvidia #quadroRTX #turing pic.twitter.com/tJzeXTpwMc” — HardwareZone (@hardwarezone), August 14, 2018
The first generation of Quadro RTX cards based on Turing will be able to push up to 10 GigaRays/s. According to NVIDIA, Turing is about 6X faster than Pascal in terms of real-time ray tracing performance.
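For a sense of scale, a quick back-of-envelope calculation using the article's headline figures (the 4K/60fps framing is our assumption, and real ray counts vary wildly by effect):

```python
# What a 10 GigaRays/s budget means at 4K, 60 fps -- rough arithmetic only.
giga_rays_per_s = 10e9
pixels_4k = 3840 * 2160            # 8,294,400 pixels per frame
fps = 60
rays_per_pixel = giga_rays_per_s / (pixels_4k * fps)
print(round(rays_per_pixel, 1))    # roughly 20 rays per pixel per frame
```

Around 20 rays per pixel per frame is far from film-quality path tracing, which is exactly why the denoising tricks discussed below matter.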
Next up are the Tensor Cores, which we already saw in Volta. As before, the Tensor Cores accelerate AI inferencing, and on Turing they are capable of up to 125 TFLOPS of FP16 performance, identical to Volta’s spec. Turing, however, adds INT8 and INT4 precision support, which should let it handle low-precision operations more efficiently.
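As a rough illustration of what INT8 support is for, here is a sketch of symmetric linear quantization, a common scheme behind low-precision inference. This is a generic toy version and an assumption about the technique in general, not Turing's actual pipeline; real toolchains calibrate scales per tensor or per channel.

```python
# Symmetric linear quantization: map floats to int8 with one scale factor,
# trading a little accuracy for much cheaper arithmetic and storage.

def quantize_int8(values):
    """Quantize floats to int8 using a single symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.01, 1.0]     # hypothetical layer weights
q, scale = quantize_int8(weights)     # int8 values plus one float scale
approx = dequantize(q, scale)         # close to the originals
```

The per-value error is bounded by the scale factor, which is why low-precision inference tends to hold up well for already-trained networks.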
What’s new here is that NVIDIA is using the Tensor Cores for what it calls Deep Learning Anti-Aliasing, or DLAA for short. You know how in every police drama there’s that scene where the cop asks the tech guy to “enhance” a pixelated image until it becomes clear? Well, DLAA is the technology that would eventually make that a reality. NVIDIA is using DLAA as a sort of hack to reduce the overall compute required: render at a lower resolution, then apply AI-powered denoising, upscaling and anti-aliasing to the final product. At present, it is capable of a 64-sample render, but it can only go up from here.
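The pipeline shape is: render fewer pixels, then reconstruct the full-resolution frame. A minimal sketch, with plain bilinear interpolation standing in for the trained network that DLAA would actually use for the reconstruction step:

```python
# Render low, reconstruct high. DLAA replaces this naive bilinear stage
# with a neural network, which is what makes the result look
# super-sampled rather than merely upscaled.

def bilinear_upscale(img, factor):
    """Upscale a 2D grid of floats by an integer factor."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h * factor):
        fy = min(y / factor, h - 1)
        y0, ty = int(fy), fy - int(fy)
        y1 = min(y0 + 1, h - 1)
        row = []
        for x in range(w * factor):
            fx = min(x / factor, w - 1)
            x0, tx = int(fx), fx - int(fx)
            x1 = min(x0 + 1, w - 1)
            top = img[y0][x0] * (1 - tx) + img[y0][x1] * tx
            bot = img[y1][x0] * (1 - tx) + img[y1][x1] * tx
            row.append(top * (1 - ty) + bot * ty)
        out.append(row)
    return out

low_res = [[0.0, 1.0], [1.0, 0.0]]    # a tiny 2x2 "render"
frame = bilinear_upscale(low_res, 2)  # reconstructed 4x4 frame
```

At 2x upscaling, only a quarter of the pixels are actually shaded, which is the compute saving DLAA is chasing; the neural network's job is to hide the quality cost of that shortcut.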
Lastly, Turing’s Streaming Multiprocessor (SM) features dedicated INT cores, which allow FP and INT operations to be executed in parallel. This theoretically gives it 16 TFLOPS + 16 TIPS of performance. NVIDIA is also claiming a unified L1 cache with double the bandwidth of previous-generation architectures, but we’ve yet to see a proper die map of Turing, and NVIDIA hasn’t revealed the die allocation of the Tensor, RT and CUDA cores.
On a final note, you’d have noticed that NVIDIA is using GDDR6 on Turing, while Volta uses HBM2. While we did ask NVIDIA about this, it seems it simply comes down to the best choice at the time of design. My own speculation, though, is that NVIDIA is keeping Volta on a separate track specifically for HPC applications, which benefit from HBM2’s higher bandwidth-per-watt and wider bus, while Turing’s target market of visual effects and design has more use for sheer memory capacity.
TL;DR edition: Turing has more FLOPS, OPS and IPS than any of its predecessors, plus it now comes with Rays too.
Quadro RTX models
If you're not the least bit interested in a US$10,000 Quadro graphics card, know that all those cool demos NVIDIA has shown were created with gaming tools, namely Unreal Engine 4. With both DirectX and Vulkan supporting ray tracing APIs, it's only a matter of time before the Turing architecture makes an entrance in the next GeForce product, and Gamescom 2018 is just a week away. Hold your breath.