What you need to know about ray tracing and NVIDIA's Turing architecture
RT cores and Tensor cores
Real-time ray tracing acceleration
The new RT cores in each SM are at the heart of Turing's ray tracing acceleration capabilities. I already talked briefly about what ray tracing is at the beginning of the article, and now it's time to look at how Turing enables it all.
However, while Turing GPUs do enable real-time ray tracing, the number of primary or secondary rays cast per pixel or surface location varies based on many factors, such as scene complexity, resolution, and how powerful the GPU is. This means you shouldn't expect hundreds of rays to be cast per pixel in real-time.
Instead, NVIDIA says far fewer rays are needed per pixel when the RT cores' ray tracing acceleration is combined with advanced de-noising filters.
The crux of the matter is something called BVH traversal, where BVH is short for Bounding Volume Hierarchy. A BVH is basically a tree structure for optimizing intersection calculations, in which objects are enclosed in larger, simpler bounding volumes. If a ray misses a volume, everything inside that volume can be skipped without further testing.
GPUs without dedicated ray tracing hardware would need to perform the process of BVH traversal using shader operations, requiring thousands of instruction slots per ray cast to check against successively smaller bounding boxes in a BVH structure until possibly hitting a polygon. The color at the point of intersection would then contribute to the final pixel color.
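To make the idea concrete, here is a minimal software sketch of BVH traversal in Python. It is only an illustration of the general technique, not NVIDIA's implementation: the `Node` class, the `ray_hits_box` slab test, the `traverse` function, and the tiny four-leaf scene are all hypothetical names and data made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Node:
    lo: Vec3                          # minimum corner of the bounding box
    hi: Vec3                          # maximum corner of the bounding box
    children: List["Node"] = field(default_factory=list)
    primitive: Optional[str] = None   # set only on leaf nodes

def ray_hits_box(origin, direction, lo, hi):
    """Slab test: does the ray origin + t*direction (t >= 0) cross the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def traverse(node, origin, direction, stats):
    """Return the leaf primitives whose bounding boxes the ray reaches."""
    stats["box_tests"] += 1
    if not ray_hits_box(origin, direction, node.lo, node.hi):
        return []                     # the whole subtree is skipped
    if node.primitive is not None:
        return [node.primitive]
    hits = []
    for child in node.children:
        hits.extend(traverse(child, origin, direction, stats))
    return hits

# Hypothetical scene: two groups of two objects each.
left = Node((0, 0, 0), (4, 4, 4), [
    Node((0, 0, 0), (2, 2, 2), primitive="A"),
    Node((2, 2, 2), (4, 4, 4), primitive="B"),
])
right = Node((6, 6, 6), (10, 10, 10), [
    Node((6, 6, 6), (8, 8, 8), primitive="C"),
    Node((8, 8, 8), (10, 10, 10), primitive="D"),
])
root = Node((0, 0, 0), (10, 10, 10), [left, right])

stats = {"box_tests": 0}
hits = traverse(root, (1.0, 1.0, -5.0), (0.0, 0.0, 1.0), stats)
```

In this toy scene the ray misses the right-hand group's box, so objects C and D are never tested individually. In a real scene with millions of triangles, that pruning is what keeps the per-ray cost manageable, and it is this loop that the RT cores take over from the shaders.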
In short, it's extremely computationally intensive, and not practical to do in real-time on GPUs without hardware-based ray tracing acceleration.
NVIDIA's solution is to have the Turing RT cores handle all the BVH traversal and ray-triangle intersection testing, which saves the SMs from spending thousands of instruction slots per ray.
Each RT core comprises two specialized units. The first carries out the bounding box tests, while the second performs ray-triangle intersection tests and reports back to the SM on whether it's a hit or not. This frees up the SM to do other graphics or compute work.
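For a sense of what the second unit's job involves, below is a sketch of a ray-triangle intersection test in Python. NVIDIA doesn't say which algorithm the hardware uses, so this uses the well-known Möller-Trumbore method purely as a stand-in. The function name `ray_triangle` and the helper names are my own.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the hit point, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                   # ray is parallel to the triangle's plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv               # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv       # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None     # ignore hits behind the ray origin

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri)   # 1.0
miss = ray_triangle((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *tri)    # None
```

Even this compact version costs a couple of cross products and several dot products per triangle, which gives an idea of why running it in shader code for every candidate triangle adds up quickly.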
Turing's final highlight is NVIDIA NGX, which is a new deep learning technology stack that is a part of NVIDIA's RTX platform. NGX utilizes deep neural networks to perform AI-based functions capable of accelerating and enhancing graphics, among other things.
NGX relies on the Turing Tensor cores for deep learning-based operations, and it does not work on older architectures prior to Turing.
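Turing uses an improved version of the Tensor cores first introduced in the Volta GV100 GPU. For instance, they add INT8 and INT4 precision modes for inference workloads that can tolerate quantization, while FP16 remains fully supported for workloads that require higher precision.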
According to NVIDIA, the Turing Tensor cores significantly speed up matrix operations and are used for both deep learning training and inference operations, in addition to new neural graphics functions.
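The matrix operation in question is a fused multiply-accumulate, D = A x B + C, carried out on small tiles with reduced-precision inputs and a wider accumulator. The sketch below emulates that idea in plain Python to show what FP16 rounding does to the inputs. The function names `to_fp16` and `matrix_fma` are hypothetical, and Python's double-precision float stands in for the hardware's accumulator, so treat it as an illustration of the arithmetic rather than of the hardware.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision (FP16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def matrix_fma(a, b, c):
    """Compute D = A x B + C with FP16-rounded inputs and a wider accumulator."""
    n, k, m = len(a), len(b), len(b[0])
    d = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = c[i][j]             # accumulator starts from C
            for p in range(k):
                acc += to_fp16(a[i][p]) * to_fp16(b[p][j])
            d[i][j] = acc
    return d

identity = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 2.0], [3.0, 4.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
d = matrix_fma(identity, b, ones)     # [[2.0, 3.0], [4.0, 5.0]]
```

Small integers survive the FP16 rounding exactly, but a value like 0.1 does not (`to_fp16(0.1)` is slightly below 0.1). That trade-off between precision and throughput is why the choice of precision mode matters for different workloads.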
Of these, the Tensor cores excel in particular at inference computations, where useful information is inferred and delivered by a trained deep neural network based on a specified input. This includes things like identifying images of friends in Facebook photos and real-time translations of human speech, but gamers are probably most interested in the Tensor cores' ability to improve image quality and produce better looking games with a smaller performance hit.
Deep Learning Super Sampling (DLSS)
NGX encompasses many things, but the one NVIDIA gave the most attention to is something called deep learning super sampling, or DLSS. This can be thought of as a new method of anti-aliasing that helps reduce jagged lines and prevent blocky images. However, the key difference is that it doesn't run on the shader cores, which frees them up to do other work.
In a sense, this is free AA, where you get better looking graphics without the usual performance hit. Turning MSAA on in a game like Deus Ex: Mankind Divided is enough to cripple some of the most powerful systems, and DLSS offers a possible way around that.
In modern games, rendered frames go through post-processing and image enhancements that combine input from multiple rendered frames in order to remove visual artifacts such as aliasing while still preserving detail.
DLSS applies AI to this process and is supposedly capable of a much higher quality than temporal anti-aliasing (TAA), a shader-based algorithm that combines two frames using motion vectors to determine where to sample the previous frame.
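As a rough illustration of the TAA side of that comparison, here is a minimal Python sketch of combining the current frame with the previous one via motion vectors. Everything in it is an assumption made for the example: the `taa_resolve` name, the whole-pixel motion vectors, and the `alpha` blend weight. Real implementations add sub-pixel jitter, filtering, and history clamping that are left out here.

```python
def taa_resolve(current, history, motion, alpha=0.1):
    """Blend each pixel of the current frame with its reprojected history.

    current, history: 2D lists of brightness values
    motion: 2D list of (dx, dy) pixel offsets moved since the previous frame
    alpha: weight given to the current frame (assumed value)
    """
    h, w = len(current), len(current[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = motion[y][x]
            px, py = x - dx, y - dy   # where this pixel was in the previous frame
            if 0 <= px < w and 0 <= py < h:
                out[y][x] = alpha * current[y][x] + (1 - alpha) * history[py][px]
            else:
                out[y][x] = current[y][x]   # no usable history for this pixel
    return out

# A bright pixel moves one step to the right between frames.
prev_frame = [[1.0, 0.0, 0.0]]
this_frame = [[0.0, 1.0, 0.0]]
vectors = [[(1, 0), (1, 0), (1, 0)]]
resolved = taa_resolve(this_frame, prev_frame, vectors, alpha=0.5)
```

With correct motion vectors the bright pixel lines up with its own history and stays sharp. When the vectors are wrong or the history is missing, the blend smears or ghosts, which is the kind of artifact a learned approach is meant to avoid.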
TAA renders at the final target resolution and then combines frames, losing detail in the process. However, NVIDIA says DLSS permits faster rendering at a lower input sample count, and infers a result that should rival TAA but requires approximately half the shading work in the process.
The interesting part is that DLSS can be "trained", where it learns how to produce the desired output based on large numbers of super high quality images. NVIDIA says it collected reference images rendered using 64x super sampling, where each pixel is shaded at 64 different offsets instead of just one. This results in a high level of detail and excellent anti-aliasing results.
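To see why 64 samples per pixel help, consider the small Python sketch below. It shades a pixel at an 8x8 grid of offsets and averages the results. The regular grid pattern, the `supersample` function, and the hard-edged `shade` scene are assumptions for illustration, since NVIDIA doesn't describe its exact sample pattern.

```python
def shade(x, y):
    """Hypothetical scene: white above the diagonal edge y = x, black elsewhere."""
    return 1.0 if y > x else 0.0

def supersample(px, py, grid=8):
    """Average the shading at grid*grid offsets inside pixel (px, py)."""
    total = 0.0
    for j in range(grid):
        for i in range(grid):
            sx = px + (i + 0.5) / grid    # sample position inside the pixel
            sy = py + (j + 0.5) / grid
            total += shade(sx, sy)
    return total / (grid * grid)

single = supersample(0, 0, grid=1)    # one sample at the pixel center: 0.0
ss64 = supersample(0, 0, grid=8)      # 64 samples: 0.4375
```

With a single sample, a pixel straddling the edge comes out fully black or fully white, which is what produces jagged lines. With 64 samples it lands on an intermediate grey that reflects how much of the pixel the edge actually covers.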
The DLSS network is then trained by trying to match the 64x SS output frames with its own, measuring the differences between the two, and making the necessary adjustments.
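That compare, measure, and adjust loop is the standard recipe for training any neural network. The toy Python sketch below applies it to a model with just two parameters. It is in no way the DLSS network: the `train` function, the made-up data, the learning rate, and the mean squared error loss are all assumptions chosen to keep the example short.

```python
def train(inputs, references, steps=500, lr=0.5):
    """Fit output = w * x + b to the reference values by gradient descent."""
    w, b = 0.0, 0.0
    losses = []
    n = len(inputs)
    for _ in range(steps):
        # 1. Produce an output for every input.
        outputs = [w * x + b for x in inputs]
        # 2. Measure the difference from the reference (mean squared error).
        errors = [o - r for o, r in zip(outputs, references)]
        losses.append(sum(e * e for e in errors) / n)
        # 3. Adjust the parameters in the direction that shrinks the error.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, inputs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses

# Made-up data: the "reference" is a known function of the input.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
refs = [0.8 * x + 0.1 for x in xs]
w, b, losses = train(xs, refs)
```

After enough steps the two parameters settle close to the values that generated the reference data. A real network does the same thing with millions of parameters and whole image frames instead of five numbers.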
Eventually, DLSS learns to produce results that come close to those of 64x SS, while avoiding problems that can arise in more challenging scenes, such as blurring, disocclusion (where a previously occluded object becomes visible), and unwanted transparency.
The biggest bonus is that RTX cards will supposedly run up to twice as fast as previous-generation GPUs using conventional anti-aliasing, assuming the game supports DLSS.
And that's really the biggest caveat with NVIDIA's RTX cards. The new features all sound great on paper, but that's only if there's robust developer support in the long run. Still, things are looking up, with nine new games announcing support for DLSS today.