NVIDIA's next-gen gaming graphics card, GeForce GTX 980 revealed!

Maxwell strikes again, but this time NVIDIA has supercharged the architecture with new features to ensure it stays at the top of its game for some time to come. It's also clocked much faster, is more power efficient, and is better equipped to tackle 4K gaming than any of its predecessors. Dive in and find out the new defining features that make it ready for next generation gaming!

** Updated on 1 October - Added detailed GPU architecture information and GeForce GTX 980 photos.

Second generation Maxwell takes root

Maxwell strikes again, but this time NVIDIA has supercharged the architecture with new features to ensure it stays at the top of its game for some time to come. It's also clocked much faster than its predecessors, breaching the 1GHz mark for GPU clock speeds, and has 4GB of high speed GDDR5 memory to tackle gaming at 4K resolution with ease. Bearing the Maxwell architecture, it's also more power efficient than any of its predecessors.

Call it a 'Maxwell refresh' if you will. Those who keenly follow the graphics card segment will recall that NVIDIA actually introduced the original Maxwell architecture to rejuvenate its low-end series with the GeForce GTX 750 Ti and GTX 750. In official documentation, these GPU models were classified as "Maxwell first generation" and focussed on power efficiency and greater performance per watt than Kepler.

 

So why did Maxwell first debut at the low-end?

As was revealed at CES 2014 when the Tegra K1 debuted, all of NVIDIA's new generation GPU architectures will be designed to scale in performance and capabilities from mobile devices all the way to workstations and servers. Prior to this, GPU design for mobile and embedded computing evolved along a separate and outdated path, requiring distinct teams to work on the Tegra and GeForce product groups. With the internal direction change and consolidation, NVIDIA achieved two things. Firstly, it brings updated and advanced graphics APIs to mobile game developers for more realistic gameplay, and lets more desktop games be ported over to their mobile OS equivalents with ease. Secondly, NVIDIA can progress its graphics engineering in a way that benefits all its product groups collectively, thus working more efficiently too.

This led to the introduction of NVIDIA's then new Maxwell first generation that focussed on power/design efficiency and refreshed its low-end GeForce line-up.

Half a year on, the Maxwell second generation arrives, powering the brand new GeForce GTX 980 and GTX 970 top-end graphics cards brimming with new features and technologies.

 

4 key features of the Maxwell second generation

Before we march on with the key specs of the new generation gaming cards to lust for, we'll briefly touch on the new technologies that the Maxwell second generation brings to the table - some of which strengthen the next evolution in gaming, while others focus on delivering even more with existing graphics horsepower.

 We'll be covering these features in great detail over the next few pages.

 

The GM204 GPU powering the GeForce GTX 900 series

Getting down to business, the new GM204 GPU used for the GeForce 900 series is based on the 28nm manufacturing process, similar to the original Maxwell architecture on the GM107 chip that debuted on the GeForce GTX 750 Ti. This time round, the second generation Maxwell architecture builds on the breakthroughs of the GM107, such as the high power efficiency derived from both the die shrink from Kepler and the internal reconfiguration of the processing blocks deep within the core. For example, the Kepler streaming multiprocessor (SM) unit has 192 CUDA processing cores, but the newer SM on the GM107 (and the GM204) features 128 CUDA cores, which are easier to address and manage because they correspond to a power of two and are thus more efficient to utilize. Also, unlike Kepler, each SM in Maxwell is further partitioned into four processing blocks of 32 CUDA cores, each with its own control logic (instruction buffer, schedulers, dispatch units and registers). The overall result is an improved and more efficient datapath organization that simplifies scheduling logic, reduces idle time and cuts time spent waiting for instructions, thus putting the processing cores to better use.

This is the Maxwell Streaming Multiprocessor (SMM), which looks identical to the first generation SMM on the GM107. Interestingly, the PolyMorph Engine (PE) is upgraded to version 3.0, but NVIDIA doesn't say much other than that the GM204 has twice the number of PE units as the Kepler GK104. This allows it to tackle geometry-heavy workloads like tessellation with ease, as performance is at least doubled over Kepler.


But that's about it for the similarities between the GM204 and GM107 Maxwell-class cores. The improvements made to the GM204 are multi-fold, ensuring it's a true performance-oriented part, but with the bonus goodness common to all Maxwell generation GPUs.

Even so, if you pause to consider the naming scheme of past GPU chips, you would realize that the "GM204" name signifies it's one rung lower than the top-tier GPU model. For example, the Kepler architecture debuted as the GK104 with the GeForce GTX 680, and a more power-packed revision came later with the GK110 chip featured on the GTX Titan, GTX 780 and GTX 780 Ti. As such, NVIDIA's own documentation has squarely compared the new GTX 980 against the more appropriate previous generation GTX 680, rather than the GTX 780 / 780 Ti. Would this mean the new GeForce GTX 980 is actually slower than the current class-leading GTX 780 Ti? Let's take a look at the fully configured GM204 SKU for the GeForce GTX 980.

 

Underneath the GeForce GTX 980 and GTX 970 GPU

The full block diagram of the GM204 core as used by the GeForce GTX 980.


Like the Kepler GK104 (and older predecessors), the new GM204 also boasts four Graphics Processing Clusters (GPC). In essence, each GPC is an independent block of processing engines that can exist on its own, as it contains all the necessary processing stages within its cluster. A full complement of four GPC units rounds off the GM204's firepower with double the number of streaming multiprocessors, which are also clocked faster (base clock alone is 1,126MHz on the GTX 980), double the ROP units and quadruple the L2 cache size of the GK104. So instead of a pair of 'heavy' SM units per GPC on Kepler, the second generation Maxwell goes with the leaner SM units, but has four of them per GPC. Do the math and you'll get 16 Maxwell SMs (SMM) that combine to give the GM204 a grand total of 2,048 CUDA cores on the fully equipped GeForce GTX 980 SKU. That count is certainly no match for the GeForce GTX 780 Ti and the Radeon R9 290X, which boast far more CUDA/shader cores, but those cores operate at lower clock speeds than the GeForce GTX 980's.
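If you want to verify that math yourself, here's a quick sketch (our own illustration in Python, not anything from NVIDIA) that derives the shader count from the SM configuration described above:

```python
# Illustrative only: derive the GM204 shader count from its SM configuration.
CORES_PER_SMM = 128   # CUDA cores per second-gen Maxwell SM (SMM)
SMM_PER_GPC = 4       # leaner SMs, four per cluster
GPC_COUNT = 4         # full GM204 as used by the GTX 980

smm_total = SMM_PER_GPC * GPC_COUNT        # 16 SMMs
cuda_cores = smm_total * CORES_PER_SMM     # 2,048 CUDA cores

print(smm_total, cuda_cores)               # 16 2048
```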

Kepler gen.1 vs. Maxwell gen.2


Interestingly, the GM107 with its first generation Maxwell architecture has five SM units in its single GPC, but this structure wasn't carried forward in the GM204. If we were to make an educated guess, the higher-tier GM200 GPU might easily adopt this SM structure across all four GPCs for a total of 20 SMM units, or even go as far as six GPC clusters for enterprise GPUs in Titan/Quadro/Tesla class products. But that story is for another day.

The memory subsystem for the GM204 is also geared to tackle higher speed memory, with its 256-bit memory interface hooked up to 7Gbps GDDR5 memory on both the GTX 980 and GTX 970 - the fastest in the industry. Coupled with a larger L2 cache, fewer requests need to be serviced by the GDDR5 memory. There are also improvements to the implementation of memory compression, with better delta color compression algorithms. Overall, the better caching and compression schemes ensure less bandwidth is consumed because less pixel data is written to the graphics memory. This should hopefully help the GPU to be more competitive despite the fact that, on paper, the GTX 780 Ti boasts 336GB/s of memory bandwidth while the GTX 980 comes in with a more conservative 224GB/s.
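NVIDIA hasn't published the exact compression scheme, but the principle behind delta color compression can be sketched as follows: store one anchor pixel per block and encode the rest as differences, falling back to raw data when the deltas are too large. This is a toy illustration of our own, not the hardware algorithm:

```python
# Toy sketch of delta color compression; the real hardware scheme is undisclosed.
# A block stores one anchor value, and the remaining pixels become deltas that
# pack into fewer bits whenever the block's colors are similar.
def compress_block(pixels, bits=4):
    anchor = pixels[0]
    deltas = [p - anchor for p in pixels[1:]]
    limit = 1 << (bits - 1)
    if all(-limit <= d < limit for d in deltas):  # deltas fit in 'bits' bits
        return ("compressed", anchor, deltas)     # less memory traffic
    return ("raw", pixels)                        # fall back to uncompressed

# A smooth gradient compresses; a noisy block does not.
print(compress_block([100, 101, 103, 102])[0])    # compressed
print(compress_block([100, 10, 240, 55])[0])      # raw
```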

NVIDIA claims the improved memory optimization gives the GeForce GTX 980 and GTX 970 an estimated 25% improvement in memory bandwidth over the GTX 680, instead of the 17% improvement without the optimization (thus just banking on the raw frequency difference).

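Those percentages line up with simple arithmetic on the published specs. In the rough check below, the ~7% saving from compression is back-derived from NVIDIA's 25% figure rather than an official number:

```python
# Rough check of NVIDIA's bandwidth claims using published specs.
bus_bits = 256
gtx980_gbps, gtx680_gbps = 7.0, 6.0          # GDDR5 data rate per pin

bw_980 = bus_bits / 8 * gtx980_gbps          # 224 GB/s
bw_680 = bus_bits / 8 * gtx680_gbps          # 192 GB/s
raw_gain = bw_980 / bw_680 - 1               # the "17%" raw frequency gain

effective_980 = bw_980 * 1.07                # assumed ~7% compression saving
eff_gain = effective_980 / bw_680 - 1        # lands near the claimed 25%

print(f"raw: {raw_gain:.1%}, effective: {eff_gain:.1%}")  # raw: 16.7%, effective: 24.8%
```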

As for the GeForce GTX 970, it is very similar to the GTX 980, except that it has 13 SM units instead of the 16 on the GTX 980. That means a reduction in CUDA cores (down to 1,664) and texture mapping units (down to 104), along with slightly lower clock speeds (base clock of 1,050MHz). At the board level, the reference GTX 970 loses the ability to balance power draw across its three power rails - something that first debuted on the GTX 780 Ti and continues to be featured in the GTX 980. As such, we don't expect the GTX 970 to hit as high an overclock as the GTX 980.

Speaking of power, it's interesting to note that the efficiency of the second generation Maxwell architecture gives the GeForce GTX 980 and GTX 970 remarkably low TDPs of just 165 watts and 145 watts respectively - figures that are almost unheard of for performance-oriented parts. As such, you won't need more than dual 6-pin PCIe power connectors to drive these graphics cards.

Here's how the GeForce GTX 980 and GTX 970 compare to the current stable of top performing GPUs:-

[hwzcompare]

[products=474225,427431,269094,474324,283794,426095]

[width=150]

[caption=NVIDIA GeForce GTX 900 series compared]

[showprices=0]

[/hwzcompare]

Beyond the improvements at the heart of the core, NVIDIA has also updated the display output capabilities: the second generation Maxwell supports a 1,045MHz pixel clock, as opposed to 540MHz on the previous generation. This allows it to scale up to much higher resolution displays beyond 4K; in fact, it supports 5K screens - 5,120 x 3,200 pixels @ 60Hz. The GeForce GTX 980 is also the first NVIDIA GPU to support HDMI 2.0, which allows it to drive full 4K resolution at 60Hz. The reference card comes with 1 x HDMI 2.0 port, 3 x DP 1.2 connectors and a single dual-link DVI connector. Like Kepler, Maxwell can natively output to four digital displays simultaneously. But if that's not enough, the new Maxwell core is also able to drive four 4K MST displays from a single GPU (up from two in Kepler).
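Some back-of-the-envelope math shows why the higher pixel clock matters for 5K. The blanking overhead used below is our own rough assumption (reduced-blanking timings), not an NVIDIA figure:

```python
# Estimate the pixel clock needed for a 5,120 x 3,200 panel at 60Hz.
h, v, refresh = 5120, 3200, 60
active = h * v * refresh              # ~983 Mpixels/s of visible pixels
blanking_overhead = 1.06              # assumed reduced-blanking overhead
pixel_clock = active * blanking_overhead

print(f"{pixel_clock / 1e6:.0f} MHz")  # ~1042 MHz, close to the 1,045MHz ceiling
```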

 

Pricing, positioning and availability of GeForce GTX 980 and GTX 970

Now that we're done with the technical aspects of the new GPUs, let's talk about the more practical aspects. The GeForce GTX 980 and GTX 970 will be available immediately worldwide and have a suggested retail price of US$549 and US$329 respectively.

At the same time, NVIDIA has officially chosen to discontinue the existing GeForce GTX 780 Ti, GTX 780 and GTX 770. Given all the improvements in efficiency that the second generation Maxwell architecture brings to these new GPUs, and NVIDIA's decision to discontinue its previous top-end GPU models, this signifies that the GeForce GTX 980 and GTX 970 are expected to perform at least as well as the GTX 780 Ti and GTX 770 respectively. Traditionally, NVIDIA has never made deliberate mention of such positioning and hierarchy changes in previous GPU launches; old and new products usually coexist in the market for some time before the market adjusts itself. As such, this only serves to reinforce the positioning and expectations of the newcomers.

Furthermore, the new lowered price points also signify that NVIDIA is quite likely to have a higher performing GPU (perhaps the GM200 or GM210) in the not-so-distant future, as we discussed earlier.

Given the small technical differences between the GTX 980 and GTX 970, the latter seems to be quite a catch at just US$329, and we reckon it could be the new sweet spot given its capabilities. Speaking of sweet spots, with the coming of the GeForce GTX 900 series cards, the GeForce GTX 760, previously perched at US$249, will now shift to an even more affordable US$219 - making it a true sweet spot for those with modest budgets. Again, this is a deliberate shift in positioning by NVIDIA, and we reckon that, over time, there will be more changes to the existing GTX 700 series cards that aren't yet discontinued, at least until new lower-tier GTX 900 series SKUs are commissioned.

 

A close look at the reference GeForce GTX 980

There she is, the GeForce GTX 980 in all her glory.


Note the array of display connectors - 1 x HDMI 2.0 port, 3 x DP 1.2 connectors and a single dual-link DVI connector. Only four digital displays are usable at any one point in time, and surround can only span three monitors (while the fourth becomes an additional screen for other usage needs). These rules are no different from the current Kepler-based GTX 600 and GTX 700 series.


Note the rear design of the back plate.


At the crown of the card, you'll see the lit GeForce GTX logo. Also note that the GeForce GTX 980 requires only dual 6-pin power connectors for its power needs. The card's rated TDP is only 165W and so NVIDIA recommends just a modest 500W PSU to power your average system with the new graphics card.


We're not done explaining what's in store with the GM204 GPU of the GeForce GTX 980 and GTX 970.

What we've covered so far are just the technical aspects and product positioning of the newcomers. Over the next few pages, we'll explore the key new features enabled by the improved GPU architecture and exclusive to the new GPUs, such as DSR, MFAA, VXGI and VR Direct. So read on!

If you're looking for how the best of the newcomers perform, check out our revised performance review of the GeForce GTX 980.

** Updated on 29 September - Added a video that helps relate DSR in actual gameplay.

Dynamic Super Resolution (DSR)

This feature is designed to improve your game's graphics quality when you're limited to a Full HD monitor - the most commonly used monitor in the market. In other words, you've maxed out your graphics settings, your system is generating far more frames than necessary, and yet you're not quite satisfied with the game's rendered quality.

How do you improve a game like Dark Souls II to deliver better graphical quality than it was designed for? Well, NVIDIA seems to think that a different approach to rendering the image with DSR would help the game deliver a better gameplay experience.


A prime example NVIDIA cited is in Dark Souls II, where grass foliage swaying in the distance looks unsettling because the grass stalks look jagged and sort of 'break up' as they sway. This happens because when the scene is placed against a resolution grid to determine which pixels get lit on screen (a binary decision), you inevitably get a stair-stepped effect. You would think anti-aliasing should resolve this, but that depends highly on where the coverage samples fall with respect to the rendered image. As such, in the example of Dark Souls II, you still get an unsatisfactory representation of certain elements on screen despite having maxed out the graphical quality.

NVIDIA has shown that if your game is rendered at a higher resolution, for example 4K, and appropriate filtering is then used to downsample the rendered content to fit a 1080p monitor, you still get an upgrade in image quality compared to rendering at Full HD right off the bat. The reason? At 4K, the resolution grid is finer and each pixel is smaller, so something as fine as a blade of grass has a higher chance of being rendered accurately. The result works well for Dark Souls II, where excess horsepower is put to good use to improve the in-game image quality.
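Conceptually, DSR boils down to the sketch below: render at four times the pixel count (2x per axis), then filter the result down to the display resolution. NVIDIA's production filter is more sophisticated (a Gaussian-style filter); the simple 2x2 box average here is purely our illustration:

```python
# Conceptual DSR: render at 2x per axis, then downsample to the display grid.
def downsample_2x(img):
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            # Average each 2x2 block of high-res pixels into one output pixel.
            row.append((img[y][x] + img[y][x + 1] +
                        img[y + 1][x] + img[y + 1][x + 1]) / 4.0)
        out.append(row)
    return out

# A bright "blade of grass" one high-res pixel wide survives as a partial
# shade instead of disappearing or breaking up at native resolution.
hi_res = [[0, 255, 0, 0],
          [0, 255, 0, 0],
          [0, 255, 0, 0],
          [0, 255, 0, 0]]
print(downsample_2x(hi_res))   # [[127.5, 0.0], [127.5, 0.0]]
```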

NVIDIA presented a close-up split-screen of Dark Souls II rendered on a 1080p display without DSR (left) and with DSR (right). You can immediately see that the quality of the blades of grass has improved quite drastically.


To give you a good feel of what it's like in actual gameplay, we've captured this technical demo where NVIDIA's technical marketing director Tom Petersen runs through how DSR works in detail:-

//www.youtube.com/embed/_0v-wcOijsg
As explained earlier, if you look at how the graphics card determines whether it's worthwhile to light a pixel to represent the blade of grass, you can see that some pixels aren't lit in the bright shade, so the blade of grass appears 'broken' - exactly how it appears in-game as well. Rendering at a higher resolution, with a finer resolution grid, means there's a better chance that the coverage samples detect the presence of the blade of grass and render it in an appropriate shade.


Here's an actual in-game close-up photographed from a demo machine's monitor without DSR. This was taken at NVIDIA's event, so we had to go with a photograph. Click to view the 100% cropped photo.


Now this is an almost identical shot on a side-by-side system, but with DSR enabled. While you can see that DSR helps, its full benefit is better appreciated in person while playing the game. Click to view a 100% crop of the photo up close.


Does all this sound similar to supersampling anti-aliasing (SSAA)? Kind of, because SSAA also renders the image at a higher resolution, but according to NVIDIA, the real differentiator with DSR is in the filtering techniques used to produce the downsampled image.

Likewise, NVIDIA claims that this works well for a number of other games and is a feature that doesn't require game developers to do anything. DSR will be offered as a function within the next GeForce Experience release and should in fact work without user intervention. In other words, you don't have to be concerned whether your system has enough processing power to deal with DSR, as GeForce Experience will determine this for you.

You can, however, tune the DSR setting per game to your preference; for example, you might want to tone down the resolution to something lower than 4K but still better than Full HD, among a few other knobs, to be absolutely sure you maintain high frame rates.

However, the level of in-game quality improvement depends squarely on each game, as well as on whether the game's own UI can scale appropriately when rendered at a higher resolution and then downsampled to fit the monitor's resolution.

You don't have to worry about grappling with yet another feature, as NVIDIA promises to offer DSR via GeForce Experience. This ensures you reap the benefits of NVIDIA's new developments without needing to know how to use the feature or whether your system can support it. Just let GeForce Experience do its job while you enjoy your game.


Should you want more control over your DSR-enabled gaming, there are a few control options offered that directly correlate to your in-game FPS experience.


While Dynamic Super Resolution isn't necessarily a second generation Maxwell hardware-dependent feature, it will initially be available only on graphics cards based on the new Maxwell GPUs, like the GeForce GTX 980 and GTX 970. NVIDIA said it would consider enabling this feature on previous generation graphics cards at a later time, but suffice to say that you'll need more horsepower to pull off such a feature, so it makes sense to offer it only on newer products. Still, we believe owners of previous generation high-end cards or multi-GPU configurations should be entitled to benefit from this enhancement as well.

Multi-Pixel Programmable Sampling with Multi-Frame Sampled AA (MFAA)

The previous feature dealt with improving image quality when you have excess processing horsepower for the game being played. This next feature deals with the opposite scenario, such as in Battlefield 4, which simply saps your frame rates if you crank the image quality settings all the way up. In this case, you would wish you had a more powerful graphics subsystem. But what if there was a way to deliver a similar level of image quality, but with more performance headroom?

Multi-Frame Sampled AA (MFAA) is NVIDIA's next trick up its sleeve to accomplish this. Unlike traditional MSAA, which uses a fixed number of coverage samples per pixel on a rotated grid pattern, MFAA uses half the number of coverage samples, but the coverage sample pattern differs per pixel and per frame. The per-frame results are then treated with a temporal synthesis filter across frames to rival or equal what MSAA achieves, but at nearly half the performance penalty. Hence the name of this AA mode, which relies on multiple frames to derive its cumulative benefit.

Taking the most commonly used antialiasing level of 4x MSAA: 4x MFAA can rival 4x MSAA image quality at 2x MSAA processing requirements. You could also use the reduced performance overhead to get better image quality at no extra penalty (such as 8x MSAA quality at 4x MSAA performance levels). Let's step through this visually.
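Here's a toy illustration of the idea (the actual sample positions and temporal filter are NVIDIA's own): two coverage samples per pixel per frame, with the pattern alternating between frames, so blending two consecutive frames approximates a four-sample result:

```python
# Toy MFAA: 2 coverage samples per pixel per frame, alternating patterns.
PATTERN_A = [(0.25, 0.25), (0.75, 0.75)]   # illustrative sample offsets
PATTERN_B = [(0.75, 0.25), (0.25, 0.75)]   # the other half of a 4x grid

def coverage(samples, inside):
    """Fraction of this frame's samples covered by the primitive."""
    return sum(inside(x, y) for x, y in samples) / len(samples)

# An edge covering the pixel corner where x + y < 0.9.
inside = lambda x, y: x + y < 0.9
frame_a = coverage(PATTERN_A, inside)       # 0.5
frame_b = coverage(PATTERN_B, inside)       # 0.0
blended = (frame_a + frame_b) / 2           # temporal blend of two frames

print(blended)   # 0.25 - same result as testing all four samples at once
```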

The difference between no antialiasing (AA) and 4x MSAA should be pretty straightforward to most gamers and graphics card enthusiasts.


As iterated earlier, MFAA uses fewer coverage samples but alters the sample pattern per pixel and in every frame. Take note of the math that indicates what fraction of the pixel is covered based on the coverage sample locations.


After a temporal filter is applied between frames to counter frame-to-frame variations, the final outcome of MFAA is unveiled. Take note of the math that determines the resultant level of pixel coverage.


As such, a "4x MFAA" setting achieves the equivalent of 4x MSAA from a visual standpoint but with much less processing overhead. Efficiency is the key here to unlock more performance by achieving the same or similar result.


To prove this point further, NVIDIA showed us a side-by-side comparison using the Portal 2 game. The resulting quality is indistinguishable between MSAA and MFAA.


In an actual in-game photo taken off a monitor, this is a 100% crop of 2x MSAA...


… while this is 4x MFAA, which gives superior antialiasing that rivals 4x MSAA but at far less processing overhead - roughly equivalent to 2x MSAA.


Yet again, here's another example with a pinwheel - on the left half is 4x MSAA while the right half with the green border signifies 4x MFAA. It's practically impossible to tell them apart.


To get MFAA working, you'll need a sustained 40 FPS or more with the new AA technique; otherwise, you'll notice flickering, since the sample patterns differ between frames. To manage this, it will once again be left to GeForce Experience to determine whether your system has enough power to deliver this experience in your respective game. This takes the guesswork and trial-and-error off your hands, while giving gamers what they need most.
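The gating logic presumably looks something like the sketch below; the 40 FPS floor is NVIDIA's stated threshold, while the fallback behavior is purely our guess at how such a heuristic might work:

```python
# Hypothetical sketch of a GeForce Experience-style MFAA gate; only the
# 40 FPS threshold comes from NVIDIA, the rest is our assumption.
MFAA_MIN_FPS = 40

def pick_aa_mode(sustained_fps, requested=4):
    if sustained_fps >= MFAA_MIN_FPS:
        return f"{requested}x MFAA"   # fast enough: temporal blend holds up
    return f"{requested}x MSAA"       # too slow for artifact-free MFAA:
                                      # leave standard MSAA on instead

print(pick_aa_mode(60))   # 4x MFAA
print(pick_aa_mode(30))   # 4x MSAA
```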

In terms of exactly how much you stand to gain, NVIDIA estimates at least 30% more performance with 4x MFAA than with 4x MSAA. Since NVIDIA states that 4x MFAA's workload is similar to 2x MSAA's, we asked whether the extra temporal filtering adds processing overhead. In short, 4x MFAA will be a tad slower than 2x MSAA, but only because the temporal filtering saps another 1 to 2% more performance than standard 2x MSAA.

MFAA nets you more performance than MSAA can accord you. Fortunately, AA quality doesn't get compromised.


** Updated on 28 September - More details and video added for the section of "Debunking a myth".

The problem - recreating realism is expensive

These days, the realism of any game comes down to how closely an in-game scene can match what one observes in real life. While accurate structural representation of objects and view-dependent level of detail are fairly well handled dynamically and on the fly, it is accurate lighting and shading within the game that makes or breaks the experience.

In our real world, everything that we experience and observe is lit by direct and indirect lighting. Representing the former accurately has been attainable for some time now, given the graphics horsepower we're endowed with, along with object representation, material properties and much more.

However, capturing the effects of indirect lighting - as defined by NVIDIA: photons that travel from the light source, hit one object and bounce off of it and then hit a second object, thus indirectly illuminating that object - to complement direct lighting has proven to be the real challenge as it's computationally very intensive. And without indirect lighting factored into the mix, the in-game scenes can look harsh or even artificial.

Here's a perfect example with a scene rendered in direct lighting only.


And this is the same scene with global illumination enabled to capture indirect light sources to more accurately represent the real world.


To overcome that limitation, you might be familiar with the term "global illumination", a lighting system that models indirect lighting. Even so, most games employ pre-computed lighting, screen-space effects (such as reflections and ambient occlusion), virtual point lights and other tweaks, post-processing and specific artwork to reproduce the intended lighting effects. These pre-baked techniques are used primarily for performance reasons.

The downside of pre-computed and assisted lighting techniques is that they're not dynamic: it's impossible to update indirect lighting characteristics when major in-game changes occur, such as when a light source is shot out or objects and scenes are deformed or destroyed. As such, they're suitable for static areas of a scene and not for animatable characters and objects. Given that games are increasingly designed around dynamic terrain and levels that respond to actual user intervention, real-time global illumination is needed to keep pace with the realism expected in-game as game engines, games and hardware progress year after year.

 

The solution - VXGI acceleration

NVIDIA engineers actually came up with a fast approximate method to compute global illumination dynamically back in 2011. It's still computationally intensive, so new software algorithms and special hardware acceleration built into the second generation Maxwell architecture ensure that dynamic global illumination does indeed take off this time round.

Abbreviated as VXGI, this is short for Voxel Global Illumination. Since indirect lighting is often unfocused, and the first few bounces of the photons from the originating light source carry the most energy, VXGI was designed with these two aspects in mind to best represent indirect lighting globally and in real time.

VXGI is executed in a three-step process implementing the Voxel Cone Tracing technique which we'll briefly sum up here:-

Step 1: Voxelization

While a "pixel" represents a 2D point in space, a "voxel" represents a small cube of 3D space. Since the realism of a scene hinges on how light reflects off objects - which is indirect lighting - it's important to capture this information in all three dimensions. Just as "rasterization" determines the value of a scene in 2D space at every pixel, "voxelization" is the 3D equivalent, determining the value of a scene at every voxel.

Using VXGI, two aspects of information are captured at each voxel - the fraction of the voxel that contains an actual object, and the properties of light (such as direction and intensity) coming from it, including indirect light bouncing off it. The resulting voxel coverage calculation is represented in a visualization such as the following, which shows how a rasterized image appears when voxelized.
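To make the coverage half of that concrete, here's a toy voxelizer of our own that records what fraction of each grid cell an axis-aligned box occupies. Real voxelization rasterizes triangle meshes, but a box keeps the overlap math simple:

```python
# Toy voxelization: fractional coverage of a 3D grid by an axis-aligned box.
def overlap_1d(a0, a1, b0, b1):
    """Length of overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def voxelize_box(box_min, box_max, grid=4, size=1.0):
    coverage = {}
    for ix in range(grid):
        for iy in range(grid):
            for iz in range(grid):
                origin = (ix * size, iy * size, iz * size)
                frac = 1.0
                for axis in range(3):
                    frac *= overlap_1d(origin[axis], origin[axis] + size,
                                       box_min[axis], box_max[axis]) / size
                if frac > 0.0:
                    # 1.0 = fully covered ("red"), small = edge voxel ("blue")
                    coverage[(ix, iy, iz)] = frac
    return coverage

print(voxelize_box((0.5, 0.0, 0.0), (2.0, 1.0, 1.0)))
# {(0, 0, 0): 0.5, (1, 0, 0): 1.0}
```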

This is a summary of Voxelization.


On the left is a simple scene, while on the right is a visualization of the voxelized result. Obviously, empty voxels aren't drawn, while those fully covered are in red and partially covered are represented by a shade between blue (minimal coverage like the edge of an intersection that’s not fully covering a voxel) and red (fully covered).


Since the fractional coverage of each voxel needs to be determined with high accuracy to ensure the voxelized 3D grid represents as much of the original 3D object as possible, NVIDIA came up with a hardware feature called "Conservative Raster", where a pixel is considered covered if any part of the pixel footprint is covered by the object - the object doesn't have to cover the pixel center for the coverage sample to register.
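The difference from standard rasterization can be shown with a simple overlap test (our illustration, using an axis-aligned sliver for simplicity): standard rasterization samples the pixel center, while conservative raster flags the pixel if any part of its footprint intersects the primitive:

```python
# Our illustration of standard vs. conservative coverage for one 1x1 pixel.
def standard_covered(px, py, rect):        # sample only the pixel center
    (x0, y0), (x1, y1) = rect
    cx, cy = px + 0.5, py + 0.5
    return x0 <= cx <= x1 and y0 <= cy <= y1

def conservative_covered(px, py, rect):    # any footprint overlap counts
    (x0, y0), (x1, y1) = rect
    return x0 < px + 1 and x1 > px and y0 < py + 1 and y1 > py

sliver = ((0.0, 0.0), (0.3, 1.0))   # thin shape that misses the pixel center
print(standard_covered(0, 0, sliver))       # False: geometry would be lost
print(conservative_covered(0, 0, sliver))   # True: the voxel still registers
```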

 

Step 2: Light Injection

This stage calculates the amount of direct light reflected by the voxels' physical geometry by factoring in a material's opacity, emissive and reflective properties.

Different light sources striking various materials will result in differing levels of reflected light.


For example, the material on the left is solid, while that on the right is a mirror. Further to that, differing light sources also affect the amount of reflected light and even its color.


In this stage, there's a need to analyze the same scene from several viewpoints, such as each face of the voxel cube and different light sources, to determine coverage and lighting levels for each voxel. This is known as multi-projection, and NVIDIA added a hardware feature called "Viewport Multicast" to reduce geometry shader overheads and speed up multi-projection.

In this example, the direct light source is indicated by the yellow dot, which causes light to strike the white walls and some surfaces of the red/green boxes. Each surface then reflects a certain amount of light based on its color and material properties.


 

Step 3: Final Gather

The amount of indirect light gathered in this scene after VXGI based computation.


The last stage is to rasterize the scene with the final, more accurate voxel data structure, which can be used in lighting calculations along with other structures such as shadow maps and more.

VXGI approaches the final calculation of indirect lighting with cone tracing - an approximation of the secondary rays used in the traditionally far more computationally intensive ray tracing method - to deliver a realistic approximation of global illumination.

This graphical representation of cone tracing captures the essence of reducing the complexity of secondary rays and its related calculation from traditional ray tracing techniques.


The intensity with which real-time reflections are calculated on a glossy curved surface is most punishing when using traditional ray tracing, as hundreds of thousands of scattered secondary rays need to be computed for each ray that bounces off the surface. Cone tracing replaces all of that with a handful of voxel cones traced through the voxel grid.

The same approach can be used with fewer cones for specular lighting too. The algorithm is scalable based on the scene at hand, depending on whether image quality or performance takes precedence. As such, cone tracing enables global illumination to be computed at high frame rates in real time to render glossy, metallic and curved surfaces and much more.
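In pseudocode terms, a single cone trace marches through the voxel grid, sampling progressively coarser voxel data as the cone widens, and accumulates light front to back until the cone is fully occluded. The sketch below is our own simplification of the published voxel cone tracing idea, with a dummy scene standing in for the real voxel structure:

```python
import math

# Simplified voxel cone trace: step along the cone axis, sample voxel data
# at a level of detail matching the cone's width, and accumulate radiance
# front to back until the cone is fully occluded.
def trace_cone(sample, origin, direction, half_angle, max_dist=10.0):
    color, occlusion, dist = 0.0, 0.0, 0.1
    while dist < max_dist and occlusion < 1.0:
        radius = dist * math.tan(half_angle)            # cone footprint here
        c, a = sample(origin, direction, dist, radius)  # coarser data as it grows
        color += (1.0 - occlusion) * a * c              # front-to-back blend
        occlusion += (1.0 - occlusion) * a
        dist += max(radius, 0.1)                        # bigger steps as it widens
    return color

# Dummy "scene": a faint emissive haze everywhere.
haze = lambda origin, direction, dist, radius: (0.8, 0.05)
print(round(trace_cone(haze, (0, 0, 0), (0, 0, 1), math.radians(30)), 3))
```

A wide cone with few steps gives soft diffuse lighting cheaply; narrower cones cost more steps but capture sharper, glossier reflections - exactly the quality/performance trade-off described above.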

A variety of voxel cones can be used to help reproduce differing forms of diffuse and specular lighting.


 

The result and debunking a myth

According to NVIDIA, thanks to these added hardware functions for VXGI, it has seen a 3x speed-up in the voxelization process on a popular global illumination test scene with the GeForce GTX 980 (as opposed to disabling these features on the same hardware).

To put VXGI into an actual use case and provide the perfect demonstration of its capability, NVIDIA's engineers attempted to digitally recreate a scene from the Apollo 11 moon landing mission, answering why and how certain traits were seen in the photograph taken on the moon - the very same photograph that has been subjected to a number of conspiracy theories claiming the entire moon landing was staged.

Based on schematics of the Moon Lander, the photograph, and knowledge of the materials and properties that needed to be considered to model the scene virtually as it was 45 years ago on the day of the landing - including the sun's position, the moon's atmosphere and much more - NVIDIA successfully recreated this scene using VXGI, accelerated on the second generation Maxwell architecture. The NVIDIA demo team used the Unreal Engine 4 game engine to build the scene with real-time global illumination, so the point of view updates as you zoom in and out and rotate around the Lander at will.

This is the scene that was faithfully recreated by NVIDIA's engineers with full details on all materials, attributes and lighting information.


To prove that it's not just a pretty photo, here's a look at the scene's voxel data.


Two of the most discussed and debated aspects of this scene are a bright spot of light near Buzz Aldrin (and how well lit he seemed to be), and the photo not showing any stars in the sky. The demo team's recreation conclusively debunked both. It was found that Neil Armstrong's suit reflected quite a bit of light, which further illuminated Aldrin as he was getting off the ladder. Meanwhile, the starless sky was easily accounted for by the camera exposure used to capture what was taking place on the moon's surface; when the demo team digitally adjusted the exposure setting for the scene, they actually found the stars!

If you would like to hear more technical details and experience how it was demonstrated to us, check out this video clip of Tony Tamasi, NVIDIA's senior vice president of content and technology, as he explains to the tech media why VXGI and debunking the Apollo 11 landing myths are both major accomplishments:-

//www.youtube.com/embed/KoERWykUhSU

With that, NVIDIA has successfully explained a number of anomalies seen in the famous photograph and why certain reflected light sources appear as they were captured. In short, it's a true leap in real-time global illumination, handled effortlessly on a single GPU.

More information and the impetus for this project from the demo team can be found here.

 

VXGI support and the reality

While all this certainly sounds good in theory, it will be some time before leading games start to tap the benefits of VXGI. First and foremost, game engines have to be designed with it in mind, and no engine currently supports it out of the box. NVIDIA is, however, working closely with major game engine developers to add this support and help progress the next stage in game realism.

Unreal Engine 4 is closest to having this support, as Epic and NVIDIA are working together on a variant of the UE4 engine that supports VXGI. This doesn't necessarily mean all future UE4-based titles will have VXGI baked in, as it depends on which fork the game developer implements. Obviously, the VXGI-enabled edition seems better for all parties involved, but it will require newer gaming hardware like the GeForce GTX 980 to realize its usefulness, as the second generation Maxwell architecture has hardware acceleration for it.

VR Direct - A suite of VR improvements

Yet another forward-looking feature that the second generation Maxwell architecture incorporates is VR Direct, a suite of enhancements for virtual reality gaming. Obviously, the Oculus Rift has been making waves in the tech industry, and we too have handled it in several demos; its immersive nature will eventually propel it into must-have hardware.

VR gaming is, however, still in its infancy - far from even niche adoption. While we await its arrival in a big way, one area that has yet to be ironed out conclusively is latency. Any perceived delay can severely disrupt the VR experience, such as a mismatch between head movement and what the game displays on your headset. To prevent motion sickness and other VR-related concerns like headaches, latency must be minimized throughout the usage experience and processing pipeline.

NVIDIA estimates that the standard VR pipeline - from when you move your head to when that movement translates into the corresponding on-screen action on your VR display - takes about 50 milliseconds. To reduce head-tracking latency and provide a more real-time VR experience, NVIDIA has improved on the following:-

- Improved the interconnection between the game and the GPU, shaving 10ms off the standard VR pipeline.

- Next, NVIDIA banked on MFAA to further reduce GPU-related latency while delivering the preferred level of performance. Since less overhead and effort are required to attain MSAA-class quality, MFAA liberates the GPU by an additional 4ms.

By improving the software communication layer and building upon GPU efficiency to deliver speedier performance, NVIDIA has helped to reduce latency notably.


But that's not all. NVIDIA is currently working on a technique called Asynchronous Warp that does away with the need for the GPU to re-render each frame from scratch. Asynchronous Warp takes the last rendered image and updates it based on the latest head position communicated by the VR sensor. This makes for a more immersive VR gaming experience, as there's less discontinuity between head movement and action on the screen thanks to the vastly reduced latency.
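Conceptually, it's a reprojection step like the sketch below - a pure yaw-shift approximation of our own making on a toy 10-pixel "frame"; the real technique handles full head pose and fills in newly exposed screen regions:

```python
# Rough sketch of asynchronous warp: instead of re-rendering, shift the last
# frame by how far the head has turned since that frame was drawn.
FOV_DEG = 90.0
WIDTH_PX = 10          # toy frame width; a real eye buffer is ~1,000+ pixels

def warp_shift(yaw_at_render, yaw_now):
    """Horizontal pixel shift approximating a small change in yaw."""
    return (yaw_now - yaw_at_render) * (WIDTH_PX / FOV_DEG)

def present(last_frame_row, shift_px):
    s = int(round(shift_px))
    if s >= 0:                                  # pad the newly exposed edge;
        return last_frame_row[s:] + [0] * s     # real implementations in-fill
    return [0] * -s + last_frame_row[:s]

row = list(range(10))
print(present(row, warp_shift(0.0, 18.0)))  # 18 deg turn -> 2px shift
```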

Asynchronous Warp negates the need for certain time-consuming processes and passes the time savings on as a more immersive VR gaming experience, with less discontinuity between head movement and on-screen action.


Yet another enhancement for VR gaming is how multi-GPU configurations are handled. Traditionally, SLI relied on alternate frame rendering, where each GPU handled odd or even frames respectively. To further lower latency and deliver better performance, VR SLI dedicates each GPU to rendering one display (in essence, the display for each eye).

VR Direct will also work hand-in-hand with DSR to boost gaming quality on existing VR displays. As such, VR DSR is a sub-feature of the VR Direct suite of improvements.

That about sums up the last key feature of the second generation Maxwell architecture in the GM204 chip. To reiterate, both VXGI and VR Direct are forward-looking features that aren't yet implemented in existing games for us to review. Meanwhile, DSR and MFAA work with any game, provided the GeForce Experience software layer is updated to its latest version so its database of games and visual settings recommendations is sound for your hardware. Next, you'll want to know how the best of the newcomers perform, so check out our revised performance review of the GeForce GTX 980 graphics card.

 
