NVIDIA GeForce 8800 GTX / GTS (G80) - The World's First DX10 GPU

By Vijay Anand - 9 Nov 2006

NVIDIA's Unified Architecture - Part 2

So how does this unified stream processor array compare against existing GPUs? Current GPUs are based on vector processing units because many graphics operations work on vector data. Even so, scalar computations are bound to be present even in vector-optimized code. After much shader code analysis during the design of the GeForce 8800 GPU, NVIDIA found that scalar operations are increasingly common in modern, lengthy shader programs. This finding prompted them to equip the GeForce 8800 GPU with 128 scalar processors instead of the 32 4-component vector processors used on its previous high-end GPUs. Also notable is that scalar computations are difficult to execute efficiently on a vector pipeline, whereas the reverse is not a problem: vector-based shader code is simply converted to scalar operations to run on the GeForce 8800 GPU. For reference, a single instruction issued to a vector processor can operate on multiple data elements concurrently, while a scalar processor handles one data item at a time (hence the lower processor count on older GPU designs versus the higher count on the GeForce 8800). Given the nature of current shader code, the change in shader processor type and the high efficiency of the shader processors on the GeForce 8800, NVIDIA claims that it can deliver up to twice the performance on existing DirectX 9 applications. A bold claim indeed, and we'll see to what extent it is achievable.
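
To get a feel for why a scalar design can pay off, here is a rough Python sketch of the argument above. It is only a toy model under our own assumptions: the instruction mix is made up, every instruction costs one issue slot, work divides evenly across units, and the clock speed difference between the two designs is ignored. The function name `compare` and its parameters are ours, not anything from NVIDIA.

```python
# Illustrative model only: the instruction mix and the cost model are
# assumptions, not measurements from NVIDIA or from real shaders.

def compare(widths, vector_units=32, lanes=4):
    """Return (vector_time, scalar_time, vector_lane_utilization).

    widths: components touched by each instruction (1 = scalar, 4 = full vector).
    """
    scalar_units = vector_units * lanes  # 32 x 4 = 128 scalar processors
    # A vector unit spends one issue slot per instruction,
    # however few of its lanes the instruction actually uses.
    vector_time = len(widths) / vector_units
    # A scalar unit spends one issue slot per component.
    scalar_time = sum(widths) / scalar_units
    # Fraction of vector lanes doing useful work.
    utilization = sum(widths) / (len(widths) * lanes)
    return vector_time, scalar_time, utilization

# Hypothetical mix with many scalar and partial-vector instructions.
mixed = [4, 3, 1, 1, 2, 1, 4, 1]
print(compare(mixed))

# Pure 4-component code: both designs take the same time in this model.
print(compare([4] * 8))
```

In this model the two designs tie only when every instruction uses all four components. The more scalar and partial-vector instructions a shader contains, the more vector lanes sit idle, while the scalar array keeps every processor busy.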

The entire array of 128 unified stream processors is organized into groups of 16 stream processors, with each group having its own set of Texture Addressing units, Texture Filtering units (also known as texture mapping units or TMU for short) and associated cache units so that it can function independently. Here's a close-up of one of these groups:

A close-up of a functioning group of stream processors and its associated supporting units.

Eight such groups make up the flagship GeForce 8800 GTX graphics card, while a toned-down version of the G80 core with six such groups active corresponds to the GeForce 8800 GTS. There are other features that differentiate them, but not to worry as we have a proper comparison page later in the article. To manage all of these stream processors and ensure they are optimally used, NVIDIA has incorporated a thread manager of sorts, called GigaThread Technology, that supports thousands of threads in flight, much like ATI's Ultra-Threading Dispatch Processor in the Radeon X1K series.

Besides better utilization of shader processors, which alleviates the earlier performance bottlenecks of vertex- or pixel-shader-bound applications, another reason the unified architecture route was taken is to do more than the standard roles of conventional vertex and pixel shaders. As mentioned above, the stream processors used for the unified shader architecture on the GeForce 8800 take on a more generalized floating-point processing role with a generalized feature set, so they can tackle other data processing functions as well. The unified shader model also breaks away from the traditional linear pipeline stages (vertex --> triangle setup --> pixel --> ROP --> Memory). It allows the unified shader processors to perform multiple shader operations on a data set (e.g. vertices): the results of the initial processing pass are passed back to the top of the shader cores to be dispatched and processed again (maybe for pixel shading this time), then looped to the top once more, and the cycle continues until the processed data is ready to enter the raster operations stage (ROP) and finally be passed on to the frame buffer for scan out. Because of this processing model's flexibility, the GeForce 8800's unified architecture can dynamically allocate processing power to crunch vertex, pixel, geometry, physics and even other forms of workloads in future, all running in parallel at the same time.
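
The benefit of dynamic allocation can be shown with a small Python sketch. Again, this is a simplified model built on our own assumptions: every work item costs the same, vertex and pixel work are treated as independent, and the 8 vertex / 24 pixel split used for the fixed design is a hypothetical example. The function names are ours.

```python
import math

def fixed_split_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Dedicated shaders: each pool can only run its own kind of work,
    so the frame is done only when the slower pool finishes."""
    return max(math.ceil(vertex_work / vertex_units),
               math.ceil(pixel_work / pixel_units))

def unified_time(vertex_work, pixel_work, units):
    """Unified shaders: any unit can pick up any work item."""
    return math.ceil((vertex_work + pixel_work) / units)

# Hypothetical frames, in abstract work items (not real measurements).
frames = {
    "balanced":     (1200, 3600),
    "vertex-heavy": (3200, 1600),
    "pixel-heavy":  (100, 4700),
}

for name, (v, p) in frames.items():
    fixed = fixed_split_time(v, p, vertex_units=8, pixel_units=24)
    unified = unified_time(v, p, units=32)
    print(f"{name}: fixed={fixed} unified={unified}")
```

With the same total of 32 units, the fixed design only matches the unified pool when the workload happens to fit its 8/24 split. When a frame leans heavily towards vertex or pixel work, one pool idles while the other becomes the bottleneck, which is exactly the situation the unified architecture is meant to avoid.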

The unified architecture conceptual schematic.

Geometry and Physics. These are the newly defined workloads made possible by the 128 stream processors on the GeForce 8800 GTX working in conjunction with GigaThread technology. Geometry processing capability is required by the Shader Model 4.0 standard and we'll soon touch on that in our overview of the DirectX 10 standard. Not much is known yet of the physics processing capability, but it is a bonus of the unified architecture that NVIDIA has dubbed Quantum Effects Technology. How it would be put to use and under what API has not been discussed in detail yet. Still, with both NVIDIA and ATI adopting a generalized array of shader processors to do their bidding, we believe game developers will find it much more palatable to build the next level of interactive and visually pleasing environmental effects on these GPUs than to build games supporting the Ageia physics processor.

The stream processors on the GeForce 8800 series are driven by their own high-speed clock, which is separate from the core clock that drives the rest of the GPU. The GeForce 8800 GTX has a core clock of 575MHz while its stream processors operate at 1.35GHz - yet another reason why NVIDIA claims high shader throughput, benefiting the overall performance of the card. The lesser GeForce 8800 GTS has core and stream processor clocks of 500MHz and 1.2GHz respectively.
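
For a quick sense of how the two cards stack up on paper, the group counts and clocks quoted in this article can be plugged into a few lines of Python. The one big assumption here is ours: we count one scalar operation per stream processor per shader clock, which is a simplification and not an official NVIDIA throughput figure.

```python
SPS_PER_GROUP = 16  # stream processors per group, as described above

def stream_processors(groups):
    return groups * SPS_PER_GROUP

def scalar_ops_per_second(groups, shader_mhz):
    # Assumption: one scalar operation per stream processor per shader clock.
    return stream_processors(groups) * shader_mhz * 1_000_000

# (groups, core clock in MHz, shader clock in MHz) as quoted in the article
cards = {
    "GeForce 8800 GTX": (8, 575, 1350),
    "GeForce 8800 GTS": (6, 500, 1200),
}

for name, (groups, core_mhz, shader_mhz) in cards.items():
    sps = stream_processors(groups)
    rate = scalar_ops_per_second(groups, shader_mhz) / 1e9
    ratio = shader_mhz / core_mhz
    print(f"{name}: {sps} SPs, {rate:.1f} G scalar ops/s, "
          f"shader clock is {ratio:.2f}x the core clock")
```

Under that assumption, the GTX has 1.5 times the raw scalar issue rate of the GTS, coming from both the two extra groups and the higher shader clock.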
