
ATI Radeon HD 2900 XT 512MB (R600)

By Vincent Chang - 14 May 2007

Architectural Enhancements Part 1

Second Generation Unified Shader Architecture

As ATI would happily inform you, the R600 is the company's second attempt at creating a unified shader architecture. The first, of course, was the custom Xenos GPU for Microsoft's Xbox 360. Unsurprisingly, the new graphics cards benefit from ATI's previous experience with both the Xenos and the Radeon X1000 series, and inherit elements from each. Although it may not be strictly accurate to say so, you could think of the R600 architecture as a 'spiritual' cross between the Xenos and the Radeon X1000.

An architectural overview of the R600 core.


Like the GPU in the Xbox 360, there is no longer any delineation between pixel and vertex shaders in the R600. Instead, there is a new setup engine with three separate functions that categorize instructions according to their nature (vertex, geometry and pixel). These instructions are then pushed to the new ultra-threaded dispatch processor (found in some form in both the Xenos and the Radeon X1000), which handles all the incoming instruction threads and decides which ones to execute first through internal arbiter units. Finally, the chosen instructions are sent to the SIMD arrays of stream processing units to be worked on. Enhancements to increase efficiency here include dedicated shader caches that allow for unlimited shader length, and dedicated arbiter units for texture and vertex fetches. Threads are also suspended if the data they require has not yet arrived, with priority going to other threads until it does. This keeps the SIMD arrays busy as much as possible.
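To see why suspending stalled threads helps, here is a toy sketch of the idea in Python. It is not ATI's dispatch logic: the one-instruction-per-cycle model, the three-step program, the fetch latency of four cycles and the name `simulate` are all assumptions made purely for illustration.

```python
def simulate(num_threads, program=("fetch", "alu", "alu"), fetch_latency=4):
    """Toy arbiter: issue one instruction per cycle from the first runnable thread.

    A thread that issues a "fetch" is suspended until its data arrives,
    and other threads get priority in the meantime.
    Returns (total_cycles, idle_cycles).
    """
    pc = [0] * num_threads        # next instruction index for each thread
    ready_at = [0] * num_threads  # cycle at which each thread may run again
    cycle = 0
    idle = 0
    while any(p < len(program) for p in pc):
        runnable = [i for i in range(num_threads)
                    if pc[i] < len(program) and ready_at[i] <= cycle]
        if runnable:
            i = runnable[0]
            if program[pc[i]] == "fetch":
                # Suspend this thread until the (hypothetical) fetch completes.
                ready_at[i] = cycle + 1 + fetch_latency
            pc[i] += 1
        else:
            idle += 1             # nothing to issue: the array sits idle
        cycle += 1
    return cycle, idle


print(simulate(1))  # (7, 4): a lone thread leaves the array idle for 4 cycles
print(simulate(5))  # (15, 0): five threads hide the fetch latency completely
```

With only one thread in flight, more than half of the cycles are wasted waiting on the fetch. With enough threads to switch to, the same latency is hidden entirely, which is the behaviour the dispatch processor is after.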

So what's in one of these SIMD arrays? Well, you will find numerous stream processing units, which are grouped in fives to form multiple 5-way superscalar shader processors. Therefore, while ATI states that there are a total of 320 stream processing units in the Radeon HD 2900 XT, it is actually 'counting' differently from NVIDIA's definition of stream processors. If you go by NVIDIA's accounting, there are only 64 shader processors (versus 96 and 128 on the 8800 GTS and GTX respectively). This is because the SIMD array basically follows a VLIW (Very Long Instruction Word) design: each shader processor can perform up to five operations (excluding the flow control operation) on the data in parallel, but only in the most ideal situation. Only if the processing load is ideal, and the dispatch processor and branch execution unit manage to keep all the processing units of all the shader processors loaded, do ATI's 320 stream processing units have a distinct advantage. This is the reason why ATI's seemingly more powerful configuration doesn't actually surpass the performance of the GeForce 8800 GTX. Delving into a little more detail, each shader processor's group of five stream processing units is supported by general-purpose registers and branch execution units that handle flow control and conditional logic, so that the stream processing units don't need to handle these tasks, which would otherwise add processing overhead. Again, the idea is to keep them working constantly on the important stuff.
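The counting difference is simple arithmetic, sketched below in Python. The 320 and 5 figures come from the text above; the function names and the notion of "slots filled per instruction word" are a simplified model of our own, which ignores clock speeds, the flow control unit and everything else that affects real performance.

```python
TOTAL_STREAM_UNITS = 320  # ATI's headline figure for the Radeon HD 2900 XT
LANES_PER_PROCESSOR = 5   # stream processing units grouped in fives


def shader_processor_count(total_units=TOTAL_STREAM_UNITS,
                           lanes=LANES_PER_PROCESSOR):
    """Number of 5-way shader processors, i.e. the NVIDIA-style count."""
    return total_units // lanes


def busy_units(slots_filled, total_units=TOTAL_STREAM_UNITS,
               lanes=LANES_PER_PROCESSOR):
    """Units doing useful work if every instruction word fills `slots_filled` lanes."""
    filled = min(max(slots_filled, 0), lanes)
    return shader_processor_count(total_units, lanes) * filled


print(shader_processor_count())  # 64
print(busy_units(5))             # 320: the ideal, fully packed case
print(busy_units(3))             # 192
print(busy_units(1))             # 64: purely serial, dependent operations
```

In this model, the gap between 64 and 320 busy units depends entirely on how many independent operations can be packed into each instruction word.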


Image Quality Enhancements

We all still go gaga over the special effects found in big budget Hollywood movies that require massive rendering farms to make us believe the illusion. Desktop graphics may not have reached that stage yet, but ATI has sought to bring us a step closer by including dedicated and programmable tessellation hardware onboard the R600 core, developed from the tessellation hardware in the Xbox 360. As they say, the devil is in the details, and ATI's tessellation hardware helps to generate these details from more basic information. Hence, developers can rely on the GPU to recreate more detailed animation and terrain effects from originally primitive data. This not only takes a sizable portion of the workload away from the CPU, but is also faster by orders of magnitude. It could also allow developers to focus on the gameplay and plot rather than eye candy. Of course, much of this depends on whether developers will warm to it, though it does seem as if both gamers and developers would benefit from having such hardware. This is a feature that's unfortunately of no use today, but it holds a lot of promise for the near future if embraced. Perhaps we may even see indie or budget games featuring such detailed eye candy in the future.

From the example provided by ATI, a developer just needs to provide the coarse data in the first image and rely on the GPU to do the tessellation and displacement to add detail to the image.
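The two steps in that caption, tessellation followed by displacement, can be sketched in a few lines of Python. This is a 2D profile rather than a real mesh, and it is our own illustration, not ATI's hardware algorithm; `tessellate`, `displace` and the sine-based height function are hypothetical stand-ins for the tessellator and a displacement map.

```python
import math


def tessellate(points, factor):
    """Split each segment of a coarse polyline into `factor` equal pieces."""
    fine = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for k in range(factor):
            t = k / factor
            fine.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    fine.append(points[-1])  # keep the final coarse vertex
    return fine


def displace(points, height_fn, scale=1.0):
    """Push each vertex up or down by a height looked up at its x position."""
    return [(x, y + scale * height_fn(x)) for x, y in points]


# The developer supplies only three coarse vertices...
coarse = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
# ...and the extra vertices and detail are generated from them.
fine = tessellate(coarse, 4)
bumpy = displace(fine, lambda x: math.sin(4.0 * x), scale=0.1)
print(len(coarse), "->", len(fine), "vertices")  # 3 -> 9 vertices
```

The point is that only the coarse data needs to be authored and sent to the GPU. The finer geometry exists only after the tessellation step.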

Next, we also see a number of improvements in the texture units found on the R600, which we will briefly mention here. HDR textures can now be filtered bilinearly up to seven times faster than on the Radeon X1000 series, while ATI also claims improvements in high quality anisotropic filtering. Textures are now supported up to extremely high resolutions of 8192 x 8192, though we haven't really seen how that could translate into concrete applications just yet. ATI has also implemented Vertex Texture Fetch; previously, in the Radeon X1000 series, it had offered an alternative method - Render to Vertex Buffer - that isn't strictly Shader Model 3.0 compliant. The new hardware brings it on par with NVIDIA and removes any lingering doubts over its support for the current Shader Model 4.0 standard.
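To put the 8192 x 8192 figure in perspective, a quick back-of-the-envelope calculation in Python helps. The two texel formats below (4 bytes for 8-bit RGBA, 8 bytes for 16-bit floating point RGBA) are our own choice of examples, and the sketch assumes uncompressed data with no mipmaps.

```python
def texture_bytes(width, height, bytes_per_texel):
    """Size of a single uncompressed texture level, in bytes."""
    return width * height * bytes_per_texel


MIB = 1024 * 1024
print(texture_bytes(8192, 8192, 4) // MIB, "MiB")  # 256 MiB for 8-bit RGBA
print(texture_bytes(8192, 8192, 8) // MIB, "MiB")  # 512 MiB for FP16 RGBA
```

Under these assumptions, a single texture at the maximum size would take up half or all of the 512MB on this card, which may explain why practical uses are not obvious yet.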

Finally, there has also been some work done on anti-aliasing (AA), with two new modes introduced, namely 8x Multi-sample AA and 24x Custom Filter AA. Of the two, the more interesting mode is the 24x Custom Filter AA. Basically, this mode extends the usual 'box' method of doing AA, where a given image is divided into a grid of rectangular boxes and samples are taken from within each box. Custom Filter AA increases the 'coverage' area of the sampling by taking samples from a circular area larger than the box. This approach means that more sampling points are taken and, together with the overlaps, it should provide a smoother rendered image. There are a few custom filter modes, like Narrow and Wide Tent, with slightly different coverage areas for each. The user can select these modes from the new Catalyst Control Center for the Radeon HD 2000 series. Additionally, ATI has implemented an adaptive edge detection filter that runs an edge detection pass over the rendered image to determine where its edges are. For these edges, extra samples will be taken with higher quality filters (e.g. 16x Wide Tent CFAA), while non-edge areas can still utilize the usual box filter with fewer samples. Intuitively, this makes sense, as the whole idea of AA is to 'reduce jaggies'. This balanced way of handling edges should also ensure that performance will not take a backseat to image quality, and that users will not encounter a massive drop in framerates on enabling higher quality AA modes.
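The adaptive idea can be sketched as a simple pass over a grayscale image in Python. ATI has not detailed its edge detection filter here, so the neighbour-difference test, the 0.25 threshold and the name `edge_mask` below are all assumptions for illustration only.

```python
def edge_mask(image, threshold=0.25):
    """Flag pixels that differ sharply from their right or lower neighbour."""
    height, width = len(image), len(image[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < height and nx < width:
                    if abs(image[y][x] - image[ny][nx]) > threshold:
                        mask[y][x] = True    # both sides of the jump
                        mask[ny][nx] = True  # count as edge pixels
    return mask


# A dark region meeting a bright region along a vertical edge.
image = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
mask = edge_mask(image)
print(mask[0])  # [False, True, True, False]
edge_pixels = sum(row.count(True) for row in mask)
print(edge_pixels, "of", len(image) * len(image[0]), "pixels need the costlier filter")
```

Only the flagged pixels would get the extra samples and the higher quality filter, so the cost scales with the amount of edge in the image rather than with its full resolution.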

ATI explains how CFAA works using this simple illustration of how it 'covers' and overlaps a larger area than the typical box filter (which samples only from within each grid) for greater accuracy.
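The difference between the two filters can be shown with a one-dimensional sketch in Python. The sample positions, the linear 'tent' weighting and the radius of one pixel are our own assumptions; ATI's actual sample patterns and filter weights are not given here, and the real filter works over a circular area in two dimensions.

```python
def box_resolve(samples, pixel):
    """Average only the samples that fall inside the pixel [pixel, pixel + 1)."""
    inside = [value for pos, value in samples if pixel <= pos < pixel + 1]
    return sum(inside) / len(inside)


def tent_resolve(samples, pixel, radius=1.0):
    """Weighted average of all samples within `radius` of the pixel centre.

    Weights fall off linearly with distance, so samples from neighbouring
    pixels contribute, but less than samples near the centre.
    """
    centre = pixel + 0.5
    weighted_sum = 0.0
    weight_total = 0.0
    for pos, value in samples:
        weight = max(0.0, 1.0 - abs(pos - centre) / radius)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two samples per pixel across three pixels, with a hard edge at x = 1.0.
samples = [(0.25, 0.0), (0.75, 0.0),
           (1.25, 1.0), (1.75, 1.0),
           (2.25, 1.0), (2.75, 1.0)]
print(box_resolve(samples, 1))   # 1.0: only this pixel's own two samples
print(tent_resolve(samples, 1))  # 0.875: four samples, neighbours blended in
```

For the pixel next to the edge, the box filter uses two samples while the tent filter draws on four, which softens the transition.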
