NVIDIA GeForce 6200 with TurboCache
With the new TurboCache technology, NVIDIA has lined up two interesting entry-level graphics cards based on a modified GeForce 6200 GPU. Dive into this article to find out how TurboCache works, what its benefits are and the impact it makes.
By Vijay Anand
Leveraging The PCI Express x16 Interface
The current crop of PCI Express (PCIe) graphics cards is not too different from their AGP counterparts, with the exception that they have been redesigned to communicate over the new PCIe x16 interface. The immediate gain is the vast bandwidth that the PCIe x16 slot offers and, of course, faster transfers since PCIe operates at higher base frequencies than the AGP 8x slot. However, these cards have yet to exploit everything the new bus offers. The PCIe bus also allows concurrent bidirectional communication to and from the host controller with equal bandwidth in each direction. AGP allows bidirectional communication too, but only in half duplex mode (unlike PCIe's full duplex mode), and its upstream bandwidth is insignificant compared to the AGP 8x downstream. For those of you who don't quite recall what full and half duplex represent, the former allows data transfer in both directions simultaneously while the latter restricts transfers at any point of time to a single direction. Here's a table to easily differentiate the key advantages:-
| Graphics Card Interface | Total Bandwidth | Bidirectional Transfers? | Full Duplex Mode Transfers? | Max. Downstream Transfer Rate | Max. Upstream Transfer Rate | 
| PCI Express x16 | 8.0GB/s | Yes | Yes | 4.0GB/s | 4.0GB/s | 
| AGP 8x | 2.1GB/s | Yes | No | 2.1GB/s | 266MB/s | 
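For readers who like to see where the headline figures in the table come from, here is a quick back-of-the-envelope check based on the published PCI Express 1.x signalling rates (our own sketch, not an NVIDIA-supplied calculation):

```python
# PCI Express 1.x: 2.5Gb/s raw signalling per lane, per direction,
# with 8b/10b encoding (10 bits on the wire per data byte).
raw_gbits_per_lane  = 2.5                      # Gb/s, each direction
payload_gb_per_s    = raw_gbits_per_lane / 10  # 0.25GB/s per lane after 8b/10b
lanes               = 16

downstream = payload_gb_per_s * lanes          # 4.0GB/s to the card
upstream   = payload_gb_per_s * lanes          # 4.0GB/s from the card (full duplex)
total      = downstream + upstream             # 8.0GB/s aggregate

print(downstream, upstream, total)             # 4.0 4.0 8.0
```

AGP 8x, by contrast, tops out at roughly 2.1GB/s (66MHz base clock x 8 transfers per clock x 4 bytes) and, being half duplex, only in one direction at a time.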
In reality, current PCI Express graphics cards only perform limited upstream transfers because existing games and applications were designed around the old PCI and AGP architectures. Back in June during Computex 2004, demos were being run to showcase the bidirectional bandwidth advantage of the PCI Express bus in the realms of high definition (HD) video capture, editing and playback of multiple HD video streams. Although this is something to look forward to on the average desktop computer, it requires both hardware and software to mature, and at this point of time it's not quite ready for mass usage. Besides, it would only benefit a select user group dabbling in high quality video.
Since the introduction of the first PCI Express graphics cards, both ATI and NVIDIA have been trying to put all that extra bandwidth to better use, harking back to the early AGP days when the Direct Memory Execute (DIME) feature enabled AGP Texturing. This allowed system memory to be used to store large textures to be read by the AGP graphics card. Practical implementation of AGP Texturing in gaming, however, didn't turn out as useful as envisioned, and graphics cards steadily bumped up local frame buffer memory to avoid the need for that feature. That's all in the past though, and with the new PCIe x16 interface, things are looking up as we venture into the first such useful implementation by NVIDIA, now dubbed TurboCache technology, which does a lot more than AGP's DIME.
Evolution Of TurboCache
Not more than two months ago, the GeForce 6200 was unveiled as NVIDIA's entry-level GPU for the GeForce 6 series. It has four pixel pipelines and three vertex shader units; to put that performance expectation into perspective, the midrange GeForce 6600 series has eight pixel pipelines and three vertex shaders. Most of the important features from the GeForce 6600 and GeForce 6800 series, such as the CineFX 3.0 engine, UltraShadow II and many more, are present, but a few did go under the axe, such as color and Z compression. FSAA performance does take a notable dive, but the GeForce 6200 GPU was designed for average mainstream users who do not use anti-aliasing, so it's not a big loss. Additionally, NVIDIA High-Precision Dynamic-Range (HPDR) Technology is also omitted, again to save on die size, since the entry-level segment won't really be utilizing such features, much less at speeds the hardware could comfortably handle. Typical memory configuration would be 128MB of local memory on a 128-bit memory interface.
With an average SRP of US$140, it was seated somewhat closer to the midrange and wasn't priced low enough to be a true entry-level product. Today, two more variants join the GeForce 6200 lineup: the GeForce 6200 with TurboCache with 16MB (32-bit) and 32MB (64-bit) of local memory, at US$99 and US$129 respectively.
This is the GeForce 6200 with TurboCache (16MB local memory). Besides passive cooling that is ideal for quiet media center PCs and other slim system integration, the graphics card offers the standard TV-output, analog DB-15 VGA connector and DVI-I connector.
As the rear of the card shows, the 16MB version only requires one DDR chip in front and the rear is mostly bare. The 32MB version uses two memory chips with the second chip located at the rear of the card.
With the heatsink removed, the GeForce 6200 GPU is revealed.
Here's a close up of the GPU.
The rear of the heatsink.
The memory is a Samsung 128Mbit x32 device clocked at 350MHz, commonly referred to as 700MHz in DDR terminology.
How TurboCache Works And The Benefits
The TurboCache technology was engineered such that graphics cards need only be equipped with a tiny amount of local memory (as on the two new cards), while additional memory is borrowed from the system's main memory as and when it is required to fulfill the complete frame buffer needs. The beauty of this technology is that it treats both the graphics card's local memory and the system memory as a single frame buffer. For now, TurboCache is designed for a combined frame buffer size of 128MB, and in the case of the new GeForce 6200 models with TurboCache, they require at most 112MB (16MB model) or 96MB (32MB model) of your main system memory. It's somewhat like Intel's Dynamic Video Memory Technology (DVMT), except that your system properties will always report the system's full installed memory size and the graphics adapter will always report a full frame buffer size of 128MB, even though it may not be using anything beyond the card's local memory when sitting at the Windows desktop.
The fact of the matter is that system memory is dynamically allocated to the graphics adapter when required and de-allocated back to the system as soon as it is no longer needed. To manage this, a new TurboCache Manager (TCM) sits at the software (driver) level and works seamlessly with the new memory management unit (MMU) built into the GeForce 6200 with TurboCache.
Besides efficient texturing from system memory, another very crucial aspect of NVIDIA's TurboCache is that it enables rendering to and from system memory directly. The PCIe x16 interface's large upstream bandwidth (equal to its 4.0GB/s downstream) is definitely an enabler of this feature, and together with NVIDIA's patented hardware and software technology, it is now a reality. Essentially, this allows almost all the data and buffers (back, Z, shadow, glow, blur, etc.) to be dynamically located either in local memory or in system memory, with the exception of the front buffer, which holds the data to be refreshed to the screen. With the ability to read and write seamlessly and simultaneously to both system and local memory, TurboCache effectively extends the graphics card's frame buffer and saves on both graphics memory cost and power consumption. The use of TurboCache-based GPUs in future notebooks was strongly highlighted by NVIDIA and it is not difficult to see why, as every watt saved goes a long way in extending battery life (not to mention reducing the space, complexity and cost of the graphics module).
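To make the memory accounting a little more concrete, the sketch below models the idea in plain Python. It is purely illustrative, the class and method names are our own invention and this is not how NVIDIA's driver is actually written, but it captures the two rules described above: system memory is borrowed only while a surface needs it and is returned as soon as it is freed, and the front buffer always stays in local memory.

```python
# Conceptual sketch of TurboCache-style memory accounting (names are hypothetical).
LOCAL_MB    = 16          # local frame buffer on the 16MB card
COMBINED_MB = 128         # total frame buffer advertised to applications

class TurboCacheSketch:
    def __init__(self):
        self.local_used = 0   # MB allocated from on-card memory
        self.system_used = 0  # MB currently borrowed from main memory

    def alloc(self, name, size_mb):
        """Place a surface: the front buffer must be local, others go wherever fits."""
        if name == "front" or self.local_used + size_mb <= LOCAL_MB:
            assert self.local_used + size_mb <= LOCAL_MB, "front buffer must fit locally"
            self.local_used += size_mb
            return "local"
        # Otherwise borrow from system memory, up to the 128MB combined limit.
        assert self.local_used + self.system_used + size_mb <= COMBINED_MB
        self.system_used += size_mb
        return "system"

    def free(self, location, size_mb):
        """Return memory immediately so the OS gets borrowed pages back."""
        if location == "local":
            self.local_used -= size_mb
        else:
            self.system_used -= size_mb

tc = TurboCacheSketch()
print(tc.alloc("front", 8))      # 'local'  - the scan-out surface stays on the card
print(tc.alloc("z-buffer", 16))  # 'system' - no room left locally, so borrow
print(tc.system_used)            # 16 (at most 112MB is ever borrowed on this card)
```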
GPU Optimizations For Optimal TurboCache Performance
Now, if you stop to consider the locality of the graphics card's memory in contrast to the system's main memory, the most obvious problem despite the large bandwidths would be latency. Latency differences alone have already shown that the initial batch of DDR2-400/533 system memory is slower than stock DDR400 memory modules, and the gap we are dealing with here is even larger. A graphics card's local memory offers near-instant access, while main memory operates at much higher latencies, sits much further away and requires intervention from another memory controller (be it in the chipset or, in the case of the AMD64 platform, in the CPU).
A regular graphics processor is architected to access a local frame buffer that responds almost immediately, so simply crippling such a GPU with limited local memory and expecting the PCIe bus and system memory to cater to its needs would starve the GPU while it waits for data to work on. Fundamentally, you have to go back to the basics of pipelining and scheduling and come up with a design based on the known data load parameters, the wait states involved, external influences and so on. For example, longer pipelines and deeper buffers can help hide latencies to a certain extent (though that has its own limitations and consequences). NVIDIA was well aware of these issues and modified the initial GeForce 6200 architecture to suit the new GeForce 6200 with TurboCache, with amendments to the pipeline execution units and the memory management units.
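As a simple illustration of why deeper buffering matters, a memory interface only stays busy if roughly bandwidth x latency worth of requests is kept in flight, so the further away the memory sits, the more outstanding work the pipeline has to hold. The latency figures below are round, illustrative numbers of our own choosing, not measurements of these cards:

```python
# Rule of thumb (Little's law): data in flight = bandwidth * latency.
# GB/s multiplied by ns conveniently works out to bytes.
def bytes_in_flight(bandwidth_gb_s, latency_ns):
    return bandwidth_gb_s * latency_ns

local_dram  = bytes_in_flight(2.8, 50)   # ~140 bytes outstanding keeps local memory busy
system_dram = bytes_in_flight(4.0, 500)  # ~2000 bytes outstanding needed via PCIe + chipset

print(local_dram, system_dram)           # 140.0 2000.0
```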
TurboCache technology is also aware of your currently installed system memory and will only support a combined 128MB frame buffer if your system has 512MB of main memory or more. Any less and it will reduce its total frame buffer size proportionately.
This is the typical 3D pipeline of the original GeForce 6200 GPU.
This is the same 3D pipeline of the GeForce 6200 with TurboCache. The yellow portions are where NVIDIA made architectural changes, namely in the pipelines, the memory management units and the raster operations (ROP) units.
Further Information On GeForce 6200 With TurboCache
The two GeForce 6200 models with TurboCache are both marketed as supporting 128MB of memory but are differentiated by their total memory bandwidth. Remember the first model with 16MB of local memory? It uses a single DDR memory chip of 128Mbit density in a 32-bit input/output configuration (x32). So although the GeForce 6200 with TurboCache GPU has a 64-bit memory interface, the use of a single 32-bit device limits its bandwidth. Operating at a 350MHz clock, or effectively 700MHz in DDR terminology, it has a local memory bandwidth of 2.8GB/s. However, don't forget to take into account the PCIe x16 interface bandwidth; the TurboCache GPUs utilize both the upstream and downstream bandwidth, which accounts for a full 8.0GB/s. Hence, according to NVIDIA, this GeForce 6200 with TurboCache has a total effective bandwidth of 10.8GB/s. Likewise, the GeForce 6200 with TurboCache having 32MB of local memory (which uses two 128Mbit x32 chips) has a total effective memory bandwidth of 13.6GB/s.
Of course, the argument here is that main memory bandwidth varies from system to system depending on the memory modules used, which could be anything from DDR333 to the other extreme of DDR500/550 or DDR2-533/600/700. Besides that, the main memory is also kept busy by the CPU and the other requests it has to juggle in any normal system usage. Due to these complexities, NVIDIA has simply chosen to factor in only the maximum data throughput possible on the PCIe x16 interface.
Using that as the basis of comparison for total memory bandwidth on a graphics card, non-TurboCache graphics processors are credited with 4.0GB/s less bandwidth on the PCIe x16 interface, since they adopt the traditional graphics architecture where mainly the downstream bandwidth is utilized and upstream transfers are sparse.
So here's how the various entry-level PCIe graphics cards stack up at this point of time:-
| Graphics Cards | GPU clock | Local Memory DDR clock | Total Frame Buffer Size | Total Memory Bandwidth (including PCIe x16 interface) | 
| NVIDIA GeForce 6200 with TurboCache (16MB - 32bit) | 350MHz | 700MHz DDR | 128MB | 10.8GB/s | 
| NVIDIA GeForce 6200 with TurboCache (32MB - 64bit) | 350MHz | 700MHz DDR | 128MB | 13.6GB/s | 
| NVIDIA GeForce 6200 128MB (128-bit) | 300MHz | 550MHz DDR | 128MB | 12.8GB/s | 
| ATI RADEON X300 SE 128MB (64-bit) | 325MHz | 400MHz DDR | 128MB | 7.2GB/s | 
| ATI RADEON X300 128MB (128-bit) | 325MHz | 400MHz DDR | 128MB | 10.4GB/s | 
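As a quick sanity check on the bandwidth column above, each total is simply the local memory bandwidth (bus width x effective DDR rate) plus the PCIe x16 allowance described earlier (8.0GB/s for TurboCache parts, 4.0GB/s downstream only for everything else). A small sketch that reproduces the figures:

```python
# Reproduce the "Total Memory Bandwidth" column: local bandwidth + PCIe allowance.
def total_bandwidth(bus_bits, ddr_mhz, turbocache):
    local = (bus_bits / 8) * ddr_mhz / 1000   # GB/s from local memory
    pcie  = 8.0 if turbocache else 4.0        # GB/s credited for the PCIe x16 link
    return round(local + pcie, 1)

print(total_bandwidth(32, 700, True))    # 10.8 - GeForce 6200 TC (16MB, 32-bit)
print(total_bandwidth(64, 700, True))    # 13.6 - GeForce 6200 TC (32MB, 64-bit)
print(total_bandwidth(128, 550, False))  # 12.8 - GeForce 6200 128MB (128-bit)
print(total_bandwidth(64, 400, False))   # 7.2  - RADEON X300 SE (64-bit)
print(total_bandwidth(128, 400, False))  # 10.4 - RADEON X300 (128-bit)
```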
As future games utilize more complex shader routines and take advantage of the programmable nature of the DirectX 9 standard to render the final output, more work is shifted to the GPU, with a reduced need for the multiple frame buffer passes that would be required if DirectX 8 functions were used instead. Although DirectX 8 ushered in the era of programmable shaders, its flexibility was very limited in contrast to DirectX 9. Therefore, with lower frame buffer needs, we may one day even see TurboCache technology being utilized beyond just entry-level graphics card solutions. Time will tell, depending on how upcoming games evolve and how they are programmed.
Do Better Motherboard Chipsets Equate To Better Performance?
Well, although the title is self-explanatory, the 'issue' here, according to NVIDIA, is that the Intel Grantsdale platform is only capable of 3.0GB/s downstream and 1.0GB/s upstream on the PCIe x16 interface, while NVIDIA's nForce4 is able to deliver the full 8.0GB/s bandwidth. As a result, NVIDIA claims that TurboCache would have greater benefits on its platform than on Intel's Grantsdale. There is, however, no open information in Intel's technical datasheets about this limitation. We don't wish to speculate for the moment, as the use of different CPUs and memory controller configurations alone would give rise to different performance results. (Ed. - Now where's NVIDIA's chipset for the LGA775?)
Test Setup
Our PCI Express graphics card testbed is configured with an Intel Pentium 4 3.2GHz Extreme Edition and 1GB of DDR2-533 memory operating in dual channel mode on an Intel D925XCV motherboard. The test system is installed with Microsoft Windows XP Professional (with Service Pack 2) and DirectX 9.0c.
We tested both versions of the new GeForce 6200 with TurboCache technology (16MB and 32MB local memory) using the new NVIDIA ForceWare 71.20 drivers. We've also included results from the original GeForce 6200 with 128MB, which was tested with the ForceWare 66.81 driver set. For comparison with ATI's entry-level products, we have performance numbers from an ATI RADEON X300 SE 128MB, tested on the Catalyst 4.10 driver release. This is an ideal gauge for the GeForce 6200 with TurboCache (16MB local memory) as it is pitched in the same segment as the RADEON X300 SE.
For assessing these graphics cards, our test suite consisted of the following benchmarks and game tests:-
- Futuremark 3DMark2001 SE Pro (version 330)
- Futuremark 3DMark03 Pro (version 340)
- Star Trek: Elite Force 2
- Command & Conquer: Generals
- Unreal Tournament 2003 Demo
- Unreal Tournament 2004 Demo
- Codecult's Codecreatures
- Halo: Combat Evolved
- AquaMark 3 benchmark
- FarCry 1.1
- Doom 3
Before we proceed further, here is a screenshot of the driver display properties for one of the cards:-
This is the new overall display properties page. As mentioned earlier in the review, the GeForce 6200 with TurboCache reports only the combined frame buffer size of 128MB.
Results - 3DMark2001 SE Pro (Build 330)
A quick glance at 3DMark2001 SE suggests that the NVIDIA GeForce 6200 with TurboCache is delivering as expected. It would have been swell if we had the non-SE version of the RADEON X300 in this article, but we will soon be revisiting these entry-level graphics cards in a proper follow-up article. For now, having at least one direct competitor is a decent gauge to assess whether the TurboCache technology lives up to expectations.
Inferring from the numbers obtained in this benchmark, these entry-level PCIe graphics cards can handle DirectX 7 class games at fairly high frame rates. More on this when we show you gaming performance from the Star Trek game.
Results - 3DMark03 Pro (Build 340, No FSAA)
3DMark03's harsh DirectX 8 and 9 routines are tough on the TurboCache-enabled graphics cards, and you can see that the difference between the GeForce 6200 with TurboCache (32MB local memory) and the normal GeForce 6200 with 128MB of memory is much wider than in 3DMark2001 SE. In comparison with its ATI counterpart, however, the 16MB version of the TurboCache graphics card still performs to expectations or better.
Results - Star Trek: EF2 (OpenGL Benchmark)
As mentioned a few pages earlier, the NVIDIA GeForce 6200 TurboCache-enabled graphics cards perform rather well in this older OpenGL title. Although the game is based on the old Quake3 engine, which has been heavily modified to support a higher class of OpenGL extensions, it is among the most taxing games built on this class of engine. Having said that, the results look even sweeter.
Results - C&C: Generals (Direct3D Benchmark)
In Command & Conquer: Generals, the 16MB local memory TurboCache-enabled graphics card again outshone the ATI RADEON X300 SE 128MB. Of course, we realize that the GeForce 6200 series has better hardware specifications and operates at faster clock speeds than the RADEON X300 SE, but it does have the 'disadvantage' of not having the entire frame buffer on the card itself. Looking at the results, we recommend that light gamers target at least the 32MB local memory version of the GeForce 6200 with TurboCache for 3D real-time strategy games.
Results - Unreal Tournament 2003 (Direct3D Benchmark)
In Unreal Tournament 2003, it looks like ATI's RADEON X300 SE has the upper hand over the GeForce 6200 with TurboCache (16MB local frame buffer). The performance of the 32MB local frame buffer version is rather surprising, as it was able to keep up with the traditional 128MB GeForce 6200 graphics card. We'll see if these trends continue on the following page with the newer Unreal Tournament 2004.
Results - Unreal Tournament 2004 (Direct3D Benchmark)
In Unreal Tournament 2004, the sub-US$99 segment is hotly contested between the RADEON X300 SE and the GeForce 6200 with TurboCache (16MB local memory), but ATI's solution still edges it out in this test.
Comparing the GeForce 6200 with TurboCache (32MB local memory) and the normal GeForce 6200 128MB, the more demanding nature of this newer game saw the TurboCache version running slower at high resolutions.
Results - Codecreatures (Direct3D Benchmark)
Codecult's Codecreatures game engine is one of the few tests that requires a 128MB frame buffer to run. It will still operate on a 64MB AGP graphics card, but it has to rely on AGP texturing for further texture lookups, and that means a huge performance penalty. We thought this would be an excellent test of the efficiency of TurboCache's implementation and indeed, the low-resolution results against the RADEON X300 SE are pretty self-explanatory. The GeForce 6200 with TurboCache (16MB local memory) isn't really cut out for high-resolution gaming purely because of its limited local frame buffer size, hence the results obtained are quite expected. The 32MB version fared much better most of the time and is sufficiently efficient considering it relies on TurboCache technology to deliver its results.
Results - Halo: Combat Evolved & AquaMark3 (DirectX 9 Benchmarks)
The results from Halo are even more encouraging to say the least and the GeForce 6200 with TurboCache (32MB local memory) was even able to edge out the traditional 128MB GeForce 6200 graphics card.
Considering AquaMark's highly demanding nature, we were rather happy with the outcome of this synthetic test.
Results - FarCry 1.1 & Doom 3 (DirectX 9 Benchmarks)
While the GeForce 6200 with TurboCache (16MB local memory) outperformed the ATI RADEON X300 SE at 800x600, they switched positions at the next resolution but both had results unfit for gaming. The 32MB version of the GeForce 6200 TurboCache fared better, but generally this group of cards isn't very suitable for demanding titles such as FarCry.
The results in FarCry definitely seem a little lower than comfortable, but one of the reasons is the high quality image settings that we used for testing. Performance will definitely take a turn for the better with medium or low quality settings at 1024x768, but we don't expect the standings to change.
Doom 3 is no better and in fact, reducing image quality hardly gives back much performance; the maximum gain is a mere two frames or less. The non-TurboCache GeForce 6200 128MB graphics card performed rather well, which could be due to Doom 3's heavy reliance on shadow buffering that favors swifter local memory access.
Conclusion
When we first received word of a technology that would do away with most of the hefty local frame buffer and use system memory instead, we were rather skeptical, even though the PCI Express bus is much more advantageous than the old AGP interface. However, having tested NVIDIA's new GeForce 6200 graphics cards with TurboCache technology, we are rather confident of this new implementation. TurboCache not only allows efficient texturing from system memory, but more importantly it allows rendering to and from system memory, made possible by both NVIDIA's technology and the vast bidirectional bandwidth of the PCIe x16 interface. Yet another reason to like TurboCache is that it doesn't lock away a fixed amount of your system memory for the graphics card's frame buffer; the TurboCache Manager dynamically allocates system memory when needed and de-allocates it back to the system when not in use. Not to mention that some re-engineering of the GPU was also required to get TurboCache running in its current state.
Overall, the results have shown that the GeForce 6200 with TurboCache cards delivered in their respective segments, and in some rare cases, the 32MB local memory version came very close to or even outperformed the standard GeForce 6200 with 128MB of memory. Besides delivering decent performance, TurboCache also reduces cost and power consumption - two crucial aspects that will eventually see the technology debut in notebooks in the near future. For the first successful implementation of TurboCache, coupled with pretty decent performance, its benefits and, most importantly, putting the PCIe x16 interface to good practical use, we believe NVIDIA's TurboCache technology deserves to be recognized as one of the more innovative products of this year.
The NVIDIA GeForce 6200 with TurboCache Technology.
For the GeForce 6200 featuring TurboCache technology, our only concern is the quirky price points. The standard GeForce 6200 128MB is already available in the US$130 range, so the GeForce 6200 with TurboCache (32MB local memory) with an SRP of US$129 doesn't really offer any incentive. NVIDIA has hinted that this US$129 part might eventually phase out the standard GeForce 6200 128MB graphics card, but until that happens, the TurboCache version isn't really that appealing. As for the lower-end NVIDIA GeForce 6200 with TurboCache (16MB local memory), its US$99 SRP is again a little unsettling when equivalently performing ATI RADEON X300 SE graphics cards are retailing at sub-US$80.
At this point of time, we don't recommend setting your sights on the TurboCache'd GeForce 6200 just yet, as it's important to consider the final retail products and their eventual price points (which could differ from what NVIDIA initially suggests). Till then, do keep a lookout for developments on TurboCache and a similar technology from ATI in the near future.
**Updated on 18th December 2004**
NVIDIA has just updated us with its latest suggested pricing for the GeForce 6200 series with TurboCache technology, as follows:-
- GeForce 6200 w/ TurboCache supporting 128MB, including 16MB of local TurboCache - US$79
- GeForce 6200 w/ TurboCache supporting 128MB, including 32MB of local TurboCache - US$99
- GeForce 6200 w/ TurboCache supporting 256MB, including 64MB of local TurboCache - US$129
This is certainly good news, as we believe these new price points will enable the GeForce 6200 with TurboCache products to compete rather effectively with the competition's respective offerings. We look forward to seeing the retail options soon for end-users' consideration in their next PCIe-enabled system.