Since its announcement in October 2010 at NVIDIA's GPU Technology Conference (GTC), Kepler, the successor to Fermi, has kept us waiting with bated breath for its arrival. Today, 22nd March 2012, marks its official launch and NVIDIA's first salvo of retaliation against AMD's Southern Islands series of GPUs. Like AMD's latest GPUs, Kepler is built on the 28nm fabrication process, and NVIDIA currently touts it as the fastest and most power-efficient GPU. AMD had the first-to-market advantage, and card manufacturers wasted no time in getting their wares based on the Radeon HD 7900 and Radeon HD 7700 GPU series to their respective target markets. We feel that NVIDIA has some catching up to do in this race as it reveals its first manifestation of Kepler in the form of the GeForce GTX 680.
The GeForce GTX 680 is positioned as the most power-efficient flagship GPU, claiming this title from the older GeForce GTX 580, which made similar claims in its heyday. According to technical information gleaned during our visit to NVIDIA GeForce Editor's Day 2012, the Kepler architecture is actually positioned as an improvement on the GF114 core of the GeForce GTX 560, as the GTX 560 is the latest of NVIDIA's GeForce 500 series GPUs. Using that GPU architecture as the base, NVIDIA designed the GK104 core that powers the GeForce GTX 680.
Now for the techies: some of you might have long guessed from the core's codename that the GTX 680 isn't the true flagship, and there is some truth to this, judging by the rumors on the grapevine and our close discussions with NVIDIA. However, with no official word, and given that the GeForce GTX 680 is positioned to compete with the top single-GPU graphics card of the Radeon HD 7900 series, the GeForce GTX 680 is, for all intents and purposes, NVIDIA's current flagship. There are three guiding principles in the development of the Kepler architecture, and they are summed up neatly in a triplet of adjectives: Faster, Smoother and Richer.
One of the most important hardware features of the Kepler GPU is its four new Graphics Processing Clusters (GPCs). This term was first used in Fermi's architecture, which also featured four GPC units. A GPC is basically an independent block of processing engines that can exist on its own, as it contains all the necessary processing stages within its cluster. While the number of GPC units hasn't changed in all these years, the hardware and configuration within each GPC have changed drastically. Within each GPC of the GeForce GTX 680 resides the next-generation Streaming Multiprocessor (SMX).
According to NVIDIA, SMX promises to deliver more performance than the Fermi architecture's SM thanks to its higher number of CUDA cores, and to deliver that performance with less power; in fact, it is touted to be twice as efficient as the previous-generation SM. Upon scrutinizing the block diagrams of both SM and SMX, we feel that the Kepler architecture is evolutionary when compared against the Fermi architecture. Both architectures still sport the same basic processing block, the CUDA core, although SMX has six times as many shader cores. This progression is only natural considering the fact that the manufacturing process has shrunk from Fermi's 40nm to Kepler's current 28nm.
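To put the "six times" figure in perspective, here is a rough back-of-the-envelope sketch in Python. The per-block core counts and block counts are assumptions taken from NVIDIA's published specifications for the GTX 580 (GF110) and GTX 680 (GK104), not from the block diagrams discussed above, and the names used are purely illustrative.

```python
# Assumed figures from NVIDIA's published specs (not stated in this article):
# GTX 580 (GF110): 16 SMs with 32 CUDA cores each
# GTX 680 (GK104): 8 SMXs with 192 CUDA cores each
FERMI_GF110 = {"blocks": 16, "cores_per_block": 32}
KEPLER_GK104 = {"blocks": 8, "cores_per_block": 192}


def total_cores(gpu):
    """Total CUDA cores = number of SM/SMX blocks x cores per block."""
    return gpu["blocks"] * gpu["cores_per_block"]


# How much larger one SMX is than one Fermi SM, in CUDA cores.
per_block_ratio = KEPLER_GK104["cores_per_block"] / FERMI_GF110["cores_per_block"]

# How much larger the whole GPU is, in CUDA cores.
total_ratio = total_cores(KEPLER_GK104) / total_cores(FERMI_GF110)

print(f"Cores per block: {per_block_ratio:.0f}x")
print(f"Total cores: {total_cores(FERMI_GF110)} -> {total_cores(KEPLER_GK104)} ({total_ratio:.0f}x)")
```

Under these assumptions, each SMX is six times the size of a Fermi SM, but because the GTX 680 carries half as many of them, the total core count only triples.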
Other improvements we have noticed are the new PCIe 3.0 bus interface as well as a new version of the Polymorph Engine, which promises double the tessellation performance. NVIDIA also states that the Kepler architecture features a new, improved memory subsystem. While its total memory bandwidth is still identical to that of the GeForce GTX 580 at 192.3GB/s, NVIDIA has paired the GPU with the fastest graphics memory yet, operating at 6GHz. With much speedier memory, its bandwidth is limited only by its reduced 256-bit memory interface (down from 384 bits wide on the GTX 580). According to the NVIDIA GPU managers we spoke to, any widening of Kepler's memory interface would entail a corresponding increase in die size. NVIDIA's die size target therefore limited Kepler's memory interface to 256 bits wide, while more importance was placed on the GPU's overall processing prowess with a massive increase in CUDA processing cores - the GTX 680 has 1536 stream processors as compared to 512 on its predecessor. While NVIDIA maintains that its memory bandwidth is sufficient for the GPU, we'll soon find out from our testing if that holds true.
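For readers who want to check the bandwidth arithmetic, the following is a minimal Python sketch. It assumes the usual peak-bandwidth formula (bus width in bits, multiplied by the data rate per pin, divided by 8 bits per byte) and treats the quoted 6GHz as a nominal 6 Gbps per pin; the function names are our own and purely illustrative.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin data rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


def required_data_rate_gbps(bus_width_bits, target_bandwidth_gb_s):
    """Per-pin data rate (Gbps) needed to hit a target bandwidth on a given bus width."""
    return target_bandwidth_gb_s * 8 / bus_width_bits


# GTX 680: 256-bit interface with memory at a nominal 6 Gbps per pin (assumed round figure).
gtx680_bandwidth = bandwidth_gb_s(256, 6.0)

# Data rate a 384-bit interface (as on the GTX 580) would need for the quoted 192.3GB/s.
rate_on_384_bit = required_data_rate_gbps(384, 192.3)

print(f"GTX 680 peak bandwidth: {gtx680_bandwidth:.1f} GB/s")
print(f"Data rate needed on a 384-bit bus: {rate_on_384_bit:.2f} Gbps")
```

With the nominal 6 Gbps figure, the 256-bit interface works out to 192GB/s, in line with the quoted 192.3GB/s, while a 384-bit interface would reach the same bandwidth with memory running at only about 4 Gbps. This is why the faster memory only makes up for the narrower bus rather than raising total bandwidth.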
In any case, NVIDIA has also engineered new anti-aliasing techniques, FXAA and the newer TXAA, to massively reduce memory footprint and processing requirements (we'll touch on these in more detail later). This is another reason why NVIDIA is confident about introducing a new flagship with the same memory bandwidth as its predecessor, having partially offset the narrower interface with much speedier memory.