How NVIDIA's Tegra K1 Brings PC Gaming and GPGPU to Your Mobile Device


CPU and GPU Architectural Details

CPU Choices - 32-bit and 64-bit ARM Core Options

Yes, you read that correctly. The Tegra K1 is available in both 32-bit and 64-bit ARM processor variants.

Building on its under-the-radar Project Denver, NVIDIA has managed to fit a dual-core 64-bit Denver CPU into a Tegra K1 variant that's pin-compatible with the regular version, which uses a quad-core Cortex-A15 CPU plus a fifth power-saver core.

Project Denver is a fully custom 64-bit ARM CPU based on the ARMv8-A architecture. It's designed for high single-threaded and multi-threaded performance, with all the benefits of the ARM architecture, for devices beyond the handheld segment. Since it was made known in 2011, little else has been said about it other than that the next-generation Tegra processor, Parker, was to be the first to feature the Denver CPU. Well, it looks like Logan (the Tegra K1's codename) has beaten Parker to debuting the 64-bit ARM processor, and it's a good thing considering that Apple's 64-bit A7 processor is already powering its flagship products and that Qualcomm will also debut 64-bit processors this year, albeit not in flagship models yet.

The 64-bit variant can be expected in devices in the second half of this year, while the 32-bit Tegra K1 will be in devices launching in the first half of 2014.

32-bit Tegra K1 Still a 4-Plus-1 Processor Model

So is there anything notable about the 'regular' Tegra K1? For one, it's designed with smartphones and tablets in mind, so it's quite likely that the 32-bit Tegra K1 is the one that will matter to most consumers in 2014.

While the same quad-core Cortex-A15 CPU of the Tegra 4 makes an appearance again on the Tegra K1, the cores are not identical. Tegra 4 uses the Cortex-A15 "R2" release; the Tegra K1 uses the "R3" edition of the core, which features micro-architectural power reductions. As such, it's not a performance-boosted edition, but one with efficiency gains that allow it to scale better. The power-saver fifth core is also a Cortex-A15 core, but it operates with a different power profile from the four main cores.

Further to that, the Tegra K1 uses the 28nm high performance mobile (HPM) process technology, which fits the needs of high speed and low leakage power, in essence boasting the best of the HP and LP process technologies. Tegra 4 was built on the 28nm HPL process technology. For more reading on the process technologies involved, TSMC has a round-up of its 28nm process family.

The maximum clock rate for the four main cores is 2.3GHz, regardless of whether only one of them or all of them are in operation. The companion core has a rated operating frequency of up to 1GHz. The memory interface is 64 bits wide, with DDR3-1866 memory the likely preferred choice of implementation, but this could change to DDR3-2133 late in the year as availability of low-power, high-speed memory improves and price points come down.
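
To put the memory configuration in perspective, the theoretical peak bandwidth follows from the bus width and transfer rate quoted above. The short Python sketch below works through that arithmetic. It assumes the number in each DDR3 speed grade is the transfer rate in millions of transfers per second, and the function name is ours, purely for illustration. Sustained real-world bandwidth will be lower.

```python
def peak_bandwidth_gb_s(bus_width_bits, mega_transfers_per_s):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


BUS_WIDTH_BITS = 64  # memory interface width stated in the article

# Assumption: "DDR3-1866" and "DDR3-2133" denote 1866 and 2133 MT/s.
ddr3_1866 = peak_bandwidth_gb_s(BUS_WIDTH_BITS, 1866)
ddr3_2133 = peak_bandwidth_gb_s(BUS_WIDTH_BITS, 2133)

print(f"DDR3-1866: {ddr3_1866:.1f} GB/s")
print(f"DDR3-2133: {ddr3_2133:.1f} GB/s")
```

That works out to roughly 14.9GB/s for DDR3-1866 and 17.1GB/s for DDR3-2133, all of it shared between the CPU and the GPU.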

When we asked NVIDIA for power consumption figures, they could only share the design power targets for Tegra K1 products:

  • Smartphone - Under 2W
  • 7-inch Tablets - 3 to 3.5W
  • 9-inch Tablets - 4 to 4.5W

As for the platforms where the Tegra K1 might be seen first, the tablet space seems likely, as NVIDIA has designed the Tegra K1 to operate within its Tegra Note 7 chassis, which was first created for OEMs to roll out Tegra 4 based reference tablets.

 

GPU Architecture - A True Kepler

We cannot stress enough that the Tegra K1 has a true Kepler GPU. As the slide above shows, all the major GPU processing blocks are taken directly from the Kepler architecture, which debuted with the GeForce GTX 680. Since the Tegra K1 is a mobile, handheld-oriented part, the GPU features only one GPC with one SMX cluster. The basic SMX cluster of any Kepler-based GPU has 192 CUDA cores, and that's exactly what the Tegra K1 is endowed with. One minor change to the SMX unit in the Tegra K1 is that it has only eight texture filtering units, as opposed to 16 on a traditional GPU's SMX unit.
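
As a rough guide to what a single SMX can do, peak single-precision throughput can be estimated as CUDA cores x 2 operations per clock (one fused multiply-add) x GPU clock. The GPU clock isn't quoted in this article, so the clock values in the Python sketch below are placeholders of our own rather than Tegra K1 specifications.

```python
CUDA_CORES = 192          # one Kepler SMX, as stated in the article
FLOPS_PER_CORE_CLOCK = 2  # a fused multiply-add counts as two operations


def peak_gflops(gpu_clock_ghz, cores=CUDA_CORES):
    """Theoretical peak FP32 throughput in GFLOPS for a given GPU clock."""
    return cores * FLOPS_PER_CORE_CLOCK * gpu_clock_ghz


# Placeholder clocks for illustration only; not NVIDIA's figures.
for clock_ghz in (0.5, 0.75, 1.0):
    print(f"{clock_ghz:.2f} GHz -> {peak_gflops(clock_ghz):.0f} GFLOPS")
```

The estimate scales linearly with clock speed, so every 100MHz adds about 38 GFLOPS of theoretical headroom.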

Considering the Tegra K1's GPU specs, it most closely resembles a GeForce GT 630 using the GK107 core, which might have been marketed in emerging markets and OEM channels. While no doubt entry-level in nature, it has all the features it needs to run this tessellation demo that was first used to showcase a GeForce GTX 480!

More than ever, tessellation plays an important role in minimizing the size and complexity of the stored geometry for terrain, world environments and objects: a combination of tessellation and displacement maps creates the level of detail required with reference to the viewpoint and to background/foreground objects. It's possible to do this on the CPU, but it's far less efficient than running it on a GPU. The Tegra K1's modern Kepler graphics core lends its support in this regard to give gamers a far more realistic game experience with high levels of tessellation. Indirectly, this also means better power optimization and improved battery life, since the GPU is more adept at handling this kind of workload.
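
To illustrate the idea of view-dependent detail, here is a minimal Python sketch of how a tessellation factor might be chosen per patch from its distance to the camera. This is not how NVIDIA's demo works: the function names, the 10-unit "near" distance and the simple inverse-distance falloff are all assumptions made for illustration, and the triangle count assumes a quad patch split uniformly into a grid.

```python
MAX_TESS_FACTOR = 64  # upper limit in Direct3D 11; used here as a tunable cap


def tess_factor(distance, near=10.0, max_factor=MAX_TESS_FACTOR):
    """Pick a tessellation factor that falls off with distance (hypothetical)."""
    if distance <= near:
        return max_factor
    # Inverse-distance falloff, clamped to the range [1, max_factor].
    return max(1, min(max_factor, round(max_factor * near / distance)))


def triangles_per_quad_patch(factor):
    """A quad split into factor x factor cells, two triangles per cell."""
    return 2 * factor * factor


for d in (5, 20, 80, 640):
    f = tess_factor(d)
    print(f"distance {d:>3}: factor {f:>2}, {triangles_per_quad_patch(f):>4} triangles")
```

Only the coarse patch and its displacement map need to be stored; the extra triangles are generated on the fly for nearby patches and skipped for distant ones.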

Other optimizations put in place in the GPU for the Tegra K1 include hierarchical on-chip Z cull, as opposed to the off-chip algorithm used in the original Kepler, and support for texture compression formats such as DXT, ETC and ASTC. Several of these formats are found mainly in mobile environments, where software uses different kinds of graphics rendering engines to accommodate mobile processing hardware that differs from the desktop space. As a whole, the implementation focussed on how the different sub-components communicate with the fabric or environment present on the Tegra processor, as opposed to that of a traditional desktop/notebook add-in part.
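
To give a sense of why compressed texture formats matter for bandwidth, the Python sketch below compares the storage needed for a single 1024 x 1024 texture level in a few of these formats. The block sizes are the standard ones for each format (ASTC always uses 16-byte blocks with a selectable footprint). The helper name and the choice of texture size are ours, and mipmaps are ignored.

```python
# Format name -> (block width, block height, bytes per block).
# Note: DXT1 and ETC1 store RGB only (DXT1 optionally with 1-bit alpha),
# so the comparison with RGBA8 is a rough one.
FORMATS = {
    "RGBA8 (uncompressed)": (1, 1, 4),
    "DXT1": (4, 4, 8),
    "DXT5": (4, 4, 16),
    "ETC1": (4, 4, 8),
    "ASTC 4x4": (4, 4, 16),
    "ASTC 8x8": (8, 8, 16),
}


def texture_bytes(width, height, fmt):
    """Bytes needed for one texture level; partial blocks are rounded up."""
    block_w, block_h, block_bytes = FORMATS[fmt]
    blocks_x = -(-width // block_w)   # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * block_bytes


for name in FORMATS:
    size_kib = texture_bytes(1024, 1024, name) / 1024
    print(f"{name:<22}{size_kib:>7.0f} KiB")
```

Every byte saved here is a byte that doesn't have to cross the shared memory bus each time the texture is sampled.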

Texture and color compression support is also available at the UI level, which gives the Tegra K1 further savings in bandwidth and power - a feature that was unavailable on Tegra 4.