How NVIDIA's Tegra K1 Brings PC Gaming and GPGPU to Your Mobile Device

NVIDIA's new Tegra K1 boasts graphics capabilities that leapfrog several generations of mobile hardware to match those of its desktop counterparts. Find out in this article exactly what's underneath the K1 and what else its feature set lets it accomplish with ease.

By Vijay Anand - 6 Jan 2014

A Tegra with a Mean Graphics Core

Believe it or not, the NVIDIA Tegra K1 marks a drastic change in direction for NVIDIA's mobile processors. Instead of simply touting more CPU cores and higher clock speeds, the Tegra K1, which debuted just hours ago at NVIDIA's CES 2014 press conference, focuses on a radically updated graphics subsystem that is now on par with the capabilities of NVIDIA's desktop GPUs.

NVIDIA has identified the major stumbling block to gaming realism on mobile devices: the limited capability of the graphics core and of the graphics APIs it supports.

To remove that limitation, NVIDIA fast-tracked its mobile GPU plans and made the leap to a vastly superior GPU: its own top-of-the-line Kepler GPU architecture.

Kepler Graphics - The Star of the Show

As identified in the slide above, there's a distinct difference between the gaming experience on a mobile device and that on your PC. After Tegra 4, NVIDIA had the option to incrementally update the mobile 72-core GeForce GPU within the Tegra 4 to give it more horsepower in the next iteration. Unfortunately, that wouldn't have made much of an impact, as the GPU would still support only the less capable APIs associated with mobile products. The other option was for NVIDIA to lead the charge with a more radical approach: incorporating a superior GPU architecture (in this case, Kepler) in its next-generation Tegra.

Now, before you jump to the obvious answer, consider that a desktop GPU is engineered for completely different power, efficiency and performance targets. It's not as simple as cutting down the number of processing units within the GPU core to meet the intended mobile product's power profile. In fact, it's not possible unless the GPU architecture was designed and engineered for this purpose. To give you an idea of the discrepancy, this slide offers a high-level overview of the tremendous differences in power consumption between a mobile device and a desktop device:

That said, the Kepler GPU microarchitecture wasn't designed to be mobile-centric from the very beginning. Somewhere along its development cycle, NVIDIA made the crucial decision to alter its development process so that its next GPU architecture could scale from mobile to workstation class. This also meant no longer maintaining two different GPU projects; all efforts would focus on one GPU lineup.

The end result is a Kepler core built to tackle every usage scenario, scaling from milliwatts to megawatts. No other GPU architecture was made with this in mind, and the Tegra K1 is the first product to bring it to mobile devices. That's also why the latest Tegra is called K1: it's based on the Kepler architecture, a tremendous leap for mobile products that are still entrenched in DirectX 9-level gaming quality and OpenGL ES 2.0 programming.

Having new-age hardware features is just part of the equation; the real impact of Kepler will be truly appreciated by game developers. Since the Tegra K1's graphics core is based on the Kepler architecture, APIs and development tools such as NVIDIA GameWorks are readily usable on the Tegra K1. As such, there's nothing the Tegra K1 can't run, including the entire GameWorks library.

The beauty of having a graphics core with similar capabilities regardless of the scale of the target product (in this case, mobile devices and desktop hardware) is that game designers can develop for any major environment, API or target without worrying about re-tooling the game for a dumbed-down environment. Such conversions carry heavy manpower costs and waste effort, since certain graphical elements may not even be visible on less capable hardware. Furthermore, developers don't need separate coding teams working on the same project simultaneously when it has to ship on multiple platforms.

To understand what level of gaming realism is possible, consider that Unreal Engine 4, which was demonstrated not long ago on high-performance desktop hardware, can now run on the Tegra K1 under Android via OpenGL 4.4. Here are a handful of video examples we captured of the reference Tegra K1 device tackling them with ease:

This is why the Tegra K1 can bring PC gaming realism into the palm of your hand. Full OpenGL is now supported in any programming environment with the latest set of tools, making these examples and more possible. In the end, the Tegra K1 lets consumers enjoy higher-quality games spanning multiple hardware options: a perfect case of design once, publish on multiple platforms.

It's not just the development stage where this synergy works out. With a full Kepler GPU, the Tegra K1 also supports real-time frame and code debugging, which comes in handy when troubleshooting apps for optimal performance on mobile devices. NVIDIA assured us that the necessary software tools are in development to catch up with the full debugging capabilities available on the PC.

The DNA of Tegra K1

The main functional blocks are the same as those of the Tegra 4, but most of them are now vastly updated.

We've talked about the star of the Tegra K1, but not in enough depth. Before we proceed to the next level of detail, now's a great time to take stock of what makes up the Tegra K1. As seen in the slide above, its basic building blocks are identical to those of the Tegra 4. What's different on the K1 is that the blocks themselves are updated, as listed in the summary table above.

We'll be discussing each of them in detail over the next few pages.

CPU Choices - 32-bit and 64-bit ARM Core Options

Yes, you've read that correctly. The Tegra K1 is available in both 32-bit and 64-bit ARM processor variants.

Building on its under-the-radar Project Denver, NVIDIA has managed to cram a dual-core 64-bit Denver CPU into a Tegra K1 that's pin-compatible with the regular variant, which uses a quad-core Cortex A15 CPU plus a fifth power-saver core.

Project Denver is a fully custom 64-bit ARM CPU based on the ARMv8-A 64-bit processor architecture. It's designed for high single-threaded and multi-threaded performance, bringing all the benefits of the ARM architecture to devices beyond the handheld segment. Since Project Denver was made known in 2011, little else has been said about it other than that the next-generation Tegra processor, codenamed Parker, would be the first to feature the Denver CPU. Well, it looks like Logan (the codename of the Tegra K1) beat Parker to debut the 64-bit ARM processor. That's a good thing, considering that Apple's 64-bit A7 processor already powers its flagship products and that Qualcomm will also debut 64-bit processors this year, albeit not in flagship models - yet.

The 64-bit variant is expected to appear in devices in the second half of 2014, while the 32-bit Tegra K1 will be in devices launching in the first half of the year.

32-bit Tegra K1 Still a 4-Plus-1 Processor Model

So is there anything notable about the 'regular' Tegra K1? For one, it's designed with smartphones and tablets in mind, so the 32-bit Tegra K1 is likely what will matter most to consumers in 2014.

While the Tegra K1 reuses the Tegra 4's quad-core Cortex A15 CPU, the cores are not identical. Tegra 4 uses the Cortex A15 "R2" release, whereas the Tegra K1 uses the "R3" revision of the core, which features micro-architectural power reductions. It's therefore not a performance-boosted edition; rather, its efficiency gains allow it to scale better. The fifth power/battery-saver core is also a Cortex A15, but it operates on a different power profile from the four main cores.

Furthermore, the Tegra K1 will use the 28nm High Performance Mobile (HPM) process technology, which is designed for high speed with low leakage power; in essence, it offers the best of the HP and LP process technologies. Tegra 4 was built on the 28nm HPL process technology. For further reading on the process technologies involved, TSMC has a round-up of its 28nm process family.

The maximum clock rate for the four main cores is 2.3GHz, regardless of whether one or all of them are in operation. The companion core is rated at up to 1GHz. The memory interface is 64 bits wide, with DDR3-1866 memory the likely preferred implementation, though this could change to DDR3-2133 later in the year as low-power, high-speed memory becomes more widely available at better price points.
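To put the memory interface in perspective, theoretical peak bandwidth is simply bus width multiplied by transfer rate. The short Python sketch below does that arithmetic for the two DDR3 speed grades mentioned; it assumes DDR3-1866 means 1,866 mega-transfers per second and ignores real-world efficiency losses, so sustained bandwidth on an actual device will be lower.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_width_bits: width of the memory interface (64 bits on the Tegra K1).
    transfer_rate_mts: data rate in mega-transfers per second; DDR3-1866
    performs 1,866 MT/s because DDR moves data on both clock edges.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# The two speed grades discussed above, on the same 64-bit bus.
for label, rate in [("DDR3-1866", 1866), ("DDR3-2133", 2133)]:
    print(f"{label}: {peak_bandwidth_gbs(64, rate):.1f} GB/s")
```

By this estimate, moving from DDR3-1866 to DDR3-2133 would raise the theoretical peak from roughly 14.9GB/s to roughly 17.1GB/s without widening the bus.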

When we asked NVIDIA for power consumption figures, it could only share the design power targets for Tegra K1 products:

Smartphone - Under 2W
7-inch Tablets - 3 to 3.5W
9-inch Tablets - 4 to 4.5W

As for where the Tegra K1 might be seen first, tablets seem likely, since NVIDIA has managed to get the Tegra K1 operating successfully within its Tegra Note 7 chassis, which was originally created for OEMs to roll out Tegra 4-based reference tablets.

A Tegra K1 reference tablet, with its new specs, in the same Tegra Note 7 chassis.

GPU Architecture - A True Kepler

We cannot stress this point enough: the Tegra K1 has a true Kepler GPU. As the slide above shows, all the major GPU processing blocks are taken directly from Kepler, which first debuted with the GeForce GTX 680. Since the Tegra K1 is aimed at mobile and handheld devices, its GPU features only one GPC with one SMX cluster, taken directly from Kepler. The basic SMX cluster of any Kepler-based GPU has 192 CUDA cores, and that's exactly what the Tegra K1 is endowed with. One minor change is that the Tegra K1's SMX has only eight texture filtering units, as opposed to 16 in the SMX of a desktop Kepler GPU.
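The SMX configuration above translates directly into theoretical throughput. The Python sketch below assumes each CUDA core retires one fused multiply-add (two floating-point operations) per clock and each texture unit delivers one filtered texel per clock. The article does not state the Tegra K1's GPU clock, so the 1GHz used here is a hypothetical round number, not a specification.

```python
# Figures from the article: one GPC with one SMX of 192 CUDA cores,
# 8 texture units per SMX on the Tegra K1 versus 16 on a desktop Kepler SMX.
CUDA_CORES_PER_SMX = 192
FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one fused multiply-add (2 FLOPs) per clock
TEGRA_K1_SMX_COUNT = 1


def peak_gflops(clock_ghz: float, smx_count: int = TEGRA_K1_SMX_COUNT) -> float:
    """Theoretical single-precision peak in GFLOPS."""
    return smx_count * CUDA_CORES_PER_SMX * FLOPS_PER_CORE_PER_CLOCK * clock_ghz


def peak_gtexels(texture_units: int, clock_ghz: float) -> float:
    """Theoretical filtered texel rate in GTexels/s (assumes 1 texel/unit/clock)."""
    return texture_units * clock_ghz


CLOCK_GHZ = 1.0  # hypothetical round number; the article gives no GPU clock
print(f"Peak FP32:           {peak_gflops(CLOCK_GHZ):.0f} GFLOPS")
print(f"Tegra K1 SMX (8 TU): {peak_gtexels(8, CLOCK_GHZ):.0f} GTexels/s")
print(f"Desktop SMX (16 TU): {peak_gtexels(16, CLOCK_GHZ):.0f} GTexels/s")
```

Whatever the real clock turns out to be, halving the texture units halves the per-clock texel rate relative to a desktop Kepler SMX, while the 192-core shader throughput per clock stays the same.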

Going by its GPU specs, the Tegra K1 most closely resembles the GK107-based GeForce GT 630, a variant that may have been marketed mainly in emerging markets and through OEM channels. Though it's no doubt entry-level in nature, it has all the features needed to run this tessellation demo, which was first used to showcase the GeForce GTX 480!

More than ever, tessellation plays an important role in minimizing the size and complexity of terrain, world environments and objects. By combining tessellation with displacement maps, a game can generate the level of detail each element needs based on the viewpoint and on whether the object sits in the foreground or background. This can be done on the CPU, but far less efficiently than on a GPU. The Tegra K1's modern Kepler graphics core supports this approach, giving gamers a far more realistic experience with high levels of tessellation. Indirectly, this also means better power optimization and improved battery life, since the GPU is more adept at handling this kind of workload.
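Distance-based level of detail is the heart of the technique described above. In a real game this logic runs in GPU tessellation shaders; the Python sketch below only models the math on the CPU for clarity. It assumes a linear falloff between hypothetical near and far distances and uses 64 as the ceiling, the minimum value OpenGL 4.x guarantees for `GL_MAX_TESS_GEN_LEVEL`. The `displace` helper is likewise a hypothetical illustration of pushing a tessellated vertex along its normal by a height-map sample.

```python
MAX_TESS_LEVEL = 64.0  # minimum value OpenGL 4.x guarantees for GL_MAX_TESS_GEN_LEVEL


def tess_level(distance: float, near: float = 5.0, far: float = 100.0) -> float:
    """Choose a tessellation level that falls off linearly with camera distance.

    Patches closer than `near` get the maximum level; patches beyond `far`
    get level 1 (no subdivision). near/far are hypothetical tuning values.
    """
    t = (distance - near) / (far - near)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 + (MAX_TESS_LEVEL - 1.0) * (1.0 - t)


def displace(position, normal, height, scale=0.5):
    """Offset a tessellated vertex along its normal by a height-map sample."""
    return tuple(p + n * height * scale for p, n in zip(position, normal))


for d in (2.0, 52.5, 150.0):
    print(f"distance {d:>6}: tess level {tess_level(d):.1f}")
print(displace((1.0, 0.0, 2.0), (0.0, 1.0, 0.0), height=0.8))
```

Nearby patches get dense geometry while distant ones stay coarse, which is how tessellation keeps stored meshes small without sacrificing close-up detail.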

Other optimizations made to the Tegra K1's GPU include hierarchical on-chip Z-cull, as opposed to the off-chip algorithm used in the original Kepler, and new texture compression support covering formats such as DXT, ETC and ASTC. ETC and ASTC in particular are common mainly in mobile environments, where rendering engines are built around a different class of processing hardware than on the desktop. As a whole, the adaptation focused on how the GPU's sub-components communicate with the fabric of the Tegra processor, as opposed to the environment of a traditional desktop/notebook add-in card.

Tegra K1's Kepler graphics pipeline adaptation.

Texture and color compression support is available at the UI level, giving Tegra K1 devices further savings in bandwidth and power - a feature that was unavailable on Tegra 4.
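The bandwidth savings from texture compression are easy to quantify. The Python sketch below compares the storage a single 1024x1024 texture needs in the DXT, ETC and ASTC formats against uncompressed RGBA8. It relies on the formats' fixed block sizes (64-bit 4x4 blocks for ETC1 and DXT1, 128-bit 4x4 blocks for DXT5, and 128-bit blocks of any footprint for ASTC), ignores mipmaps, and does not model framebuffer color compression, which works differently.

```python
ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, regardless of its footprint


def astc_bpp(block_w: int, block_h: int) -> float:
    """ASTC bits per pixel for a given block footprint (4x4 up to 12x12)."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def texture_bytes(width: int, height: int, bits_per_pixel: float) -> int:
    """Storage for a single mip level, ignoring padding to block boundaries."""
    return int(width * height * bits_per_pixel / 8)


FORMATS = {
    "RGBA8 (uncompressed)": 32.0,
    "DXT1 / ETC1": 4.0,  # 64-bit blocks covering 4x4 pixels
    "DXT5": 8.0,         # 128-bit blocks covering 4x4 pixels
    "ASTC 4x4": astc_bpp(4, 4),
    "ASTC 8x8": astc_bpp(8, 8),
}

for name, bpp in FORMATS.items():
    kib = texture_bytes(1024, 1024, bpp) / 1024
    print(f"{name:<22}{bpp:>6.2f} bpp {kib:>7.0f} KiB")
```

Every byte that doesn't have to be fetched from memory is bandwidth and power saved, which matters on a 64-bit mobile memory bus.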

NVIDIA has shared this slide comparing the GPU efficiency of the Tegra K1 with that of its competitors, claiming a dramatic 1.5x improvement in performance per watt.

Computational Imaging with Chimera 2

Chimera is NVIDIA's branding for its computational photography architecture, which first debuted on the Tegra 4. On the Tegra K1, NVIDIA has combined the massively parallel processing architecture of the Kepler GPU with the updated CPU cores and, most notably, dual upgraded image signal processors (ISPs) to create Chimera 2.

With two of these advanced ISPs, NVIDIA claims state-of-the-art area processing, which is much needed as imaging sensor resolutions keep climbing. Considering that the Tegra K1's ISP (theoretically) supports camera sensors of up to 100MP, NVIDIA needs to ensure its hardware is ready to handle the resulting throughput (though we wonder whether the recording media would be fast enough to keep up).
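A quick estimate shows why throughput is a concern at that resolution. The Python sketch below computes the raw data rate coming off a camera sensor; the 10 bits per pixel of raw Bayer data and the frame rates shown are assumptions for illustration, since the article specifies neither.

```python
def sensor_data_rate_mbs(megapixels: float, bits_per_pixel: int, fps: float) -> float:
    """Raw sensor output in MB/s (1 MB = 10**6 bytes).

    Bytes per frame = megapixels * 10**6 * bits_per_pixel / 8; multiplying by
    fps and dividing by 10**6 for MB/s cancels the 10**6 factors.
    """
    return megapixels * bits_per_pixel / 8 * fps


# Assumptions (not from the article): 10-bit raw Bayer output and these frame rates.
RAW_BITS_PER_PIXEL = 10
for fps in (1, 5, 10):
    rate = sensor_data_rate_mbs(100, RAW_BITS_PER_PIXEL, fps)
    print(f"100MP @ {fps:>2} fps: {rate:,.0f} MB/s")
```

Even at a hypothetical single frame per second, a 100MP sensor would produce about 125MB of raw data every second before any processing or compression.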

With the Tegra K1's performance on tap, the GPU can easily support advanced on-the-fly effects such as live local tone mapping (LTM) preview, Pano paint, video stabilization (auto-correction for rolling shutter, hand shake, etc.), live HD video filters (such as sketch, oil paint, cartoon, etc.) and many more live preview effects without delays or performance penalties.
