NVIDIA's Head Honchos Speak - Computing Trends and Platform Updates
We met with more than a dozen of NVIDIA's top executives at GTC 2010 and gathered quite a few updates covering GP-GPU computing, further CUDA notes, where general computing is heading, NVIDIA's processor strategy, the status of the Tesla and Tegra platforms, and the next big thing for consumers. Find out in this hard-hitting update!
By Vijay Anand
The Great GTC Meet - Where Disciplines Converge
The GPU Technology Conference hosted by NVIDIA is the perfect breeding ground for ideas, collaborations, enrichment and much more, with like-minded folks coming together to share their experiences and findings in the world of GPU computing. The ground is practically teeming with GPU programmers, graphics designers, research heads, solution providers and more. This also means NVIDIA's top staff are available around the clock for the duration of the show to facilitate, assist and take in feedback from participants for their future planning. For us, this is an added bonus, as we get to pick the brains of various product managers and experts from NVIDIA on their progress in various verticals. We managed to get in touch with high-flying executives like Jen-Hsun Huang, CEO of NVIDIA, Bill Dally, Chief Scientist (VP of Research), Andy Keane, GM of GPU Computing, Matt Wuebbling, Senior Product Manager for Notebooks and Tegra, and many other key figures. Since NVIDIA's involvement in the industry reaches several verticals, we'll share the concise version of our findings for easy reading.
Jen-Hsun leads a media roundtable conversation like no other. In this session, he's flanked by Sanford Russell, GM of the CUDA group, and on the right, Andy Keane, GM of the Tesla Business Division and GPU Computing.
GP-GPU Computing
With the first GTC held in 2009, the GPU computing scene achieved 'lift off' status with several experimental efforts. This year, Jen-Hsun Huang believes the industry has reached 'escape velocity' as GP-GPU enabled programs go into production. By next year, the market will probably have experienced the outcomes of GP-GPU programs, built case studies, shared success stories and driven even greater momentum for GPU computing.
Additional Notes on the CUDA Momentum
We've covered a bunch of new updates and the uptake of CUDA in a separate article, so here are a few more points from our discussions with the executives:-
- The recently announced PGI CUDA C compiler (CUDA-x86) wasn't meant to get multi-core CPUs to match the performance of a many-core GPU. Most apps do not scale linearly with the number of cores; the problems are coherency overheads and insufficient memory bandwidth. (See the kernel sketch after this list for the kind of portable CUDA C source in question.)
- Even if the scalability isn't perfect, if CUDA programs can run on a 1,000-core cluster and still achieve a speed-up over a non-CUDA version, there's still much to gain.
- The most important thing is that CUDA apps can now run everywhere on the widely available x86 platform, increasing their usefulness.
- For many general users, CUDA has yet to make a notable impact. Jen-Hsun believes image processing is a key area where CUDA can make a difference, as it is the most important consumer application that can benefit from a parallel computing architecture.
- As shown by the Adobe demo (pictured below, with video), computational photography (not digital photography) is the future of photography. For NVIDIA, this is an important area of interest and one they intend to invest in a great deal in the future.
- Last but not least, our media roundtable session raised an interesting question: should NVIDIA open a CUDA app store to encourage more developers to get into the game? NVIDIA thought it was an interesting idea it could venture into later, but for now there are no such plans.
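To make the portability point concrete, here's a minimal sketch of the kind of CUDA C source in question. The SAXPY kernel below is an illustrative example of ours, not PGI's or NVIDIA's code; the promise of CUDA-x86 is that this same source could be recompiled to target multi-core x86 CPUs instead of the GPU.

```cuda
// Illustrative SAXPY kernel: y = a*x + y. The same CUDA C source that
// runs across hundreds of GPU cores is what PGI's CUDA-x86 aims to
// recompile for the handful of cores on a CPU.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    // ... fill x and y with data via cudaMemcpy (omitted for brevity) ...
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaThreadSynchronize();  // wait for the kernel to finish
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```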
Adobe's research team came to GTC 2010 to show off a new way of manipulating images with GPU power: computational photography. Using a plenoptic lens to capture the large image seen in the background, Adobe tapped the power of the GPU to render a full, proper image with further control to alter the image's focus and even create stereoscopic imagery. All this is thanks to ultra-high-resolution captures, the plenoptic lens capturing several variations of the same shot for added image information, and the GPU's parallel processing architecture churning through all that data quickly.
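Adobe didn't disclose the exact pipeline behind the demo, but a rough, hypothetical sketch of one classic plenoptic technique, refocusing by shifting and averaging the sub-aperture views, shows why this workload suits the GPU so well: every output pixel can be computed by an independent thread. All names, the data layout and the shift model below are our assumptions.

```cuda
// Hypothetical sketch of plenoptic refocusing by shift-and-add: each of
// the V sub-aperture views is shifted in proportion to its aperture
// offset and a user-chosen focus parameter, then averaged. One GPU
// thread computes one output pixel.
__global__ void refocus(const float *views,  // V stacked W*H images
                        float *out,          // W*H refocused result
                        int W, int H, int V,
                        const float *du,     // per-view x-offset (assumed)
                        const float *dv,     // per-view y-offset (assumed)
                        float alpha)         // focus parameter set by the user
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    float sum = 0.0f;
    for (int v = 0; v < V; ++v) {
        // Shift each view by alpha times its aperture offset,
        // clamping to the image border.
        int sx = min(max(x + (int)(alpha * du[v]), 0), W - 1);
        int sy = min(max(y + (int)(alpha * dv[v]), 0), H - 1);
        sum += views[v * W * H + sy * W + sx];
    }
    out[y * W + x] = sum / V;  // average over all views
}
```

Changing alpha re-runs the kernel and refocuses the shot after the fact, which is exactly the kind of interactive control the demo showed.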
The General Computing Scene
There are three primary areas of computing that NVIDIA is investing in: mobile, visual and parallel computing. While the company has primarily been a visual computing solutions provider, with its CUDA momentum and parallel-processing-centric GPUs, Jen-Hsun believes it is now probably 65% involved in the parallel computing scene (considering its visual solutions are essentially a branch of parallel computing at this juncture), while the rest of its focus pertains to mobile computing with the Tegra investment.
With the heightened CUDA momentum, GPU computing and the software engineering to make it all happen, Jen-Hsun mentioned that they are opening up the software and processor architecture again after a very long period of status quo. The PC is evolving, and a whole new industry is brewing with new vendors, distributors and more.
Even the way software is consumed is drastically changing. One used to buy software off the shelf and use it for the life of the PC, which could be three to five years. Today, apps are very affordable and sometimes even free; one can download them over the air and quickly discard them if they don't appeal. So the software usage model is shifting, and NVIDIA's CEO had some interesting comments to add:-
- According to Jen-Hsun, software used to be very important, but x86 software compatibility is not important at all today. This is because there's a constant supply of new apps coming in all the time. So that heritage (referring to legacy x86 apps) is only important in the server room and no longer matters for consumers. In fact, he mentioned it's so unimportant these days that the fastest growing PC company is Apple, which doesn't run legacy applications at all. The world has changed.
- In the future, most mobile computing devices will be SoC implementations that could be integrated behind the screen, making tablets the dominant physical form factor, while notebooks/netbooks would essentially be the same devices with keyboard attachments. Dock the tablet on a docking station and you have a full-fledged PC.
- In five years, probably nobody will need a laptop in conferences and meetings; either a tablet or a 'Super Phone' will prevail.
- Moving on to a more technical note, improving the direct connectivity of graphics cards isn't the important part. Today, I/O data from storage, Fibre Channel or elsewhere goes into system memory first, and then from system memory it gets copied out to other devices like the graphics card's local memory. This move is very expensive: requiring two copies, it's a very wasteful use of memory and eats bandwidth unnecessarily. The ideal approach is a single direct copy, and it can be done over PCIe, which has far more bandwidth than Fibre Channel or anything else; PCIe has plenty of bandwidth. The problem is system memory, and wasting it on the extra copy. In the future, I/O devices could probably DMA directly to the graphics card's memory instead of copying to system memory first. It's not an easy task to put into action, but Jen-Hsun is hopeful it's workable down the road. (A sketch of today's double-copy pattern follows this list.)
- Lastly, he mentioned that computational resources at this point are nearly infinite, but transmission bandwidth is probably still the limiting factor. Thus, data and process efficiencies are still required to get the most out of existing transmission architectures unless there's a new shift on the horizon. (Editor's Note: perhaps Intel's Light Peak.)
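To illustrate the double-copy pattern Jen-Hsun describes, and a partial mitigation that already exists in CUDA (pinned, mapped 'zero-copy' host memory), here's a hedged sketch. The helper names are ours; true I/O-device-to-GPU DMA remains the future step he alludes to.

```cuda
#include <cstdio>
#include <cstdlib>

// The pattern criticised above: data lands in system memory first,
// then is copied a second time into GPU memory.
void double_copy(FILE *f, float *d_buf, size_t bytes)
{
    float *h_buf = (float *)malloc(bytes);
    fread(h_buf, 1, bytes, f);                                // I/O -> system memory
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);  // copy #2
    free(h_buf);
}

// A partial mitigation that exists today: pinned, mapped ("zero-copy")
// host memory, which the GPU can read directly over PCIe without a
// second explicit copy. The system-memory bounce still happens; only
// direct DMA from the I/O device into GPU memory would remove it.
float *mapped_alloc(size_t bytes, float **d_view)
{
    float *h_buf;
    cudaSetDeviceFlags(cudaDeviceMapHost);  // must precede context creation
    cudaHostAlloc(&h_buf, bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer(d_view, h_buf, 0);  // device-visible alias
    return h_buf;
}
```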
NVIDIA's Processor Strategy
According to Jen-Hsun, their 'CPU' strategy lies in the ARM architecture, which they've already invested in via the formation of Tegra. NVIDIA chose ARM because it's the fastest growing CPU architecture in the world and the CPU of choice for the most important mobile operating systems in the world today, like Android and Apple's iOS. NVIDIA also chose ARM because it has more application developers behind it than any other CPU architecture in the world today.
Jen-Hsun carried on with these interesting notes: "It is the CPU of the future, not a CPU of the past. x86 architecture growth is flattening out while ARM growth is going up almost exponentially. Even Microsoft has licensed ARM, and they are a software company. Where is the biggest market share for device growth now and in the future? It's in ARM-based products. Microsoft doesn't have market share yet in ARM-based devices, and that's a huge potential for them. All of the smartest companies are investing in ARM, not x86. In 5 or 10 years' time, ARM could perhaps perform a whole lot better, enough to populate larger or more powerful devices. Also, there's still the 64-bit version of Tegra to look forward to in the future."
High Performance Computing with Tesla
Here's a Tesla module made for compact blade systems. This particular model is the Tesla X2070 with 6GB of GDDR5 memory, 448 CUDA cores and a peak performance of 515 GFLOPS.
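That 515 GFLOPS figure is the double-precision peak, and it falls out of the specs if we assume the 1.15GHz shader clock of the Tesla 20-series and Fermi's half-rate double precision:

```latex
\text{SP peak} = 448\ \text{cores} \times 2\ \tfrac{\text{FLOPs}}{\text{FMA}} \times 1.15\,\text{GHz} \approx 1030\,\text{GFLOPS}
\text{DP peak} = \tfrac{1}{2} \times \text{SP peak} \approx 515\,\text{GFLOPS}
```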
- Tesla is targeted at HPC applications such as oil and gas exploration, molecular dynamics, and many other computational science fields that require heavy physics processing.
- The biggest competitor is multi-core CPUs, since the Tesla group is practically starting from the ground up. From that point of view, the Tesla department has tremendous growth potential.
- According to some university professors, the biggest problem they face is compute power (only 50 TFLOPS on average). They usually have to rely on supercomputing centers, but by the time they get to experiment and try to follow up with more compute runs, the researchers have lost interest and moved on to another idea. The waiting game to try out one's ideas is counterproductive at the end of the day. A Tesla-equipped system can grant researchers parallel computing power for little money and space.
- A separate purpose-built chip isn't required for HPC usage; there's a very strong overlap between the type of computing done in HPC and on the consumer side. For example, the simulations run through PhysX are the same kind of physics. The Tesla team used the same GPU foundation and then added what is necessary for the HPC environment, which is an effective strategy for the team at this point. The type of processing needed in the consumer, Quadro and HPC markets is very similar; much like strategies in the CPU world, Tesla adds double precision, ECC and a few other features to make it HPC-suitable. (See the device-query sketch after this list for how these capabilities surface to software.)
- On why some GeForce products are actually faster than Tesla products: a lot of margin has been added to the GPU to ensure it can run in the data centre 24/7, doing a very valuable job that someone could be relying on for big revenue. Its goal is to provide the highest reliability, not the highest performance. If one aims purely for performance, there's some possibility results might not turn out as intended, costing money, so it's better off being tuned for reliability.
- The biggest benefit and value of a GPU is what it lets a small company or start-up do, rather than the big companies, which have very big budgets and can afford large-scale systems to deliver what they need.
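As an aside, the HPC features mentioned above are visible to software. Here's a minimal sketch of ours using the standard CUDA runtime device query; the API calls and fields are real, while the compute-capability thresholds (1.3 for double precision, 2.0 for Fermi-class parts with ECC) are our summary.

```cuda
#include <cstdio>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // Compute capability 1.3+ adds double precision; 2.0 (Fermi,
        // i.e. the Tesla 20-series) adds ECC support on top of that.
        printf("%s: compute %d.%d, double precision %s, ECC %s\n",
               prop.name, prop.major, prop.minor,
               (prop.major > 1 || prop.minor >= 3) ? "yes" : "no",
               prop.ECCEnabled ? "on" : "off");
    }
    return 0;
}
```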
Moving on to CUDA and Tesla in the cloud, the first thing NVIDIA needed was solutions that cloud providers can actually buy. They don't buy graphics cards; they buy whole systems. This is why NVIDIA's partnerships with HP, IBM, Dell and others are so important. Unlike the single vendor last year, several top system providers now offer configurations with Tesla, so companies can obtain ready solutions without much effort. We saw several configurations and models on show at GTC 2010.
The Tesla module seen above is part of one of the highest performing Tesla blade systems, the T-Platforms TB2-TL, where each blade node has dual quad-core Intel Xeon L5630 processors, dual Tesla X2070 GPU modules and up to 24GB of DDR3 main memory. Fully populated, the complete TB2-TL blade enclosure packs 192 quad-core processors, 192 Tesla GPUs, up to 384GB of main memory and 192GB of GPU memory, delivers a peak performance of 105 TFLOPS, weighs 154.6kg and consumes 12kW of power. Mind-boggling for a 7U enclosure.
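As a sanity check, the 105 TFLOPS figure adds up if we combine the GPU and CPU peaks, assuming 515 GFLOPS per X2070 and the L5630's four cores at 2.13GHz with four double-precision FLOPs per cycle:

```latex
\text{GPUs: } 192 \times 515\,\text{GFLOPS} \approx 98.9\,\text{TFLOPS}
\text{CPUs: } 192 \times (4\ \text{cores} \times 2.13\,\text{GHz} \times 4\ \tfrac{\text{FLOPs}}{\text{cycle}}) \approx 6.5\,\text{TFLOPS}
\text{Total} \approx 105.4\,\text{TFLOPS}
```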
Here's another unit: Appro's 1U Tetra GPU solution. Each 1U server can house dual Intel Xeon processors and quad Tesla M2050 GPUs. The interesting aspect of this unit is its modular integration of the Tesla GPUs, which are neatly installed on both sides of the chassis. It's an example of how vendors these days are thoughtfully designing server systems to cater to GPU processing as well.
Mobile Computing Update with Tegra
The traditional PC space is steadily moving towards the mobile arena. As such, Tegra will be the company's fastest growing personal computing business. Most of the portable devices we own today are candidates for Tegra to power in the future.
- New growth in the near future will surely come from the mobile computing arena, by partnering with large providers like Nokia, Motorola and more to drive their mobile offerings.
- Tegra will focus on solutions for tablets, mobile phones, smart TVs, mini notebooks and even the automotive industry. The Tegra chip is versatile enough to be implemented in countless areas beyond those mentioned; all it needs is a partner open to new integration ideas.
- The upcoming Tegra 2 chip will primarily focus on the Android OS platform. Other OS options are a possibility, but those will be evaluated later.
- Vendors announced so far to be working with NVIDIA on Tegra 2 products are LG and Toshiba, with many more under wraps for the time being.
- Wireless modems will continue to be kept outside the main Tegra chip design to ensure the core is easily adaptable to multiple markets and operating requirements. As such, NVIDIA finds it more advantageous to work with wireless modem specialists to move quickly with market needs.
- 3D Vision on mobile Tegra-equipped devices is a possibility in the future. Small devices can use displays that are 3D-ready without needing glasses. Furthermore, these devices might even have integrated 3D stereo capture.
- CUDA and PhysX on Tegra are possible implementations in the future, but not at the moment, due to the low processing throughput.
- When Tegra chips progress enough, they could possibly tackle PC gaming through the OpenGL ES 2.0 API, which makes it easy for developers to port their existing content from the PC and other platforms.
- Like other mobile chip designs, Tegra will go multi-core, as that's going to be the prevalent design for performance mobile processors in time to come. For example, running dual 500MHz cores is more power efficient than operating a single 1GHz core (see the simplified power relation after this list). Race-to-sleep will be an integral part of the chip design to ensure optimum processor usage and maximum battery life. In the future, Tegra chips may even boast 'turbo boost'-like functionality to make the best use of the chip's TDP parameters, but that's at a much later stage.
- At this point, the Tegra 3 design is almost done, while Tegra 4 is on the drawing board.
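The dual-core power claim in the list above follows from the usual dynamic power relation. As a simplified illustration (ignoring leakage and assuming supply voltage scales roughly linearly with frequency):

```latex
P_{\text{dyn}} \propto C\,V^2 f, \qquad V \propto f\ \text{(rough DVFS assumption)}
\text{One core at } f: \quad P_1 \propto C\,V^2 f
\text{Two cores at } f/2: \quad P_2 \propto 2\,C\left(\tfrac{V}{2}\right)^2 \tfrac{f}{2} = \tfrac{1}{4}\,P_1
```

In this idealised model, two half-speed cores deliver the same throughput as one full-speed core at roughly a quarter of the dynamic power, which is why multi-core and race-to-sleep together stretch battery life.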
Here's a Tegra 2 based prototype tablet that we spotted on the show floor. It's basically a development kit for application developers.
When the press asked Jen-Hsun if NVIDIA should develop its own OS to speed up adoption of Tegra-based devices, Jen-Hsun had quite a bit to say, including some mudslinging at the competition:-
"No. We should do something only if we can make a significant contribution to the world. And only if we're very good at it. There are parts of the operating system which we're very good at, like API, middleware, system software. Can we make a contribution? That is answered by asking if there are other alternatives around. The world right now has several good alternatives. iOS, BB OS, Android, Windows Mobile, Symbian and Meego.
Intel chose to work with MeeGo, the number six OS. There is actually no such thing as a number six OS, and the reason is that operating systems need developers, and developers don't want to work on number six when they can work on the others, especially the number one and number two OSes. So why is Intel working with MeeGo? Because all of the others do not support x86. If they don't do MeeGo, they would be number zero. Intel has no choice. Why else is Intel building AppUp? Because if Intel doesn't do AppUp, who will? There's no app store for x86; it does not exist. Nobody else is doing it because x86 has no mobile future. x86 is the enterprise past. So why would NVIDIA build yet another OS, when NVIDIA can choose to work with the top mobile OS providers, who already run on ARM-based processors?"
The Next Big Thing...
Jen-Hsun believes the next hot commodity will be 'Super Phones': devices that are larger than conventional smartphones but still handier than tablets. This is also the segment NVIDIA is aiming Tegra 2 at with the new slew of products. He expects super phones to be in retail by this Christmas and to sustain the cool factor as the next big thing in mobile computing and devices for years to come.
More tablets will certainly come and be popular as content consumption devices, again an area where Tegra 2 will fill the role nicely. However, Jen-Hsun believes super phones might still outpace interest in tablets (comparatively speaking).
For Jen-Hsun, the ultimate super phone would be one he can rely upon for all his on-the-go usage needs, so that he never needs to lug a notebook again. For that to happen, more features and standards will need to be crammed in; some of his expectations include even higher resolution screens, wireless HDMI and more. Just as the desktop PC has given way to the notebook where revenue and growth are concerned, the next shift will see super phones disrupting both the desktop and notebook markets.