In the near term, the APU strategy will benefit notebook platforms the most, thanks to higher integration and power savings at the platform level. However, the biggest gains from the APU have yet to be tapped, as they depend on the extent of collaboration and the software design tools available to use both the CPU and the GPU concurrently to execute tasks - hence the combined acceleration of the APU. The APUs that AMD is launching have yet to be optimized for better programmability; when that happens, a second wave of advantages awaits the AMD Fusion platform.
Before the conference ended, AMD also teased the audience with a die shot of its next-generation APU, codenamed Orochi. It's also going to be a 32nm part like Llano, but the CPU cores will be based on Bulldozer - not an existing K10 derivative nor the lower-end Bobcat. Bulldozer is the next-generation core module (featuring two full integer execution cores) for upcoming Opterons and high-speed consumer processors due sometime in 2011. Orochi will have four Bulldozer core modules capable of addressing eight threads, for a total of 'eight processing cores'.
Before we wrapped up our time in Taipei, we were privileged to have some time with Joe Macri, AMD's Corporate VP and CTO for the Client Division, to get more insights into AMD's Fusion plans.
HWZ: What's the difference between AMD's Fusion concept and the competition, like Intel's current Core i3/i5 and upcoming Sandy Bridge processors?
Joe: There are some big differences between what we're doing and what our competitors are doing. Our tagline, 'the future is fusion', actually embodies a lot more than just a marketing term. We really see the Fusion architecture evolving from this initial step over time, in a way where it's always backwards compatible. So we've a very cohesive vision as we look forward. When we compare the initial version of Fusion that we're launching to what Intel has done with Sandy Bridge, we see differences. Sandy Bridge is literally, I look at it and say, a graphics unit put on the same piece of silicon as the CPU, but they are not cohesive in any way, shape or form. Compare that with Ontario: our graphics unit can, firstly, do compute, so you can actually have the CPU and GPU work on the same problem. When you need floating-point capabilities, the GPU can work on it, and when you need a lot of scalar capability - a lot of integer processing done quickly with low latency - the CPU can work on it. So right off the bat, we have a huge performance and power advantage over what they've done with Sandy Bridge.
The second thing we've done is ensure a cohesive architecture on the graphics side, which means we need to have the latest graphics architecture. We want our external GPUs and our internal GPUs to be able to work with each other. When you put in an external GPU, you don't want to turn off the internal one; rather, you want the two to work on the same problem together. And if it's a visual problem, you've got to do it the way we do with CrossFire - and CrossFire means you've got to have a compatible architecture. We put the latest graphics engine into the APU, unlike Sandy Bridge, and that's another differentiator. So the latest graphics programs are going to work; and if you look back at DX9 and DX10, we've got a software investment at AMD with the unified driver model, so all your old stuff works well. That's why it took us a long time: to give you something so extensible, to give you something that is the latest and provides full compute capabilities on both sides. We could have taken shortcuts to deliver some aspects of this earlier, but we couldn't see why that would be compelling for end-users. On Sandy Bridge, if you want to run any of the latest stuff, you need to plug in a discrete card, and when you plug that in, your internal graphics just gets turned off because it's incompatible. It would at least have been nice if the Sandy Bridge graphics could do compute, but instead it just gets turned off. It has a very good x86 core, but the rest of it is just a 'hack'. An L3 cache is required if you're going to share data between two different compute units, but the GPU in Sandy Bridge doesn't do compute, so it's just there to hide latencies. GPUs don't need low latency; they need lots of bandwidth to a lot of memory. So their use of a cache is likely a design shortcut.
The memory subsystem in Fusion is very complicated, as it needs to serve both the low latencies required by the CPU and the high bandwidth needs of the GPU.
The other thing is that we balance the GPU to the CPU on the APU. As we put in bigger CPUs, you get bigger GPUs, so for most users in the world, the APU will be fine. But some users need an imbalanced system - such as gamers who need a lot of 3D performance - and when they stick in a discrete GPU, we don't turn off the internal GPU. You either use the two in conjunction, depending on the size of the external graphics, or you use the internal GPU for compute, and so you get a cohesive system.
As we look forward, we're going to enhance it in a way that makes the APU easier to program, easier to do compute on, and easier to share data between the CPU and GPU. But through all these enhancements, all the old stuff is still going to work - the old DX games, x86 programs, code optimized for Ontario and Llano; we never want anything to be thrown away. As we go forward, we've got a good vision to make the CPU and GPU equally programmable, equally usable. It will allow the software writer to be an artist and not an engineer. When the software writer is thinking engineering, he's not thinking of the end vision he wants to give consumers. What he's focused on is the data disparity: two different memory systems, how to manipulate them, how to ensure the latest data sets are obtained, and so on - we want to get rid of all that. We want to make it simple for programmers to tackle the problem, make the memory models simpler, and so on. So as we move forward after Ontario and Llano, you'll see this continuous enhancement, but it's all backwards compatible. And that's what I don't see from our competitors.
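Macri's division of labour - data-parallel floating-point work routed to the GPU, latency-sensitive integer work kept on the CPU, both over one memory space - can be caricatured in plain Python. This is a conceptual sketch only, not AMD code or any real Fusion API; the pool names and the `dispatch` helper are invented, with host thread pools standing in for the two kinds of execution units:

```python
# Conceptual sketch (not AMD code): one address space feeding two kinds of
# execution units, with a runtime routing each task to the unit that suits it.
from concurrent.futures import ThreadPoolExecutor

# Threads share one address space, standing in for the APU's unified memory.
gpu_like = ThreadPoolExecutor(max_workers=8)   # wide, throughput-oriented
cpu_like = ThreadPoolExecutor(max_workers=2)   # narrow, latency-oriented

def dispatch(task, data_parallel):
    """Route data-parallel FP work 'GPU-side', scalar work 'CPU-side'."""
    pool = gpu_like if data_parallel else cpu_like
    return pool.submit(task)

# Data-parallel floating-point kernel -> the "GPU" pool
fp = dispatch(lambda: sum(x * 0.5 for x in range(1_000_000)), data_parallel=True)
# Branchy, latency-sensitive integer work -> the "CPU" pool
it = dispatch(lambda: len([n for n in range(1000) if n % 7 == 0]), data_parallel=False)

print(fp.result(), it.result())  # both results come out of the same memory space
```

The point of the sketch is the shared address space: neither "unit" copies data to the other, which is the programming-model simplification Macri says Fusion is working towards.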
HWZ: How would you get vendors to optimize programs specifically for the APU when the rest of the market is still on a conventional programming model?
Joe: We do it because our biggest metric is performance per watt. If you write code that ran on yesterday's systems, it will still run very well on an APU; in fact, in many cases - if you write for DX11 - it will only run on an APU, while on Intel's Sandy Bridge it just wouldn't work. We want to offer programmers a better path, but one that is also compatible with the model of today, and that's a really old model. Intel talks about the future of programming, though not with a cohesive vision: they talk about going to Larrabee, but that's a totally funny way of connecting x86 units to vector units - very wasteful of the hardware. If you need a lot of vector units instead of x86 units, all those x86 cores are going to be dead, and if you need a lot of x86 units, those vector units will be dead. That's radically different from what they've got on Sandy Bridge - nothing in common in that programming model at all; in some cases code may not even run because it's such a radical change. So I think architecturally we're heading in the right direction. Even though there are multiple models out there, our model is designed to work with them all. You can use it in a way that gives you the best, but it will still run everything else just as well as the others.
HWZ: Will Fusion eventually replace low-end and mid-range discrete graphics?
Joe: Well, I think that segment is going to get redefined. As shown in the slides, today you have low-end discrete graphics positioned above integrated graphics. APUs aren't really integrated graphics, since they bring a whole new level of programming - in fact, a whole new way of looking at the platform. We're going to want our discrete GPUs to sit above it. There's a limit to how much graphics we can put into an APU; you've got to keep the chips cost effective. When you're buying an APU, you're basically buying into a balanced system. We'll want you to add discrete graphics to the system to tune it to your particular needs. So you'll find AMD products are always very cohesive. For discrete graphics, we might still offer some low-end SKUs to match up to Intel's platforms, because we make a lot of money selling them - Intel doesn't do such a good job at graphics. You can see that with Sandy Bridge; they can't even run some of the latest stuff. So I think our low-end discrete parts will still ship on Intel platforms when it makes sense. We've no problem making money where they're not doing so well.
HWZ: Do you think super phones will bring about the demise of mainstream computing?
Joe: No, not at all actually. I think ultraportable devices are very critical to all of our lifestyles, but they don't replace what a PC can do today. The ability to manipulate content, create content, get the ultimate multimedia experience - these things will come down in form factor, but the user interface has to adapt with them; otherwise you just won't be able to manipulate the content. So I really believe in a cohesive set of devices, from desktop systems - which aren't going away - all the way down to the smallest form factors. My belief is that Fusion will span them, and span them with a common architecture, so that applications can move from one device to the next. I think that's the real forward vision. I don't think cell phones and thin clients with limited compute capabilities are our future. Actually, I think we need powerful computers everywhere; you want to move the computer to where the data is. Advanced user interfaces will generate huge amounts of data that gets thrown away, hence you'll want that compute done locally. And as you advance the UI, you'll need more compute while shrinking form factors. I think the future is very bright, for super phones as well as larger form factors.
HWZ: So x86 will have a long shelf life?
Joe: ARM has no advantage. Don't let anyone tell you that ARM is superior to x86 in any way - and I won't claim the other direction either. What matters is how you design; it's not the ISA that truly differentiates a processing unit. What's beautiful about the x86 ISA is compatibility. That's the most important thing we've offered, and you're going to continue to see that extend. I don't think you can really outdo one ISA with another; there's no real advantage or disadvantage to it. The only disadvantage for ARM is that there isn't a lot of software out there, and what is out there isn't very cohesive, because the ecosystem doesn't want it to be. The guys at BlackBerry don't want to work on an Apple device, and vice versa. So the ARM ecosystem is more about a business model; there's nothing about it that, as an engineer, makes me want to jump up and down.