** Updated as of 16th May 2012, 9.30am - We've included the technical details of the Trinity APU to go along with the reference notebook performance analysis.
Today is AMD's big day as it officially launches the Trinity APU, the next-generation APU architecture from AMD. Built on the mature 32nm SOI process technology, the Trinity APU uses a second revision of the Bulldozer architecture and will be a key enabler of ultrathin notebooks based on AMD platforms. Last month, we gave you our brief experience with the Trinity APUs from the AMD Trinity APU Tech Day event held in Austin, Texas (USA). While that was purely from a general usability standpoint, it already showcased a marked improvement over the competition, further cementing the fact that AMD is banking heavily on its Heterogeneous System Architecture (HSA). As the name suggests, HSA leverages both the CPU and GPU portions to execute tasks in parallel, giving rise to the balanced system architecture that AMD often touts. This is essentially what an APU is about: an accelerated processing unit that spares the user from worrying about what's underneath the hood and delivers the right experience by utilizing its full set of resources (processing units).
So let’s get started with deciphering codenames, specs and all techie matters. What is the Trinity APU about? Trinity is the true second-generation APU from AMD. Although the Llano APU was the second variety launched after Ontario in the AMD Fusion family of processors, it is still considered a first-generation APU. This is despite the fact that the Llano and Ontario APUs are quite different in several aspects; what they share is that both were AMD's first APUs for their respective segments, with Ontario addressing netbooks and Llano addressing mainstream notebooks.
Trinity, on the other hand, is a replacement for Llano and targets the same market segment, but with much improved CPU and GPU components that bring far better performance per watt along with a best-in-class entertainment and gaming experience. In fact, Trinity got its name from combining three different architectures: the new Piledriver CPU cores, Northern Islands GPU cores (more specifically Cayman’s VLIW4 architecture) and the video processing and display engines from Southern Islands GPUs. With so many features rolled in, Trinity promises quite a bit, and it’s only fitting to call it the true second-generation APU.
|Platform code name|Brazos|Sabine|Comal|
|---|---|---|---|
|APU code name|Ontario|Llano|Trinity|
|APU Generation|1st Gen. APU|1st Gen. APU|2nd Gen. APU|
|Manufacturing Process|40nm (bulk)|32nm (SOI)|32nm (SOI)|
|TDP Rating|5.5W to 9W|35W to 45W|17W to 35W|
|CPU Core Architecture|Bobcat|Husky (K10 Propus derivative)|Piledriver (enhanced Bulldozer)|
|No. of CPU Cores|1 to 2|2 to 4|2 to 4|
|GPU Class|DX11 (ATI Cedar derivative, Radeon HD 5400 class)|DX11 (ATI Redwood derivative, Radeon HD 5500 / 5600 class)|DX11 (Northern Islands Cayman derivative, Radeon HD 6670 class)|
|Target Segment|Low-power platforms|Mainstream|Mainstream|
|Availability|Q1 2011 onwards|Q3 2011 onwards|Q2 2012 onwards|
Like the Llano APU before it, the Trinity APU will also be officially known as the A-Series APU; the only difference is the new model numbering scheme, denoted by the 4000-series instead of the predecessor’s 3000-series. It uses the same 32nm SOI manufacturing process and even the same core count of two to four ‘cores’. However, instead of the old K10-derived architecture, the AMD Trinity APU adopts an enhanced version of the Bulldozer core.
Now, we’re all aware that the Bulldozer CPU core architecture used in the AMD FX processors didn’t fare well, mostly because of its ‘dual-core’ module, which essentially features two integer clusters sharing one floating point unit. As such, a four-core chip has four integer units but only two FP units. This unusual design choice was made with more efficient resource sharing in mind, as opposed to giving each core its own balanced set of integer and floating point resources. Obviously AMD wanted a more compact CPU die with such a setup, but it also openly admitted that the trade-off is less optimal processing of existing x86 code. This affects almost all programs unless new applications are built with updated compilers that better take advantage of the design.
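To make the module arithmetic concrete, here is a minimal Python sketch, assuming the simplified counting described above (two integer clusters sharing one FP unit per module). The names `resources_for` and `fp_threads_per_unit` are hypothetical helpers for illustration, not anything published by AMD, and the model ignores timing and scheduling entirely.

```python
from dataclasses import dataclass

# Simplified Bulldozer/Piledriver module: two integer clusters ("cores")
# share one floating-point unit. This counts resources only; it does not
# model timing, scheduling or throughput.
INT_CLUSTERS_PER_MODULE = 2
FP_UNITS_PER_MODULE = 1


@dataclass(frozen=True)
class ModuleResources:
    modules: int
    int_clusters: int
    fp_units: int


def resources_for(cores: int) -> ModuleResources:
    """Execution resources of a chip marketed with `cores` cores (hypothetical helper)."""
    if cores <= 0 or cores % INT_CLUSTERS_PER_MODULE != 0:
        raise ValueError("core count must be a positive multiple of 2")
    modules = cores // INT_CLUSTERS_PER_MODULE
    return ModuleResources(
        modules=modules,
        int_clusters=modules * INT_CLUSTERS_PER_MODULE,
        fp_units=modules * FP_UNITS_PER_MODULE,
    )


def fp_threads_per_unit(fp_threads: int, cores: int) -> float:
    """Average number of FP-heavy threads sharing each FP unit (hypothetical helper)."""
    return fp_threads / resources_for(cores).fp_units


if __name__ == "__main__":
    for n in (2, 4):
        r = resources_for(n)
        print(f"{n}-core: {r.modules} module(s), {r.int_clusters} INT clusters, "
              f"{r.fp_units} FP unit(s)")
    print(fp_threads_per_unit(4, 4))  # four FP-heavy threads on a four-core part
```

For example, four FP-heavy threads on a ‘four-core’ part work out to two threads per shared FP unit, which is the kind of contention the design trade-off above refers to.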
As with many organization-wide decisions, Bulldozer was AMD’s future CPU architecture investment, which it had to either fix or give up. AMD took the more logical route and chose to ‘fix’, or rather improve, the architecture, which gave birth to the Piledriver CPU architecture that debuts on the Trinity APU. Also known as Enhanced Bulldozer, it was primarily spruced up with new instruction set extensions (such as FMA3 and F16C, on top of the AVX and AES support already present in Bulldozer), several scheduler and prefetcher upgrades, improved branch prediction and an enhanced cache structure.
Each ‘dual-core’ Piledriver processing module is accompanied by 2MB of L2 cache (which can scale down to as low as 512KB on some entry-level variants), forgoing an L3 cache for mainstream mobile processors, as was also the case on Llano. AMD clarified that, at the design stage, adding another cache layer showed negligible improvement and would only have occupied die space unnecessarily. On a typical upper-tier ‘four-core’ A-Series APU, this means a total of 4MB of L2 cache per APU. Since it’s still built on the 32nm manufacturing process, the Trinity APU’s 1.303 billion transistors occupy 246mm² compared to the Llano APU’s 1.178 billion transistors with a footprint of 228mm².
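The die figures above can be sanity-checked with a little arithmetic. The Python sketch below only uses the transistor counts, die areas and 2MB-per-module L2 figure quoted in this article; `total_l2_mb` is a hypothetical helper that assumes every module carries the full 2MB.

```python
# Figures quoted in the article.
LLANO = {"transistors": 1.178e9, "die_mm2": 228}
TRINITY = {"transistors": 1.303e9, "die_mm2": 246}
L2_PER_MODULE_MB = 2.0  # full-size Piledriver module; some SKUs ship with less


def density_m_per_mm2(chip: dict) -> float:
    """Transistor density in millions of transistors per square millimetre."""
    return chip["transistors"] / chip["die_mm2"] / 1e6


def pct_increase(new: float, old: float) -> float:
    """Relative growth from `old` to `new`, in percent."""
    return (new - old) / old * 100


def total_l2_mb(cores: int, l2_per_module_mb: float = L2_PER_MODULE_MB) -> float:
    """Total L2 for a part with `cores` cores, two cores per module (hypothetical helper)."""
    return (cores // 2) * l2_per_module_mb


if __name__ == "__main__":
    for name, chip in (("Llano", LLANO), ("Trinity", TRINITY)):
        print(f"{name}: {density_m_per_mm2(chip):.2f}M transistors/mm^2")
    print(f"Transistors: +{pct_increase(TRINITY['transistors'], LLANO['transistors']):.1f}%")
    print(f"Die area:    +{pct_increase(TRINITY['die_mm2'], LLANO['die_mm2']):.1f}%")
    print(f"Four-core L2: {total_l2_mb(4):.0f}MB")
```

Run as-is, it reports roughly 5.17 and 5.30 million transistors per mm² for Llano and Trinity respectively, with the transistor count growing about 10.6% against a die-area increase of about 7.9%, which is consistent with both chips sharing the same 32nm process.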
With the more mature 32nm SOI manufacturing process, AMD is now able to deliver 17W TDP parts for its mainstream APUs, whereas 35W was the previous minimum. This should help AMD enter the thin-and-light race that Intel began with Ultrabooks, though AMD will pursue the same favorable traits under its own ‘Ultrathin’ naming. However, only one part is qualified for the 17W TDP profile, and with its limited specifications, we’re not quite sure how it will compete with Intel’s various 17W ultra-low-voltage Sandy Bridge and Ivy Bridge processors used in Ultrabooks until we get an Ultrathin notebook in for evaluation. The full list of mobile AMD Trinity APUs is as follows:
|Model|Radeon GPU|TDP|CPU Cores|CPU Clock (turbo / base)|L2 Cache|Radeon Cores|GPU Clock (turbo / base)|Memory Support|
|---|---|---|---|---|---|---|---|---|
|A10-4600M|HD 7660G|35W|4|3.2GHz / 2.3GHz|4MB|384|686MHz / 497MHz| |
|A10-4655M|HD 7620G|25W|4|2.8GHz / 2.0GHz|4MB|384|497MHz / 360MHz| |
|A8-4500M|HD 7640G|35W|4|2.8GHz / 1.9GHz|4MB|256|655MHz / 497MHz|DDR3-1600|
|A6-4400M|HD 7520G|35W|2|3.2GHz / 2.7GHz|1MB|192|686MHz / 497MHz|DDR3-1600|
|A6-4455M|HD 7500G|17W|2|2.6GHz / 2.1GHz|2MB|256|424MHz / 327MHz|DDR3-1333|
* Take note that every two CPU cores make up 1 Piledriver module. Hence a four-core processor has two Piledriver modules.
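For readers who want to compare the parts programmatically, here is a small Python sketch that encodes the table above (TDP and memory columns left out) and derives the module count and turbo headroom of each SKU. The `Sku` record and `summarize` helper are hypothetical names; the uplift is simply the turbo clock divided by the base clock.

```python
from typing import NamedTuple, Tuple


class Sku(NamedTuple):
    model: str
    cores: int
    cpu_turbo_ghz: float
    cpu_base_ghz: float
    radeon_cores: int
    gpu_turbo_mhz: int
    gpu_base_mhz: int


# Values copied from the table above (TDP and memory columns omitted).
SKUS = [
    Sku("A10-4600M", 4, 3.2, 2.3, 384, 686, 497),
    Sku("A10-4655M", 4, 2.8, 2.0, 384, 497, 360),
    Sku("A8-4500M", 4, 2.8, 1.9, 256, 655, 497),
    Sku("A6-4400M", 2, 3.2, 2.7, 192, 686, 497),
    Sku("A6-4455M", 2, 2.6, 2.1, 256, 424, 327),
]


def uplift_pct(turbo: float, base: float) -> float:
    """How far the turbo clock sits above the base clock, in percent."""
    return (turbo / base - 1) * 100


def summarize(sku: Sku) -> Tuple[str, int, int, int]:
    """(model, Piledriver modules, CPU turbo uplift %, GPU turbo uplift %), rounded."""
    modules = sku.cores // 2  # every two cores form one Piledriver module
    return (
        sku.model,
        modules,
        round(uplift_pct(sku.cpu_turbo_ghz, sku.cpu_base_ghz)),
        round(uplift_pct(sku.gpu_turbo_mhz, sku.gpu_base_mhz)),
    )


if __name__ == "__main__":
    for sku in SKUS:
        print(summarize(sku))
```

For instance, the A10-4600M works out to two modules with roughly 39% CPU and 38% GPU turbo headroom, while the 17W A6-4455M is a single module with about 24% and 30%.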
Besides the mobile Trinity APUs in 17W, 25W and 35W TDP variants, AMD’s documentation also revealed that the upcoming desktop parts will come in 65W and 100W TDP variants. This is similar to AMD’s existing line-up, but we feel it’s not competitive enough. AMD’s Trinity solutions are rather mainstream oriented compared to Intel’s line-up, which delivers much higher performance for the same TDP profile. AMD will likely have to play the value card once more, but we’ll reserve further comments until we get to test relevant systems.
One of the major feature improvements in the Trinity APU is support for the latest Turbo Core technology, now in version 3.0. For those interested in catching up on how AMD’s Turbo Core works, you can read all about it in our Phenom II X6 coverage, where it first debuted. The original version had only two states of operation: either you get turbo clock speeds or you don’t. When half or more of the processor’s cores are idling, the remaining active cores operate at a higher, predefined clock speed. Version 2.0 added a third state in which all the cores get a small frequency boost as long as the CPU stays within its TDP. This version was implemented on the AMD FX processors.
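The Turbo Core 1.0 and 2.0 rules described above can be read as a small state selection. The following Python sketch assumes a simplified model: `est_power_w` and `tdp_w` are hypothetical inputs standing in for whatever power estimate the processor actually uses, and real processors apply these rules with many more conditions than shown here.

```python
from enum import Enum


class TurboState(Enum):
    BASE = "all cores at base clock"
    ALL_CORE_BOOST = "small boost on all cores"  # third state added in Turbo Core 2.0
    TURBO_ACTIVE_CORES = "active cores at predefined turbo clock"


def select_state(active_cores: int, total_cores: int,
                 est_power_w: float, tdp_w: float, version: int) -> TurboState:
    """Simplified Turbo Core 1.0/2.0 state choice (illustrative, not AMD's logic)."""
    if not 0 < active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    idle_cores = total_cores - active_cores
    # Turbo Core 1.0: with half or more of the cores idle, the active ones
    # run at a higher, predefined clock.
    if 2 * idle_cores >= total_cores:
        return TurboState.TURBO_ACTIVE_CORES
    # Turbo Core 2.0: otherwise, give every core a small boost while the
    # (estimated) power stays within the TDP.
    if version >= 2 and est_power_w < tdp_w:
        return TurboState.ALL_CORE_BOOST
    return TurboState.BASE


if __name__ == "__main__":
    # Hypothetical power numbers for a six-core part with a 125W TDP.
    print(select_state(3, 6, 80.0, 125.0, version=1).name)
    print(select_state(6, 6, 100.0, 125.0, version=1).name)
    print(select_state(6, 6, 100.0, 125.0, version=2).name)
```

With these hypothetical figures, three busy cores out of six get the predefined turbo clock under either version; with all six cores busy, version 1 falls back to base clocks, while version 2 grants the mild all-core boost as long as the power estimate stays under the TDP.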
Since Trinity uses the new Piledriver core, an enhanced Bulldozer architecture, Turbo Core has been thoroughly overhauled in this third iteration, which introduces automatic bi-directional power management between the GPU and CPU portions of the die. This is a drastic improvement, since previous Turbo Core iterations could only ramp the frequency and voltage up or down on the CPU portion.
Turbo Core 3.0 achieves this via thermal mapping of the die: the Trinity APU constantly and dynamically calculates the temperatures of the CPU cores and the GPU block based on an estimate of their loading levels, and maps them to obtain an optimal operating point that maximizes performance from both processing units, all while staying within safe operating temperature limits. Note that this isn’t based on measured temperature; it is calculated from the workload, and AMD has verified in its labs that turbo boosting behavior is fairly consistent and predictable for any given loading level. Thanks to Turbo Core 3.0, Trinity APUs list base and boost clocks for both the processor and the GPU.
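To illustrate the idea of bi-directional budget sharing driven by calculated rather than measured temperatures, here is a deliberately simplified Python sketch. Every number in it (the power tables, TDP, ambient temperature and thermal coefficients) is a hypothetical placeholder; the clock steps borrow the A10-4600M’s base and turbo clocks, with the intermediate 2.7GHz and 600MHz steps invented. The greedy "busier block gets first claim" policy is our own stand-in, not AMD’s actual algorithm.

```python
from typing import List, Tuple

# Every number here is a hypothetical placeholder, not AMD data.
TDP_W = 35.0
AMBIENT_C = 45.0
R_SELF_C_PER_W = 2.0      # how much a block heats itself per watt
R_COUPLING_C_PER_W = 0.5  # how much it heats its neighbour per watt
T_LIMIT_C = 100.0

# (clock, estimated power at 100% load in watts), in ascending order.
CPU_STATES: List[Tuple[float, float]] = [(2.3, 14.0), (2.7, 18.0), (3.2, 24.0)]  # GHz
GPU_STATES: List[Tuple[float, float]] = [(497, 10.0), (600, 13.0), (686, 17.0)]  # MHz


def est_power(full_load_w: float, load: float) -> float:
    """Power estimated from the loading level (0.0 to 1.0), not measured."""
    return full_load_w * load


def calc_temp(own_w: float, neighbour_w: float) -> float:
    """Calculated (not sensed) temperature of one block on the thermal map."""
    return AMBIENT_C + R_SELF_C_PER_W * own_w + R_COUPLING_C_PER_W * neighbour_w


def fits(a_w: float, b_w: float) -> bool:
    """True if both blocks together stay within the TDP and the temperature limit."""
    return (a_w + b_w <= TDP_W
            and calc_temp(a_w, b_w) <= T_LIMIT_C
            and calc_temp(b_w, a_w) <= T_LIMIT_C)


def highest_fitting(states, load: float, other_w: float) -> Tuple[float, float]:
    """Highest state that fits next to `other_w`; falls back to the lowest state."""
    best = states[0]
    for clock, full_w in states:
        if fits(est_power(full_w, load), other_w):
            best = (clock, full_w)
    return best


def pick_operating_point(cpu_load: float, gpu_load: float) -> Tuple[float, float]:
    """Return (cpu_ghz, gpu_mhz); the busier block gets first claim on the budget."""
    if cpu_load >= gpu_load:
        gpu_floor_w = est_power(GPU_STATES[0][1], gpu_load)
        cpu = highest_fitting(CPU_STATES, cpu_load, gpu_floor_w)
        gpu = highest_fitting(GPU_STATES, gpu_load, est_power(cpu[1], cpu_load))
    else:
        cpu_floor_w = est_power(CPU_STATES[0][1], cpu_load)
        gpu = highest_fitting(GPU_STATES, gpu_load, cpu_floor_w)
        cpu = highest_fitting(CPU_STATES, cpu_load, est_power(gpu[1], gpu_load))
    return cpu[0], gpu[0]


if __name__ == "__main__":
    print(pick_operating_point(cpu_load=1.0, gpu_load=0.8))  # CPU-bound workload
    print(pick_operating_point(cpu_load=0.8, gpu_load=1.0))  # GPU-bound workload
```

With these made-up numbers, a CPU-bound load (CPU 100%, GPU 80%) settles at 3.2GHz / 600MHz, while a GPU-bound load (CPU 80%, GPU 100%) shifts the headroom to 2.7GHz / 686MHz. That is the kind of CPU-to-GPU trade the thermal map is meant to make, although the real controller weighs far more inputs.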