AMD Trinity APU - A Notebook Platform Performance Review


AMD Trinity Notebook Platform - Rise of the Underdog

** Updated as of 16th May 2012, 9.30am - We've included the technical details of the Trinity APU to go along with the reference notebook performance analysis.

Rise of the Underdog

Today is AMD's big day as it officially launches the Trinity APU, the company's next generation APU architecture. Built on the mature 32nm SOI process technology, the Trinity APU uses a second revision of the Bulldozer architecture and will be a key component of AMD-based ultrathin notebooks. Last month, we shared our brief hands-on experience with the Trinity APUs from the AMD Trinity APU Tech Day event held in Austin, Texas (USA). While that was purely from a standpoint of general usability, it already showcased a marked improvement against its competitor and further cemented the fact that AMD is banking heavily on its heterogeneous system architecture (HSA). As the name suggests, HSA leverages both the CPU and GPU portions to execute tasks in parallel, giving rise to the balanced system architecture that AMD often touts. This is essentially what an APU is about: an accelerated processing unit that doesn't require the user to care about what's underneath the hood, and that delivers the right experience by utilizing its full set of resources (processing units).

 

AMD Trinity - What’s in a Name?

So let’s get started with deciphering codenames, specs and all techie matters. What is the Trinity APU about? Trinity is the true second generation APU from AMD. Although Llano was the second APU launched after Ontario in the AMD Fusion family of processors, it was still considered a first generation APU. This is despite the fact that the Llano and Ontario APUs are quite different in several aspects. However, Ontario and Llano were the first APUs to address the netbook and mainstream notebook segments respectively.

Trinity, on the other hand, is a replacement for Llano, targeting the same market segment but with much improved CPU and GPU components to deliver far better performance per watt along with a best-in-class entertainment and gaming experience. In fact, Trinity got its name from combining three different architectures: the new Piledriver CPU cores, Northern Islands GPU cores (more specifically Cayman’s VLIW4 architecture) and the video processing and display support engine from Southern Islands GPUs. With so many features rolled in, Trinity promises quite a bit, and thus it’s only fitting to call it the true second generation APU.

Notebook Fusion Platform Progression

| Platform code name | Brazos | Sabine | Comal |
| --- | --- | --- | --- |
| APU code name | Ontario | Llano | Trinity |
| APU Generation | 1st Gen. APU | 1st Gen. APU | 2nd Gen. APU |
| Manufacturing Process | 40nm (bulk) | 32nm (SOI) | 32nm (SOI) |
| TDP Rating | 5.5 to 9W | 35 to 45W | 17 to 35W |
| CPU Core Architecture | Bobcat | Husky (K10 Propus derivative) | Piledriver (enhanced Bulldozer) |
| No. of CPU Cores | 1 to 2 | 2 to 4 | 2 to 4 |
| Built-in GPU | Yes | Yes | Yes |
| GPU Class | DX11 (ATI Cedar derivative - Radeon HD 5400 class) | DX11 (ATI Redwood derivative - Radeon HD 5500 / 5600 class) | DX11 (Northern Islands Cayman derivative - Radeon HD 6670 class) |
| Target Segment | Low Power platforms | Mainstream | Mainstream |
| Target Products | Netbooks; Compact Computing | Mainstream Notebooks / Desktops; AIO Desktops | Ultrathin Notebooks; Mainstream Notebooks / Desktops; AIO Desktops |
| Availability | Q1 2011 onwards | Q3 2011 onwards | Q2 2012 onwards |

 

Peering Inside the Trinity APU

Like the Llano APU before it, the Trinity APU will also be officially known as the A-Series APU; the only difference is the new model numbering scheme, which uses the 4000-series instead of the predecessor’s 3000-series. Trinity retains the same 32nm SOI manufacturing process and even the same core count, ranging from two to four ‘cores’. However, instead of the old K10 derivative architecture, the AMD Trinity APU takes on an enhanced version of the Bulldozer core.

Now we’re all aware that the Bulldozer CPU core architecture used in the AMD FX processors didn’t fare well, mostly because of its ‘dual-core’ module, which essentially features dual integer pipelines sharing a floating point unit. As such, a four-core chip will have four INT units and two FP units. This unusual design choice was made with more efficient resource sharing in mind, as opposed to giving each core its own balanced pair of integer and floating point units. Obviously AMD wants to make the CPU die more compact with such a setup, but it also openly admitted that the design trade-off is that existing x86 code is not processed in an optimal manner. This affects almost all programs unless updated compilers are used to build new applications that take better advantage of the design.
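To make the unit counts concrete, here is a minimal sketch in Python. The function name `module_resources` is our own and purely illustrative; it only encodes the assumption stated above, that every module pairs two integer cores with one shared floating point unit.

```python
def module_resources(cores: int) -> dict:
    """Count the execution resources of a Bulldozer-style chip.

    Assumption (taken from the description above): each module holds two
    integer cores that share a single floating point unit.
    """
    if cores < 2 or cores % 2 != 0:
        raise ValueError("cores must be a positive even number")
    modules = cores // 2
    return {"modules": modules, "int_units": cores, "fp_units": modules}


print(module_resources(4))
```

For a four-core part this reports two modules, four integer units and two floating point units, matching the counts quoted in the text.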

As with many organization-wide decisions, Bulldozer was AMD’s investment in its future CPU architecture, which it had to either fix or give up. AMD chose the more logical option to ‘fix’ or improve the architecture, which gave birth to the Piledriver CPU architecture that is first featured on the Trinity APU. Also known as the Enhanced Bulldozer, it was primarily spruced up with new instruction set architecture support (such as AVX, AES and more), several scheduler and pre-fetcher upgrades, improved branch prediction and an enhanced cache structure.

Each ‘dual-core’ Piledriver processing module is accompanied by a 2MB L2 cache (which can scale down to as low as 512KB on some entry-level variants), forgoing the L3 cache for mainstream mobile processors as was the case even on Llano. AMD further clarified that, at the design stage, it saw only negligible improvements from incorporating another cache layer, which would have occupied die space unnecessarily. On a typical upper-tier ‘four-core’ A-series APU, this would mean a total of 4MB of L2 cache per CPU. Since it’s still based on the 32nm manufacturing process, the Trinity APU’s 1.303 billion transistors occupy 246mm² as compared to the Llano APU’s 1.178 billion transistors that have a footprint of 228mm².
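As a quick sanity check on those die figures, the short Python sketch below works out the average transistor density of both chips. The helper name `density_millions_per_mm2` is ours; the inputs are simply the transistor counts and die areas quoted above.

```python
# Figures quoted in the paragraph above: (transistors, die area in mm^2).
DIES = {
    "Trinity": (1.303e9, 246),
    "Llano": (1.178e9, 228),
}


def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


for name, (transistors, area) in DIES.items():
    print(f"{name}: {density_millions_per_mm2(transistors, area):.2f} M/mm^2")
```

The two results land close together (roughly 5.3 versus 5.2 million transistors per mm²), which is what you would expect from two designs built on the same 32nm process.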

With the more mature 32nm SOI manufacturing process, AMD is now able to deliver even 17W TDP parts for the mainstream APUs, as opposed to 35W being the minimum previously. This will help it better compete in the Ultrabook race that Intel began, though AMD will pursue those favorable traits under the Ultrathin naming convention. However, there’s only one part qualified for the 17W TDP profile, and with such limited specifications, we’re not quite sure how it will compete with Intel’s various ultra low voltage Sandy Bridge and Ivy Bridge 17W processors used in Ultrabooks until we encounter an Ultrathin notebook for our evaluation. The full list of mobile AMD Trinity APU processors is as follows:

Specifications of AMD A-Series Trinity APUs for the Notebook Platform

| Model | Radeon GPU | TDP | CPU Cores | CPU Clock (turbo / base) | L2 Cache | Radeon Cores | GPU Clock (turbo / base) | Max DDR3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A10-4600M | HD 7660G | 35W | 4 | 3.2GHz / 2.3GHz | 4MB | 384 | 686MHz / 497MHz | DDR3-1600, DDR3L-1600, DDR3U-1333 |
| A10-4655M | HD 7620G | 25W | 4 | 2.8GHz / 2.0GHz | 4MB | 384 | 497MHz / 360MHz | DDR3-1333, DDR3L-1333, DDR3U-1066 |
| A8-4500M | HD 7640G | 35W | 4 | 2.8GHz / 1.9GHz | 4MB | 256 | 655MHz / 497MHz | DDR3-1600, DDR3L-1600, DDR3U-1333 |
| A6-4400M | HD 7520G | 35W | 2 | 3.2GHz / 2.7GHz | 1MB | 192 | 686MHz / 497MHz | DDR3-1600, DDR3L-1600, DDR3U-1333 |
| A6-4455M | HD 7500G | 17W | 2 | 2.6GHz / 2.1GHz | 2MB | 256 | 424MHz / 327MHz | DDR3-1333, DDR3L-1333, DDR3U-1066 |

* Take note that every two CPU cores make up 1 Piledriver module. Hence a four-core processor has two Piledriver modules.
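To put the turbo and base clocks in the specification table into perspective, the Python sketch below computes how far each part's turbo clock sits above its base clock. The clock values are copied from the table; the names `CLOCKS` and `headroom_percent` are our own, and the calculation says nothing about how often a chip actually reaches its turbo clock.

```python
# (CPU turbo GHz, CPU base GHz, GPU turbo MHz, GPU base MHz), from the table above.
CLOCKS = {
    "A10-4600M": (3.2, 2.3, 686, 497),
    "A10-4655M": (2.8, 2.0, 497, 360),
    "A8-4500M": (2.8, 1.9, 655, 497),
    "A6-4400M": (3.2, 2.7, 686, 497),
    "A6-4455M": (2.6, 2.1, 424, 327),
}


def headroom_percent(turbo: float, base: float) -> float:
    """Return how far the turbo clock sits above the base clock, in percent."""
    return (turbo / base - 1.0) * 100.0


for model, (cpu_turbo, cpu_base, gpu_turbo, gpu_base) in CLOCKS.items():
    cpu = headroom_percent(cpu_turbo, cpu_base)
    gpu = headroom_percent(gpu_turbo, gpu_base)
    print(f"{model}: CPU +{cpu:.1f}%, GPU +{gpu:.1f}%")
```

On these figures the A8-4500M has the widest CPU turbo range (2.8GHz over a 1.9GHz base, roughly 47%), while the A6-4400M has the narrowest (roughly 19%).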

AMD’s documentation also revealed that, besides the mobile Trinity APUs designed for 17W, 25W and 35W TDP profiles, the future desktop variety will comprise 65W and 100W TDP variants. This is similar to AMD’s existing line-up, but we feel it’s not competitive enough against the competition. AMD’s Trinity solutions are rather mainstream oriented compared to Intel’s line-up, which is much higher performing for the same TDP profile. AMD will likely have to play the value card once more, but we’ll reserve further comments until we get to test relevant systems.

 

AMD Turbo Core Technology 3.0

One of the major feature improvements in the Trinity APU is support for the latest Turbo Core technology, now in version 3.0. For those interested in catching up on how AMD’s Turbo Core works, you can read all about it in our Phenom II X6 coverage, where it first debuted. The original version only had two states of operation: either you get turbo clock speeds or you don’t. When half or more of the processor’s cores are idling, the remaining active cores operate at a higher predefined clock speed. In version 2.0, a third state was made available so that all the cores get a small frequency boost as long as they stay within the TDP of the CPU. This was implemented on the AMD FX.
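The state selection described above can be sketched in a few lines of Python. This is a simplified illustration of the behavior as we understand it, not AMD's implementation: the function names and the state labels ("base", "turbo", "max_turbo", "all_core_boost") are hypothetical, and the TDP check is reduced to a single flag.

```python
def turbo_state_v1(active_cores: int, total_cores: int) -> str:
    """Original Turbo Core: boost only when at least half the cores idle."""
    idle_cores = total_cores - active_cores
    return "turbo" if idle_cores * 2 >= total_cores else "base"


def turbo_state_v2(active_cores: int, total_cores: int, within_tdp: bool) -> str:
    """Turbo Core 2.0: adds a smaller all-core boost when TDP headroom exists."""
    if turbo_state_v1(active_cores, total_cores) == "turbo":
        return "max_turbo"
    # Third state: every core gets a small boost, but only inside the TDP.
    return "all_core_boost" if within_tdp else "base"


# A six-core chip with three busy cores qualifies for turbo in version 1.0.
print(turbo_state_v1(active_cores=3, total_cores=6))
# An eight-core chip with all cores busy can still get the small boost in 2.0.
print(turbo_state_v2(active_cores=8, total_cores=8, within_tdp=True))
```

The point of the comparison is that version 1.0 gives a fully loaded chip nothing, while version 2.0 still leaves room for a modest all-core boost when power headroom allows.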

Since Trinity uses the new Piledriver core, which is an enhanced Bulldozer architecture, Turbo Core has been more thoroughly overhauled in this third iteration, as it touts automatic bi-directional power management between the GPU and CPU portions of the die. This is a drastic improvement since previous Turbo Core iterations could only ramp the frequency and voltage up or down on the CPU portion.

Turbo Core 3.0 achieves this via thermal mapping of the die. The Trinity APU constantly and dynamically calculates the temperature of the CPU cores and the GPU block based on load level estimation, and maps them to obtain an optimal operating point that maximizes performance from both processing units, all while staying within safe operating temperature limits. Note that this isn’t based on measured temperature; it is calculated from workloads, as AMD has verified in its labs that turbo boosting behavior is fairly consistent and predictable for any given load level. Thanks to Turbo Core 3.0, Trinity APUs list base and boost clocks for both the processor and the GPU.
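To illustrate the idea of bi-directional power sharing, here is a deliberately simple toy model in Python. It is not AMD's algorithm: the function `boost_budgets`, the `cpu_share` split and the linear load-to-power estimate are all assumptions made for illustration. It only captures the two ideas from the text, that the operating point is derived from estimated load rather than measurement, and that headroom left by one block can be handed to the other.

```python
def boost_budgets(tdp_w: float, cpu_load: float, gpu_load: float,
                  cpu_share: float = 0.5) -> tuple:
    """Toy model of a shared CPU/GPU power budget (not AMD's algorithm).

    tdp_w:     total package power limit in watts
    cpu_load:  estimated CPU load, 0.0 (idle) to 1.0 (fully loaded)
    gpu_load:  estimated GPU load, same scale
    cpu_share: hypothetical fraction of the TDP nominally given to the CPU

    Returns (cpu_budget_w, gpu_budget_w): the most each block may draw
    while the other block draws its estimated demand.
    """
    for value in (cpu_load, gpu_load, cpu_share):
        if not 0.0 <= value <= 1.0:
            raise ValueError("loads and cpu_share must lie between 0 and 1")

    cpu_nominal = tdp_w * cpu_share
    gpu_nominal = tdp_w - cpu_nominal

    # Estimated draw is derived from load, mirroring the idea that the
    # operating point is calculated rather than measured.
    cpu_demand = cpu_nominal * cpu_load
    gpu_demand = gpu_nominal * gpu_load

    # Headroom one block leaves unused is lent to the other (bi-directional).
    cpu_budget = cpu_nominal + (gpu_nominal - gpu_demand)
    gpu_budget = gpu_nominal + (cpu_nominal - cpu_demand)
    return cpu_budget, gpu_budget


cpu_w, gpu_w = boost_budgets(tdp_w=35, cpu_load=1.0, gpu_load=0.2)
print(f"CPU may draw up to {cpu_w:.1f}W, GPU up to {gpu_w:.1f}W")
```

In this toy model a busy CPU paired with a lightly loaded GPU sees its power ceiling rise well above its nominal half of the budget, while two fully loaded blocks each fall back to their nominal share. A real implementation would translate such budgets into specific frequency and voltage steps.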