Intel unwraps Alder Lake performance hybrid scalable architecture for desktops and notebooks

This is Intel's first high-performance hybrid scalable architecture ever, and it could finally pose a serious challenge to AMD's recent stratospheric rise.

By Vijay Anand - 14 Nov 2021

Note (1): This feature was first published on 19 August 2021 and is purely covering the Alder Lake architecture used in the Intel 12th Gen Core Processors.

Note (2): See also our coverage of the overclockable desktop processors launched, the mainstream desktop processors, considerations for building a new 12th Gen Core-based system (coolers, motherboard, memory and overclocking), and a performance review of the new CPU. For laptops, see the mobile processors launched and the extreme HX-class options.

Heralding an era of advanced high-performance hybrid design processors

At Intel’s Architecture Day 2021, Intel shared its bold new vision and a family of solutions to show industry folks and the media what’s to come from late 2021 through 2022, spanning all kinds of usage and form factors from ultra-thin and light devices through servers and beyond.

For the client computing segment, Alder Lake represents Intel’s work in reengineering its processing cores in a big way and pushing for a more powerful big-core plus little-core implementation of the kind ARM processors have touted for a long while, but in the x86 ISA domain. Lakefield tried to take the first step, but it never really took off: it saw very few design wins, and even then it was relegated to ultra-thin and light notebooks and tablets, with performance that was nothing to write home about.

Alder Lake will finally fix this in a big way with scalable and powerful processor design solutions applicable for desktops, notebooks and tablet-computing or thin and light products.

What it means to jump aboard the Alder Lake train

Before we dive into the details, let us share some key highlights. Firstly, you’ll need a new motherboard, as the desktop Alder Lake chips require a new LGA 1700 socket, which Gigabyte leaked much earlier. Secondly, Alder Lake supports DDR5 memory, an industry first. DDR5 moves memory voltage regulation onto the DIMM itself, which makes it radically different from earlier memory modules.

Alder Lake supports a total of four different memory types: DDR5, DDR4 and low power variants of both.

Thirdly, Alder Lake will be the first in the industry to transition to PCIe 5.0 connectivity, one-upping AMD’s progress. These are big changes to Alder Lake’s overall platform capability and it’s inevitable that a new motherboard and accompanying hardware are needed to complete its performance story.

PCIe 5.0 theoretically supports up to 128GB/s on an x16 interface when both directions are counted. Alder Lake's quoted 64GB/s of PCIe 5.0 bandwidth is the per-direction figure, and it is double that of PCIe 4.0.
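
As a rough check on these figures, the bandwidth of a PCIe link can be estimated from the per-lane transfer rate and the 128b/130b line encoding used since PCIe 3.0. The sketch below is back-of-envelope arithmetic only, not an Intel tool; the function and table names are made up for illustration, and protocol overheads beyond line encoding are ignored.

```python
# Back-of-envelope PCIe bandwidth estimate (illustrative only).
# Per-lane raw transfer rates in GT/s for each PCIe generation.
TRANSFER_RATE_GT_S = {"4.0": 16.0, "5.0": 32.0}

# PCIe 3.0 and later use 128b/130b encoding: 128 payload bits per 130 bits sent.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(generation, lanes):
    """Approximate usable bandwidth in GB/s for ONE direction of a link."""
    gigabits_per_second = TRANSFER_RATE_GT_S[generation] * ENCODING_EFFICIENCY * lanes
    return gigabits_per_second / 8  # convert bits to bytes


gen4_x16 = link_bandwidth_gb_s("4.0", 16)
gen5_x16 = link_bandwidth_gb_s("5.0", 16)

print(f"PCIe 4.0 x16: ~{gen4_x16:.1f} GB/s per direction")
print(f"PCIe 5.0 x16: ~{gen5_x16:.1f} GB/s per direction")
print(f"PCIe 5.0 x16, both directions combined: ~{2 * gen5_x16:.1f} GB/s")
```

The x16 result for PCIe 5.0 comes out at roughly 63GB/s per direction, which is where the rounded 64GB/s figure comes from, while the 128GB/s headline number corresponds to both directions added together.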

Using a brand new platform also means you’ll undoubtedly get Wi-Fi 6E and Thunderbolt 4 support too, as long as the motherboard chipset tier and accompanying PHY devices support these standards. Understandably, different motherboard tiers may choose to support/exclude certain features, but as an overview of the Alder Lake platform, you get industry-leading memory and interconnect technologies.

Peering underneath the hood

Now here’s where things get more interesting. Unlike most high-performance processors, where the processing cores are identical, Alder Lake combines two different types of processing cores (Performance cores codenamed Golden Cove, and Efficient cores codenamed Gracemont) to form the next-generation performance hybrid processor. Fabricated on the “Intel 7” process technology, Alder Lake will span a variety of CPUs designed for 9W to 125W TDP to suit the ultra-mobile, notebook and desktop client segments. Judging by this TDP range, Alder Lake could even be suitable for high-performance DIY machines intended for overclocking.

How will both core types differ in their mission other than their namesake?

The Efficient x86 core was designed for throughput efficiency, enabling scalable multi-threaded performance for modern multitasking.
The Performance x86 core was designed for speed, pushing the limits of low latency and single-threaded application performance.

The use of both types of cores makes Alder Lake very difficult to compare and contrast against CPU architectures like Sunny Cove, which is the mainstay of the Ice Lake and Tiger Lake processors we see in systems these days. However, if we had to sum up the changes Intel made in Golden Cove and Gracemont for Alder Lake, the internal piping has been greatly enhanced in the same fashion as when Skylake debuted: wider, deeper, smarter, optimized and more efficient.

A snapshot of the new Efficient x86 core used in the Alder Lake processor. This is effectively the Gracemont core.

The Efficient x86 core (Gracemont) has been beefed up with a more accurate branch predictor that sifts through deeper entries and larger structures, an L1 instruction cache doubled to 64KB, more instruction decoders (six instead of four), 17 execution ports (up from the usual 10), and deeper buffering and improved prefetchers in the memory subsystem. It also supports more modern instruction sets, such as advanced vector instructions with AI extensions, and adds more floating-point registers to tackle multiply-accumulate (FMUL) and fast add (FADD) instructions, doubling the FP processing throughput.

A more detailed view of the various blocks that make up the new Performance x86 core, also known as Golden Cove.

Over on the Performance x86 core (Golden Cove), there are similar piping advancements: more decoders to crunch wider code, a higher micro-operation transaction rate, better branch prediction and the ability to tackle larger code, improved parallelism from another ALU and LEA integer processing unit, and a similar FP throughput increase thanks to dual FADD vector execution units. Beyond these, the performance core now boasts an updated smart power management controller for even more fine-grained control and power budget management. It also adds a new matrix engine (Intel AMX) for turbocharging matrix multiplications in AI acceleration. Known as the Advanced Matrix Extensions (AMX) unit, this tiled matrix multiplication accelerator crams up to eight times the operations per cycle per core, saving power that would traditionally be spent on fetch, decode and out-of-order management cycles.
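
For readers unfamiliar with the term, "tiled" matrix multiplication simply means working through the matrices one small block at a time instead of one element at a time. The plain-Python sketch below illustrates the idea only; it does not use AMX, which is exposed through dedicated instructions and tile registers, and the tile size and function names are arbitrary choices made for this example.

```python
# Toy tiled matrix multiplication in plain Python (illustrative only).
# TILE is an arbitrary value chosen for this sketch, not an AMX parameter.
TILE = 2


def matmul_tiled(a, b, tile=TILE):
    """Multiply matrices a (n x k) and b (k x m) one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Walk over the output and the shared dimension in tile-sized blocks.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Multiply-accumulate one block of a with one block of b.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


def matmul_naive(a, b):
    """Reference row-by-column multiplication used for comparison."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8], [7, 6], [5, 4]]
result = matmul_tiled(a, b)
print(result == matmul_naive(a, b))
```

Both functions produce the same product; the tiled version only changes the order of the work so that each small block can be handled as a unit, which is the kind of operation a matrix engine is built to do in hardware.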

Intel Thread Director technology: The real sauce enabling hybrid processor architectures

Now, packing two different core types doesn’t automatically solve how the operating system manages instructions and threads on this new hybrid setup. Enter the new Intel Thread Director technology, a crucial new hardware solution for dynamically handling instructions and threads so that they can be prioritised accurately and assigned to an appropriate core for the best efficiency. It takes into account the task priority at hand, performance needs (whether it’s a background or foreground task), core temperature and power states, and much more, all without user input.

Essentially, Intel Thread Director monitors all runtime instructions and their mix with extreme precision and provides runtime feedback to the OS to optimise the schedule decision of any workload.

With Windows 11 on the horizon, Intel gave its assurance that the upcoming OS is much more adept at handling tasks and processes to better benefit from the hardware used. As a result, Intel Thread Director technology (and the resulting Thread Director hints) and Windows 11 work hand in hand to achieve the best performance efficiency yet. Windows developers can also specify the QoS (quality of service) required and call up the right kind of core as they take advantage of updated software compilers to target these new CPUs, which might soon be the new normal.
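
The division of labour described above, where hints about each task feed into the scheduler's choice of core, can be pictured with a toy model. Everything in the sketch below is invented for illustration: the `Task` fields, the placement rules and the function names are assumptions, and none of it reflects the actual logic of Intel Thread Director or the Windows 11 scheduler.

```python
from dataclasses import dataclass

# Toy model of hybrid scheduling (hypothetical; not Intel's or Windows' logic).


@dataclass
class Task:
    name: str
    foreground: bool         # is the user actively waiting on this task?
    latency_sensitive: bool  # would it benefit from single-thread speed?


def pick_core_type(task, free_p_cores, free_e_cores):
    """Return "P", "E" or "wait" given how many cores of each type are idle."""
    wants_p = task.foreground and task.latency_sensitive
    if wants_p and free_p_cores > 0:
        return "P"
    if free_e_cores > 0:
        return "E"
    if free_p_cores > 0:
        return "P"  # spill over rather than leave the task waiting
    return "wait"


def schedule(tasks, p_cores, e_cores):
    """Place tasks in order, consuming one idle core per placed task."""
    placement = {}
    for task in tasks:
        choice = pick_core_type(task, p_cores, e_cores)
        if choice == "P":
            p_cores -= 1
        elif choice == "E":
            e_cores -= 1
        placement[task.name] = choice
    return placement


tasks = [
    Task("game", foreground=True, latency_sensitive=True),
    Task("backup", foreground=False, latency_sensitive=False),
    Task("video export", foreground=True, latency_sensitive=True),
]
placement = schedule(tasks, p_cores=1, e_cores=2)
print(placement)
```

With one Performance core and two Efficient cores free, the game takes the Performance core, the backup goes to an Efficient core, and the video export falls back to an Efficient core because no Performance core is left. A real scheduler also has to weigh temperature, power state and changing workloads at runtime, which this sketch ignores.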

Alder Lake performance expectations

Intel shared that the new Efficient x86 core offers up to 40% more performance at 40% less power draw over a Skylake core in a single-thread processing scenario. These figures go up to 80% in a quad-thread processing scenario, but we do take them with a pinch of salt since Skylake-based processors debuted in 2015, about six years ago. To understand why Intel chose this kind of comparison, it’s important to understand that even the current Sunny Cove architecture is still derived from Skylake.

Note that these figures were based on a single application using SPECrate 2017 int_base results. In our own experience, SPEC benchmark results are often an indication of raw power and don't necessarily translate to real-world client performance with a highly variable workload. So take this performance uplift with a pinch of salt, and note that it's comparing against an architecture that debuted in 2015.

What about Alder Lake as a whole versus Intel’s 11th-gen Core processors? Using common industry-standard benchmarks like SPEC CPU 2017, SYSmark 25, PCMark 10, WebXPRT3 and Geekbench 5.4.1 – many of which we use regularly in our tests too – Intel claims an average of 19% performance uplift at the same core frequency. Take note that this theoretical claim is just from the processor’s own architectural advancements, and it doesn’t yet factor in the new packaging, memory, platform, interconnect upgrades and more.

Of course, there are some cases where Alder Lake might see lower performance than an equivalent 11th-gen Core processor. When we delved deeper with Intel executives, they shared that the new processor forgoes AVX-512 instruction support, so any applications and processes that currently rely heavily on it would contribute to the performance loss they predict. However, Intel doesn’t feel that this will be too much of a concern, and expects Alder Lake to still offer a better overall proposition than 11th Gen Core processors thanks to its overhauled core design.

Alder Lake processor configurations

At the time of writing this article, Intel has yet to share the exact processor SKUs it expects to have, but it did leave us with some parting thoughts on what to expect from the highest-configured processor: up to 16 cores (8 performance and 8 efficient cores), executing up to 24 simultaneous threads (two threads per performance core and a single thread per efficient core), and a grand total of 30MB of last-level cache.

The above configuration is for a desktop-class Alder Lake processor, while image hints on their slide suggest that a typical mobile-class performance processor could have up to 14 cores (with 6 performance and 8 efficient cores), and an ultra-mobile low-power package might have up to 10 cores (with 2 performance cores and 8 efficient cores).
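
The core and thread totals follow from simple arithmetic, sketched below. The desktop figures are the ones Intel shared; the thread counts for the two mobile configurations are our own derivation, assuming the same two threads per performance core and one thread per efficient core apply there too. The names used in the code are purely illustrative.

```python
# Thread-count arithmetic for hybrid core configurations (illustrative).
THREADS_PER_P_CORE = 2  # dual threads per performance core
THREADS_PER_E_CORE = 1  # single thread per efficient core

# (performance cores, efficient cores) for the maximum configurations mentioned.
# Applying the same threads-per-core rule to the mobile parts is an assumption.
CONFIGS = {
    "desktop": (8, 8),
    "mobile": (6, 8),
    "ultra-mobile": (2, 8),
}


def core_and_thread_totals(p_cores, e_cores):
    """Return (total cores, total hardware threads) for a hybrid configuration."""
    cores = p_cores + e_cores
    threads = p_cores * THREADS_PER_P_CORE + e_cores * THREADS_PER_E_CORE
    return cores, threads


for segment, (p, e) in CONFIGS.items():
    cores, threads = core_and_thread_totals(p, e)
    print(f"{segment}: {p}P + {e}E = {cores} cores, {threads} threads")
```

For the desktop part this reproduces the 16 cores and 24 threads quoted above; under the stated assumption, the mobile and ultra-mobile configurations would work out to 20 and 12 threads respectively.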

Don’t forget that Alder Lake will support DDR5 (and DDR4), PCIe 5.0, Thunderbolt 4 and Wi-Fi 6E. To make this all possible, the interconnect throughput within the processor has been greatly widened, with up to a 1,000GB/s compute fabric, a 204GB/s memory fabric and a 64GB/s I/O fabric.

Further outlook

Alder Lake will completely re-invent high-performance multi-core architecture if its hybrid scalable design pans out the way Intel has spent the last few years preparing for.

Expect Alder Lake client based processors to roll out beginning Fall 2021, which is not far from now.

Beyond compute clients, Alder Lake’s processing cores will also be the framework for Sapphire Rapids, Intel’s next-gen Xeon Scalable Processor, with more data-centre-centric optimisations added to make it suitable for its service class. Stay tuned for more reporting on this topic, and for more processor SKU details once Intel is ready to flood the market.