
Intel's CPU Roadmap: To Nehalem and Beyond

By Vijay Anand - 21 Mar 2008

Nehalem's Core and Tri-Level Cache Structure


For 2008, Intel expects the bulk of its shipments in the higher-end processor segment to be quad-core processors, possibly continuing into 2009. Quad-core is therefore its main focus, and the first iteration of Nehalem will be quad-core as well. However, unlike previous generations, where cores shared an L2 cache, Nehalem opts for a design that's more akin to AMD's Barcelona.

A die shot of the Nehalem processor and its functional blocks (note the modular layout).

This means each processing core has its own small, dedicated L1 and L2 caches, while all the cores share one large L3 cache. Here's how Nehalem's cache structure stacks up:

  • L1 Cache per core (32KB Instruction and 32KB Data) - similar to Intel's current Core microarchitecture.
  • L2 Cache per core (256KB, low latency)
  • L3 Cache (8MB, fully shared among all cores) - adopts an Inclusive Cache Policy

Nehalem's new 3-level cache structure.
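The cost of the inclusive policy can be put into numbers with a little arithmetic. The Python sketch below is a back-of-the-envelope calculation under an idealized model, not an Intel figure: it assumes a strictly inclusive L3 keeps a copy of every line held in the private L1 and L2 caches, and compares that with a hypothetical, strictly exclusive arrangement of the same caches in which no line is stored twice. The cache sizes are the ones listed above.

```python
"""Back-of-the-envelope cache capacity for a quad-core Nehalem.

Idealized model (an assumption, not Intel's figures): a strictly inclusive L3
duplicates every line held in the private L1/L2 caches, while a strictly
exclusive hierarchy stores each line at only one level.
"""

CORES = 4
L1_KB_PER_CORE = 32 + 32   # 32KB instruction + 32KB data
L2_KB_PER_CORE = 256
L3_KB_SHARED = 8 * 1024    # 8MB, shared by all cores


def private_kb(cores: int, l1_kb: int, l2_kb: int) -> int:
    """Total capacity of all per-core L1 and L2 caches."""
    return cores * (l1_kb + l2_kb)


def unique_capacity_kb(private: int, l3: int, inclusive: bool) -> int:
    """Amount of distinct data the whole hierarchy can hold."""
    # Inclusive: private-cache contents are copies of L3 lines, so they add nothing new.
    # Exclusive: every level holds different lines, so the capacities simply add up.
    return l3 if inclusive else private + l3


private = private_kb(CORES, L1_KB_PER_CORE, L2_KB_PER_CORE)
inclusive = unique_capacity_kb(private, L3_KB_SHARED, inclusive=True)
exclusive = unique_capacity_kb(private, L3_KB_SHARED, inclusive=False)

print(f"Private L1+L2 total: {private} KB")          # 1280 KB
print(f"Unique data, inclusive L3: {inclusive} KB")  # 8192 KB
print(f"Unique data, if exclusive: {exclusive} KB")  # 9472 KB
print(f"Capacity given up to inclusion: {private / exclusive:.1%}")  # 13.5%
```

Under these assumptions, inclusion gives up roughly 13.5% of the unique capacity the same caches could hold if they were exclusive, because the private caches are small next to the shared 8MB L3. The model ignores associativity, replacement policy and lines shared between cores, so treat the result as a rough estimate rather than a measurement.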

With Nehalem adopting an integrated memory controller to interface with memory directly, and QuickPath Interconnect (QPI) links for speedy inter-processor communication, Intel no longer has to pile on huge amounts of cache as it did on its high-end Xeons (some of which carry as much as 12MB of L2) under the existing FSB-based architecture. Thus, Nehalem uses small L1 and L2 caches dedicated to each core, but Intel has still given the processor a generous 8MB of L3 cache (so although it has half as much L2 cache per core as Barcelona, it has four times as much L3 cache).

The inclusive policy of the L3 cache also helps minimize snoop traffic. Because whatever is in the L1 and L2 caches is also in the L3, a miss in the L3 means the data cannot be in any core's private caches either, so there's no need to snoop every other cache level. Barcelona, however, adopts an exclusive cache policy, which avoids duplicating data across cache levels and so lets more unique data be cached; it needs this because its L3 cache is a much smaller 2MB. Intel, on the other hand, can easily afford the die space, given the massive L2 caches it already manages to squeeze into Penryn. In addition to the main cache structure change, Nehalem also incorporates a second-level, 512-entry Translation Lookaside Buffer (TLB) to speed up virtual-to-physical address translation.
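The snoop-traffic argument for an inclusive L3 can be illustrated with a toy model. The Python sketch below is a simplified illustration, not Intel's actual coherence protocol: each core's L1 and L2 are collapsed into a single private cache, caches are unbounded sets of line addresses, and the names `CacheHierarchy`, `load`, `evict` and `cores_to_snoop` are hypothetical. It demonstrates only one property: with an inclusive L3, an L3 miss rules out every private cache, whereas with an exclusive L3 it does not.

```python
from dataclasses import dataclass, field


@dataclass
class CacheHierarchy:
    """Toy model: one shared L3 plus one private cache per core (standing in for L1+L2).

    Hypothetical, simplified model: caches are unbounded sets of line addresses.
    """
    cores: int
    inclusive: bool
    l3: set[int] = field(default_factory=set)
    private: list[set[int]] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # One private cache per core.
        self.private = [set() for _ in range(self.cores)]

    def load(self, core: int, line: int) -> None:
        """Bring a line into one core's private cache."""
        self.private[core].add(line)
        if self.inclusive:
            self.l3.add(line)      # inclusion: the L3 also keeps a copy
        else:
            self.l3.discard(line)  # exclusion: the line moves out of the L3

    def evict(self, core: int, line: int) -> None:
        """Drop a line from one core's private cache."""
        self.private[core].discard(line)
        if not self.inclusive:
            self.l3.add(line)      # exclusive L3 acts as a victim cache
        # Inclusive case: the L3 already holds the line, so nothing else to do.

    def cores_to_snoop(self, requester: int, line: int) -> list[int]:
        """Other cores whose private caches must be probed for `line`."""
        if self.inclusive and line not in self.l3:
            # L3 miss + inclusion => no private cache can hold the line.
            return []
        # Otherwise the L3 lookup cannot rule the line out, so this toy model
        # conservatively probes every other core.
        return [c for c in range(self.cores) if c != requester]


for inclusive in (True, False):
    h = CacheHierarchy(cores=4, inclusive=inclusive)
    h.load(core=1, line=0x40)
    # Core 0 requests a line that no cache holds.
    label = "inclusive" if inclusive else "exclusive"
    print(label, h.cores_to_snoop(requester=0, line=0x80))
# inclusive []
# exclusive [1, 2, 3]
```

With the inclusive hierarchy, core 0's request for an uncached line needs no snoops at all; with the exclusive one, all three other cores must be probed. For lines that do hit in the L3, this model simply probes every other core; a real design would need extra bookkeeping to narrow that down, which the sketch leaves out.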

To keep the design modular so that processor designs can scale easily, the L3 cache is not part of the main core but an additional building block of the processor. Likewise, the cores, QPI blocks and the integrated memory controller are all building blocks that make up the base design of the Nehalem processor. The slide from Intel below illustrates how these building blocks can be combined to scale processor designs, comparing the expected 4-core processor with a possible 8-core processor. Intel could even integrate a graphics core into the CPU if it wished to, but shared no details beyond this possibility. Nehalem isn't expected to arrive until much later in the year, so there's plenty of time for further details to surface on the integrated graphics option (most probably intended to combat AMD's Fusion strategy).

The scalable and modular design of Nehalem's microarchitecture.
