Meteor Lake: Intel’s biggest architecture shift in 40 years to deliver significant gains

Intel 4 process, first tile-based disaggregated architecture, advanced packaging tech, double the GFX horsepower and an NPU to deliver AI at scale. What’s not to like?

Note: This feature was first published on 19 September 2023.

(Image source: Intel)

Meteor Lake is how Intel demonstrates that it’s firing on all its cylinders

Intel is throwing in everything but the kitchen sink with its upcoming Meteor Lake processor. It is Intel’s first everyday processor with an integrated NPU for AI at scale, it packs more accelerators than any other CPU before it, and it doubles the graphics capabilities. It also promises a significant boost in performance per watt through Intel’s first-ever tile-based disaggregated architecture, and it is the first processor to use the Intel 4 process node and the first volume consumer processor to use Foveros advanced packaging technology. We thought Alder Lake’s P-core and E-core implementation with the Intel Thread Director was huge, but frankly speaking, Meteor Lake’s advancements are off the charts.

In our books, it is technically far superior to anything else Intel has released for consumers in recent years. Does that mean you’re going to get a 2x performance uplift? Perhaps in tasks involving the GPU or NPU, or if you’re looking at performance per watt. So no, it won’t magically improve performance across the board for all tasks. Still, the idea of Meteor Lake is to debut notable advances that move performance per watt in a big way, and to serve as a testbed for all the new developments put together, including the new architecture type, new packaging, and more.

At Intel Innovation 2023, the company announced that Meteor Lake would launch as the upcoming Core Ultra processor lineup on 14th December 2023. Meanwhile, let's dive in and get cosy with what the Meteor Lake architecture offers.

Embracing a chiplet (tile-based) architecture in a new way

(Source: Intel)

Processors utilising multiple chips have been around for a long time now, but they usually involve either connecting multiple monolithic CPU dies or pairing a CPU with the platform chipset and high-speed memory, all within one package.

However, Meteor Lake embraces tiles in a brand new way, thanks to Foveros, Intel’s die-stacking advanced packaging technology (which we’ve discussed in great detail in our Intel Malaysia chip assembly and packaging tour). Foveros allows Intel to marry different processor blocks, each manufactured on the silicon process technology that offers the best cost/performance fit for the processor’s target market. This makes it more efficient to churn out various processor SKUs quickly, meaning greater customisation and a faster time to market.

(Source: Intel)

And thanks to the high-bandwidth, low-latency links Foveros allows between the base silicon interposer and the tiles stacked above it, Intel no longer has to design complex monolithic processor dies. In fact, the brand-new Meteor Lake takes advantage of these benefits in a big way by deploying the following tiles that make up the processor:-

  • Compute Tile comprising new P-cores and E-cores (manufactured on the new Intel 4 process node)
  • SoC Tile (manufactured on TSMC’s N6 process node)
  • Graphics Tile (manufactured on TSMC’s N5 process node)
  • I/O Tile (manufactured on TSMC’s N6 process node)

Architected for power optimisation

The debut of a new Low Power (LP) E-core alongside the existing E-cores and P-cores, parked within a power-optimised, all-in-one SoC tile, is a big part of the overall rethink of how Meteor Lake is optimised for performance per watt. (Image source: Intel)

The new disaggregated architecture not only allowed Intel to bring together dies that are each optimised for their role, but also to redesign the blocks and their interdependencies around a modular, scalable power management architecture and a scalable fabric that delivers improved bandwidth and efficiency.

A typical processor block diagram prior to Meteor Lake. (Source: Intel)

One of the primary considerations was repartitioning compute-intensive processes for power optimisation. Intel found that in its existing processors, the media processing block sat inside the graphics complex, attached next to the compute complex. This meant that any video decode, encode or transcode request woke up many high-performance blocks and interconnects, consuming too much power for a simple task that occurs often in everyday use.

The new processor schema with Meteor Lake. (Image source: Intel)

To overcome the power inefficiencies of the past, Intel relocated and rearchitected several traditional blocks:-

  • Introduced a new SoC tile that incorporates several standard blocks essential for a system's operation without involving the heavy compute blocks. It is optimised for power efficiency.

    At the heart of the SoC tile is a new dual-core Low Power (LP) E-core cluster that handles any tasks that come its way; when it determines that a process requires more horsepower, it hands the request over to the dedicated Compute tile. This is an evolution of the Hybrid Architecture that first debuted with Alder Lake and adds another stage: LP E-core → E-core → P-core. This is also marketed as the 3D Hybrid Architecture.

    It boasts a media engine carved out of the graphics core to manage all video workloads in the low-power SoC tile.

    Next, the display engine was plucked from the CPU die and is housed within the SoC tile.

    The memory controller, too, has been shifted from the CPU die into the SoC tile so that it can access system memory directly, which is swell when all other processor parts are in deep sleep. It also has independent paths to all other processor tiles for them to tap on when suitable workloads and data transfer requests arrive.

  • The main graphics engine is delegated to its own tile that’s optimised for 3D performance; all the heavyweight P-cores and E-cores are bunched together in a Compute tile optimised for CPU performance; and high-speed I/O like USB4, Thunderbolt 4, and PCIe Gen 5 sits in its own I/O tile to improve I/O bandwidth scaling.

The SoC Tile. (Source: Intel)

In essence, the SoC tile is now the backbone of the Meteor Lake processor. Since it has most of the necessary processing blocks within itself, it summons the other compute- and power-intensive blocks only where appropriate. Depending on the task, low-throughput workloads can be offloaded to the new LP E-cores so that the main compute complex can be shut off, which results in massive power savings. This level of control comes at a minor penalty, as more localised power management controllers are embedded on all the tiles to support this new hierarchy, along with a scalable fabric to bind them all together.
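To picture the hand-off between core tiers described above, here is a toy model. It is a hypothetical sketch and not Intel Thread Director's actual policy: the 0.0 to 1.0 "demand" score, the two thresholds and the function names are all assumptions made purely for illustration.

```python
from enum import Enum


class CoreTier(Enum):
    LP_E_CORE = "LP E-core (SoC tile)"
    E_CORE = "E-core (Compute tile)"
    P_CORE = "P-core (Compute tile)"


# Hypothetical thresholds on a 0.0-1.0 "demand" score; these are not Intel's values.
LP_E_LIMIT = 0.2
E_LIMIT = 0.6


def pick_tier(demand: float) -> CoreTier:
    """Return the lowest-power tier assumed able to handle the given demand."""
    if not 0.0 <= demand <= 1.0:
        raise ValueError("demand must be between 0.0 and 1.0")
    if demand <= LP_E_LIMIT:
        return CoreTier.LP_E_CORE
    if demand <= E_LIMIT:
        return CoreTier.E_CORE
    return CoreTier.P_CORE


def compute_tile_can_sleep(demands: list[float]) -> bool:
    """The Compute tile can stay off only if every task fits on the LP E-cores."""
    return all(pick_tier(d) is CoreTier.LP_E_CORE for d in demands)
```

In this simplified view, a light workload such as video playback would keep every task on the LP E-cores, so the Compute tile never needs to wake up; a single demanding task is enough to bring it back.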

Sample of processor utilisation when the LP E cores are able to handle the workload. (Source: Intel)

During the Meteor Lake tech day event, Intel demoed an engineering laptop with a Meteor Lake processor that had been playing a video for several hours; the task manager showed that only the LP E-cores were in use. This suggests the implementation works as intended and should deliver a more power-efficient laptop.

Deliver AI at scale

The new integrated NPU to tackle AI acceleration at the edge. (Source: Intel)

While Intel already offers dedicated add-on neural processing units (NPUs) for AI workloads, such as its Movidius vision processors, Meteor Lake is the first Intel processor to incorporate an NPU. Since Meteor Lake processors will be deployed in a wide range of new laptops and desktops in 2024, it is the first processor class to help broaden AI adoption at scale without requiring an expensive add-on.

To be accurate, the first such client processor in the world to incorporate an NPU is AMD’s Ryzen 7040 series, launched in early 2023, but Intel’s reach and buyer base should ensure developers make more use of the hardware and accelerate the adoption of AI for everyday consumers. Examples include noise removal in videos, background removal for video collaboration, enhanced audio effects, and elevated effects for creators and gamers. These might all sound familiar, but today you need an expensive GeForce RTX graphics card to enable some of these functions. Intel hopes its Meteor Lake processors can deliver this without the added cost. When the technology matures, Intel thinks one can expect more productivity-oriented uplifts, more capable AI assistants and more.

Regarding hardware implementation, Intel designed the NPU as a high-performance, low-power AI processor. To that end, it’s built within the main SoC tile of the processor to keep power utilisation in check, and Intel says it’s 8x more efficient than running a workload only on the CPU. That said, Intel’s approach to AI is to take advantage of all three compute components and tackle differing AI workloads on the GPU, NPU and even the CPU. If you recall, two generations ago on Alder Lake, Intel added AVX-VNNI instructions to speed up the low-precision multiply-accumulate operations behind the matrix multiplications often present in AI workloads, allowing CPUs to contribute to AI acceleration. This is why the CPU is still a viable option as an AI accelerator, though not all workloads are well suited to it. We’ll leave Intel’s slide here for reference on its view of where different AI tasks work best:-

AI everywhere within the processor - at the GPU, NPU and the CPU. (Source: Intel)

The next component is Intel’s AI software stack. Its OpenVINO inference engine API is the glue that interprets the API calls from the software and calls up the necessary compilers, drivers, and hardware (CPU, GPU and NPU, mixed and matched) for the task at hand. The choice of which AI hardware to use currently still belongs to the programmer, but targeting one or more types of AI processor is simply a matter of assigning the device as required. Essentially, it’s write-once code that will work on any compatible hardware, and thus it is the silent enabler of new experiences. It’s still early days, but it won’t be long before more applications have hooks to utilise Intel’s AI-friendly hardware.
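As a rough illustration of what "assigning the device" can look like, here is a minimal sketch of a device picker. The helper name `choose_device` and the fallback-to-CPU behaviour are our own assumptions and not part of OpenVINO; only the device labels ("CPU", "GPU", "NPU") follow OpenVINO's naming.

```python
def choose_device(preferred: list[str], available: list[str]) -> str:
    """Return the first preferred device that is present.

    Hypothetical helper: falls back to "CPU" on the assumption that a CPU
    backend is always usable.
    """
    for name in preferred:
        if name in available:
            return name
    return "CPU"


# Example: prefer the NPU, then the GPU, on a machine that exposes all three.
device = choose_device(["NPU", "GPU"], ["CPU", "GPU", "NPU"])
```

With OpenVINO installed, the `available` list would typically come from `openvino.Core().available_devices`, and the chosen name would be passed as the device argument to `Core.compile_model`. The application code stays the same regardless of which engine ends up running the model.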

(Source: Intel)

While the outcome of running on different AI processors will vary with tasks and workloads, Intel was eager to showcase the flexibility of Meteor Lake. It ran a Stable Diffusion workload (the popular text-to-image generative AI model) on the CPU only, the GPU only, the NPU only, and a combination of the GPU and NPU to illustrate the time taken, power consumed and relative efficiency:-

AI performance preview on Meteor Lake. (Source: Intel)

A takeaway from the results, which many of us would have predicted, is that the CPU was the slowest at getting the task done, while the GPU was the fastest of the three engines on its own. The NPU had the lowest power draw with a reasonable time to completion, thus scoring the highest in efficiency. Splitting the task between the GPU and NPU allowed this combination to complete the job even faster than the GPU alone, but the efficiency took a hit. Despite that, it’s an excellent showcase to understand why Intel’s implementation is something to look forward to.
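The trade-off between speed and efficiency comes down to simple arithmetic: energy is average power multiplied by time, so a slower engine can still win on efficiency if it draws far less power. The sketch below is our own way of expressing that and is not Intel's scoring method; the function names and the "runs per kilojoule" metric are assumptions.

```python
def energy_joules(avg_power_watts: float, seconds: float) -> float:
    """Energy consumed by one run: average power multiplied by elapsed time."""
    if avg_power_watts <= 0 or seconds <= 0:
        raise ValueError("power and time must be positive")
    return avg_power_watts * seconds


def runs_per_kilojoule(avg_power_watts: float, seconds: float) -> float:
    """Efficiency expressed as completed runs per 1,000 J; higher is better."""
    return 1000.0 / energy_joules(avg_power_watts, seconds)


def rank_by_efficiency(runs: dict[str, tuple[float, float]]) -> list[str]:
    """Sort engine labels from most to least efficient.

    `runs` maps a label to a (avg_power_watts, seconds) pair.
    """
    return sorted(runs, key=lambda name: runs_per_kilojoule(*runs[name]), reverse=True)
```

Feeding in the power and time readings for each engine would reproduce the kind of ranking Intel showed, where the lowest-power engine can come out on top even though it is not the quickest.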

Graphics performance leap with new Xe LPG GPU core

(Source: Intel)

For a while now, integrated graphics engines haven’t moved the needle, with the last couple of generations being variations with minor improvements. With Meteor Lake, Intel is targeting a 2x leap in performance per watt. It did this by leveraging some aspects of its Xe HPG lineup for discrete graphics (Intel Arc graphics), such as incorporating the new Xe cores and ray tracing cores, while also widening the GPU configuration by roughly 30% over the prior Xe LP graphics (more commonly known as Iris Xe graphics), with a larger backend and samplers and a higher vector engine count, up from 96 to 128, on the Xe LPG graphics tile in Meteor Lake.

Xe LPG graphics in a snapshot. (Source: Intel)

It's still a far cry from matching an Intel Arc A750, which is probably why Intel also confirmed to us that it has no plans to let a discrete Arc GPU boost the Xe LPG’s capabilities when vendors pair the two, via Intel Deep Link or otherwise, as they are far too dissimilar in capabilities to work well together.

Interestingly, Xe LPG can run at far higher clock speeds (well past 2GHz, much like Intel Arc) than Xe LP (between 1.3 and 1.6GHz) at the same power, or at much lower power for the same frequency. This is central to Intel’s overall claim of delivering twice the performance per watt.
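For a feel of how the wider configuration and higher clocks stack up, here is a back-of-the-envelope calculation using the figures quoted in this article. It assumes peak throughput scales linearly with vector engine count and clock speed, which real workloads rarely do, and it floors "well past 2GHz" at 2.0GHz; both are our assumptions, not Intel's numbers.

```python
# Figures quoted in this article.
XE_LP_VECTOR_ENGINES = 96
XE_LPG_VECTOR_ENGINES = 128
XE_LP_CLOCK_GHZ = 1.6  # top of the quoted 1.3 to 1.6GHz range

# Assumption: "well past 2GHz" is floored at 2.0GHz for a conservative estimate.
XE_LPG_CLOCK_GHZ = 2.0


def scaling(old: float, new: float) -> float:
    """Ratio of the new value to the old value."""
    return new / old


width_scaling = scaling(XE_LP_VECTOR_ENGINES, XE_LPG_VECTOR_ENGINES)
clock_scaling = scaling(XE_LP_CLOCK_GHZ, XE_LPG_CLOCK_GHZ)

# Assumption: peak throughput is proportional to engines multiplied by clock.
peak_scaling = width_scaling * clock_scaling

print(f"width: {width_scaling:.2f}x, clock: {clock_scaling:.2f}x, peak: {peak_scaling:.2f}x")
```

The vector engine count alone works out to about a third more hardware, in line with the "roughly 30%" figure, and combining it with the clock bump gives a theoretical ceiling of about 1.67x before any architectural improvements are counted.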

Higher performance and higher efficiency, you get both aspects with the new Xe LPG graphics engine. (Source: Intel)

Some much-vaunted features inherited from Intel’s Arc series GPUs are Intel XeSS (AI-based upscaling that boosts performance by rendering at a lower resolution), DirectX 12 Ultimate support, ray tracing, and AV1 codec support for 8K encoding. Xe LPG also shares the same software stack, so whatever optimisations are made to the drivers apply across Intel’s Xe graphics portfolio, such as the recent massive DX9 driver optimisation.

Intel couldn’t share much in the way of actual performance numbers other than the slide below, which shows technical throughput in specific tasks such as pixel blending and depth testing. Still, it looks promising against the previous-generation IGP engine. Intel did, however, demo a Meteor Lake-based notebook running Forza Motorsport at the tech day, and we noted that it ran buttery smooth. Let’s hope Meteor Lake in its finished form, in systems from vendors, upholds what we’ve briefly seen and ushers in a new era of mainstream graphics.

Xe LPG performance hints. (Source: Intel)

In terms of pure display output support and capabilities, you get HDMI 2.1, DisplayPort 2.1 (20G) and eDP 1.4 support. You can drive one 8K60 HDR screen, four 4K60 HDR screens, or up to 1440p at a 360Hz refresh rate.
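To compare those display modes, a quick calculation of raw pixel throughput helps. This sketch assumes the common resolutions behind the shorthand (8K as 7680×4320, 4K as 3840×2160, 1440p as 2560×1440) and ignores blanking intervals, bit depth and compression, so it is not a link bandwidth figure.

```python
def pixel_rate(width: int, height: int, refresh_hz: int, screens: int = 1) -> int:
    """Raw pixels per second across all screens, ignoring blanking and bit depth."""
    return width * height * refresh_hz * screens


# Assumed resolutions for the modes quoted above.
modes = {
    "1x 8K60": pixel_rate(7680, 4320, 60),
    "4x 4K60": pixel_rate(3840, 2160, 60, screens=4),
    "1x 1440p360": pixel_rate(2560, 1440, 360),
}

for label, rate in modes.items():
    print(f"{label}: {rate / 1e9:.2f} gigapixels per second")
```

One 8K60 screen and four 4K60 screens work out to exactly the same raw pixel rate, since an 8K frame holds four times the pixels of a 4K frame, while 1440p at 360Hz needs about two-thirds of that.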

Lastly, for Meteor Lake gamers on laptops, Intel is deploying a new software option called Endurance Gaming. It gets the graphics tile talking to the power management controller for power-optimised gaming, limiting graphics power consumption to just 10 watts to extend your gaming session.

Endurance Gaming mode = Max Battery Life option for gamers (Source: Intel)

Putting it all together

We’ve gone over several crucial elements of the Meteor Lake project, such as the impetus, its new disaggregated architecture, the tiles that make up the processor, how it was redesigned for power efficiency, and the new NPU and GPU, both of which are certain to elevate the everyday capabilities of laptops as we know them today.

Meteor Lake is a complex jigsaw puzzle brought to life by Intel's R&D, manufacturing, assembly and testing factories. (Image source: Intel)

There are other aspects to the Meteor Lake story, such as the Intel 4 process and the improved E-cores and P-cores, but these go further into technicalities, none of which are quite as interesting as the areas we’ve chosen to focus on in this Meteor Lake overview. That’s because Meteor Lake has leapfrogged past processor offerings in so many ways that the more impactful areas needed the spotlight.

As seen in the slide shared above, Meteor Lake is made up of various silicon tiles that come together from wafer fabs worldwide and get assembled at Intel Penang, as we saw during Intel’s Tech Tour in Malaysia. You can find out more about what happens at its factories in our hands-on article, but here are a few extra takeaways. They underline Intel’s increased focus on testing, driven by the more complicated processor packages, which is something Intel wanted to impress upon the media invited to its facilities:-

  • As Meteor Lake is made up of several dies coming together, Intel has more stringent testing in place. Instead of relying only on the initial wafer sort testing, singulated die testing also probes the native microbumps on each die. This Advanced Sort process improves the usable die yield rate to 99%, since each die gets tested before passing through the assembly process later.

  • Due to the Advanced Sort process, the overall post-assembly test yield shoots up from 86% to 97%, improving efficiency for Intel and its customers.

  • Beyond the assembly, Intel also performs detailed functional, stress and burn-in tests to maintain high quality assurance for its various business customers. Even the testing equipment is made in-house and not sourced externally.
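Why does testing each die before assembly matter so much? In a multi-tile package, one bad die spoils the whole part, so yields compound. The sketch below uses a deliberately simplified model of our own, assuming die defects are independent and that the assembly step has its own yield; it is not Intel's methodology and does not attempt to reproduce the 86% and 97% figures.

```python
from math import prod


def package_yield(die_yields: list[float], assembly_yield: float = 1.0) -> float:
    """Fraction of good packages, assuming every die and the assembly step must all succeed.

    Simplified model: failures are treated as independent events.
    """
    for value in [*die_yields, assembly_yield]:
        if not 0.0 <= value <= 1.0:
            raise ValueError("yields must be between 0.0 and 1.0")
    return prod(die_yields) * assembly_yield
```

Under this model, four tiles that are each 99% likely to be good would give `package_yield([0.99] * 4)`, or roughly 0.96, before any assembly losses, and the figure drops quickly if even one tile type has a lower yield going in. That is the intuition behind screening every die before it is committed to a package.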

And that wraps up all you need to know about Meteor Lake processors. As you can guess by now, Intel is focusing on Meteor Lake to roll out first on laptops, though it won’t be long before desktop systems follow suit. Look out, AMD Ryzen! The Meteor is striking soon in the form of the Intel Core Ultra processor lineup, launching 14th December 2023.

(Source: Intel)
