Intel details its new mesh architecture for Skylake-X and Xeon Scalable processors
Intel has released a blog post detailing the new on-chip mesh architecture that debuts on its Xeon Scalable Processor platform, and by implication, its upcoming Skylake-X chips, both of which are based on the same Skylake-SP microarchitecture.
The post, penned by Intel’s Skylake-SP CPU architect Akhilesh Kumar, delves into the challenges around building data center processors that are able to effectively balance performance, efficiency, and scalability.
The task of scaling up is not as simple as just adding more cores and interconnecting them to create a multi-core data center chip. The interconnects between CPU cores, memory hierarchy, and I/O subsystems are equally critical, as they function as the highways through which data can flow smoothly.
Furthermore, these interconnects need to fulfil myriad criteria:
- Increase bandwidth between the cores, on-chip cache hierarchy, memory controller, and I/O controller, so the interconnect does not become a bottleneck that limits system efficiency
- Reduce latency when accessing data from the on-chip cache, main memory, or other cores. Latency also depends on the distance between chip entities, the path taken by requests and responses, and the interconnect's speed
- Improve energy efficiency when supplying data to cores and I/O from the on-chip cache and memory. As more cores are added, bandwidth requirements go up and data has to travel over longer distances, so the energy required increases
Intel says the mesh architecture for its Xeon Scalable processors checks all the above boxes, representing a shift away from the ring bus architecture that it has used since the Nehalem-EX Xeon processors.
The mesh architecture first debuted on Intel’s Xeon Phi Knights Landing products, but this is the first time it is making its way to more mainstream server parts and high-end desktop (HEDT) chips.
The ring bus architecture passed data sequentially around a bi-directional ring, cycling through stops located at components such as the memory controllers, CPU cores, and caches. Expanding it was as simple as adding more stops, but this made the design increasingly unwieldy, eventually requiring multiple rings, as core counts and CPU complexity grew over the generations.
In comparison, the mesh approach organizes the cores, on-chip cache banks, and memory and I/O controllers in rows and columns, with wires and switches connecting them at each intersection to allow for turns.
This provides more direct paths than the ring architecture and offers multiple routes for data to take, thus reducing bottlenecks.
In this way, the mesh can operate at a lower frequency and voltage while still offering very high bandwidth and low latency, resulting in improved performance and better power efficiency.
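As a rough illustration of why the grid layout shortens paths, the toy Python model below compares average hop counts between stops on a bidirectional ring versus tiles in a 2D mesh using simple shortest-path (Manhattan-distance) routing. The node counts and routing scheme are illustrative assumptions for the sketch, not Intel's actual floorplan or router design.

```python
# Toy model: average hop count on a bidirectional ring bus versus a
# 2D mesh, assuming shortest-path routing in both topologies.
# This is a simplified sketch, not a model of Intel's real interconnect.
from itertools import product

def ring_hops(n):
    """Average shortest-path hops between distinct stops on an n-stop ring."""
    total = sum(min(abs(a - b), n - abs(a - b))
                for a, b in product(range(n), repeat=2) if a != b)
    return total / (n * (n - 1))

def mesh_hops(rows, cols):
    """Average Manhattan-distance hops between distinct tiles in a rows x cols mesh."""
    nodes = list(product(range(rows), range(cols)))
    total = sum(abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1), (r2, c2) in product(nodes, repeat=2)
                if (r1, c1) != (r2, c2))
    n = len(nodes)
    return total / (n * (n - 1))

# Compare 28 stops arranged as a ring versus a 4 x 7 grid (28 tiles).
print(f"ring(28):  {ring_hops(28):.2f} avg hops")
print(f"mesh(4x7): {mesh_hops(4, 7):.2f} avg hops")
```

Under these assumptions the mesh's average hop count comes out well below the ring's, and the gap widens as more nodes are added, which is the scaling property the mesh design is after.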
Finally, Intel is also implementing a “modular architecture with scalable resources for accessing on-chip cache, memory, I/O, and remote CPUs”. These resources are distributed throughout the chip to further reduce subsystem resource constraints or chip “hot-spots”, which enables them to scale more effectively with the number of processor cores.