Intel Xeon 5130 and 5160 (2-way SMP) Performance Review
The highly successful Core microarchitecture has made its way into desktops and laptops, but don't forget that it has also reached workstations and servers with the rejuvenated Xeon 5100 series. Codenamed Woodcrest, these processors aren't as expensive as they are perceived to be, as we found out when pitting them against the Core 2 Extreme.
By Vijay Anand
Intel's Back on Track with the Xeon 5100 Processor Series
As with the desktop side of things, 2005 had been a rough year for Intel on the workstation and server front as the AMD Opteron repeatedly displaced Intel's best offerings in both performance and efficiency (thermal and power considerations). After all, AMD's Opteron is based on the tried and tested Athlon 64 and Athlon 64 X2 CPU architecture that we are all familiar with, and past rankings have consistently pegged them one rung better than Intel's options. The Xeons, meanwhile, were mostly variants of desktop parts based on the NetBurst architecture that didn't really shine. In fact, the only dual-core Xeon model that Intel peddled for quite a while for 2-way SMP systems was a 2.8GHz Paxville DP that was based upon the ill-famed Smithfield core.
Just like in the mobile and desktop space, Intel's savior in this highly contested and prestigious workstation/server segment was none other than the Core microarchitecture which, as we've explained in our IDF articles, is tuned differently for deployment in each of the three sectors. Launched in June this year (ahead of the Core 2 series for the desktop), the Xeon 5100 series returned with a vengeance and almost immediately rivaled, and at times surpassed, AMD in dual-processor (2-way SMP) configurations on performance, power consumption, thermal output and even price. It was a much-needed equalizer for Intel to get back in the game and prevent further market share erosion in this high-performance computing space. Here's how the Xeon 5100 series stacks up currently:-
Processor Model / Processor Characteristics | Clock Speed | L2 Cache | Front Side Bus (MHz) | Max TDP (W) | Intel VT | Intel EM64T | Demand-Based Switching (DBS) | Estimated Price (US$) |
Xeon 5160 | 3.00GHz | 4MB | 1333 | 80 | Yes | Yes | Yes | $865 |
Xeon 5150 | 2.66GHz | 4MB | 1333 | 65 | Yes | Yes | Yes | $715 |
Xeon 5148 LV | 2.33GHz | 4MB | 1333 | 40 | Yes | Yes | Yes | N.A. |
Xeon 5140 | 2.33GHz | 4MB | 1333 | 65 | Yes | Yes | Yes | $480 |
Xeon 5130 | 2.00GHz | 4MB | 1333 | 65 | Yes | Yes | No | $335 |
Xeon 5120 | 1.87GHz | 4MB | 1066 | 65 | Yes | Yes | No | $280 |
Xeon 5110 | 1.60GHz | 4MB | 1066 | 65 | Yes | Yes | No | $230 |
Take note that the Xeon processor 5100 series (Woodcrest core) shown here differs greatly from the similarly numbered 5000 series (Dempsey core). Woodcrest is basically what Conroe is to the desktop segment but with SMP capability and a 1333MHz PSB, while Dempsey is similar to the Presler core (used in the Pentium D today) with SMP capability and a 1066MHz PSB. Given the extensive comparisons we've shown you between the Core microarchitecture used in Core 2 processors and the NetBurst microarchitecture in the Pentium D processors, the difference between them is as clear as night and day. Thus you can see why the Xeon 5100 is a crucial piece in Intel's lineup to move forward.
From the Xeon processor 5100 series stack, you can see that there's a wide variety of processor configurations available to meet budget constraints and/or performance needs. These processors are designed from the start for dual processor (DP) platforms, giving you multi-threading support of two threads per socket, or up to four when both processor sockets are populated. There's even one SKU that's ideal for high-density computing needs: the low voltage 40W TDP part. All have Virtualization Technology, true 64-bit processing and 64-bit memory addressing capability, among other well-known features such as Execute Disable Bit support.
Demand-Based Switching (DBS), however, isn't available on all processors. For those who aren't too familiar with this feature, it's based upon Intel's SpeedStep technology, which originated in mobile processors, for throttling down processor frequencies when idle or at low loads (thus saving energy). DBS is the same thing, but tuned for monitoring server loading levels to make decisions on clocking down the frequencies as well as turning off multiple CPU cores/threads. Essentially, it is 'server-class' SpeedStep. Frequency under-clocking works by manipulating the CPU's multiplier, and since the low-end Xeons are already using the absolute lowest values possible, DBS is unavailable on these.
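To make the multiplier argument concrete, here is a minimal Python sketch. It assumes the core clock is simply the FSB base clock (the quoted quad-pumped FSB figure divided by four) times the multiplier, and it assumes a lowest multiplier of 6x, which we infer from the Xeon 5160 idling at 2.0GHz on our test system. Neither assumption comes from Intel documentation, and the function names are ours.

```python
# Sketch only: core clock = base clock x multiplier (assumed model).
# The quoted FSB speed is quad-pumped, so base clock = FSB / 4.

ASSUMED_MIN_MULTIPLIER = 6  # assumption: inferred from the 5160 idling at 2.0GHz


def multiplier(core_mhz: float, fsb_mhz: float) -> int:
    """Return the multiplier implied by a core clock and a quad-pumped FSB."""
    base_clock = fsb_mhz / 4
    return round(core_mhz / base_clock)


def dbs_headroom(core_mhz: float, fsb_mhz: float) -> int:
    """Number of multiplier steps available below the rated clock."""
    return max(0, multiplier(core_mhz, fsb_mhz) - ASSUMED_MIN_MULTIPLIER)


# Three 1333MHz FSB parts from the table above (rated clocks in MHz).
for name, core_mhz in [("Xeon 5160", 3000), ("Xeon 5140", 2333), ("Xeon 5130", 2000)]:
    print(name, multiplier(core_mhz, 1333), dbs_headroom(core_mhz, 1333))
```

Under these assumptions the Xeon 5130 already sits at the lowest multiplier and has no steps left to drop, which is consistent with DBS being listed as unavailable for it.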
The Xeon processor 5100 series is Intel's answer to get back in the workstation/server space and compete with AMD's offerings more convincingly.
These Xeon processors, however, are just one part of the comeback equation that Intel needed to fend off the competition. The foundation for all of these server class processors, and even those about to be launched, is the new generation workstation/server class Northbridge chip, the Blackford Memory Controller Hub (MCH). Together, the Xeon 5xxx class processors and the fresh motherboard level technologies made possible by the Blackford MCH form Intel's new generation dual processor server platform codenamed Bensley. What Blackford brings to the table, the chipset variants and their targeted segments are detailed on the next page.
The Bensley Platform and Intel 5000 MCH (Blackford)
Prior to May 2006's Bensley platform launch, Intel's E7520 (Lindenhurst) platform anchored the company's dual processor lineup for more than a year and a half. Although Lindenhurst stayed on longer than anticipated, the Bensley platform launch was closely tied to the introduction of the Xeon 5000 series (Dempsey) processors to kick-start the new era in proper order. The Intel 5000 series chipset (Blackford) of the Bensley platform had a lot going for it, including the fact that it's designed to support not only the Intel Xeon 5000 processors launched alongside it, but also the Intel Xeon 5100 processors (Woodcrest) launched the following month and the soon to be announced quad-core Xeon 5300 processor (Clovertown). To make it even more palatable, all of these processors use the same packaging and pin-out to ensure compatibility with the LGA771 socket and allow direct drop-in upgrades along the way (as long as the board's BIOS is qualified to support them).
The Bensley platform block diagram, courtesy of Intel. Note that not all 5000 series chipsets possess all of the features shown above. Only the 5000X has a snoop filter integrated in its die, only the 5000X and 5000P feature a 4-channel FB-DIMM memory interface, and all of them offer various PCIe link configurations, which we've tabulated later in the article.
To be well prepared to feed the compute-intensive tasks of current and upcoming processors, the Intel 5000 series chipset is also the first DP platform to feature dual independent buses (DIB) to the processors. Instead of the shared bus topology of past chipsets, where switching occurred to serve each processor (and further switching occurred if it was a dual-core processor), the new Blackford chipset gives each processor its own full speed FSB so both can communicate simultaneously, and frequencies have been boosted from a paltry 800MHz to as much as 1333MHz. All these 'upgrades' give the Bensley platform a peak FSB bandwidth of 21GB/s versus just 6.4GB/s on Lindenhurst. Thus even the average sustained FSB bandwidth is far more than the peak FSB bandwidth of the old E7520. While all this bandwidth is not really necessary for the current Xeon 5100 processor series, the upcoming quad-core versions will definitely find it handy.
You can make a good guess that Intel 5000 series chipsets are pretty huge and complex, and indeed they are at 52 million transistors (130nm process technology) and 1432 'pin-outs'. One version, the 5000X (codenamed Greencreek), is targeted at workstations and incorporates a 16MB snoop filter whose goal is to boost the efficiency of the dual independent buses to the processors. Used as a form of cache state information storage, it also works hand-in-hand with the Coherency Engine in the backend of the chipset, which closely monitors and orchestrates events and transactions within the platform. Naturally, the Intel 5000X chipset is larger than its peers, cramming in 65 million transistors, and that's a higher transistor count than the early Pentium 4 processors could boast!
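The peak bandwidth figures quoted above can be reproduced with simple arithmetic. The sketch below assumes a 64-bit (8-byte) FSB data path and treats the quoted MHz figures as effective transfer rates; the function name is ours and purely illustrative.

```python
# Sketch: peak FSB bandwidth = transfer rate x bus width x number of buses.

BUS_WIDTH_BYTES = 8  # assumption: 64-bit FSB data path


def peak_fsb_gb_per_s(transfers_mhz: float, buses: int = 1) -> float:
    """Peak bandwidth in GB/s (decimal) across `buses` independent FSBs."""
    return transfers_mhz * 1e6 * BUS_WIDTH_BYTES * buses / 1e9


lindenhurst = peak_fsb_gb_per_s(800)        # one shared 800MHz bus
bensley = peak_fsb_gb_per_s(1333, buses=2)  # two independent 1333MHz buses
print(f"Lindenhurst: {lindenhurst:.1f}GB/s, Bensley: {bensley:.1f}GB/s")
```

Each 1333MHz bus on its own works out to roughly 10.7GB/s, so it is the pair of independent buses that takes the platform past the 21GB/s mark.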
The Intel 5000 series MCH chip package. Its core contains more transistors than an old Pentium 4 processor - mind boggling indeed!
FB-DIMM Memory Architecture & MCH Variants
The next drastic change is the chipset's memory support infrastructure. In the existing DDR and DDR2 memory topology, data lines from the memory controller have to be connected to those of every memory module. The current parallel channel interface requires up to 240 pins and a dual-channel interface doubles that count. To complicate matters, we require high frequency operation on this parallel channel interface that has effectively reduced the number of memory modules that it can sustain operating at high frequencies. This is the reason why you'll see that servers often offer many DIMM slots for massive memory capacity but they operate at far lower frequencies than the desktop counterpart can on the same memory technology. Thus it has been difficult to maintain high bandwidth throughput and capacity as either of them had to be compromised to prevent spiraling board design complexity and costs. FB-DIMM technology supported on the new Intel 5000 chipset series and henceforth was designed to tackle these very issues to grow both capacity and bandwidth without the pitfalls encountered previously.
FB-DIMM memory architecture; diagram courtesy of Intel.
Instead of a direct signaling interface between the memory controller and the memory devices, FB-DIMM technology splits this into two separate and independent signaling interfaces with a buffer between them. This is where FB-DIMM technology gets its name, FB standing for "Fully Buffered". The buffer sits on each FB-DIMM module (acting as a middleman between the memory controller and the actual memory devices on the module) and is officially known as the advanced memory buffer, or AMB for short. The memory devices on the module itself are standard commodity DDR2 chips. Using FB-DIMM technology, the interface between the memory controller and the FB-DIMM slots is a narrow, high-speed, 'serial like' point-to-point interface, replacing the standard parallel interface of the traditional memory hub topology. This directly reduces the MCH pin count required (only 69 per channel) and drastically reduces board layout complexity. The already narrow interface is further split into two tracks, each operating in one direction. The first track, for memory writes and commands, uses 10 wire pairs running from the memory controller to the memory slots. The other track, for memory reads, uses 14 wire pairs running in the opposite direction, from the memory slots to the memory controller. This implementation allows simultaneous read/write access within the same memory channel (but on different DIMMs), unlike existing DDR / DDR2 memory technologies that can only perform a read or a write in any one access. Since the FB-DIMM interface is now a narrow serialized high-speed link (operating at six times the DDR clock), this technology supports up to eight memory modules per channel and a total of six channels.
The AMB on the memory module is designed to deliver memory commands from the chipset's memory controller over the FB-DIMM interface to the DDR2 memory devices on the module, which still use a traditional parallel interface and need no alteration. The AMB compensates for any signal deterioration that occurs along the way by buffering and resending the signal. Besides buffering memory traffic for large memory capacities on each module, the AMB also forwards requests via retransmission to the other DIMMs on the same memory channel. Since the AMB completely isolates the memory devices on the module from the module connector interfacing the slot, FB-DIMM technology can easily evolve to embrace next generation DDR3 memory technology in the same fashion.
Despite all this, FB-DIMM technology is not a win-win solution in every respect. With all the buffering taking place, it introduces a small latency penalty for memory access, and the AMB adds to the power consumption of each module. Still, the direct benefits offered by FB-DIMM memory technology outweigh these cons to a certain extent in the server space, since processor TDP ratings have been brought down and memory capacity can be increased effortlessly while maintaining high memory bandwidth. Cost has fortunately increased only marginally even with the AMB unit and its cooling requirements, so this aspect has not been much of a consideration.
On the Intel 5000P and 5000X chipsets, two memory controllers support four memory channels (split into two branches) for a total of 16 DIMMs. They are split into two branches because the chipset supports memory mirroring, otherwise known as RAID-1, for redundancy, but it can also operate in non-RAID mode for maximum memory capacity utilization. The controller also supports other RAS (Reliability, Availability and Serviceability) features such as ECC, scrubbing and memory sparing among others.
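As a rough illustration of the savings, the sketch below compares memory controller pin counts for a four-channel design using the per-channel figures quoted in this article (240 pins for the parallel interface, 69 for FB-DIMM), and works out the module ceiling implied by eight modules per channel across six channels. The constant and function names are ours.

```python
# Sketch using the per-channel figures quoted in the article.

PARALLEL_PINS_PER_CHANNEL = 240  # DDR/DDR2 parallel interface (upper bound quoted)
FBDIMM_PINS_PER_CHANNEL = 69     # FB-DIMM interface
FBDIMM_MAX_DIMMS_PER_CHANNEL = 8
FBDIMM_MAX_CHANNELS = 6


def controller_pins(channels: int, pins_per_channel: int) -> int:
    """Total memory-interface pins for a given channel count."""
    return channels * pins_per_channel


parallel = controller_pins(4, PARALLEL_PINS_PER_CHANNEL)
fbdimm = controller_pins(4, FBDIMM_PINS_PER_CHANNEL)
max_modules = FBDIMM_MAX_DIMMS_PER_CHANNEL * FBDIMM_MAX_CHANNELS

print(f"4 parallel channels: {parallel} pins")
print(f"4 FB-DIMM channels:  {fbdimm} pins")
print(f"FB-DIMM module ceiling: {max_modules} modules")
```

Four FB-DIMM channels need fewer pins than a dual-channel parallel interface, which helps explain how a four-channel memory controller becomes practical.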
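The capacity trade-off of mirroring is easy to quantify. The sketch below assumes that mirroring keeps a full duplicate of the data on the second branch, so usable capacity is half of what is installed; the 1GB module size is only an example (it matches the modules in our test systems), and the function name is ours.

```python
# Sketch: usable memory with and without mirroring (assumes a full duplicate copy).

def usable_memory_gb(dimms: int, gb_per_dimm: float, mirrored: bool) -> float:
    """Usable capacity in GB; mirroring halves the installed total."""
    total = dimms * gb_per_dimm
    return total / 2 if mirrored else total


# Example: all 16 slots populated with 1GB modules (hypothetical configuration).
print(usable_memory_gb(16, 1, mirrored=False))
print(usable_memory_gb(16, 1, mirrored=True))
```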
FB-DIMM is certainly a major highlight of this new platform, thus our major focus upon it. Another technology introduced in this chipset series is Intel's I/O Acceleration Technology (I/O AT in short) enabling efficient data movement for network I/O. The chipset features an on-die DMA unit and helps to offload the CPU in terms of memory copy requests. In fact I/O AT extends all through the system platform with the network stack optimized for Intel's processors, the MCH aiding data movement and optimizations at the MAC level of the LAN hardware. Chiefly I/O AT is targeted to benefit high network access environments to reduce CPU overhead and increase throughput. Finally, PCI Express lane configuration options have increased in both number of links and throughput and here's where the 5000 series chipsets differ most besides memory support. Here's a table to differentiate the variety:-
NB Chipset Model / Features | Target Market | No. of memory channels / FB-DIMM DDR2-667 modules supported | Memory Mirroring (RAID 1) | PCI Express port configuration | X16 Graphics (PEG) port | Snoop Filter | System Management Bus (SMBus) Interfaces |
Intel 5000X | Workstation (performance / volume) | 4 / 16 | Yes | 1 x PCIe x8 OR 2 x PCIe x4 | Yes | Yes, on-die (16MB) | 6 |
Intel 5000P | Server (performance / volume) | 4 / 16 | Yes | 3 x PCIe x8 OR 6 x PCIe x4 | -- | -- | 6 |
Intel 5000Z | Server (value) | 2 / 8 | -- | 2 x PCIe x8 OR 4 x PCIe x4 | -- | -- | 4 |
Intel 5000V | Server (value) | 2 / 8 | -- | 1 x PCIe x8 OR 2 x PCIe x4 | -- | -- | 4 |
A Xeon for Personal Use?
What interests us most in reviewing the Intel Xeon 5100 series processors now is how viable the platform is for personal use as an alternative to mainstream consumer parts (it is also an ideal time for a performance recap that sets expectations for the upcoming quad-core processors). Obviously people who fit this profile are not gamers at heart but serious engineering professionals and the like. Gaming is of course a minor pastime for some of them, but they aren't critical about it. However, they won't stand for any downtime and require a reliable, robust and powerful platform to meet their needs. A platform with dual Xeon 5130 processors in particular looks tempting as it might just cost less than a consumer platform using a single Core 2 Extreme X6800, yet achieve as good or better number crunching performance. Remember, your investment in the Bensley platform goes much further as it's upgradeable to support quad-core Xeon processors. With two such future processors, you can have a platform with eight CPU cores! Let's check the base platform costs for both the Xeon and Core 2 systems, which encompass the motherboard, processor(s) and memory:-
Estimated Cost | High-end Consumer | Platforms / Item | Workstation | Estimated Cost |
US$989 | Intel Core 2 Extreme X6800 | Processor | Intel Xeon 5130 (1 pair) | 2 x US$335 |
US$180 - US$250 | Full-featured Intel P965 / 975X motherboard | Motherboard | Intel / Tyan / Gigabyte Intel 5000V chipset motherboard | US$350 - US$400 |
US$125 - US$170 | 1GB DDR2-800 | Memory | 1GB FB-DIMM DDR2-533 | US$150 - US$200 |
~ US$1409 | (Maximum Cost) | Grand Total | (Maximum Cost) | ~ US$1270 |
There you have it - a Xeon 2-way SMP platform with dual dual-core processors at US$1270 that can possibly out-crunch a Core 2 Extreme uni-processor platform at US$1400+. Just one catch though: the workstation class board chosen in this theoretical comparison is based on the Intel 5000V, the lowest cost variant with no PCIe x16 graphics slot. Thus if one wants an all-encompassing system, a 5000X class board is a must and that will add at least another US$250. That brings the workstation configuration's price tag to US$1520, just over a hundred dollars more than the Core 2 Extreme platform. So how would such a system fare? Read on.
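For readers who want to plug in their own prices, the totals above can be reproduced with a few lines of Python. The figures are the upper-bound estimates from the table; the variable names are ours.

```python
# Sketch: maximum platform cost using the upper-bound estimates from the table (US$).

consumer_max = 989 + 250 + 170         # X6800 + P965/975X board + 1GB DDR2-800
workstation_max = 2 * 335 + 400 + 200  # 2 x Xeon 5130 + 5000V board + 1GB FB-DIMM
workstation_5000x = workstation_max + 250  # minimum premium quoted for a 5000X board

print(f"Consumer platform:     US${consumer_max}")
print(f"Workstation (5000V):   US${workstation_max}")
print(f"Workstation (5000X):   US${workstation_5000x}")
print(f"5000X premium over consumer: US${workstation_5000x - consumer_max}")
```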
Test Setup & Benchmarks
Let's get down to the testbed setup before we roll out the benchmark results as usual. The two Xeon platforms we received for testing came in slightly different server boxes and unfortunately we didn't have time to standardize both of them completely. Still, we believe the results we have gathered are sufficiently comparable with one another, as the configurations are not all that different. The dual Xeon 5130 system came in a 1U server with the Windows 2003 Server operating system; together, these factors prevented us from running several benchmarks. The other set had a pair of Xeon 5160 processors in a workstation box using the 5000X platform running Windows XP Professional (SP2). Since that system accepts a standard graphics card, we gave it a thorough workout, though its results mainly serve as a benchmark indicator for the upcoming quad-core processors to beat.
Intel Core 2 Extreme Configuration
- Intel Desktop Board D975XBX (Intel 975X Express chipset)
- Intel Core 2 Extreme X6800
- 2 x 512MB Corsair XMS DDR2-800 non-ECC memory modules (CAS 4, 4-4-12)
- Seagate Barracuda 7200.7 80GB SATA hard disk drive (one single NTFS partition)
- MSI GeForce 7900 GT 256MB - with NVIDIA Detonator XP 84.21
- Microsoft Windows XP Professional with Service Pack 2
Intel Xeon 5130 (dual) Configuration
- Intel Enterprise Server S5000PAL (Intel 5000P chipset)
- 2 x Intel Xeon 5130 processors
- 2 x 1GB FB-DIMM DDR2-533 memory modules
- Seagate Barracuda 7200.7 80GB SATA hard disk drive (one single NTFS partition)
- ATI ES1000 on-board graphics - ATI driver version 6.14.10.6553
- Microsoft Windows 2003 Server
Intel Xeon 5160 (dual) Configuration
- Intel Enterprise Server S5000XVN (Intel 5000X chipset)
- 2 x Intel Xeon 5160 processors
- 2 x 1GB FB-DIMM DDR2-533 memory modules
- Seagate Barracuda 7200.7 80GB SATA hard disk drive (one single NTFS partition)
- MSI GeForce 7900 GT 256MB - with NVIDIA Detonator XP 84.21
- Microsoft Windows XP Professional with Service Pack 2
Benchmarks Used
The benchmarks chosen are not ideal for true workstation and platform use, but they should be sufficient to indicate performance for the user group we are targeting with this article:-
- SPEC CPU2000 v1.3
- Futuremark PCMark 2005 Pro
- BAPCo SYSmark 2004
- Lightwave 3D 7.5
- Cinebench 2003
- XMpeg 5.03 (DivX 6.2.5 encoding)
- Futuremark 3DMark06 Pro
- Unreal Tournament 2004
- AquaMark3
- Quake 4 ver.1.20
- F.E.A.R.
Here's the main CPU-Z screenshot for the Xeon 5130 platform. We had this system much earlier thus we only had the older version of this tool to get this shot. Full load clock speed is 2.0GHz per core.
The Xeon 5160 was a much more recent system and we had the latest CPU-Z version for proper CPU information gathering. At load, it operates at 3GHz per core at 1.217V. However at idle, it steps back to 2.0GHz at a voltage of 1.075V. Yup, DBS is in action.
Results - SPEC CPU2000 v1.3
SPEC CPU2000 version 1.3 consists of two benchmark suites for measuring compute-intensive integer performance and floating-point performance. Considering our experience with this benchmark over the last few years and the specs of the dual Xeon 5160, it would seem that this setup would have no problem taking the lead by a huge margin. It did churn out impressive scores for the integer test portion, but floating-point results were disappointing for the 1-user and 2-user tests. Pushing the processors to contend with a 4-user workload environment did yield better results, but not by very much. The dual Xeon 5130 pair managed to do what it's supposed to do and garnered a small lead at the 4-user load simulation, but trailed the Core 2 Extreme X6800 at lesser load levels as expected.
Results - Futuremark PCMark05
Next, we have a lighter set of synthetic tests. Futuremark's PCMark05 contains several small synthetic subsystem tests that are generally capable of pinpointing the system's capabilities in brief. In the CPU test suite we see reasonably expected performance standings of both Xeon dual processor configurations against the consumer class platforms. However, the memory test suite returned rather bizarre results and we can't be sure of this outcome till we obtain more modern tools to analyze that aspect. A brief check with the old Cachemem utility revealed scores wildly in favor of the Xeon dual processor platform, which is quite the opposite of what PCMark returned.
Results - SYSmark 2004
SYSmark 2004 is neither a workstation test nor a multi-core CPU test, and it hardly scales with more powerful platforms as it can't take advantage of more than two processing cores. In fact, it hardly uses more than a single core; the second core just helps ease it through some of its tasks. As such, four-core setups such as the dual Xeons seem like a waste for general productivity tasks and we couldn't agree more. In any case, this is a good case study. Take note that the Core 2 Extreme system had AHCI drivers installed while the Xeon platforms had to make do without them. As such, SYSmark scores were lower than expected for a dual Xeon 5160 setup (except in the Internet Content Creation suite). We couldn't obtain Xeon 5130 scores as the benchmark won't run on the Windows 2003 Server operating system.
Results - Lightwave 3D 7.5
Highly threaded applications like Lightwave are excellent reasons to go quad-core. The dual Xeon 5130 platform managed to outperform the Core 2 Extreme X6800 by a tiny margin, albeit only when we really taxed the processors with eight threads. The Xeon 5160 pair soundly made its prowess known, garnering the shortest render times by a notable margin.
Results - Cinebench 2003
Cinebench is another popular render based benchmark that's highly threaded. The performance standings are quite similar to what we've seen in Lightwave, and the results are pleasing.
Results - XMpeg 5.03 (DivX 6.2.5 encoding)
In our DivX movie encoding tests, while the Xeon 5160 pair managed a small lead ahead of the Core 2 Extreme X6800 platform, clearly the test wasn't really benefiting from the quad processing cores available. Perhaps it's the way that data flows were managed by the chipset because our internal testing of a native quad-core processor reflects much better scores. The Xeon 5130 pair had to be excluded once more due to complications we met with Windows 2003 Server operating system.
Results - Gaming
In our gaming tests, the dual Xeon 5130 platform had to be excluded from comparison since we couldn't install an equivalent graphics card on the platform given. Even so, the results from the dual Xeon 5160 platform, which does support an add-in graphics card, didn't look promising for quad-core processing in general. Additionally, the platform itself isn't optimized for gaming data flows, as the mighty 3GHz Xeon pair stumbled against the lesser spec'd Core 2 Extreme uni-processor platform and was generally on par with a Core 2 Duo E6600. Even Quake 4, which is supposed to be one of the rare few SMP capable games, couldn't scale properly with four processing cores (and this was true even for the Kentsfield quad-core processor). For other game tests, we obtained subpar or odd results, which isn't the fault of the games; it's just the state of things now.
Results - Futuremark 3DMark06
3DMark06 was fortunately able to scale with the number of processing cores available, and the CPU-based rendering scores really showed us how a well-threaded application can give rise to great performance leaps. As for 3DMark06's usual graphics performance scores, there was a small improvement, but it's not of real concern as that scales directly with the power of the graphics subsystem. Still, the small boost is a nod towards the influence of a better overall platform and CPU.
Closing Remarks
To answer the initial question of whether one should consider a DP Xeon platform over the top of the line consumer uni-processor within the same cost envelope, our results have shown that there are ups and downs to adopting one. If your applications are highly threaded and you wish to scale further on the same platform to eight processing cores later, the dual Xeon processor combo has shown that it's a strong contender. The primary factor that makes or breaks the decision is the kind of applications used, as those that aren't multi-threading friendly will face a penalty. Of course we've yet to delve into true engineering or workstation workloads to paint the ideal picture, but we will work on that in time to come.
In the meantime, we used more commonly known applications to find out the state of quad-core processing, and we hope that with this article we've shown you what to expect from the upcoming launch of such processors in the consumer space by both Intel and AMD. By testing the highest spec'd Xeon DP pair, the 5160, we have also established the performance expectations of the forthcoming quad-core processors. The CPU space has never been as exciting as it has been in recent times, with a multitude of advancements in a very short span of time, so stay tuned to our reviews channel as we move on to the next evolutionary step in processing, coming shortly.