Intel Core 2 Extreme QX9650 - Entering 45nm
Intel has been preaching 45nm for a year, and today the first 45nm desktop processor has finally arrived. We took the latest and greatest Intel Core 2 Extreme QX9650 for a spin in our labs and were plenty impressed with its new capabilities. Read all about it right here.
By Zachary Chan -
Tick-tock, Tick-tock, Intel Strikes Again
While we mere humans have to conform to our natural biological clocks, microprocessor giant Intel - the Chipzilla - follows Silicon Cadence (also known as tick-tock), a strategic time line designed to introduce new microprocessor technology and manufacturing processes approximately every two years in a continuous effort to maintain Moore's Law. And according to the powers that be, the time has come once again for a new tick to last year's tock.
What we're talking about is the official launch of Penryn today, the successor to 2006's Conroe processor, which was based on the revolutionary Core microarchitecture. Penryn is the codename for Intel's first 45nm processor family and, in the coming months, should replace the current Conroe-based Core 2 processors in all market segments as Intel's flagship product. In case you haven't been following our coverage of Penryn and Intel's 45nm process technology, we recommend that you first check out our two previous preview articles. These should get you acquainted with Intel's new 45nm High-k Metal Gate process technology and give you some teasers of early Penryn results.
Intel has everything planned out using Silicon Cadence.
One important fact to note is that Intel has continued its trend of basing microprocessor development on mobile technologies, which is where the Core microarchitecture had its roots. So to clear things up a little: Penryn is commonly used as a blanket codename for the new 45nm processor family, but it is also the codename for the mobile CPU. In the desktop space, the dual-core and quad-core Penryn processors are codenamed Wolfdale and Yorkfield respectively. Since Penryn still shares essentially the same microarchitecture as Conroe, retail processors will retain the Core 2 nomenclature.
What to expect from Penryn processors to come.
Process Technology Enhancements
Following Intel's tick-tock strategy, the Penryn comes into the silicon compaction/shrinking cycle. But what does this mean for users? Is Penryn just a 45nm die-shrunk Conroe? Is the upgrade worth it or should you wait till the next 'tock' cycle where the next microprocessor architecture overhaul is supposed to take place on the Nehalem core?
While not nearly as exciting as the initial release of the Intel Core microarchitecture (in the form of the Conroe processor), or the circumstances that forced Intel into its current overdrive innovation cycle, calling Penryn just a die-shrunk Conroe would be a grave mistake. The new 45nm process is itself a major improvement that reduces switching power and leakage, improves switching speeds and allows Intel to cram more transistors onto the die. The dual-core Penryn die has shrunk to a mere 107mm², compared to the 143mm² of the Conroe and the 162mm² of the Presler before it, both on 65nm. Yet Penryn boasts around 410 million transistors, up from the Conroe's 291 million. A large chunk of this increase is due to Penryn's larger L2 cache: a shared 6MB, up from the Conroe's 4MB.
A 45nm Penryn (or Wolfdale for the desktop) die. Put two of these together and you have a quad-core Yorkfield
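Those die figures are easier to compare as transistor density. Here is a quick Python calculation using only the dual-core numbers quoted above; the "ideal" figure is an illustrative assumption that every feature shrinks linearly from 65nm to 45nm, which real processes rarely manage.

```python
# Dual-core die figures quoted in the article.
DIES = {
    "Conroe (65nm)": {"area_mm2": 143.0, "transistors_m": 291.0},
    "Penryn (45nm)": {"area_mm2": 107.0, "transistors_m": 410.0},
}

def density(die):
    """Transistor density in millions of transistors per square millimetre."""
    return die["transistors_m"] / die["area_mm2"]

conroe = DIES["Conroe (65nm)"]
penryn = DIES["Penryn (45nm)"]

area_ratio = penryn["area_mm2"] / conroe["area_mm2"]  # Penryn area as a fraction of Conroe's
density_gain = density(penryn) / density(conroe)       # measured density improvement
ideal_gain = (65 / 45) ** 2                            # assumption: perfectly linear shrink

for name, die in DIES.items():
    print(f"{name}: {density(die):.2f} M transistors/mm^2")
print(f"Penryn die area vs Conroe: {area_ratio:.0%}")
print(f"Density gain: {density_gain:.2f}x (ideal linear shrink: {ideal_gain:.2f}x)")
```

Penryn packs roughly 3.8 million transistors into each mm² versus roughly 2.0 for Conroe, a gain of just under 1.9x, somewhat short of the idealized 2.1x.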
The TDP envelope for Penryn hasn't changed, though. Initial desktop processors will feature a 65W TDP for mainstream dual-core processors, 95W for mainstream quad-core and 130W for the Extreme editions. To top off its list of accomplishments, Penryn can also claim to be a 100% lead-free processor.
Intel Core Microarchitecture Enhancements
Besides the new process technology, Penryn processors also feature some improvements to last year's Core microarchitecture. A summary of these enhancements was covered in our Penryn performance preview article (the chart is also reproduced below). As you can see, Intel has delivered enhancements to every aspect of the Core microarchitecture, so we'd like to focus on the key improvements and what you can probably expect from them.
Core microarchitecture enhancements. Notice that the power features are only available on mobile processors and are not covered in this article.
Most of the enhancements only benefit specific needs, such as the Fast Radix-16 Divider, which will improve divide performance in scientific and mathematically heavy software. There is also a beefed-up virtualization engine on the processor, which can potentially speed up virtual machine transitions by up to 75%. Again, this is a usage-specific improvement that will only benefit a select group of users.
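Why does a higher radix help? Here is a simplified Python sketch of digit-recurrence division. It is a plain restoring divider, not Intel's actual circuit (real high-radix dividers typically use SRT-style digit selection), and the only point it illustrates is the iteration count: a radix-16 divider retires four quotient bits per step while a radix-4 divider retires two, so a 64-bit operand needs 16 steps instead of 32.

```python
def radix_divide(dividend, divisor, bits_per_digit, width=64):
    """Unsigned digit-recurrence (restoring) division in radix 2**bits_per_digit.

    Returns (quotient, remainder, iterations).
    """
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    if width % bits_per_digit or not 0 <= dividend < (1 << width):
        raise ValueError("dividend must fit in `width` bits, a multiple of bits_per_digit")
    radix = 1 << bits_per_digit
    quotient = remainder = iterations = 0
    # Consume the dividend one radix-2**k digit at a time, most significant first.
    for shift in range(width - bits_per_digit, -1, -bits_per_digit):
        digit = (dividend >> shift) & (radix - 1)
        remainder = remainder * radix + digit
        # Choose the largest quotient digit q with q * divisor <= remainder.
        # Hardware selects q in parallel; a loop keeps this sketch readable.
        q = 0
        while (q + 1) * divisor <= remainder:
            q += 1
        remainder -= q * divisor
        quotient = quotient * radix + q
        iterations += 1
    return quotient, remainder, iterations

for k, label in [(2, "radix-4"), (4, "radix-16")]:
    q, r, n = radix_divide(1_000_000, 7, k)
    print(f"{label}: quotient={q} remainder={r} iterations={n}")
```

Both runs produce the same quotient and remainder; only the number of iterations is halved.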
The general performance increase will come from the universally larger 6MB L2 cache of course, and Intel has further improved cache and memory management as well with a 24-way associative L2 cache, enhanced cache split line loading and immediate store to load capabilities. The Penryn processors are also built to be ready for another FSB speed increment from 1333MHz to 1600MHz, so when that change happens, users should see another automatic bump to performance across the board.
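The 24-way figure is easier to appreciate with the cache geometry worked out. This is a minimal Python sketch, assuming 64-byte cache lines (the Core 2 line size) and a 16-way associative 4MB L2 on Conroe; it computes the number of sets and splits an address into tag, set index and line offset.

```python
from dataclasses import dataclass

MB = 1024 * 1024

@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    ways: int
    line_bytes: int = 64  # assumption: 64-byte cache lines

    @property
    def sets(self) -> int:
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def offset_bits(self) -> int:
        # log2 of the line size (valid for power-of-two sizes)
        return (self.line_bytes - 1).bit_length()

    @property
    def index_bits(self) -> int:
        # log2 of the set count (valid for power-of-two set counts)
        return (self.sets - 1).bit_length()

    def split(self, address: int) -> tuple[int, int, int]:
        """Split an address into (tag, set index, byte offset within the line)."""
        offset = address & (self.line_bytes - 1)
        index = (address >> self.offset_bits) & (self.sets - 1)
        tag = address >> (self.offset_bits + self.index_bits)
        return tag, index, offset

conroe_l2 = CacheGeometry(size_bytes=4 * MB, ways=16)
penryn_l2 = CacheGeometry(size_bytes=6 * MB, ways=24)

for name, l2 in [("Conroe 4MB/16-way", conroe_l2), ("Penryn 6MB/24-way", penryn_l2)]:
    print(f"{name}: {l2.sets} sets, {l2.index_bits} index bits, {l2.offset_bits} offset bits")
```

Under these assumptions both caches have 4,096 sets, so the extra 2MB shows up as extra ways rather than extra sets: an address picks the same set as before, and that set holds 24 lines instead of 16.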
However, the main feature improvement in Penryn is the new SSE4 instructions and the Super Shuffle Engine. Dubbed the “most significant media instruction set architecture advancement since 2001”, SSE4 focuses on two major categories: media acceleration and string (text) processing. Strictly speaking, Penryn implements the first subset, SSE4.1, while the string-processing instructions (SSE4.2) are slated for Nehalem. SSE4 has the potential to offer very significant performance boosts in graphics, video processing, 3D imaging and data compression algorithms, to name a few. However, unlike the universal performance gains from, say, a larger cache, applications must first be optimized to take advantage of SSE4 enhancements.
You can check out the Intel white paper if you're really interested to know the full details, but in short, these are the SSE4 features that can be found in Penryn:-
- Adding support for two different vectored 32-bit integer multiply operations.
- Introducing 8-bit unsigned min/max operations, plus 16-bit and 32-bit signed and unsigned versions.
- Introducing features to improve the compiler’s ability to vectorize integer and single-precision code more efficiently.
- Adding highly specialized operations that can provide significant application level gains in video encode acceleration functions, floating-point dot product operation, 3D content creation and streaming load instruction.
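The bullet points above describe the new operations only in general terms. As a rough illustration, here is a scalar Python model of what four SSE4.1 instructions compute on a 128-bit register treated as four 32-bit lanes: PMULLD, PMAXSD, PMINUD and DPPS. The mnemonics come from Intel's instruction set reference rather than from this article. The model covers results only, not speed; on Penryn each function corresponds to a single instruction operating on all four lanes at once.

```python
MASK32 = 0xFFFFFFFF

def as_signed32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def pmulld(a, b):
    """PMULLD: four 32x32-bit multiplies, keeping the low 32 bits of each product."""
    return [(x * y) & MASK32 for x, y in zip(a, b)]

def pmaxsd(a, b):
    """PMAXSD: lane-wise maximum, treating each 32-bit lane as signed."""
    return [max(as_signed32(x), as_signed32(y)) & MASK32 for x, y in zip(a, b)]

def pminud(a, b):
    """PMINUD: lane-wise minimum, treating each 32-bit lane as unsigned."""
    return [min(x & MASK32, y & MASK32) for x, y in zip(a, b)]

def dpps(a, b, imm8):
    """DPPS: dot product of four single-precision lanes, controlled by imm8.

    Bits 4-7 choose which lane products are summed; bits 0-3 choose which
    result lanes receive the sum (the others are zeroed). Python floats are
    doubles and hardware sums in its own fixed order, so rounding may differ
    from the real instruction for non-trivial inputs.
    """
    total = sum(x * y for i, (x, y) in enumerate(zip(a, b)) if imm8 & (0x10 << i))
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]

# One "register" is four 32-bit lanes.
print(pmulld([1, 2, 3, 0xFFFFFFFF], [5, 6, 7, 2]))
print(pmaxsd([0xFFFFFFFF], [1]))  # signed: max(-1, 1)
print(pminud([0xFFFFFFFF], [1]))  # unsigned: min(4294967295, 1)
print(dpps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 0xF1))
```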
Accompanying the new SSE4 instructions, the Penryn will also feature a Super Shuffle Engine. With the Core microarchitecture, Intel introduced full 128-bit wide SSE registers that enabled SSE instructions to be executed in a single cycle. This year, the Super Shuffle Engine enhances SSE algorithms further with a 128-bit shuffle unit that will now be able to execute full-width shuffles in one cycle. Not constrained only to SSE4, the Super Shuffle Engine will reduce latency and improve the speed of a wide range of SSE instructions with shuffle operations.
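The Super Shuffle Engine changes how fast a shuffle executes, not what it computes. To make "full-width shuffle" concrete, here is a Python model of PSHUFB, the SSSE3 byte shuffle introduced with the original Core 2, which rearranges all 16 bytes of a 128-bit register according to a control mask. This article does not list exactly which shuffle instructions became single-cycle on Penryn, so treat PSHUFB as an illustrative example rather than a claim about its latency.

```python
def pshufb(src: bytes, mask: bytes) -> bytes:
    """Model of the 128-bit PSHUFB byte shuffle (SSSE3).

    Each of the 16 result bytes is chosen independently: if bit 7 of the
    control byte is set, the result byte is zero; otherwise the low four
    bits index a byte of the source register.
    """
    if len(src) != 16 or len(mask) != 16:
        raise ValueError("PSHUFB operates on 16-byte (128-bit) registers")
    return bytes(0 if m & 0x80 else src[m & 0x0F] for m in mask)

register = bytes(range(16))
reverse = bytes(range(15, -1, -1))                   # reverse the byte order
broadcast_and_zero = bytes([3] * 8 + [0x80] * 8)     # copy byte 3 into the low half, zero the rest

print(pshufb(register, reverse).hex())
print(pshufb(register, broadcast_and_zero).hex())
```

Every output byte can come from any input byte, which is why doing the whole 128-bit permutation in one pass needs a full-width shuffle unit rather than two 64-bit halves.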
Wolfdale and Yorkfield
The Wolfdale core is the dual-core desktop version of the Penryn family, and all the Penryn features mentioned earlier apply to Wolfdale as well. Yorkfield, on the other hand, is a quad-core processor that is essentially two Wolfdale dies on the same package. As such, there are no architectural or process changes between the two processors. With Yorkfield, you just double everything: four cores, 12MB L2 cache, 214mm² total die size and 820 million transistors in all.
Predictably, Intel is launching the flagship Extreme edition Yorkfield first, making it the most powerful consumer desktop processor on the market today and maintaining Intel's leading position. The mainstream variants of both Wolfdale and Yorkfield will also begin shipping this quarter, but you'll only see availability in Q1 2008. Today, the only Penryn processor out of the gates of Chipzilla is the quad-core, 3.0GHz, 1333MHz FSB Core 2 Extreme QX9650, and this is the processor we'll be testing.
The Core 2 Extreme QX9650 up close and personal.
Bottom snapshot of the QX9650.
Test Setup
The Core 2 Extreme QX9650 is the successor to July's quad-core QX6850, which also happens to be the ideal (and only) processor that can be used for direct performance comparison at the moment. Both are quad-core processors running at 3.0GHz on a 1333MHz FSB (9x333MHz). AMD's next generation Phenom desktop CPUs aren't here yet and it wouldn't be fair to use processors two generations back. We already know where the Athlon stands with the previous generation Conroes, so there is no reason to go down that road again. When we do get our hands on the Phenom, you can be sure of a full scale report then. For now, it's Yorkfield vs Kentsfield XE. We'll let the numbers do the talking.
CPU-Z screenshot of the processor capabilities.
Cache information. L1 remains the same, but check out the new 12MB 24-way associative L2.
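The "1333MHz FSB (9x333MHz)" shorthand packs two separate relationships together. This small Python sketch assumes the usual Core 2 arrangement of a quad-pumped front-side bus with a 64-bit data path. It shows how the 333MHz bus clock yields both the 3.0GHz core clock (via the 9x multiplier) and the 1333MT/s bus rating, and what the 1600MHz FSB mentioned earlier would mean for peak bus bandwidth.

```python
BASE_CLOCK_MHZ = 1000 / 3   # the "333MHz" bus clock is really 333.33MHz
TRANSFERS_PER_CLOCK = 4     # quad-pumped front-side bus
BUS_WIDTH_BYTES = 8         # 64-bit data path

def core_clock_ghz(multiplier: float, base_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Core frequency = CPU multiplier x bus clock."""
    return multiplier * base_mhz / 1000

def fsb_transfer_rate_mts(base_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Effective FSB rating in megatransfers per second (the '1333MHz' figure)."""
    return TRANSFERS_PER_CLOCK * base_mhz

def fsb_peak_bandwidth_gbs(base_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Theoretical peak FSB bandwidth in GB/s (10^9 bytes per second)."""
    return fsb_transfer_rate_mts(base_mhz) * BUS_WIDTH_BYTES / 1000

print(f"Core clock at 9x: {core_clock_ghz(9):.2f} GHz")
print(f"FSB 1333: {fsb_transfer_rate_mts():.0f} MT/s, {fsb_peak_bandwidth_gbs():.2f} GB/s peak")
print(f"FSB 1600: {fsb_transfer_rate_mts(400):.0f} MT/s, {fsb_peak_bandwidth_gbs(400):.2f} GB/s peak")
```

Moving from a 333MHz to a 400MHz bus clock lifts theoretical peak bus bandwidth from about 10.7GB/s to 12.8GB/s. Real-world throughput will be lower than either peak figure.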
The test bed setup used to benchmark both processors is as follows:-
- Intel Core 2 Extreme QX9650 processor (3.00GHz, 1333MHz FSB, 12MB L2)
- Intel Core 2 Extreme QX6850 processor (3.00GHz, 1333MHz FSB, 8MB L2)
- Intel X38 reference motherboard
- 2 x 1GB Kingston HyperX DDR3-1333 @ 7-7-20 CAS 7.0
- Seagate Barracuda 7200.10 200GB SATA hard disk drive (one single NTFS partition)
- MSI GeForce 8600 GTS 256MB - with ForceWare 162.18 drivers
- Intel INF 8.3.1.1013 and AHCI 7.5.0.1017 driver set
- Microsoft Windows XP Professional with Service Pack 2 (and DirectX 9.0c)
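The memory timings in the list above are quoted in clock cycles. As a side calculation (not something this review measured), this short Python helper converts the CAS figure to nanoseconds, assuming the standard DDR relationship of two data transfers per I/O clock.

```python
def cas_latency_ns(cas_cycles: float, data_rate_mts: float) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR memory transfers data on both clock edges, so the I/O clock runs at
    half the data rate and one cycle lasts 2000 / data_rate_mts nanoseconds.
    """
    return cas_cycles * 2000 / data_rate_mts

# DDR3-1333 is nominally 1333.33 MT/s (a 666.67MHz I/O clock).
print(f"DDR3-1333 CL7: {cas_latency_ns(7, 4000 / 3):.2f} ns")
```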
Benchmarks
The following benchmarks were used to determine the performance of the Intel Core 2 Extreme QX9650:-
- BAPCo SYSmark 2007 Preview 1.02
- SPEC CPU2000 v1.3
- Lightwave 3D 7.5
- Futuremark PCMark 2005 Pro
- SPECviewperf 10.0
- Cinebench 10
- XMpeg 5.0.3 (DivX 6.7 encoding)
- Futuremark 3DMark06
- AquaMark3
- Quake 4 v1.20
- F.E.A.R.
Results - SPEC CPU2000 v1.3
Normally, we register nearly identical scores for the base results of SPEC CPU2000 with processors in the same class. This was even the case between the dual-core X6800 and the quad-core QX6850, because this particular test is single-threaded in nature. With the Yorkfield-based QX9650, however, even this single-threaded test showed some improvement over the QX6850: while the integer results were close, the QX9650 pumped out some admirable floating-point numbers.
Results from the multi-threaded rate tests were more or less what we expected after that. The new QX9650 maintained a pretty consistent advantage over the QX6850 with either a single user or multiple users, and the four-user results again showed a healthy lead in favor of the QX9650. Remember, both processors run at the same clock speed on the same FSB. The main differences here are the larger L2 cache and the cache algorithm improvements on the Yorkfield.
Results - BAPCo SYSmark 2007 Preview
With all the new architecture enhancements and thread-optimized features of new processors, we've decided to phase in some of the newest benchmark updates as well. Retiring SYSmark 2004, we benchmarked the QX9650 using the latest SYSmark 2007 Preview. The results here weren't as generous as those from SPEC CPU2000, although they are telling in their own way.
The overall results show that the QX9650 and QX6850 are both equally matched, which seems to indicate that the benchmark isn't optimized for SSE4. If you look at the breakdown scores, the Video Creation test shows no improvement over the QX6850. There was also no real gain (and even a minute loss) in the E-Learning workloads. The QX9650 did however, manage to gain a lead in both Productivity and 3D segments of the benchmark, once again thanks to its larger L2 cache and overall architecture improvements rather than specific technology support.
Results - Lightwave 3D 7.5
In Lightwave 3D, we again see a constant and consistent improvement in multi-threaded performance on the QX9650 over the QX6850. This was evident even in a less stressful rendering workload such as Sunset.
Results - Futuremark PCMark 2005 Pro
PCMark05 was another staple benchmarking platform that didn't register much of a performance difference between the QX9650 and QX6850. At such speeds, both CPUs breezed through the benchmark without breaking much of a sweat; even though PCMark05 is multi-threaded, its workloads did not seem to stress the processors enough. However, thanks to the QX9650's larger cache and more efficient cache and memory management, the memory subsystem workloads showed a 3% performance boost.
Results - SPECviewperf 10.0
SPECviewperf 10.0 is the latest version of this industry performance benchmark, and we decided to move on to it. One of its new features is multi-threaded benchmarking, although SPECviewperf does its threaded tests in a different way. Instead of multi-threading a single benchmark instance, SPECviewperf runs two or four separate instances of the benchmark to flood each processor core that you have. Because of this, performance actually drops in the quad runs in the scores below. That is because a single GPU has to service four copies of the professional OpenGL tests, so the drop is not an indication of processor performance.
Now, let's look at the scores here. In all scenarios, the QX9650 performed better than the QX6850 in the single instance runs and in most cases, the dual runs. Quad runs were a little erratic due to the stress on the GPU.
Results - Cinebench 10 and XMpeg 5.0.3
Media performance is an area in which the Core microarchitecture has shown great improvement in the past, and the new SSE4 is supposed to give the QX9650 a much bigger boost. Here we have two industry-standard tools for imaging and video compression: Cinebench 10, which is not SSE4-optimized, and DivX 6.7, which is.
Even without SSE4 optimizations, the QX9650 showed marked improvements in rendering speed in both single- and multi-threaded operation. Single-threaded speed improved by over 8%, and with quad-core rendering, the QX9650 completed the render nearly 10% faster than the QX6850.
Running our DVD compression test using the latest SSE4 enabled DivX CODECs, the QX9650 completed our test a whole 40 seconds faster than the QX6850, bringing conversion time down to a mere 4min and 44sec for a 1GB DVD sample file. Considering both processors are of the same class and speeds, this is an impressive achievement indeed.
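The DivX result is reported as an absolute time difference. Converting it into percentages makes it easier to compare with the Cinebench gains. Here is a quick Python calculation that uses only the two reported figures: 4min 44sec, and 40 seconds faster than the QX6850.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    return minutes * 60 + seconds

qx9650_s = to_seconds(4, 44)   # 284 s, as reported
qx6850_s = qx9650_s + 40       # "a whole 40 seconds faster"

time_saved = 1 - qx9650_s / qx6850_s   # share of the encode time eliminated
speedup = qx6850_s / qx9650_s          # relative throughput at the same clock speed

print(f"QX6850: {qx6850_s} s, QX9650: {qx9650_s} s")
print(f"Encode time cut by {time_saved:.1%}; throughput up {speedup - 1:.1%}")
```

That works out to roughly 12% less encoding time, or about 14% more throughput clock for clock.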
Results - Futuremark 3DMark06 and AquaMark3
The QX9650 managed to wring out a roughly 5% performance boost in 3DMark06's CPU rendering tests. However, when you look at the whole picture, it wasn't enough to make any visible difference in the overall performance numbers.
Results - Quake 4 and F.E.A.R.
Moving from synthetic gaming benchmarks to actual games, the QX9650 continues to prove that Penryn isn't just a die-shrunk Conroe. Like AquaMark3, Quake 4 showed visible and tangible performance gains in both single- and multi-threaded modes. In this particular game, we saw very impressive gains of more than 10fps, a gap previously seen between the mainstream Q6600 and the Extreme QX6850 (that's a 600MHz core frequency gap, if you're keeping score). Here, the QX9650 seemed to work wonders.
We don't usually use F.E.A.R. for CPU benchmarking anymore because the game is far more GPU-hungry than anything else, making the GPU the main bottleneck. However, Intel seems to like this title, and given the Quake 4 results, we decided to give it another spin with the QX9650. Sadly, there were no noticeable performance gains here. Even with reduced graphics options, both processors registered similar scores at all resolutions.
Power Consumption
Power efficiency has been one of the main drivers of new PC technology across the market, and Intel has even resorted to developing mobile technology and propagating it into the desktop and server spaces to make power efficiency one of the main features of its Core microarchitecture. However, when we're talking about Extreme edition CPUs, power efficiency is usually thrown to the wind. Gamers, overclockers and hardcore enthusiasts all have their souped-up rigs, ginormous coolers and four-digit-wattage PSUs that blow the whole power efficiency angle out of the water. But enough chit-chat.
The Penryn processors retain all of the Core microarchitecture's power saving features and benefit further from Intel's new 45nm process technology (if you still haven't read our earlier article on it, now is the time to do so), but just how much of an improvement can this bring? Looking at the charts below, we can safely say that the improvements are amazing.
Idling in Windows Desktop
With a power meter plugged into the mains socket, we measured the power draw of the whole system in the scenarios below. All processor power saving features were enabled through the motherboard BIOS for this test, and we reused the same system setup as our benchmarking test bed. Idling in Windows, the QX9650 system draws 106W, while the older QX6850 system pulls nearly 20W more.
3DMark06 CPU and GPU Tests
3DMark06 proved to be an easy and effective way to test both CPU-intensive loads and a more well-rounded gaming scenario, through its CPU Test and HDR/SM3.0 Deep Freeze Test respectively. In the CPU tests, the QX9650 drew a peak of 170W, while the QX6850 broke the 200W barrier with a maximum draw of 216W. The gaming test was no different: the QX9650 was running cool at only 164W, while the QX6850 was still heating up at 211W. The power draw difference reached more than 45W here!
SPECviewperf 10.0 Extreme Loading - Quad Run
While running the power load tests above, we noticed that 3DMark06 wasn't really taxing the system, or the processors, fully, even though it is supposed to be multi-threaded. To make sure that all four cores were constantly loaded near maximum (~90% load), we ran the quad-threaded 3dsmax scenario of SPECviewperf 10.0 and took our readings. Amazingly, the QX9650 still maintained a sub-200W power draw even in this case, while the QX6850 was pushing 240W. Although the QX9650 shares the same TDP rating as its predecessor, it is very clear that Intel's new process technology has incredible power saving potential.
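To put the load figures in relative terms, here is a short Python calculation over the two 3DMark06 readings that were reported as exact numbers. The idle and SPECviewperf results were given only approximately, so they are left out. These are whole-system readings at the wall, so the percentages describe the complete test bed rather than the processor on its own.

```python
# Whole-system wall-socket readings reported above, in watts: (QX9650, QX6850).
READINGS = {
    "3DMark06 CPU test (peak)": (170, 216),
    "3DMark06 Deep Freeze": (164, 211),
}

def savings(new_w: float, old_w: float) -> tuple[float, float]:
    """Return (watts saved, fraction saved relative to the older system)."""
    saved = old_w - new_w
    return saved, saved / old_w

for test, (qx9650_w, qx6850_w) in READINGS.items():
    watts, fraction = savings(qx9650_w, qx6850_w)
    print(f"{test}: {watts} W less ({fraction:.1%} lower)")
```

The two cases come out at about 21% and 22% lower system draw respectively.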
Intel for the Win...Again
At this point in time, it is a little hard to objectively compare Intel against anyone, because in a sense it doesn't really have a competitor. Ever since Intel unleashed the Core microarchitecture onto the world, we've been waiting for AMD's response, and even on the eve of AMD's supposed comeback with Phenom, Intel has thrown down another ace from its seemingly limitless bag of tricks, just for good measure. So there will be no Penryn vs [insert AMD CPU here] comments, as there is nothing to say that hasn't already been said and no angle left unexplored. You'll just have to wait for Phenom to arrive like everybody else. Instead, we'll sum up what we've already seen from the Intel Core 2 Extreme QX9650.
There are two facets of the new 45nm core that have to be taken into consideration. Firstly, the whole Penryn family of processors is and will remain a minor 'refresh' of the year-old Conroe core, moving it from 65nm to 45nm. The architecture enhancements are welcome, but in general there haven't been any major changes to the Core microarchitecture that would make the processor a significant upgrade if you already own a similar-class Core 2 processor today.
Take processor performance, for instance. From our benchmarks, the QX9650 is clearly partial to certain types of applications, workloads and scenarios. We saw impressive clock-for-clock gains approaching or exceeding 10% in some, like Quake 4, Cinebench and DivX video encoding, while others showed little to no improvement at all. We suspect that the new, larger 24-way associative L2 cache plays a pivotal role in the general performance gains, while the rest will require proper application optimization to realize the true potential of the CPU. The performance is there (SSE4 guarantees it), but whether your applications can take advantage of it is a different matter.
The second (and most important) facet is performance per watt. This has been the rallying cry of the industry since power and heat went through the roof, no thanks to the Pentium 4 of course, but that's ancient history. Although the QX9650's application performance gains are roughly what we expected, and in some cases better, the real star of the show is efficiency. If all the QX9650 could do was perform 10% faster than the QX6850, we'd call it an upgrade. But if the QX9650 can perform 10% faster than the QX6850 while using 20% less power, that's value right there with a capital V. When Intel launched the Core microarchitecture, the Conroe took a whole 40% chunk off the Presler XE's power consumption, and Penryn, from what we're seeing of the Yorkfield, seems set to cut it by another 20%. For what is supposed to be the 'hot' Extreme edition processor, we couldn't really ask for more.
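The value argument above rests on two pieces of arithmetic. Performance per watt is the performance change divided by the power change, and successive percentage cuts multiply rather than add. Here is a brief Python sketch using the paragraph's own illustrative figures.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt: (new/old performance) / (new/old power)."""
    return perf_ratio / power_ratio

def cumulative_reduction(*cuts: float) -> float:
    """Combine successive fractional reductions (0.40 = 40% less than the previous step)."""
    remaining = 1.0
    for cut in cuts:
        remaining *= 1.0 - cut
    return 1.0 - remaining

# The scenario from the text: 10% faster on 20% less power.
print(f"Performance per watt: {perf_per_watt_gain(1.10, 0.80):.3f}x")
# Presler XE -> Conroe (-40%), then Conroe -> Penryn (-20%).
print(f"Total reduction vs Presler XE: {cumulative_reduction(0.40, 0.20):.0%}")
```

Delivering 110% of the performance on 80% of the power gives 37.5% better performance per watt. A 40% cut followed by a 20% cut leaves 48% of the original draw, so the total reduction is 52% rather than 60%.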
As always with new processor launches, the Core 2 Extreme QX9650 will find only a niche market and serve as bragging rights for the most hardcore of users. Still, it is a sweet, sweet taste of what's to come when the mainstream Wolfdale and Yorkfield processors hit the streets in the first quarter of 2008. Hold off on those upgrades, because 45nm is here. Tick-tock, tick-tock.