Larrabee - Intel Gets Serious on 3D Graphics
A full 10 years after Intel's last stab at a discrete graphics solution, the company is back again, this time with the Larrabee project. Comprising an array of x86 processing cores with added vector handling functions and a modified instruction set to match, this isn't the usual approach. Find out what this all means for you.
By Vijay Anand
An Early Overview
No longer a rumor or a project for the distant future, Intel's once-questionable intent to enter the 3D graphics space hotly contested by AMD and NVIDIA is now an actual threat to the established visual computing leaders.
We've known and accepted for several years that Intel offers basic graphics capabilities for the general day-to-day productivity tasks most users require, whereas those who need more graphics prowess for professional work or just playing games head for one of the add-on options from NVIDIA and AMD. In today's context, however, where CPUs pack ever more computing cores and GPUs boast ever higher counts of shader processing units, a large amount of potential sits untapped on both types of processors: CPU cores often go underutilized, while GPUs aren't optimized for general tasks. The two are designed and built for differing needs and have also been using differing application programming interfaces (APIs). As a result, one usually can't harness the potential of both processors combined at any one time, and that's something Intel hopes to tackle with the Larrabee project.
Larrabee's significance: an evolution on both the CPU and GPU fronts that creates an eventual need for this architecture.
CPUs are very high precision processing cores with massive caches, tuned to handle all sorts of general-purpose computing tasks. GPUs, however, are designed for the very specialized purpose of crunching 3D graphics, and thanks to the nature of those tasks, the processors are heavily engineered for floating-point workloads. With their highly programmable nature ever since DirectX 9.0c, and especially since the DirectX 10 API, everyone from researchers to end-users has been busy trying to take advantage of GPUs to accelerate specialized tasks such as video processing, since these too depend heavily on floating-point performance. A dedicated, purpose-built processor is invariably much faster than a general-purpose CPU at its specialty. This is why we're currently seeing GP-GPU computing initiatives that offload video transcoding (and other such tasks) to the GPU, which can complete the job orders of magnitude faster than a general-purpose CPU can.
** Updated on 5th August 2008 **
Intel's Larrabee Detailed
If you've been keeping abreast of developments from Intel, you would already have a good idea of the Larrabee project. Unlike the established visual computing leaders AMD and NVIDIA, Intel will approach this space with an architecture it excels at, and what better than its x86 processing cores. Intel actually did some design experimentation to arrive at a theoretical 10-core throughput-optimized processor with the same area and power consumption as a dual-core CPU.
Features | Intel Conroe | Theoretical Larrabee
No. of CPU cores | 2 (out-of-order) | 10 (in-order)
Instructions per issue | 4 per clock | 2 per clock
Vector Processing Unit (VPU) lanes per core | 4-wide SSE | 16-wide
L2 cache size | 4MB | 4MB
Single-stream throughput | 4 per clock | 2 per clock
Total vector operations throughput | 8 per clock | 160 per clock
With a simpler x86 core design derived from the original Pentium's dual-issue, in-order architecture rather than the modern four-issue, out-of-order Core architecture, single-stream throughput takes a dive on the Larrabee x86 core. The net result, though, is that the simpler theoretical 10-core design can process 20 times the vector operations of a modern dual-core processor. This is the idea behind Larrabee, which will contain an undisclosed number of these Intel x86 cores, each with a vector processing unit, a much wider SIMD unit (16-wide), support for 64-bit extensions and sophisticated pre-fetching. These requirements will introduce a new vector handling instruction set as well.
A simplified diagrammatic representation of one of the Larrabee x86 cores.
The overall Larrabee block diagram looks like this. The L2 cache is partitioned among the x86 cores to provide high bandwidth and to promote data sharing and replication. All the processing cores communicate over a wide 1024-bit ring bus, which gives them fast access to the memory controllers, cache and other fixed-function blocks.
Take note that unlike traditional GPUs, there is no fixed-function rasterization logic between the vertex and pixel shaders, nor a frame buffer blend unit in the back end. These functions are all programmable and don't follow the fixed pipeline format. According to data gathered by Intel, there is no single magic workload for any game; it actually varies quite widely from title to title. As such, all the processing blocks on Larrabee are fully programmable to whatever extent the task at hand requires.
This brings us to the last order of business for Larrabee: the API used to interface with the hardware. As mentioned previously, Larrabee will be able to handle DirectX and OpenGL calls, but not at the run-time level within the hardware; rather, a software translator/renderer will sit between DirectX/OpenGL instructions and Larrabee's x86-based hardware. Part of the goal is to ensure this translation happens fast enough to be seamless at run-time speeds. This is where Intel has to get its software stack in the best possible shape, since game developers are already accustomed to DirectX/OpenGL programming. Additionally, Intel's shoddy track record with IGP drivers isn't exactly a beacon of light, so it's imperative that Intel gets its act together if Larrabee is to be successful.
However, thanks to its x86 processing cores, Larrabee supports C/C++ just like your desktop processor does. In that sense, it's actually much easier to program for the Larrabee hardware, since programming in these languages is nothing new. The only thing required is awareness of the newer core extensions and how to utilize them to get the best out of Larrabee. And while C/C++ is a common programming language elsewhere, targeting graphics hardware directly with it (rather than through DirectX/OpenGL) is not the norm in the game development world. If developers spend some time on this 'new' code path, they may be able to harness more out of the Larrabee architecture, be it in special implementations or much faster processing, since the code runs natively on the hardware.
There's certainly interesting potential in how Larrabee complements both traditional processing tasks and stream processing tasks with x86-based cores at its heart, but at this point in time, Larrabee has a long way to go. No performance results of any sort have been shared yet, and on the current timeline, Larrabee is expected to see the light of day during 2010. That sounds like a long time away, but time flies faster than you think. Later this month, Intel will host its IDF Fall conference, and we expect some nice updates on Larrabee among other developments, including Nehalem, its next-generation server/desktop architecture. For now, we'll leave you with this rough performance scaling chart Intel has shared on how it expects Larrabee to perform (while not very informative without any points of reference, it does give you an idea of how multiple Larrabee-based product SKUs will come about):-
Intel's representation of performance with regards to the number of processing cores in a Larrabee architecture.