The Many-Core Future of Computing
For the first time since its inception, SIGGRAPH (short for Special Interest Group on GRAPHics and Interactive Techniques), the prestigious annual conference for computer graphics, extended beyond American shores, with an Asian leg held in Singapore in December 2008. Known as SIGGRAPH Asia, the inaugural event was deemed a success, with up to 3,200 participants from 49 countries. While many exciting developments were announced at the event, perhaps the most significant was the ratification of OpenCL version 1.0, a royalty-free, open standard for heterogeneous parallel computing across CPUs, GPUs and any other kind of processor one can think of implementing it on (toasters included).
Apple submitted the initial draft proposal to the Khronos Group just six months ago (OpenCL is expected to be implemented in the next Mac OS X, Snow Leopard). It is amazing that the specification has been released in such record time, with a who's who of industry leaders, including AMD, NVIDIA, IBM and Intel, involved in its genesis. Yet it could not have come soon enough for the industry. Why?
Well, one could think of OpenCL as the OpenGL equivalent for the exciting and still nascent world of parallel computing. It stands for Open Computing Language, and not only is it open to everyone, it also unifies the disparate and incompatible initiatives that companies like AMD and NVIDIA have been working on for some time now. Cross-vendor software portability and support for a diverse range of applications are stated goals for OpenCL. To understand why OpenCL is so important for the industry, let's take a step back and look at what the fuss over parallel computing is about.
The Many-Core Future of Computing
As we noted in our look at the past decade in CPU developments, the clock race has hit a major barrier in the form of the laws of physics. Heat and power issues have made sustaining the clock race all but impossible, as the returns from each additional MHz fall short of the cost. Intel has long given up on its dreams of processors that scale up to 10GHz and beyond, shifting its focus to packing more processing cores onto a single die. With the return of Hyper-Threading in Intel's latest CPU architecture, Nehalem, one can expect more cores and more threads.
Meanwhile, the graphics industry has seen the number of processing units, or shaders, in the typical GPU increase many times over. Although it would not be accurate to compare a single processing 'core' on a GPU directly to the more complex one in a CPU, the GPU has evolved from an architecture dedicated solely to rendering graphics into a general purpose, programmable one, with APIs like DirectX 10 and OpenGL 2.0 paving the way for it to be used for other purposes, like parallel computing.
Thus far, developments in utilizing the power of the GPU have focused on stream processing, which takes advantage of the GPU's ability to work on problems that are high in data parallelism, meaning that there are multiple, independent pieces of data that can be worked on simultaneously. One can visualize it as a stream of data that's being processed continuously. It's no surprise that GPUs are good at this, because 3D rendering is heavily data parallel in nature. Massively parallel applications, like the distributed computing project Folding@home, benefit tremendously from the many shader cores present in current GPUs.
Hence, if a problem can be solved by breaking it into many independent parts and executing similar operations on each part separately (like SIMD, or Single Instruction, Multiple Data), chances are your GPU is a good candidate to do the calculations. Some areas for which GPUs are eminently suitable include video, digital signal processing, scientific computing applications and bioinformatics.
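To make the idea of data parallelism a little more concrete, here is a minimal sketch in Python. It is not OpenCL code, and the function names (scale_and_offset, run_serial, run_data_parallel) are made up purely for illustration. It simply shows the pattern described above: the same small operation is applied to every element, the data is split into chunks that do not depend on one another, and the chunks are handed to separate workers. Python threads are used here only to show the decomposition and should not be expected to run faster than the serial version.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_and_offset(x, gain=2.0, offset=1.0):
    """The 'kernel': one small operation applied to a single data element."""
    return gain * x + offset


def run_serial(data):
    """Process the elements one after another, as a single core would."""
    return [scale_and_offset(x) for x in data]


def run_data_parallel(data, workers=4):
    """Split the data into independent chunks and process them concurrently.

    No chunk depends on any other, so the chunks can run in any order.
    """
    # Ceiling division so that every element lands in exactly one chunk.
    chunk_size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so output order is preserved.
        processed = list(pool.map(run_serial, chunks))
    # Stitch the per-chunk results back into one flat list.
    return [y for chunk in processed for y in chunk]


if __name__ == "__main__":
    samples = [float(i) for i in range(8)]
    print(run_data_parallel(samples))
```

Because each element is computed independently, the parallel version returns exactly what the serial one does. On a GPU, the same decomposition would be spread across hundreds of shader cores rather than a handful of threads.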
As you can see, the CPU and GPU are converging on a similar, multi-core future despite starting from different points. At present, GPUs arguably have a head start over their CPU counterparts, at least in the data parallel applications that they have been adapted to solve. The hardware is therefore there, but to utilize its full potential, we'll need a rather big change in software development at the level of the fundamental programming model.