
Pascal to replace Maxwell GPU architecture in 2016 with 10x performance boost

By Vijay Anand - on 19 Mar 2015, 8:56am


With NVIDIA having established deep learning as its new focus and direction, a GPU Technology Conference wouldn't be complete without a discussion of future GPU architecture plans.

At last year's GTC, NVIDIA had already slipped in word that Pascal would arrive in 2016. It was also known that it would use 3D memory (also known as stacked memory) and NVLink. This year, NVIDIA added that it will be the first GPU to support mixed-precision computing, and that these three technologies combined will enable Pascal to be 10 times faster than Maxwell. NVIDIA would not comment on whether that comparison refers to first-generation or second-generation Maxwell, nor on other specifics such as the manufacturing process.

Mixed-precision compute capability will allow the GPU to compute FP16 operations at twice the rate of FP32 operations, while still maintaining compute accuracy for both FP16/FP32. NVIDIA pointed out that this will greatly help speed up two key functions of deep learning - object classification and convolution.
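To give a feel for why the accuracy caveat matters, here is a minimal sketch that simulates the two number formats in plain Python. It is not NVIDIA's implementation. The helper names, the input value of 0.1 and the count of 1,000 additions are our own illustrative assumptions. It compares a running total that is rounded to FP16 after every addition against one that is rounded to FP32, with FP16 inputs in both cases.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, rounder):
    """Sum values, rounding the running total with `rounder` after every add."""
    total = 0.0
    for v in values:
        total = rounder(total + v)
    return total


# Hypothetical workload: 1,000 copies of 0.1, stored as FP16 inputs.
fp16_input = round_to_fp16(0.1)
inputs = [fp16_input] * 1000
reference = fp16_input * len(inputs)  # computed in double precision

fp16_total = accumulate(inputs, round_to_fp16)  # FP16 inputs, FP16 running total
fp32_total = accumulate(inputs, round_to_fp32)  # FP16 inputs, FP32 running total

fp16_error = abs(fp16_total - reference)
fp32_error = abs(fp32_total - reference)

print(f"FP16 running total: {fp16_total} (error {fp16_error})")
print(f"FP32 running total: {fp32_total} (error {fp32_error})")
```

In this toy case the FP16 running total drifts away from the reference while the FP32 one does not, which shows why keeping some of the arithmetic in FP32 can preserve accuracy even when the inputs are FP16.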

NVLink is a new purpose-specific high-speed GPU interconnect meant to replace the aging all-purpose PCIe bus between CPU and GPU, as well as between multiple GPUs. According to NVIDIA, it has a theoretical maximum bandwidth of more than 80GB/s, versus the data transfer ceiling of PCIe Gen 3.0, which is capped at 16GB/s. This greatly benefits applications like deep learning that have high inter-GPU communication needs. The NVLink technology is jointly developed by NVIDIA and IBM.
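As a back-of-the-envelope illustration of the two quoted ceilings, the sketch below estimates the best-case time to move a payload over each link. The 8GB payload is a hypothetical figure of our own, the function name is ours, and real transfers would fall short of these theoretical peaks.

```python
# Theoretical peak figures quoted in the article, in GB/s.
PCIE_GEN3_GB_PER_S = 16.0
NVLINK_GB_PER_S = 80.0  # NVIDIA says "more than 80GB/s"; 80 is used as a floor


def transfer_seconds(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    """Best-case transfer time, ignoring latency and protocol overhead."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return payload_gb / bandwidth_gb_per_s


payload_gb = 8.0  # hypothetical payload size, chosen only for illustration

pcie_time = transfer_seconds(payload_gb, PCIE_GEN3_GB_PER_S)
nvlink_time = transfer_seconds(payload_gb, NVLINK_GB_PER_S)
bandwidth_ratio = NVLINK_GB_PER_S / PCIE_GEN3_GB_PER_S

print(f"PCIe Gen 3.0: {pcie_time:.3f} s")
print(f"NVLink:       {nvlink_time:.3f} s")
print(f"Ratio of peak bandwidths: {bandwidth_ratio:.1f}x")
```

On these peak numbers, NVLink offers at least five times the bandwidth of PCIe Gen 3.0, so the same payload would take roughly a fifth of the time in the best case.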

Lastly, with the ever-increasing complexity and size of the data that needs to be shuffled between the GPU and graphics memory, Pascal will take the leap to 3D memory, which NVIDIA says will triple the bandwidth and triple the frame buffer size. Instead of the usual placement of memory chips that we see on a standard graphics card, Pascal's 3D memory will see RAM chips stacked atop each other (greatly increasing frame buffer size, up to 32GB). The stacks will also be placed adjacent to the GPU, which cuts down on distance and latency, thus boosting throughput efficiency.

NVIDIA's desktop GPU roadmap. Note that the various GPU architectures are ranked according to performance per watt for single-precision matrix-matrix multiplication (SGEMM).

Interestingly, Volta - the next generation after Pascal - was supposed to have been the first to feature stacked memory technology, but it appears that this idea is being brought forward (or that Volta has been delayed). According to our previous report, Volta would actually have the stacked memory chips co-exist on the same silicon substrate as the Volta GPU to increase its memory bandwidth to 1TB/s. Referring to NVIDIA's representation of Pascal, that description fits it like a glove:

NVIDIA's representation of the upcoming Pascal GPU. Notice the 3D memory (stacked RAM) is actually on the same substrate as the GPU die? Expect a far greater memory throughput than ever before to enable accelerated deep learning and beyond.

Given that Volta's vaunted feature will be implemented on Pascal, it is now less clear what Volta will offer, or whether its implementation will differ. Unfortunately, NVIDIA did not entertain queries about how Pascal and Volta would differ this time round, and we'll have to wait till later in the year to hear more from them.

Source: NVIDIA