NVIDIA's pro-grade GPUs adopt Ampere architecture for massive performance leaps

Get ready for NVIDIA's upgraded professional GPU solutions to flood the desktop and laptop market, including options for remote visual processing needs.

Note: This article was first published on 13 Apr 2021.

Simply put, it's a great time to be a creator.

Today's announcement at NVIDIA's Digital GTC 2021 finally moves the Turing-based Quadro RTX solutions (for both desktop and mobile) to the newer Ampere architecture that gamers have been lapping up via the GeForce RTX 3000 series for a while now (or at least as much as retail stocks would allow).

Just like the 2x generational performance leap seen on the GeForce RTX 3000 series, the new NVIDIA RTX A5000 and RTX A4000 GPUs take over the Quadro RTX lineup with the following key features derived from the Ampere architecture:-

  • Second-gen RT cores deliver twice the throughput of the previous generation, with the ability to run ray tracing, shading and denoising tasks concurrently.

  • Third-gen Tensor cores also double the performance of AI inferencing and deep learning tasks thanks to the new TF32 and BFloat16 data formats, and these figures go up to 10x with structural sparsity to improve efficiency. Read more about these data formats and structural sparsity in our Ampere architecture overview.

  • It packs many more CUDA cores thanks to Ampere's newer 8nm lithography process by Samsung, as opposed to Turing's older 12nm FinFET process. This, in turn, allows it to process up to 2.5x the FP32 operations, which greatly increases its graphics and compute workload capacity.

The NVIDIA RTX A4000 can be had in a single-slot form factor. Considering that it's a close cousin to an NVIDIA RTX 3070, that's impressive.
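
As a rough sanity check on the "up to 2.5x" FP32 claim, here is a minimal Python sketch that estimates theoretical peak FP32 throughput from the CUDA core counts and boost clocks quoted in this article's spec table. It assumes each CUDA core completes one fused multiply-add (counted as two floating-point operations) per clock. That factor and the function name are assumptions of this sketch, not figures from NVIDIA's announcement.

```python
# Rough peak-FP32 estimate: CUDA cores x 2 FLOPs (one fused multiply-add) x boost clock.
# Core counts and boost clocks are the figures quoted in this article's spec table;
# the 2-FLOPs-per-core-per-clock factor is an assumption of this sketch.

def peak_fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Return the theoretical peak FP32 throughput in TFLOPS."""
    flops_per_core_per_clock = 2  # one FMA counted as two floating-point operations
    return cuda_cores * flops_per_core_per_clock * boost_mhz * 1e6 / 1e12

rtx_a6000 = peak_fp32_tflops(10752, 1860)       # Ampere flagship
quadro_rtx_6000 = peak_fp32_tflops(4608, 1770)  # Turing predecessor
ratio = rtx_a6000 / quadro_rtx_6000

print(f"RTX A6000:       {rtx_a6000:.1f} TFLOPS")
print(f"Quadro RTX 6000: {quadro_rtx_6000:.1f} TFLOPS")
print(f"Ratio:           {ratio:.2f}x")
```

Under these assumptions the ratio comes out just under 2.5x, in line with the claim. These are paper figures only, and real workloads rarely sustain the theoretical peak.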

The desktop-bound RTX A4000 and A5000 GPUs support PCIe 4.0 connectivity and double the graphics memory of their predecessors, with 16GB and 24GB of GDDR6 respectively, both with ECC support. Of the two, only the A5000 gets NVLink, which lets you link two cards for a total GPU memory of 48GB. The RTX A5000 is also powerful enough to support NVIDIA RTX vWS software for multiple virtual workstation instances, enabling remote users to share resources and drive high-end design and compute workloads.

Speed is everything when we need to evaluate new concepts for the most adventurous vehicles, and the NVIDIA RTX A5000 really delivers what we need. The basic viewport rendering is incredibly fast in Octane Render — 5x faster — and unlocks things we couldn’t have even tried before. -- Erick Green, 3D / CGI Lead, Polaris

The RTX A4000 and A5000 round out the pro-visualization lineup that began with the launch of the RTX A6000 in late 2020. Here's how they stack up against each other and against older GPUs:-

| Graphics Card | RTX A4000 | RTX A5000 | RTX A6000 | GeForce RTX 3080 | Quadro RTX 6000 | GeForce RTX 2080 Ti |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | Ampere (GA104) | Ampere (GA103) | Ampere (GA102) | Ampere (GA102) | Turing (TU102) | Turing (TU102) |
| Process | 8nm (Samsung) | 8nm (Samsung) | 8nm (Samsung) | 8nm (Samsung) | 12nm FinFET | 12nm FinFET |
| Die size (mm²) | 392 | 628 | 628 | 628 | 754 | 754 |
| Transistors | 17.4 billion | 28 billion | 28 billion | 28 billion | 18.6 billion | 18.6 billion |
| CUDA cores | 6144 | 8192 | 10752 | 8704 | 4608 | 4352 |
| Tensor Cores | 192 | 256 | 336 | 336 | 576 | 544 |
| Tensor performance | 153.4 TFLOPS | 222.2 TFLOPS | 238 TFLOPS | 238 TFLOPS | 130 TFLOPS | 89 TFLOPS |
| RT Cores | 48 | 64 | 84 | 84 | 72 | 68 |
| RT performance | 37.4 TFLOPS | 54.2 TFLOPS | 58 TFLOPS | 58 TFLOPS | ? | 34 RT TFLOPS |
| GPU base / boost clock speeds | - | - | 1455MHz / 1860MHz | 1440MHz / 1710MHz | 1400MHz / 1770MHz | 1350MHz / 1545MHz |
| Memory | 16GB GDDR6 with ECC | 24GB GDDR6 with ECC | 48GB GDDR6 with ECC | 10GB GDDR6X | 24GB GDDR6 | 11GB GDDR6 |
| Memory clock speed | 1.75Gbps | 2.0Gbps | 2.0Gbps | 2.375Gbps | 14,000MHz | 14,000MHz |
| Memory bus width | 256-bit | 384-bit | 384-bit | 320-bit | 384-bit | 352-bit |
| Memory bandwidth | 448GB/s | 768GB/s | 768GB/s | 760GB/s | 672GB/s | 616GB/s |
| TDP | 140W | 230W | 300W | 320W | 295W | 250W |
| Price | -- | -- | US$4,694 | US$699 | US$6,300 | US$999 |
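
The memory bandwidth figures in the table above follow from the bus width and the memory speed. The Python sketch below reproduces them for the Ampere cards. It assumes the table's "memory clock speed" values (such as 1.75) are memory clocks in GHz, and that the effective per-pin data rate is 8x that figure. The multiplier is an assumption chosen because it matches the listed bandwidths, not something stated in the announcement.

```python
# Memory bandwidth (GB/s) = bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte.
# Assumption of this sketch: the table's "memory clock speed" values are memory
# clocks in GHz, and the effective per-pin data rate is 8x that figure.

RATE_MULTIPLIER = 8  # assumed clock-to-data-rate factor; it matches the listed bandwidths

def memory_bandwidth_gbs(bus_width_bits: int, memory_clock_ghz: float) -> float:
    """Return the theoretical memory bandwidth in GB/s."""
    data_rate_gbps = memory_clock_ghz * RATE_MULTIPLIER
    return bus_width_bits * data_rate_gbps / 8

# (bus width in bits, listed memory clock, listed bandwidth in GB/s), all from the table
cards = {
    "RTX A4000": (256, 1.75, 448),
    "RTX A5000": (384, 2.0, 768),
    "RTX A6000": (384, 2.0, 768),
    "GeForce RTX 3080": (320, 2.375, 760),
}

for name, (bus_width, clock, listed) in cards.items():
    computed = memory_bandwidth_gbs(bus_width, clock)
    print(f"{name}: computed {computed:.0f} GB/s, listed {listed} GB/s")
```

The two Turing cards are listed at 14,000MHz, which appears to already be the effective rate: 384 bits x 14Gbps / 8 gives the 672GB/s quoted for the Quadro RTX 6000.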

 

For creators on the move

For professionals on the go needing thin and light form factors, the new NVIDIA RTX A2000, NVIDIA RTX A3000, RTX A4000 and RTX A5000 laptop GPUs deliver accelerated performance without compromising mobility. They include the latest generations of Max-Q 3.0 and RTX technologies and are backed by the NVIDIA Studio ecosystem, which includes exclusive driver technology that enhances creative apps for optimal levels of performance and reliability.

| Graphics Card | RTX A5000 Laptop | RTX A4000 Laptop | RTX A3000 Laptop | RTX A2000 Laptop | T1200 Laptop | T600 Laptop |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | Ampere (GA104) | Ampere (GA104) | Ampere (GA106) | Ampere (GA107) | Turing (TU117) | Turing (TU117) |
| Process | 8nm (Samsung) | 8nm (Samsung) | 8nm (Samsung) | 8nm (Samsung) | 12nm FinFET | 12nm FinFET |
| CUDA cores | 6144 | 5120 | 4096 | 2560 | 1024 | 896 |
| Tensor Cores | 192 | 160 | 128 | 80 | NIL | NIL |
| Tensor performance | 174 TFLOPS | 142.5 TFLOPS | 102.2 TFLOPS | 74.7 TFLOPS | NIL | NIL |
| RT Cores | 48 | 40 | 32 | 20 | NIL | NIL |
| RT performance | 75.6 TFLOPS | 34.8 TFLOPS | 25 TFLOPS | 18.2 TFLOPS | NIL | NIL |
| GPU base / boost clock speeds | - | - | - | - | - | - |
| Memory | 16GB GDDR6 | 8GB GDDR6 | 6GB GDDR6 | 4GB GDDR6 | 4GB GDDR6 | 4GB GDDR6 |
| Memory clock speed | 1.75Gbps | 1.5Gbps | 1.375Gbps | 1.5Gbps | 1.5Gbps | 1.25Gbps |
| Memory bus width | 256-bit | 256-bit | 192-bit | 128-bit | 128-bit | 128-bit |
| Memory bandwidth | 448GB/s | 384GB/s | 264GB/s | 192GB/s | 192GB/s | 160GB/s |
| TGP | 80 - 165W | 80 - 140W | 60 - 130W | 35 - 95W | 35 - 95W | 25W |

NVIDIA also introduced the NVIDIA T1200 and NVIDIA T600 laptop GPUs, based on its previous-generation Turing architecture. These are designed for multi-application professional workflows and are a significant upgrade in performance and capabilities from integrated graphics.

 

Delivering cutting-edge graphics, video and AI services through enterprise servers

The NVIDIA A10 GPU for enterprise server deployment.

What if you're in an organisation that relies on enterprise server solutions for shared CPU, GPU and AI performance? That's when you'll need a vendor that can deploy an accelerator to meet the cutting-edge needs of designers, engineers, artists, scientists and others who may not be equipped with desktops or laptops running the RTX GPU solutions covered above.

Enter the NVIDIA A10 Tensor Core GPU, which pairs with NVIDIA RTX Virtual Workstation (vWS) software to deliver modern compute capabilities to clients from within an enterprise server. Featuring the same Ampere architecture as some of the above solutions and 24GB of GDDR6 memory, the A10 is a single-slot, full-height, full-length card designed with a 150W TDP. With 72 RT Cores, the NVIDIA A10 seems to be using a variant of the GA102 GPU core found in the GeForce RTX 3080 and RTX A6000 GPUs.

NVIDIA says the A10 can deliver up to 2.5x faster virtual workstation performance and inference performance over the NVIDIA T4. The NVIDIA A10 is supported as part of NVIDIA-Certified Systems, in the on-prem data center, in the cloud, and at the edge. It builds on the rich ecosystem of AI frameworks from the NVIDIA NGC catalog, CUDA-X libraries, over 2.3 million developers, and over 1,800 GPU-optimized applications to help enterprises solve the most critical challenges in their business.

For enhanced virtual desktop infrastructure (VDI) deployments for remote workers, NVIDIA also has the A16 GPU solution. The A16 crams four Ampere GPUs onto one board, each with 16GB of graphics memory, for a total of 64GB onboard. With the right virtual PC configuration, the A16 can support up to 64 concurrent users per board. NVIDIA also says the A16 delivers an experience indistinguishable from a physical PC, which allows remote workers to transition seamlessly between working at the office and at home.
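
To put the A16's numbers in perspective, the short Python sketch below works out how much graphics memory each user would get if the board's memory were split evenly across concurrent users. The even split and the function name are assumptions of this sketch. In practice, the memory per virtual desktop depends on how the administrator configures it.

```python
# Graphics memory per user on one A16 board, assuming an even split across users.
# The even split is an assumption of this sketch, not NVIDIA's stated allocation scheme.

GPUS_PER_BOARD = 4        # from the article
MEMORY_PER_GPU_GB = 16    # from the article
MAX_USERS_PER_BOARD = 64  # from the article

def memory_per_user_gb(users: int) -> float:
    """Return the graphics memory (GB) each user gets with an even split."""
    if not 1 <= users <= MAX_USERS_PER_BOARD:
        raise ValueError(f"users must be between 1 and {MAX_USERS_PER_BOARD}")
    total_memory_gb = GPUS_PER_BOARD * MEMORY_PER_GPU_GB
    return total_memory_gb / users

for users in (16, 32, 64):
    print(f"{users} users: {memory_per_user_gb(users):.0f}GB each")
```

At the full 64 users, that works out to 1GB of graphics memory each, which suits everyday virtual desktops more than heavy design workloads.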

 

Market Availability

The new NVIDIA RTX desktop GPUs and NVIDIA data centre GPUs will be available from global distribution partners and OEMs starting later this month.

The new NVIDIA RTX laptop GPUs will be available in mobile workstations anticipated in Q2 this year from global OEMs.

Meanwhile, the NVIDIA A16 will be available later this year.

Here are more stories from NVIDIA's Digital GTC 2021 event:-

1) NVIDIA joins the ARMs race with their first data centre CPU called Grace

2) NVIDIA announces Drive Atlan, an SoC for cars that delivers over 1,000 TOPS
