NVIDIA May Announce its Next-Gen Ada Lovelace GPU Architecture at GTC 2022 (21st March)

NVIDIA is expected to launch its next-gen RTX 40 series graphics cards later this year (most likely September 2022). Based on the Ada Lovelace microarchitecture and TSMC’s N5 (5nm) process node, these GPUs are slated to offer more than twice the performance of the preceding RTX 30 “Ampere” family. All this will, of course, come at a cost, not only for consumers but for NVIDIA as well. According to the company’s Q4 earnings report, Team Green had spent up to $9 billion on inventory purchases and prepayments for future products by the end of the quarter.

Although the hard launch of the RTX 40 series parts won’t come any sooner than Q3 2022, the first GPU featuring the Ada Lovelace microarchitecture may be announced at GTC 2022. We’re talking about the successor to the A100. It’s worth noting that the A100 was fabbed on TSMC’s 7nm node, whereas the RTX 30 series leveraged Samsung’s 8nm process.

Data Center GPU	NVIDIA Tesla P100	NVIDIA Tesla V100	NVIDIA A100
GPU Codename	GP100	GV100	GA100
GPU Architecture	NVIDIA Pascal	NVIDIA Volta	NVIDIA Ampere
GPU Board Form Factor	SXM	SXM2	SXM4
SMs	56	80	108
TPCs	28	40	54
FP32 Cores / SM	64	64	64
FP32 Cores / GPU	3584	5120	6912
FP64 Cores / SM	32	32	32
FP64 Cores / GPU	1792	2560	3456
INT32 Cores / SM	NA	64	64
INT32 Cores / GPU	NA	5120	6912
Tensor Cores / SM	NA	8	4
Tensor Cores / GPU	NA	640	432
GPU Boost Clock	1480 MHz	1530 MHz	1410 MHz

Looking at the GV100 and the GA100, it’d be fair to assume that the AD100 will feature a different SM floorplan as well, and possibly increase the SM count to over 130. As before, the FP32:FP64 core ratio should be 2:1 (64 FP32 and 32 FP64 cores per SM in the table above), with Tensor cores and sparse matrix multiplication getting special attention. Furthermore, unlike the RTX 40 series, it’ll be paired with HBM2e memory (over 100GB of it).
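The per-GPU core counts in the table above are simply the SM count multiplied by the per-SM figure, so the rumored SM bump can be turned into rough core counts. The sketch below is only an illustration: the 130-SM value is the lower bound of the rumor, and keeping 64 FP32 and 32 FP64 cores per SM is an assumption carried over from the A100, not a confirmed AD100 spec.

```python
# SM counts and per-SM core counts copied from the data center table above.
DATA_CENTER_GPUS = {
    "GP100": {"sms": 56, "fp32_per_sm": 64, "fp64_per_sm": 32},
    "GV100": {"sms": 80, "fp32_per_sm": 64, "fp64_per_sm": 32},
    "GA100": {"sms": 108, "fp32_per_sm": 64, "fp64_per_sm": 32},
}


def per_gpu_cores(sms: int, cores_per_sm: int) -> int:
    """Total cores on the GPU = number of SMs x cores in each SM."""
    return sms * cores_per_sm


for name, spec in DATA_CENTER_GPUS.items():
    fp32 = per_gpu_cores(spec["sms"], spec["fp32_per_sm"])
    fp64 = per_gpu_cores(spec["sms"], spec["fp64_per_sm"])
    print(f"{name}: {fp32} FP32 cores, {fp64} FP64 cores")

# Hypothetical successor: 130 SMs (rumored lower bound), with the per-SM
# layout ASSUMED unchanged from the GA100.
RUMORED_SMS = 130
print("Rumored part:",
      per_gpu_cores(RUMORED_SMS, 64), "FP32 cores,",
      per_gpu_cores(RUMORED_SMS, 32), "FP64 cores")
```

Under those assumptions, a 130-SM part would land at 8,320 FP32 and 4,160 FP64 cores; a different SM floorplan would change these numbers.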

The RTX 4080, on the other hand, should feature 16GB of GDDR6X memory running at around 21Gbps, while the RTX 4090 should pack somewhere between 20GB and 30GB of GDDR6X memory. In terms of specifications, we’re looking at an FP32 core count of up to 18,432: the AD102 flagship is rumored to feature 144 SMs distributed across 12 GPCs. That would put raw compute at roughly 90 TFLOPs, about 2.4x the GA102’s 37.6 TFLOPs, provided the core runs at around 2.4GHz; at 2GHz, the same core count works out to roughly 74 TFLOPs.
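As a rough sanity check on the compute figure, peak FP32 throughput is usually estimated as cores x 2 x clock, counting a fused multiply-add as two floating-point operations per core per clock. The sketch below applies that convention to the rumored AD102 numbers; the function names are made up for illustration, and the clock speeds are hypothetical inputs, not confirmed specs.

```python
def fp32_tflops(cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Peak FP32 TFLOPs, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * flops_per_clock * clock_ghz / 1000.0


def clock_for_tflops(cores: int, tflops: float, flops_per_clock: int = 2) -> float:
    """Clock (GHz) needed to reach a given peak FP32 TFLOPs figure."""
    return tflops * 1000.0 / (cores * flops_per_clock)


AD102_CORES = 18_432   # rumored full-die FP32 core count
GA102_TFLOPS = 37.6    # GA102 figure quoted in the comparison table

print(f"AD102 at 2.0 GHz: {fp32_tflops(AD102_CORES, 2.0):.1f} TFLOPs")
print(f"Clock needed for 90 TFLOPs: {clock_for_tflops(AD102_CORES, 90):.2f} GHz")
print(f"90 TFLOPs vs GA102: {90 / GA102_TFLOPS:.2f}x")
```

With these inputs, the estimate comes to about 73.7 TFLOPs at 2GHz, and hitting 90 TFLOPs would take roughly 2.44GHz, which is about 2.4x the GA102 figure.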

GPU	TU102	GA102	AD102	GH202
Arch	Turing	Ampere	Ada Lovelace	Hopper
Process	TSMC 12nm	Samsung 8nm LPP	TSMC 5nm	3nm?
GPC	6	7	12	~20
TPC	36	42	72	~140
SMs	72	84	144	~300
Shaders	4,608	10,752	18,432	~36,000?
TFLOPs	16.1	37.6	90?	150+
Memory	11GB GDDR6	24GB GDDR6X	24GB GDDR6X	32GB GDDR7?
Bus Width	384-bit	384-bit	384-bit	512-bit
TGP	250W	350W	600W?	600W+
Launch	Sep 2018	Sep 2020	Aug-Sep 2022	2024