NVIDIA RTX 4090 Final Specifications Leak Out: 16,128 Cores, 24GB GDDR6X, 450W, 2x Faster than the RTX 3090 [Report]

The specifications of NVIDIA’s Lovelace flagship have supposedly been finalized. The GeForce RTX 4090, built on the AD102 die, will feature a total of 16,128 FP32 cores across 126 SMs, 63 TPCs, and 11 GPCs. This massive die will be paired with 24GB of 21Gbps GDDR6X memory across a 384-bit bus, the same configuration as the RTX 3090 Ti. Lovelace will likely borrow some of Hopper’s features, especially Thread Block Memory Sharing, which should drastically boost SM utilization, while the massive 96MB of L2 cache should do the same for effective bandwidth.
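The rumored core count follows directly from the SM count if each SM carries 128 FP32 cores and each TPC holds two SMs, as on Ampere. That layout is an assumption for Lovelace, although it matches the GA102 and AD102 figures in the table further down. A quick sanity check in Python (the function names are illustrative only):

```python
# Assumption: 128 FP32 cores per SM and 2 SMs per TPC, as on Ampere.
FP32_PER_SM = 128
SMS_PER_TPC = 2

def fp32_cores(sm_count: int) -> int:
    """Total FP32 cores for a given number of enabled SMs."""
    return sm_count * FP32_PER_SM

def tpc_count(sm_count: int) -> int:
    """Number of TPCs needed to hold the given number of SMs."""
    return sm_count // SMS_PER_TPC

rumored_sms = 126                # SM count from the leak
print(fp32_cores(rumored_sms))   # 16128
print(tpc_count(rumored_sms))    # 63
print(fp32_cores(144))           # 18432, the full AD102 die
```

With 144 SMs on the full die, the rumored 126-SM configuration would leave 18 SMs disabled.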

In case you missed the Hopper whitepaper, here’s a short primer on Thread Block Clusters and Distributed Shared Memory (DSM). To make scheduling on GPUs with over 100 SMs more efficient, Hopper and Lovelace will group every two thread blocks in a GPC into a cluster. The primary aim of Thread Block Clusters is to improve multithreading and SM utilization. These clusters run concurrently across SMs in a GPC.

Thanks to an SM-to-SM network between the two thread blocks in a cluster, data can be shared efficiently between them. This is going to be one of the key features promoting scalability on Hopper and Lovelace, which is a key requirement when you’re increasing the core/ALU count by over 50%.

Lastly, let’s not forget that the RTX 4090 won’t feature the full-fat AD102 die, yet it is expected to offer twice the performance of its predecessor. The TGP is ultimately going to be “just” 450W, a far cry from the previously rumored 600-900W abominations. The RTX 4090 Ti, which may launch later in the cycle with the fully enabled AD102 die, is more likely to come with a 600W TGP.
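To put the rumored power figure in perspective, here is a rough perf-per-watt estimate in Python. It assumes the "2x" claim is measured against the RTX 3090 at its 350W TGP and that performance scales as claimed; both are assumptions, and the helper name is made up for illustration.

```python
def perf_per_watt_gain(perf_ratio: float, new_watts: float, old_watts: float) -> float:
    """Relative efficiency gain: performance ratio divided by power ratio."""
    return perf_ratio / (new_watts / old_watts)

# Assumed inputs: ~2x performance (rumor), 450W vs. the RTX 3090's 350W.
gain_450 = perf_per_watt_gain(2.0, 450, 350)
# Same performance claim at the previously rumored 600W, for comparison.
gain_600 = perf_per_watt_gain(2.0, 600, 350)

print(f"{gain_450:.2f}x")  # 1.56x
print(f"{gain_600:.2f}x")  # 1.17x
```

Under these assumptions, the 450W figure would make the generational efficiency gain considerably larger than a 600W card delivering the same performance.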

GPU	GA102	AD102	RTX 4090	AD103	RTX 4080	RTX 4070 Ti (AD104)	RTX 4070
Architecture	Ampere	Ada Lovelace	Ada Lovelace	Ada Lovelace	Ada Lovelace	Ada Lovelace	Ada Lovelace
Process	Samsung 8nm LPP	TSMC 5nm	TSMC 5nm	TSMC 5nm	TSMC 5nm	TSMC 5nm	TSMC 5nm
GPCs	7	12	11	7	7	5	5
TPCs	42	72	64	42	40	30	30
SMs	84	144	128	84	80	60	60
Shaders	10,752	18,432	16,384	10,752	9,728	7,680	7,680
FP32 Throughput	37.6 TFLOPs	~100 TFLOPs?	83 TFLOPs	~50 TFLOPs	47 TFLOPs?	~35 TFLOPs	35 TFLOPs?
Memory	24GB GDDR6X	48GB GDDR6X	24GB GDDR6X		16GB GDDR6X		12GB GDDR6X
L2 Cache	6MB	96MB	72MB	64MB		48MB
Bus Width	384-bit	384-bit	384-bit	256-bit	256-bit	192-bit	192-bit
TGP	350W	600W	450W	450W	285-340W	300W	285W
Launch	Sep 2020		Sep 2022?		Sep 2022?		Q1 2023?
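The throughput figures in the table can be cross-checked against the shader counts. Peak FP32 throughput is commonly estimated as 2 FLOPs (one fused multiply-add) per shader per clock, so dividing the listed TFLOPs by twice the shader count gives the clock speed each figure implies. The sketch below is only a back-of-the-envelope check; the function name is illustrative and the resulting clocks are derived values, not leaked ones.

```python
def implied_clock_ghz(tflops: float, shaders: int) -> float:
    """Clock (GHz) implied by a peak FP32 figure, assuming 2 FLOPs per shader per clock."""
    return tflops * 1e12 / (2 * shaders) / 1e9

# (throughput in TFLOPs, shader count) taken from the table above
table = {
    "GA102": (37.6, 10_752),
    "RTX 4090": (83.0, 16_384),
}

for name, (tflops, shaders) in table.items():
    print(f"{name}: {implied_clock_ghz(tflops, shaders):.2f} GHz")
# GA102: 1.75 GHz
# RTX 4090: 2.53 GHz
```

If the table's numbers hold, a large share of the generational uplift would come from clock speed rather than from the wider die alone.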

Source:

OK, let's do a new summary.
RTX 4090, AD102-300, 16128FP32, 21Gbps 24G GDDR6X, 450W, ~2×3090.
I am disappointed with RDNA3.
That's all.
— kopite7kimi (@kopite7kimi) May 16, 2022