As 2021 ends, we’re approaching the end of the current GPU cycle. Both NVIDIA and AMD are expected to launch their next-gen gaming graphics cards in the second half of 2022. While AMD is going with a chiplet design for RDNA 3, NVIDIA is reportedly sticking to a monolithic approach. This means the Ada Lovelace GPUs will likely be more expensive to manufacture than the Radeon RX 7900 XT/7800 XT, which could be reflected in their pricing. There are plenty of rumors detailing the specifications and potential performance targets of these graphics cards, and that’s what we’ll explore in this post.
AMD RDNA 3: Navi 31, Navi 32 and Navi 33
AMD’s Radeon RX 7900 XT flagship will be based on an MCM (chiplet) design with a total of over 15 thousand cores (15,360 to be exact) across 60 WGPs. It should be roughly 2.2-2.5x faster than its predecessor, while the RX 7800 XT is expected to beat the RX 6800 XT by 30-40%. The Radeon RX 7900 XT will leverage Navi 31, which will likely combine two Graphics Compute Dies (GCDs). Navi 31 will pack up to 512 MB of L3 “Infinity Cache”, 3D-stacked either on top of the GCDs or on the interconnect bridge between them (much like Milan-X and Zen 3D).
As for the bus width, we’re looking at a 256-bit bus, paired with faster GDDR6 memory (likely Micron’s 1z) for improved external bandwidth and backed up by the Infinity Cache. Navi 31 will be built on TSMC’s 5nm and 6nm process nodes (GCD on N5, IOD on N6) and offer an overall throughput of up to 75 TFLOPs. GPU clock speeds should also be a bit higher than Navi 21’s, as the transition to N5 makes the entire device more efficient.
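The 75 TFLOPs figure can be sanity-checked against the 15,360-shader count. The sketch below uses the usual peak-FP32 convention (each shader retires one fused multiply-add, i.e. 2 FLOPs, per clock) and solves for the clock speed the rumor implies; the function names are illustrative, not from any AMD tool.

```python
def fp32_tflops(shaders: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPs, assuming one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_ghz / 1000


def implied_clock_ghz(shaders: int, tflops: float) -> float:
    """Invert fp32_tflops() to find the clock a quoted TFLOPs figure implies."""
    return tflops * 1000 / (shaders * 2)


NAVI31_SHADERS = 15_360  # rumored shader count from the text
NAVI31_TFLOPS = 75.0     # rumored peak throughput from the text

clock = implied_clock_ghz(NAVI31_SHADERS, NAVI31_TFLOPS)
print(f"Implied Navi 31 clock: {clock:.2f} GHz")
```

The result lands at about 2.44 GHz, which is consistent with the "a bit higher than Navi 21" clock expectation rather than requiring any exotic frequency.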
It’s unclear whether the Radeon RX 7800 XT will be based on a cut-down Navi 31 or the separate Navi 32 die, but either way it should be a major step up from the RX 6800 XT. Navi 32 is also going to be a chiplet design with two compute dies and one MCD. We’re looking at a core count of around 10,240 shaders (40 WGPs), and a 192-bit bus paired with 16GB of GDDR6 memory. The L3 “Infinity Cache” will most likely be under 400MB, with 384MB the most probable figure. As with Navi 31, it’ll be 3D-stacked atop the GCDs or the bridge interconnect.
The Navi 33 die will reportedly pack 4,096 shaders (stream processors). It’s expected to power the Radeon RX 7600 XT, making it a massive upgrade over the existing Radeon RX 6600 XT and its 2,048 shaders. That’s twice the shader count on its own, and an increase of more than 2x once IPC and frequency gains are included. The Radeon RX 7600 XT is expected to pack 128-256 MB of Infinity Cache. Rumors indicate a monolithic design, with four 32-bit memory controllers for an overall bus width of 128-bit.
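The bus widths rumored for the three Navi 3x dies translate into peak memory bandwidth via bus width times per-pin data rate, divided by 8 bits per byte. The rumors don’t specify a GDDR6 data rate, so the 16 Gbps used below is a hypothetical placeholder chosen only to make the comparison concrete; the helper names are illustrative as well.

```python
# Peak memory bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8.
CONTROLLER_WIDTH_BITS = 32  # one 32-bit memory controller, as in the Navi 33 rumor
ASSUMED_GDDR6_GBPS = 16.0   # hypothetical per-pin data rate, NOT a reported spec


def bus_width_bits(controllers: int, width: int = CONTROLLER_WIDTH_BITS) -> int:
    """Total bus width from a number of equal-width memory controllers."""
    return controllers * width


def bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s (8 bits per byte)."""
    return bus_bits * data_rate_gbps / 8


rumored_bus_bits = {
    "Navi 31": 256,
    "Navi 32": 192,
    "Navi 33": bus_width_bits(4),  # four 32-bit controllers -> 128-bit
}

for die, bits in rumored_bus_bits.items():
    gbs = bandwidth_gbs(bits, ASSUMED_GDDR6_GBPS)
    print(f"{die}: {bits}-bit bus -> {gbs:.0f} GB/s at {ASSUMED_GDDR6_GBPS:g} Gbps")
```

A different data rate scales every figure linearly. These relatively narrow buses are why the large Infinity Cache matters: it reduces how often the GPU has to go out to external memory at all.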
AMD’s RDNA 3 graphics architecture is expected to get a major overhaul, with redesigned Work Group Processors (WGPs) taking the place of the Dual Compute Units. With RDNA 1 and 2, the WGP (a pair of Compute Units, hence “Dual Compute Unit”) replaced GCN/Vega’s CU as the basic unit of workload scheduling, but it looks like that is going to change again with Navi 3x. The Dual Compute Unit is being discarded in favor of a wider Work Group Processor, packing as many as 256 stream processors across eight 32-wide SIMDs.
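The rumored WGP layout ties the shader counts above together: eight SIMD32 units give 256 stream processors per WGP. The short check below divides each rumored shader count by that figure. The Navi 31 and Navi 32 WGP counts match the text; the 16-WGP figure for Navi 33 is derived here rather than reported.

```python
# RDNA 3 WGP arithmetic, as rumored: eight 32-wide SIMDs per WGP.
SIMDS_PER_WGP = 8
LANES_PER_SIMD = 32
SHADERS_PER_WGP = SIMDS_PER_WGP * LANES_PER_SIMD  # 256 stream processors

rumored_shaders = {"Navi 31": 15_360, "Navi 32": 10_240, "Navi 33": 4_096}

wgp_counts = {}
for die, shaders in rumored_shaders.items():
    wgps, remainder = divmod(shaders, SHADERS_PER_WGP)
    # Every rumored count should split into whole WGPs.
    assert remainder == 0, f"{die} does not divide evenly into WGPs"
    wgp_counts[die] = wgps
    print(f"{die}: {shaders} shaders = {wgps} WGPs")
```

That all three counts divide evenly into 256-wide WGPs is a useful consistency check on the rumors themselves.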
In addition to the redesigned WGP, RDNA 3 should also beef up ray-tracing capabilities. We should get more advanced RT units, capable of BVH traversal in hardware and more. Furthermore, the sheer number of RT cores will likely increase along with the overall SIMD32 count. The texture mapping units, render backends, and front-end will also see changes accordingly. The power draw of Navi 31 and 32 should stay between 350W and 550W.
NVIDIA RTX 4080, RTX 4090: Ada Lovelace “AD102”
For NVIDIA’s GeForce RTX 4080 and 4090, we’re once again looking at twice the performance of the current Ampere parts, with an FP32 core count of up to 18,432. The AD102 flagship is rumored to feature 144 SMs distributed across 12 GPCs. That works out to a 71% gain in raw compute over the GA102 at similar clocks (the quoted 66 TFLOPs corresponds to roughly 1.8 GHz). Add to that TSMC’s advanced N5 process node, which Team Green is leveraging for Lovelace, and the resulting frequency boost should net a ~2.2x gain over the RTX 3090.
The bus widths of the RTX 4080 and 4090 should be the same as their predecessors’ (320-bit and 384-bit, respectively), paired with faster GDDR6X chips for even higher memory bandwidth. The RTX 4090 should pack up to 24GB of GDDR6X memory and clock speeds rivaling the Navi 31 parts (2.3-2.5GHz). As for overall throughput, we’re looking at around 90 TFLOPs of FP32 performance, a big step up over the 3090’s 36 TFLOPs.
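The AD102 numbers can be reproduced with simple arithmetic. The sketch assumes Ada keeps Ampere’s 128 FP32 lanes per SM (an assumption, though it is what makes 144 SMs equal 18,432 cores) and compares against the full GA102, which has 84 SMs. Throughput again uses the 2-FLOPs-per-clock FMA convention, evaluated at both ends of the rumored 2.3-2.5 GHz range.

```python
# AD102 rumor arithmetic. SM count, clocks and the RTX 3090's 36 TFLOPs come
# from the text; 128 FP32 lanes per SM carried over from Ampere is an assumption.
FP32_PER_SM = 128
RTX_3090_TFLOPS = 36.0


def fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPs, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * clock_ghz / 1000


ad102_cores = 144 * FP32_PER_SM  # 18,432
ga102_cores = 84 * FP32_PER_SM   # 10,752 (full GA102)
core_gain = ad102_cores / ga102_cores - 1

print(f"AD102: {ad102_cores} cores, +{core_gain:.0%} over GA102")
for clock in (2.3, 2.5):
    tflops = fp32_tflops(ad102_cores, clock)
    print(f"{clock} GHz -> {tflops:.1f} TFLOPs ({tflops / RTX_3090_TFLOPS:.1f}x RTX 3090)")
```

At 2.3 and 2.5 GHz this gives roughly 85 and 92 TFLOPs, bracketing the ~90 TFLOPs rumor, while the 71% core increase matches the SM-count gain.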
If the AD102 includes a total of 18,432 cores, we can expect roughly 16,000 cores on the RTX 4080 and 18,000 on the RTX 4090. According to leakers Greymon55 and kopite7kimi, the Lovelace-based RTX 4080/4090 will draw as much as 500W of power under load, despite using one of the most advanced and efficient process nodes on the planet. However, when you run the numbers, it adds up.
The AD102 flagship is expected to feature 144 SMs across 12 GPCs, 71% more logic than the GA102. Even if TSMC’s N5 node is 30% more power-efficient than Samsung’s 8nm LPP node, that can’t offset a 70%+ increase in hardware units. That should easily result in a power draw 30-50% higher than the top-end RTX 3080/3090 Ampere offerings.
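The power argument can be sketched as a back-of-the-envelope model: scale the hardware by the 71% unit increase, then let each unit draw 30% less power thanks to N5. Both the purely multiplicative model and the 350 W RTX 3090 board-power baseline are simplifying assumptions for illustration, and the model holds clocks constant.

```python
# Iso-clock power scaling: more units, each drawing less power.
# The multiplicative model and the 350 W baseline are illustrative assumptions.
def scaled_power(base_watts: float, unit_ratio: float, efficiency_gain: float) -> float:
    """Base power x growth in hardware units x per-unit power after the node shrink."""
    return base_watts * unit_ratio * (1 - efficiency_gain)


RTX_3090_BOARD_POWER_W = 350.0
estimate = scaled_power(RTX_3090_BOARD_POWER_W, unit_ratio=1.71, efficiency_gain=0.30)
print(f"Iso-clock estimate: {estimate:.0f} W ({estimate / RTX_3090_BOARD_POWER_W - 1:+.0%})")
```

Even at identical clocks this lands around 419 W, about 20% above the RTX 3090. Because dynamic power rises with frequency (and the voltage needed to reach it), the higher clocks expected on N5 are what push the estimate toward the 30-50% range and the rumored 500W.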
The 12VHPWR power connector can supply up to 55A of continuous current to the graphics card via its 12V rail, for a maximum power of 600W. The PCI-SIG specifies a pin current capability (excluding sideband contacts) of 9.2 A per pin/position, with a limit of 30 °C T-rise above ambient temperature at +12 VDC with all twelve contacts energized. Six of those twelve contacts carry +12 V (the other six are ground returns), which results in 55.2 amps in one direction for the 12-volt rail, or 662.4 watts.
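The 12VHPWR figures follow directly from the per-pin rating: six +12 V supply pins at 9.2 A each, multiplied by the rail voltage, with the 600 W rating leaving some headroom below that theoretical ceiling. The constant names below are illustrative.

```python
# 12VHPWR power budget from the PCI-SIG per-pin rating quoted in the text.
PINS_12V = 6        # six of the twelve main contacts carry +12 V; six are ground
AMPS_PER_PIN = 9.2  # per-pin rating at a 30 °C temperature rise
RAIL_VOLTS = 12.0
RATED_WATTS = 600.0

max_current = PINS_12V * AMPS_PER_PIN   # 55.2 A
max_power = max_current * RAIL_VOLTS    # 662.4 W
headroom = 1 - RATED_WATTS / max_power  # fraction held back by the 600 W rating

print(f"{max_current:.1f} A x {RAIL_VOLTS:.0f} V = {max_power:.1f} W")
print(f"{RATED_WATTS:.0f} W rating leaves {headroom:.1%} headroom")
```

The 600 W rating works out to about 9.4% below the 662.4 W the pins could theoretically carry.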
Taking tolerances and safety margins into account, up to 600W of power can be safely supplied to the add-in board (roughly 9% below the theoretical maximum). The added power consumption may seem like a step back for GPUs, but the new connector will simplify cabling and PCB design quite significantly. NVIDIA’s RTX 40 series graphics cards, most notably the RTX 4080 and 4090, should leverage this new power connector along with support for PCIe Gen 5.
Prices are also set to rise with the next generation of graphics cards, and the reasons are twofold. For starters, TSMC’s N5 node is quite a bit pricier than Samsung’s 8LPP node. Secondly, die sizes will also grow despite the node shrink, meaning worse yields and, therefore, higher production costs. Based on my personal estimates, the RTX 4080, 4090 and the remaining Ada parts will be priced as follows:
- RTX 4090 – $2,000
- RTX 4080 Ti – $1,500
- RTX 4080 – $999
- RTX 4070 – $549
- RTX 4060 – $399
- RTX 4050 Ti – $299
- RTX 4050 – $250
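Why larger dies mean worse yields can be illustrated with the simple Poisson yield model often used for rough estimates, where the fraction of defect-free dies is exp(-area x defect density). This is a textbook approximation swapped in for illustration, not NVIDIA’s or TSMC’s costing method, and the defect density and die areas below are hypothetical round numbers rather than reported figures for any node.

```python
import math

# Poisson yield model: fraction of defect-free dies = exp(-area * defect_density).
# D0 and the die areas are hypothetical values chosen only to show the trend.
D0_DEFECTS_PER_CM2 = 0.1


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of dies with zero defects."""
    area_cm2 = die_area_mm2 / 100  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


for area in (300, 450, 600):
    print(f"{area} mm^2 die -> {poisson_yield(area, D0_DEFECTS_PER_CM2):.0%} defect-free")
```

With these placeholder inputs, doubling the die from 300 to 600 mm² drops the defect-free fraction from about 74% to about 55%. A larger die also means fewer candidates per wafer, so cost per good die rises faster than area alone.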