
NVIDIA RTX 4080/4090 Launch Approaches as TSMC/Partners Prep, Chiplet-Based Hopper to Pack 36,864 Cores

NVIDIA’s next-gen GeForce RTX 40 series graphics cards are slated to land sometime in the second half of 2022, possibly even at Computex in late May. On that front, the chipmaker’s Taiwanese backend partners, most notably TSMC, ASE Technology, and other IC providers/packagers, are gearing up for a smooth launch of the most powerful graphics architecture ever. The company plans to launch two different lineups, codenamed Hopper and Ada Lovelace. The former will leverage a chiplet-based architecture with two dies (based on TSMC’s CoWoS 2.5D packaging technology), while the latter will retain the monolithic design of its predecessors.

GPU          TU102         GA102              AD102
Arch         Turing        Ampere             Ada Lovelace
Process      TSMC 12nm     Samsung 8nm LPP    TSMC 5nm
GPCs         6             7                  12
TPCs         36            42                 72
SMs          72            84                 144
Shaders      4,608         10,752             18,432
TFLOPs       16.1          37.6               90?
Memory       11GB GDDR6    24GB GDDR6X        24GB GDDR6X
Bus Width    384-bit       384-bit            384-bit
TGP          250W          350W               600W?
Launch       Sep 2018      Sep 2020           H2 2022
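
The unit counts in the table above can be sanity-checked against each other. The sketch below simply divides adjacent rows; the AD102 figures are rumored, and the dictionary layout and function name are our own, purely for illustration.

```python
# Figures copied from the table above; the AD102 column is rumored, not confirmed.
TABLE = {
    "TU102": {"gpcs": 6, "tpcs": 36, "sms": 72, "shaders": 4608},
    "GA102": {"gpcs": 7, "tpcs": 42, "sms": 84, "shaders": 10752},
    "AD102": {"gpcs": 12, "tpcs": 72, "sms": 144, "shaders": 18432},
}


def ratios(gpu):
    """Return per-level ratios (hypothetical helper) for one GPU column."""
    row = TABLE[gpu]
    return {
        "tpcs_per_gpc": row["tpcs"] / row["gpcs"],
        "sms_per_tpc": row["sms"] / row["tpcs"],
        "fp32_per_sm": row["shaders"] / row["sms"],
    }


if __name__ == "__main__":
    for gpu in TABLE:
        print(gpu, ratios(gpu))
```

All three chips work out to 6 TPCs per GPC and 2 SMs per TPC. The shader row implies 64 FP32 lanes per SM on Turing and 128 on Ampere and (as rumored) Ada, so the AD102 column is at least internally consistent.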

As we’ve heard, Hopper will be aimed at data centers, HPC, and AI-centric workloads. Succeeding the A100, it’ll reportedly feature two AD102 (Ada) dies glued together using TSMC’s CoWoS packaging technology. This means that we’re looking at a massive 36,864 cores for the big fat Hopper “H100” GPU. For the memory, you can be sure that it’ll pair HBM2e or HBM3 with a 1,024-bit (or higher) bus, resulting in a bandwidth of 3,000-4,000 GB/s. This will, of course, come at a much higher power consumption of around 1000W (up from just 400W on the A100). That’s not exactly surprising considering the sheer increase in throughput. We’re talking about a jump from just 19.5 TFLOPs (FP64/FP32) to over 150 TFLOPs, with mixed precision compute modes offering even higher performance: a 7-8x increase in compute capabilities in a single generation!
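
The Hopper figures above follow from simple arithmetic. Here is a minimal sketch, assuming two fully enabled AD102 dies as rumored and the usual convention of 2 FP32 operations (one fused multiply-add) per core per clock; the function names are ours.

```python
AD102_CORES = 18432   # rumored FP32 cores per AD102 die
A100_TFLOPS = 19.5    # A100 figure quoted in the article


def hopper_cores(dies=2, cores_per_die=AD102_CORES):
    """Total cores, assuming every die is fully enabled (an assumption)."""
    return dies * cores_per_die


def generational_speedup(new_tflops, old_tflops=A100_TFLOPS):
    """Ratio of new to old peak throughput."""
    return new_tflops / old_tflops


def implied_clock_ghz(tflops, cores):
    """Clock needed to hit a TFLOPs target at 2 FP32 ops per core per clock."""
    return tflops * 1000 / (2 * cores)


if __name__ == "__main__":
    cores = hopper_cores()
    print(cores)
    print(round(generational_speedup(150), 2))
    print(round(implied_clock_ghz(150, cores), 2))
```

Under these assumptions, 150 TFLOPs on 36,864 cores implies a clock of just over 2GHz, and the jump from 19.5 TFLOPs lands inside the 7-8x range quoted above.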

Hopper (Via: @Harukaze)

For the Ada-based GeForce RTX 4080 and 4090, we’re looking at roughly twice the performance of the current Ampere parts, with an FP32 core count of up to 18,432. The AD102 flagship is rumored to feature 144 SMs distributed across 12 GPCs. That is 71% more SMs than the GA102, which works out to roughly 66 TFLOPs of raw compute at similar clocks. Add to that the fact that Team Green is leveraging TSMC’s advanced N5 process node for Lovelace, and the resulting frequency boost should net a ~2.2x gain over the RTX 3090.
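
To see how a 71% wider chip could reach a ~2.2x gain, we can split the speedup into a width part and a clock part. This is a rough sketch that assumes performance scales linearly with both SM count and clock speed, which real games rarely do; the function names are our own.

```python
GA102_SMS = 84
AD102_SMS = 144  # rumored


def sm_gain(new_sms=AD102_SMS, old_sms=GA102_SMS):
    """Fractional increase in SM count."""
    return new_sms / old_sms - 1


def clock_uplift_needed(target_speedup, new_sms=AD102_SMS, old_sms=GA102_SMS):
    """Clock ratio required for a target speedup under linear scaling (assumption)."""
    return target_speedup / (new_sms / old_sms)


if __name__ == "__main__":
    print(f"SM gain: {sm_gain():.0%}")
    print(f"Clock ratio for 2.2x: {clock_uplift_needed(2.2):.2f}")
```

With these inputs, the extra SMs cover a factor of about 1.71, so clocks would need to rise by roughly 28% to reach 2.2x.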

The bus width of the RTX 4080 and 4090 should be the same as their predecessors (320-bit and 384-bit, respectively), paired with faster GDDR6X chips, resulting in even higher memory bandwidth. The RTX 4090 should pack up to 24GB of GDDR6X memory, with clock speeds rivaling the Navi 31 parts (2.3-2.5GHz). As for the overall performance throughput, we’re looking at around 90 TFLOPs of FP32 performance, a big step up over the 3090’s 36 TFLOPs.
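
Both headline numbers here come from standard formulas: peak FP32 throughput is 2 operations per core per clock, and memory bandwidth is bus width in bytes times the per-pin data rate. A small sketch, with the caveat that the 21 Gbps data rate below is a hypothetical value picked for illustration, not a leaked spec:

```python
def fp32_tflops(cores, clock_ghz):
    """Peak FP32 TFLOPs at 2 ops (one FMA) per core per clock."""
    return 2 * cores * clock_ghz / 1000


def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


if __name__ == "__main__":
    # Rumored AD102 core count at the rumored clock range.
    for clock in (2.3, 2.5):
        print(clock, round(fp32_tflops(18432, clock), 1))
    # RTX 3090 reference point: 384-bit bus with 19.5 Gbps GDDR6X.
    print(bandwidth_gbs(384, 19.5))
    # Hypothetical faster GDDR6X on the same bus (21 Gbps is an assumption).
    print(bandwidth_gbs(384, 21))
```

At 18,432 cores, the 2.3-2.5GHz range gives roughly 85-92 TFLOPs, which is where the ~90 TFLOPs estimate comes from.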

If the AD102 includes a total of 18,432 cores, we can expect roughly 16,000 cores on the RTX 4080 and 18,000 on the RTX 4090. According to Greymon55 and kopite7kimi, the Lovelace-based RTX 4080/4090 will draw as much as 500W of power under load. This is despite the use of one of the most advanced and efficient process nodes on the planet. However, the numbers do add up.

The AD102 flagship is expected to feature 144 SMs across 12 GPCs, a gain of 71% in terms of logic compared to the GA102. Even if TSMC’s N5 node is 30% more power-efficient than Samsung’s 8nm LPP node, we’re looking at roughly 71% more hardware units, likely running at higher clocks. That should easily result in a power draw 30-50% higher than the top-end RTX 3080/3090 Ampere offerings.
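
The power argument can be put into a crude back-of-the-envelope model. The sketch below assumes power scales linearly with the number of hardware units, and reads "30% more power-efficient" as 0.7x the power per unit at equal clocks. Both are our simplifying assumptions, not measured behavior, and the function names are hypothetical.

```python
GA102_TGP_W = 350        # RTX 3090 TGP from the table
UNIT_RATIO = 144 / 84    # rumored AD102 SMs vs GA102 SMs
NODE_POWER_RATIO = 0.7   # assumed reading of "30% more power-efficient"


def scaled_power_w(base_w=GA102_TGP_W, unit_ratio=UNIT_RATIO,
                   node_ratio=NODE_POWER_RATIO, clock_power_ratio=1.0):
    """Estimated power under a linear scaling model (an assumption)."""
    return base_w * unit_ratio * node_ratio * clock_power_ratio


def clock_power_ratio_for(target_w):
    """Extra power factor (e.g. from higher clocks) needed to reach target_w."""
    return target_w / scaled_power_w()


if __name__ == "__main__":
    print(round(scaled_power_w()))
    print(round(clock_power_ratio_for(500), 2))
```

Under this model, the wider chip alone lands at about 420W, or 20% above the RTX 3090. Reaching the rumored 500W would take a further ~19% from higher clocks and voltages, which puts the total increase in the 30-50% range mentioned above.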

Source: DigiTimes

Areej Syed
