
NVIDIA RTX 4080/4090 Launch Approaches as TSMC/Partners Prep, Chiplet Based Hopper to Pack 36,864 Cores

NVIDIA’s next-gen GeForce RTX 40 series graphics cards are slated to land sometime in 2022, most likely in the second half of the year, though a reveal as early as Computex in late May is possible. On that front, the chipmaker’s Taiwanese backend partners, most notably TSMC, ASE Technology, and other IC suppliers and packaging houses, are gearing up for a smooth launch of what is shaping up to be NVIDIA’s most powerful graphics architecture yet. The company plans to launch two different lineups, codenamed Hopper and Ada Lovelace. The former will leverage a chiplet-based architecture with two dies (based on TSMC’s CoWoS 2.5D packaging technology), while the latter will retain the monolithic design of its predecessors.

GPU       | TU102      | GA102           | AD102
Arch      | Turing     | Ampere          | Ada Lovelace
Process   | TSMC 12nm  | Samsung 8nm LPP | 5nm
GPC       | 6          | 7               | 12
TPC       | 36         | 42              | 72
SMs       | 72         | 84              | 144
Shaders   | 4,608      | 10,752          | 18,432
TFLOPs    | 16.1       | 37.6            | 90?
Memory    | 11GB GDDR6 | 24GB GDDR6X     | 24GB GDDR6X
Bus Width | 384-bit    | 384-bit         | 384-bit
TGP       | 250W       | 350W            | 600W?
Launch    | Sep 2018   | Sep 2020        | H2 2022

As we’ve heard, Hopper will be aimed at data centers, HPC, and AI-centric workloads. Succeeding the A100, it’ll reportedly feature two AD102 (Ada) dies glued together using TSMC’s CoWoS packaging technology. This means that we’re looking at a massive 36,864 cores for the big fat Hopper “H100” GPU. For the memory, expect it to pair HBM2e or HBM3 with a 1,024-bit (or wider) bus, resulting in a bandwidth of 3,000-4,000 GB/s. This will, of course, come at a much higher power consumption of around 1000W (up from just 400W on the A100). Not exactly surprising considering the sheer increase in throughput. We’re talking about a jump from just 19.5 TFLOPs (FP64/FP32) to over 150 TFLOPs, with mixed-precision compute modes offering even higher performance: a 7-8x increase in compute capabilities in a single generation!
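To sanity-check the Hopper figures above, here is a minimal Python sketch that simply replays the article’s arithmetic. Every input is a rumored number quoted in the text, not a confirmed spec, and the variable names are made up for illustration.

```python
# Rumored figures quoted in the article; none of these are confirmed specs.
AD102_CORES = 18_432          # FP32 cores per AD102 die
DIES_PER_H100 = 2             # two dies on one CoWoS package
A100_TFLOPS, H100_TFLOPS = 19.5, 150.0
A100_WATTS, H100_WATTS = 400, 1000

# Two dies double the core count.
h100_cores = AD102_CORES * DIES_PER_H100

# Generational ratios for compute, power, and compute per watt.
compute_ratio = H100_TFLOPS / A100_TFLOPS
power_ratio = H100_WATTS / A100_WATTS
perf_per_watt_ratio = compute_ratio / power_ratio

print(f"H100 cores:          {h100_cores:,}")
print(f"Compute increase:    {compute_ratio:.1f}x")
print(f"Power increase:      {power_ratio:.1f}x")
print(f"Perf/watt increase:  {perf_per_watt_ratio:.1f}x")
```

If the rumored numbers hold, the compute jump lands inside the quoted 7-8x range, while power only rises 2.5x, so throughput per watt would still improve by roughly 3x.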

Hopper (Via: @Harukaze)

For the Ada-based GeForce RTX 4080 and 4090, we’re looking at roughly twice the performance of the current Ampere parts, with an FP32 core count of up to 18,432. The AD102 flagship is rumored to feature 144 SMs distributed across 12 GPCs. At the same clocks, that works out to a 71% gain in raw compute performance (around 66 TFLOPs) over the GA102. Add to that the fact that Team Green is leveraging TSMC’s advanced N5 process node for Lovelace, and the resulting frequency boost should net a ~2.2x gain over the RTX 3090.

The bus widths of the RTX 4090 and 4080 should be the same as their predecessors (384-bit and 320-bit, respectively), paired with faster GDDR6X chips for even higher memory bandwidth. The RTX 4090 should pack up to 24GB of GDDR6X memory, with clock speeds rivaling the Navi 31 parts (2.3-2.5GHz). As for overall throughput, we’re looking at around 90 TFLOPs of FP32 performance, a big step up from the 3090’s 36 TFLOPs.
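The TFLOPs figures follow from the usual peak-throughput rule of thumb: each FP32 core can retire one fused multiply-add (two floating-point operations) per clock. The sketch below applies that rule to the rumored core counts. The 1.75GHz GA102 clock is an assumption, picked only because it reproduces the 37.6 TFLOPs listed in the table; the function name is likewise just illustrative.

```python
def fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Peak FP32 throughput, assuming one FMA (2 FLOPs) per core per clock."""
    return 2 * cores * clock_ghz / 1000.0

GA102_CORES = 10_752
AD102_CORES = 18_432                 # rumored
ASSUMED_GA102_CLOCK_GHZ = 1.75       # assumption, matches the table's 37.6 TFLOPs

# Core-count gain alone (same clocks on both chips).
core_gain = AD102_CORES / GA102_CORES - 1.0

ga102 = fp32_tflops(GA102_CORES, ASSUMED_GA102_CLOCK_GHZ)
ad102_same_clock = fp32_tflops(AD102_CORES, ASSUMED_GA102_CLOCK_GHZ)

print(f"Core-count gain:      {core_gain:.0%}")
print(f"GA102 @ 1.75GHz:      {ga102:.1f} TFLOPs")
print(f"AD102 @ 1.75GHz:      {ad102_same_clock:.1f} TFLOPs")

# Rumored AD102 clock range from the article.
for clock_ghz in (2.3, 2.5):
    tflops = fp32_tflops(AD102_CORES, clock_ghz)
    print(f"AD102 @ {clock_ghz}GHz:       {tflops:.1f} TFLOPs")
```

Under these assumptions, the extra cores alone account for the 71% gain, and the rumored 2.3-2.5GHz clocks put the full AD102 in the neighborhood of the 90 TFLOPs figure.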

If the AD102 includes a total of 18,432 cores, we can expect roughly 16,000 cores on the RTX 4080 and 18,000 on the RTX 4090. According to Greymon and Kopite7kimi, the Lovelace-based RTX 4080/4090 will draw as much as 500W of power under load. This is despite the use of one of the most advanced and efficient process nodes available. However, the numbers do add up once you run them.

The AD102 flagship is expected to feature 144 SMs across 12 GPCs, a gain of 71% in terms of logic compared to the GA102. Even if TSMC’s N5 node is 30% more power-efficient than Samsung’s 8nm LPP node, we’re still looking at roughly 70% more hardware units to feed. That should easily result in a power draw at least 30-50% higher than the top-end RTX 3080/3090 Ampere offerings.
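The power argument can be roughed out the same way. The sketch below is a back-of-the-envelope model, not a real power estimate: it assumes power scales linearly with shader count at fixed clocks, and it reads “30% more power-efficient” as 30% less power per unit of logic. Both are simplifying assumptions, and the function name is hypothetical.

```python
GA102_SHADERS = 10_752
AD102_SHADERS = 18_432      # rumored
GA102_TGP_WATTS = 350       # RTX 3090 board power from the table
NODE_POWER_SAVING = 0.30    # assumption: N5 needs 30% less power per unit

def iso_clock_power(base_watts, base_units, new_units, node_saving):
    """Scale power linearly with unit count, then apply the node saving."""
    return base_watts * (new_units / base_units) * (1.0 - node_saving)

estimate = iso_clock_power(
    GA102_TGP_WATTS, GA102_SHADERS, AD102_SHADERS, NODE_POWER_SAVING
)
increase = estimate / GA102_TGP_WATTS - 1.0

# The article's 30-50% range, expressed in watts over the 350W baseline.
rumored_low = GA102_TGP_WATTS * 1.3
rumored_high = GA102_TGP_WATTS * 1.5

print(f"Same-clock estimate: {estimate:.0f}W ({increase:.0%} over GA102)")
print(f"Rumored range:       {rumored_low:.0f}W to {rumored_high:.0f}W")
```

At unchanged clocks this simple model lands near 420W, about 20% above the 3090. The rumored 500W sits inside the 30-50% range, and the remaining gap would presumably come from the higher clock speeds, which this sketch does not model.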

Source: DigiTimes

Areej

Computer hardware enthusiast, PC gamer, and almost an engineer. Former co-founder of Techquila (2017-2019), a fairly successful tech outlet. Been working on Hardware Times since 2019, an outlet dedicated to computer hardware and its applications.