NVIDIA RTX 4090 Specs, Price, and Release Date: Next-Gen GeForce Flagship Speculation

NVIDIA plans to launch its next-gen GeForce RTX 40 series graphics cards later this year, starting with the RTX 4090 in September. Based on the AD102 die, this monster flagship will pack a whopping 16,128 FP32 cores across 126 SMs, 63 TPCs, and 11 GPCs. The remaining one and a half GPCs and the accompanying TPCs/SMs will be disabled to improve yields.

As for the SM (streaming multiprocessor, NVIDIA's equivalent of a compute unit), NVIDIA has a habit of rearranging it every generation, especially the core counts and the ratio of INT32 to FP32 cores per cluster. It’s unclear what changes will be made in this generation, but there are two possible scenarios:

The first possibility is the separation of the INT32 and FP32 datapaths or clusters. In this case, each SM would get 128 FP32 cores and 32 INT32 cores, divided into four separate clusters or sub-cores. Separating the integer and floating-point ALUs would increase the overall FP32 throughput, thereby improving gaming performance. The L1 data cache will be increased to 192KB, and the L2 will be pushed all the way to 96MB.

The second possible design essentially doubles everything in the SM by connecting two of them using the Asynchronous Media Accelerator: twice as many cores, twice as many dispatch units, twice as many cache chunks, and twice as many schedulers. At least on paper. In practice, this would allow the warps running on one SM to access the other’s resources, thereby improving utilization and scalability. This is similar to how AMD coupled two CUs into a WGP as the basic unit of scheduling on the GPU, using the wave32 format.
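The rumored configuration can be sanity-checked with a little arithmetic. The sketch below uses only the figures quoted in this article, plus one assumption that the article does not state: that the full AD102 die has 12 GPCs. Under that assumption, 10.5 GPCs stay enabled, which matches the quoted "11 GPCs" only if one of them is half enabled.

```python
# Figures quoted in the article for the rumored RTX 4090.
FP32_CORES = 16_128
SMS = 126
TPCS = 63
DISABLED_GPCS = 1.5

# ASSUMPTION (not stated in the article): the full AD102 die has 12 GPCs.
FULL_DIE_GPCS = 12

fp32_per_sm = FP32_CORES // SMS                    # 128 FP32 cores per SM
sms_per_tpc = SMS // TPCS                          # 2 SMs per TPC
enabled_gpcs = FULL_DIE_GPCS - DISABLED_GPCS       # 10.5 GPCs left enabled
sms_per_gpc = SMS / enabled_gpcs                   # 12.0 SMs per GPC
full_die_sms = int(sms_per_gpc * FULL_DIE_GPCS)    # 144 SMs on the full die
full_die_fp32 = full_die_sms * fp32_per_sm         # 18,432 FP32 cores

# First scenario from the article: 128 FP32 + 32 INT32 cores per SM,
# divided evenly into four sub-cores.
SUB_CORES = 4
fp32_per_sub_core = 128 // SUB_CORES               # 32
int32_per_sub_core = 32 // SUB_CORES               # 8

print(f"FP32 cores per SM:       {fp32_per_sm}")
print(f"SMs per TPC:             {sms_per_tpc}")
print(f"SMs per GPC:             {sms_per_gpc:g}")
print(f"Full-die SMs (assumed):  {full_die_sms}")
print(f"Full-die FP32 (assumed): {full_die_fp32}")
print(f"Per sub-core:            {fp32_per_sub_core} FP32 + {int32_per_sub_core} INT32")
```

The 128 FP32 cores per SM that fall out of the headline numbers line up with the first scenario above. The full-die totals are only as good as the 12-GPC assumption.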

**SM design of NVIDIA GPUs Over the Years**

The L2 is partitioned into two parts, each shared across four GPCs. The LLC has grown from just 6MB on Ampere to 96MB on Lovelace. The bus width should remain the same at 384-bit across 12 controllers. Lower-end dies such as the AD103 and AD104 may get narrower buses (256-bit) and fewer controllers.
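To put the memory-subsystem figures in perspective, here is a minimal sketch that derives a few ratios from the numbers above. The controller count for a 256-bit bus is an inference that assumes the lower-end dies use controllers of the same width as AD102. It is not a figure from any leak.

```python
# Figures quoted in the article.
BUS_WIDTH_BITS = 384
CONTROLLERS = 12
L2_AMPERE_MB = 6
L2_LOVELACE_MB = 96
L2_PARTITIONS = 2
LOWER_END_BUS_BITS = 256

bits_per_controller = BUS_WIDTH_BITS // CONTROLLERS        # 32-bit controllers
l2_growth = L2_LOVELACE_MB // L2_AMPERE_MB                 # 16x larger L2
l2_per_partition_mb = L2_LOVELACE_MB // L2_PARTITIONS      # 48 MB per partition
l2_per_controller_mb = L2_LOVELACE_MB // CONTROLLERS       # 8 MB per controller

# ASSUMPTION: lower-end dies keep the same 32-bit controller width.
lower_end_controllers = LOWER_END_BUS_BITS // bits_per_controller  # 8

print(f"Bits per memory controller: {bits_per_controller}")
print(f"L2 growth over Ampere:      {l2_growth}x")
print(f"L2 per partition:           {l2_per_partition_mb} MB")
print(f"L2 per memory controller:   {l2_per_controller_mb} MB")
print(f"Controllers on 256-bit bus: {lower_end_controllers} (assumed)")
```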

I mean RTX 4090>2x RTX 3090. 😅
— kopite7kimi (@kopite7kimi) June 6, 2022

As for performance, the GeForce RTX 4090 is speculated to be at least twice as fast as the RTX 3090, if not faster. It’ll draw somewhere between 450W and 600W to hit this performance target. Expect significant improvements to ray-tracing performance, both in terms of efficiency and RT Core counts. Finally, the price of the next-gen GeForce flagship is likely to remain in the $1,500-$2,000 range. The launch, as already stated, will take place in September during an event led by CEO Jensen Huang.
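For a rough sense of what the rumored performance and power figures would mean for efficiency, the sketch below estimates the performance-per-watt gain over the RTX 3090. It assumes exactly 2x performance (the rumor says "at least") and uses the RTX 3090's 350W board power as the baseline. That 350W figure comes from NVIDIA's published RTX 3090 spec, not from this article, and the helper name `perf_per_watt_gain` is purely illustrative.

```python
# Baseline NOT from the article: NVIDIA lists the RTX 3090 at 350 W board power.
RTX_3090_POWER_W = 350

# Rumored figures from the article.
SPEEDUP = 2.0                      # "at least twice as fast"; treated as exactly 2x here
POWER_RANGE_W = (450, 600)         # rumored power draw range


def perf_per_watt_gain(speedup: float, new_power_w: float,
                       old_power_w: float = RTX_3090_POWER_W) -> float:
    """Relative perf/W improvement: speedup divided by the power increase."""
    return speedup / (new_power_w / old_power_w)


for power_w in POWER_RANGE_W:
    gain = perf_per_watt_gain(SPEEDUP, power_w)
    print(f"{power_w} W -> {gain:.2f}x perf/W vs RTX 3090")
```

At the low end of the power range, the card would be roughly one and a half times as efficient as its predecessor. At 600W, most of the doubling would come from simply spending more power.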