NVIDIA plans to launch its next-gen GeForce RTX 40 series graphics cards later this year, starting with the RTX 4090 in September. Based on the AD102 die, this monster flagship is expected to pack a whopping 16,128 FP32 cores across 126 SMs, 63 TPCs, and 11 GPCs. The remaining one and a half GPCs' worth of TPCs and SMs on the die will be disabled to improve yields. As for the SM (streaming multiprocessor, roughly NVIDIA's counterpart to AMD's Compute Unit), NVIDIA has a habit of rearranging it every generation, especially the core counts and the ratio of INT32 to FP32 cores per cluster. It’s unclear what changes will be made in this generation, but there are two possible scenarios:
The first possibility is the separation of the INT32 and FP32 datapaths or clusters. In this case, each SM would get 128 FP32 cores and 32 INT32 cores, divided into four separate clusters or sub-cores. Separating the integer and floating-point ALUs would increase the overall FP32 throughput, thereby improving gaming performance. The L1 data cache would be increased to 192KB, while the L2 would be pushed all the way to 96MB.
The second possible design essentially doubles everything in the SM by connecting two of them using the Asynchronous Media Accelerator: twice as many cores, twice as many dispatch units, twice as many cache chunks, and twice as many schedulers, at least on paper. In practice, this would allow the warps running on one SM to access the other’s resources, thereby improving utilization and scalability. This is similar to how AMD coupled two CUs in a WGP as the basic unit of scheduling on the GPU using the wave32 format.
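The two scenarios can be compared with a little arithmetic. The sketch below is illustrative only: the `SMLayout` class and its method names are made up for this example, and applying the "doubling" of scenario two to the per-SM figures from scenario one is an assumption, since no base numbers are given for the second design.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SMLayout:
    """Hypothetical description of one SM's ALU resources."""
    fp32: int
    int32: int
    sub_cores: int

    def per_sub_core(self) -> Tuple[int, int]:
        """(FP32, INT32) cores in each sub-core, assuming an even split."""
        return self.fp32 // self.sub_cores, self.int32 // self.sub_cores

    def coupled(self) -> "SMLayout":
        """Two SMs linked together, doubling every resource (scenario two)."""
        return SMLayout(self.fp32 * 2, self.int32 * 2, self.sub_cores * 2)


# Scenario one: rumored per-SM figures, split across four sub-cores.
split_sm = SMLayout(fp32=128, int32=32, sub_cores=4)
print(split_sm.per_sub_core())        # (32, 8)

# Scenario two (assumption: same per-SM figures, two SMs coupled).
pair = split_sm.coupled()
print(pair.fp32, pair.int32, pair.sub_cores)  # 256 64 8
```

Either way the per-GPU totals stay the same; what changes is how many cores a single scheduling unit can reach.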
The L2 is partitioned into two parts shared across four GPCs each. This last-level cache (LLC) has grown from just 6MB on Ampere to 96MB on Lovelace. The bus width should remain the same at 384-bit across 12 controllers. Lower-end dies such as the AD103 and AD104 may get narrower buses and fewer controllers (256-bit).
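The bus and cache figures are easy to sanity-check. Here is a minimal sketch, assuming each memory controller is 32 bits wide (as the spec summary states); the function names are illustrative and not part of any NVIDIA tool.

```python
CONTROLLER_WIDTH_BITS = 32  # per the rumored spec: 32-bit memory controllers


def bus_width_bits(controllers: int) -> int:
    """Total memory bus width for a given number of controllers."""
    return controllers * CONTROLLER_WIDTH_BITS


def controllers_for(bus_bits: int) -> int:
    """Number of 32-bit controllers needed for a given bus width."""
    if bus_bits % CONTROLLER_WIDTH_BITS != 0:
        raise ValueError("bus width must be a multiple of the controller width")
    return bus_bits // CONTROLLER_WIDTH_BITS


print(bus_width_bits(12))    # 384 -> the rumored AD102 bus
print(controllers_for(256))  # 8   -> what a 256-bit die would need

# L2 growth from Ampere (6MB) to the rumored Lovelace figure (96MB).
print(96 / 6)                # 16.0
```

So a 256-bit part would drop to eight controllers, and the rumored L2 is a 16x jump over Ampere's.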
As for performance, the GeForce RTX 4090 is speculated to be at least twice as fast as the RTX 3090, if not faster. It’ll draw somewhere between 450W and 600W to hit this performance target. Expect significant improvements to ray-tracing performance, both in terms of efficiency and RT Core counts. Finally, the price of the next-gen GeForce flagship is likely to remain in the $1,500-$2,000 range. The launch, as already stated, will take place in September during an event led by CEO Jensen Huang.
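To put the power figures in context, here is a rough perf-per-watt sketch. It assumes the RTX 3090's 350W board power as the baseline (a figure not stated in this article) and takes the speculated 2x speedup at face value, so treat the output as back-of-the-envelope only. The function name is made up for this example.

```python
def perf_per_watt_gain(speedup: float, new_watts: float, old_watts: float) -> float:
    """Relative perf-per-watt of the new card versus the old one."""
    if new_watts <= 0 or old_watts <= 0:
        raise ValueError("power draw must be positive")
    return speedup * old_watts / new_watts


RTX_3090_WATTS = 350      # assumption: baseline board power, not from the article
SPECULATED_SPEEDUP = 2.0  # "at least twice as fast", taken at face value

for watts in (450, 600):
    gain = perf_per_watt_gain(SPECULATED_SPEEDUP, watts, RTX_3090_WATTS)
    print(f"{watts}W: {gain:.2f}x perf per watt")
```

Under these assumptions the efficiency gain is about 1.56x at 450W but only about 1.17x at 600W, which is why the final power figure matters as much as the raw speedup.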
NVIDIA RTX 4090 Specs Summary:
- x11 GPCs, x63 TPCs (up to 6 TPCs/GPC), x2 SMs/TPC, x126 SMs per GPU.
- x128 FP32 CUDA Cores per SM, x16128 FP32 CUDA Cores per GPU.
- x4 4th Gen Tensor Cores per SM, x504 per GPU.
- x1 3rd Gen RT Core per SM, x126 per GPU.
- 96MB L2 cache.
- x12 32-bit Memory Controllers.
- 24GB GDDR6X 21Gbps Memory.
- TDP: 450-600W.
- Price: $1,500-2,000.
- Speculated launch: September 2022.
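
The per-GPU totals in the summary list above follow directly from the per-SM figures, and the memory lines imply a peak bandwidth that the list doesn't spell out. A minimal sketch of that arithmetic is below. The full-die size (12 GPCs of 6 TPCs) is an assumption inferred from the "one and a half GPCs disabled" claim, and all variable names are illustrative.

```python
# Rumored RTX 4090 figures taken from the summary list.
TPCS = 63
SMS_PER_TPC = 2
FP32_PER_SM = 128
TENSOR_PER_SM = 4
RT_PER_SM = 1
MEM_CONTROLLERS = 12
CONTROLLER_WIDTH_BITS = 32
MEM_SPEED_GBPS = 21  # per-pin data rate of the GDDR6X memory

# Assumption (not in the summary): a full AD102 die has 12 GPCs of 6 TPCs.
TPCS_PER_GPC = 6
FULL_DIE_TPCS = 12 * TPCS_PER_GPC

sms = TPCS * SMS_PER_TPC
totals = {
    "sms": sms,
    "fp32_cores": sms * FP32_PER_SM,
    "tensor_cores": sms * TENSOR_PER_SM,
    "rt_cores": sms * RT_PER_SM,
    "bus_width_bits": MEM_CONTROLLERS * CONTROLLER_WIDTH_BITS,
}

# Peak bandwidth in GB/s: per-pin rate (Gbit/s) x bus width (bits) / 8 bits per byte.
bandwidth_gb_s = MEM_SPEED_GBPS * totals["bus_width_bits"] / 8

# How much of the assumed full die is fused off, expressed in GPCs.
disabled_gpc_equivalent = (FULL_DIE_TPCS - TPCS) / TPCS_PER_GPC

for name, value in totals.items():
    print(f"{name}: {value}")
print(f"bandwidth_gb_s: {bandwidth_gb_s}")                    # 1008.0
print(f"disabled_gpc_equivalent: {disabled_gpc_equivalent}")  # 1.5
```

The numbers line up with the list: 126 SMs, 16,128 FP32 cores, 504 Tensor Cores, 126 RT Cores, and a 384-bit bus, which at 21Gbps works out to roughly 1TB/s of peak memory bandwidth.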