AMD Instinct MI300 May Feature Over 20,000 Cores Across 4 Chiplets, More than NVIDIA Hopper’s 18,000 Shaders

With the launch of the MI200 accelerator, AMD leapfrogged NVIDIA’s Tesla and Tensor Core offerings in one fell swoop. With a chiplet-based design that packs up to 14,080 stream processors across 220 CUs and two dies, it is arguably the most advanced accelerator designed to date. Each chiplet features eight shader engines (16 in total) with 16 CUs each (capable of full-rate FP64 and packed FP32 compute), along with a 2nd Gen Matrix Engine for mixed-precision compute (FP16 and BF16).

Just a few months after the launch of the MI200 and the MI250X, AMD is adding support for the next-gen MI300 range to its ROCm platform. The device IDs of the four chiplets powering the accelerator have been spotted: 0x7408, 0x740C, 0x740F, and 0x7410. This all but confirms earlier rumors that the MI300 would feature up to four chiplets, significantly pushing the compute envelope. The MI200 is already up to 5x faster than NVIDIA’s A100 Ampere accelerator in FP64 workloads. Doubling the compute resources could push that lead to around 10x.

Four chiplets with 110 Compute Units each, and 64 ALUs per CU, would mean an overall core count of 28,160. However, 440 CUs across four dies may be a bit too much, as one of the key objectives of an MCM approach is to improve yields and bring down production costs. Even with 80 CU dies, though, we get a total of 320 CUs across four dies, which adds up to a motherlode of 20,480 cores.
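The core-count arithmetic above can be checked with a short Python sketch. The 64 ALUs per CU and the die configurations come from the article; the function name `total_cores` and the `configs` dictionary are illustrative only, and both MI300 configurations remain speculative.

```python
ALUS_PER_CU = 64  # stream processors per Compute Unit, as stated in the article


def total_cores(dies: int, cus_per_die: int, alus_per_cu: int = ALUS_PER_CU) -> int:
    """Return the total stream-processor count for a multi-die accelerator."""
    return dies * cus_per_die * alus_per_cu


# (dies, CUs per die) for each configuration discussed in the article
configs = {
    "MI250X (shipping)": (2, 110),
    "MI300, 110 CUs per die (speculative)": (4, 110),
    "MI300, 80 CUs per die (speculative)": (4, 80),
}

for name, (dies, cus) in configs.items():
    print(f"{name}: {dies * cus} CUs, {total_cores(dies, cus):,} cores")
```

The MI250X row reproduces the 14,080 stream processors quoted earlier, which is a useful sanity check on the 64-ALU assumption before applying it to the four-die estimates.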

In comparison, NVIDIA’s next-gen GH100 “Hopper” accelerator is expected to top out at 144 SMs per die, which works out to 18,432 FP32 cores (and half as many FP64 cores) across two dies. Hopper is expected to consist of two GH100 dies paired with over 128GB of 1600MHz HBM3 memory across six 1,024-bit stacks.
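For a rough side-by-side, the rumored Hopper figures can be run through the same kind of arithmetic. This is only a sketch: the per-SM counts (64 FP32 and 32 FP64 cores) and the two-die layout are taken from the rumored specs in the table below, the MI300 figure is the speculative 80 CU per die estimate, and the helper name `hopper_cores` is made up for illustration. Raw core counts on different architectures are not a direct measure of performance.

```python
def hopper_cores(sms_per_die: int, cores_per_sm: int, dies: int = 2) -> int:
    """Return the total core count for a (rumored) multi-die Hopper part."""
    return sms_per_die * cores_per_sm * dies


fp32_cores = hopper_cores(144, 64)  # rumored: 144 SMs per die, 64 FP32 cores per SM
fp64_cores = hopper_cores(144, 32)  # rumored: 32 FP64 cores per SM

mi300_cores = 4 * 80 * 64  # speculative: four dies, 80 CUs each, 64 ALUs per CU

print(f"Hopper FP32 cores: {fp32_cores:,}")
print(f"Hopper FP64 cores: {fp64_cores:,}")
print(f"MI300 cores:       {mi300_cores:,}")
print(f"MI300 / Hopper FP32 ratio: {mi300_cores / fp32_cores:.2f}")
```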

| Data Center GPU | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 | NVIDIA H100 |
| --- | --- | --- | --- | --- |
| GPU Codename | GP100 | GV100 | GA100 | GH100 |
| GPU Architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere | NVIDIA Hopper |
| SMs | 56 | 80 | 108 | 144 x 2 |
| TPCs | 28 | 40 | 54 | 72 x 2 |
| FP32 Cores / SM | 64 | 64 | 64 | 64 |
| FP32 Cores / GPU | 3584 | 5120 | 6912 | 9216 x 2 |
| FP64 Cores / SM | 32 | 32 | 32 | 32 |
| FP64 Cores / GPU | 1792 | 2560 | 3456 | 4608 x 2 |
| INT32 Cores / SM | NA | 64 | 64 | 64 |
| INT32 Cores / GPU | NA | 5120 | 6912 | 9216 x 2 |
| Tensor Cores / SM | NA | 8 | 4 | ? |
| Tensor Cores / GPU | NA | 640 | 432 | ? |
| Texture Units | 224 | 320 | 432 | 576 x 2 |
| Memory Interface | 4096-bit HBM2 | 4096-bit HBM2 | 5120-bit HBM2 | 6144-bit HBM3? |
| Memory Size | 16 GB | 32 GB / 16 GB | 40 GB | 128 GB? |
| Memory Data Rate | 703 MHz DDR | 877.5 MHz DDR | 1215 MHz DDR | 1600 MHz DDR? |
| Memory Bandwidth | 720 GB/sec | 900 GB/sec | 1555 GB/sec | ? |
| L2 Cache Size | 4096 KB | 6144 KB | 40960 KB | 96000 KB? |
| TDP | 300 Watts | 300 Watts | 400 Watts | 500 Watts? |
| TSMC Manufacturing Process | 16 nm FinFET+ | 12 nm FFN | 7 nm N7 | 5 nm N5 |
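The Memory Bandwidth row follows from the interface width and data rate listed above it, so the blank H100 cell can be estimated the same way. The sketch below assumes two transfers per clock (DDR), as the table's "MHz DDR" labels suggest; the function name `hbm_bandwidth_gbs` is illustrative. The H100 inputs are the rumored values marked with "?" in the table, so the result is a projection rather than a confirmed spec.

```python
def hbm_bandwidth_gbs(bus_width_bits: int, clock_mhz: float,
                      transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s from bus width and memory clock.

    transfers_per_clock defaults to 2 because the table lists DDR rates.
    """
    bits_per_second = bus_width_bits * clock_mhz * 1e6 * transfers_per_clock
    return bits_per_second / 8 / 1e9  # bits -> bytes, then bytes -> GB


# (bus width in bits, memory clock in MHz) taken from the table
gpus = {
    "Tesla P100": (4096, 703.0),
    "Tesla V100": (4096, 877.5),
    "A100": (5120, 1215.0),
    "H100 (rumored)": (6144, 1600.0),
}

for name, (width, clock) in gpus.items():
    print(f"{name}: {hbm_bandwidth_gbs(width, clock):.1f} GB/s")
```

The first three results land within rounding of the 720, 900, and 1555 GB/sec figures in the table. If the rumored 6144-bit, 1600 MHz configuration holds, the same formula gives about 2,458 GB/s for the H100.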

The GPU in the Linux patch is called Aldebaran and carries the target ID GFX940. Although that is similar to the GFX90a of the MI250 series, the new part is probably a member of the MI300 lineup.

Via: Coelacanth’s Dream


Computer hardware enthusiast, engineering dropout, and PC gamer. Former co-founder of Techquila (2017-2019), a fairly successful tech outlet. Been working on Hardware Times since 2019, an outlet dedicated to computer hardware and its applications.