AMD is all set to announce its next generation of data center offerings roughly 24 hours from now. We’re talking about the Zen 3D-based Milan-X processors, featuring 3D-stacked V-Cache, and the chiplet-based Instinct MI200 GPU accelerators. Milan-X will retain the Zen 3 core and the N7 process from TSMC, and as such, can be thought of as a special refresh or niche stack, much like the upcoming Sapphire Rapids-SP with on-package HBM memory.
CPU Name | Cores/Threads | Base Clock | Boost Clock | L3 Cache (V-Cache + L3 Cache) | L2 Cache | TDP |
---|---|---|---|---|---|---|
AMD EPYC 7773X | 64/128 | 2.2 GHz | 3.5 GHz | 512 + 256 MB | 32 MB | 280W |
AMD EPYC 7573X | 32/64 | 2.8 GHz | 3.6 GHz | 512 + 256 MB | 16 MB | 280W |
AMD EPYC 7473X | 24/48 | 2.8 GHz | 3.7 GHz | 512 + 256 MB | 12 MB | 240W |
AMD EPYC 7373X | 16/32 | 3.05 GHz | 3.8 GHz | 512 + 256 MB | 8 MB | 240W |
Looking at the specs, everything other than the crapton of L3 cache is basically identical to the vanilla Milan parts, including the base and boost clocks, the TDP, and the L2 cache. This means that performance gains (as already indicated earlier) will vary from application to application, and won’t be pronounced in every workload.
The exact specifications of the MI250X have been shared. It’ll consist of a total of 110 CUs with a boost clock of 1.7GHz. On the memory side, we’re likely looking at eight HBM stacks, each featuring eight 2GB dies, for a total of 128GB. This indicates a total bus width of 8,192 bits (1,024 bits x 8 controllers), resulting in an overall bandwidth of 3.68 TB/s, roughly the same as the HBM variants of Sapphire Rapids-SP.
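The memory arithmetic in this paragraph can be checked with a few lines of Python. This is only a sketch built on the leaked figures quoted here (eight stacks, eight 2GB dies per stack, a 1,024-bit interface per stack, 3.68 TB/s overall); the variable names are made up for illustration, and the per-pin data rate is a value implied by those figures, not a confirmed spec.

```python
# Leaked, unconfirmed figures as quoted in the article.
STACKS = 8                 # HBM stacks on the package
DIES_PER_STACK = 8         # DRAM dies per stack
GB_PER_DIE = 2             # capacity of each die in GB
BITS_PER_STACK = 1024      # interface width of one stack
QUOTED_BANDWIDTH_TB_S = 3.68  # overall bandwidth, read as TB/s

# Total capacity and total bus width follow directly from the stack layout.
capacity_gb = STACKS * DIES_PER_STACK * GB_PER_DIE
bus_width_bits = STACKS * BITS_PER_STACK

# Per-pin data rate implied by the quoted bandwidth (assumption: decimal TB).
bytes_per_transfer = bus_width_bits / 8
pin_rate_gbps = QUOTED_BANDWIDTH_TB_S * 1e12 / bytes_per_transfer / 1e9

print(f"Capacity:  {capacity_gb} GB")
print(f"Bus width: {bus_width_bits} bits")
print(f"Implied per-pin rate: {pin_rate_gbps:.2f} Gbps")
```

The bus width works out to 8,192 bits and the capacity to 128GB; the quoted bandwidth would then imply a per-pin rate of roughly 3.6 Gbps.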
At the heart of the GPU core, there will be two 55 CU chiplets, resulting in an overall compute strength of 110 CUs, with an impressive boost clock of 1.7GHz. Since Aldebaran can execute double-precision instructions (FP64) at native speeds, this will result in a double-precision throughput of 47.9 TFLOPs, an insane four times more than its predecessor, the MI100.
Even NVIDIA’s Ampere-based A100 Tensor Core accelerator is capable of “only” 19.5 TFLOPs of FP64 compute. In terms of mixed-precision compute, we’re looking at 383 TFLOPs of FP16 and BFLOAT16. In comparison, the MI100 topped out at “just” 184 and 92 TFLOPs in the two data types, respectively.
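To put those throughput figures side by side, here is a quick sketch that computes the ratios from the numbers quoted above. The figures are the leaked and vendor-quoted peak values from this article, not measured results, and the dictionary keys are just illustrative labels.

```python
# Peak throughput figures in TFLOPs, as quoted in the article.
quoted_tflops = {
    "mi250x_fp64": 47.9,
    "a100_fp64": 19.5,
    "mi250x_fp16_bf16": 383.0,
    "mi100_fp16": 184.0,
    "mi100_bf16": 92.0,
}

# Ratios of the MI250X figures to the parts it is compared against.
fp64_vs_a100 = quoted_tflops["mi250x_fp64"] / quoted_tflops["a100_fp64"]
fp16_vs_mi100 = quoted_tflops["mi250x_fp16_bf16"] / quoted_tflops["mi100_fp16"]
bf16_vs_mi100 = quoted_tflops["mi250x_fp16_bf16"] / quoted_tflops["mi100_bf16"]

print(f"FP64 vs A100:  {fp64_vs_a100:.2f}x")
print(f"FP16 vs MI100: {fp16_vs_mi100:.2f}x")
print(f"BF16 vs MI100: {bf16_vs_mi100:.2f}x")
```

On these numbers, the MI250X would land at roughly 2.5x the A100 in FP64, and at about 2x and 4x the MI100 in FP16 and BFLOAT16, respectively.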
The MI250X will have a TDP of 500W, which is a bit on the high side but is likely a result of the HBM memory. The MI250 should come with a lower boost clock and possibly less memory as well. A scalpel to the GPU core is unlikely, but I wouldn’t rule it out.
The AMD Radeon Instinct MI200 GPU will, over the next year, begin to power three massive systems on three continents: the United States’ exascale Frontier system; the European Union’s pre-exascale LUMI system; and Australia’s petascale Setonix system.