AMD Data Center Revenue Crosses $1 Billion for the First Time in Over a Decade, Courtesy of Trento and MI200

AMD reported yet another record-breaking quarter earlier this week, with revenue up 54% year over year and operating income up 101%. While both the Computing and Graphics (CG) and Enterprise, Embedded and Semi-Custom (EESC) segments did well across the board, the data center was the star of the show. AMD’s data center CPU and GPU business (Epyc and Instinct) crossed the $1 billion mark for the first time in more than a decade.

According to stats from NextPlatform, AMD’s data center business grew 38% sequentially from Q2 to Q3 2021, and more than doubled compared to the previous year. Digging deeper, data center GPU (Instinct) sales more than doubled to reach $164 million, while Epyc processor sales accounted for the bulk of the revenue. The latter rose 30% quarter over quarter and 105% YoY to $914 million.
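As a quick sanity check on the headline, the two segment figures quoted above can be added up. This is only a sketch using the article's rounded numbers; the variable names are illustrative.

```python
# Segment revenue as quoted in the article, in millions of USD.
epyc_revenue = 914      # data center CPUs
instinct_revenue = 164  # data center GPUs

total_revenue = epyc_revenue + instinct_revenue
epyc_share = epyc_revenue / total_revenue

print(f"Data center total: ${total_revenue} million")
print(f"Epyc share of total: {epyc_share:.1%}")
```

The sum lands just above the $1 billion mark, with Epyc making up roughly 85% of it.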

The bulk of the increase in CPU and GPU enterprise parts comes from the adoption of the latest Epyc and Instinct processors by hyperscalers and cloud providers. AMD offers the kind of compute density and I/O that Intel simply can’t for the time being: more cores, more PCIe lanes, and higher memory bandwidth.

The custom Epyc “Trento” CPUs and the chiplet-based Instinct MI200 accelerators have already started to ship to select customers. The official release for these (and the Zen 3-based Milan-X processors with 3D V-Cache) is set for early November. A unified, coherent memory architecture is one of the primary advantages of this custom design.

Trento is based on the same Zen 3 core architecture as Milan, with some modifications allowing AMD to pair each chip with four Instinct MI200 GPUs using the Infinity Fabric 3.0 interconnect. The exact specifications of the MI250X have also been shared. It’ll consist of a total of 110 CUs with a boost clock of 1.7GHz. On the memory side, we’re likely looking at eight HBM stacks, each featuring eight 2GB dies, for a total of 128GB. This indicates a total bus width of 8,192 bits (1,024 bits x 8 controllers), resulting in an overall bandwidth of 3.68 TB/s, roughly the same as the HBM variants of Sapphire Rapids-SP.
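The memory arithmetic can be worked through in a few lines. The stack count, die size, and per-stack bus width come from the paragraph above; the 3.6 Gbps per-pin data rate is an assumption that does not appear in the article, chosen here only because it reproduces the quoted bandwidth figure.

```python
# Memory configuration as described in the article.
stacks = 8
dies_per_stack = 8
gb_per_die = 2
bits_per_stack = 1024

capacity_gb = stacks * dies_per_stack * gb_per_die
bus_width_bits = stacks * bits_per_stack

# ASSUMPTION: per-pin data rate in Gbps (not stated in the article).
pin_rate_gbps = 3.6

# bits x Gbps gives Gb/s; divide by 8 for GB/s, then by 1000 for TB/s.
bandwidth_tb_per_s = bus_width_bits * pin_rate_gbps / 8 / 1000

print(f"Capacity: {capacity_gb} GB")
print(f"Bus width: {bus_width_bits} bits")
print(f"Bandwidth: {bandwidth_tb_per_s:.2f} TB/s")
```

Eight 1,024-bit stacks give an 8,192-bit bus, and under the assumed pin rate the bandwidth works out to roughly 3.69 TB/s, in line with the figure quoted above.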

At the heart of the GPU, there will be two 55 CU chiplets, for a total of 110 CUs, with an impressive boost clock of 1.7GHz. Since Aldebaran can execute double-precision (FP64) instructions at native speed, this will result in a double-precision throughput of 47.9 TFLOPs, an insane four times that of its predecessor, the MI100.
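To see what the quoted peak implies per compute unit, the figures can be back-solved. This is a rough sketch that only rearranges the article's own numbers (CU count, boost clock, peak FP64 throughput); it is not an official per-CU specification.

```python
# Figures quoted in the article.
compute_units = 110
boost_clock_ghz = 1.7
peak_fp64_tflops = 47.9

# TFLOPs -> GFLOPs, then divide by (CUs x GHz) to get FLOPs per CU per clock.
flops_per_cu_per_clock = peak_fp64_tflops * 1000 / (compute_units * boost_clock_ghz)

print(f"Implied FP64 FLOPs per CU per clock: {flops_per_cu_per_clock:.1f}")
```

With these inputs the implied figure comes out at roughly 256 FP64 operations per CU per clock.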

Even NVIDIA’s Ampere-based A100 Tensor core accelerator is capable of “only” 19.5 TFLOPs of FP64 compute. In terms of mixed-precision compute, we’re looking at 383 TFLOPs of FP16 and BFLOAT16. In comparison, the MI100 topped out at “just” 184 and 92 TFLOPs in the two data types, respectively.
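The relative gaps are easier to judge as ratios. The sketch below uses only the peak figures quoted above; these are theoretical peaks, so real workloads will not necessarily scale the same way.

```python
# Peak throughput in TFLOPs, as quoted in the article.
mi250x = {"fp64": 47.9, "fp16": 383, "bf16": 383}
a100 = {"fp64": 19.5}
mi100 = {"fp16": 184, "bf16": 92}

fp64_vs_a100 = mi250x["fp64"] / a100["fp64"]
fp16_vs_mi100 = mi250x["fp16"] / mi100["fp16"]
bf16_vs_mi100 = mi250x["bf16"] / mi100["bf16"]

print(f"FP64 vs A100:  {fp64_vs_a100:.2f}x")
print(f"FP16 vs MI100: {fp16_vs_mi100:.2f}x")
print(f"BF16 vs MI100: {bf16_vs_mi100:.2f}x")
```

On paper, that is roughly 2.5x the A100 in FP64, and about 2x and 4x the MI100 in FP16 and BFLOAT16, respectively.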

The Frontier supercomputer will be based on a custom design with each Epyc Trento CPU paired with four MI200 accelerators using the IF 3.0 interconnect, with each GPU directly connected to the CPU and every other GPU. This mesh design is what really makes the Trento-MI200 combo unique, as each chip has access to the data stored in the associated memory in a coherent manner, completely eliminating the need for direct management of memory copies on the program side.
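The node topology described above (one CPU, four GPUs, everything directly connected) can be enumerated as a small graph. This is an illustrative sketch of the logical connections only; the node names are made up here, and the number of physical Infinity Fabric links per connection is not given in the article.

```python
from itertools import combinations

# One Trento CPU plus four MI200 GPUs per node, as described in the article.
nodes = ["CPU"] + [f"GPU{i}" for i in range(4)]

# Fully connected mesh: every pair of devices has a direct connection.
links = list(combinations(nodes, 2))
cpu_gpu_links = [link for link in links if "CPU" in link]
gpu_gpu_links = [link for link in links if "CPU" not in link]

print(f"CPU-GPU connections: {len(cpu_gpu_links)}")
print(f"GPU-GPU connections: {len(gpu_gpu_links)}")
print(f"Total connections:   {len(links)}")
```

A full mesh over five devices needs ten direct connections: four between the CPU and the GPUs, and six among the GPUs themselves.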

Although Trento is being designed for Cray, it will also be available to other OEMs/ODMs. The main advantages of this platform are scalability, bandwidth, latency, and of course, the relatively simpler programmability thanks to the use of the Infinity Fabric 3.0 interconnect and a unified memory pool across the CPU and GPU. To reach its 1.5+ exaflops of performance, Frontier will combine more than 9,000 Trento Epyc CPUs and more than 36,000 Instinct MI200 accelerators.
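A back-of-the-envelope check shows how these counts relate to the exaflops target. The sketch assumes the lower bounds quoted above (the article says "more than") and the per-GPU peak FP64 figure from earlier in the article; it yields a theoretical peak, not sustained performance.

```python
# Lower-bound counts from the article.
cpus = 9000
gpus_per_cpu = 4

# Peak FP64 per GPU in TFLOPs, as quoted earlier in the article.
peak_fp64_tflops_per_gpu = 47.9

gpus = cpus * gpus_per_cpu

# 1 exaflop = 1,000,000 TFLOPs. CPU contribution is ignored here.
peak_exaflops = gpus * peak_fp64_tflops_per_gpu / 1_000_000

print(f"GPUs: {gpus}")
print(f"Theoretical GPU peak: {peak_exaflops:.2f} exaflops")
```

The GPU count matches the four-per-CU pairing, and the theoretical GPU peak of about 1.7 exaflops leaves some headroom over the 1.5+ exaflops target.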

In the last five years or so, AMD’s data center CPU sales have grown by a factor of 592x to $914 million (up 105.6% YoY), while accelerator sales have soared by 6.8x to $164 million (up 152.3% YoY). With Trento, Milan-X, and then Genoa slated to launch in the coming year, these figures are only going to swell. The MI200, meanwhile, is going to be the first chiplet-based GPU accelerator, bringing with it never-before-seen levels of FP64 compute capability.