Intel Sapphire Rapids-SP to Offer AVX-512, AMX, and HBM2e Features as Add-On "Paid DLC" Functions

Intel is gearing up to launch its server and HEDT processors later this year, based on the potent Golden Cove core architecture. The 4th Gen Xeon Scalable lineup, codenamed Sapphire Rapids-SP, will be the first chiplet-based (MCM or tiled, call it what you want) design, featuring up to 56 cores across four 15-core tiles. Each tile will have one core fused off to improve yields, leaving 14 active cores per tile (4 × 14 = 56). As with Ice Lake-SP, AVX-512, now joined by the newly added AMX instructions, will be among Intel’s key advantages against AMD’s Epyc Milan, Milan-X, and the soon-to-be-launched Genoa lineups.

Shit… The AVX512 and AMX instruction sets are paid DLC items… pic.twitter.com/i9JROjciEO

— 结城安穗-YuuKi_AnS🍥 (@yuuki_ans) February 12, 2022

There’s been a major discovery, though: Intel will reportedly be gating the AVX-512 and AMX instructions behind a paywall of sorts, much like paid DLC in video games. The feature, known as Software Defined Silicon (SDSi), will be enabled primarily on server and data center nodes via the Linux kernel, allowing vendors to offer special “accelerators” or “add-ons” to clients for a price.
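To see which of these instruction sets a Linux machine currently exposes, one option is to read the CPU feature flags the kernel reports in `/proc/cpuinfo`. The snippet below is a minimal sketch under that assumption. It is not the SDSi interface and says nothing about licensing; it only checks for the flag names `avx512f`, `amx_tile`, `amx_bf16`, and `amx_int8`. The helper names `cpu_feature_flags` and `report` are made up for illustration.

```python
# Flag names as the Linux kernel reports them in /proc/cpuinfo.
WANTED = ("avx512f", "amx_tile", "amx_bf16", "amx_int8")


def cpu_feature_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. non-x86 or empty input)


def report(cpuinfo_text):
    """Map each flag of interest to True/False depending on whether it is listed."""
    flags = cpu_feature_flags(cpuinfo_text)
    return {name: name in flags for name in WANTED}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the file if it exists; return an empty string otherwise."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""
```

On a Linux box, `report(read_cpuinfo())` returns a dictionary such as `{"avx512f": True, "amx_tile": False, ...}`. How an unlicensed feature would show up here, if at all, is not something the source describes.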

SapphireRapids with the HBM2e version will have a set of embedded OS features that can be unlocked for a fee, perhaps loading the system into HBM2e memory.
(The HBM2e on the processor has multiple control modes)
— 结城安穗-YuuKi_AnS🍥 (@yuuki_ans) February 12, 2022

AVX-512 is primarily leveraged in certain data center workloads that rely on densely packed vector instructions. Its 512-bit registers are twice as wide as AVX2’s 256-bit ones, allowing for up to double the per-instruction throughput. On the downside, not many applications support AVX-512, and even when they do, the added performance comes at the cost of significantly increased power draw.
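The doubling follows directly from the register widths. As a rough sketch (the function name `lanes` is just for illustration), the number of elements a single vector instruction can process is the register width divided by the element size:

```python
def lanes(vector_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


# Register widths: AVX2 uses 256-bit registers, AVX-512 uses 512-bit registers.
WIDTHS = {"AVX2": 256, "AVX-512": 512}

for name, bits in WIDTHS.items():
    print(f"{name}: {lanes(bits, 32)} x fp32, {lanes(bits, 64)} x fp64 per instruction")

# Ratio of fp32 lanes between the two instruction sets.
ratio = lanes(WIDTHS["AVX-512"], 32) / lanes(WIDTHS["AVX2"], 32)
```

This is a peak, per-instruction figure only. It ignores clock speed changes under heavy vector load and the number of execution units, so real-world gains can be smaller.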

AMX, on the other hand, is similar to NVIDIA’s Tensor cores, accelerating matrix multiplication on 2D tiles of data. These instructions run in parallel with AVX-512 and other traditional x86 instructions without affecting the primary pipeline.
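The idea of multiplying matrices tile by tile can be sketched in plain Python. This is a conceptual illustration only, not how AMX is actually programmed: the tile edge `TILE = 4` and the function name `matmul_tiled` are arbitrary choices for the example, whereas real AMX tile registers hold up to 16 rows of 64 bytes each.

```python
TILE = 4  # hypothetical tile edge, chosen only to keep the example small


def matmul_tiled(a, b, tile=TILE):
    """Multiply matrices a (n x k) and b (k x m) one tile-sized block at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # rows of the output tile
        for j0 in range(0, m, tile):      # columns of the output tile
            for k0 in range(0, k, tile):  # walk the shared dimension in blocks
                # One "tile multiply-accumulate" step: C_tile += A_tile x B_tile.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The innermost three loops are the part a matrix accelerator handles as a single operation on whole tiles, which is where the speedup over element-by-element vector code comes from.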

According to the source (YuuKi_AnS), in addition to AVX-512 and AMX, certain features of the on-package HBM2e memory will also be offered as add-ons. We already know that the HBM memory will be usable in flat mode or cache mode, in addition to the traditional mode. It’s likely that these modes will be locked out of the box.

QY36 (maybe 8480)
56Core TDP:350w
2.3GHz – 3.6GHz（Early ES）
105MB L3 Setting：C2
— 结城安穗-YuuKi_AnS🍥 (@yuuki_ans) February 12, 2022

Looking at the memory benchmark results of the sample shared by the source, it’s clear that Intel has massively beefed up the memory bandwidth, making it (nearly) twice as high as that of AMD’s Epyc Milan-X and 50% higher than that of its Ice Lake predecessor. It’s unclear whether this is the result of the faster/larger cache or the inclusion of HBM2e memory. Cache bandwidth has improved similarly, although latency has taken a hit across the board.