We still find this incredibly hard to believe, but Intel GPUs are here. Today, Intel unveiled details of its Ponte Vecchio Xe GPU. Before you start salivating, note that Ponte Vecchio is targeted at the HPC market. Much like GP100 in 2016, which got everyone excited and then disappointed, Ponte Vecchio’s not for gaming. It might not even have display outputs. That doesn’t make it any less interesting, though. Intel’s Xe gaming SKUs are almost certainly built on a similar platform, so most of what Intel said about Ponte Vecchio at its HPC Developer Conference today will likely apply to consumer Xe.
Ponte Vecchio: An MCM Design
Ponte Vecchio represents the third and highest tier of Intel Xe products in the product stack and will have a TDP in excess of 250W (which makes sense for an HPC part). What’s interesting, though, is that Intel stated that Ponte Vecchio would scale up to “1000s of EUs,” over multiple discrete GPUs, connected over what they called the “Rambo Cache,” a massive unified cache accessible to GPUs and the CPU.
The EUs themselves would be connected over the Xe Memory Fabric. So high-speed interconnects, a large cache, and multiple discrete cores? This sounds a lot like what AMD’s doing in the processor space with Ryzen. And as Moore’s law collapses and it becomes harder and harder to shrink transistors down to…well…literally nothing, going wide (high-efficiency scaling across discrete cores) becomes the only viable option.
Xe Memory Fabric, Rambo Cache and 40x more FP64 Performance
As noted, the HPC space is the main focus of Ponte Vecchio. According to Raja Koduri, Intel’s Xe GPUs will offer 40x better double-precision (FP64) compute performance. Intel plans to achieve this using three core technologies:
Scalability: Intel plans to build systems with multiple GPUs and CPUs working in tandem, for a total of more than a thousand execution units (EUs). This should deliver never-before-seen levels of FP64 compute performance, which is crucial in HPC and data center workloads. Each node will combine two Sapphire Rapids CPUs and six Ponte Vecchio GPUs.
Xe Memory Fabric (XEMF): Intel is taking a page from AMD’s playbook here and connecting these EUs (and GPUs) to a new scalable memory fabric (not Infinity Fabric) dubbed XEMF. I’m not sure whether it will connect the EUs to each other, connect memory to them, or both. CXL will also be used to connect the CPUs and GPUs.
“At the heart of Xe architecture, we have a new fabric called XEMF. It is the heart of the performance of these machines. We called it the Rambo Cache. It is a unified cache that is accessible to CPU and GPU memory.” - Raja Koduri
Rambo Cache: To make up for the latency penalty induced by the Xe Memory Fabric, these GPUs will also include a large unified cache known as Rambo Cache. This cache will be attached using the Foveros packaging technology.
The GPUs will also be paired with HBM for maximum bandwidth. While Foveros connects the Rambo Cache, EMIB will be used to connect the HBM to the GPUs.
These 7nm Ponte Vecchio GPUs will be paired with Sapphire Rapids CPUs, and the resulting supercomputer will be called Aurora. It will bring together Intel’s technologies (both old and new), from Optane to Foveros to Xe, as well as interconnects such as CXL, XEMF and EMIB, in one machine.
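To keep the alphabet soup straight, here is a rough Python sketch of which technology links what in a single node, using only what Intel has described so far. The class and function names are made up for illustration, and the XEMF endpoints are an assumption, since Intel has not spelled out exactly what the fabric connects.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    technology: str   # interconnect or packaging technology
    endpoints: tuple  # the two component types it joins

# Per-node component counts as stated for one node.
NODE = {"Sapphire Rapids CPU": 2, "Ponte Vecchio GPU": 6}

LINKS = [
    Link("CXL", ("Sapphire Rapids CPU", "Ponte Vecchio GPU")),
    Link("Foveros", ("Ponte Vecchio GPU", "Rambo Cache")),
    Link("EMIB", ("Ponte Vecchio GPU", "HBM")),
    # Assumption: XEMF modeled as a GPU-to-GPU fabric; the exact
    # endpoints have not been detailed.
    Link("XEMF", ("Ponte Vecchio GPU", "Ponte Vecchio GPU")),
]

def technologies_for(component):
    """Return the technologies that touch the given component type."""
    return [link.technology for link in LINKS if component in link.endpoints]

def gpus_per_cpu(node):
    """Ratio of GPUs to CPUs in one node."""
    return node["Ponte Vecchio GPU"] / node["Sapphire Rapids CPU"]

if __name__ == "__main__":
    print(technologies_for("HBM"))
    print(gpus_per_cpu(NODE))
```

Each CPU ends up feeding three GPUs, which is part of why a coherent link like CXL matters in this design.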
When exactly will we see Ponte Vecchio? Intel wants to have it powering the Aurora supercomputer by 2021. Consumer Xe is expected to arrive in 2020, so we’ll hopefully have an idea of where Intel stands in the GPU market by then.