One of the primary subjects of Intel’s 2020 Architecture Day was the Gen12 Xe graphics powering the upcoming Tiger Lake-U SoCs and the subsequent GPUs built atop the Xe design. Dubbed Xe-LP, this graphics architecture will form the building block of Intel’s Graphics Odyssey.
Comparing Gen11 and Gen12, there’s not a whole lot of difference between the two. The basics are the same, and the graphics pipeline is largely unchanged. What’s different is the scale: the shader (EU) count, ROPs, cache, geometry and texture units have all grown, and the clocks are notably higher.
Overall, the Gen12 GPU is 50% larger than Gen11 and takes up nearly half of the SoC’s area.
The changes made to Gen12 are much like the architectural tweaks AMD recently made to give birth to the new Navi GPUs. The number of sub-slices has gone down, but each one packs more oomph than its Gen11 counterpart.
Gen11 had 8 EUs per sub-slice (SS) while Gen12 has twice as many (16), but at the same time, the total number of sub-slices has been reduced from eight to six. Furthermore, each geometry front-end unit is now shared by two sub-slices instead of four, and its throughput rises from 1 triangle per clock to 2, basically doubling it.
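The sub-slice and EU figures above can be checked with a little arithmetic. The snippet below is only an illustrative sketch: the `GpuConfig` class and its field names are made up for this example, and the numbers are the ones quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical container for the figures quoted in the article."""
    name: str
    sub_slices: int
    eus_per_sub_slice: int

    @property
    def total_eus(self) -> int:
        # Total execution units = sub-slices x EUs per sub-slice
        return self.sub_slices * self.eus_per_sub_slice


# Figures taken from the text: Gen11 is 8 SS x 8 EUs, Gen12 is 6 SS x 16 EUs
gen11 = GpuConfig("Gen11", sub_slices=8, eus_per_sub_slice=8)
gen12 = GpuConfig("Gen12", sub_slices=6, eus_per_sub_slice=16)

eu_ratio = gen12.total_eus / gen11.total_eus

for gpu in (gen11, gen12):
    print(f"{gpu.name}: {gpu.total_eus} EUs")
print(f"Gen12 / Gen11 EU ratio: {eu_ratio}")
```

Fewer but fatter sub-slices still work out to 96 EUs against 64, which is where the 1.5x figure comes from.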
Gen12: Gen11 x 1.5
Compute performance and the GPU back end both get a 50% increase in raw performance, in line with the expanded hardware capabilities.
Looking at the sub-slices in detail, one of the major additions is an L1/texture cache (64KB per SS). Like NVIDIA’s shared L1 cache, it can be dynamically partitioned between L1 and texture cache as required.
With Gen11, each EU was composed of two SIMD units with four pipelines each. One of the ALU groups ran integer or floating-point operations while the other handled floating-point and/or special functions. Compared to NVIDIA’s warps and AMD’s waves, Intel’s wavefronts are much narrower at 8 lanes (SIMD8). Being vector in nature, each wavefront (or thread, as Intel calls it) takes multiple cycles to execute. Of course, similar to GCN, multiple instances are loaded at a time, but that becomes detrimental in the case of applications with shorter dispatches.
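To see why a SIMD8 thread needs more than one cycle on Gen11, consider a deliberately simplified model in which one ALU group steps through a thread's lanes four at a time. This is an assumption made for illustration, not Intel's documented scheduling; the `issue_cycles` helper and the constant name are hypothetical.

```python
import math


def issue_cycles(thread_width: int, alu_lanes: int) -> int:
    """Cycles one ALU group needs to cover every lane of one instruction.

    Simplified model (an assumption): lanes are processed in batches of
    `alu_lanes`, one batch per cycle, ignoring latency and co-issue.
    """
    if thread_width <= 0 or alu_lanes <= 0:
        raise ValueError("widths must be positive")
    return math.ceil(thread_width / alu_lanes)


# From the text: each Gen11 SIMD unit has four pipelines
GEN11_ALU_LANES = 4

simd8_cycles = issue_cycles(8, GEN11_ALU_LANES)
print(f"SIMD8 thread on a {GEN11_ALU_LANES}-lane ALU: {simd8_cycles} cycles")
```

Under this model, any thread wider than the ALU takes multiple passes, which matches the multi-cycle behaviour described above.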
Surprisingly, like AMD’s Dual Compute Unit design, or Work Group Processor (WGP), Intel’s Gen12 graphics also uses a pair of EUs as the basic functional block. The two EUs share a thread control unit, coordinated by software scoreboarding (thread control is handled by software). This part is in stark contrast to AMD’s design, where the Command Processor handles most of it.
Of CUs and EUs: Wider SIMDs and Shared EUs