A patent from AMD’s RTG team has surfaced that outlines the company’s plans to bring a chiplet design to future GPUs (likely RDNA 4). Before we move on, keep in mind that this is only a patent describing a proof of concept, and the final design will likely vary significantly.
A high-level overview of the design in the above figure shows how AMD plans to go down this road. Similar to the Ryzen CPUs, there are identical GPU dies, or chiplets, connected via a high-bandwidth crossbar. This could be the Infinity Fabric or something faster. The primary chiplet is connected to the CPU and main memory. Note that it has direct access to the DRAM, courtesy of DirectX 12.
The above figure shows a four-chiplet GPU connected using the high-bandwidth crossbar. Judging by the dimensions in the figure, the interconnect appears to take up around 25% of each chiplet’s die space. For a four-chiplet design, there are four blocks of interconnect fabric per die, indicating that each die will be connected to every other die on the substrate.
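To make the "every die connected to every other die" idea concrete, here is a minimal sketch that enumerates the links in a fully connected layout. It assumes one point-to-point link per pair of chiplets, which the patent does not spell out; the function name `full_mesh_links` is purely illustrative.

```python
from itertools import combinations

def full_mesh_links(num_chiplets):
    """Return every unordered pair of chiplets, i.e. one link per pair (assumed)."""
    return list(combinations(range(num_chiplets), 2))

num_chiplets = 4
links = full_mesh_links(num_chiplets)

# Count how many links touch each die, i.e. how many peers it talks to.
peers_per_die = {
    die: sum(die in link for link in links) for die in range(num_chiplets)
}

print(len(links))      # total links on the substrate
print(peers_per_die)   # peers reachable from each die
```

With four chiplets, this gives six links in total and three peers per die. The link count grows as n(n-1)/2, which is one reason a fully connected layout gets expensive as more chiplets are added.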
Looking at the individual chiplets, you can see that the internals aren’t that different from RDNA 2: you’ve got the Work-Group Processors (WGPs) that provide the computational power, plus the fixed-function units for rasterization, texture mapping, tessellation, alpha-blending, etc. The L3 cache, or Infinity Cache, has also been retained, suggesting that the crossbar will likely be built from the same fabric.
Similar to Imagination’s MCM GPU design, this implementation uses a master (primary) GPU chiplet that coordinates the workload among the secondary (slave) dies. Considering the drawbacks of alternate frame rendering (AFR) and split frame rendering (SFR), it’s likely that this will also leverage tile-based rendering, with the OS and developers seeing only a single GPU instead of multiple chiplets.
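As a rough illustration of how tile-based work could be spread across chiplets, the sketch below splits the screen into square tiles and hands them out round-robin. This is an assumption for illustration only: the patent does not describe a tile size or a distribution policy, and `assign_tiles`, the 64-pixel tile, and the round-robin scheme are all hypothetical.

```python
from collections import Counter

def assign_tiles(width, height, tile_size, num_chiplets):
    """Map each screen tile (col, row) to a chiplet index, round-robin."""
    cols = -(-width // tile_size)    # ceiling division
    rows = -(-height // tile_size)
    assignment = {}
    for row in range(rows):
        for col in range(cols):
            tile_index = row * cols + col
            assignment[(col, row)] = tile_index % num_chiplets
    return assignment

# A 1920x1080 frame, 64-pixel tiles, four chiplets (all assumed values).
tiles = assign_tiles(1920, 1080, 64, 4)
load = Counter(tiles.values())

print(len(tiles))   # number of tiles in the frame
print(dict(load))   # tiles handled by each chiplet
```

The application only submits one frame; the split into tiles happens behind the scenes, which is what lets the OS and developers treat the package as a single GPU.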
One core difference is that, unlike Imagination’s design, AMD will stick to the traditional push method of load scheduling, where the CPU pushes the work to the GPU. As you can see in the above flowchart, the master chiplet can direct a memory access to any of the other chiplets on the GPU and return the result to the CPU. Since GPUs are highly parallel and more sensitive to bandwidth than CPUs are, this shouldn’t pose a major problem with the proper setup.
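The flow described above, where the CPU talks only to the primary chiplet and the primary forwards memory accesses to whichever chiplet holds the data, can be sketched roughly as follows. This is a toy model under stated assumptions, not the patent’s mechanism: the class names, the `read` method, and the modulo-based address interleaving are all hypothetical.

```python
class Chiplet:
    def __init__(self, chiplet_id):
        self.chiplet_id = chiplet_id
        self.memory = {}  # address -> value; stands in for this chiplet's local cache

class PrimaryChiplet(Chiplet):
    """The only chiplet the CPU talks to; it routes requests to the owner."""

    def __init__(self, secondaries):
        super().__init__(0)
        self.chiplets = [self] + list(secondaries)

    def owner_of(self, address):
        # Assumed policy: addresses are interleaved across chiplets by modulo.
        return self.chiplets[address % len(self.chiplets)]

    def read(self, address):
        # Forward the access to the owning chiplet and hand the data back.
        owner = self.owner_of(address)
        return owner.chiplet_id, owner.memory.get(address)

secondaries = [Chiplet(i) for i in (1, 2, 3)]
primary = PrimaryChiplet(secondaries)

# Address 6 maps to chiplet 2 under the assumed interleaving (6 % 4 == 2).
secondaries[1].memory[6] = "texel"
result = primary.read(6)
print(result)
```

From the CPU’s point of view there is a single entry point, `primary.read`, regardless of which chiplet actually serves the request.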