AMD announced its Ryzen 5000 CPUs based on the “Zen 3” core architecture a few days back, but most of the details it shared concerned expected performance in gaming and other mainstream workloads. The technical data was quite scant, but we still got a look at what has changed with Zen 3 compared to Zen 2. Keep in mind that, unless stated otherwise, the changes mentioned in this article are expected architectural refinements that AMD hasn’t specifically confirmed. It’s a rough estimate:
Unified L3 cache and 8-core CCX: This is one change that has been confirmed by AMD. Zen 3 will use 8-core complexes (CCXs), with each core having access to the full 32MB of L3 cache, improving overall latency and cache bandwidth. Furthermore, inter-core latency between the eight cores will be much lower, as each core can now reach the other seven directly within the same CCX rather than having to cross the Infinity Fabric to reach a core in the other four-core CCX, as on Zen 2. This should directly result in significantly better performance in gaming and other latency-sensitive workloads.
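To put rough numbers on the CCX change, here is a toy sketch that compares an 8-core chiplet split into two 4-core CCXs (the Zen 2 layout, with 16MB of L3 per CCX) against a single 8-core CCX with 32MB. The function name `ccx_stats` and its parameters are made up for illustration, and the model only counts which core pairs share a CCX; it says nothing about actual latencies.

```python
from itertools import combinations

def ccx_stats(cores_per_ccx, l3_per_ccx_mb, total_cores=8):
    """Toy model: which core pairs share a CCX on one 8-core chiplet.

    Hypothetical helper for illustration only, not an AMD tool or API.
    """
    # Map each core index to the CCX it belongs to.
    ccx_of = [core // cores_per_ccx for core in range(total_cores)]
    pairs = list(combinations(range(total_cores), 2))
    same_ccx = sum(1 for a, b in pairs if ccx_of[a] == ccx_of[b])
    return {
        "l3_visible_per_core_mb": l3_per_ccx_mb,  # a core only sees its own CCX's L3
        "same_ccx_pairs": same_ccx,               # pairs that never leave the CCX
        "total_pairs": len(pairs),
    }

zen2 = ccx_stats(cores_per_ccx=4, l3_per_ccx_mb=16)  # two 4-core CCXs
zen3 = ccx_stats(cores_per_ccx=8, l3_per_ccx_mb=32)  # one unified 8-core CCX

for name, stats in (("Zen 2", zen2), ("Zen 3", zen3)):
    print(name, stats)
```

In this model, fewer than half of the possible core pairs sit in the same CCX with the split layout, while every pair does with the unified one, and the L3 visible to a single core doubles.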
Branch Predictor: The other major improvement is with respect to the branch predictor. AMD claims “zero-bubble” branch prediction with Zen 3. In case you didn’t know, a bubble in the core pipeline is a cycle in which it sits idle, either due to a memory stall or an incorrect branch prediction, where the entire pipeline has to be flushed and loaded with a new set of instructions, wasting precious execution cycles. With Zen 2, AMD introduced the TAGE predictor for L2-based predictions while the Hashed Perceptron Predictor was retained for L1. Considering that the former is already quite efficient, it’s likely that the latter, or perhaps both, will be improved.
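For readers unfamiliar with perceptron-style prediction, below is a minimal textbook-style sketch of the idea. It is not AMD's implementation, whose details are not public: the class name, table size, history length and threshold are all assumptions chosen for illustration, and the table is indexed by a plain modulo of the branch address rather than a real hash.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor (illustrative, not AMD's design)."""

    def __init__(self, history_len=8, table_size=64):
        self.table_size = table_size
        # Training threshold; this heuristic value is an assumption of the sketch.
        self.threshold = int(1.93 * history_len + 14)
        # One weight vector per table entry: a bias plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history: +1 = taken, -1 = not taken, most recent outcome last.
        self.history = [-1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        # Predict "taken" when the weighted sum is non-negative.
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        # Shift the newest outcome into the history.
        self.history = self.history[1:] + [t]


def count_mispredictions(outcomes, pc=0x40):
    """Run one branch (hypothetical address `pc`) through a fresh predictor."""
    predictor = PerceptronPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


alternating = [True, False] * 100
print("mispredictions on an alternating branch:", count_mispredictions(alternating))
```

On a strictly alternating branch, the weights quickly learn that the next outcome is the opposite of the most recent one, so only the first few predictions are wrong. Each of those wrong guesses is the kind of event that would flush a real pipeline.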
Micro-op Cache: AMD doubled its micro-op cache from 2K to 4K going from Zen to Zen 2, and there’ll certainly be an upgrade in this department as well, albeit a smaller one. We’ll likely see a 5K or 6K op-cache.
Wider decode and prefetch: As the op-cache and cache prefetching are being improved, it’s almost certain that the instruction fetch and queue will be expanded as well, along with the L1 instruction cache (48KB). There’s a chance that the decoder will also move to a 5-way issue from the 4-way issue in Zen and Zen 2, but I can’t say for sure. As the process node is unchanged, the retire buffer and register renaming will mostly be the same.
Wider Execution Engines: As AMD mentions in the first slide, the INT and FP issue has also been expanded. Don’t expect AVX-512 support, but there should be a wider execution window, especially in the integer cluster. It’s very likely that the AGU will be expanded/refined to allow for a higher number of loads/stores.
Higher Load/Store: As already explained in our Skylake/Ice Lake/Zen 2 comparison, AMD was lagging in this department. It’ll most likely be brought on par with Ice Lake with 2 loads and/or stores per cycle, and a wider load/store queue.
That’s about all I can say from the info provided by AMD on announcement day. We should receive more details regarding the Zen 3 core architecture as the launch on 5th November draws near. We’ll keep you posted 🙂