Intel’s Sunny Cove microarchitecture, featured in the 10th Gen Ice Lake processors, is the first update to the Core design in over four years (Skylake was the last one). Lack of competition and a mature 14nm node gave the company little reason to innovate… until now, that is. Just a few years back, Intel had a massive IPC advantage over AMD, and Bulldozer’s shared logic design meant that the multi-threaded capabilities of the FX processors weren’t impressive either. Now, though, you’ve got the Ryzen 3000 chips with nearly the same IPC as competing Coffee Lake chips (if not better) and vastly superior multi-threaded performance at lower price points.
Basically, the tables have turned in a short span of time, and Intel was caught with its pants down. The Ice Lake chips, although still limited to the low-power notebook space, bring some healthy improvements along with them. Keep in mind that the 10nm CPUs are codenamed Ice Lake, while the microarchitecture powering them is Sunny Cove.
Sunny Cove Front End: Increased Micro-op Cache and Larger Prefetch
Unlike AMD, Intel is rather stingy with details regarding the front end. In the case of Sunny Cove, the front end is largely similar to Skylake. According to Intel, the branch predictor has been fine-tuned and the load/store latencies are slightly better than the preceding design. The branch predictor and prefetcher are also reportedly larger, but there are no concrete details on what has changed.
The primary change is to the cache sizes. The L1 data cache is now 50% larger, with a 12-way set-associative 48KB design, up from 8-way 32KB in Skylake. The instruction cache is unchanged at 32KB. AMD’s Zen 2 core, on the other hand, has 32KB each of L1 instruction and data cache.
The L2 cache has also been increased from 256KB 4-way on Skylake to 512KB 8-way on Sunny Cove. This puts it on par with Zen 2’s L2 cache, at least in terms of size. The latencies will still vary.
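To see what those size and associativity figures imply, here is a minimal Python sketch that works out the number of sets in each cache. It assumes a 64-byte cache line, which is not stated above, and the helper name num_sets is purely illustrative.

```python
LINE_BYTES = 64  # assumed cache line size; the article does not state it


def num_sets(size_kb: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Sets in a set-associative cache: total bytes / (ways * bytes per line)."""
    total_bytes = size_kb * 1024
    if total_bytes % (ways * line_bytes) != 0:
        raise ValueError("cache size must divide evenly into ways * line size")
    return total_bytes // (ways * line_bytes)


# (size in KB, associativity) as quoted in the article
caches = {
    "Skylake L1D": (32, 8),
    "Sunny Cove L1D": (48, 12),
    "Skylake L2": (256, 4),
    "Sunny Cove L2": (512, 8),
}

for name, (size_kb, ways) in caches.items():
    print(f"{name}: {size_kb}KB, {ways}-way -> {num_sets(size_kb, ways)} sets")
```

Under that 64-byte assumption, both L1 data caches come out at 64 sets and both L2 caches at 1024 sets. In other words, the extra capacity would come entirely from added ways rather than from more sets.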
The most important change that Sunny Cove’s front-end has undergone is with respect to the micro-op cache. It has been increased from 1.5k entries to 2.25k entries. This was a much-needed improvement, as AMD already has a micro-op cache of 4k entries with Zen 2. Overall, these cache size increases should noticeably improve cache hit rates.
Backend: Wider Execute, Reorder and AVX 512
The reorder/retire buffer has also been massively overhauled. Sunny Cove has a huge 352-entry reorder buffer for micro-op renaming and reallocation (plus retirement). Skylake has a 224-entry reorder buffer, and Zen 2 has a 224-entry retire queue. However, in the case of the latter, the retire queue is separate from the main execution pipeline, and there are individual rename buffers for the integer and FP pipelines. Overall, Zen 2’s dispatch can send six micro-ops to the integer rename buffer, four to the FP rename buffer and eight to the independent 224-entry retire queue, from where they are sent to either of the other two.
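The size bumps quoted so far are easier to compare side by side as percentages. The short Python sketch below does that arithmetic using only the Skylake and Sunny Cove figures given in this article; the function name pct_increase is just an illustrative helper.

```python
def pct_increase(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


# (Skylake, Sunny Cove) figures as quoted in the article
structures = {
    "L1 data cache (KB)": (32, 48),
    "L2 cache (KB)": (256, 512),
    "Micro-op cache (k entries)": (1.5, 2.25),
    "Reorder buffer (entries)": (224, 352),
}

for name, (skylake, sunny_cove) in structures.items():
    growth = pct_increase(skylake, sunny_cove)
    print(f"{name}: {skylake} -> {sunny_cove} (+{growth:.0f}%)")
```

The reorder buffer grows by roughly 57%, which is a bigger relative jump than the 50% increases to the L1 data cache and micro-op cache, though smaller than the doubling of the L2.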
Ice Lake’s 10nm Sunny Cove has 10 execution ports: four for the ALUs, two for store data and the remaining four for the Address Generation Units (AGUs), split into two for loads and two for stores. This allows for two loads and two stores per clock cycle, doubling the store throughput of Skylake.
Overall, Sunny Cove can send ten micro-ops from the reorder buffer, a 25% increase over Skylake’s eight. Moving on to the execution units, Ice Lake supports native AVX-512 execution (without division into micro-ops) on the client platform. Sunny Cove can do one 512-bit FMA (fused multiply-add) or two 256-bit FMAs per cycle. The integer execution gets some additional units in the form of MUL, MULHi and iDIV. The LEA units have also been doubled. AnandTech has a neat comparison of Skylake and Sunny Cove.
Comparing the Sunny Cove core to Zen 2, we can see that Zen 2, like Skylake, lacks AVX-512. However, it still supports four 256-bit instructions per cycle (2 MUL and 2 ADD) along with four INT executions in parallel. The original Zen lacked native support for 256-bit AVX and had to rely on breaking those instructions into two micro-ops.
Zen 2 matches Sunny Cove in terms of the load/store lanes (two each), but its load/store block is more complex, or at least better documented, than those of Sunny Cove and Skylake.
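Going back to the FMA figures quoted for Sunny Cove in this section, a quick Python sketch shows why one 512-bit FMA and two 256-bit FMAs per cycle amount to the same peak throughput. It assumes the usual convention of counting each fused multiply-add as two floating-point operations per lane; the function name is hypothetical.

```python
def fma_flops_per_cycle(vector_bits: int, units: int, element_bits: int = 32) -> int:
    """Peak FLOPs per cycle, counting each FMA lane as 2 operations (multiply + add)."""
    lanes = vector_bits // element_bits  # elements packed into one vector register
    return units * lanes * 2


# Sunny Cove figures from the article: one 512-bit FMA or two 256-bit FMAs per cycle
one_512_fp32 = fma_flops_per_cycle(vector_bits=512, units=1)
two_256_fp32 = fma_flops_per_cycle(vector_bits=256, units=2)
one_512_fp64 = fma_flops_per_cycle(vector_bits=512, units=1, element_bits=64)

print(f"1 x 512-bit FMA, FP32: {one_512_fp32} FLOPs/cycle")
print(f"2 x 256-bit FMA, FP32: {two_256_fp32} FLOPs/cycle")
print(f"1 x 512-bit FMA, FP64: {one_512_fp64} FLOPs/cycle")
```

Both single-precision configurations land on 32 FLOPs per cycle, so on paper the two modes offer the same peak FMA rate. This is a theoretical ceiling only and says nothing about sustained throughput or clock behavior under AVX-512 load.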
Conclusion: 18% Higher IPC
In essence, these improvements have resulted in an IPC increase of 18% (on average) with the 10th Gen Ice Lake chips. This has allowed Team Blue to retain its IPC lead over AMD, though with Zen 2 it’s nowhere near as large as it used to be. We’re quite interested in seeing what Intel does with the 11th Gen Tiger Lake CPUs, which will feature an updated core design, Willow Cove.
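To put the 18% figure in context, single-threaded performance is roughly IPC multiplied by clock speed, so an IPC gain only translates directly into speed at equal clocks. The Python sketch below illustrates that relationship. The clock speeds in it are hypothetical values chosen for illustration, not figures for any real chip.

```python
IPC_GAIN = 0.18  # average IPC uplift quoted in the article


def relative_performance(ipc_ratio: float, clock_ratio: float) -> float:
    """Rough single-thread speedup: performance scales with IPC * clock."""
    return ipc_ratio * clock_ratio


def break_even_clock_ratio(ipc_ratio: float) -> float:
    """Clock ratio (new/old) at which the IPC gain is exactly cancelled out."""
    return 1.0 / ipc_ratio


ipc_ratio = 1.0 + IPC_GAIN

# Same clock on both designs: the speedup equals the IPC gain
print(f"Equal clocks: {relative_performance(ipc_ratio, 1.0):.2f}x")

# Hypothetical clocks, for illustration only: 4.0 GHz old core, 3.6 GHz new core
old_ghz, new_ghz = 4.0, 3.6
print(f"Hypothetical clocks: {relative_performance(ipc_ratio, new_ghz / old_ghz):.2f}x")

# How far the new core's clock could drop before the IPC gain is used up
print(f"Break-even clock ratio: {break_even_clock_ratio(ipc_ratio):.3f}")
```

The break-even ratio works out to about 0.85, meaning the new core could run at roughly 85% of the older core's clock and still match it in this simple model.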
Diagram Credits go to WikiChip and Hiroshige Goto