Intel’s 10th Gen Ice Lake CPUs are the first major update to the company’s Core architecture since 2015’s Skylake. Every lineup since then, from Kaby Lake and Coffee Lake to the “new” Comet Lake chips, uses the 14nm Skylake core. In this post, we compare Intel’s 10nm Ice Lake CPUs against the 14nm Comet Lake chips as well as AMD’s Ryzen 3000 processors. We’ll look at the core architectures powering these CPUs, namely Sunny Cove, Skylake and Zen 2, respectively, and compare the differences between them.
Intel’s 10th Gen mobile lineup is composed of Comet Lake and Ice Lake CPUs. While the former features the Skylake core, Ice Lake is based on the newer Sunny Cove design.
10nm Sunny Cove vs 7nm Zen 2: Front End and Branch Predictors
Unlike AMD, Intel is rather stingy with details regarding the front end. In the case of the latest Sunny Cove core, the front end is largely similar to Skylake. According to Intel, the branch predictor has been fine-tuned and the load/store latencies are slightly better than the preceding design. The branch predictor and prefetcher are also reportedly larger, but there are no concrete details on what exactly has changed.
The primary change to the front-end is with respect to the cache sizes. The L1 data cache is now 50% larger, with a 12-way set-associative 48KB design, up from 8-way 32KB in Skylake. The instruction cache is unchanged at 32KB. AMD’s Zen 2 core, on the other hand, has a 32KB L1 instruction cache and a 32KB L1 data cache, same as Skylake.
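To see what the larger L1 data cache means structurally, the number of sets can be derived from the size and associativity quoted above. This is a minimal sketch that assumes a 64-byte cache line (typical for x86, but not stated in this article); `num_sets` is a hypothetical helper name.

```python
# Assumption: 64-byte cache lines (not stated in the article).
LINE_BYTES = 64


def num_sets(size_kb: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets = total bytes / (ways * bytes per line)."""
    total_bytes = size_kb * 1024
    bytes_per_set = ways * line_bytes
    if total_bytes % bytes_per_set != 0:
        raise ValueError("cache size must be a multiple of ways * line size")
    return total_bytes // bytes_per_set


skylake_l1d_sets = num_sets(32, 8)      # Skylake: 32KB, 8-way
sunny_cove_l1d_sets = num_sets(48, 12)  # Sunny Cove: 48KB, 12-way

print("Skylake L1D sets:   ", skylake_l1d_sets)
print("Sunny Cove L1D sets:", sunny_cove_l1d_sets)
```

Under that line-size assumption both caches work out to the same number of sets, which suggests the extra 16KB comes from four additional ways rather than from more sets.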
The decoder is mostly unchanged in Sunny Cove with the new 10nm core packing the same old 5-way decoder as Skylake. The instruction queue size is also the same with 50 entries (25×2) while the instruction fetch is also unchanged with six issues per cycle.
AMD’s Zen 2 architecture differs greatly from the Core design here. While Intel can load around 64B per cycle from the L2 cache, the Zen 2 core is limited to 32B per cycle. At the same time, Zen 2’s instruction fetch is twice as wide as that of both Skylake and Sunny Cove (32B vs 16B), but its decoder is slightly narrower at four instructions per cycle.
The reason for this is that the two core architectures have fundamentally different front ends. The decoder on the Zen 2 core sends four instructions to the micro-ops queue while the op-cache sends another four received from the branch predictor.
In the case of Skylake and Sunny Cove, the decoder sends five while the branch predictor sends another six instructions to the allocation queue. The Microcode ROM also exists at different stages of the pipeline for the two designs. In Intel’s Sunny Cove and Skylake, it’s paired with the decoder while in Zen 2 it’s paired with the micro-ops queue.
On the Intel side, the L2 cache has also been doubled from 256KB 4-way on Skylake to 512KB 8-way on the 10nm Sunny Cove core. This puts it on par with Zen 2’s L2 cache, at least in terms of size.
The most important change that Sunny Cove’s front-end has undergone is with respect to the micro-ops cache. It has been increased from 1.5k entries in Skylake to 2.25k entries in Sunny Cove (Ice Lake). This was a much-needed improvement, as AMD already has a micro-op cache of 4k entries with Zen 2. These larger caches should noticeably improve hit rates.
Moving down you have the allocation queue. Both designs send up to six micro-ops to the backend for renaming/reordering and execution.
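The front-end size changes quoted in this section can be put side by side with a few lines of arithmetic. The sketch below only uses figures from the text; `pct_increase` is a hypothetical helper, and the percentages describe capacity growth, not performance.

```python
def pct_increase(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


# (Skylake value, Sunny Cove value), as quoted in the article.
front_end = {
    "L1 data cache (KB)": (32, 48),
    "L2 cache (KB)": (256, 512),
    "micro-op cache (k entries)": (1.5, 2.25),
}

for name, (skylake, sunny_cove) in front_end.items():
    growth = pct_increase(skylake, sunny_cove)
    print(f"{name}: {skylake} -> {sunny_cove} (+{growth:.0f}%)")
```

The L1 data cache and micro-op cache each grow by half, while the L2 cache doubles.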
AMD vs Intel Core Backend: AVX256 vs AVX512
The re-order/retire buffer has been massively overhauled with Sunny Cove. The new 10nm core has a huge 352 entry reorder buffer for micro-op renaming and reallocation (plus retirement). Skylake had a 224 entry reorder buffer, and so does Zen 2.
The Zen 2 core has individual rename buffers for the integer and FP instructions. Furthermore, the retire queue is shared between the FP and integer pipelines, and it’s separate from the rename buffers.
Overall, Zen 2’s dispatch can send 6 micro-ops to the integer rename buffer, four to the FP rename and 8 to the 224 entry independent retire queue which is shared between the two. On the Intel side, there’s a common reorder buffer for INT and FP that receives six micro-ops from the front-end.
Ice Lake’s 10nm Sunny Cove has 10 execution ports: four going to the ALUs, two to store data, and the remaining four to the Address Generation Units (AGUs), split between two loads and two stores. This allows for two loads and two stores per clock cycle, doubling the store throughput of Skylake.
Overall, Sunny Cove can send ten micro-ops from the reorder buffer, a 25% increase over Skylake’s eight. Moving on to the execution units, Ice Lake supports native AVX-512 execution (without division into micro-ops) on the client platform. Sunny Cove can do one 512-bit FMA (fused multiply-add) or two 256-bit FMAs per cycle. The integer execution gets some additional units in the form of MUL, MULHi, and iDIV, but the number of INT instructions executed per cycle is still four. The inclusion of the iDIV unit should help significantly reduce integer division time, which usually takes several dozen clock cycles.
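The equivalence between one 512-bit FMA and two 256-bit FMAs per cycle can be checked with quick arithmetic. The sketch below assumes single-precision (32-bit) elements by default and counts each FMA lane as two floating-point operations (one multiply, one add). `fma_flops_per_cycle` is a hypothetical helper, and the result is a theoretical per-core peak, not a measured figure.

```python
def fma_flops_per_cycle(vector_bits: int, fma_per_cycle: int,
                        element_bits: int = 32) -> int:
    """Peak FLOPs per cycle for a given vector width and FMA issue rate."""
    lanes = vector_bits // element_bits
    # Assumption: one FMA on one lane counts as 2 FLOPs (multiply + add).
    return fma_per_cycle * lanes * 2


one_512 = fma_flops_per_cycle(512, 1)  # one 512-bit FMA per cycle
two_256 = fma_flops_per_cycle(256, 2)  # two 256-bit FMAs per cycle

print("1 x 512-bit FMA:", one_512, "FP32 FLOPs/cycle")
print("2 x 256-bit FMA:", two_256, "FP32 FLOPs/cycle")
```

Both configurations give the same theoretical peak, so under these assumptions the 512-bit path matches, rather than exceeds, the peak FMA rate of the two 256-bit units.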
AnandTech has a neat comparison of Skylake and Sunny Cove:
Comparing the Sunny Cove core to Zen 2, we can see that, like Skylake, Zen 2 lacks native AVX-512 support. However, it does support four 256-bit instructions per cycle (2 MUL and 2 ADD) along with four INT executions in parallel. The original Zen lacked native support for AVX-256 and had to break such instructions into two micro-ops.
Sunny Cove has much deeper load and store buffers than Skylake and Zen 2. It has a total of 128 entries in the load buffer and 72 in the store buffer. Skylake, on the other hand, has 72 entries in the load buffer and 56 in the store buffer. Similar to the older Skylake core, Zen 2 can do two loads and one store per cycle.
Zen 2’s load and store queues are also much smaller, with 44 and 48 entries, respectively. That’s lower than both Skylake and Sunny Cove.
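The load and store queue depths quoted above are easier to compare in one place. The following sketch simply tabulates the figures from the text and orders the cores by combined entries. The `Buffers` type and the idea of summing the two queues are illustrative assumptions, not a performance metric.

```python
from typing import NamedTuple


class Buffers(NamedTuple):
    load_entries: int
    store_entries: int


# Figures as quoted in the article.
cores = {
    "Skylake": Buffers(load_entries=72, store_entries=56),
    "Zen 2": Buffers(load_entries=44, store_entries=48),
    "Sunny Cove": Buffers(load_entries=128, store_entries=72),
}


def total_entries(b: Buffers) -> int:
    """Combined load + store entries (a rough depth indicator only)."""
    return b.load_entries + b.store_entries


# Order cores from shallowest to deepest combined queues.
ranked = sorted(cores, key=lambda name: total_entries(cores[name]))

for name in ranked:
    b = cores[name]
    print(f"{name:<11} load={b.load_entries:>3} store={b.store_entries:>3} "
          f"total={total_entries(b):>3}")
```

This ordering reflects buffer depth only. It says nothing by itself about IPC, which also depends on the rest of the pipeline.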
IPC: Skylake < Zen 2 < Sunny Cove
In essence, the introduction of the 10nm Sunny Cove core has resulted in an IPC increase of 18% (on average) with the 10th Gen Ice Lake chips. This has allowed Team Blue to retain its IPC lead over AMD, though with Zen 2 it’s nowhere near as large as it used to be. Furthermore, poor yields mean that the Sunny Cove-based Ice Lake chips are limited to quad-core designs. The recent introduction of octa-core Zen 2 processors (Renoir) not only nullifies Sunny Cove’s IPC advantage but also leaves the Ice Lake chips far behind in multi-threaded workloads, which form the majority of modern applications.
Diagram Credits go to WikiChip and Hiroshige Goto