Intel’s adoption of a hybrid core architecture has significantly changed the roadmap of the PC chipmaking industry. More and more applications are now taking advantage of the “secondary” low-power E-cores to boost performance as well as efficiency. Of course, this approach has its own shortcomings, which Intel plans to iron out in the coming years. The first, most radical change involves replacing hyper-threading with a more efficient pseudo-multi-threaded solution: Rentable Units.
As far as the applications running on your PC are concerned, there is no difference between physical cores and the logical cores born out of hyper-threading: they see them all as equal. On the hardware side, enabling hyper-threading requires a few additional registers on each core to keep track of the state of the second logical thread. The program counter is one such register.
At any given instant, an 8-core CPU with hyper-threading still has only eight threads actually executing. The reason is that the caches (L1 and L2) and the execution units (ALUs) on each core can only work on one thread at a time. So what, then, does hyper-threading, also known as Simultaneous Multi-Threading (SMT), do on a CPU?
Hyper-threading ensures that the CPU cores (mostly the execution units) don’t slack off. In the above figure, you can see the core utilization with and without hyper-threading. As you can see, the logical or hyper-thread takes over when the primary thread is stalled or waiting for an input, thereby utilizing the otherwise wasted CPU time, also known as bubbles.
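The bubble-filling idea can be illustrated with a toy simulation (entirely my own sketch, not Intel’s design): model each thread as a sequence of cycles where it is either executing or stalled, and let a second logical thread use the cycles the primary thread wastes.

```python
# Toy model: 'X' = the thread executes this cycle, '-' = it is stalled
# (e.g. waiting on memory). A core's execution units run one thread per cycle.
def utilization(primary, secondary=None):
    """Fraction of cycles the core's execution units do useful work."""
    busy = 0
    for i, slot in enumerate(primary):
        if slot == 'X':
            busy += 1                      # primary thread executes
        elif secondary and secondary[i] == 'X':
            busy += 1                      # hyper-thread fills the bubble
    return busy / len(primary)

t0 = list("XX--XX--XX--")   # primary thread stalls half the time
t1 = list("--XX--XX--XX")   # second thread happens to fill the gaps

print(utilization(t0))      # 0.5 -> without hyper-threading
print(utilization(t0, t1))  # 1.0 -> stalls filled by the logical thread
```

In this idealized case the second thread fills every bubble; as the next paragraph notes, real workloads rarely line up so neatly.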
Hyper-threading is often compared to a dock worker collecting baggage from two conveyor belts at once. He can do more work if the gaps between bags on one belt are filled by bags coming from the second belt. Of course, this doesn’t always work as planned. Not all workloads have the stalls needed to make hyper-threading relevant, in which case the two threads start competing for resources, lowering performance in the process.
An Intel patent is the first to officially hint at the coming of the reapers, err, I mean the Rentable Units. The patent calls the RU the “Instruction Processing Circuit,” meaning that it may be called something else at release. In the below figure, you can see the difference between hyper-threading and Rentable Units and why the latter is especially useful on hybrid-core processors.
On a hybrid-core CPU, the more demanding tasks are assigned to the P-cores, leaving the rest to the E-cores. Being considerably faster, a P-core often finishes its task much earlier than its E-core counterpart, then sits idle for a notable period (a bubble).
The Rentable Unit splits the incoming thread of instructions into two partitions and assigns each to a different core based on its complexity. In a simple example, the longer, more complex half would be assigned to the P-core, while the simpler half would be sent to the E-core, whichever split is more efficient.
Both partitions would be executed simultaneously on the two cores. As you can see, this approach is much more flexible than hyper-threading (thanks, Tom, you were right). Without going into too much detail, Rentable Units will employ various timers and counters to track the utilization of each P- and E-core, passing the next batch of instructions to whichever core is idle and best suited.
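The split-by-complexity idea described above can be sketched in a few lines. Everything here is an illustrative assumption on my part, not from the patent: the per-instruction `cost` values and the 2x P-core speed advantage are made up purely to show why running the two halves in parallel beats running the whole thread on one core.

```python
P_CORE_SPEED = 2.0   # assumed: the P-core retires work twice as fast
E_CORE_SPEED = 1.0

def split_by_complexity(instructions):
    """Partition one incoming thread into a complex half and a simple half."""
    ordered = sorted(instructions, key=lambda ins: ins["cost"], reverse=True)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]   # (complex half, simple half)

def assign(thread):
    """Send the complex half to the P-core, the simple half to the E-core,
    and report how long each core is busy (both run simultaneously)."""
    complex_part, simple_part = split_by_complexity(thread)
    return {
        "P-core": sum(i["cost"] for i in complex_part) / P_CORE_SPEED,
        "E-core": sum(i["cost"] for i in simple_part) / E_CORE_SPEED,
    }

thread = [{"cost": c} for c in (4, 3, 2, 1)]
print(assign(thread))   # {'P-core': 3.5, 'E-core': 3.0}
```

With these made-up numbers, the two partitions finish in about 3.5 time units running in parallel, versus 5 if the whole thread ran serially on the P-core alone.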
Different execution paths will be weighted to find the most optimal outcome, that is, which core is best suited for the task given its current resource usage and the partition’s complexity.
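One way to picture this weighting (the patent does not disclose the actual scoring, so the weights and terms below are purely hypothetical) is a cost function that penalizes busy cores and rewards matching partition complexity to core type:

```python
# Hypothetical scoring sketch: lower score = better fit for the next partition.
# Both utilization and complexity are normalized to the range 0..1.
def score(core_utilization, partition_complexity, is_p_core,
          w_util=0.6, w_match=0.4):
    """Penalize busy cores; complex partitions prefer the P-core,
    simple partitions prefer the E-core."""
    # mismatch term: near 0 when the complexity suits the core type
    mismatch = (1 - partition_complexity) if is_p_core else partition_complexity
    return w_util * core_utilization + w_match * mismatch

# A complex partition (0.9): the mostly idle P-core beats the busy E-core.
candidates = {
    "P-core": score(core_utilization=0.2, partition_complexity=0.9, is_p_core=True),
    "E-core": score(core_utilization=0.7, partition_complexity=0.9, is_p_core=False),
}
print(min(candidates, key=candidates.get))   # P-core
```

Flip the inputs (a simple partition, a saturated P-core) and the same function would steer the work to the E-core instead.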
This implementation will come with its own challenges. For example, keeping track of the various thread buffers will require a fair number of registers and a fair amount of cache. That said, this method looks much cleaner and more efficient than existing hyper-threading designs.