Modern microprocessors use several execution models. Three of the most popular are SIMD (Single Instruction Multiple Data), SIMT (Single Instruction Multiple Threads) and SMT (Simultaneous Multithreading). In this post, we look at each of them and at how they differ from one another.
SIMD: Single Instruction Multiple Data
SIMD, or Single Instruction Multiple Data, is the oldest of these three execution models and is used in almost every modern CPU and GPU. As the name suggests, it applies a single instruction to multiple data elements: one instruction is executed by multiple execution units (EUs), each working on a different piece of data. The EUs may be ALUs (Arithmetic Logic Units) or FPUs (Floating Point Units), but the key point is that they all receive the same instruction from a shared control unit and execute it on different data.
This exploits data-level parallelism (not instruction-level parallelism or concurrency) by letting the CPU perform the same operation on different operands. In the example above, many of the lines of code use the same operator: in the first column, all four lines add corresponding elements of two different matrices. With SIMD, all four additions can be performed by a single instruction, so they proceed in parallel rather than one after another. Note that SIMD relies on multiple execution units (vector lanes), not on threads or cores.
SIMT: Single Instruction Multiple Threads
SIMT is the thread equivalent of SIMD. Where SIMD relies on execution units or vector units, SIMT extends the idea to threads: multiple threads execute the same instruction, each on its own data. The main advantage of SIMT is lower instruction overhead, because one instruction fetch and decode serves many threads instead of just one.
Every time the GPU executes an instruction, the instruction must be fetched from memory, decoded and then executed along with its data. Under SIMT, threads are grouped into fixed-size batches (32 threads per warp on NVIDIA GPUs), so all the threads in a group that need the same instruction share a single fetch and decode, and then execute it simultaneously, each on its own data.
SMT: Simultaneous Multithreading
SMT, or Simultaneous Multithreading, lets a single CPU core run multiple hardware threads at the same time. Most desktop and server x86 CPUs run two threads per core, although some designs support more; IBM's POWER8, for example, supports up to eight threads per core. SMT is analogous to an airport luggage-sorting area with two conveyor belts and one person sorting from both.
At times one belt is empty while the other isn't, so the person switches to the busy belt, and vice versa. A CPU core behaves similarly: when a thread has to wait on memory, for example after a cache miss, its share of the core would otherwise sit idle. SMT fills those idle cycles with instructions from the other thread to keep the core's execution units busy. Unlike the single sorter, an SMT core can also issue instructions from both threads in the same cycle.
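The CPU core has to be modified internally to support SMT. This usually involves replicating per-thread state, such as the program counter and the architectural registers (which typically means a larger physical register file), and sharing or partitioning other structures, such as buffers and caches, between the two threads; in some cases the caches are enlarged as well.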
Although modern CPUs use SMT well, there are still times when it adds little. That is mostly the case for workloads in which a single thread already keeps the pipeline busy, leaving few stalls for a second thread to fill. SMT can even hurt performance in applications that make heavy use of shared resources such as registers and caches: the two threads are forced to compete for them, which in many cases reduces performance.