SIMD vs SIMT vs SMT: What’s the Difference Between these CPU Execution Models?

Multiple execution models utilized in modern microprocessors. Out of these, three are most popular: SIMD (Single Instruction Multiple Data), SIMT (Single Instruction Multiple Threads) and SMT (Simultaneous Multithreading). In this post, we have a look at each of them and distinguish them from each other.

SIMD: Single Instruction Multiple Data

SIMD or Single Instruction Multiple Data is the earliest execution model used in almost every modern CPU and GPU. As the name suggests, it works by employing a single instruction on multiple data sets. What that means is: One particular instruction is executed by multiple Execution units on multiple data sets. The EUs may be ALUs (Arithmetic Logic Units) or FPUs (Floating Point Units), but the key point here is that they all receive the same instruction from a shared Control Unit and then execute it on multiple different data sets.

This improves data-level parallelism (not instruction level or concurrency) by letting the CPU perform identical tasks on different operands. In the above example, you can see that the lines of code include many functions that require the same operator. In the first column, all four lines basically involve the addition to two different matrices. SIMD allows all four to be executed in the same clock cycle. One important thing to note here is that SIMD uses execution units, not threads or cores.

SIMT: Single Instruction Multiple Threads

SIMT is the thread equivalent of SIMD. Where the latter uses Execution Units or Vector Units, SIMT expands it to leverage threads. In SIMT, multiple threads perform the same instruction on different data sets. The main advantage of SIMT is that it reduces the latency that comes with instruction prefetching.

SIMD is generally used in CPUs while SIMT is used in GPUs

Every time the GPU needs to execute a particular instruction, the data and instructions are fetched from the memory and then decoded and executed. In this case, all the data sets (up to a certain limit) that need the same instruction for execution are prefetched and executed simultaneously using the various threads available to the processor.

SMT: Simultaneous Multi-Threading

SMT or Simultaneous Multithreading allows a CPU core to leverage multiple threads at a time. Although theoretically, you can have up to 8 threads per core via SMT, it’s only feasible to have two. SMT is analogous to having two cargo belts at the airport luggage sorting, and one person sorting them.

There will be times when one belt is empty while the other isn’t. In this instance, the person will switch to the other belt and vise versa. This is similar to how SMT operates in CPUs. There are times when there’s memory delay or a cache miss, at this time, the CPU core will stay idle. SMT aims to take advantage of this to fully saturate the CPU time.

The CPU core architecture needs to be modified internally to support SMT. This usually involves increasing the register size (and in some cases the cache size as well) to allow the distribution of resources among the two threads.

Although modern CPUs leverage SMT quite well, there are still times when it’s redundant. That is mostly in latency intensive tasks where there is little to no delay in the pipeline. SMT can even hamper performance in applications that are resource intensive (register and cache). Here the two threads are forced to compete against one another for resources, leading to reduced performance in many cases.



Computer Engineering dropout (3 years), writer, journalist, and amateur poet. I started Techquila while in college to address my hardware passion. Although largely successful, it suffered from many internal weaknesses. Left and now working on Hardware Times, a site purely dedicated to. Processor architectures and in-depth benchmarks. That's what we do here at Hardware Times!
Back to top button