CPUs

Tracing AMD’s Journey to Zen: From FX to Ryzen; Differences Between the Two

Not only did the Bulldozer architecture have a shared L2 cache between every two cores in a module, but it was also slower than the preceding K10 chips. The L2 on the AMD FX-8150 had a latency penalty of more than twice as much compared to Intel’s competing Sandy Bridge Core i7 (lesser is better).

Although the caches on Bulldozer are larger than the ones on Sandy Bridge and Zen, they are considerably slower. Even at 4.6GHz, the FX chip’s L2 cache is 2x slower compared to the 2600K. The shared logic and cache architecture forced AMD to use a larger L2 cache, but the company went with cheaper and slower memory. Cache memory is one of the most expensive components of a CPU and considering how the Bulldozer parts perform, the company couldn’t afford to spend money on a faster, pricier cache reserve.

Narrower Front-End

Bulldozer’s front-end was another bottleneck. The same 4-way fetch and decode were shared between two cores. AMD utilized interleaved multi-threading to track individual threads and give priority to the one that needs most of the work completed. One thread could be worked upon per cycle, meaning that the other sat idle.

Sandy Bridge

Intel’s Sandy Bridge and the new Zen architecture like a standard design have a dedicated front-end for each core. The former fetches six macro-ops per cycle, sending five to the 4-way decoder.

AMD’s Zen architecture also has a four-way decoder, but it can also fetch up to eight instructions from the op-cache. Bulldozer doesn’t even have a micro-op cache.

Windows 7 Scheduler

While all the other bottlenecks are relatively well-known, this last one is often overlooked. Back in the days of Bulldozer, Windows 7 was the de-facto Windows OS. The resource allocation and scheduling of the operating system weren’t suited for the Bulldozer architecture. AMD’s octa-core models were basically four modules consisting of two cores each, sharing resources. However, Windows 8 didn’t see it that way. It saw 8 independent cores running simultaneously.

When the scheduler assigned a new thread 1b (dependent on data from 1a) to a separate module, it created a problem. AMD’s Turbo Boost technology couldn’t be enabled in this scenario. And Windows 7 did this quite often. Considering that the Bulldozer architecture was built for higher clock speeds to offset the low IPC, this was a major drawback.

AMD’s Turbo Core kicks in only when four cores from two modules are active. However, there are many times when the above-explained situation will cause four cores from three modules to be leveraged. Turbo Boost can’t be enabled in this scenario. Tests show that disabling two modules actually yields better performance than stock in many applications.

This is due to the poor IPC and single-threaded performance of the Bulldozer FX processors. Here are the complete diagrams of the Bulldozer, Sandy Bridge, Sky Lake, and Zen architectures to compare side-by-side:

Bulldozer

Sources: Wikichip, AMD, Intel and PC Watch Japan.

Previous page 1 2

Areej Syed

Processors, PC gaming, and the past. I have written about computer hardware for over seven years with over 5000 published articles. I started during engineering college and haven't stopped since. On the side, I play RPGs like Baldur's Gate, Dragon Age, Mass Effect, Divinity, and Fallout. Contact: areejs12@hardwaretimes.com.
Back to top button