DDR4 vs GDDR6 Memory: Which One is Faster?

Computer memory is mainly of two types: main memory (RAM) and graphics memory (VRAM). The former uses DDR4 (and soon DDR5), while the latter uses the GDDR6 standard. But what’s the difference between the two, and which one is faster? In this post, we compare DDR4 against GDDR5 and GDDR6 memory and examine the differences and similarities between them.

DDR4 vs GDDR5 Memory

Before we move on to GDDR6, it’s important to have a look at GDDR5 memory and understand where it fits in the picture. Similar to GDDR4, GDDR5 is based on the DDR3 memory standard. However, it stacks up rather well in comparison to DDR4.

Both RTX 20 series cards connect to 1GB memory chips via 8 (for the 2080) and 12 (for the Ti) 32-bit memory controllers, or channels. GDDR5 and GDDR6 can also operate in what is called clamshell mode, where each channel is split between two memory chips instead of being connected to one. This allows manufacturers to double the memory capacity and makes hybrid memory configurations, like the GTX 660 with its 192-bit bus width, possible.

Figure: A GTX 660 PCB in clamshell mode. A GTX 660 Ti has six memory stacks; the ones on top (packing two chips per stack) run in clamshell mode. This reduces the bus width to 192-bit rather than 256-bit.

All this might put DDR4 memory in a bad light, but this configuration actually suits both setups. CPUs are largely sequential in nature while GPUs run thousands of parallel cores. The former benefits from low latency and slimmer channels, while GPUs require a much higher bandwidth with loose timings.

GDDR5 vs GDDR5X vs GDDR6

Just as the transition from GDDR5 to GDDR6 doubled the burst length and prefetch (from 8 to 16), DDR5 does the same relative to DDR4, with some additional features.


To understand what burst length means, you need to know how memory is accessed. When the CPU or cache requests new data, the address is sent to the memory module and the required row is located, followed by the column (if the row is not already open, a new one is loaded). Keep in mind that there’s a delay after every step. The entire column is then sent across the memory bus in bursts. For DDR4 and GDDR5, each burst was 8 (or 16B). With DDR5 (and GDDR5X/6), it has been increased to as much as 32 (up to 64B). Two transfers happen per clock, at the effective data rate.

GDDR6, like GDDR5X, has a 16n (BL16) prefetch but it’s divided into two channels. Therefore, GDDR6 fetches 32 bytes per channel for a total of 64 bytes just like GDDR5X and twice that of GDDR5. While this doesn’t improve memory transfer speeds over GDDR5X, it allows for more versatility. The burst length is also the same as GDDR5X at 16 (64B).
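The byte counts above follow from multiplying the interface width by the burst length. Here is a minimal sketch of that arithmetic; the helper name `bytes_per_access` is hypothetical, and the widths (32 bits per GDDR5/GDDR5X chip, 16 bits per GDDR6 channel) are assumptions taken from the figures quoted elsewhere in this post.

```python
def bytes_per_access(channel_width_bits, burst_length, channels=1):
    """Bytes delivered by one burst, summed over all channels of a chip."""
    return channel_width_bits * burst_length * channels // 8


# Assumed widths: 32-bit chip interface for GDDR5/GDDR5X,
# two 16-bit channels per chip for GDDR6.
gddr5 = bytes_per_access(32, 8)                # BL8 on a 32-bit interface
gddr5x = bytes_per_access(32, 16)              # BL16 on a 32-bit interface
gddr6_channel = bytes_per_access(16, 16)       # BL16 on one 16-bit channel
gddr6 = bytes_per_access(16, 16, channels=2)   # both channels together

print(gddr5, gddr5x, gddr6_channel, gddr6)
```

Under these assumptions, GDDR6 matches GDDR5X per chip but splits the access into two independent 32-byte halves.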

Like DDR4, both GDDR5 and GDDR6 feature a 16-bank memory configuration.

GDDR6 can fetch the same amount of data as GDDR5X but across two separate channels, allowing it to function either like two smaller chips or like a single wider one. Beyond that, GDDR6 also increases the density to 16Gb (2x compared to GDDR5X, with a JEDEC max of 32Gb) and significantly improves bandwidth by raising the per-pin data rate from 12Gbps to up to 14Gbps (16Gbps max).

DDR4/DDR5/GDDR5= DDR; GDDR5X/GDDR6= QDR

DDR3, DDR4, GDDR5, and the newer DDR5 standards use a double data rate (DDR) transmission scheme. This means that bits (equal to BL) are transferred at the rising and falling edges of the word clock (WCK). With GDDR5X, graphics memory moved to a quad data rate (QDR) mode.

Therefore, data bits toggle four times per cycle (twice as fast as DDR), or four times faster than the word clock (WCK). Both GDDR5X and GDDR6 can run in either DDR or QDR mode. However, when GDDR5X runs in DDR mode, its effective speed drops by half. With GDDR6, you can use both DDR and QDR modes at full speeds of up to 14Gbps. For example, with a GDDR6 module running at 14Gbps, the WCK runs at 7GHz for a DDR device and at 3.5GHz for a QDR device. In both cases, CK, the command and address clock, runs at 1.75GHz, with the command and address lines themselves running at 1.75Gbps.
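The clock figures in this example can be reproduced with simple division. The sketch below uses a hypothetical helper, `gddr6_clocks`; the divide-by-8 ratio between the data rate and CK is an assumption inferred from the 14Gbps example, not something this post states as a general rule.

```python
def gddr6_clocks(data_rate_gbps):
    """Derive clock frequencies (GHz) from the per-pin data rate (Gbps)."""
    return {
        "wck_ddr_ghz": data_rate_gbps / 2,  # DDR: two bits per WCK cycle
        "wck_qdr_ghz": data_rate_gbps / 4,  # QDR: four bits per WCK cycle
        "ck_ghz": data_rate_gbps / 8,       # assumed ratio, from the example
    }


clocks = gddr6_clocks(14)
print(clocks)
```

For a 14Gbps part, this gives a 7GHz WCK in DDR mode, 3.5GHz in QDR mode, and a 1.75GHz CK, matching the numbers quoted.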

GDDR6 vs GDDR6X

NVIDIA is the first vendor to adopt GDDR6X memory, using it in the higher-end RTX 30 series GPUs. It increases the per-pin bandwidth from 14Gbps to 21Gbps and the overall bandwidth to 1008GB/s, even more than a 3072-bit wide HBM2 configuration.

                      GDDR6X        GDDR6         GDDR5X        HBM2
B/W Per Pin           21 Gbps       14 Gbps       11.4 Gbps     1.7 Gbps
Chip capacity         1 GB (8 Gb)   1 GB (8 Gb)   1 GB (8 Gb)   4 GB (32 Gb)
No. Chips/KGSDs       12            12            12            3
B/W Per Chip/Stack    84 GB/s       56 GB/s       45.6 GB/s     217.6 GB/s
Bus Width             384-bit       384-bit       352-bit       3072-bit
Total B/W             1008 GB/s     672 GB/s      548 GB/s      652.8 GB/s
DRAM Voltage          1.35 V        1.35 V        1.35 V        1.2 V
Data Rate             QDR           QDR           DDR           DDR
Signaling             PAM4          Binary        Binary        Binary
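The bandwidth rows in the table follow from the per-pin rate and the interface width. As a rough sketch (the function name `bandwidth_gb_s` is hypothetical, and the 32-bit per-chip width is an assumption based on the 32-bit memory controllers described earlier), the arithmetic looks like this:

```python
def bandwidth_gb_s(per_pin_gbps, width_bits):
    """Peak bandwidth in GB/s: per-pin rate times pin count, 8 bits per byte."""
    return per_pin_gbps * width_bits / 8


# Per-chip figures, assuming a 32-bit interface per GDDR chip.
gddr6x_chip = bandwidth_gb_s(21, 32)
gddr6_chip = bandwidth_gb_s(14, 32)

# Totals over the full bus width listed in the table.
gddr6x_total = bandwidth_gb_s(21, 384)
gddr6_total = bandwidth_gb_s(14, 384)
hbm2_total = bandwidth_gb_s(1.7, 3072)

print(gddr6x_chip, gddr6_chip)
print(gddr6x_total, gddr6_total, round(hbm2_total, 1))
```

This shows how a narrow but fast GDDR6X bus can outrun a much wider but slower HBM2 interface.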

The secret sauce behind GDDR6X memory is PAM4 encoding. In simple words, it doubles the data transferred per clock compared to GDDR6, which uses NRZ or binary coding.

With NRZ, you had just two states, 0 and 1. PAM4 doubles that to four: 00, 01, 10, and 11. Using these four states, you can send four bits of data per cycle (two per edge). The drawback of PAM4 is its high cost, especially at the higher frequencies of GDDR6X. This is why no one had tried to implement it in consumer memory before.
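To make the difference concrete, here is a small illustrative sketch of how the same bit stream maps to symbols under each scheme. The function names and the level assignment (00 to 0, 01 to 1, 10 to 2, 11 to 3) are assumptions chosen for readability; real PAM4 links may map bit pairs to voltage levels differently.

```python
# Hypothetical mapping of bit pairs to four signal levels.
PAM4_LEVELS = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}


def nrz_symbols(bits):
    """NRZ: one symbol per bit, two possible levels."""
    return list(bits)


def pam4_symbols(bits):
    """PAM4: one symbol per pair of bits, four possible levels."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


bits = [1, 0, 1, 1, 0, 0, 0, 1]
print(nrz_symbols(bits))   # 8 symbols
print(pam4_symbols(bits))  # 4 symbols carrying the same 8 bits
```

The same eight bits need half as many symbols with PAM4, which is where the doubling per clock comes from.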

There is one downside. While GDDR6 has a burst length of 16 (BL16), GDDR6X is limited to BL8. However, because PAM4 signaling carries two bits per transfer, each of its 16-bit channels still delivers 32 bytes per operation. Therefore, most of the improvement in bandwidth comes from the higher operating frequency of GDDR6X. Keep in mind that GDDR6X is not a JEDEC standard but a proprietary solution from Micron.

High Bandwidth Memory (HBM)

First popularized by AMD’s Fiji graphics cards, high bandwidth memory or HBM is a low-power memory standard with a wide bus. HBM achieves substantially higher bandwidth than GDDR5 while drawing much less power in a small form factor.

HBM adopts clocks as low as 500 MHz to conform to a low TDP target and makes up for the loss in bandwidth with a massive bus (usually 4096 bits). AMD’s Radeon RX Vega cards are the best example of an HBM2 implementation in consumer hardware. HBM2 solved the 4GB limit of HBM1, but limited yields coupled with a memory shortage prevented AMD from capitalizing on it on the consumer GPU front.