Computer memory comes in two main types: system memory (RAM) and graphics memory (VRAM). The former uses DDR4, while the latter uses the GDDR5 (and GDDR6) standard. But what’s the difference between the two? In this post, we compare DDR4 vs GDDR5 (and GDDR6) and examine the differences and similarities between them.
DDR4 Vs GDDR5
- DDR4 runs at a lower voltage than GDDR5, 1.2 volts to be exact. GDDR5, on the other hand, can go as high as 1.5v. This is because the latter is based on the DDR3 memory standard which also has a stock voltage of 1.5v.
- As far as memory frequencies are concerned, DDR4 runs at roughly the same clock as GDDR5X and GDDR6 (~1750 to 1800 MHz), but the way graphics memory works means that the effective data rate is 4x as high (1750 x 4 = 7,000 MHz). More on that below.
- GDDR5X brings the voltage down to 1.35v, all the while increasing the per-pin bandwidth to 16Gbit/s.
- Both DDR4 and DDR3 use a 64-bit memory controller per channel, which results in a 128-bit bus for dual-channel memory and a 256-bit bus for quad-channel. GDDR5 memory, on the other hand, leverages a puny 32-bit controller per channel.
- While CPU memory configurations have wider but fewer channels, GPUs can support any number of 32-bit memory channels. This is the reason many high-end GPUs like the GeForce RTX 2080 Ti and RTX 2080 have a 352-bit and 256-bit bus width, respectively.
Both of these RTX 20 series cards are connected to 1GB memory chips via 8 (for the 2080) and 11 (for the Ti) 32-bit memory controllers or channels. GDDR5 and GDDR6 can also operate in what is called clamshell mode, where each channel, instead of being connected to one memory chip, is split between two. This also allows manufacturers to double the memory capacity and makes hybrid memory configurations like the GTX 660 with its 192-bit bus width possible.
- Another core difference between DDR4 and GDDR5/6 memory involves the I/O cycles. Just like SATA, DDR4 can only perform one operation (read or write) in one cycle. GDDR5 and GDDR6 can handle input (read) as well as output (write) on the same cycle, essentially doubling the bus width.
- All this might put DDR4 memory in a bad light, but this configuration actually suits both setups. CPUs are largely sequential in nature while GPUs run thousands of parallel cores. The former benefits from low latency and slimmer channels, while GPUs require a much higher bandwidth with loose timings.
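The bus-width and data-rate arithmetic from the list above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the DDR4-3200 figure (3.2 Gbps per pin) is an assumed data rate that does not come from the comparison above.

```python
def bus_width_bits(channels: int, channel_width_bits: int) -> int:
    """Total bus width is the number of channels times the width of each."""
    return channels * channel_width_bits


def peak_bandwidth_gb_s(gbps_per_pin: float, bus_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin data rate times bus width, over 8 bits per byte."""
    return gbps_per_pin * bus_bits / 8


# 1750 MHz memory clock x 4 = 7,000 MHz effective, i.e. 7 Gbps per pin.
gddr_effective_gbps = 1750 * 4 / 1000

# CPU side: two 64-bit channels. GPU side: eight 32-bit channels.
ddr4_dual_channel = bus_width_bits(2, 64)
gpu_eight_channel = bus_width_bits(8, 32)

# Assumption for illustration only: DDR4-3200, i.e. 3.2 Gbps per pin.
ddr4_bw = peak_bandwidth_gb_s(3.2, ddr4_dual_channel)
gpu_bw = peak_bandwidth_gb_s(gddr_effective_gbps, gpu_eight_channel)

print(f"DDR4 dual-channel: {ddr4_dual_channel}-bit bus, {ddr4_bw:.1f} GB/s")
print(f"GPU, 8 channels:   {gpu_eight_channel}-bit bus, {gpu_bw:.1f} GB/s")
```

Even with similar base clocks, the wider bus and higher effective data rate put the graphics memory several times ahead in peak bandwidth, which is the trade-off the last bullet describes.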
GDDR5 vs GDDR5X vs GDDR6
GDDR6 was preceded by GDDR5X, which was more of a half-generation upgrade. GDDR5X features transfer rates of up to 14 Gbit/s per pin, twice as much as GDDR5, while also reducing the voltage from 1.5v to 1.35v.
This was achieved by using a higher prefetch. Unlike GDDR5, GDDR5X has a 16n prefetch architecture (vs 8n on GDDR5). This allows it to fetch 64 bytes (512 bits) of data per cycle (per channel), while GDDR5 was limited to 32 bytes (256 bits). Similarly, GDDR5X also has a higher burst length of 16 (like DDR5), which allows the memory to fetch up to a 64B cache line per transfer. GDDR5 and DDR4 are limited to a burst length of 8 (or 32B x 2 per cycle) and an 8n prefetch.
To understand what burst length means, you need to know how memory is accessed. When the CPU or cache requests new data, the address is sent to the memory module, the required row is located, and then the column is located (if the row is not present, a new row is loaded). Keep in mind that there’s a delay after every step. Then the entire column is sent across the memory bus, in bursts. For DDR4 and GDDR5, each burst was 8 (or 16B). With DDR5 (and GDDR5X/6), it has been increased to as much as 32 (up to 64B). There are two bursts per clock and they happen at the effective data rate.
GDDR6, like GDDR5X, has a 16n prefetch but it’s divided into two channels. So GDDR6 fetches 32 bytes per channel for a total of 64 bytes just like GDDR5X and twice that of GDDR5. While this doesn’t improve memory transfer speeds over GDDR5X, it allows for more versatility. The burst length is also the same as GDDR5X at 16 (64B).
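The prefetch figures quoted above follow from a simple product: prefetch depth times interface width. Here is a minimal sketch of that arithmetic, assuming a 32-bit interface per GDDR5/GDDR5X chip and two 16-bit channels per GDDR6 chip. The function name is hypothetical.

```python
def bytes_per_access(prefetch_n: int, io_width_bits: int, channels: int = 1) -> int:
    """Bytes fetched per access: prefetch depth x interface width x channels, in bytes."""
    return prefetch_n * io_width_bits * channels // 8


# Assumed widths: 32-bit per chip for GDDR5/GDDR5X, 2 x 16-bit channels for GDDR6.
gddr5 = bytes_per_access(8, 32)                      # 8n prefetch
gddr5x = bytes_per_access(16, 32)                    # 16n prefetch
gddr6_per_channel = bytes_per_access(16, 16)         # 16n prefetch, one channel
gddr6_total = bytes_per_access(16, 16, channels=2)   # both channels together

print(f"GDDR5:  {gddr5} bytes")
print(f"GDDR5X: {gddr5x} bytes")
print(f"GDDR6:  {gddr6_per_channel} bytes per channel, {gddr6_total} bytes total")
```

Under these assumptions GDDR6 matches GDDR5X in total bytes per access, but delivers them as two independent 32-byte accesses.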
GDDR6 can fetch the same amount of data as GDDR5X but in two separate channels, allowing it to function either like two smaller chips or like a single wider one.
Other than that, GDDR6 also increases the density to 16Gb (2x compared to GDDR5X, with a JEDEC max of 32Gb) and significantly improves bandwidth by raising the per-pin data rate from 12Gbps to up to 14Gbps (16Gbps max).
GDDR6 vs GDDR6X
NVIDIA is the first vendor to opt for GDDR6X memory in its RTX 30 series GPUs, at least the higher-end ones. It increases the per-pin bandwidth from 14Gbps to 21Gbps and the overall bandwidth to 1008GB/s, even more than a 3072-bit wide HBM2 stack.
| |GDDR6X|GDDR6|GDDR5X|HBM2|
|---|---|---|---|---|
|B/W Per Pin|21 Gbps|14 Gbps|11.4 Gbps|1.7 Gbps|
|Chip capacity|1 GB (8 Gb)|1 GB (8 Gb)|1 GB (8 Gb)|4 GB (32 Gb)|
|B/W Per Chip/Stack|84 GB/s|56 GB/s|45.6 GB/s|217.6 GB/s|
|Total B/W|1008 GB/s|672 GB/s|548 GB/s|652.8 GB/s|
|DRAM Voltage|1.35 V|1.35 V|1.35 V|1.2 V|
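The per-chip and total figures in the table can be reproduced from the per-pin rates. The sketch below assumes a 32-bit interface per GDDR chip and a 1024-bit interface per HBM2 stack. The device counts (12 chips, 3 stacks) are inferred from the totals and the 3072-bit HBM2 figure mentioned above, and the column labels are assumed from the per-pin rates. None of these are stated in the table itself.

```python
# (per-pin rate in Gbps, interface width in bits per chip/stack, number of chips/stacks)
# Widths, device counts, and labels are assumptions made for this illustration.
CONFIGS = {
    "GDDR6X": (21.0, 32, 12),
    "GDDR6": (14.0, 32, 12),
    "GDDR5X": (11.4, 32, 12),
    "HBM2": (1.7, 1024, 3),
}


def per_device_gb_s(gbps_per_pin: float, width_bits: int) -> float:
    """Bandwidth of one chip or stack in GB/s."""
    return gbps_per_pin * width_bits / 8


def total_gb_s(gbps_per_pin: float, width_bits: int, devices: int) -> float:
    """Aggregate bandwidth across all chips or stacks in GB/s."""
    return per_device_gb_s(gbps_per_pin, width_bits) * devices


for name, (gbps, width, devices) in CONFIGS.items():
    per_device = per_device_gb_s(gbps, width)
    total = total_gb_s(gbps, width, devices)
    print(f"{name:7s} {per_device:6.1f} GB/s per device, {total:7.1f} GB/s total")
```

With these assumptions the 11.4 Gbps column works out to 547.2 GB/s, which the table rounds to 548 GB/s.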
The secret sauce behind GDDR6X memory is PAM4 encoding. In simple words, it doubles the data transfer per clock compared to GDDR6 which uses NRZ or binary coding.
With NRZ, you had just two states, 0 and 1. PAM4 doubles it to four: 00, 01, 10, and 11. Using these four states, you can send four bits of data per cycle (two per edge). The drawback with PAM4 is the high price, especially at the higher frequencies of GDDR6X. This is the reason why no one has tried to implement it in consumer memory before.
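To make the NRZ vs PAM4 comparison concrete, here is a toy encoder that packs two bits into each four-level symbol. It is a sketch only: the function names are made up, and the plain binary mapping of bit pairs to levels 0 through 3 is an assumption for illustration, since real hardware may assign levels differently.

```python
def nrz_encode(bits: list[int]) -> list[int]:
    """NRZ: one two-level symbol per bit."""
    return list(bits)


def pam4_encode(bits: list[int]) -> list[int]:
    """PAM4: one four-level symbol (0..3) per pair of bits.

    Assumed mapping for illustration: 00 -> 0, 01 -> 1, 10 -> 2, 11 -> 3.
    """
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), 2)]


def pam4_decode(symbols: list[int]) -> list[int]:
    """Inverse of pam4_encode: split each symbol back into two bits."""
    bits: list[int] = []
    for symbol in symbols:
        bits.extend([symbol >> 1, symbol & 1])
    return bits


data = [1, 0, 0, 1, 1, 1, 0, 0]
nrz_symbols = nrz_encode(data)
pam4_symbols = pam4_encode(data)

print("NRZ symbols: ", nrz_symbols, "count:", len(nrz_symbols))
print("PAM4 symbols:", pam4_symbols, "count:", len(pam4_symbols))
```

The same eight bits need eight NRZ symbols but only four PAM4 symbols, which is where the doubling of data per transfer comes from.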
There is one downside, though. While GDDR6 has a burst length of 16 (BL16), GDDR6X is limited to BL8, but because of PAM4 signaling, each of its 16-bit channels will also deliver 32 bytes per operation. Therefore, most of the improvement in bandwidth has come from the higher operating frequency of GDDR6X. Keep in mind that GDDR6X is not a JEDEC standard, but rather a proprietary solution from Micron.
High Bandwidth Memory (HBM)
First popularized by AMD’s Fiji graphics cards, high bandwidth memory or HBM is a low-power memory standard with a wide bus. HBM achieves substantially higher bandwidth compared to GDDR5 while drawing much less power in a small form factor.
HBM adopts clocks as low as 500 MHz to conform to a low TDP target and makes up for the loss in bandwidth with a massive bus (usually 4096 bits). AMD’s Radeon RX Vega cards are the best example of an HBM2 implementation in consumer hardware. HBM2 solved the 4GB limit of HBM1, but limited yields coupled with a memory shortage prevented AMD from capitalizing on the consumer GPU front.