Computer memory is mainly of two types: main memory (RAM) and graphics memory (VRAM). The former leverages DDR4 (and soon DDR5), while the latter makes use of the GDDR6 standard. But what’s the difference between the two, and which one is faster? In this post, we compare DDR4 against GDDR5 and GDDR6 memory and examine the differences and similarities between them.
DDR4 vs GDDR5 Memory
Before we move on to GDDR6, it’s important to have a look at GDDR5 memory and understand where it fits in the picture. Similar to GDDR4, GDDR5 is based on the DDR3 memory standard. However, it stacks up rather well in comparison to DDR4:
- DDR4 runs at a lower voltage than GDDR5, 1.2 volts to be exact. GDDR5, on the other hand, can go as high as 1.5v. This is because the latter is based on the DDR3 memory standard which also has a stock voltage of 1.5v.
- Both DDR4 and DDR3 use a 64-bit memory controller per channel, which results in a 128-bit bus for dual-channel memory and a 256-bit bus for quad-channel. GDDR5 memory, on the other hand, leverages a puny 32-bit controller per channel.
- While CPU memory configurations have wider but fewer channels (one per DIMM for DDR3/DDR4), GPUs can support any number of 32-bit memory channels. This is the reason high-end GPUs like the GeForce RTX 2080 Ti and RTX 2080 have bus widths of 352 bits and 256 bits, respectively.
Both RTX 20 series cards are connected to 1GB memory chips via 8 (for the 2080) and 11 (for the Ti) 32-bit memory controllers or channels. GDDR5 and GDDR6 can also operate in what is called clamshell mode, where each channel, instead of being connected to one memory chip, is split between two. This allows manufacturers to double the memory capacity and makes hybrid memory configurations like the GTX 660 with its 192-bit bus width possible.
- Another core difference between DDR4 and GDDR5 memory involves the I/O cycles. Just like DDR3, DDR4 can only perform one operation (read or write) in one cycle. GDDR5 can handle input (read) as well as output (write) on the same cycle, essentially doubling the effective bus width.
- There’s also the matter of the burst length (data transferred per transmission) and prefetch. Both DDR4 and GDDR5 have a BL of 8 and a prefetch of 8n (32 bytes per cycle on a 32-bit GDDR5 channel).
All this might put DDR4 memory in a bad light, but each configuration actually suits its own setup. CPUs are largely sequential in nature, while GPUs run thousands of parallel cores. The former benefits from low latency and a few wide channels, while GPUs need much higher bandwidth and can tolerate looser timings.
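The relationship between channel count, bus width, and peak bandwidth described above can be sketched in a few lines of Python. The function names are purely illustrative, and the 14 Gbps per-pin rate in the last line is an assumption borrowed from the GDDR6 figures later in this post.

```python
def bus_width_bits(channels: int, channel_width_bits: int) -> int:
    """Total bus width is the number of channels times the width of each."""
    return channels * channel_width_bits


def peak_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: every data pin moves gbps_per_pin gigabits per second."""
    return bus_bits * gbps_per_pin / 8


# Dual-channel DDR4: two 64-bit channels.
ddr4_dual = bus_width_bits(2, 64)

# A GPU with eight 32-bit GDDR channels (the RTX 2080 layout mentioned above).
gpu_256 = bus_width_bits(8, 32)

print(ddr4_dual, gpu_256)                  # 128 256
print(peak_bandwidth_gbs(gpu_256, 14.0))   # 448.0 (assumed 14 Gbps per pin)
```

The point of the sketch is that a GPU reaches a wide bus by adding many narrow channels, while a CPU platform gets there with a couple of wide ones.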
GDDR5 vs GDDR5X vs GDDR6
- GDDR6 was preceded by GDDR5X, which was more of a half-generation upgrade. GDDR5X features transfer rates of up to 14 Gbps per pin, twice as much as GDDR5, while also reducing the voltage from 1.5v to 1.35v.
- This was achieved by using a higher prefetch. Unlike GDDR5, GDDR5X has a 16n prefetch architecture (vs 8n on G5). This allows it to fetch 64-bytes (512-bits) of data per cycle (per channel) while GDDR5 was limited to 32-bytes (256-bits).
- Similarly, GDDR5X also has a higher burst length of 16 which allows the memory to fetch up to a 64B cache line per transfer. GDDR5 and DDR4 are limited to a burst length of 8 (or 32B x 2 per cycle) and an 8n prefetch.
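As a quick check on the prefetch figures above, here is a minimal sketch that computes the bytes fetched per access from the prefetch depth. It assumes a 32-bit channel, as described earlier for GDDR5; the function name is hypothetical.

```python
def bytes_per_access(prefetch_n: int, channel_width_bits: int = 32) -> int:
    """Bytes delivered per access: prefetch depth times channel width, in bytes."""
    return prefetch_n * channel_width_bits // 8


gddr5_bytes = bytes_per_access(8)    # 8n prefetch
gddr5x_bytes = bytes_per_access(16)  # 16n prefetch

print(gddr5_bytes, gddr5x_bytes)  # 32 64
```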
Just as the transition from GDDR5 to GDDR6 doubled the burst length and prefetch (from 8 to 16), DDR5 does the same relative to DDR4 and adds some additional features:
- Similar to GDDR5, DDR5 leverages two independent 32-bit memory controllers/channels per DIMM. Therefore, every DDR5 DIMM is dual-channel while a pair results in a quad-channel configuration.
- In addition to this, each DDR5 channel has a burst length (BL) and prefetch of 16, allowing each 32-bit channel to deliver 64 bytes per burst, the same amount as an entire 64-bit DDR4 DIMM at BL8. There’s also support for 32-length mode, which allows up to 64-byte cache line fetch with just one transfer.
- DDR5 will have JEDEC speeds of up to 8,400 MT/s, while DDR4 is limited to 3,200 MT/s. Note that vendors these days sell DDR4 kits rated at 4,000 MT/s (often advertised as 4000MHz), but those are actually overclocked.
- DDR5 has a 32-bank structure, with 8 bank groups (four per BG), twice as much as DDR4. This effectively doubles the memory access availability. To complement this, DDR5 also adopts the Same Bank Refresh Function. Unlike DDR4, this allows the next-gen memory to access other memory banks while the rest are operating or refreshing.
- In comparison, GDDR5X and GDDR6 have a 16-bank structure similar to DDR4 while GDDR5 was limited to just eight.
- Like DDR4, the I/O bus will interact with two BGs (per channel) simultaneously via a MUX, resulting in a higher effective prefetch and transfer rate.
- DDR5 also increases the maximum memory density from 16Gb to 64Gb, and the VDD and VDDQ voltages have gone down from 1.2v to 1.1v to reduce the power draw.
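The bank counts and JEDEC speed ceilings in the list above translate into numbers as follows. This is a rough sketch with hypothetical helper names; it assumes a DDR4 layout of four bank groups with four banks each, and a 64-bit data path per DIMM for both generations (two 32-bit channels in the case of DDR5).

```python
def bank_count(bank_groups: int, banks_per_group: int) -> int:
    """Total banks is the number of bank groups times the banks in each group."""
    return bank_groups * banks_per_group


def dimm_bandwidth_gbs(mega_transfers_per_s: int, data_bits: int = 64) -> float:
    """Peak DIMM bandwidth in GB/s for a given transfer rate in MT/s."""
    return mega_transfers_per_s * data_bits / 8 / 1000


ddr5_banks = bank_count(8, 4)   # 8 bank groups, 4 banks per group
ddr4_banks = bank_count(4, 4)   # assumption: 4 bank groups, 4 banks per group

print(ddr5_banks, ddr4_banks)       # 32 16
print(dimm_bandwidth_gbs(3200))     # DDR4 at the 3,200 MT/s JEDEC ceiling
print(dimm_bandwidth_gbs(8400))     # DDR5 at the 8,400 MT/s JEDEC ceiling
```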
To understand what burst length means, you need to know how memory is accessed. When the CPU or cache requests new data, the address is sent to the memory module, the required row is located, and then the column is located (if the row is not already open, a new row is loaded first). Keep in mind that there’s a delay after every step. The entire column is then sent across the memory bus in bursts. For DDR4 and GDDR5, each burst was 8 (or 16B). With DDR5 (and GDDR5X/6), it has been increased to as much as 32 (up to 64B). There are two bursts per clock, and they happen at the effective data rate.
GDDR6, like GDDR5X, has a 16n (BL16) prefetch but it’s divided into two channels. Therefore, GDDR6 fetches 32 bytes per channel for a total of 64 bytes just like GDDR5X and twice that of GDDR5. While this doesn’t improve memory transfer speeds over GDDR5X, it allows for more versatility. The burst length is also the same as GDDR5X at 16 (64B).
GDDR6 can fetch the same amount of data as GDDR5X but across two separate channels, allowing it to function like two smaller chips as well as a single wider one. Other than that, GDDR6 also increases the density to 16Gb (2x compared to GDDR5X, with a JEDEC max of 32Gb) and significantly improves bandwidth by raising the per-pin data rate from 12Gbps to up to 14Gbps (16Gbps max).
DDR4/DDR5/GDDR5= DDR; GDDR5X/GDDR6= QDR
DDR3, DDR4, GDDR5, and the newer DDR5 standards use a double data rate or DDR data transmission scheme. This means that bits (equal to BL) are transferred at the rising and falling edge of the word clock (WCK). With GDDR5X, graphics memory moved to a quad data rate (QDR) mode.
Therefore, data bits toggle four times per cycle (twice as fast as DDR), or four times faster than the word clock (WCK). Both GDDR5X and GDDR6 can be run in either DDR or QDR mode. However, when running the former in DDR mode, the effective speed drops to half. With GDDR6, you can use both DDR and QDR modes at full speeds of up to 14 Gbps. For example, with a GDDR6 module running at 14 Gbps, WCK will run at 7 GHz for a DDR device and at 3.5 GHz for a QDR device. In both cases, CK, the command and address clock, will run at 1.75 GHz, with the command and address lines themselves running at 1.75 Gbps.
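The clock relationships in the 14 Gbps example can be written out as a small helper. This is only a sketch of the arithmetic in the paragraph above (WCK at half the data rate in DDR mode, a quarter in QDR mode, and CK at one eighth); the function and key names are made up for illustration.

```python
def gddr6_clocks(data_rate_gbps: float) -> dict:
    """Derive WCK and CK frequencies (in GHz) from the per-pin data rate."""
    return {
        "wck_ddr_ghz": data_rate_gbps / 2,  # two transfers per WCK cycle
        "wck_qdr_ghz": data_rate_gbps / 4,  # four transfers per WCK cycle
        "ck_ghz": data_rate_gbps / 8,       # command/address clock
    }


clocks = gddr6_clocks(14.0)
print(clocks)  # {'wck_ddr_ghz': 7.0, 'wck_qdr_ghz': 3.5, 'ck_ghz': 1.75}
```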
GDDR6 vs GDDR6X
NVIDIA is the first vendor to opt for GDDR6X memory, using it in its RTX 30 series GPUs, at least the higher-end ones. It increases the per-pin bandwidth from 14Gbps to 21Gbps and the overall bandwidth to 1008GB/s, even more than a 3072-bit wide HBM2 configuration (three stacks).
Specification | GDDR6X | GDDR6 | GDDR5X | HBM2 |
B/W Per Pin | 21 Gbps | 14 Gbps | 11.4 Gbps | 1.7 Gbps |
Chip capacity | 1 GB (8 Gb) | 1 GB (8 Gb) | 1 GB (8 Gb) | 4 GB (32 Gb) |
No. Chips/KGSDs | 12 | 12 | 12 | 3 |
B/W Per Chip/Stack | 84 GB/s | 56 GB/s | 45.6 GB/s | 217.6 GB/s |
Bus Width | 384-bit | 384-bit | 384-bit | 3072-bit |
Total B/W | 1008 GB/s | 672 GB/s | 548 GB/s | 652.8 GB/s |
DRAM Voltage | 1.35 V | 1.35 V | 1.35 V | 1.2 V |
Data Rate | QDR | QDR | QDR | DDR |
Signaling | PAM4 | Binary | Binary | Binary |
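The per-chip and total bandwidth rows in the table follow directly from the per-pin rate, the interface width, and the device count. Here is a minimal sketch of that arithmetic, assuming a 32-bit interface per GDDR chip and a 1024-bit interface per HBM2 stack; the names are illustrative only.

```python
def per_device_gbs(gbps_per_pin: float, interface_bits: int) -> float:
    """Bandwidth of one chip or stack in GB/s."""
    return gbps_per_pin * interface_bits / 8


def total_gbs(gbps_per_pin: float, interface_bits: int, devices: int) -> float:
    """Total bandwidth across all chips or stacks in GB/s."""
    return per_device_gbs(gbps_per_pin, interface_bits) * devices


# (Gbps per pin, interface width in bits, number of chips or stacks)
configs = {
    "GDDR6X": (21.0, 32, 12),
    "GDDR6": (14.0, 32, 12),
    "GDDR5X": (11.4, 32, 12),
    "HBM2": (1.7, 1024, 3),
}

for name, (rate, width, count) in configs.items():
    chip = per_device_gbs(rate, width)
    total = total_gbs(rate, width, count)
    print(f"{name}: {chip:.1f} GB/s per device, {total:.1f} GB/s total")
```

The GDDR5X total works out to 547.2 GB/s, which the table rounds up to 548 GB/s.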
The secret sauce behind GDDR6X memory is PAM4 encoding. Simply put, it doubles the data transferred per clock compared to GDDR6, which uses NRZ or binary coding.
With NRZ, you had just two states, 0 and 1. PAM4 doubles it to four: 00, 01, 10, and 11. Using these four states, you can send four bits of data per cycle (two per edge). The drawback of PAM4 is its high cost, especially at the higher frequencies of GDDR6X. This is the reason why no one has tried to implement it in consumer memory before.
There is one downside, though. While GDDR6 has a burst length of 16 (BL16), GDDR6X is limited to BL8. Because PAM4 signaling carries two bits per transfer, however, each of its 16-bit channels still delivers 32 bytes per operation, the same as GDDR6. Therefore, most of the improvement in bandwidth comes from the higher operating frequency of GDDR6X. Keep in mind that GDDR6X is not a JEDEC standard but a proprietary solution from Micron.
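To make the NRZ versus PAM4 comparison concrete, the sketch below packs a bit stream into four-level symbols and checks the bytes-per-operation figures for both memory types. The level mapping is a plain binary one chosen for illustration, not necessarily the mapping real GDDR6X devices use, and all names are hypothetical.

```python
# Illustrative mapping of bit pairs to four signal levels (an assumption,
# real devices may order the levels differently).
PAM4_LEVELS = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}


def pam4_symbols(bits: list) -> list:
    """Pack a bit stream into PAM4 symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def bytes_per_operation(channel_bits: int, burst_length: int, bits_per_symbol: int) -> int:
    """Bytes moved per access on one channel."""
    return channel_bits * burst_length * bits_per_symbol // 8


stream = [0, 0, 0, 1, 1, 0, 1, 1]
print(pam4_symbols(stream))            # [0, 1, 2, 3]: 8 bits in 4 symbols

print(bytes_per_operation(16, 16, 1))  # GDDR6: 16-bit channel, BL16, NRZ
print(bytes_per_operation(16, 8, 2))   # GDDR6X: 16-bit channel, BL8, PAM4
```

Both calls return 32 bytes, which is why the shorter burst length on GDDR6X does not reduce the amount of data per operation.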
High Bandwidth Memory (HBM)
First popularized by AMD’s Fiji graphics cards, high bandwidth memory or HBM is a low-power memory standard with a wide bus. HBM achieves substantially higher bandwidth than GDDR5 while drawing much less power in a small form factor.
HBM adopts clocks as low as 500 MHz to conform to a low TDP target and makes up for the loss in bandwidth with a massive bus (usually 4096 bits). AMD’s Radeon RX Vega cards are the best example of an HBM2 implementation in consumer hardware. HBM2 solved the 4GB limit of HBM1, but limited yields coupled with a memory shortage prevented AMD from capitalizing on it on the consumer GPU front.
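A short calculation shows how a slow clock and a very wide bus still add up to a lot of bandwidth. This sketch assumes double data rate signaling (two transfers per clock) and uses the 500 MHz clock and 4096-bit bus mentioned above; the function name is illustrative.

```python
def hbm_bandwidth_gbs(clock_mhz: float, bus_bits: int, transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in GB/s from clock speed, bus width, and transfers per clock."""
    return clock_mhz * transfers_per_clock * bus_bits / 8 / 1000


# 500 MHz, double data rate, 4096-bit bus (a first-generation HBM setup).
print(hbm_bandwidth_gbs(500, 4096))  # 512.0
```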