Modern computers use many different kinds of memory: DDR4, GDDR5, GDDR6, LPDDR4, etc. While these are all based on DRAM, there are some key differences between them. DDR4 is used in most PCs as the main system memory and is the most popular form of DRAM. GDDR5 and GDDR6 are used in graphics cards as dedicated graphics memory. Although they're also based on DRAM, they're somewhat different from DDR4.
Many people get confused between the two and use them interchangeably. There's also LPDDR4 memory used in smartphones and other mobile devices, and HBM utilized in servers and exascale computers. In this post, we explore the differences between DDR4 and GDDR5 memory along with a brief explanation of HBM, LPDDR4 and the newer GDDR6 standard.
Double Data Rate Fourth Generation (DDR4)
Nearly every kind of memory is based on dynamic random access memory or DRAM.
DDR4 is the fourth major iteration of DRAM and the current mainstream standard. Released in 2014, it initially focused on reducing voltage and power consumption rather than increasing operating frequencies. With the coming of AMD's Ryzen processors and their MCM design, whose Infinity Fabric interconnect runs in step with the memory clock, faster DDR4 kits have become more relevant than ever.
DDR4 vs DDR3
Aside from the obvious (faster frequencies and lower latency), the primary advantage of DDR4 memory over DDR3 is higher DIMM capacity (up to 64GB, where DDR3 is limited to 16GB). It also draws considerably less power and runs at a lower voltage.
With that out of the way, let's get to the main comparison.

DDR4 vs GDDR5
- Perhaps counterintuitively, it's GDDR5 that runs at the higher voltage: typically 1.5V, compared to 1.2V for DDR4.
- Both DDR4 and DDR3 use a 64-bit memory controller per channel, which results in a 128-bit bus for dual-channel memory and a 256-bit bus for quad-channel. GDDR5 memory, on the other hand, leverages a puny 32-bit controller per channel.
- Where CPU memory configurations have wider but fewer channels, GPUs can support any number of 32-bit memory channels. This is the reason many high-end GPUs like the GeForce RTX 2080 Ti and RTX 2080 have a 352-bit and 256-bit bus width, respectively.
Both the RTX 20 series cards are connected to 1GB memory chips via eight (for the 2080) and eleven (for the Ti) 32-bit memory controllers or channels. GDDR5/6 can also operate in what is called clamshell mode, where each channel, instead of being connected to one memory chip, is split between two. This allows manufacturers to double the memory capacity and makes hybrid memory configurations like the GTX 660 with its 192-bit bus width possible.
- Another core difference between DDR4 and GDDR5/6 memory involves the I/O cycles. Just like SATA, DDR4 can only perform one operation (read or write) in a cycle. GDDR5 can handle input (read) as well as output (write) on the same cycle, essentially doubling the effective bus width.
- All this might put DDR4 memory in a bad light, but this configuration actually suits both setups. CPUs are largely sequential in nature while GPUs run thousands of parallel cores. The former benefits from low latency and slimmer channels, while GPUs require much higher bandwidth and can tolerate looser timings.
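To make the trade-off concrete, here's a small Python sketch of the bandwidth arithmetic above. The function names are mine, and the 8 GT/s GDDR5 figure is an illustrative per-pin rate, not a quote from any spec:

```python
# Total bus width = channels x per-channel width;
# peak bandwidth = (bus width in bytes) x (transfers per second).

def bus_width_bits(channels: int, channel_width: int) -> int:
    """Total memory bus width in bits."""
    return channels * channel_width

def peak_bandwidth_gbs(bus_bits: int, gigatransfers: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer x GT/s."""
    return bus_bits / 8 * gigatransfers

# CPU side: dual-channel DDR4-3200 (two 64-bit channels)
ddr4_bus = bus_width_bits(2, 64)               # 128-bit bus
ddr4_bw = peak_bandwidth_gbs(ddr4_bus, 3.2)    # 51.2 GB/s

# GPU side: eight 32-bit GDDR5 channels at 8 GT/s (a 256-bit card)
gddr5_bus = bus_width_bits(8, 32)              # 256-bit bus
gddr5_bw = peak_bandwidth_gbs(gddr5_bus, 8.0)  # 256.0 GB/s

print(ddr4_bus, ddr4_bw, gddr5_bus, gddr5_bw)
```

Even with the same underlying DRAM cells, the many narrow channels and higher per-pin rates give the GPU several times the CPU's memory bandwidth.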
GDDR5 vs GDDR5X vs GDDR6
GDDR6 was preceded by GDDR5X, which was more of a half-generation upgrade. GDDR5X features transfer rates of up to 14Gbit/s per pin, nearly twice as much as GDDR5's 8Gbit/s.
This was achieved by using a deeper prefetch. Unlike GDDR5, GDDR5X has a 16n prefetch architecture (vs 8n on G5). This allows it to fetch 64 bytes (512 bits) of data per access, while GDDR5 was limited to 32 bytes.
GDDR6, like GDDR5X, has a 16n prefetch but it’s divided into two channels. So GDDR6 fetches 32 bytes per channel for a total of 64 bytes just like GDDR5X and twice that of GDDR5. While this doesn’t improve memory transfer speeds over GDDR5X, it allows for more versatility.
GDDR6 can fetch the same amount of data as GDDR5X, but over two separate channels, allowing a chip to function like two narrower chips rather than a single wide one.
Other than that, GDDR6 also doubles the per-chip density to 16Gb (2x that of GDDR5X) and significantly improves bandwidth by raising the per-pin data rate from 12Gbps to as much as 16Gbps.
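The prefetch arithmetic above can be sketched in a few lines of Python (the helper name is mine, for illustration only):

```python
# Bytes fetched per access = prefetch depth x channel width (in bytes).
def prefetch_bytes(prefetch_n: int, channel_width_bits: int) -> int:
    return prefetch_n * channel_width_bits // 8

print(prefetch_bytes(8, 32))       # GDDR5:  8n x 32-bit channel  -> 32 bytes
print(prefetch_bytes(16, 32))      # GDDR5X: 16n x 32-bit channel -> 64 bytes
print(2 * prefetch_bytes(16, 16))  # GDDR6: two 16-bit channels, 16n each -> 64 bytes
```

The GDDR6 total matches GDDR5X, but because it arrives as two independent 32-byte fetches, the chip gains the flexibility described above.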
High Bandwidth Memory (HBM)
First popularized by AMD's Fiji graphics cards, high bandwidth memory or HBM is a low-power memory standard with a very wide bus. HBM achieves substantially higher bandwidth than GDDR5 while drawing much less power in a smaller form factor.
HBM adopts clocks as low as 500MHz to hit a low TDP target and makes up for the deficit with a massive bus (usually 4096 bits wide). AMD's Radeon RX Vega cards are the best example of an HBM2 implementation in consumer hardware. HBM2 solved the 4GB capacity limit of HBM1, but limited yields coupled with a memory shortage prevented AMD from capitalizing on the consumer GPU front.
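As a rough illustration of how a slow clock and a wide bus still add up, here's the usual peak-bandwidth arithmetic applied to a Fiji-style HBM1 configuration (function name is mine):

```python
# HBM pairs a modest clock with a very wide bus. With DDR signaling,
# data moves on both clock edges, so GT/s = 2 x clock.
def hbm_bandwidth_gbs(bus_width_bits: int, clock_mhz: float) -> float:
    transfers_gt = clock_mhz * 2 / 1000       # GT/s
    return bus_width_bits / 8 * transfers_gt  # bytes/transfer x GT/s = GB/s

# Fiji-style HBM1: 4096-bit bus at a 500 MHz clock
print(hbm_bandwidth_gbs(4096, 500))  # -> 512.0 GB/s
```

That 512 GB/s from a 1 GT/s effective rate shows why HBM can afford such low clocks: the 4096-bit bus does the heavy lifting.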
LPDDR4 vs DDR4
LPDDR4 is the mobile equivalent of DDR4 memory. Compared to DDR4, it offers reduced power consumption but does so at the cost of bandwidth. LPDDR4 has dual 16-bit channels resulting in a 32-bit total bus. In comparison, DDR4 has 64-bit channels.
However, at the same time, LPDDR4 has a prefetch of 16n per channel for a total of 256 bits (16 words x 16 bits), or 32 bytes. That results in an overall 512 bits, or 64 bytes, across both channels.
DDR4, on the other hand, has two 8n prefetch banks per channel. The two banks are separate and can execute two independent 8n prefetches. This is done by using a multiplexer to time division multiplex its internal banks.
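A short Python sketch of the prefetch math above (helper name is mine):

```python
# Bits fetched per access = prefetch depth x channel width.
def prefetch_bits(prefetch_n: int, channel_width_bits: int) -> int:
    return prefetch_n * channel_width_bits

lpddr4_per_channel = prefetch_bits(16, 16)  # 256 bits (32 bytes)
lpddr4_total = 2 * lpddr4_per_channel       # both channels: 512 bits (64 bytes)
ddr4_per_prefetch = prefetch_bits(8, 64)    # one 8n prefetch: 512 bits (64 bytes)
print(lpddr4_per_channel, lpddr4_total, ddr4_per_prefetch)
```

Despite the much narrower channels, the deeper 16n prefetch lets LPDDR4 move the same 64 bytes per full access as a DDR4 channel.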
- LPDDR4 also has a more flexible burst length, ranging from 16 to 32 (256 or 512 bits, i.e. 32 or 64 bytes per channel). DDR4, on the other hand, is limited to a burst length of 8 (512 bits or 64 bytes across its 64-bit channel), although each bank can perform additional transfers.
To understand what burst length means, you need to know how memory is accessed. When the CPU or cache requests new data, the address is sent to the memory module; the required row is activated and then the column is located (if the row isn't already open, a new one is loaded first). Keep in mind that there's a delay after every step.
After that, the data is sent across the memory bus in bursts. For DDR4, each burst is 8 transfers long, moving 64 bytes over its 64-bit channel. With DDR5, the burst length rises to 16 (and optionally 32), so a single 32-bit subchannel can deliver a full 64-byte cache line. Transfers happen on both clock edges, which is where the doubled effective data rate comes from.
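The burst arithmetic can be sketched with a hypothetical helper (not a real API):

```python
# Bytes moved per burst = burst length (transfers) x data-path width in bytes.
def burst_bytes(burst_length: int, bus_width_bits: int) -> int:
    return burst_length * bus_width_bits // 8

print(burst_bytes(8, 64))   # DDR4: BL8 on a 64-bit channel     -> 64 bytes
print(burst_bytes(16, 32))  # DDR5: BL16 on a 32-bit subchannel -> 64 bytes
print(burst_bytes(32, 16))  # LPDDR4: BL32 on a 16-bit channel  -> 64 bytes
```

Notice how all three combinations land on 64 bytes: the standards pair burst length and bus width so that one burst fills a typical CPU cache line.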
This design makes LPDDR4 much more power efficient than standard DDR4 memory, making it ideal for smartphones and other battery-powered devices. Micron's LPDDR4 RAM tops out the standard with a 2133MHz clock for a transfer rate of 4266 MT/s, while Samsung follows shortly after with a 1600MHz clock and a transfer rate of 3200 MT/s.
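Those MT/s figures follow directly from the clocks: DDR memory transfers data on both clock edges, so the effective rate is simply double the I/O clock (function name is mine):

```python
# DDR signaling: two transfers per clock cycle.
def transfer_rate_mts(clock_mhz: int) -> int:
    return clock_mhz * 2

print(transfer_rate_mts(2133))  # Micron LPDDR4  -> 4266 MT/s
print(transfer_rate_mts(1600))  # Samsung LPDDR4 -> 3200 MT/s
```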
DDR4 vs DDR5 Memory
The specifications of the next-gen DDR5 memory standard have been announced, and they're a substantial step above the existing DDR4 modules. DDR5 aims to reach data rates as high as 4800 MT/s per DIMM at launch, a hefty 50% gain over DDR4's 3200 MT/s. This massive uplift is achieved via the following advances in the memory structure:
32-Bank Structure: DDR5 uses a 32-bank structure with 8 bank groups, twice as many as DDR4's 16-bank design. This effectively doubles memory access availability. To complement this, DDR5 also adopts a same-bank refresh function: unlike DDR4, it can refresh a bank in each bank group while the remaining banks continue to serve accesses.
Burst Length: With DDR4, the burst length was limited to 8, so fetching a full 64B cache line required the whole 64-bit channel. DDR5 increases the burst length to 16, with support for even a 32-length mode, which allows a 64B cache line fetch using just one 32-bit subchannel of a DIMM.
16n Prefetch: The prefetch has also been scaled up to 16n to keep up with the increased burst length. Like DDR4, there will be two memory-bank arrays per channel connected via a MUX resulting in a higher effective prefetch rate.
Lastly, by adopting a Decision Feedback Equalization (DFE) circuit, which eliminates reflective noise during the channels’ high-speed operation, DDR5 increases the speed per pin considerably.
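The bank arithmetic from the first point above can be sketched as follows (helper name is mine):

```python
# Total banks = bank groups x banks per group.
def total_banks(bank_groups: int, banks_per_group: int) -> int:
    return bank_groups * banks_per_group

print(total_banks(4, 4))  # DDR4: 4 bank groups x 4 banks -> 16
print(total_banks(8, 4))  # DDR5: 8 bank groups x 4 banks -> 32
```

More bank groups means more banks can be open or refreshing concurrently, which is what drives the access-availability gain.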
| Feature | DDR4 | DDR5 | DDR5 advantage |
|---|---|---|---|
| Data rates | 1600-3200 MT/s | 3200-6400 MT/s | Increases performance and bandwidth |
| Internal VREF | VREFDQ | VREFDQ, VREFCA, VREFCS | Improves voltage margins, reduces BOM costs |
| Device densities | 2Gb-16Gb | 8Gb-64Gb | Enables larger monolithic devices |
| Prefetch | 8n | 16n | Keeps the internal core clock low |
| DQ receiver equalization | CTLE | DFE | Improves opening of the received DQ data eyes inside the DRAM |
| Duty cycle adjustment (DCA) | None | DQS and DQ | Improves signaling on the transmitted DQ/DQS pins |
| Internal DQS delay | None | DQS interval oscillator | Increases robustness against environmental changes |
| On-die ECC | None | 128b+8b SEC, error check and scrub | Strengthens on-chip RAS |
| CRC | Write | Read/Write | Strengthens system RAS by protecting read data |
| Bank groups (BG)/banks | 4 BG x 4 banks (x4/x8); 2 BG x 4 banks (x16) | 8 BG x 2 banks (8Gb x4/x8); 4 BG x 2 banks (8Gb x16); 8 BG x 4 banks (16-64Gb x4/x8); 4 BG x 4 banks (16-64Gb x16) | |
| Command/address interface | ODT, CKE, ACT, RAS, CAS, WE, A<X:0> | CA<13:0> | Dramatically reduces the CA pin count |
| ODT | DQ, DQS, DM/DBI | DQ, DQS, DM, CA bus | Improves signal integrity, reduces BOM costs |
| Burst length | BL8 (and BL4) | BL16, BL32 (and BC8 OTF, BL32 OTF) | Allows 64B cache line fetch with only 1 DIMM subchannel |
| MIR ("mirror" pin) | None | Yes | Improves DIMM signaling |
| Bus inversion | Data bus inversion (DBI) | Command/address inversion (CAI) | Reduces VDDQ noise on modules |
| CA training, CS training | None | CA training, CS training | Improves timing margin on CA and CS pins |
| Write leveling training modes | Yes | Improved | Compensates for unmatched DQ-DQS path |
| Read training patterns | Possible with the MPR | Dedicated MRs for serial (user-defined), clock and LFSR-generated training patterns | Makes read timing margin more robust |
| Mode registers | 7 x 17 bits | Up to 256 x 8 bits (LPDDR type read/write) | Provides room to expand |
| PRECHARGE commands | All bank and per bank | All bank, per bank, and same bank | PREsb enables precharging a specific bank in each BG |
| REFRESH commands | All bank | All bank and same bank | REFsb enables refreshing a specific bank in each BG |
| Loopback mode | None | Yes | Enables testing of the DQ and DQS signaling |
DDR5 also increases the memory density all the way up to 64Gb from DDR4's 16Gb, and both the VDD and VPP voltages have gone down to reduce power draw. Finally, on-die ECC has been added and the mode registers have been significantly upgraded. You can see the entire change-list in the above table.
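As a closing sanity check on those data rates, here's the per-DIMM peak bandwidth they imply, assuming the standard 64-bit (8-byte) data path (function name is mine):

```python
# Peak per-DIMM bandwidth: GB/s = MT/s x 8 bytes / 1000.
def dimm_bandwidth_gbs(mts: int) -> float:
    return mts * 8 / 1000

print(dimm_bandwidth_gbs(3200))  # DDR4-3200 -> 25.6 GB/s
print(dimm_bandwidth_gbs(6400))  # DDR5-6400 -> 51.2 GB/s
```

At the top of its specified range, a single DDR5 DIMM delivers double the bandwidth of the fastest standard DDR4 module.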