NVIDIA Next-Gen Hopper GH100 Data Center GPU Unveiled: 4nm, 18432 Cores, 700W Power Draw, 4000 TFLOPs of Mixed Precision Compute

NVIDIA announced its much-anticipated Hopper data center graphics architecture today. Retaining the building blocks of the GA100 “Ampere” die, the GH100 significantly expands its low-precision compute capabilities: an incredible 4000 TFLOPs of INT4, 2000 TFLOPs of INT8, 1000 TFLOPs of BF16 and FP16, and a respectable 500 TFLOPs of TF32 when leveraging sparse matrices.

Via: Redfire

The non-matrix figures are less impressive: 60 TFLOPs of FP64, 60 TFLOPs of FP32, and 120 TFLOPs of FP16/BF16 compute. In comparison, AMD’s recently launched Instinct MI250X offers 96 TFLOPs of FP32 and 47 TFLOPs of FP64 compute performance.

Going by the on-paper specs, NVIDIA has invested heavily in integer matrix multiplication, offering up to 10x the MI250X’s throughput in these workloads (4000 TFLOPs of INT4 and 2000 TFLOPs of INT8 versus 383 TOPs). AMD, on the other hand, has focused on traditional FP32 and FP64 performance.
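A quick back-of-the-envelope check of that claim (treating the rumored sparse TFLOPs/TOPs figures as directly comparable, which glosses over format and sparsity differences): the ~10x advantage holds for INT4, while INT8 works out closer to 5x.

```python
# Rough comparison of rumored GH100 sparse integer throughput
# against AMD's published MI250X INT8 figure.
GH100_INT4_TOPS = 4000   # rumored, with sparsity
GH100_INT8_TOPS = 2000   # rumored, with sparsity
MI250X_INT8_TOPS = 383   # AMD spec sheet

int4_ratio = GH100_INT4_TOPS / MI250X_INT8_TOPS
int8_ratio = GH100_INT8_TOPS / MI250X_INT8_TOPS

print(f"INT4 advantage: {int4_ratio:.1f}x")  # ~10.4x
print(f"INT8 advantage: {int8_ratio:.1f}x")  # ~5.2x
```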

Internally, the FP64 and INT32 core counts per SM are unchanged, but the FP32 count has been doubled to 128 per SM, in line with the consumer Ampere (GA102) and Ada dies. There are four Tensor cores per SM for a total of 528 across the entire H100 GPU (132 SMs). On the memory side, we’re looking at a moderate 60MB of L2 cache and (up to) six 512-bit HBM2e stacks, said to be clocked at 1600MHz.

| Data Center GPU | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 | NVIDIA H100 |
| --- | --- | --- | --- | --- |
| GPU Codename | GP100 | GV100 | GA100 | GH100 |
| GPU Architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere | NVIDIA Hopper |
| FP32 Cores / SM | 64 | 64 | 64 | 128 |
| FP32 Cores / GPU | 3584 | 5120 | 6912 | 16896 |
| FP64 Cores / SM | 32 | 32 | 32 | 32 |
| FP64 Cores / GPU | 1792 | 2560 | 3456 | 8448 |
| INT32 Cores / SM | NA | 64 | 64 | 64 |
| INT32 Cores / GPU | NA | 5120 | 6912 | 8448 |
| Tensor Cores / SM | NA | 8 | 4 | 4 |
| Tensor Cores / GPU | NA | 640 | 432 | 528 |
| Texture Units | 224 | 320 | 432 | 528 |
| Memory Interface | 4096-bit HBM2 | 4096-bit HBM2 | 5120-bit HBM2 | 512-bit x5 HBM2e |
| Memory Size | 16 GB | 32 GB / 16 GB | 40 GB | 128 GB? |
| Memory Data Rate | 703 MHz DDR | 877.5 MHz DDR | 1215 MHz DDR | 1600 MHz DDR? |
| Memory Bandwidth | 720 GB/sec | 900 GB/sec | 1555 GB/sec | ? |
| L2 Cache Size | 4096 KB | 6144 KB | 40960 KB | 60 MB |
| TDP | 300 W | 300 W | 400 W | 700 W |
| TSMC Manufacturing Process | 16 nm FinFET+ | 12 nm FFN | 7 nm N7 | 4 nm N4 |
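The per-SM and per-GPU figures above can be cross-checked with some quick arithmetic (using the rumored H100 column; all values subject to change). Note that the FP64 total (8448) does not reconcile with 32 FP64 cores per SM (132 x 32 = 4224), so one of the leaked FP64 figures is likely off.

```python
# Cross-check the rumored H100 per-SM figures against the per-GPU totals.
fp32_per_gpu = 16896
fp32_per_sm = 128
sm_count = fp32_per_gpu // fp32_per_sm

print(sm_count)                 # 132 SMs
assert sm_count * 4 == 528      # Tensor cores: 4 per SM
assert sm_count * 64 == 8448    # INT32 cores: 64 per SM
# FP64 does NOT reconcile: 132 SMs x 32/SM = 4224, not the listed 8448.
print(sm_count * 32)            # 4224
```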

The other highlight is the inclusion of PCIe Gen 5 and the NVLink bus interface, enabling up to 900GB/s of GPU-to-GPU bandwidth; in total, the H100 offers a substantial 4.9 TB/s of external bandwidth. Finally, this monster GPU has a TDP of 700W despite being built on TSMC’s N4 node, a refinement of the 5nm N5 process.
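The missing H100 memory bandwidth figure can be estimated from the table: HBM bandwidth is roughly bus width (in bytes) x clock x 2 (DDR). The formula reproduces the A100’s 1555 GB/s exactly, but applied naively to the rumored H100 interface (512-bit x 5 at 1600MHz) it yields only ~1 TB/s, which suggests the leaked width or clock is incomplete. A minimal sketch (`hbm_bandwidth_gbs` is a hypothetical helper, not an NVIDIA figure):

```python
# Estimate HBM bandwidth: (bus width in bytes) x (clock in Hz) x 2 (DDR).
def hbm_bandwidth_gbs(bus_bits: int, clock_mhz: float) -> float:
    return bus_bits / 8 * clock_mhz * 1e6 * 2 / 1e9

# Validate against the A100's known figures from the table above.
print(f"A100: {hbm_bandwidth_gbs(5120, 1215):.0f} GB/s")  # 1555 GB/s

# The rumored H100 figures (512-bit x 5 stacks at 1600 MHz DDR) give ~1 TB/s,
# well below expectations, hinting the leaked interface width is incomplete.
print(f"H100 (rumored): {hbm_bandwidth_gbs(512 * 5, 1600):.0f} GB/s")  # 1024 GB/s
```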


Computer hardware enthusiast, engineering dropout, and PC gamer. Former co-founder of Techquila (2017-2019), a fairly successful tech outlet. Been working on Hardware Times since 2019, an outlet dedicated to computer hardware and its applications.
