NVIDIA today launched two new Ampere-based data center GPUs (accelerators): the A10 and the A30. Of the two, the former is more interesting, as it is based on the 8nm GA102 die that powers the GeForce RTX 3080 and 3090 (and soon the 3080 Ti). This is somewhat surprising given how limited the supply of GA102 chips has been. By diverting part of its GA102 supply to the data center market, NVIDIA is leaving even fewer chips for gamers.
Like the gaming graphics cards, the A10 uses GDDR memory (plain GDDR6 in this case), 24GB of it, the same capacity as the RTX 3090. The A30, meanwhile, is based on an unspecified variant of the GA100 (likely a cut-down SKU) with the same 24GB capacity, but in HBM2 form. The NVIDIA A10 Tensor Core GPU uses the GA102-890 core, which features 72 SMs or 9,216 FP32 cores. That's fewer than the 82 SMs (10,496 cores) on the RTX 3090 but more than the 68 SMs (8,704 cores) on the RTX 3080.
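The SM and core counts above are consistent with 128 FP32 CUDA cores per Ampere GA102 SM. A minimal Python sketch of that arithmetic, assuming the 128-cores-per-SM ratio (it isn't stated in the article, but it reproduces every SM/core pair quoted here); the names `FP32_CORES_PER_SM` and `fp32_cores` are illustrative only:

```python
# Assumption: each GA102 SM contains 128 FP32 CUDA cores.
FP32_CORES_PER_SM = 128

# SM counts quoted in the article for each GA102-based product.
sm_counts = {
    "RTX 3090": 82,
    "A10": 72,
    "RTX 3080": 68,
}


def fp32_cores(sms: int, cores_per_sm: int = FP32_CORES_PER_SM) -> int:
    """Total FP32 cores for a given number of enabled SMs."""
    return sms * cores_per_sm


for name, sms in sm_counts.items():
    print(f"{name}: {sms} SMs -> {fp32_cores(sms):,} FP32 cores")
```

This prints 10,496, 9,216 and 8,704 cores respectively, placing the A10 between the two gaming cards.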
The A10's SM count suggests that NVIDIA is salvaging GA102 dies that aren't good enough for the RTX 3090 but would need additional SMs disabled to be used for the 3080. The A10 has a base clock of 885MHz and a boost clock of 1,695MHz. The boost clock is on par with the RTX 3090's, but the low base clock is another sign that these are lower-binned parts being redirected to the Tensor Core line. The 24GB memory buffer uses standard GDDR6 instead of GDDR6X (likely as a result of shortages) on a 384-bit bus, resulting in a bandwidth of 600GB/s.
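The headline A10 figures can be sanity-checked from its clocks and bus width. The sketch below assumes that peak FP32 counts one fused multiply-add as two FLOPs per core per clock, and it back-calculates the GDDR6 per-pin data rate, which the article doesn't state. The helper names are hypothetical:

```python
def peak_fp32_tflops(fp32_cores: int, boost_mhz: float) -> float:
    """Peak FP32 throughput, assuming one FMA (2 FLOPs) per core per clock."""
    return fp32_cores * 2 * boost_mhz * 1e6 / 1e12


def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer across the bus times the per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


def implied_data_rate_gbps(bandwidth: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbps) needed to reach a quoted bandwidth on a given bus."""
    return bandwidth * 8 / bus_width_bits


a10_tflops = peak_fp32_tflops(9216, 1695)
a10_rate = implied_data_rate_gbps(600, 384)
print(f"A10 peak FP32: {a10_tflops:.1f} TFLOPS")        # -> 31.2
print(f"A10 GDDR6 per-pin rate: {a10_rate} Gbps")        # -> 12.5
print(f"Check: {bandwidth_gb_s(384, a10_rate)} GB/s")    # -> 600.0
```

Under those assumptions, 9,216 cores at 1,695MHz work out to about 31.2 TFLOPS, matching the FP32 figure in the spec table, and 600GB/s on a 384-bit bus implies 12.5Gbps GDDR6.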
| | NVIDIA A10 | NVIDIA A30 |
|---|---|---|
| FP64 Tensor Core | – | 10.3 teraFLOPS |
| FP32 | 31.2 teraFLOPS | 10.3 teraFLOPS |
| TF32 Tensor Core | 62.5 teraFLOPS / 125 teraFLOPS* | 82 teraFLOPS / 165 teraFLOPS* |
| BFLOAT16 Tensor Core | 125 teraFLOPS / 250 teraFLOPS* | 165 teraFLOPS / 330 teraFLOPS* |
| FP16 Tensor Core | 125 teraFLOPS / 250 teraFLOPS* | 165 teraFLOPS / 330 teraFLOPS* |
| INT8 Tensor Core | 250 TOPS / 500 TOPS* | 330 TOPS / 661 TOPS* |
| INT4 Tensor Core | 500 TOPS / 1,000 TOPS* | 661 TOPS / 1,321 TOPS* |
| RT Core | 72 RT Cores | – |
| Media engines | 2 decoders (+AV1 decode) | 1 optical flow accelerator (OFA), 1 JPEG decoder (NVJPEG), 4 video decoders (NVDEC) |
| GPU memory | 24GB GDDR6 | 24GB HBM2 |
| GPU memory bandwidth | 600GB/s | 933GB/s |
| Interconnect | PCIe Gen4: 64GB/s | PCIe Gen4: 64GB/s; third-gen NVLink: 200GB/s** |
| Form factors | Single-slot, full-height, full-length (FHFL) | Dual-slot, full-height, full-length (FHFL) |
| Max thermal design power (TDP) | 150W | 165W |
| Multi-Instance GPU (MIG) | – | 4 GPU instances @ 6GB each; 2 GPU instances @ 12GB each; 1 GPU instance @ 24GB |
| vGPU software support | NVIDIA Virtual PC, NVIDIA Virtual Applications, NVIDIA RTX Virtual Workstation, NVIDIA Virtual Compute Server | NVIDIA AI Enterprise for VMware, NVIDIA Virtual Compute Server |

*With sparsity
The A30 is based on the same GA100 GPU as the A100. NVIDIA hasn't revealed the core count, but we're almost certainly looking at a cut-down SKU, paired with 24GB of HBM2 memory running at 1,215MHz across a 3,072-bit bus (three HBM2 stacks). The GPU has a base clock of 930MHz and a boost clock of 1,440MHz, and memory bandwidth is rated at 933GB/s.
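The A30's bandwidth follows from its memory clock and bus width, and its undisclosed core count can be roughly estimated from the 10.3 TFLOPS FP32 figure in the spec table. A rough Python sketch, assuming HBM2 transfers data twice per clock and that each GA100 SM holds 64 FP32 cores (the A100's 108 SMs and 6,912 cores fit that ratio). The core estimate is an inference, not an NVIDIA figure, and the function names are hypothetical:

```python
# Assumption: each GA100 SM contains 64 FP32 cores (A100: 108 SMs, 6,912 cores).
GA100_FP32_CORES_PER_SM = 64


def hbm_bandwidth_gb_s(bus_width_bits: int, memory_clock_mhz: float) -> float:
    """HBM2 is double data rate, so each pin moves two bits per memory clock."""
    per_pin_gbps = memory_clock_mhz * 2 / 1000
    return bus_width_bits / 8 * per_pin_gbps


def sm_counts_matching(fp32_tflops: float, boost_mhz: float, max_sms: int = 128) -> list[int]:
    """SM counts whose peak FP32 (2 FLOPs per core per clock) rounds to the quoted figure."""
    matches = []
    for sms in range(1, max_sms + 1):
        cores = sms * GA100_FP32_CORES_PER_SM
        tflops = cores * 2 * boost_mhz * 1e6 / 1e12
        if round(tflops, 1) == fp32_tflops:
            matches.append(sms)
    return matches


print(f"A30 bandwidth: {hbm_bandwidth_gb_s(3072, 1215):.0f} GB/s")
candidates = sm_counts_matching(10.3, 1440)
print(candidates, [s * GA100_FP32_CORES_PER_SM for s in candidates])
```

Under these assumptions, only 56 SMs (3,584 FP32 cores) give a boost-clock FP32 peak that rounds to 10.3 TFLOPS, and 3,072 bits at 2.43Gbps per pin comes to roughly 933GB/s, in line with NVIDIA's figure.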