NVIDIA has announced the PCIe variant of the A100 GPU accelerator, based on the new Ampere microarchitecture. While the core specs and configuration are identical to those of the original SXM4-based A100 “Tensor Core GPU”, the bus interface and power draw have changed. The PCIe version of the A100 supports up to PCIe 4.0 speeds and comes with a significantly reduced TDP of 250W.
Specification | A100 (PCIe) | A100 (SXM4) | V100 (PCIe) | P100 (PCIe) |
---|---|---|---|---|
FP32 CUDA Cores | 6912 | 6912 | 5120 | 3584 |
Boost Clock | 1.41GHz | 1.41GHz | 1.38GHz | 1.3GHz |
Memory Clock | 2.4Gbps HBM2 | 2.4Gbps HBM2 | 1.75Gbps HBM2 | 1.4Gbps HBM2 |
Memory Bus Width | 5120-bit | 5120-bit | 4096-bit | 4096-bit |
Memory Bandwidth | 1.6TB/sec | 1.6TB/sec | 900GB/sec | 720GB/sec |
VRAM | 40GB | 40GB | 16GB/32GB | 16GB |
Single Precision | 19.5 TFLOPs | 19.5 TFLOPs | 14.1 TFLOPs | 9.3 TFLOPs |
Double Precision | 9.7 TFLOPs (1/2 FP32 rate) | 9.7 TFLOPs (1/2 FP32 rate) | 7 TFLOPs (1/2 FP32 rate) | 4.7 TFLOPs (1/2 FP32 rate) |
INT8 Tensor | 624 TOPs | 624 TOPs | N/A | N/A |
FP16 Tensor | 312 TFLOPs | 312 TFLOPs | 112 TFLOPs | N/A |
TF32 Tensor | 156 TFLOPs | 156 TFLOPs | N/A | N/A |
Relative Performance (vs. SXM4 Version) | 90% | 100% | N/A | N/A |
Interconnect | NVLink 3 6 Links? (300GB/sec?) | NVLink 3 12 Links (600GB/sec) | NVLink 2 4 Links (200GB/sec) | NVLink 1 4 Links (160GB/sec) |
GPU | GA100 (826mm²) | GA100 (826mm²) | GV100 (815mm²) | GP100 (610mm²) |
Transistor Count | 54.2B | 54.2B | 21.1B | 15.3B |
TDP | 250W | 400W | 250W | 300W |
Manufacturing Process | TSMC 7N | TSMC 7N | TSMC 12nm FFN | TSMC 16nm FinFET |
Interface | PCIe 4.0 | SXM4 | PCIe 3.0 | SXM |
Architecture | Ampere | Ampere | Volta | Pascal |
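
The headline throughput and bandwidth figures in the table can be approximately reproduced from the core counts, clocks, and memory settings. The following is a minimal Python sketch, assuming each FP32 CUDA core retires one fused multiply-add (2 FLOPs) per clock and that memory bandwidth equals bus width times per-pin data rate. The function and variable names are illustrative, and the results are theoretical peaks rather than measurements.

```python
# Inputs copied from the spec table; names here are illustrative only.
SPECS = {
    "A100": {"cores": 6912, "boost_ghz": 1.41, "bus_bits": 5120, "pin_gbps": 2.4},
    "V100": {"cores": 5120, "boost_ghz": 1.38, "bus_bits": 4096, "pin_gbps": 1.75},
    "P100": {"cores": 3584, "boost_ghz": 1.3, "bus_bits": 4096, "pin_gbps": 1.4},
}


def peak_fp32_tflops(cores, boost_ghz):
    """Theoretical FP32 peak, assuming 2 FLOPs (one FMA) per core per clock."""
    return cores * 2 * boost_ghz / 1000  # GFLOPs -> TFLOPs


def memory_bandwidth_gbs(bus_bits, pin_gbps):
    """Theoretical memory bandwidth in GB/sec: bits per second divided by 8."""
    return bus_bits * pin_gbps / 8


for name, s in SPECS.items():
    tflops = peak_fp32_tflops(s["cores"], s["boost_ghz"])
    bandwidth = memory_bandwidth_gbs(s["bus_bits"], s["pin_gbps"])
    print(f"{name}: {tflops:.1f} TFLOPs FP32, {bandwidth:.0f} GB/sec")
```

With the table's inputs, the A100 works out to roughly 19.5 TFLOPs of FP32 and about 1.5TB/sec of bandwidth, close to the rounded figures listed above.
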
As per NVIDIA’s official documentation, the PCIe 4.0 variant offers 90% of the performance of the SXM4 version while cutting the TDP by 150W. Interconnect speed is the main differentiating factor between the two models: the PCIe variant tops out at 300GB/s over 6 NVLink 3 links, while the full-fledged SXM4 variant supports up to 600GB/s over up to 12 NVLink 3 links.
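
The trade-off described above can be put into numbers with some quick arithmetic. This is a rough sketch only: it treats TDP as a stand-in for actual power draw (an assumption, since real consumption depends on the workload), and it uses the PCIe NVLink figures that the table itself marks as uncertain. The variable names are illustrative.

```python
def perf_per_watt(relative_perf, tdp_watts):
    """Relative performance delivered per watt of TDP (TDP used as a power proxy)."""
    return relative_perf / tdp_watts


# Relative performance and TDP as quoted in the article.
pcie_efficiency = perf_per_watt(0.90, 250)
sxm4_efficiency = perf_per_watt(1.00, 400)
efficiency_gain = pcie_efficiency / sxm4_efficiency

# Aggregate NVLink 3 bandwidth divided by link count gives the per-link rate.
pcie_link_gbs = 300 / 6
sxm4_link_gbs = 600 / 12

print(f"PCIe perf-per-watt vs. SXM4: {efficiency_gain:.2f}x")
print(f"Per-link bandwidth: PCIe {pcie_link_gbs:.0f} GB/s, SXM4 {sxm4_link_gbs:.0f} GB/s")
```

Under these assumptions, the PCIe card delivers about 1.44 times the performance per watt of TDP, and both variants come out to the same 50GB/s per NVLink 3 link, so the bandwidth gap stems from the link count rather than the link speed.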

Other than this, the two GPUs are practically identical, sharing the same GPU core, memory bus width and HBM2 memory. The new PCIe 4.0 variant should significantly widen the target market of the Ampere-based A100 GPU accelerator. You can read more about the finer details of the A100 and the Ampere architecture powering it here:
NVIDIA Ampere Architectural Analysis: A Look at the A100 Tensor Core GPU