According to kopite7kimi, a well-regarded source on NVIDIA products, the company is working on a monster GPU for miners based on the 7nm GA100 die. This is the same GPU that powers the Ampere-class A100 Tensor Core GPU, which is designed for accelerating neural networks and other AI-intensive workloads that benefit from mixed-precision compute and a ton of bandwidth. Fabricated on TSMC’s 7nm N7 manufacturing process, the GA100 packs 54.2 billion transistors into a die measuring 826 mm².
Coincidentally, Ether mining requires the same two things: compute and as much bandwidth as you can muster, making the A100 ideal for deep-pocketed miners. The NVIDIA A100 GPU comes with 40GB of slow but wide HBM2 memory with a massive bandwidth of 1,555 GB/s. That’s nearly 70% more than the bandwidth of the fastest mining GPU at present, the GeForce RTX 3090. To act as an intermediary, the A100 also features a ton of on-die cache in the form of a 40MB L2, nearly 7 times more than the RTX 3090. With that much bandwidth on tap, the A100 is expected to be at least twice as fast as the RTX 3090 in Ether mining, offering hash rates north of 200 MH/s.
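As a rough sanity check on those comparisons, the ratios can be worked out in a few lines of Python. The A100 figures are the ones quoted above; the RTX 3090's 936 GB/s bandwidth and 6MB L2 are taken from NVIDIA's published specifications, and the 120 MH/s baseline is an assumed, commonly reported Ethash figure rather than a number from this article. The hash rate estimate also assumes mining performance scales linearly with memory bandwidth, which is a simplification.

```python
# A100 figures as quoted in the article.
A100_BANDWIDTH_GBS = 1555
A100_L2_MB = 40

# RTX 3090 figures: assumptions, not taken from the article.
RTX3090_BANDWIDTH_GBS = 936   # NVIDIA's published spec
RTX3090_L2_MB = 6             # NVIDIA's published spec
RTX3090_HASHRATE_MHS = 120    # assumed typical Ethash rate

bandwidth_ratio = A100_BANDWIDTH_GBS / RTX3090_BANDWIDTH_GBS
cache_ratio = A100_L2_MB / RTX3090_L2_MB

# Naive estimate: hash rate scales linearly with memory bandwidth.
estimated_hashrate = RTX3090_HASHRATE_MHS * bandwidth_ratio

print(f"Bandwidth advantage: {(bandwidth_ratio - 1) * 100:.0f}% more")  # 66% more
print(f"L2 cache advantage: {cache_ratio:.1f}x")                        # 6.7x
print(f"Bandwidth-scaled estimate: {estimated_hashrate:.0f} MH/s")      # 199 MH/s
```

Under these assumptions, bandwidth alone puts the A100 at roughly 200 MH/s; anything beyond that would have to come from the larger cache or other architectural differences.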
NVIDIA might get rid of the Tensor and high-precision (FP64) cores and replace them with FP32 units to improve the compute capabilities of the GPU, although that would require a significant rework of the SM design and is therefore unlikely. If it does happen, we might see hash rates in excess of a whopping 300 MH/s. The full implementation of the GA100 GPU includes the following units:
- 8 GPCs, 8 TPCs/GPC, 2 SMs/TPC, 16 SMs/GPC, 128 SMs per full GPU
- 64 FP32 CUDA Cores/SM, 8192 FP32 CUDA Cores per full GPU
- 4 Third-generation Tensor Cores/SM, 512 Third-generation Tensor Cores per full GPU
- 6 HBM2 stacks, 12 512-bit Memory Controllers
The NVIDIA A100 Tensor Core GPU implementation of the GA100 GPU includes the following units:
- 7 GPCs, 7 or 8 TPCs/GPC, 2 SMs/TPC, up to 16 SMs/GPC, 108 SMs
- 64 FP32 CUDA Cores/SM, 6912 FP32 CUDA Cores per GPU
- 4 Third-generation Tensor Cores/SM, 432 Third-generation Tensor Cores per GPU
- 5 HBM2 stacks, 10 512-bit Memory Controllers
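The totals in both lists follow directly from the per-unit counts, and a short Python sketch makes the arithmetic explicit. The `GpuConfig` class and its field names are hypothetical, introduced only for illustration. The 2.43 Gbit/s per-pin HBM2 data rate (1215 MHz, double data rate) is an assumption based on NVIDIA's published A100 specification and does not appear in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical container for the unit counts listed above."""
    name: str
    sms: int
    memory_controllers: int
    fp32_cores_per_sm: int = 64
    tensor_cores_per_sm: int = 4
    controller_width_bits: int = 512

    @property
    def fp32_cores(self) -> int:
        return self.sms * self.fp32_cores_per_sm

    @property
    def tensor_cores(self) -> int:
        return self.sms * self.tensor_cores_per_sm

    @property
    def bus_width_bits(self) -> int:
        return self.memory_controllers * self.controller_width_bits

    def bandwidth_gb_per_s(self, gbit_per_pin: float) -> float:
        # Total bits per second across the bus, converted to bytes.
        return self.bus_width_bits * gbit_per_pin / 8


# Full die: 8 GPCs x 8 TPCs/GPC x 2 SMs/TPC = 128 SMs.
full_ga100 = GpuConfig("Full GA100", sms=8 * 8 * 2, memory_controllers=12)
# A100: 108 SMs enabled, 10 memory controllers.
a100 = GpuConfig("A100", sms=108, memory_controllers=10)

for gpu in (full_ga100, a100):
    print(f"{gpu.name}: {gpu.sms} SMs, {gpu.fp32_cores} FP32 cores, "
          f"{gpu.tensor_cores} Tensor Cores, {gpu.bus_width_bits}-bit bus")

HBM2_GBIT_PER_PIN = 2.43  # assumed data rate, not from the article
print(f"A100 bandwidth: {a100.bandwidth_gb_per_s(HBM2_GBIT_PER_PIN):.0f} GB/s")
print(f"SMs enabled on A100: {a100.sms / full_ga100.sms:.1%}")
```

With the assumed data rate, the A100's 5120-bit bus works out to about 1,555 GB/s, and the A100 ships with roughly 84% of the full die's SMs enabled.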
For economic reasons, we’re most likely going to see the A100 implementation of the GA100 GPU rather than the full-fledged die. Even this GPU should be an absolute monster in mining, offering somewhere between 200 and 300 MH/s in Ether mining. However, such a GPU will cost you a lot. Considering that the A100 Tensor Core GPU is priced at $11,000, a mining variant won’t be priced lower than 10 grand, making it nearly 7x more expensive than the consumer-grade RTX 3090.
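To put that price gap in perspective, here is a rough cost-per-hash comparison in Python. It takes the 10 grand floor and the 200 MH/s lower bound from the article; the RTX 3090's $1,499 launch price and 120 MH/s hash rate are assumptions that do not come from this article, and real street prices may differ considerably.

```python
# Mining variant: price floor and lower-bound hash rate from the article.
MINING_GPU_PRICE_USD = 10_000
MINING_GPU_HASHRATE_MHS = 200

# RTX 3090: assumptions, not taken from the article.
RTX3090_PRICE_USD = 1_499      # launch MSRP
RTX3090_HASHRATE_MHS = 120     # assumed typical Ethash rate

price_ratio = MINING_GPU_PRICE_USD / RTX3090_PRICE_USD
mining_cost_per_mhs = MINING_GPU_PRICE_USD / MINING_GPU_HASHRATE_MHS
rtx3090_cost_per_mhs = RTX3090_PRICE_USD / RTX3090_HASHRATE_MHS

print(f"Price ratio: {price_ratio:.1f}x")                              # 6.7x
print(f"Mining variant: ${mining_cost_per_mhs:.2f} per MH/s")          # $50.00
print(f"RTX 3090: ${rtx3090_cost_per_mhs:.2f} per MH/s")               # $12.49
```

Under these assumptions, the mining variant would cost about four times as much per MH/s as the RTX 3090, so its appeal would rest on density and power efficiency rather than upfront cost.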