NVIDIA Working on the World’s Fastest Mining GPU Based on the 7nm GA100

According to kopite7kimi, a well-reputed source on NVIDIA products, the company is working on a monster GPU for miners based on the 7nm GA100 die. This is the same GPU that powers the Ampere-class A100 Tensor Core GPU, which is designed to accelerate neural networks and other AI-intensive workloads that benefit from mixed-precision compute and a ton of bandwidth. Fabricated on TSMC’s 7nm N7 manufacturing process, the GA100 packs 54.2 billion transistors into a die size of 826 mm².

Coincidentally, Ether mining requires the same two things: compute and as much bandwidth as you can muster, making the A100 ideal for deep-pocketed miners. The NVIDIA A100 GPU comes with 40GB of slow but wide HBM2 memory with a massive bandwidth of 1,555 GB/s. That’s nearly 70% more than the bandwidth of the fastest mining GPU at present, the GeForce RTX 3090. To act as an intermediary between the cores and memory, the A100 also features a ton of on-die cache in the form of a 40MB L2, nearly 7 times more than the RTX 3090. With such massive bandwidth, the A100 is going to be at least twice as fast as the RTX 3090 in Ether mining, offering hash rates north of 200 MH/s.
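As a quick sanity check on the comparison above, the two ratios can be worked out in a few lines of Python. The A100 figures come from this article; the RTX 3090 figures (936 GB/s of memory bandwidth and 6MB of L2) are assumed reference specs that the article does not state, so treat this as a rough sketch rather than a benchmark.

```python
# A100 figures as quoted in the article.
A100_BANDWIDTH_GB_S = 1555   # HBM2 memory bandwidth in GB/s
A100_L2_MB = 40              # L2 cache in MB

# ASSUMPTION: RTX 3090 reference specs, not stated in the article.
RTX3090_BANDWIDTH_GB_S = 936  # GDDR6X memory bandwidth in GB/s
RTX3090_L2_MB = 6             # L2 cache in MB


def percent_more(a, b):
    """Return how much larger a is than b, as a percentage."""
    return (a / b - 1.0) * 100.0


bandwidth_gain = percent_more(A100_BANDWIDTH_GB_S, RTX3090_BANDWIDTH_GB_S)
l2_ratio = A100_L2_MB / RTX3090_L2_MB

print(f"Bandwidth advantage over RTX 3090: {bandwidth_gain:.0f}%")
print(f"L2 cache ratio vs RTX 3090: {l2_ratio:.1f}x")
```

Under these assumed RTX 3090 specs, the bandwidth lead works out to roughly two thirds and the L2 ratio to a little under 7x, in line with the "nearly 70%" and "nearly 7 times" figures.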

NVIDIA might get rid of the Tensor and high-precision (FP64) cores and replace them with FP32 units to improve the compute capabilities of the GPU, although that would require a significant rework of the SM design and is therefore unlikely. If it does happen, we might see hash rates in excess of a whopping 300 MH/s. The full implementation of the GA100 GPU includes the following units:

  • 8 GPCs, 8 TPCs/GPC, 2 SMs/TPC, 16 SMs/GPC, 128 SMs per full GPU
  • 64 FP32 CUDA Cores/SM, 8192 FP32 CUDA Cores per full GPU
  • 4 Third-generation Tensor Cores/SM, 512 Third-generation Tensor Cores per full GPU
  • 6 HBM2 stacks, 12 512-bit Memory Controllers
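The totals in the list above follow directly from the per-unit counts. Here is a minimal sketch that derives them; the variable names are made up for illustration, and the aggregate bus width is simply the number of memory controllers multiplied by their width as listed.

```python
# Per-unit counts for the full GA100 die, as listed in the article.
GPCS = 8
TPCS_PER_GPC = 8
SMS_PER_TPC = 2
FP32_CORES_PER_SM = 64
TENSOR_CORES_PER_SM = 4
MEMORY_CONTROLLERS = 12
CONTROLLER_WIDTH_BITS = 512

# Derived totals.
sms_per_gpc = TPCS_PER_GPC * SMS_PER_TPC             # SMs in one GPC
total_sms = GPCS * sms_per_gpc                       # SMs on the full die
total_fp32_cores = total_sms * FP32_CORES_PER_SM     # FP32 CUDA cores
total_tensor_cores = total_sms * TENSOR_CORES_PER_SM
# Aggregate width implied by the controller count as listed above.
memory_bus_bits = MEMORY_CONTROLLERS * CONTROLLER_WIDTH_BITS

print(f"SMs per GPC:       {sms_per_gpc}")
print(f"Total SMs:         {total_sms}")
print(f"FP32 CUDA cores:   {total_fp32_cores}")
print(f"Tensor cores:      {total_tensor_cores}")
print(f"Memory bus (bits): {memory_bus_bits}")
```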

The NVIDIA A100 Tensor Core GPU implementation of the GA100 GPU includes the following units:

  • 7 GPCs, 7 or 8 TPCs/GPC, 2 SMs/TPC, up to 16 SMs/GPC, 108 SMs
  • 64 FP32 CUDA Cores/SM, 6912 FP32 CUDA Cores per GPU
  • 4 Third-generation Tensor Cores/SM, 432 Third-generation Tensor Cores per GPU
  • 5 HBM2 stacks, 10 512-bit Memory Controllers
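The same arithmetic applies to the cut-down A100, and it also shows where the 1,555 GB/s bandwidth figure comes from. The sketch below assumes a 1215 MHz double-data-rate HBM2 memory clock, which is not stated in this article, so take the bandwidth line as an estimate under that assumption.

```python
# A100 configuration as listed in the article.
A100_SMS = 108
FULL_GA100_SMS = 128
FP32_CORES_PER_SM = 64
TENSOR_CORES_PER_SM = 4
MEMORY_CONTROLLERS = 10
CONTROLLER_WIDTH_BITS = 512

# ASSUMPTION: HBM2 memory clock of 1215 MHz, double data rate
# (two transfers per clock). Not stated in the article.
MEMORY_CLOCK_HZ = 1215e6
TRANSFERS_PER_CLOCK = 2

fp32_cores = A100_SMS * FP32_CORES_PER_SM
tensor_cores = A100_SMS * TENSOR_CORES_PER_SM
enabled_fraction = A100_SMS / FULL_GA100_SMS   # share of the full die in use

# Aggregate width implied by the controller count as listed above.
memory_bus_bits = MEMORY_CONTROLLERS * CONTROLLER_WIDTH_BITS
bytes_per_transfer = memory_bus_bits / 8
bandwidth_gb_s = bytes_per_transfer * TRANSFERS_PER_CLOCK * MEMORY_CLOCK_HZ / 1e9

print(f"FP32 CUDA cores:   {fp32_cores}")
print(f"Tensor cores:      {tensor_cores}")
print(f"SMs enabled:       {enabled_fraction:.1%} of the full die")
print(f"Memory bus (bits): {memory_bus_bits}")
print(f"Peak bandwidth:    {bandwidth_gb_s:.0f} GB/s")
```

In other words, the A100 ships with about 84% of the full die's SMs enabled, which is the main reason a mining part based on it would be cheaper to produce than one using a fully working GA100.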

For economic reasons, we’re most likely going to see the A100 implementation of the GA100 GPU rather than the full-fledged die. Even this GPU should be an absolute monster in mining, offering somewhere in the range of 200-300 MH/s in Ether mining. However, such a GPU will cost you a lot. Considering that the A100 Tensor Core GPU is priced at $11,000, a mining variant won’t be priced lower than $10,000, making it nearly 7x more expensive than the consumer-grade RTX 3090.
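The "nearly 7x" multiple can be checked against the 10 grand floor with a couple of lines. The RTX 3090 price used here is its $1,499 launch MSRP, an assumption the article does not spell out; street prices during the mining boom were considerably higher.

```python
# Price figures from the article.
A100_PRICE_USD = 11_000
MINING_VARIANT_FLOOR_USD = 10_000

# ASSUMPTION: RTX 3090 launch MSRP, not stated in the article.
RTX3090_MSRP_USD = 1_499

price_multiple = MINING_VARIANT_FLOOR_USD / RTX3090_MSRP_USD
discount_vs_a100 = 1 - MINING_VARIANT_FLOOR_USD / A100_PRICE_USD

print(f"Mining variant vs RTX 3090: {price_multiple:.1f}x")
print(f"Discount relative to the A100: {discount_vs_a100:.0%}")
```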

Areej Syed
