Another unknown NVIDIA GPU has been spotted, and it can only be an Ampere part, with nearly 8,000 CUDA cores. The graphics processor is paired with 32GB of HBM2 memory across a potential 1,024-bit (if one stack) or 2,048-bit (if two stacks) bus. It has a total of 124 SMs and a whopping 32MB of L2 cache, more than five times the 6MB of the Volta-based Tesla V100.
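The figures above can be sanity-checked with some quick arithmetic. The snippet below is only a rough sketch: the 64 FP32 cores per SM is an assumption carried over from Volta and Turing (the leak itself does not state it), and the helper names are made up for illustration. The SM count, the 1,024-bit per-stack width, and the cache sizes are the ones quoted in the text.

```python
# Back-of-the-envelope check of the leaked figures.
SM_COUNT = 124              # from the leaked entry
CORES_PER_SM = 64           # ASSUMPTION: same FP32 cores per SM as Volta/Turing
BITS_PER_HBM2_STACK = 1024  # per-stack bus width used in the text
L2_LEAKED_MB = 32           # from the leaked entry
L2_V100_MB = 6              # V100 figure quoted in the text


def cuda_cores(sm_count, cores_per_sm=CORES_PER_SM):
    """Total CUDA cores for a given SM count (hypothetical helper)."""
    return sm_count * cores_per_sm


def bus_width_bits(stacks):
    """Total memory bus width for a given number of HBM2 stacks."""
    return stacks * BITS_PER_HBM2_STACK


def l2_ratio(new_mb=L2_LEAKED_MB, old_mb=L2_V100_MB):
    """How many times larger the leaked L2 cache is than the V100's."""
    return new_mb / old_mb


if __name__ == "__main__":
    print(cuda_cores(SM_COUNT))                  # 7936
    print(bus_width_bits(1), bus_width_bits(2))  # 1024 2048
    print(round(l2_ratio(), 2))                  # 5.33
```

If the per-SM assumption holds, 124 SMs give 7,936 cores, which matches the "nearly 8,000" figure, and the L2 cache works out to roughly 5.3 times the V100's.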
The score is also much higher than that of the earlier two GPUs, meaning that this is probably the A100 flagship. The others could be cut-down variants or perhaps Quadro cards based on the Ampere architecture.
The other GPU has the same core count but less memory (24GB HBM2). It’s possible that this is the same GPU with a narrower 1,024-bit bus, as 24GB is the upper limit for one stack.
Another important characteristic of these GPUs (thanks, @rogame) is the presence of two asynchronous compute engines. As far as I know, NVIDIA’s GPUs haven’t had separate async engines until now. The warp scheduler, along with the dispatch units, handles the scheduling in parallel with the drivers. Although Volta and Turing supported simultaneous FP32 and INT32 compute in the same cycle, they still had no dedicated async engines. So this is indeed a big change, one that will likely carry over to the gaming GeForce cards.
Lastly, the fact that we’re seeing data center Ampere cards doesn’t mean that the launch of consumer Ampere is near. The Tesla GPU for a new generation typically arrives several months before its GeForce counterparts.