An unknown NVIDIA graphics card has surfaced on Geekbench that packs an insane 108 compute units (SMs), or 6,912 cores. We’re going by the 64-core SM used in NVIDIA’s data center chips (GP100, GV100), as the 128-core consumer layout would mean 13K+ cores, which is unlikely. This monster GPU is paired with 48 GB of VRAM, most likely HBM2E on a potential 2,048-bit or 3,072-bit bus. And you guessed it: this won’t be a gaming GPU. It’ll be for data centers and professional workloads, and it’ll cost a lot more than your average consumer flagship. Okay, let’s have a closer look at the GPUs. Yes, there are two new Ampere parts, and they’ll most likely be the new Tesla GPUs:
The first sample we have is a 108 CU GPU with 48 GB of VRAM (Geekbench reports 46.8 GB, but that’s a reporting quirk) and a 1.01 GHz boost clock. First, let’s address the GPU core. 108 SMs could mean 13,824 cores if each SM carried 128 cores like consumer Pascal, but considering that’s a little over the top, we’ll stick with the 64-core SM NVIDIA uses in its data center designs. That amounts to 6,912 cores, which is more reasonable if NVIDIA goes with TSMC’s 7nm or Samsung’s 8nm process.
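If you want to check the SM math yourself, here’s a quick sketch. The per-SM core counts are my assumptions drawn from earlier NVIDIA architectures, not anything the Geekbench listing reports:

```python
# Core-count possibilities for a 108-SM GPU.
sms = 108
datacenter_cores_per_sm = 64   # GP100/GV100-style SM (assumption)
consumer_cores_per_sm = 128    # consumer Pascal-style SM (assumption)

print(sms * datacenter_cores_per_sm)  # 6912 -- the sane reading
print(sms * consumer_cores_per_sm)    # 13824 -- a little over the top
```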
Let’s have a look at the memory. As already mentioned, workstation Ampere will leverage HBM2E. Since the reported VRAM size is 46.8 GB (we’ll take it as 48 GB), there are two plausible configurations: 8-hi or 12-hi stacks. The former would mean eight 16Gbit (2 GB) memory dies stacked one above the other, for three packages and a total bus width of 3,072 bits (1,024 per stack). The other would be a 12-die setup with two stacks and a total width of 2,048 bits. Considering that no vendor has shipped 12-hi stacks so far, I believe it’ll be the former. The resulting bandwidth should be an absolutely groundbreaking ~1.2–1.4 TB/s (3,072 bits at HBM2E’s 3.2–3.6 Gbps per pin). Memory bandwidth is really important in workstation and data center workloads that process a crap-ton of data every day.
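As a sketch, the two 48 GB layouts and the bandwidth math work out like this. The die capacity and the 1,024-bit per-stack bus are standard HBM figures; the pin speed is my assumption:

```python
GB_PER_DIE = 2             # 16 Gbit HBM2E dies
BUS_BITS_PER_STACK = 1024  # fixed by the HBM spec

def hbm_config(stacks: int, dies_per_stack: int) -> tuple[int, int]:
    """Return (capacity in GB, total bus width in bits) for a stack layout."""
    return stacks * dies_per_stack * GB_PER_DIE, stacks * BUS_BITS_PER_STACK

print(hbm_config(3, 8))   # (48, 3072) -- three 8-hi stacks
print(hbm_config(2, 12))  # (48, 2048) -- two 12-hi stacks

# Peak bandwidth in GB/s at an assumed 3.2 Gbps per pin:
print(3072 * 3.2 / 8)     # 1228.8
```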
The other Ampere GPU has a higher core count but a smaller memory size. The core count is pegged at 118 CUs, which would mean 7,552 cores by the 64-core SM config (or…15,104 with 128-core SMs). The 23.8 GB memory size (call it 24 GB) also backs the HBM2E theory. There are two possible scenarios here: a single 12-hi stack on a 1,024-bit bus, or two shorter stacks (say, 8-hi with 12Gbit dies) on a 2,048-bit bus. I’m once again going to go with the latter. It’ll provide much higher bandwidth and will be easier to implement.
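The same back-of-the-envelope math for the 118-CU part; note the 12Gbit (1.5 GB) die is my own guess to make a two-stack 24 GB layout add up:

```python
# Core counts for 118 SMs under the two per-SM assumptions:
print(118 * 64)     # 7552 with 64-core data-center SMs
print(118 * 128)    # 15104 with 128-core consumer SMs

# 24 GB over two stacks needs 12 GB per stack,
# e.g. eight 12 Gbit (1.5 GB) dies per stack (assumed die size):
print(2 * 8 * 1.5)  # 24.0 GB total
print(2 * 1024)     # 2048-bit total bus
```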
An interesting thing to note here is that the second rumored Ampere GPU has the higher core count but the smaller VRAM size, while the first is the other way around. The second also scores much higher. Here’s a comparison:
Ampere GPU1: 141,656 points
Ampere GPU2: 184,096 points
Titan RTX: 132,804 points
Tesla V100: 154,606 points
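To put those numbers in perspective, here are the relative uplifts, computed from the scores listed above:

```python
# Geekbench compute scores from the listings above.
scores = {
    "Ampere GPU1": 141656,
    "Ampere GPU2": 184096,
    "Titan RTX": 132804,
    "Tesla V100": 154606,
}

for gpu in ("Ampere GPU1", "Ampere GPU2"):
    for baseline in ("Titan RTX", "Tesla V100"):
        uplift = scores[gpu] / scores[baseline] - 1
        print(f"{gpu} vs {baseline}: {uplift:+.1%}")
```

The second sample lands roughly 38.6% ahead of the Titan RTX and about 19.1% ahead of the Tesla V100.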
Pretty neat, innit? That’s a sizeable uplift, and I reckon this is just an engineering sample, not the final product. That means there’s a good chance we’ll see the Data Center Ampere flagship at GTC next month.