The announcement of the RTX 40 series GPUs brings with it a set of next-gen rendering technologies aimed at speeding up ray-traced workloads and AI-based upscaling, the latter better known as Deep Learning Super Sampling (DLSS). NVIDIA teased the latest variant of its upscaling algorithm alongside the RTX 4080/4090 announcement. DLSS 3 is the newest iteration of the company’s temporal, neural-network-powered upscaler and adds a new capability called Optical Multi Frame Generation.
Optical Multi Frame Generation generates entirely new frames, rather than just pixels, significantly boosting performance. The Optical Flow Accelerator incorporated into the Ada Lovelace microarchitecture analyzes two sequential in-game images and calculates motion vector data for objects and elements that appear in the frame, but are not modeled by traditional game engine motion vectors.
This reduces visual anomalies when the convolutional network generates elements such as particles, reflections, shadows, and lighting. Pairs of super-resolution frames from the game, along with both engine and optical flow motion vectors, are then fed into a convolutional neural network that analyzes the data and automatically generates an additional frame for each game-rendered frame. NVIDIA claims a performance uplift of up to 4x in ray-traced titles when DLSS 3 frame generation is paired with DLSS super resolution.
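To make the idea of motion-vector-driven frame generation more concrete, here is a minimal toy sketch in Python. It is not NVIDIA's method: DLSS 3 uses a trained convolutional network and hardware optical flow, whereas this example simply moves pixels halfway along a given motion vector and fills the gaps from the neighbouring frames. The function name `generate_midframe`, the frame layout (2D lists of grayscale values), and the hole-filling rule are all assumptions made for illustration.

```python
def generate_midframe(prev, nxt, flow):
    """Toy sketch: synthesize a frame halfway between `prev` and `nxt`.

    prev, nxt: 2D lists of grayscale values with identical dimensions.
    flow: 2D list of (dx, dy) tuples giving each pixel's motion, in whole
          pixels, from `prev` to `nxt`. (0, 0) means the pixel is static.
    Hypothetical helper for illustration; not part of any NVIDIA API.
    """
    h, w = len(prev), len(prev[0])
    mid = [[None] * w for _ in range(h)]

    # Pass 1: move every moving pixel half-way along its motion vector.
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            if dx == 0 and dy == 0:
                continue
            # Half the motion, truncated to whole pixels.
            tx, ty = x + int(dx / 2), y + int(dy / 2)
            if 0 <= tx < w and 0 <= ty < h:
                mid[ty][tx] = prev[y][x]

    # Pass 2: fill pixels that nothing moved into.
    for y in range(h):
        for x in range(w):
            if mid[y][x] is None:
                dx, dy = flow[y][x]
                moved_away = not (dx == 0 and dy == 0)
                # A vacated pixel shows what the next frame reveals there;
                # a static pixel simply keeps its previous value.
                mid[y][x] = nxt[y][x] if moved_away else prev[y][x]
    return mid


# A bright pixel moves two columns to the right between two frames.
prev_frame = [[9, 0, 0, 0]]
next_frame = [[0, 0, 9, 0]]
motion = [[(2, 0), (0, 0), (0, 0), (0, 0)]]

print(generate_midframe(prev_frame, next_frame, motion))  # [[0, 9, 0, 0]]
```

The generated frame places the bright pixel midway between its two rendered positions, which is the basic job of an interpolated frame. A real implementation also has to handle occlusions, sub-pixel motion, and effects such as shadows and particles whose motion the engine does not report, which is where the optical flow field and the neural network come in.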
DLSS 3 builds on top of DLSS 2 integrations, allowing game developers to quickly enable it in existing
titles that already support DLSS 2 or NVIDIA Streamline. DLSS 3 is supported on GeForce RTX 40 Series GPUs and will debut on Wednesday, Oct. 12.
DLSS 3 has also received support from many of the world’s leading game developers, with more
than 35 games and applications announcing support, including:
● A Plague Tale: Requiem
● Atomic Heart
● Black Myth: Wukong
● Bright Memory: Infinite
● Conqueror’s Blade
● Cyberpunk 2077
● Dakar Rally
● Deliver Us Mars
● Destroy All Humans! 2 – Reprobed
● Dying Light 2 Stay Human
● F1® 22
● F.I.S.T.: Forged In Shadow Torch
● Frostbite Engine
● HITMAN 3
● Hogwarts Legacy
● Jurassic World Evolution 2
● Microsoft Flight Simulator
● Midnight Ghost Hunt
● Mount & Blade II: Bannerlord
● Naraka Bladepoint
● NVIDIA Omniverse™
● NVIDIA Racer RTX
● Portal With RTX
● S.T.A.L.K.E.R 2: Heart of Chornobyl
● Sword and Fairy 7
● The Lord of the Rings: Gollum
● The Witcher 3: Wild Hunt
● THRONE AND LIBERTY
● Tower of Fantasy
● Unreal Engine 4 & 5
● Warhammer 40,000: Darktide
DLSS 3 integrations also incorporate NVIDIA Reflex, which synchronizes the GPU and CPU,
ensuring optimum responsiveness and low system latency.
In addition to improved DLSS performance, Lovelace also features upgraded SMs offering double the compute performance of Ampere, 3rd Gen RT Cores that are nearly three times as fast as Ampere’s, and Tensor Cores that are 5x more powerful with FP8 acceleration.
Shader Execution Reordering (SER) is another key feature of the Ada Lovelace architecture. Akin to instruction reordering, a fundamental technique in modern CPUs, it reportedly improves ray-tracing performance by up to 3x and in-game frame rates by up to 25%. I believe we’re merely looking at some kind of warp-level switching rather than fine-grained thread switching.
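Why reordering can help is easy to show with a small CPU-side model. The sketch below is a conceptual illustration only, not a description of how the hardware works: it assumes that each warp of 32 threads must run once per distinct shader among its threads, and that reordering amounts to sorting threads by the shader they need. The function names `divergent_passes` and `reorder` are hypothetical.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def divergent_passes(shader_ids, warp_size=WARP_SIZE):
    """Count serialized passes under a simplified divergence model.

    Assumption: a warp executes once for every distinct shader that
    appears among its threads.
    """
    total = 0
    for start in range(0, len(shader_ids), warp_size):
        warp = shader_ids[start:start + warp_size]
        total += len(set(warp))
    return total


def reorder(shader_ids):
    """Return thread indices sorted so that equal shader ids are adjacent."""
    return sorted(range(len(shader_ids)), key=lambda i: shader_ids[i])


# 128 rays hit 4 different materials in an interleaved pattern.
hits = [i % 4 for i in range(128)]

before = divergent_passes(hits)
order = reorder(hits)
after = divergent_passes([hits[i] for i in order])

print(before, after)  # 16 4
```

In this model every unsorted warp contains all four shaders, so the four warps need 16 passes in total, while the sorted layout needs only one pass per warp. The figures are a property of this toy workload and should not be read as a prediction of the speedups NVIDIA quotes.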
Dual NVIDIA Encoders (NVENC) cut export times by up to half and add AV1 support. The NVENC AV1 encoder is being adopted by OBS, Blackmagic Design DaVinci Resolve, Discord, and more.