Ray-tracing is the next big thing. It’s the future. With its shiny reflections, it’s something that makes games look super realistic. NVIDIA was the first to popularize the term among the masses via its RTX 20 series “Turing” GPUs, and now everyone, including AMD and console-makers Sony and Microsoft, has embraced it. But hold on a minute: what exactly is ray-tracing? Is NVIDIA’s RTX technology any different from ray-tracing, or is it the same thing? What about DirectX 12? Why do only games built atop DX12 have ray-tracing support? That and everything else you need to know about ray-tracing is what we’ll cover in this post.
For starters, let’s make one thing clear: ray-tracing isn’t anything new. It’s been around for about as long as computer graphics itself. Until recently, however, ray-tracing was mostly employed in movies and pre-rendered cut-scenes, because the technique is so taxing that even top-end GPUs would take days to render standard-resolution sequences with full-fledged ray-tracing. Only with modern hardware are we able to do it in real-time.
NVIDIA’s RTX 20 series graphics cards first gave us a taste of what ray-tracing is capable of and how it can revolutionize computer graphics. The fact that both the next-gen PS5 and Xbox Scarlett consoles will include hardware-level support for real-time ray-tracing is another clear marker. AMD is also set to launch its 2nd Gen RDNA-based graphics cards (yes, Big Navi) this year with real-time ray-tracing support. Suffice it to say, by the end of 2020 it will be an important technology, available on most mainstream GPUs.
What is Ray Tracing?
Ray-tracing essentially involves casting a bunch of rays from the camera, through the 2D image (your screen), into the 3D scene that is to be rendered. As the rays propagate through the scene, they get reflected off surfaces, pass through semi-transparent objects, and are blocked by others, thereby producing the scene’s reflections, refractions, and shadows. The data gathered along each ray’s path back to the light source is then used to render the scene.
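To make the idea concrete, here’s a minimal sketch of the first step, casting one ray per pixel from the camera and testing whether it hits anything. The tiny 4x4 “screen”, the single sphere, and all the names (`primary_ray`, `hit_sphere`) are illustrative assumptions, not any real engine’s API:

```python
import math

def primary_ray(px, py, width, height, fov_deg=90.0):
    """Build a unit-length camera ray through pixel (px, py).
    Camera sits at the origin looking down -z; names are illustrative."""
    aspect = width / height
    scale = math.tan(math.radians(fov_deg) / 2)
    # Map the pixel centre into [-1, 1] normalized device coordinates
    x = (2 * (px + 0.5) / width - 1) * aspect * scale
    y = (1 - 2 * (py + 0.5) / height) * scale
    length = math.sqrt(x * x + y * y + 1)
    return (x / length, y / length, -1 / length)

def hit_sphere(origin, direction, center, radius):
    """True if the ray hits the sphere (quadratic discriminant test)."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = 2 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    return b * b - 4 * c >= 0  # a == 1 because the direction is unit-length

# Cast one ray per pixel of a tiny 4x4 screen at a sphere 5 units away
hits = sum(
    hit_sphere((0, 0, 0), primary_ray(px, py, 4, 4), (0, 0, -5), 2.0)
    for py in range(4) for px in range(4)
)
print(hits)  # the 4 centre pixels see the sphere; the outer ring misses
```

A real ray tracer would then follow each hit ray onward to compute reflections, refractions, and shadows, as the paragraph above describes.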
Neat, isn’t it? Except it’s very performance-intensive. To produce an image of reasonable quality, there needs to be at least one ray for every pixel on the screen, which requires an unreasonable amount of graphics processing power. Hence, most games utilize something called rasterization. It’s a crass hack that works pretty well at creating a near-realistic image without the need to cast millions of rays.
What is Rasterization?
So what is rasterization about? In simple words, rasterization is guessing which objects will be visible on the screen and where their shadows and reflections will fall. It’s achieved through a bunch of mathematical equations and is mostly an estimation rather than the real thing. To do this, the Z-buffer (or depth buffer) is used: a per-pixel record of depth along the z-axis.
The vertices of all the objects in the scene (in the form of triangles or polygons) are processed, and their distances from the camera (the POV) are stored along the z-axis in the depth buffer. Then the outlines of the objects are extrapolated, and those polygons (or parts of objects) that aren’t visible are culled, or removed. This is called hidden surface removal, and it saves a lot of processing power. The various effects such as shadows, reflections, AO, etc. are then produced for the remaining objects in the scene, and the image is rendered.
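The depth-buffer test at the heart of this can be sketched in a few lines. Everything here, from the 4x4 resolution down to the fragment tuple format, is a toy assumption chosen to show the one rule that matters: a pixel only keeps the nearest fragment, so hidden surfaces never get drawn.

```python
# Minimal depth-buffer sketch: one float per pixel, keep the nearest fragment.
WIDTH, HEIGHT = 4, 4
FAR = float("inf")

depth = [[FAR] * WIDTH for _ in range(HEIGHT)]
color = [["sky"] * WIDTH for _ in range(HEIGHT)]

def rasterize(fragments):
    """Write each (px, py, z, colour) fragment only if it is closer
    than what the depth buffer already stores for that pixel."""
    for px, py, z, col in fragments:
        if z < depth[py][px]:        # depth test: hidden surfaces fail here
            depth[py][px] = z
            color[py][px] = col

# A far wall and a nearer box both cover pixel (1, 1); only the box survives
rasterize([(1, 1, 10.0, "wall"), (1, 1, 3.0, "box"), (2, 2, 7.0, "wall")])
print(color[1][1], color[2][2])  # box wall
```

Note that no rays are involved: visibility is settled purely by comparing stored depths, which is why rasterization is so much cheaper than ray-tracing.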
Here there is no ray-tracing. The shadows are pre-baked into the scene using shadow maps, which are basically guesses of where they should lie. When it comes to reflections, there are many ways to render them; the two most popular are cube-maps and screen-space reflections. In cube-mapping, the surrounding scene is usually pre-baked onto the six faces of a cube and stored either as six square textures or unfolded into six square regions of a single texture.
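When a reflective surface is drawn, the reflected view direction is used to look up one of those six pre-baked faces. A common way to pick the face is by the direction’s dominant axis; the sketch below shows that idea (the `+x`/`-x` face names follow the usual convention, and the function itself is illustrative, not a real graphics API):

```python
def cubemap_face(dx, dy, dz):
    """Pick which of the six cube faces a reflected direction looks up,
    using the component with the largest magnitude (the dominant axis)."""
    ax, ay, az = abs(dx), abs(dy), abs(dz)
    if ax >= ay and ax >= az:
        return "+x" if dx > 0 else "-x"
    if ay >= az:
        return "+y" if dy > 0 else "-y"
    return "+z" if dz > 0 else "-z"

print(cubemap_face(0.9, 0.1, -0.2))  # mostly pointing along +x
print(cubemap_face(0.0, -1.0, 0.0))  # straight down: -y face
```

Because the faces were baked ahead of time, this lookup is nearly free at runtime, which is exactly why cube-maps are popular, and exactly why they can’t react to objects that move after baking.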
Screen-space reflections (SSR), or dynamic reflections, are more advanced and performance-intensive. In this technique, the various objects already drawn on screen are re-rendered on reflective surfaces (the resolution depends on the quality setting). The main drawback of SSR, however, is that objects that aren’t present on the screen but should be reflected nonetheless simply aren’t rendered.
For example, in this image, the pipes on the ceiling aren’t visible on the screen, so their reflections, which should appear on the water below, are missing. The broken-down car, however, has its reflection rendered because it’s present in the 2D scene. A similar technique called Screen Space Ambient Occlusion (SSAO) is used to render ambient shadows, and it suffers from the same drawback: the shadows of objects not rendered on-screen aren’t drawn.
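The SSR limitation boils down to one check: a reflection can only reuse pixels that already exist in the frame. This toy lookup makes the failure mode explicit (the 2x2 “screen”, its contents, and the function name are all made up for illustration):

```python
def ssr_lookup(reflected_px, reflected_py, width, height, screen):
    """Screen-space reflection lookup: we can only borrow colour from
    pixels already on screen; anything outside the frame has no data."""
    if 0 <= reflected_px < width and 0 <= reflected_py < height:
        return screen[reflected_py][reflected_px]
    return None  # off-screen: the reflection is simply missing

screen = [["car", "road"],
          ["road", "water"]]
print(ssr_lookup(0, 0, 2, 2, screen))   # the on-screen car can be reflected
print(ssr_lookup(0, -1, 2, 2, screen))  # ceiling pipes above the frame: None
```

That `None` is the hole you see in SSR reflections, and the class of artifact that ray-tracing eliminates, since a ray doesn’t care whether the object it hits was drawn on screen.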
NVIDIA RTX: How Ray-Tracing Works
We’ve explained how ray-tracing works above, but let’s get into the details. NVIDIA’s RTX is basically a way to implement ray-tracing via DirectX 12; it uses the DXR (DirectX Raytracing) component of the API. The main steps involved in NVIDIA’s implementation of ray-tracing (RTX) are:
Ray-Casting: This is pretty self-explanatory. Rays are cast into the scene through the pixels, and the ones that hit an object (polygon) are considered. As each ray traces its way to the light source, information is gathered from it, such as the distance of the object from the screen, possible shadows, reflections, refractions, etc.
Bounding Volume Hierarchy: A BVH is used to make the whole ray-tracing process more efficient. Here, all the objects in the scene are enclosed in boxes of sorts, and then the rays are cast. Only the boxes (read: the objects inside the boxes) that are intersected by the rays are considered for the rest of the process.
Instead of testing each ray against every polygon (of which there can be several million), rays are first tested against the BVH’s boxes. The RT cores found in NVIDIA’s Turing graphics cards accelerate this very process.
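The cheap box test that makes a BVH work is the classic “slab” intersection against an axis-aligned bounding box. Here’s a hedged sketch of it; the function name and the specific boxes are assumptions for illustration, though the slab method itself is the standard building block:

```python
def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: intersect the ray with the box's x, y, and z slabs
    and check that the entry/exit intervals overlap in front of the ray."""
    t_near, t_far = float("-inf"), float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            if o < lo or o > hi:   # ray parallel to this slab and outside it
                return False
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far and t_far >= 0

# Only boxes that pass this cheap test get their polygons ray-tested at all
print(ray_hits_aabb((0, 0, 0), (0, 0, -1), (-1, -1, -6), (1, 1, -4)))  # True
print(ray_hits_aabb((0, 0, 0), (0, 0, -1), (3, 3, -6), (5, 5, -4)))    # False
```

In a full BVH the boxes are nested in a tree, so a single failed test prunes every polygon underneath that node; this hierarchical pruning is what the RT cores accelerate in hardware.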
Denoising: Remember I said that there needs to be one ray for each pixel, or possibly more? Yeah, present games that leverage ray-tracing use fewer rays than that ideal count (we’re still not that advanced). This results in blurring:
It’s very faint here as the image has already been denoised, but because there is only one ray per pixel (or perhaps fewer), there isn’t enough data being gathered to render the reflections clearly. This blurring is taken care of by a technique called denoising: a complex method that uses AI-based algorithms to get rid of artifacts and low-quality blur by approximating the color of each pixel, similar to upscaling. This is another feature NVIDIA’s RTX implementation of ray-tracing uses to maximize performance while obtaining the highest image quality.
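To illustrate the basic idea only, here’s a deliberately naive stand-in for a denoiser: a box filter that fills in each pixel by averaging its neighbours. Real RTX denoisers are far more sophisticated, using trained AI filters plus data from previous frames, so treat this purely as a sketch of “borrow information from nearby pixels”:

```python
def box_denoise(img):
    """Toy denoiser: replace each pixel with the average of its 3x3
    neighbourhood, smoothing the speckle left by too few rays per pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            samples = [img[ny][nx]
                       for ny in range(max(0, y - 1), min(h, y + 2))
                       for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(samples) / len(samples)
    return out

noisy = [[1.0, 0.0],
         [0.0, 1.0]]  # sparse one-ray-per-pixel speckle
print(box_denoise(noisy))  # every pixel settles to the 0.5 average
```

The trade-off is visible even here: the noise is gone, but so is the detail, which is why production denoisers lean on AI to decide what is noise and what is edge.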
There’s also something called path tracing, which is a more intensive form of ray-tracing. Here the ray travels from the camera and bounces off several objects before reaching the light source. Because the repeated bouncing/reflecting puts an insane number of rays in the scene at any one time, this is quite taxing, but at the same time the lighting effects are more precise.
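The bouncing described above can be sketched as a toy path-tracing kernel: follow one ray, lose some energy at every bounce, and stop when the ray reaches a light or runs out of bounces. Every number here (the 0.3 escape chance, the 0.5 albedo, the radiance of 1.0) is an illustrative assumption; a real path tracer intersects actual geometry and samples material BRDFs:

```python
import random

def trace_path(rng, max_bounces=4, albedo=0.5):
    """Toy path-tracing kernel: attenuate the ray's energy by the surface
    albedo at each bounce until it escapes to the light or is cut off."""
    throughput = 1.0
    for _ in range(max_bounces):
        if rng.random() < 0.3:        # ray escapes and reaches the light
            return throughput * 1.0   # light radiance assumed to be 1.0
        throughput *= albedo          # diffuse bounce absorbs some energy
    return 0.0                        # bounce limit hit: contributes nothing

# One pixel's estimate is the average over many random paths; the spread
# between individual paths is exactly the noise that denoising cleans up.
rng = random.Random(42)
samples = [trace_path(rng) for _ in range(1000)]
estimate = sum(samples) / len(samples)
print(round(estimate, 3))
```

Averaging more paths per pixel tightens the estimate, which is why path tracing scales in cost so brutally, and why the denoising step above matters so much at real-time ray counts.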
Now, you may ask: what about AMD and the next-gen consoles? How will they leverage ray-tracing? Well, as far as AMD’s 2nd Gen Navi GPUs are concerned, they will certainly utilize the DirectX 12 API (or more specifically, DXR). However, the exact techniques used to make it viable in gaming, such as BVH traversal, denoising, etc., will have to be developed by the company itself. The same goes for the consoles (though the API on the PS5 will be different).
If you’d like to know more about NVIDIA’s RTX GPUs and the Turing architecture, here’s an in-depth piece on the topic: