We’ve been hearing a lot about DirectX 12 and ray-tracing (RTX in NVIDIA terms) over the last 10-15 months. While ray-tracing (DXR) is a core feature of the new low-level API, it’s not all there is to it. DirectX 12 changes how games are developed and rendered in many ways. Ever since games built atop the new API arrived, there’s been a lot of debate as to how it differs from DirectX 11 and whether it’s actually as big as the industry wants you to believe. In this post, we explore DirectX 12 and see what makes it such a step up from DirectX 11.
DirectX 11 vs DirectX 12: What Does it Mean for PC Gamers
There are three main advantages of the DirectX 12 API for PC gamers:
Better Scaling with Multi-Core CPUs
One of the core advantages of low-level APIs like DirectX 12 and Vulkan is improved CPU utilization. Traditionally, most DirectX 9 and DirectX 11 games used only 2-4 cores for their various workloads: physics, AI, draw calls, etc. Some games were even limited to one. With DirectX 12 that has changed. The load is more evenly distributed across all cores, making multi-core CPUs more relevant for gamers.
Maximum hardware utilization
Many of you might have noticed that in the beginning, AMD GPUs favored DirectX 12 titles more than rival NVIDIA parts. Why is that?
The reason is better utilization. Traditionally, NVIDIA has had much better driver support while AMD hardware has always suffered from the lack thereof. DirectX 12 adds many technologies to improve utilization such as asynchronous compute which allows multiple stages of the pipeline to be executed simultaneously (read: Compute and Graphics). This makes poor driver support a less pressing concern.
Closer to Metal Support
Another major advantage of DirectX 12 is that developers have more control over how their game utilizes the hardware. Earlier this was more abstract and was mostly taken care of by the drivers and the API (although some engines like Frostbite and Unreal provided low-level tools as well).
Now the task falls to the developers. They have closer to metal access, meaning that most of the rendering responsibilities and resource allocation are handled by the game engines with some help from the graphics drivers.
This is a double-edged sword as there are multiple GPU architectures out in the wild and for indie devs, it’s impossible to optimize their game for all of them. Luckily, third-party engines like Unreal, CryEngine, and Unity do this for them and they only have to focus on designing.
How DirectX 12 Improves Hardware Utilization
Again, there are three main API advances that facilitate this gain:
Pipeline State Objects
In DirectX 11, the GPU pipeline is configured through a wide range of separate states, such as the Vertex Shader, Hull Shader, Geometry Shader, etc. These states are often inter-dependent, and a stage can’t progress until the previous one completes execution. This reduces parallel execution in the DirectX 11 pipeline and leads to bottlenecks.
Each of them needs to be defined individually and the next state can’t be executed until the previous one has been finalized, despite the fact that each requires different resources. This leaves the hardware under-utilized, increasing overhead and reducing the number of draw calls that can be issued per frame.
DirectX 12 replaces the various states with Pipeline State Objects (PSOs), which are finalized at creation. Because a PSO cannot change after it is created, the driver can translate it into native hardware state once, up front, without depending on any other object or state. Switching between PSOs then only requires transferring a small amount of precomputed data to the hardware registers.
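To make the "finalized at creation" idea concrete, here is a minimal CPU-side sketch in C++. It is a conceptual model only, not the Direct3D 12 API: `PipelineDesc`, `CompiledPipeline` and `PipelineCache` are hypothetical names, and the "register image" is a made-up stand-in for whatever native state a real driver would precompute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical description of everything the pipeline fixes at creation time.
struct PipelineDesc {
    std::string vertexShader;
    std::string pixelShader;
    bool depthTest;
    bool alphaBlend;
};

// Stand-in for the native hardware state a driver would precompute.
struct CompiledPipeline {
    std::vector<std::uint32_t> registerImage;
};

class PipelineCache {
public:
    // All validation and "compilation" happens here, once per unique state.
    std::size_t create(const PipelineDesc& desc) {
        std::string key = desc.vertexShader + '|' + desc.pixelShader + '|';
        key += desc.depthTest ? '1' : '0';
        key += desc.alphaBlend ? '1' : '0';

        auto found = lookup_.find(key);
        if (found != lookup_.end()) return found->second;  // already compiled

        CompiledPipeline compiled;
        compiled.registerImage.push_back(
            static_cast<std::uint32_t>(std::hash<std::string>{}(key)));
        compiled.registerImage.push_back(desc.depthTest ? 1u : 0u);
        compiled.registerImage.push_back(desc.alphaBlend ? 1u : 0u);

        pipelines_.push_back(compiled);
        ++compileCount_;
        lookup_[key] = pipelines_.size() - 1;
        return pipelines_.size() - 1;
    }

    // Switching at draw time only selects data that already exists.
    const CompiledPipeline& bind(std::size_t handle) const {
        assert(handle < pipelines_.size());
        return pipelines_[handle];
    }

    int compileCount() const { return compileCount_; }

private:
    std::vector<CompiledPipeline> pipelines_;
    std::unordered_map<std::string, std::size_t> lookup_;
    int compileCount_ = 0;
};
```

In this model, creating the same state twice returns the same handle, and `bind` does no validation work beyond a bounds check, which is the property that makes switching cheap.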
NVIDIA’s Turing architecture, together with DirectX 12, introduces Task Shaders and Mesh Shaders. These two new shaders replace the various cumbersome shader stages involved in the DX11 pipeline for a more flexible approach.
The mesh shader performs the same task as the domain and geometry shaders but internally it uses a multi-threaded instead of a single-threaded model. The task shader works similarly. The major difference here is that while the hull shader took patches as input and produced the tessellated object as output, the task shader’s input and output are user-defined.
Consider a scene with thousands of objects that need to be rendered. In the traditional model, each of them would require a unique draw call from the CPU. With the task shader, however, a list of objects is sent using a single draw call. The task shader then processes this list in parallel and assigns work to the mesh shader (which also works in parallel), after which the scene is sent to the rasterizer for 3D to 2D conversion.
This approach helps reduce the number of CPU draw calls per scene significantly, thereby increasing the level of detail.
Mesh shaders also include culling of unused triangles. This is done using the amplification shader (DirectX’s name for the task shader). It runs prior to the mesh shader and determines the number of mesh shader threadgroups needed. It tests the various meshlets for possible intersections and screen visibility and then carries out the required culling. Geometry culling at this early rendering stage significantly improves performance.
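The culling step described above can be sketched on the CPU in C++. This is an illustration of the idea only: a real amplification shader is written in HLSL and runs on the GPU, and the `Meshlet`, `Plane` and `amplify` names, as well as the bounding-sphere test, are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A meshlet is represented here only by its bounding sphere (assumption).
struct Meshlet { float cx, cy, cz, radius; };

// Plane with the convention: n . p + d >= 0 means "inside the view volume".
struct Plane { float nx, ny, nz, d; };

// A sphere is culled only if it lies entirely behind at least one plane.
bool sphereVisible(const Meshlet& m, const std::vector<Plane>& frustum) {
    assert(m.radius >= 0.0f);
    for (const Plane& p : frustum) {
        const float dist = p.nx * m.cx + p.ny * m.cy + p.nz * m.cz + p.d;
        if (dist < -m.radius) return false;
    }
    return true;
}

// Mimics the amplification step: test every meshlet and return the indices
// of the survivors. The size of the result is the number of mesh shader
// groups that would need to be launched.
std::vector<std::size_t> amplify(const std::vector<Meshlet>& meshlets,
                                 const std::vector<Plane>& frustum) {
    std::vector<std::size_t> survivors;
    for (std::size_t i = 0; i < meshlets.size(); ++i) {
        if (sphereVisible(meshlets[i], frustum)) survivors.push_back(i);
    }
    return survivors;
}
```

Only the surviving meshlets are handed on, so geometry that is off-screen never reaches the later stages.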
Command Lists and Queues
With DirectX 11, there’s only a single queue going to the GPU. This leads to uneven distribution of load across various CPU cores, essentially crippling multi-threaded CPUs.
This is somewhat alleviated by using a deferred context, but even then, ultimately there’s only one stream of commands leading to the GPU at the final stage. DirectX 12 introduces a new model that uses command lists that can be recorded and executed independently, increasing multi-threading. This includes dividing the workload into smaller commands requiring different resources, allowing simultaneous execution. This is also how asynchronous compute works: the compute and graphics commands are divided into separate queues and executed concurrently.
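The recording model can be sketched with standard C++ threads. This is a conceptual model under stated assumptions, not Direct3D 12 code: a "command" is just a string, and `recordChunk`, `recordFrame` and `submit` are hypothetical names. The point it illustrates is that each worker fills its own list with no shared state, and the lists are only brought together at submission.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// A command list is modelled as a plain list of command strings (assumption).
using CommandList = std::vector<std::string>;

// Each worker records into its own list, so no locking is needed.
CommandList recordChunk(int firstObject, int count) {
    CommandList list;
    for (int i = 0; i < count; ++i) {
        list.push_back("draw object " + std::to_string(firstObject + i));
    }
    return list;
}

// Split the frame's objects across workers and record in parallel.
std::vector<CommandList> recordFrame(int objectCount, int workerCount) {
    assert(objectCount >= 0 && workerCount > 0);
    std::vector<CommandList> lists(workerCount);
    std::vector<std::thread> workers;
    const int perWorker = (objectCount + workerCount - 1) / workerCount;
    for (int w = 0; w < workerCount; ++w) {
        const int first = w * perWorker;
        const int count = std::max(0, std::min(perWorker, objectCount - first));
        // Every thread writes to a distinct element of `lists`.
        workers.emplace_back([&lists, w, first, count] {
            lists[w] = recordChunk(first, count);
        });
    }
    for (std::thread& t : workers) t.join();
    return lists;
}

// Submission: the queue consumes the finished lists in the order given.
CommandList submit(const std::vector<CommandList>& lists) {
    CommandList queue;
    for (const CommandList& list : lists) {
        queue.insert(queue.end(), list.begin(), list.end());
    }
    return queue;
}
```

Calling `submit(recordFrame(10, 4))` yields the ten draw commands in object order, even though they were recorded on four threads.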
Descriptor Heaps and Tables
In DirectX 11, resource binding was highly abstract and convenient but not the best in terms of hardware utilization. It left many of the hardware components unused or idle. Most game engines would use “view objects” to allocate resources and bind them to various shader stages of the GPU pipeline.
The objects would be bound to slots along the pipeline at draw time and the shaders would derive the required data from these slots. The drawback of this model is that when the game engine needs a different set of resources, the bindings are useless and must be re-allocated.
DirectX 12 replaces the resource views with descriptor heaps and tables. A descriptor is a small object that contains information about one resource. These are grouped together to form descriptor tables which in turn are stored in a heap.
Ideally, a descriptor table stores information about one type of resource while a heap contains all the tables required to render one or more frames. The GPU pipeline accesses this data by referencing the descriptor table index.
As the descriptor heap already contains the required descriptor data, in case a different set of resources is needed, the descriptor table is switched which is much more efficient than rebinding the resources from scratch.
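A small C++ model shows why switching tables is cheap. It is a sketch of the concept rather than the Direct3D 12 API: `Descriptor`, `DescriptorTable` and `DescriptorHeap` are hypothetical types, and a descriptor here holds only a resource name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A descriptor holds information about one resource (only a name here).
struct Descriptor {
    std::string resourceName;
};

// A table is just a contiguous range inside the heap.
struct DescriptorTable {
    std::size_t offset;
    std::size_t count;
};

class DescriptorHeap {
public:
    // Append a group of descriptors and return the table that covers them.
    DescriptorTable addTable(const std::vector<std::string>& resources) {
        DescriptorTable table{descriptors_.size(), resources.size()};
        for (const std::string& name : resources) {
            descriptors_.push_back(Descriptor{name});
        }
        return table;
    }

    // What a shader would do: index into the currently bound table.
    const Descriptor& at(const DescriptorTable& table, std::size_t index) const {
        assert(index < table.count);
        return descriptors_[table.offset + index];
    }

private:
    std::vector<Descriptor> descriptors_;
};
```

With two tables in the heap, changing the active set of resources is a matter of assigning a different `DescriptorTable` (two integers); nothing in the heap itself is rebuilt.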
Other features that come with DirectX 12 are:
DirectX Raytracing (DXR): This is essentially the API support for real-time ray-tracing that NVIDIA so lovingly calls RTX.
Variable Rate Shading: Variable Rate Shading (VRS) allows the GPU to focus on areas of the screen that are more “visible” and affected per frame. In a shooter, this would be the space around the cross-hair. In contrast, the region around the border of the screen is mostly out of focus and can be ignored (to some degree).
It allows the developers to focus more on the areas that actually affect the apparent visual quality (the center of the frame in most cases) while reducing the shading in the peripheries.
VRS is of two types: Content Adaptive Shading and Motion Adaptive Shading:
Content Adaptive Shading (CAS) allows each 16×16-pixel screen tile to be shaded individually, allowing the GPU to increase the shading rate in regions that stand out while reducing it in the rest.
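Motion Adaptive Shading is as it sounds. It adjusts the shading rate based on how fast things move across the screen: regions that change rapidly every frame are shaded at a reduced rate, since the eye can’t resolve fine detail in fast motion, while regions that stay relatively steady on screen keep the full rate. In the case of a racing game, the car, which stays roughly fixed relative to the camera, will get full-rate shading while the roadside and off-road regions rushing past will be given reduced priority.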
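The content-adaptive case can be sketched in C++ as a per-tile decision. This is a simplified illustration, not NVIDIA’s or Microsoft’s implementation: the contrast metric (max minus min luminance), the thresholds, and the `ShadingRate` names are all assumptions chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Coarser rates shade one sample per larger block of pixels.
enum class ShadingRate { Full1x1, Half2x2, Quarter4x4 };

constexpr int kTileSize = 16;  // 16x16-pixel tiles, as described above

// Hypothetical thresholds: more contrast in a tile means more detail to keep.
ShadingRate rateForContrast(float contrast) {
    if (contrast > 0.25f) return ShadingRate::Full1x1;
    if (contrast > 0.05f) return ShadingRate::Half2x2;
    return ShadingRate::Quarter4x4;
}

// `luma` is a row-major luminance image with values in [0, 1].
// Returns one shading rate per tile, also in row-major order.
std::vector<ShadingRate> buildRateImage(const std::vector<float>& luma,
                                        int width, int height) {
    assert(width > 0 && height > 0);
    assert(width % kTileSize == 0 && height % kTileSize == 0);
    assert(luma.size() == static_cast<std::size_t>(width) * height);

    std::vector<ShadingRate> rates;
    for (int ty = 0; ty < height / kTileSize; ++ty) {
        for (int tx = 0; tx < width / kTileSize; ++tx) {
            float lo = luma[(ty * kTileSize) * width + tx * kTileSize];
            float hi = lo;
            for (int y = 0; y < kTileSize; ++y) {
                for (int x = 0; x < kTileSize; ++x) {
                    const float v =
                        luma[(ty * kTileSize + y) * width + tx * kTileSize + x];
                    lo = std::min(lo, v);
                    hi = std::max(hi, v);
                }
            }
            rates.push_back(rateForContrast(hi - lo));
        }
    }
    return rates;
}
```

A flat tile (for example, clear sky) ends up at the coarsest rate, while a tile full of high-contrast detail keeps full-rate shading.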
Multi-GPU Support: DirectX 12 supports two types of multi-GPU operation, namely implicit and explicit. Implicit is essentially SLI/CrossFire and leaves the job to the vendor driver. Explicit is more interesting and lets the game engine control how the two GPUs function in parallel. This allows for better scaling and for mixing and matching different GPUs, even ones from different vendors (including your dGPU and iGPU).
Another major advantage of explicit multi-GPU is that the VRAM contents of the two GPUs don’t have to be mirrored, so the memory can be pooled, up to doubling the usable video memory with two identical cards. This and a ton of other features make DirectX 12 a major upgrade to the software side of PC gaming. It’ll take a while to leverage all these features but some are already apparent.
Some DirectX 12 titles like Ashes of the Singularity and Sniper Elite 4 achieve excellent multi-GPU scaling. Likewise, a lot of older AMD GPUs see a healthy boost in async compute enabled titles. The gains in the case of GeForce cards are relatively smaller as they already utilized most of the resources quite well thanks to excellent drivers.
DirectX 12 Ultimate: How is it Different from DirectX 12?
DirectX 12 Ultimate is an incremental upgrade over the existing DirectX 12 (tier 1.1), and its core advantage is cross-platform support: both the next-gen Xbox Series X and the latest PC games will leverage it. This not only simplifies cross-platform porting but also makes it easier for developers to optimize their games for the latest hardware.
By the time the Xbox Series X arrives later this year, game developers will have already had enough time with hardware using the same graphics API (NVIDIA’s Turing), simplifying the porting and optimization process. At the same time, this will also improve utilization on the latest PC hardware, improving the overall performance. All in all, it’s another step by Microsoft to unify the Xbox and PC gaming platforms.
Other than that, it introduces DirectX Raytracing 1.1, Sampler Feedback, Mesh Shaders and Variable Rate Shading. The last two were already supported by NVIDIA’s RTX Turing GPUs (and are explained above), but their inclusion here should lead to wider adoption by newer games and developers.
DirectX Raytracing 1.1 is a minor upgrade over the existing 1.0 version:
- Raytracing work can now be launched directly by the GPU without a round-trip to the CPU, reducing the CPU overhead and improving performance.
- New raytracing shaders can be loaded as and when needed, depending upon the player’s location in the game world.
- Inline raytracing is one of the core additions to DirectX 12 Ultimate. It gives developers more control over the raytracing process. It’s available in any stage of the rendering pipeline and is feasible in cases where the shading complexity is minimal.