DirectX 12 debuted in 2015 alongside Windows 10, promising significant performance and latency improvements across the board. These include better CPU utilization, closer-to-metal access, and a host of new features, most notably ray tracing through DXR (DirectX Raytracing). But what exactly is DirectX 12, and how is it different from DirectX 11? Let’s have a look.
What is DirectX: It’s an API
Like Vulkan and OpenGL, DirectX is a graphics API: the interface that games use to drive the graphics hardware in your computer. However, unlike its counterparts, DirectX is proprietary to Microsoft and is supported natively only on Microsoft platforms, chiefly Windows and Xbox. OpenGL and Vulkan, on the other hand, are cross-platform and also run on Linux and, with some caveats, on Mac.
Now, what does a graphics API like DirectX do? It acts as an intermediary between the game engine and the graphics driver, which in turn interacts with the OS kernel and the GPU. The engine’s rendering code is written against the API. Think of it as MS Paint, where the game is the painting and the paint application is the API. However, unlike a painting, which can be saved and opened anywhere, a game built on a graphics API runs only on platforms that provide that API. Many APIs are tied to a specific OS or console, which is one reason why PS4 games don’t run on the Xbox One and vice versa.
DirectX 12 Ultimate goes a step further in bridging platforms. It is used on both Windows and the next-gen Xbox Series X, with the same feature set on each. With DX12 Ultimate, Microsoft is basically unifying the two platforms.
DirectX 11 vs DirectX 12: What Does it Mean for PC Gamers
There are three main advantages of the DirectX 12 API for PC gamers:
Better Scaling with Multi-Core CPUs
One of the core advantages of low-level APIs like DirectX 12 and Vulkan is improved CPU utilization. Traditionally, most DirectX 9 and DirectX 11 games used only 2-4 cores for their various workloads: physics, AI, draw calls, and so on. Some games were even limited to one. With DirectX 12, that has changed. The load can be distributed more evenly across all cores, making multi-core CPUs more relevant for gamers.
Maximum Hardware Utilization
Many of you might have noticed that, early on, AMD GPUs benefited more from DirectX 12 titles than rival NVIDIA parts. Why is that?
The reason is better utilization. Traditionally, NVIDIA has had much better driver support, while AMD hardware has suffered from the lack of it, leaving performance on the table in DirectX 11. DirectX 12 adds many technologies to improve utilization, such as asynchronous compute, which allows compute and graphics workloads to execute simultaneously on the GPU. This makes poor driver support a less pressing concern.
Closer to Metal Support
Another major advantage of DirectX 12 is that developers have more control over how their game utilizes the hardware. Earlier, this was more abstract and was mostly taken care of by the drivers and the API (although some engines like Frostbite and Unreal provided low-level tools as well).
Now the task falls to the developers. They have closer-to-metal access, meaning that most of the rendering responsibilities and resource allocation are handled by the game engine, with some help from the graphics drivers.
This is a double-edged sword, as there are multiple GPU architectures out in the wild, and for indie devs it’s impractical to optimize their game for all of them. Luckily, third-party engines like Unreal, CryEngine, and Unity do this for them, so they only have to focus on designing the game.
How DirectX 12 Improves Performance by Optimizing Hardware Utilization
Again, there are three main API advances that facilitate this gain:
Pipeline State Objects
In DirectX 11, pipeline state is set through a large number of separate objects: the individual shader stages (vertex, hull, domain, geometry, pixel) plus fixed-function state such as the input layout, rasterizer, blend, and depth-stencil settings. Many of these are interdependent, and GPUs often merge several of them into a single hardware representation.
Because each object can be set individually at runtime, the driver can’t finalize the hardware state until every piece is known, which in practice means at draw time. This delays hardware setup, adding overhead and reducing the maximum number of draw calls per frame.
DirectX 12 replaces these separate states with Pipeline State Objects (PSOs), which are immutable once created. A PSO bundles the bytecode for all shaders (vertex, pixel, domain, hull, and geometry) together with the rest of the pipeline state, so the driver can convert it into native hardware instructions and state up front. Switching between PSOs at runtime is then cheap: only a small amount of precomputed state needs to be copied to the hardware registers.
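To make the difference concrete, here is a toy model in Python rather than real Direct3D code. Every name in it (`Driver`, `Dx11StyleContext`, `PipelineStateObject`, `create_pso`, `count_resolves`) is hypothetical, and it assumes the worst case where a DX11-style driver re-resolves state on every draw; real drivers cache aggressively, so treat this only as a sketch of the idea.

```python
from dataclasses import dataclass


class Driver:
    """Hypothetical driver that counts how often it resolves pipeline state."""

    def __init__(self):
        self.resolve_count = 0

    def resolve(self, vertex_shader, pixel_shader, blend):
        # Stand-in for translating API state into hardware state.
        self.resolve_count += 1
        return (vertex_shader, pixel_shader, blend)


class Dx11StyleContext:
    """Each piece of state is set separately, so it is resolved at draw time."""

    def __init__(self, driver):
        self.driver = driver
        self.vertex_shader = None
        self.pixel_shader = None
        self.blend = None

    def draw(self):
        # Any field may have changed since the last draw, so resolve again.
        return self.driver.resolve(self.vertex_shader, self.pixel_shader, self.blend)


@dataclass(frozen=True)
class PipelineStateObject:
    """Immutable bundle holding the already-resolved hardware state."""
    hardware_state: tuple


def create_pso(driver, vertex_shader, pixel_shader, blend):
    # All state is supplied together, so it is resolved once, at creation.
    return PipelineStateObject(driver.resolve(vertex_shader, pixel_shader, blend))


def count_resolves(draws):
    """Return (dx11_style_resolves, dx12_style_resolves) for `draws` draws."""
    dx11_driver = Driver()
    ctx = Dx11StyleContext(dx11_driver)
    ctx.vertex_shader, ctx.pixel_shader, ctx.blend = "vs_main", "ps_main", "opaque"
    for _ in range(draws):
        ctx.draw()

    dx12_driver = Driver()
    pso = create_pso(dx12_driver, "vs_main", "ps_main", "opaque")
    for _ in range(draws):
        _ = pso.hardware_state  # binding a PSO reuses the precomputed state

    return dx11_driver.resolve_count, dx12_driver.resolve_count
```

In this model, the DX11-style path does one resolve per draw, while the PSO path does a single resolve no matter how many draws follow.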
NVIDIA’s Turing GPUs, together with DirectX 12, introduce task shaders and mesh shaders. These two new stages replace the cumbersome vertex, hull, domain, and geometry shader stages of the DX11 pipeline with a more flexible approach.
The mesh shader does the job of the domain and geometry shaders, but it uses a cooperative, multi-threaded model instead of a single-threaded one. The task shader works similarly. The major difference is that, while the tessellation stages take patches as input and output the tessellated object, the task shader’s input and output are user-defined.
Consider a scene with thousands of objects to render. In the traditional model, each of them would require its own draw call from the CPU. With the task shader, however, a list of objects is sent using a single draw call. The task shader then processes this list in parallel and assigns work to the mesh shader (which also runs in parallel), after which the scene is sent to the rasterizer for 3D-to-2D conversion.
This approach significantly reduces the number of CPU draw calls per scene, leaving room for a higher level of detail.
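The arithmetic behind that saving can be sketched in a few lines of Python. This is a toy model, not shader code: the function names are made up for illustration, and the meshlet size of 128 triangles is an assumed value, not a hardware limit.

```python
import math

MESHLET_SIZE = 128  # assumed triangles per meshlet, for illustration only


def traditional_draw_calls(triangle_counts):
    """Traditional model: one CPU draw call per object."""
    return len(triangle_counts)


def task_stage(triangle_counts, meshlet_size=MESHLET_SIZE):
    """Decide, per object, how many mesh shader groups to launch."""
    return [math.ceil(count / meshlet_size) for count in triangle_counts]


def submit_with_task_shader(triangle_counts):
    """Task-shader model: the whole object list goes out in one draw call."""
    groups = task_stage(triangle_counts)
    return {"cpu_draw_calls": 1, "mesh_groups": sum(groups)}
```

For three objects with 300, 128, and 1 triangles, the traditional path issues three draw calls, while the task-shader path issues one and lets the GPU fan the work out into five mesh shader groups.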
Mesh shaders also facilitate the culling of unused triangles. This is done by the amplification shader (DirectX’s name for the task shader), which runs before the mesh shader and determines the number of mesh shader thread groups needed. It tests the various meshlets for possible intersections and screen visibility and then carries out the required culling. Geometry culling at this early stage of rendering significantly improves performance.
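Here is a rough sketch of that culling step, again as a Python toy model. The `Meshlet` type, the function names, and the box-shaped view volume (with its `near`, `far`, and `half_width` defaults) are all assumptions made for the example; a real amplification shader would test against the actual view frustum and usually do occlusion and backface tests as well.

```python
from dataclasses import dataclass


@dataclass
class Meshlet:
    center: tuple        # (x, y, z) of the bounding sphere in view space
    radius: float        # bounding sphere radius
    triangle_count: int


def is_visible(meshlet, near=0.1, far=100.0, half_width=10.0):
    """Crude test: does the bounding sphere touch a box-shaped view volume?"""
    x, y, z = meshlet.center
    r = meshlet.radius
    return (
        z + r >= near
        and z - r <= far
        and abs(x) - r <= half_width
        and abs(y) - r <= half_width
    )


def amplification_stage(meshlets):
    """Return the indices of meshlets that get a mesh shader thread group."""
    return [i for i, meshlet in enumerate(meshlets) if is_visible(meshlet)]


def mesh_stage(meshlets, surviving):
    """Emit triangles only for surviving meshlets; return the triangle total."""
    return sum(meshlets[i].triangle_count for i in surviving)
```

Meshlets that fall outside the view volume never reach the mesh stage, so no thread groups or triangles are spent on them.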
Command Queues
With DirectX 11, there’s only a single queue of commands going to the GPU. This leads to an uneven distribution of load across CPU cores, since one thread ends up doing most of the submission work, essentially crippling multi-threaded CPUs.
This is somewhat alleviated by using deferred contexts, but even then, there’s ultimately only one stream of commands leading to the GPU at the final stage. DirectX 12 introduces a new model built on command lists that can be recorded independently on separate threads, improving multi-threading. The workload is divided into smaller batches of commands that require different resources, allowing them to be prepared simultaneously. This is also how asynchronous compute works: compute and graphics commands go into separate queues and are executed concurrently.
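The recording side of this model can be sketched with ordinary worker threads. The snippet below is plain Python, not Direct3D: `record_command_list` and `submit_frame` are hypothetical names, the "commands" are just strings, and the two queues are simple lists standing in for a graphics queue and a compute queue.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(job):
    """Record commands for one slice of the frame (runs on a worker thread)."""
    kind, items = job
    return kind, [f"{kind}:{item}" for item in items]


def submit_frame(jobs, workers=4):
    """Record all command lists in parallel, then route them to their queues."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in job order, so submission stays deterministic.
        recorded = list(pool.map(record_command_list, jobs))

    queues = {"graphics": [], "compute": []}
    for kind, commands in recorded:
        queues[kind].extend(commands)
    return queues


frame = submit_frame([
    ("graphics", ["terrain", "trees"]),
    ("compute", ["particles"]),
    ("graphics", ["characters"]),
])
```

Each job is recorded independently of the others, and only the final hand-off to the queues is serialized. Python threads won’t show a real speed-up here; the point is the structure, where recording is spread across threads and graphics and compute work end up in separate queues.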