DirectX 12 debuted in 2015 alongside Windows 10, promising significant performance and efficiency gains across the board. These include better CPU utilization, closer-to-metal access and, in later updates, a host of new features, most notably ray tracing through DXR (DirectX Raytracing). But what exactly is DirectX 12, and how is it different from DirectX 11? Let’s have a look.
What is DirectX: It’s an API
Like Vulkan and OpenGL, DirectX is a graphics API that games use to talk to the graphics hardware in your computer. However, unlike its counterparts, DX is a proprietary Microsoft platform and runs natively only on Windows. OpenGL and Vulkan, on the other hand, are cross-platform: they run on Windows and Linux, and can be used on macOS as well (Vulkan through the MoltenVK translation layer).
Now, what does a graphics API like DirectX do? It acts as an intermediary between the game engine and the graphics drivers, which in turn interact with the OS kernel. A graphics API is the platform on which a game’s rendering is built. Think of it as MS Paint, where the game is the painting and the paint application is the API. However, unlike a painting, the output program of a graphics API is only readable by the API used to design it. In general, an API is designed for a specific OS. That’s one of the reasons why PS4 games don’t run on the Xbox One and vice versa.
DirectX 12 Ultimate is a step away from that rule. It is used on both Windows and the Xbox Series X. With DX12 Ultimate, Microsoft is essentially unifying the two platforms under one feature set.
DirectX 11 vs DirectX 12: What Does it Mean for PC Gamers
There are three main advantages of the DirectX 12 API for PC gamers:
Better Scaling with Multi-Core CPUs
One of the core advantages of low-level APIs like DirectX 12 and Vulkan is improved CPU utilization. Traditionally, most DirectX 9 and DirectX 11 games used only two to four cores for their various workloads: physics, AI, draw calls, and so on. Some games were even limited to one. With DirectX 12, that has changed. The load can be distributed more evenly across all cores, making multi-core CPUs more relevant for gamers.
Maximum hardware utilization
Many of you might have noticed that in the beginning, AMD GPUs favored DirectX 12 titles more than rival NVIDIA parts. Why is that?
The reason is better utilization. Traditionally, NVIDIA has had stronger driver support, while AMD hardware has often suffered from the lack of it. DirectX 12 adds technologies that improve utilization, such as asynchronous compute, which allows multiple workloads (read: compute and graphics) to be executed simultaneously. This makes poor driver support a less pressing concern.
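To get a feel for why overlapping compute and graphics helps, here is a back-of-the-envelope model in Python. It is only a sketch: the function names, the `overlap` parameter, and the millisecond figures are made up for illustration and don’t come from any DirectX API or benchmark.

```python
def serial_frame_ms(graphics_ms: float, compute_ms: float) -> float:
    """Frame time when compute work has to wait for graphics work to finish."""
    return graphics_ms + compute_ms


def async_frame_ms(graphics_ms: float, compute_ms: float, overlap: float = 1.0) -> float:
    """Frame time when part of the compute work runs alongside graphics.

    overlap is a hypothetical knob: the fraction (0..1) of the compute work
    that fits into GPU units the graphics work leaves idle.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    # Compute time hidden behind graphics can never exceed the graphics time itself.
    hidden_ms = min(compute_ms * overlap, graphics_ms)
    return graphics_ms + compute_ms - hidden_ms


serial = serial_frame_ms(10.0, 4.0)
half_hidden = async_frame_ms(10.0, 4.0, overlap=0.5)
fully_hidden = async_frame_ms(10.0, 4.0, overlap=1.0)
```

In this toy model a frame with 10 ms of graphics and 4 ms of compute takes 14 ms serially, 12 ms if half the compute work overlaps, and 10 ms in the ideal case. Real gains depend on how many hardware units actually sit idle, which varies by GPU architecture.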
Closer to Metal Support
Another major advantage of DirectX 12 is that developers have more control over how their game utilizes the hardware. Earlier, this was more abstract and was mostly taken care of by the drivers and the API (although some engines like Frostbite and Unreal provided low-level tools as well).
Now the task falls to the developers. They have closer to metal access, meaning that most of the rendering responsibilities and resource allocation are handled by the game engines with some help from the graphics drivers.
This is a double-edged sword: there are multiple GPU architectures out in the wild, and for indie devs it’s impractical to optimize a game for all of them. Luckily, third-party engines like Unreal, CryEngine, and Unity do this for them, so they only have to focus on design.
How DirectX 12 Improves Performance by Optimizing Hardware Utilization
Again, there are a few main API advances that facilitate this gain:
Per-Call API Context
Like every application, graphics APIs like DirectX also feature a primary thread that keeps track of the internal API state (resources, their allocation, and availability). With DirectX 9 and 11, there’s a global state (or context). The games you run on your PC modify this state via draw calls to the API, after which it’s submitted to the GPU for execution. Since there’s a single global state/context (and a single main thread on which it runs), multi-threading is difficult, as multiple simultaneous draw calls can cause errors. Furthermore, modifying the global state via state calls is relatively slow, which adds further overhead.
With DirectX 12, the draw calls are more flexible. Instead of a single global state (context), each draw call from the application has its own smaller state (see PSOs below for more). These draw calls contain the required data and associated pointers within and are independent of other calls and their states. This allows the use of multiple threads for different draw calls.
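The idea can be sketched in a few lines of Python. This is not DirectX code: `CommandList`, `record_chunk`, and `record_frame` are hypothetical stand-ins that only mimic the pattern of recording independent batches of draw calls on several threads and submitting them in a fixed order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class CommandList:
    """Hypothetical stand-in for a command list that owns its recorded draws."""
    name: str
    commands: list = field(default_factory=list)

    def draw(self, pipeline: str, mesh: str) -> None:
        # Each recorded draw carries the state it needs; no global context is touched.
        self.commands.append((pipeline, mesh))


def record_chunk(chunk_id: int, meshes: list) -> CommandList:
    """Record one slice of the scene. Safe on any thread: it shares no state."""
    command_list = CommandList(name=f"chunk-{chunk_id}")
    for mesh in meshes:
        command_list.draw(pipeline="opaque", mesh=mesh)
    return command_list


def record_frame(scene: list, workers: int = 4) -> list:
    # Give every worker its own slice of the scene.
    chunks = [scene[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so submission order stays deterministic.
        return list(pool.map(record_chunk, range(workers), chunks))


scene = [f"mesh-{i}" for i in range(10)]
command_lists = record_frame(scene)
total_draws = sum(len(cl.commands) for cl in command_lists)
```

Because no two command lists share mutable state, no lock is needed while recording. With a single global context, every one of those `draw` calls would have to be serialized.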
Pipeline State Objects
In DirectX 11, the GPU pipeline is configured through a wide range of separate states and stages, such as the vertex shader, hull shader, geometry shader, etc. These are often interdependent, and a stage can’t progress until the previous one is defined. When the geometry from a scene is sent to the GPU for rendering, the resources and hardware required can vary depending on the rasterizer state, blend state, depth stencil state, culling, etc.
Each of these objects in DirectX 11 needs to be defined individually (at runtime), and the next state can’t be executed until the previous one has been finalized, as they require different hardware units (shaders vs ROPs, TMUs, etc.). This effectively leaves the hardware under-utilized, resulting in increased overhead and fewer draw calls per frame.
In the above comparison, HW state 1 represents the shader code, 2 is a combination of the rasterizer and the control flow linking the rasterizer to the shaders. State 3 is the linkage between the blend and pixel shader. The Vertex Shader affects HW states 1 & 2, the Rasterizer state 2, Pixel shader states 1-3, and so on. As already explained in the above section, this introduces some additional CPU overhead as the driver generally prefers to wait till the dependencies are resolved.
DirectX 12 replaces the various states with Pipeline State Objects (PSOs), which are finalized upon creation. In simple words, a PSO is an object that describes the state of the draw calls that use it. An application can create as many PSOs as required and switch between them as needed. A PSO includes the bytecode for all shaders (vertex, pixel, domain, hull, and geometry) and can be converted into native hardware state as required, without depending on any other object or state.
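Here is a rough Python sketch of the "finalized upon creation" idea. The `PipelineState` and `PipelineCache` classes, their fields, and the blend mode names are all hypothetical and much simpler than a real PSO description. The point is only that validation happens once, up front, and that switching pipelines afterwards is a cheap lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineState:
    """Hypothetical immutable bundle of pipeline state, loosely modelled on a PSO."""
    vertex_shader: str
    pixel_shader: str
    blend: str = "opaque"
    depth_test: bool = True

    def __post_init__(self) -> None:
        # Cross-stage validation runs once at creation, not on every draw.
        if not self.vertex_shader or not self.pixel_shader:
            raise ValueError("a pipeline needs both a vertex and a pixel shader")
        if self.blend not in {"opaque", "alpha", "additive"}:
            raise ValueError(f"unknown blend mode: {self.blend}")


class PipelineCache:
    """Creates each distinct pipeline once and hands out the same object afterwards."""

    def __init__(self) -> None:
        self._cache = {}
        self.created = 0

    def get(self, **description) -> PipelineState:
        # Sort the keyword arguments so their order does not affect the cache key.
        key = tuple(sorted(description.items()))
        if key not in self._cache:
            self._cache[key] = PipelineState(**description)
            self.created += 1
        return self._cache[key]


cache = PipelineCache()
opaque = cache.get(vertex_shader="vs_main", pixel_shader="ps_main")
same_opaque = cache.get(pixel_shader="ps_main", vertex_shader="vs_main")
glass = cache.get(vertex_shader="vs_main", pixel_shader="ps_main", blend="alpha")
```

Requesting the same description twice returns the same object, so only two pipelines are created here. Since the objects are frozen, a draw call can hold a reference to one without worrying that another thread will change it.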
NVIDIA and AMD’s latest GPUs, with the help of DirectX 12, introduce task shaders (called amplification shaders in DirectX 12) and mesh shaders. These two new shader stages replace the various cumbersome shader stages of the DX11 geometry pipeline with a more flexible approach.
The mesh shader performs the same task as the domain and geometry shaders, but internally it uses a multi-threaded model instead of a single-threaded one. The task shader works similarly to the hull shader. The major difference is that while the hull shader’s input was patches and its output was the tessellated object, the task shader’s input and output are user-defined.
In the below scene, there are thousands of objects that need to be rendered. In the traditional model, each of them would require a unique draw call from the CPU. However, with the task shader, a list of objects is sent using a single draw call. The task shader then processes this list in parallel and assigns work to the mesh shader (which also works in parallel), after which the scene is sent to the rasterizer for 3D to 2D conversion.
This approach significantly reduces the number of CPU draw calls per scene, leaving headroom for a higher level of detail.
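The difference in CPU-side work can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the 64-triangle meshlet limit, the function names, and the object list are invented, and the loop in `task_stage` stands in for work that a GPU would run in parallel.

```python
import math

MESHLET_MAX_TRIANGLES = 64  # hypothetical per-meshlet limit, chosen for the example


def traditional_draw_calls(objects: dict) -> int:
    """Traditional model: the CPU issues one draw call per object."""
    return len(objects)


def task_stage(objects: dict) -> list:
    """Stand-in for the task stage: expand an object list into meshlet work items.

    objects maps an object name to its triangle count. The result holds one
    (object_name, meshlet_index) pair per meshlet the mesh stage should process.
    """
    work_items = []
    for name, triangle_count in objects.items():
        meshlet_count = math.ceil(triangle_count / MESHLET_MAX_TRIANGLES)
        for meshlet_index in range(meshlet_count):
            work_items.append((name, meshlet_index))
    return work_items


objects = {"rock": 130, "tree": 64, "leaf": 10}
cpu_calls_traditional = traditional_draw_calls(objects)
cpu_calls_mesh_path = 1  # the whole object list goes to the GPU in a single call
mesh_work = task_stage(objects)
```

For these three objects the traditional path needs three CPU draw calls, while the mesh path needs one and lets the GPU fan the list out into five meshlet work items on its own.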
Mesh shaders also facilitate the culling of unused triangles. This is done using the amplification shader, which runs prior to the mesh shader and determines the number of mesh shader thread groups needed. It tests the various meshlets for possible intersections and screen visibility and then carries out the required culling. Geometry culling at this early rendering stage significantly improves performance.
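A visibility test of this kind can be sketched in Python as a bounding-sphere check against a set of planes. This is an assumed, simplified approach rather than the method any particular driver or engine uses: the `Meshlet` class, the plane layout, and the one-group-per-visible-meshlet rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meshlet:
    """Hypothetical meshlet described only by its bounding sphere."""
    center: tuple  # (x, y, z)
    radius: float


def sphere_outside_plane(center: tuple, radius: float, plane: tuple) -> bool:
    """plane is (nx, ny, nz, d) with a unit normal pointing into the visible side."""
    nx, ny, nz, d = plane
    signed_distance = nx * center[0] + ny * center[1] + nz * center[2] + d
    # The sphere is fully hidden only if it lies more than one radius behind the plane.
    return signed_distance < -radius


def amplification_stage(meshlets: list, planes: list) -> list:
    """Return the indices of meshlets that survive culling."""
    return [
        index
        for index, meshlet in enumerate(meshlets)
        if not any(sphere_outside_plane(meshlet.center, meshlet.radius, p) for p in planes)
    ]


# Only a near plane (z >= 1) and a far plane (z <= 100) to keep the example short.
planes = [(0.0, 0.0, 1.0, -1.0), (0.0, 0.0, -1.0, 100.0)]
meshlets = [
    Meshlet(center=(0.0, 0.0, 10.0), radius=1.0),   # inside the visible range
    Meshlet(center=(0.0, 0.0, -5.0), radius=1.0),   # entirely behind the near plane
    Meshlet(center=(0.0, 0.0, 0.5), radius=1.0),    # straddles the near plane, so kept
    Meshlet(center=(0.0, 0.0, 200.0), radius=2.0),  # entirely beyond the far plane
]
visible = amplification_stage(meshlets, planes)
mesh_groups_needed = len(visible)  # assumption: one mesh shader group per visible meshlet
```

Two of the four meshlets are rejected before any of their triangles are processed, so only two mesh shader groups would be launched in this example.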
NVIDIA’s Mesh and Hull Shaders also leverage DX12
With DirectX 11, there’s only a single queue going to the GPU. This leads to uneven distribution of load across various CPU cores, essentially crippling multi-threaded CPUs.