Today marks one week since Lords of the Fallen was released on PC and consoles. In that week, the developers have released eight patches to address the various performance issues plaguing the game. Unfortunately, average frame rates remain more or less unchanged. Last week, when we tested Lords of the Fallen for the first time, we got the following performance figures:
The GeForce RTX 4090 fails to hit even 50 FPS at 1080p Ultra, while the Radeon RX 7900 XTX averages 34 FPS. The 1440p performance is more palatable but still far from acceptable, as the RX 7900 XT falls short of the 60 FPS mark.
We tested the game again to account for all eight patches and got the same averages. Frame pacing may be a little smoother, but the averages and lows remain the same. A closer look at the rendering pipeline reveals shoddy compute shader utilization.
In this particular frame, the four most expensive Compute Shaders and a couple of Pixel Shaders account for most of the render time. The wavefront occupancy of the most intensive (Compute) Shader is as follows:
Of the 16 wavefronts per SIMD (half of a CU), only five are utilized, leaving the stream processors vastly underutilized. The vector registers are the limiting factor, preventing the occupancy of more than five wavefronts. The L0/L1 cache hit rates are also far from ideal. If you check the cache counter directly below 3923, you’ll see a widely fluctuating graph indicating data loads from a higher-level resource, increasing latency and render time.
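To make the register limit concrete, here is a minimal Python sketch of how per-wave vector register (vGPR) use can cap occupancy. The budget of 1,024 registers per SIMD lane and the 200-register shader are illustrative assumptions, not values read from this capture. Real hardware also rounds allocations to a granule and has other limits (LDS, scalar registers) that this sketch ignores.

```python
def max_waves_per_simd(vgprs_per_wave, vgpr_budget=1024, wave_slots=16):
    """Estimate how many wavefronts can be resident on one SIMD.

    vgprs_per_wave: vector registers each wavefront allocates (hypothetical).
    vgpr_budget:    registers available per SIMD lane (assumed; varies by GPU).
    wave_slots:     hardware cap on resident wavefronts per SIMD (16, per the text).
    """
    if vgprs_per_wave <= 0:
        raise ValueError("vgprs_per_wave must be positive")
    # Each resident wavefront needs its own slice of the register file.
    by_registers = vgpr_budget // vgprs_per_wave
    # Occupancy is bounded by whichever limit is hit first.
    return min(wave_slots, by_registers)


for vgprs in (64, 128, 200):
    waves = max_waves_per_simd(vgprs)
    print(f"{vgprs} vGPRs per wave -> {waves} of 16 wavefronts")
```

Under these assumed numbers, a shader that needs 200 registers per wave lands at five wavefronts, the same occupancy seen here, though the actual register count of this shader is not known from the capture.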
This shader has a vALU utilization of only 25%. This figure only counts the use of the vALUs by this particular shader. Other shaders may be executing concurrently and using the same resource.
The other three compute shaders fare better, with (vALU) shader utilization of 53%, 60.7%, and 93%. The cache hit rates are also steady, with a couple of exceptions. Interestingly, the pixel shaders run in wave64 mode (64 work items per wave) while the compute shaders utilize wave32 (32 work items per wave).
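The difference between the two modes comes down to how many work items share one wave. A small Python sketch, using a hypothetical workload of 1,050 work items (not a figure from this capture), shows how the wave count and the padding in the last wave change with the wave width. It only illustrates packing and says nothing about which mode is faster.

```python
def waves_needed(work_items, wave_size):
    """Number of waves required to cover work_items at a given wave width."""
    if wave_size not in (32, 64):
        raise ValueError("wave_size must be 32 or 64")
    # Integer ceiling division: a partially filled wave still counts as one wave.
    return (work_items + wave_size - 1) // wave_size


def idle_lanes(work_items, wave_size):
    """Lanes left without work in the final, partially filled wave."""
    return waves_needed(work_items, wave_size) * wave_size - work_items


work_items = 1050  # hypothetical workload size
for size in (32, 64):
    print(
        f"wave{size}: {waves_needed(work_items, size)} waves, "
        f"{idle_lanes(work_items, size)} idle lanes"
    )
```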
The 1812 pixel shader is compute-bound with 100% vALU usage, despite an occupancy of only ten wavefronts out of sixteen per SIMD (possibly a result of wave64). The cache hit rates are also stable, with sufficient vGPRs to go around. The 1833 pixel shader (also wave64) has a wavefront occupancy of 16 out of 16, but its vALU utilization remains low at just 28.9%.
We look forward to testing the game after another half a dozen patches are pushed through. In case you were wondering why NVIDIA Nsight wasn’t used, it’s because it crashes the game at startup. The anti-cheat might have something to do with it.