Benchmarking Blackwell and RTX 50-series GPUs with Multi Frame Generation will require some changes, according to Nvidia

Nvidia Blackwell Benchmarking
(Image credit: Nvidia)

Nvidia's Blackwell RTX 50-series GPUs will require new tools for benchmarking, particularly if you're using DLSS 4 Multi Frame Generation (MFG). This was the key takeaway from the final session of Nvidia's Editors' Day on January 8, 2025, where it gave briefings to a few hundred members of the press and influencers on neural rendering, RTX Blackwell architecture, the RTX 50-series Founders Edition cards, RTX AI PCs and generative AI for games, and RTX Blackwell for professionals and creators. Much of the core information isn't new, but let's cover the details.

First, performance isn't just about raw frames per second. We also need to consider latency and image quality. This is basically benchmarking 101, a subject near and dear to my heart (as I sit here benchmarking as many GPUs as possible while prepping for the Arc B570 launch and the impending RTX 5090 and 5080, not to mention AMD RDNA4 and the RX 9070 XT and RX 9070). Proper benchmarking requires not just a consistent approach, but a good selection of games and applications and an understanding of what the numbers mean.

Average FPS is the easiest for most people to grasp, but we also report 1% low FPS. For us, that's the average performance of the bottom 1% of frametimes. (I wrote a script to parse the CSV files in order to calculate this.) It's important because pure minimum FPS, which comes from the highest frametime in a benchmark run, can vary wildly over multiple runs. A single bad frame could drop the minimum FPS from 100 down into the teens, and doing multiple runs only partially helps. So instead, we find the 99th percentile, the frametime above which only the worst 1% of frames reside, and then divide the count of those worst frames by the total time they took to render. It's a good measurement of how consistent a game is.
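
The 1% low calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual script used for our reviews: the function name `fps_summary` and the sample frametimes are made up for the example, and it assumes frametimes are given in milliseconds.

```python
def fps_summary(frametimes_ms):
    """Return (average FPS, 1% low FPS) for a list of frametimes in milliseconds."""
    if not frametimes_ms:
        raise ValueError("need at least one frametime")
    # Average FPS: total frames divided by total time.
    avg_fps = 1000.0 * len(frametimes_ms) / sum(frametimes_ms)
    # Sort slowest frames first and keep the worst 1% (at least one frame).
    worst_first = sorted(frametimes_ms, reverse=True)
    n_worst = max(1, len(worst_first) // 100)
    worst = worst_first[:n_worst]
    # 1% low FPS: count of the worst frames divided by the time they took.
    low_fps = 1000.0 * len(worst) / sum(worst)
    return avg_fps, low_fps


# Hypothetical run: 990 frames at 10 ms plus 10 stutters at 50 ms.
frametimes = [10.0] * 990 + [50.0] * 10
avg, low = fps_summary(frametimes)
print(f"average: {avg:.1f} FPS, 1% low: {low:.1f} FPS")
# prints "average: 96.2 FPS, 1% low: 20.0 FPS"
```

Because the 1% low averages over the whole worst slice instead of a single frame, one outlier moves it far less than it moves the pure minimum.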

Can you dig deeper? Yes, absolutely. The difficulty is that it starts to require more time, and the additional information gleaned from doing so suffers from a classic case of diminishing returns. It already takes about a full workday (ten hours, if I'm being real) to benchmark just one GPU on my current test suite. That's 20-something games, multiple applications, and multiple runs on every test. And when you're in a situation like right now where everything needs to be retested on as many GPUs as possible? You can't just decide to increase the testing time by 50%.

Nvidia's FrameView utility, which I've been using for the past two years, is a great tool for capturing frametimes and other information, including CPU use, GPU use, GPU clocks, GPU temperatures, and even real-time GPU power if you have a PCAT adapter (which we do). It provides multiple frametime measurements, though, including the standard MsBetweenPresents (the default for PresentMon) and Nvidia's newer MsBetweenDisplayChange.

With Blackwell, Nvidia recommends everyone doing benchmarks switch to using MsBetweenDisplayChange, as it's apparently more accurate. Looking at the numbers, most of the time it's not all that different, but Nvidia says it will better capture dropped frames, frametime variation, and the new flip metering that's used by MFG. So, if you want to get DLSS 4 "framerates" — in quotes because AI-generated frames are not the same as fully rendered frames — you'll need to make the switch. That's easy enough, and it's what we plan to do, whether or not we're testing with MFG.
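
To make the switch concrete, here is a rough Python sketch that summarizes either column from a frametime log. The two column names come from the discussion above, but everything else is an assumption for illustration: the `frametime_stats` helper is hypothetical, the sample data is invented rather than captured, and treating a value of 0 as "frame never displayed" is an assumption about the log format, not something Nvidia stated in the session.

```python
import csv
import io


def frametime_stats(csv_text, column="MsBetweenDisplayChange"):
    """Return (average FPS, worst frametime in ms) for one frametime column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    times = []
    for row in reader:
        value = float(row[column])
        # Assumption: a value of 0 marks a frame that never reached the display.
        if value > 0:
            times.append(value)
    if not times:
        raise ValueError(f"no usable samples in column {column!r}")
    return 1000.0 * len(times) / sum(times), max(times)


# Invented sample: uneven present times, evenly paced display times.
SAMPLE_CSV = """MsBetweenPresents,MsBetweenDisplayChange
5.0,10.0
15.0,10.0
10.0,10.0
10.0,10.0
"""

for name in ("MsBetweenPresents", "MsBetweenDisplayChange"):
    fps, worst = frametime_stats(SAMPLE_CSV, name)
    print(f"{name}: {fps:.1f} FPS average, worst frametime {worst:.1f} ms")
```

In this made-up sample both columns average 100 FPS, but only the present-based column shows a 15 ms spike. Pacing differences like that are what the choice of metric can reveal or hide.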

Nvidia then goes on to pose the (hopefully rhetorical) question: Is image quality important? The answer is yes, obviously, but here's where we run into problems. When everything renders the same way, we should have the same output — with only minor differences at most. But in the modern era of upscaling and frame generation, not to mention differences in how ray tracing and denoising are accomplished? It becomes very messy.

So, all you need to do is capture videos of every benchmark and then dissect them. Easy, right? [cough] Speaking from experience, just capturing such videos adds 50% to the time it takes to conduct benchmarking, and that's the best case. Analyzing the videos and composing useful content from them is more practical for coverage of an individual game than for graphics card reviews.

I wish that weren't the case. I wish it were possible to get all the benchmarks from all the potential configurations with clear image quality comparisons using every possible setting. It's not. And it's foolhardy to think otherwise. It's also why, for the time being, our GPU reviews will primarily focus on non-upscaling, non-framegen performance as the baseline, where image quality shouldn't differ too much. We can do some testing of upscaling and framegen as well, but that will be a secondary consideration.

And we feel that's pretty fair. Because no matter how much marketing gets thrown at the problem, frame generation differs from rendering and upscaling. It adds latency, and while it makes the visuals on your display smoother, 100 FPS with framegen doesn't feel the same as 100 FPS without framegen, and 100 FPS with Multi Frame Generation feels different again. At 100 FPS without framegen, user input would get sampled every ~10ms. With framegen, sampling slows to once every ~20ms, because only every other displayed frame is actually rendered. With MFG, it could slow to as little as once every ~40ms.
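
The arithmetic behind those sampling intervals is simple enough to sketch. This assumes input is sampled once per rendered frame and that MFG at its most aggressive shows four frames for every rendered one (an assumption chosen to match the ~40ms figure). The function and parameter names are hypothetical.

```python
def input_sample_interval_ms(displayed_fps, frames_shown_per_rendered_frame=1):
    """Approximate time between input samples, in milliseconds.

    Assumes input is sampled once per rendered frame, so generated frames
    raise the displayed framerate without adding input samples.
    """
    rendered_fps = displayed_fps / frames_shown_per_rendered_frame
    return 1000.0 / rendered_fps


# All three cases show 100 FPS on the display.
for label, factor in (("no framegen", 1), ("framegen", 2), ("MFG", 4)):
    interval = input_sample_interval_ms(100, factor)
    print(f"{label}: input sampled every ~{interval:.0f} ms")
# prints:
# no framegen: input sampled every ~10 ms
# framegen: input sampled every ~20 ms
# MFG: input sampled every ~40 ms
```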

Upscaling is a different matter. What we've seen in the past clearly shows that DLSS upscaling delivers superior image quality to FSR2/3 and XeSS upscaling. And now, Nvidia is about to overhaul DLSS to further improve image fidelity thanks to a transformer-based AI model. It will run slower than the older CNN model, but it will look better. How much better? Yeah, things just became that much more complex.

There's more to the benchmarking discussion, including AI and professional workloads. We test all these areas in our graphics card reviews, and what Nvidia shows mostly agrees with what we've already been doing. If you have any thoughts or feedback on the matter, let us know in the comments. The full deck from the session is included below for reference.

Nvidia Blackwell Benchmarking
(Image credit: Nvidia)

Jarred Walton is a senior editor at Tom's Hardware focusing on everything GPU. He has been working as a tech journalist since 2004, writing for AnandTech, Maximum PC, and PC Gamer. From the first S3 Virge '3D decelerators' to today's GPUs, Jarred keeps up with all the latest graphics trends and is the one to ask about game performance.
