AMD's upcoming RDNA 5 GPUs might improve dual-issue execution & use shader units more efficiently — LLVM patch adds new FMA instruction to ease compiling


The next generation of Radeon GPUs from AMD is expected to be a significant upgrade over RDNA 4, and one of the issues Team Red seems to be tackling is dual-issue execution. That's the GPU's ability to execute two instructions in the same cycle. AMD's cards have had this feature since RDNA 3, but strict pairing rules meant that compilers couldn't always take advantage of it, so real-world throughput often fell short of the theoretical peak. A new LLVM patch now suggests that AMD will address this on RDNA 5.


Coelacanth's Dream, a Linux-focused outlet, examined the new changes and found that they reference gfx13, which is derived from gfx130, aka RDNA 5. AMD is apparently adding a new instruction format called "VOPD3" that is designed to better interface with the dual-issue VALU (Vector Arithmetic Logic Unit; shader unit). Its pairing rules should be more lenient, making it easier for the compiler to use dual-issue execution.

On a technical level, the existing system, known as VOPD, mostly worked only with simpler 2-operand instructions, which made it harder for compilers to find compatible instruction pairs to schedule together. VOPD3 will expand this to 3-operand instructions, so it would be able to support operations like fused multiply-add (FMA). In fact, V_FMA_F32 was added in this very pull request, and that's how we can infer it'll be on RDNA 5.
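To make the pairing idea concrete, here is a minimal toy model in Python. It is not AMD's actual rule set: the `Instr` type, the mnemonics, the register names, and the single "maximum source operands" limit are all assumptions made for illustration, and the real VOPD/VOPD3 constraints are more involved. The sketch only shows how relaxing an operand limit from two to three lets more adjacent instructions share a cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str    # hypothetical mnemonic, for illustration only
    dst: str     # destination register
    srcs: tuple  # source registers


def can_pair(a: Instr, b: Instr, max_srcs: int) -> bool:
    """Toy pairing rule (an assumption, not the real ISA constraints)."""
    # Both instructions must fit the format's source-operand limit.
    if len(a.srcs) > max_srcs or len(b.srcs) > max_srcs:
        return False
    # The second instruction must not read or overwrite the first one's result.
    if a.dst in b.srcs or a.dst == b.dst:
        return False
    return True


def issue_cycles(stream: list, max_srcs: int) -> int:
    """Greedily pair adjacent instructions; each pair or single costs one cycle."""
    cycles = 0
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_pair(stream[i], stream[i + 1], max_srcs):
            i += 2  # dual-issued
        else:
            i += 1  # issued alone
        cycles += 1
    return cycles


# Two independent FMAs (3 sources each) followed by a multiply and an add.
stream = [
    Instr("fma", "v0", ("v4", "v5", "v6")),
    Instr("fma", "v1", ("v7", "v8", "v9")),
    Instr("mul", "v2", ("v10", "v11")),
    Instr("add", "v3", ("v12", "v13")),
]

print("2-operand limit:", issue_cycles(stream, max_srcs=2), "cycles")
print("3-operand limit:", issue_cycles(stream, max_srcs=3), "cycles")
```

In this toy stream, the two FMAs cannot pair under the 2-operand limit, so the four instructions take three cycles; with a 3-operand limit they pair up and the stream finishes in two.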


This would allow dual-issue execution to happen more often, leading to a potentially massive increase in FP32 throughput (in some cases). Shader units would spend fewer cycles with an issue slot sitting empty and get more work done per clock. This could help in demanding scenarios such as rendering, and it means game engines will be able to optimize for the dual-issue VALU.
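For a rough sense of scale, the arithmetic can be sketched as follows. This is a back-of-the-envelope model, not a measurement: it assumes every instruction costs one issue slot when issued alone and half a slot when paired, and it ignores memory stalls, occupancy, and everything else that affects real shaders. The function name and the pairing fractions are hypothetical.

```python
def dual_issue_speedup(paired_fraction: float) -> float:
    """Idealized issue-rate speedup when a fraction of instructions dual-issue.

    Assumption: a paired instruction costs half an issue slot, an unpaired
    one costs a full slot, and nothing else limits throughput.
    """
    if not 0.0 <= paired_fraction <= 1.0:
        raise ValueError("paired_fraction must be between 0 and 1")
    slots_per_instruction = 1.0 - paired_fraction / 2.0
    return 1.0 / slots_per_instruction


# Hypothetical pairing rates, chosen only to show the shape of the curve.
for fraction in (0.0, 0.25, 0.5, 1.0):
    print(f"{fraction:.0%} paired -> {dual_issue_speedup(fraction):.2f}x")
```

Under these assumptions, the advertised 2x peak only appears when every instruction pairs; at a 50% pairing rate the idealized gain is about 1.33x, which is why looser pairing rules matter.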

Reducing the number of cases where pairing fails due to restrictions is a key step toward making the hardware more efficient without brute-forcing IPC uplifts through silicon. FMA instructions are also important when it comes to neural rendering, so things like upscaling and frame-gen tech could also get a boost here, even if the raw hardware is no faster, since dual-issue execution improves utilization regardless.

You can check out the Coelacanth's Dream article linked above if you're interested in more specifics, but be warned that it's very dense. Moreover, RDNA 5 is a ways out at this point, and more consumer-facing updates like higher core counts would certainly be more marketable. Still, seeing a GPU reach its advertised FP32 throughput more easily and more consistently is a big architectural win.


Hassam Nasir is a die-hard hardware enthusiast with years of experience as a tech editor and writer, focusing on detailed CPU comparisons and general hardware news. When he’s not working, you’ll find him bending tubes for his ever-evolving custom water-loop gaming rig or benchmarking the latest CPUs and GPUs just for fun.
