Igor's Lab received a defective PowerColor RX 9070 XT Hellhound from a reader who had purchased the card. While it had apparently passed quality control tests, Igor found a defective RDNA 4 die that resulted in unsustainable temperatures — which persisted even after re-pasting. According to the tech outlet, the culprit was "pronounced pitting," which can occur during the backgrinding process.
Igor's Lab does preface its report by noting that, for now, this is an isolated case and it cannot fully confirm what caused the damage to the die. It could be a single bad card, or perhaps a faulty production line may have resulted in a bad batch of dies. Either way, the extent of the problem isn't clear at present.
Although nearly invisible to the naked eye, the irregularities in the silicon surface translated directly into extremely high hotspot temperatures, making the GPU unusable. Igor's Lab recorded a 46-degree Celsius (C) delta between the average GPU temperature and the hotspot temperature, with the latter reaching 113 C. Because 110 C is the limit for RDNA-based products, the high hotspot temperature caused the RX 9070 XT to throttle thermally.
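The arithmetic behind those figures is simple enough to sketch. The snippet below is illustrative only: the variable names are invented for this example, and the average GPU temperature is inferred from the reported hotspot and delta rather than quoted from Igor's Lab.

```python
# Figures reported by Igor's Lab, in degrees Celsius.
HOTSPOT_C = 113
DELTA_C = 46

# Hotspot limit for RDNA-based products, as stated in the article.
HOTSPOT_LIMIT_C = 110

# Inferred, not quoted: the average GPU temperature implied by hotspot minus delta.
average_c = HOTSPOT_C - DELTA_C

# How far the hotspot sits above the throttling limit.
over_limit_c = HOTSPOT_C - HOTSPOT_LIMIT_C

print(f"Implied average GPU temperature: {average_c} C")
print(f"Hotspot above limit by: {over_limit_c} C")
```

In other words, the implied average temperature is unremarkable. The hotspot reading alone is what pushes the card past its limit.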
Further inspection with a microscope revealed more than 1934 craters or pits in the silicon, amounting to over 1% of the chip surface. Igor alleges that this is well outside the normal tolerance level for modern chips, particularly high-power chips like the Navi 48 used in the 9070 XT.
Igor's Lab says it used general industry guideline values as a reference point for allowable pit size, as RDNA 4 currently lacks any publicly available specifications for the maximum depth of a defect. "... As a rule, depths ≤ 5-10 µm with a diameter ≤ 50-100 µm are not considered critical, provided they do not occur near die edges or bond surfaces. In more sensitive areas or in applications with high mechanical stress (such as particularly thin dies), a defect with a depth of more than 2-3 µm can already be critical."
The outlet measured one pit on the faulty card with a depth of 12.59 µm and a diameter of 212.36 µm, both beyond the general industry guideline values. Igor's Lab suspects the damage stems from improper backgrinding of the die. Backgrinding is a process in which the back of the silicon (the "top" once the chip is mounted on a PCB) is ground down to an appropriate thickness, which can vary depending on the design and use case.
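The comparison between the measured pit and the guideline values can be made explicit with a short check. This is a minimal sketch, not Igor's Lab's method: the function name `exceeds_guideline` is hypothetical, and it assumes the more lenient ends of the quoted ranges (10 µm depth, 100 µm diameter) as the thresholds.

```python
# Assumed thresholds: the lenient upper ends of the quoted guideline ranges.
MAX_DEPTH_UM = 10.0
MAX_DIAMETER_UM = 100.0


def exceeds_guideline(depth_um, diameter_um,
                      max_depth_um=MAX_DEPTH_UM,
                      max_diameter_um=MAX_DIAMETER_UM):
    """Return True if a pit falls outside the non-critical range.

    A pit counts as non-critical only if BOTH depth and diameter are
    within the limits, so exceeding either one is enough to flag it.
    """
    return depth_um > max_depth_um or diameter_um > max_diameter_um


# Pit dimensions reported by Igor's Lab, in micrometers.
measured_depth_um = 12.59
measured_diameter_um = 212.36

flagged = exceeds_guideline(measured_depth_um, measured_diameter_um)
print(f"Pit exceeds guideline: {flagged}")
print(f"Depth is {measured_depth_um / MAX_DEPTH_UM:.2f}x the assumed limit")
print(f"Diameter is {measured_diameter_um / MAX_DIAMETER_UM:.2f}x the assumed limit")
```

Even against the lenient thresholds, the pit is flagged on both dimensions. Against the stricter 2-3 µm figure quoted for sensitive areas, the margin would be far larger.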
As with sanding, the backgrinding process can go wrong. Debris from grinding can cause scratches, pitting, flaking, or other irregularities that affect the silicon's structural integrity and reduce cooling effectiveness. Thermomechanical stress during grinding can also damage the die.
Regardless of where the damage came from, Igor's Lab concludes that multiple parties failed to catch the problem. Because the card comes from PowerColor's factory, PowerColor is ultimately responsible for the inadequate quality assurance (QA). TSMC and other involved parties also passed this particular sample, potentially because AI-based inspection algorithms lacked sufficient training to detect the problems.
At present, the issue doesn't appear to be widespread. AMD told Igor's Lab that the defective PowerColor RX 9070 XT is "an isolated incident." Hopefully, that's correct and there aren't further incidents. And considering current GPU prices and supply shortages, hopefully the owner of the card is able to RMA it, or to get a replacement if Igor's Lab purchased the card from the reader.