This week at Computex 2026, we saw Nvidia reveal its RTX Spark, and last month, AMD detailed its Ryzen AI Max 400 "Gorgon Halo" lineup, a refresh of the Strix Halo APUs that lifts supported unified memory to 192GB and allows up to 160GB of that pool to be addressed as VRAM. AMD describes the flagship Ryzen AI Max+ PRO 495 as the first x86 client processor able to run a 300-billion-parameter language model locally, pitching the platform for use cases that need to keep multiple AI agents resident in memory at once.
Gorgon Halo will likely compete directly with other chips such as Nvidia's RTX Spark, which is also positioned for on-device agentic computing. Local AI computing demands large amounts of on-device RAM, and that poses a difficult problem for device vendors.
DRAM contract prices are forecast to climb another 58% to 63% this quarter, on top of the record 90% to 95% jump TrendForce recorded in Q1, which also saw Nvidia raise the price of its DGX Spark desktop from $3,999 to $4,699, citing memory supply. So, what happens to the dream of accessible local AI compute?
DRAM supply squeeze
The local AI PC has become a category defined by how much memory it carries, and it’s scaling that memory up at a time when memory has never cost more. AMD's three Gorgon Halo SKUs reuse the same Zen 5 cores, RDNA 3.5 graphics, and XDNA 2 NPU as the existing Ryzen AI Max 300 parts, with the Max+ PRO 495 gaining a 100 MHz boost-clock bump to 5.2 GHz, a 40-compute-unit Radeon 8065S, and a 55 TOPS NPU.
Memory capacity has increased 50% from the 128GB ceiling on Strix Halo. A leaked PassMark entry shows the 192GB configuration as eight 24GB SK hynix LPDDR5X packages on an HP test board, though AMD hasn’t yet confirmed this. Partner systems from Asus, HP, and Lenovo are due in the third quarter of 2026.
It’s all well and good that Nvidia and AMD are releasing machines like the RTX Spark and the Gorgon Halo line-up. However, Samsung, SK hynix, and Micron have all shifted the bulk of their wafer capacity toward high-bandwidth memory for AI accelerators because HBM carries far higher margins than commodity DRAM, and the conventional memory supply has tightened as a direct result of this. HP told investors in February that memory now accounts for roughly 35% of the cost of building a PC, up from 15% to 18% a quarter earlier.
SK Group chairman Chey Tae-won, speaking at Computex 2026 on the show’s official opening day, repeated his position that the shortage will run through 2030, despite the company's intention to double wafer capacity within the next five years. New fabs from all three makers are under construction, but none will reach volume production before late 2027 at the earliest, and most forecasts now predict a structurally higher price floor that persists even after the acute shortage eases.
The 192GB in a Gorgon Halo box, the 128GB in an RTX Spark or DGX Spark, and the LPDDR5X soldered into every AI laptop announced at Computex all come off wafers the memory makers would otherwise sell as HBM. That’s why Nvidia raised the DGX Spark by $700 in February without changing a single spec, and why component makers have begun passing memory costs through directly. One vendor has even taken an extremely on-the-nose approach of adding a flat memory surcharge to every purchase, and in some cases, smaller buyers are now quoted prices that change by the hour.
Bandwidth caps inference speed
A single pool of 192GB would enable an APU to hold a model that would otherwise require a multi-GPU server. That capacity doesn’t make the model run quickly, though. Dense language model inference reads close to the full set of active weights from memory for every token generated, so generation speed is governed by memory bandwidth divided by the per-token weight footprint, not by how much memory sits unused.
Gorgon Halo keeps the same 256-bit LPDDR5X-8000 interface as Strix Halo, which tops out around 256 GB/s in theory and which independent testers have measured closer to 212 GB/s on the GPU. By comparison, the Apple M3 Ultra that AMD and Nvidia are chasing on capacity is rated at 819 GB/s, and an RTX 5090 moves data at 1,792 GB/s.
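The 256 GB/s theoretical figure follows directly from the interface width and transfer rate. The short sketch below reproduces that arithmetic and compares it with the 212 GB/s measurement quoted above; the function name is ours, and the calculation assumes decimal gigabytes and ignores any protocol overhead.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a memory interface.

    Each transfer moves bus_width_bits / 8 bytes, and the interface performs
    transfer_rate_mts million transfers per second.
    """
    bytes_per_transfer = bus_width_bits / 8
    megabytes_per_second = bytes_per_transfer * transfer_rate_mts
    return megabytes_per_second / 1000


# 256-bit LPDDR5X-8000, as described for Strix Halo and Gorgon Halo
theoretical = peak_bandwidth_gbs(256, 8000)

# Independent GPU-side measurement quoted in the article
measured = 212.0
efficiency = measured / theoretical

print(f"Theoretical peak: {theoretical:.0f} GB/s")
print(f"Measured: {measured:.0f} GB/s ({efficiency:.0%} of peak)")
```

On those numbers, the measured result is a little over four-fifths of the theoretical peak.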
This gap explains why a dense 70-billion-parameter model fully resident on a Strix Halo iGPU lands in the low single digits of tokens per second, regardless of how much headroom the memory pool has. Our own Corsair AI Workstation 300 review found that Nvidia's slightly higher-bandwidth GB10 pulled ahead of Strix Halo as context length grew, for exactly this reason.
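As a rough illustration of that ceiling, the sketch below divides bandwidth by the size of the weights that must be read for each token. The quantization levels (bytes per parameter) are our assumptions rather than figures from any benchmark, and the result is an upper bound: real-world throughput lands below it once compute, KV cache reads, and other overheads are counted.

```python
def max_tokens_per_second(bandwidth_gbs: float,
                          params_billion: float,
                          bytes_per_param: float) -> float:
    """Bandwidth-bound ceiling on generation speed for a dense model.

    Assumes every parameter is read from memory once per generated token.
    """
    weight_gb = params_billion * bytes_per_param  # billions of params x bytes = GB
    return bandwidth_gbs / weight_gb


STRIX_HALO_MEASURED_GBS = 212.0  # measured figure quoted in the article

# Assumed quantization levels for a dense 70-billion-parameter model
for label, bytes_per_param in [("4-bit", 0.5), ("8-bit", 1.0), ("16-bit", 2.0)]:
    ceiling = max_tokens_per_second(STRIX_HALO_MEASURED_GBS, 70, bytes_per_param)
    print(f"70B dense, {label}: at most {ceiling:.1f} tokens/s")
```

Under these assumptions, the ceiling works out to roughly 6, 3, and 1.5 tokens per second respectively, and adding memory capacity does not move any of those numbers.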
Capacity matters most for mixture-of-experts models, which activate only a fraction of their parameters per token and run far faster than their total size suggests, and for long-context agentic workloads, where it’s the KV cache rather than the model weights that consumes memory. It’s these use cases that AMD’s agentic pitch points at, with leaked details on the next-gen Medusa Halo parts showing a move to LPDDR6 and as much as 80% more bandwidth.
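Both effects can be put into rough numbers. The sketch below is illustrative only: the model shapes (80 layers, 8 KV heads, head dimension 128, a 300-billion-parameter mixture-of-experts model with 30 billion active parameters) and the 4-bit weights are hypothetical values we chose for the example, not specifications of any product or model named in this article.

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB for one sequence.

    The leading factor of 2 counts one key and one value vector per layer.
    """
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens / 2**30


def max_tokens_per_second(bandwidth_gbs: float,
                          active_params_billion: float,
                          bytes_per_param: float) -> float:
    """Bandwidth-bound ceiling when only the active parameters are read per token."""
    return bandwidth_gbs / (active_params_billion * bytes_per_param)


# Hypothetical transformer shape with 16-bit cache values and a 128K-token context
cache_per_agent = kv_cache_gib(layers=80, kv_heads=8, head_dim=128,
                               context_tokens=131072)
print(f"KV cache per 128K-token session: {cache_per_agent:.0f} GiB")
print(f"Three concurrent sessions: {3 * cache_per_agent:.0f} GiB")

# Hypothetical 300B-parameter MoE with 30B active parameters, 4-bit weights,
# at the 212 GB/s measured on Strix Halo
dense_ceiling = max_tokens_per_second(212.0, 300, 0.5)
moe_ceiling = max_tokens_per_second(212.0, 30, 0.5)
print(f"300B read in full: at most {dense_ceiling:.1f} tokens/s")
print(f"30B active per token: at most {moe_ceiling:.1f} tokens/s")
```

With these assumed shapes, a single long-context session needs about 40 GiB of cache on top of the weights, which is where a large pool earns its keep, while the mixture-of-experts ceiling is ten times the dense one because only a tenth of the weights is read per token.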
Holding the line on price
Agentic AI is also something of a pricing tool for vendors, beyond describing a workload. A 192GB workstation sold on the promise of running 300-billion-parameter models locally can hold a four-figure price more comfortably than a mini PC sold on cores and clocks, and it justifies loading the most expensive component in the build to its maximum. AMD's Ryzen AI Halo developer box, a 128GB Strix Halo system, opens pre-orders in June at $3,999 through Micro Center, matching the launch price of Acer's GB10-based Veriton GN100 and the original DGX Spark before its increase.
Apple, the one vendor with the scale to hold priority memory allocation, has moved the other way. It pulled the 512GB Mac Studio configuration from sale, raised the price of its 256GB upgrade, and in May removed several more high-memory Mac mini and Mac Studio options as supply tightened.
This shows us beyond doubt that expanding capacity while holding the line on premium pricing is a choice the AMD and Nvidia camps are making, not one that the market is forcing. Whether buyers accept it depends on whether local agentic inference delivers enough value over cloud services to justify the outlay, given that these machines ship with memory capacities that outpace the bandwidth that ultimately determines what that memory can do.
