Nvidia launches BlueField-4 STX storage architecture for agentic AI at GTC 2026
Nvidia announced BlueField-4 STX at GTC 2026 on March 16, a modular reference architecture for accelerated storage designed to address the data access bottleneck limiting agentic AI inference.
Built around a new storage-optimized BlueField-4 DPU and ConnectX-9 SuperNIC, the platform targets GPU underutilization that occurs when AI agents operating across extended sessions and expanding context windows exceed the throughput of conventional storage paths. Nvidia says STX delivers up to five times the token throughput, four times better energy efficiency, and twice the page ingestion speed compared with traditional CPU-based storage architectures.
The specific issue that Nvidia is targeting with STX is key-value (KV) cache management. During transformer inference, the attention mechanism computes KV pairs for every token in context, and these must be stored and retrieved at each subsequent generation step. As context windows grow into the hundreds of thousands of tokens, the KV cache outgrows GPU HBM capacity. The usual fallback is to offload to host DRAM or NVMe storage, but both routes pass through the CPU, adding latency that compounds with context length and stalls GPU execution while data is in transit.
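To see why the KV cache outgrows HBM at these context lengths, a back-of-the-envelope sizing helps. The sketch below uses illustrative model parameters (roughly a 70B-class model with grouped-query attention and FP16 weights), not any configuration Nvidia has published:

```python
# Rough KV-cache sizing for transformer inference, showing why
# long contexts exceed GPU HBM. Parameters are illustrative
# (roughly 70B-class), not a specific Nvidia or vendor config.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    # Factor of 2: one tensor for keys, one for values, per layer.
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem

# 80 layers, 8 KV heads (grouped-query attention), head_dim 128, FP16.
per_token = kv_cache_bytes(80, 8, 128, 1)
print(f"{per_token / 1024:.0f} KiB per token")        # 320 KiB per token

for ctx in (128_000, 1_000_000):
    gib = kv_cache_bytes(80, 8, 128, ctx) / 2**30
    print(f"{ctx:>9,} tokens -> {gib:,.0f} GiB of KV cache")
```

Under these assumptions, a 128K-token context needs roughly 39 GiB of KV cache per session, and a 1M-token context roughly 305 GiB, well beyond the HBM on a single current GPU, which is the gap the host-bypass storage tier is meant to absorb.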
STX bypasses the host CPU by routing data through a dedicated accelerated storage layer via RDMA over Spectrum-X Ethernet. BlueField-4 manages NVMe SSDs directly and handles data integrity and encryption for the KV cache, keeping context accessible at the storage processor rather than transiting the host. The full stack runs on the Vera Rubin platform and integrates the Vera CPU — also announced at GTC on March 16 — alongside ConnectX-9, Spectrum-X Ethernet, DOCA software, and AI Enterprise software. The first rack-scale implementation built on STX is the Nvidia CMX context memory storage platform.
Storage and infrastructure vendors co-designing systems based on STX include DDN, Dell Technologies, HPE, IBM, NetApp, and VAST Data, alongside manufacturing partners AIC, Supermicro, and Quanta Cloud Technology. Meanwhile, eight cloud and AI providers — including CoreWeave, Lambda, Mistral AI, and Oracle Cloud Infrastructure — committed to early adoption for context memory storage. STX-based platforms are expected from partners in the second half of 2026.
"Agentic AI is redefining what software can do — and the computing infrastructure behind it must be reinvented to keep pace," Jensen Huang, founder and CEO of Nvidia, said at GTC. "AI systems that reason across massive context and continuously learn require a new class of storage."
Luke James is a freelance writer and journalist. Although his background is in law, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.