Demand for data center CPUs has surged, and AI agents are responsible – why the CPU-to-GPU ratio is more important than ever for hyperscalers


A close-up view of Nvidia's Vera CPU Compute Tray

The AI revolution, which shows no sign of stopping, at times echoes the gold rush. Word of newly scarce commodities spreads quickly through whisper networks, and suddenly there’s a surge of interest as people snap up resources. For most of the ChatGPT era, it has been hard to get hold of a GPU for love or money, with demand so great that Nvidia has practically been able to manage its own waitlist.

Much of the media’s attention – and plenty of investment – has been focused on the dash to grab as many GPUs as possible; most recently, memory has become a focal point.

“AI deployment at scale has forced organizations to look at the infrastructure underneath the hype,” said Jason Beckett, chief technology officer in Europe, the Middle East and Africa at Hitachi Vantara, in comments to Tom’s Hardware Premium. As Beckett points out, while most of the attention is focused on GPUs because they run the AI models, the CPUs are vital because they handle “everything else”.

But as the main use case of AI changes from chatbots to agents, the requirements have also altered. With chatbots, a slight delay for in-depth inference while an AI model ‘thinks’ was seen as an acceptable trade-off. Agentic AI, by contrast, requires rapid responses and the smooth coordination of tool calls and much more, so latency can be a killer. Bolstering CPU counts can help head off problems that could otherwise quickly spiral into something more significant, breaking the entire agentic stack.

Much of that CPU demand is being driven by hyperscalers, who recognize the integral role that CPUs play in developing the AI clusters that are likely to power the economy in the years to come. “As GPU clusters scale, CPUs are taking on larger roles in orchestration, memory management, networking, storage coordination, and inference handling,” said Jeff Moore, vice president of strategic partnerships at Aegis Cooling, which specializes in next-gen liquid cooling solutions for AI and high-performance computing infrastructure, in an interview with Tom’s Hardware Premium.

CPU-to-GPU ratios inside AI deployments are rising, said Moore, “particularly because distributed AI workloads generate significant demand for general-purpose compute, memory bandwidth, and east-west data movement.” A recent TrendForce analysis points out that CPUs account for nearly 91% of all the delay in responses, a contribution to latency that AI deployments are trying desperately to counteract.
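To see why that figure matters so much, a rough Amdahl's-law style calculation helps. The sketch below is illustrative only: it assumes the 91% figure can be read as the CPU-side share of a response's total time, that CPU-side and GPU-side work do not overlap, and that extra hardware translates directly into a speedup on its side. None of those assumptions comes from the TrendForce analysis, and the function name `end_to_end_speedup` is hypothetical.

```python
def end_to_end_speedup(cpu_share: float,
                       cpu_speedup: float = 1.0,
                       gpu_speedup: float = 1.0) -> float:
    """Amdahl-style estimate of overall response speedup.

    cpu_share   -- fraction of response time spent on CPU-side work (assumed)
    cpu_speedup -- factor by which the CPU-side work gets faster
    gpu_speedup -- factor by which the GPU-side work gets faster
    """
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be between 0 and 1")
    if cpu_speedup <= 0 or gpu_speedup <= 0:
        raise ValueError("speedups must be positive")
    gpu_share = 1.0 - cpu_share
    # New total time, relative to an original total of 1.0.
    new_time = cpu_share / cpu_speedup + gpu_share / gpu_speedup
    return 1.0 / new_time


# Assumption: treat the article's "nearly 91%" as exactly 0.91.
CPU_SHARE = 0.91

print(f"GPU side 2x faster: {end_to_end_speedup(CPU_SHARE, gpu_speedup=2.0):.2f}x overall")
print(f"CPU side 2x faster: {end_to_end_speedup(CPU_SHARE, cpu_speedup=2.0):.2f}x overall")
```

Under these assumptions, doubling the speed of the GPU side improves the overall response time by only about 1.05x, while doubling the speed of the CPU side yields about 1.83x, which is the basic arithmetic behind rebalancing the ratio.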

That shift is now visible not just in financial forecasts, but in the physical design of AI infrastructure itself. In early generative AI deployments, racks were often built around dense GPU configurations, with CPUs effectively treated as supporting components – enough to keep the system running, but not a bottleneck concern. Things are shifting now. “In the media, an AI rack is pictured as a giant box of GPUs,” said Hommer Zhao, founder of OurPCB, a PCB manufacturer with more than 15 years’ experience, in comments to Tom’s Hardware Premium. “But from a hardware design perspective, a GPU is just a very fast, very dumb engine. It cannot talk to the internet or pull data from a hard drive.”

Arm, meanwhile, is benefiting from hyperscalers designing their own custom silicon. “Arm accounts for close to half of all compute shipped to top hyperscalers in 2025, with over a billion Neoverse cores deployed,” said Beckett. “Those are rack-level architectural decisions made years ago.” AWS’s Graviton, Google’s Axion, and Microsoft’s Cobalt chips all reflect a move toward CPU architectures tailored for specific workloads: high-throughput, energy-efficient, and tightly integrated with networking and storage. Arm’s licensing model positions it at the center of this trend, and its recent financial results highlight how significant that hyperscaler-driven demand has become.

Chris Stokel-Walker is a Tom's Hardware contributor who focuses on the tech sector and its impact on our daily lives, online and offline. He is the author of How AI Ate the World, published in 2024, as well as TikTok Boom, YouTubers, and The History of the Internet in Byte-Sized Chunks.
