The AI revolution that shows no signs of stopping appears at times to have echoes of the gold rush. Whisper networks spread quickly through communities about new scarce commodities, and suddenly there’s a surge of interest as people snap up resources. For most of the ChatGPT era, you’ve struggled to get hold of a GPU for neither love nor money, with Nvidia practically able to manage its own waitlist, so great is the demand.
Much of the media’s attention – and plenty of investment – has been focused on the dash to grab as many GPUs as possible; most recently, memory has become a focal point.
But in recent weeks and months, there’s been a focus on ensuring that people have CPUs to match. For decades, the CPU has been the anonymous workhorse of the hardware stack, running operating systems, scheduling workloads, and keeping everything ticking over, rarely grabbing headlines unless there’s a supply crunch or a generational leap in performance.
Suddenly, it’s being talked about in the same breath as scarce-as-gold GPUs. What’s going on?
“AI deployment at scale has forced organizations to look at the infrastructure underneath the hype,” said Jason Beckett, chief technology officer in Europe, the Middle East and Africa at Hitachi Vantara, in comments to Tom’s Hardware Premium. As Beckett points out, while most of the attention is focused on GPUs because they run the AI models, the CPUs are vital because they handle “everything else”.
And as agentic AI becomes the norm, there’s a greater need for that CPU backbone to keep things running properly. “Always-on, multi-step reasoning systems don't create brief orchestration bursts around GPU workloads,” said Beckett. “They demand high-core-count CPUs running at sustained loads, continuously. The infrastructure requirement was always structural. It's just now unavoidable.”
Readjusting ratios
When data centers were previously being specced to deliver AI training and inference in the early days of the generative AI revolution, those building them accounted for a gargantuan bias in favor of GPUs. Chatbot conversations required between four and eight GPUs to every single CPU required, because the parallel equations required to meet user requests were GPU-inference heavy.
But as the main use case of AI changes from chatbots to agents, the requirements have also altered. A slight delay for in-depth inference while an AI model ‘thinks’ was seen as an acceptable interface choice. But as agentic AI requires rapid responses and the smooth coordination of tool calls and much more, latency can be a killer. Bolstering CPU counts can help avoid any problems that can quickly spin out into something more significant, breaking the entire agentic stack.
AMD, one of the major manufacturers of CPUs, has seen that shift first-hand. The company had previously forecast that the CPU market would grow at a rate of around 18% annually, but says that the change in requirements has materially changed the market. The rate of growth has now doubled to 35% a year, AMD claims, and will become a $120 billion market by the end of the decade.
“What AMD and Arm's results are telling us is that this is a structural, not cyclical requirement,” said Roger Cummings, CEO of PEAK:AIO, in an interview with Tom’s Hardware Premium. “In actuality, two structural shifts are driving the demand surge: the rise of agentic AI and the need for deterministic, predictable performance at rack scale.”
Much of that CPU demand is being driven by hyperscalers, who recognize the integral role that CPUs play in developing the AI clusters that are likely to power the economy in the years to come. “As GPU clusters scale, CPUs are taking on larger roles in orchestration, memory management, networking, storage coordination, and inference handling,” said Jeff Moore, vice president of strategic partnerships at Aegis Cooling, which specializes in next-gen liquid cooling solutions for AI and high-performance computing infrastructure, in an interview with Tom’s Hardware Premium.
There’s a rise in CPU-to-GPU ratios inside AI deployments, said Moore, “particularly because distributed AI workloads generate significant demand for general-purpose compute, memory bandwidth, and east-west data movement.” A recent TrendForce analysis points out that CPUs’ contribution to latency – accounting for nearly 91% of all the delay in responses – is something that AI deployments are trying desperately to counteract.
Changing designs
That shift is now visible not just in financial forecasts, but in the physical design of AI infrastructure itself. In early generative AI deployments, racks were often built around dense GPU configurations, with CPUs effectively treated as supporting components – enough to keep the system running, but not a bottleneck concern. Things are shifting now. “In the media, an AI rack is pictured as a giant box of GPUs,” said Hommer Zhao, founder of OurPCB, a PCB manufacturer with more than 15 years’ experience, in comments to Tom’s Hardware Premium. “But from a hardware design perspective, a GPU is just a very fast, very dumb engine. It cannot talk to the internet or pull data from a hard drive.”
Rather than a single host CPU loosely paired with multiple GPUs, hyperscalers are deploying configurations with higher core-count CPUs, more memory channels, and, in some cases, multiple CPUs per node to keep pace with data movement demands.
There are also thermal and power considerations shaping how racks are populated. High-core-count CPUs, especially those optimized for cloud workloads, are being selected not just for raw performance but for efficiency under sustained load. In liquid-cooled environments, CPUs are increasingly part of the same thermal design envelope as GPUs, rather than an afterthought cooled separately with air.
Financial signs of success
Recent results from AMD and Arm reinforce the idea that this is not a short-term correction but a deeper architectural shift. AMD has reported strong growth in its data center CPU segment, driven in large part by hyperscaler demand for its EPYC processors, which offer high core counts and memory bandwidth well suited to AI orchestration tasks.
Arm, meanwhile, is benefiting from hyperscalers designing their own custom silicon. “Arm accounts for close to half of all compute shipped to top hyperscalers in 2025, with over a billion Neoverse cores deployed,” said Beckett. “Those are rack-level architectural decisions made years ago.” AWS’s Graviton, Google’s Axion, and Microsoft’s Cobalt chips all reflect a move toward CPU architectures tailored for specific workloads: high-throughput, energy-efficient, and tightly integrated with networking and storage. Arm’s licensing model positions it at the center of this trend, and its recent financial results highlight how significant that hyperscaler-driven demand has become.
Both sets of results point to a change in how CPUs are being valued. In traditional enterprise contexts, the hardware was often general-purpose and interchangeable. In hyperscaler environments, it’s becoming a specialized infrastructure component, tuned for specific roles within AI systems, whether orchestration, inference at the edge, or data preprocessing.
Taken together, the changes in rack design and vendor performance suggest that CPUs aren’t a secondary consideration in AI infrastructure planning any more. Instead, they are becoming a critical factor in determining overall system efficiency and cost.
“The spotlight hasn't revealed something new,” said Beckett. “It's just finally illuminating what serious infrastructure teams never stopped building on.”

3 hours ago
8





English (US) ·