Meta signed a multibillion-dollar, multi-year deal with Amazon Web Services last week to deploy tens of millions of Graviton5 CPU cores across AWS data centers, making Meta one of the five largest Graviton customers worldwide. The deal focuses explicitly on CPU-intensive agentic AI workloads, not GPU training, with Amazon CEO Andy Jassy saying in a post accompanying the announcement that agentic AI is “becoming almost as big a CPU story as a GPU story.”
Meta already has GPU and accelerator contracts worth hundreds of billions across Nvidia, AMD, Broadcom, Google, CoreWeave, and Nebius, and it went to AWS specifically for general-purpose CPUs. Santosh Janardhan, Meta's head of infrastructure, said in the joint announcement that "diversifying our compute sources is a strategic imperative," and that Graviton allows the company to "run the CPU-intensive workloads behind agentic AI with the performance and efficiency we need at our scale."
The CPU-to-GPU ratio
The meteoric rise of agentic AI is driving notable shifts in CPU-to-GPU ratios. While training LLMs relies on large deployments of GPUs, agentic inference is fundamentally different, involving processes like branching control flow, tool invocation, sandbox execution, validation loops, and orchestration across many concurrent sub-agents. All that work falls on CPUs.
In its recent earnings call, Intel’s CFO David Zinsner said that the ratio of CPUs to GPUs in data centers has already moved from 1:8 to 1:4, adding that as workloads continue migrating toward inference and agentic AI, the ratio could converge to 1:1 or even tilt further in favor of CPUs. “As you think about the growth rate now going forward, it’s [CPU demand] going to become a significant part of the AI [total addressable market],” Zinsner said.
Arm has also quantified the rising demand for agentic AI in terms of core counts. At the company’s Arm Everywhere event in March, Arm launched its first in-house silicon product, the 136-core AGI CPU, with Meta as lead partner and customer. Arm CEO Rene Haas told the audience that a typical AI data center today requires around 30 million CPU cores per gigawatt of capacity. With agentic workloads, however, that figure rises to roughly 120 million cores per gigawatt, a fourfold increase driven by agents that run continuously, spawn sub-agents, and generate queries at more than 15 times the rate of human chatbot users.
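Haas’s figures lend themselves to a quick back-of-envelope check. The sketch below (the 2 GW campus size is a hypothetical chosen purely for illustration) applies the per-gigawatt core densities he cited:

```python
# Back-of-envelope: CPU core counts implied by Arm's per-gigawatt figures.
# Densities are the ones Rene Haas cited; the campus size is hypothetical.

CORES_PER_GW_TRADITIONAL = 30_000_000   # typical AI data center today
CORES_PER_GW_AGENTIC = 120_000_000      # with agentic workloads, per Arm

campus_gw = 2  # hypothetical 2 GW campus

traditional_cores = campus_gw * CORES_PER_GW_TRADITIONAL
agentic_cores = campus_gw * CORES_PER_GW_AGENTIC

print(f"Traditional density: {traditional_cores:,} cores")  # 60,000,000
print(f"Agentic density:     {agentic_cores:,} cores")      # 240,000,000
print(f"Multiplier:          {agentic_cores // traditional_cores}x")  # 4x
```

At these densities, a single 2 GW agentic campus would by itself absorb a meaningful fraction of the tens of millions of Graviton5 cores in Meta’s AWS deal.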
Meanwhile, AMD CEO Lisa Su said at the Morgan Stanley TMT Conference in March that "we're seeing a significant CPU demand, frankly, as a result of the inference demand picking up." She added that "the CPU portion of the business has actually far exceeded my expectations in terms of demand."
Supply constraints and rising lead times
The surge in CPU demand is running into a supply chain that planned for a GPU-dominated world, leading to server CPU lead times stretching to roughly six months, up from about two weeks before the agentic demand spike.
Intel acknowledged on its Q1 earnings call that unmet Xeon demand “starts with a B,” referring to billions of dollars in lost revenue. “In recent months, we have seen clear signs that the CPU is reinserting itself as the indispensable foundation of the AI era,” CEO Lip-Bu Tan said. Revenue would have been higher had Intel been able to produce more chips, the company said: Q1 data center and AI revenue came in at $5.05 billion, up 22% year-over-year.
Server CPU prices have climbed 10% to 20% since March, with analysts expecting a further 8% to 10% increase in the second half of the year. Intel raised prices in both February and March, with a third increase reportedly planned for May, bringing the cumulative hike to roughly 30% above 2025 levels. AMD’s Lisa Su told the Morgan Stanley audience that AMD's own customers described the demand as something that "was perhaps… under-forecasted," adding: "We are in the process of catching up."
The bottleneck extends well beyond CPUs themselves, however. TrendForce has downgraded its full-year server shipment growth forecast from 20% to 13%, per reporting from The Register, because the power management ICs (PMICs) and baseboard management controllers (BMCs) needed to assemble complete servers now carry lead times of 35 to 40 weeks.
Foundries are prioritizing higher-margin AI-specific chips, squeezing capacity for the mature-node components that general-purpose servers require. Samsung's planned closure of its S7 eight-inch wafer fab in Korea will tighten PMIC supply further. Even with all the GPUs and HBM in the world, you can’t ship a rack without the host CPUs, PMICs, and BMCs.
Compute diversification
In response, Meta is spreading its CPU procurement across every available source. In addition to the Graviton5 deal, the company co-developed the Arm AGI CPU announced in March and plans to deploy it alongside its Broadcom-built MTIA inference accelerators, and it has struck a $100 billion deal with AMD that includes EPYC server CPUs and Instinct GPUs. Nvidia also announced that Meta will deploy standalone Grace CPUs in production, with Vera to follow. Intel and Google separately announced a multi-year Xeon collaboration in early April, further demonstrating how x86 supply is being locked up through long-term agreements across the industry.
Nvidia's decision to launch its 88-core Vera CPU as a standalone product, separate from its GPU systems, reflects the same dynamic: at GTC in March, Jensen Huang said he expects Vera to become a multibillion-dollar business. This, alongside Arm breaking 35 years of pure IP-licensing precedent to ship finished silicon and Intel redirecting wafer capacity to Xeon, shows that all the major players are either manufacturing CPUs for agentic workloads or locking up long-term supply of them.
In terms of infrastructure spending, CreditSights projects that the top five hyperscalers will spend roughly $750 billion on capex in 2026, up around 67% year-over-year. Amazon alone has guided to $200 billion, and Meta has set a range of $115 to $135 billion. Most of that is naturally destined for AI, with every gigawatt of agentic capacity requiring four times the CPU cores of traditional AI training clusters.
Meta's Graviton deal is a sign that even a company spending more aggressively on AI infrastructure than almost anyone else cannot build enough general-purpose compute on its own to keep pace.