Qualcomm unveils AI200 and AI250 AI inference accelerators — Hexagon takes on AMD and Nvidia in the booming data center realm
(Image credit: Qualcomm)
Qualcomm on Monday formally announced two upcoming AI inference accelerators — the AI200 and AI250 — that will hit the market in 2026 and 2027. The new accelerators are said to compete against rack-scale solutions from AMD and Nvidia with improved efficiency and lower operational costs when running large-scale generative AI workloads. The announcement also reaffirms Qualcomm's plan to release updated products on a yearly cadence.
Both the Qualcomm AI200 and AI250 accelerators are based on Qualcomm's Hexagon neural processing units (NPUs) customized for data center AI workloads. The company has been gradually improving its Hexagon NPUs in recent years, and the latest versions of these processors feature scalar, vector, and tensor accelerators (in a 12+8+1 configuration); support INT2, INT4, INT8, INT16, FP8, and FP16 data formats; and add micro-tile inferencing to reduce memory traffic, 64-bit memory addressing, virtualization, and generative AI model encryption for extra security. Scaling Hexagon up for data center workloads is a natural choice for Qualcomm, though it remains to be seen what performance targets the company will set for its AI200 and AI250 units.
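To put those low-bit formats in context: the main reason inference accelerators push below FP16 is that every bit shaved off a weight is a bit that never has to cross the memory bus. The Python sketch below is our own illustration of that arithmetic using a basic symmetric INT4 scheme; it is not Qualcomm's toolchain, and per-tensor scaling is a simplification of what production quantizers do.

```python
# Hardware-agnostic illustration of why low-bit formats cut memory traffic.
# This is NOT Qualcomm's SDK; it just shows the arithmetic behind INT4 weights.
import numpy as np

def int4_quantize(weights: np.ndarray):
    """Symmetric per-tensor INT4 quantization: map floats onto [-7, 7]."""
    scale = np.abs(weights).max() / 7.0          # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
    return q, scale

def int4_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)  # one transformer-sized matrix
q, s = int4_quantize(w)

fp16_bytes = w.size * 2      # FP16: 2 bytes per weight
int4_bytes = w.size // 2     # INT4: two weights packed per byte
print(f"FP16: {fp16_bytes / 2**20:.1f} MiB, INT4: {int4_bytes / 2**20:.1f} MiB "
      f"({fp16_bytes / int4_bytes:.0f}x less data to move)")
print("max abs quantization error:", np.abs(w - int4_dequantize(q, s)).max())
```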
Qualcomm's AI200 rack-scale solution will be the company's first data-center-grade inference system. It will be powered by AI200 accelerators carrying 768 GB of LPDDR memory onboard (a lot of memory for an inference accelerator) and will use PCIe interconnects to scale up within a rack and Ethernet to scale out across racks. The system will rely on direct liquid cooling and operate within a power envelope of 160 kW per rack, an unprecedented power draw for inference solutions. In addition, the system will support confidential computing for enterprise deployments. The solution will be available in 2026.
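Qualcomm has not said which models it expects a single AI200 card to host, but a quick back-of-the-envelope calculation shows why 768 GB is notable. The sketch below is our own sizing math under stated assumptions (weights only, ignoring KV cache and activations), not Qualcomm's deployment guidance.

```python
# Rough sizing: weights-only memory footprint of a model at a given precision.
# Real deployments also need KV cache and activations, so treat this as a floor.
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (70, 405):
    for fmt, bits in (("FP16", 16), ("FP8", 8), ("INT4", 4)):
        print(f"{params}B @ {fmt}: {weights_gb(params, bits):7.1f} GB")

# A 405B-parameter model at FP16 needs ~810 GB for weights alone, slightly more
# than one 768 GB card; at FP8 (~405 GB) or INT4 (~203 GB) it fits on a single
# card with room left over for KV cache.
```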
The AI250, launching a year later, keeps this structure but adds a near-memory compute architecture that Qualcomm says boosts effective memory bandwidth by over 10 times. In addition, the system will support disaggregated inference, which lets compute and memory resources be dynamically shared across cards. Qualcomm positions it as a more efficient, high-bandwidth solution optimized for large transformer models, while preserving the same thermal, cooling, security, and scalability characteristics as the AI200.
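Qualcomm has not published absolute bandwidth numbers, but a simple roofline-style estimate shows why a 10x jump in effective bandwidth matters so much for transformer inference: autoregressive decoding streams essentially all of the model's weights for every generated token, so per-stream throughput is roughly bandwidth divided by model size. The figures in this sketch are illustrative assumptions, not AI250 specifications.

```python
# Roofline-style estimate: autoregressive decode is usually memory-bound, so
# tokens/sec per stream ~= effective_bandwidth / bytes_read_per_token.
# All numbers below are illustrative assumptions, not Qualcomm specifications.
def decode_tokens_per_s(model_bytes: float, bw_gb_s: float) -> float:
    # Each generated token streams (roughly) all weights once per decode step.
    return bw_gb_s * 1e9 / model_bytes

model_bytes = 70e9      # 70B parameters at INT8: ~70 GB of weights
base_bw = 500           # hypothetical baseline effective bandwidth, GB/s
for mult in (1, 10):
    rate = decode_tokens_per_s(model_bytes, base_bw * mult)
    print(f"{mult:>2}x bandwidth: ~{rate:.0f} tokens/s per stream")
```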
"With Qualcomm AI200 and AI250, we’re redefining what’s possible for rack-scale AI inference," said Durga Malladi, SVP & GM, Technology Planning, Edge Solutions & Data Center, Qualcomm Technologies. "These innovative new AI infrastructure solutions empower customers to deploy generative AI at unprecedented TCO, while maintaining the flexibility and security modern data centers demand."
In addition to building hardware platforms, Qualcomm is also building a hyperscaler-grade, end-to-end software platform optimized for large-scale inference. The platform is set to support major ML and generative AI toolsets, including PyTorch, ONNX, vLLM, LangChain, and CrewAI, while enabling seamless model deployment. The software stack will support disaggregated serving, confidential computing, and one-click onboarding of pre-trained models to simplify deployment.
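For reference, this is what serving a model through vLLM looks like today; since Qualcomm names vLLM as a supported toolset, deployment on the AI200 and AI250 would presumably look similar once the backend ships. The model name here is a placeholder, and any Qualcomm-specific configuration is unknown at this point.

```python
# Standard vLLM usage today. Qualcomm says its stack will support vLLM, so a
# deployment on AI200/AI250 would presumably look similar. The model choice is
# a placeholder; Qualcomm-specific backend flags are not yet public.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # hypothetical model choice
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain near-memory compute in one sentence."], params)
print(outputs[0].outputs[0].text)
```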
"Our rich software stack and open ecosystem support make it easier than ever for developers and enterprises to integrate, manage, and scale already trained AI models on our optimized AI inference solutions," said Malladi. "With seamless compatibility for leading AI frameworks and one-click model deployment, Qualcomm AI200 and AI250 are designed for frictionless adoption and rapid innovation."
One crucial detail that Qualcomm did not disclose about its AI200 and AI250 rack-scale inference solutions is which CPUs these machines will run. The company formally began development of its own data-center-grade CPUs earlier this year. While the Nuvia team had likely laid some CPU microarchitecture groundwork before that, it still takes about a year to define and develop a logical design, at least six more months to implement the design and tape it out, and then months to bring the chip up and sample it. In short, it is reasonable to expect Qualcomm's in-house data center CPUs to emerge in late 2027 at the earliest, and more likely in 2028. That means at least the AI200 is poised to use an off-the-shelf Arm or x86 CPU, so the question is, which one?
Anton Shilov is a contributing writer at Tom's Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.