Qualcomm unveils AI200 and AI250 AI inference accelerators — Hexagon takes on AMD and Nvidia in the booming data center realm

(Image credit: Qualcomm)

Qualcomm on Monday formally announced two upcoming AI inference accelerators — the AI200 and AI250 — that will hit the market in 2026 and 2027, respectively. Qualcomm says the new accelerators will compete with rack-scale solutions from AMD and Nvidia while offering better efficiency and lower operating costs for large-scale generative AI workloads. The announcement also reaffirms Qualcomm's plan to release updated data center products on a yearly cadence.

Both the Qualcomm AI200 and AI250 accelerators are based on Qualcomm's Hexagon neural processing units (NPUs), customized for data center AI workloads. The company has been gradually improving its Hexagon NPUs in recent years, and the latest versions of these processors already feature scalar, vector, and tensor accelerators (in a 12+8+1 configuration); support for INT2, INT4, INT8, INT16, FP8, and FP16 data formats; micro-tile inferencing to reduce memory traffic; 64-bit memory addressing; virtualization; and generative AI model encryption for added security. Scaling Hexagon up for data center workloads is a natural move for Qualcomm, though it remains to be seen what performance targets the company will set for the AI200 and AI250.
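Qualcomm hasn't published per-format throughput figures, but the appeal of low-precision formats such as INT8 for inference is easy to demonstrate. The Python sketch below (our illustration, not Qualcomm code) quantizes a weight matrix to INT8 with a single per-tensor scale, then compares its memory footprint and output error against the FP32 reference:

```python
# Minimal sketch (not Qualcomm code): per-tensor symmetric INT8
# quantization, illustrating why low-precision formats cut memory
# traffic for inference at a small accuracy cost.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)
activations = rng.standard_normal((1, 256)).astype(np.float32)

# Symmetric quantization: map [-max|w|, +max|w|] onto the int8 range.
scale = np.abs(weights).max() / 127.0
w_int8 = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantized matmul vs. the full-precision reference.
ref = activations @ weights
quant = (activations @ w_int8.astype(np.float32)) * scale

rel_err = np.linalg.norm(ref - quant) / np.linalg.norm(ref)
print(f"INT8 weights use {w_int8.nbytes / weights.nbytes:.0%} of FP32 memory")
print(f"relative output error: {rel_err:.4f}")
```

A real NPU executes the matmul in integer arithmetic rather than dequantizing first; the sketch only illustrates the footprint-versus-accuracy trade-off, which is also the kind of memory traffic that techniques like micro-tile inferencing aim to reduce.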

"With Qualcomm AI200 and AI250, we’re redefining what’s possible for rack-scale AI inference," said Durga Malladi, SVP & GM, Technology Planning, Edge Solutions & Data Center, Qualcomm Technologies. "These innovative new AI infrastructure solutions empower customers to deploy generative AI at unprecedented TCO, while maintaining the flexibility and security modern data centers demand."

In addition to building hardware platforms, Qualcomm is also building a hyperscaler-grade, end-to-end software platform optimized for large-scale inference. The platform is set to support major machine learning and generative AI toolsets — including PyTorch, ONNX, vLLM, LangChain, and CrewAI — while enabling seamless model deployment. The software stack will also support disaggregated serving, confidential computing, and one-click onboarding of pre-trained models.
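Qualcomm hasn't detailed its onboarding workflow yet, but a stack that advertises PyTorch and ONNX support would typically ingest models through a standard export step like the one below. This is a generic sketch using stock PyTorch APIs; the model, file name, and axis names are placeholders, not Qualcomm tooling:

```python
# Hedged sketch: export a small PyTorch model to a portable ONNX file,
# the kind of artifact an inference-serving stack could then ingest.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

example_input = torch.randn(1, 128)
torch.onnx.export(
    model,
    example_input,
    "model.onnx",                          # placeholder file name
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch sizes
)
```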


Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
