Huawei expects revenue from its AI processors to reach roughly $12 billion in 2026, up from $7.5 billion last year. The projection, based on orders already received from major Chinese technology firms including Alibaba, ByteDance, and Tencent, would represent at least 60% year-over-year growth and position Huawei as the dominant supplier in a domestic AI chip market that Morgan Stanley estimates could reach $67 billion by 2030. The surge has coincided with Nvidia CEO Jensen Huang confirming that Nvidia's share of the Chinese AI accelerator market has collapsed to zero percent.
These numbers describe a market that has bifurcated with unusual speed. Just 18 months ago, Nvidia supplied the vast majority of AI training and inference silicon used by Chinese cloud providers. Today, Huawei's Ascend 950PR is the primary procurement target for China's largest tech companies, and a training-focused successor named the 950DT is scheduled for Q4 this year.
The impact of DeepSeek V4
Much of this demand traces to the April release of DeepSeek's V4 LLM, which was optimized specifically for Huawei's Ascend architecture and its CANN software framework rather than for Nvidia's CUDA ecosystem. According to the South China Morning Post, Huawei engineers collaborated directly with DeepSeek ahead of the model's launch, and the company confirmed that its full Ascend SuperNode product line was adapted for V4 inference on day one. Alibaba Cloud and Tencent Cloud both deployed V4 services within hours of release.
The 950PR is currently the only Chinese-made AI processor that supports FP8, a compressed numerical format that allows more operations per second and lowers per-query costs. V4 uses a Mixture-of-Experts architecture with up to 1 trillion total parameters but activates only around 37 billion per inference pass. That profile favors inference-efficient hardware, playing to the 950PR's strengths while sidestepping its limitations in raw training throughput.
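The arithmetic behind that efficiency claim can be sketched roughly as follows, using the parameter counts above; the dense comparison point is hypothetical, included only to show the scale of the savings:

```python
# Rough sketch: why a sparse MoE model favors inference-efficient chips.
# Figures from the article: ~1T total parameters, ~37B active per token.
# The dense comparison is hypothetical, for illustration only.

total_params = 1_000_000_000_000   # ~1 trillion parameters stored
active_params = 37_000_000_000     # ~37 billion activated per inference pass

# A decoder forward pass costs roughly 2 FLOPs per active parameter per token.
flops_per_token_moe = 2 * active_params
flops_per_token_dense = 2 * total_params  # if every parameter were active

active_fraction = active_params / total_params
compute_savings = flops_per_token_dense / flops_per_token_moe

print(f"active fraction per token: {active_fraction:.1%}")      # ~3.7%
print(f"compute reduction vs. dense: ~{compute_savings:.0f}x")  # ~27x
```

In other words, only a few percent of the model's weights do work on any given token, so raw training throughput matters less than cheap, fast inference.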
DeepSeek gave Huawei early optimization access, but didn’t extend the same to Nvidia or AMD. While V4's open weights are released in standard formats compatible with CUDA-based frameworks, DeepSeek's own infrastructure runs on Huawei Ascend silicon. The collaboration has pulled forward procurement timelines across the Chinese cloud industry, and chip prices for the 950PR have reportedly risen by about 20% as a result of the demand.
SMIC capacity and production
Huawei's ability to fill those orders depends on SMIC, China's leading foundry, which manufactures the 950PR on its N+3 process, a 7nm-class node built without EUV lithography. Huawei is said to be targeting roughly 750,000 950PR units this year; samples shipped to customers in January, with full-scale shipments expected in the second half. Even so, that figure is expected to fall short of demand.
Meanwhile, SMIC has been expanding its advanced-node capacity for more than a year. The goal is a five-fold increase over two years, lifting combined 7nm and 5nm output to 100,000 wafers per month, with half a million targeted by 2030. Capacity for nodes at 22nm and below could also rise from 30,000-50,000 wafer starts per month in 2025 to 50,000-60,000 or more this year. Huawei is adding two dedicated fabrication plants of its own, though their ownership structures remain unclear; once fully operational, those facilities could exceed the current output of comparable lines at SMIC.
Yields remain a thorn in China's side: SMIC's 7nm-class process delivers substantially fewer good dies per wafer than TSMC's equivalent nodes, and the 950PR is likely a much larger chip than a TSMC-built equivalent, which compounds the loss. Cycle time is another problem. Moving a wafer from start to a finished, packaged Ascend processor currently takes around eight months, according to JP Morgan estimates, versus roughly three months for similar nodes at TSMC.
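Why die size compounds a yield problem can be illustrated with the classic Poisson yield model. The die area and defect densities below are hypothetical placeholders, not published SMIC or TSMC figures; the point is only the shape of the relationship:

```python
import math

# Hedged sketch: good dies per wafer under the Poisson yield model.
# Die area and defect densities are hypothetical, for illustration only.

def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate gross dies per wafer with a standard edge-loss correction."""
    r = wafer_diameter_mm / 2
    return int(math.pi * r**2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Classic Poisson yield model: Y = exp(-D0 * A)."""
    return math.exp(-defects_per_mm2 * die_area_mm2)

die_area = 600.0           # mm^2 -- hypothetical large AI accelerator die
for d0 in (0.001, 0.003):  # defects/mm^2 -- hypothetical mature vs. immature node
    gross = dies_per_wafer(die_area)
    good = gross * poisson_yield(die_area, d0)
    print(f"D0={d0}: ~{gross} gross dies, ~{good:.0f} good dies per wafer")
```

Because yield falls exponentially with die area, tripling the defect density on a large die cuts good output by more than a factor of three; a big chip on an immature node is punished twice.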
Then there's HBM. Huawei announced in September that it had developed its own HBM chips, HiBL 1.0 and HiZQ 2.0, with up to 1.6 TB/s of bandwidth, in partnership with CXMT, but how quickly CXMT can ramp production of competitive HBM remains an open question.
Nvidia's collapse in China
Huang's admission that "In China, we have now dropped to zero" came during an interview on the Special Competitive Studies Project's "Memos to the President" podcast. He criticized U.S. export policy as having "already largely backfired," arguing that conceding a market the size of China makes no strategic sense.
The H200, which Nvidia received U.S. licenses to sell to China earlier this year, hasn’t shipped a single unit despite receiving orders. Contradictory regulatory requirements from Washington and Beijing created a stalemate at customs: U.S. regulators require that H200 chips ordered by Chinese customers be used only inside China, while Beijing has instructed domestic technology companies to limit Nvidia hardware to overseas operations.
Nvidia confirmed in its FY2026 10-K filing that it’s "effectively foreclosed from competing in China's data center computing market" and is not assuming any data center compute revenue from the region in its current outlook. Bernstein analysts estimated earlier this year that Nvidia’s share of the China AI GPU market could fall to roughly 8% in the coming years, down from 66% in 2024, both due to U.S. restrictions and because domestic vendors are being pushed to cover up to 80% of demand from domestic sources. TrendForce projected in December that China's high-end AI chip market would grow by more than 60% in 2026, with domestic suppliers capturing about half of the total.
950PR performance
The 950PR performs somewhere between Nvidia's H100 and H200: it outperforms the restricted H20 by an estimated 2.8 times but trails the H200 in both compute and memory bandwidth. The 2.8 figure can't be independently verified, however, since it rests on FP4 throughput and Hopper-era hardware doesn't support FP4 natively.
Huawei compensates by linking large numbers of processors via optical interconnects. Its CloudMatrix 384 system combines twelve racks of Ascend modules into a 384-processor fabric delivering roughly 300 PFLOPS, though at nearly four times the power draw of Nvidia's comparable GB200-based configurations.
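A back-of-envelope calculation using the CloudMatrix 384 figures above shows the trade-off. Only the ~4x power ratio is reported; the efficiency comparison assumes the GB200-based systems deliver comparable throughput, as the article implies:

```python
# Back-of-envelope sketch using the article's CloudMatrix 384 figures.
# Assumes comparable throughput to the GB200 systems; only the ~4x
# power ratio is reported, so this is illustrative, not measured.

system_pflops = 300
num_chips = 384
power_ratio_vs_gb200 = 4.0  # reported: ~4x the draw of comparable GB200 setups

pflops_per_chip = system_pflops / num_chips
# Matching a rival's throughput at ~4x the power means ~1/4 the perf per watt.
relative_efficiency = 1 / power_ratio_vs_gb200

print(f"~{pflops_per_chip:.2f} PFLOPS per Ascend processor")        # ~0.78
print(f"~{relative_efficiency:.0%} of rival performance per watt")  # ~25%
```

The strategy, in short, is to buy back per-chip weakness with scale and interconnect bandwidth, and pay the difference in electricity.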
The 950PR is primarily an inference chip, though; the training-focused 950DT, expected in Q4, is designed for deep learning workloads and could narrow the gap with Nvidia's Hopper generation for model training tasks. Until it ships, Chinese firms that need to train large foundation models domestically face constraints that inference silicon can’t fully solve.
As for Huawei's CANN software ecosystem, it’s now thought to have more than four million developers, but it remains far smaller than Nvidia's CUDA install base. Whether CANN can attract enough third-party development to become self-sustaining remains to be seen. For now, commercial momentum is running in Huawei's favor inside China, driven by the simple absence of alternatives.
