Huawei used its New Year message to highlight progress across its Ascend AI and Kunpeng CPU ecosystems, pointing to the rollout of Atlas 900 supernodes and rapid growth in domestic developer adoption as "a solid foundation for computing." The message arrives as China continues to accelerate efforts to replace Western hardware in critical AI workloads, and as Huawei positions itself as the closest thing the country has to a vertically integrated AI compute vendor.
Huawei’s message offers a snapshot of a strategy that has been unfolding for several years, shaped by U.S. export controls, constrained access to leading-edge manufacturing, and a domestic market increasingly mandated to adopt local silicon. Under those conditions, Huawei’s Ascend and Kunpeng platforms have evolved into something distinct from their Western counterparts: less focused on single-chip supremacy and more on building large, tightly coupled systems that compensate for weaker nodes with scale, networking, and software control.
Ascend’s architecture and the limits of the node
At the center of Huawei’s AI effort is Ascend, built around its proprietary Da Vinci architecture. The original Ascend 910, introduced in 2019, was manufactured on TSMC’s 7nm process and delivered roughly 256 TFLOPS of FP16 performance at a quoted 350W. That put it in the same broad class as Nvidia’s Volta-era accelerators, though without the same software ecosystem or interconnect maturity.
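As a back-of-envelope check, the quoted figures for the original Ascend 910 imply an efficiency of roughly 0.73 TFLOPS per watt. This sketch uses only the numbers cited above; it is peak-rate arithmetic, not a measured result:

```python
# Quoted Ascend 910 specs from the article: 256 TFLOPS FP16 at 350 W.
fp16_tflops = 256
tdp_watts = 350

# Peak FP16 efficiency implied by the quoted figures.
tflops_per_watt = fp16_tflops / tdp_watts
print(f"Ascend 910: {tflops_per_watt:.2f} TFLOPS/W (FP16)")
```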
Sanctions imposed in the years following Ascend’s launch significantly changed the picture, forcing subsequent Ascend generations onto SMIC’s N+1 and N+2 processes, which are roughly comparable to older 7nm-class nodes without EUV. The Ascend 910C, now the backbone of Huawei’s latest clusters, is a dual-die package with two large chiplets combined into a single accelerator card. On paper, Huawei claims up to 780 TFLOPS of BF16 compute, but die area and power efficiency tell a more complicated story.
Huawei suggests the 910C’s combined silicon footprint is around 60% larger than Nvidia’s H100, with lower performance per square millimeter and per watt. In isolation, that would be a losing proposition, but Huawei has leaned hard on interconnects and clustering. The company uses a proprietary high-speed fabric alongside standard PCIe and RoCE networking to bind hundreds or thousands of Ascend accelerators into a single logical training or inference system.
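The "60% larger" claim can be translated into an implied silicon footprint. Note that the roughly 814 mm² H100 die size used below is an outside assumption, not a figure from the article:

```python
# Assumed H100 die size (~814 mm^2, a commonly cited public figure;
# not stated in the article).
h100_die_mm2 = 814

# The article's claim: the 910C's combined footprint is ~60% larger.
relative_overhead = 0.60

ascend_910c_mm2 = h100_die_mm2 * (1 + relative_overhead)
print(f"Implied 910C combined silicon footprint: ~{ascend_910c_mm2:.0f} mm^2")
```

Spread across two chiplets, that implied footprint is manufacturable on SMIC's nodes, which is presumably the point of the dual-die design.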
This approach is evident in Huawei’s claims around Atlas 900 and CloudMatrix systems. Rather than competing card-for-card with Nvidia’s H100 or AMD’s MI300X, Huawei emphasizes aggregate throughput. A CloudMatrix 384 system, linking 384 Ascend 910C accelerators, has been positioned as competitive with Nvidia’s large NVLink-based pods on selected workloads, particularly inference. But there’s a trade-off here in terms of physical scale: where Nvidia can deliver multi-exaflop-class FP4 performance in a handful of racks, Huawei requires an order of magnitude more floor space, power delivery, and cooling.
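The aggregate-throughput pitch is easy to quantify from the per-card number Huawei quotes. The sketch below multiplies peak silicon rates; real utilization on training or inference workloads would be substantially lower:

```python
# CloudMatrix 384 configuration and Huawei's quoted per-card peak
# (780 TFLOPS BF16), both as cited in the article.
cards = 384
bf16_tflops_per_card = 780

# Aggregate peak in PFLOPS (1 PFLOPS = 1000 TFLOPS).
total_pflops = cards * bf16_tflops_per_card / 1000
print(f"CloudMatrix 384 peak: ~{total_pflops:.0f} PFLOPS BF16")
```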
Inference is where Ascend looks strongest: reports out of China indicate that the 910C delivers roughly 60% of H100-class performance on inference tasks, while training remains more challenging.
Scaling out as a design philosophy
As for the Atlas 900 supernode, highlighted in Huawei’s New Year message, it is probably best viewed as a piece of architectural showmanship rather than a product that’s likely to come to the Chinese market any time soon. It reflects Huawei’s belief that AI compute can be industrialized through standardized clusters built from domestically controlled components, even if each component lags the global leading-edge.
This is where Huawei’s background in telecom networking comes into play. The company has decades of experience building carrier-grade systems that prioritize reliability, deterministic performance, and large-scale orchestration. Ascend clusters apply that mindset to AI, emphasizing predictable scaling behavior and integration with Huawei’s own AI frameworks rather than leading benchmark scores.
That also explains why Huawei describes its supernode technology as "more readily accessible" and capable of forming a "solid AI computing backbone." Huawei is not pitching Ascend as a drop-in replacement for CUDA, but as an alternative stack, from silicon to interconnect to compiler, that customers adopt wholesale. That could be attractive to Chinese cloud providers facing harsh procurement and compliance realities amid export restrictions and geopolitical uncertainty.
Kunpeng and the supporting CPU layer
Ascend does not stand alone. Huawei’s Kunpeng CPUs provide the general-purpose compute layer for these systems, and they follow a similar trajectory. Kunpeng chips are Arm-based, built around Huawei’s Taishan core designs. Earlier generations, such as Kunpeng 920, offered up to 64 Taishan V110 cores and targeted server and cloud workloads with respectable throughput but modest per-core performance.
Meanwhile, recent reporting suggests that the upcoming Kunpeng 930 generation is scaling core counts aggressively, pointing to 120-core designs built from multiple chiplets, while Huawei’s own roadmap references Kunpeng 950 and 960 variants with 192 cores and 384 threads. Per-core performance appears to be roughly in the Zen 3 class, which places Kunpeng behind current Xeon and EPYC parts but potentially competitive in highly parallel, throughput-oriented workloads.
That’s probably good enough for Huawei. Kunpeng’s role is to feed data to accelerators, manage I/O, and run infrastructure software in an environment where power and rack space are already dominated by Ascend clusters. Tight integration matters more than single-thread speed, and Arm gives Huawei architectural independence from x86 licensing and export risk.
Taken together, Ascend and Kunpeng show us how China’s AI hardware strategy has shifted from chasing individual best-in-class chips to assembling viable end-to-end platforms under constraint. Chinese government guidance discouraging new purchases of Nvidia hardware, combined with domestic subsidies and procurement rules, creates a large guaranteed market for "good enough" alternatives.
But "good enough" comes with obvious tradeoffs: Huawei’s clusters consume more power, occupy more space, and rely on heavy overprovisioning to match the throughput of more advanced Western systems. Evidently, those costs are acceptable in a market where sovereignty and long-term continuity outweigh efficiency.
Follow Tom's Hardware on Google News, or add us as a preferred source, to get our latest news, analysis, & reviews in your feeds.