Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model — 1,000 Ascend 910C chips used in training



A research group that includes Huawei Technologies says it completed full-parameter post-training of DeepSeek's V4-Pro, a 1.6-trillion-parameter model. The group used a cluster of at least 1,000 Huawei Ascend 910C chips, according to the Shenzhen municipal government, as reported by the South China Morning Post.

The claim, if it holds up, is evidence that domestic Chinese accelerators can now handle a training-class workload, the part of the AI pipeline Chinese firms have had the most trouble moving off Nvidia hardware under U.S. export controls. Huawei carried out the work with the Shenzhen Loop Area Institute, the Shenzhen campus of Harbin Institute of Technology, and the Shenzhen Research Institute of Big Data.

The Ascend 910C is Huawei's current flagship AI accelerator, a dual-die part that returned roughly 60% of an Nvidia H100's inference performance in earlier DeepSeek testing. Chinese chips have been competitive at inference, where a finished model answers prompts, but weak at training, where a model's weights are recalculated across large datasets. The team says it ran full-parameter post-training, meaning every weight was updated rather than a thin adapter layer added on top.
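
To put the difference between full-parameter tuning and an adapter in rough numbers, here is a minimal Python sketch. It is an illustration under stated assumptions, not the team's method: the layer dimensions and adapter rank are hypothetical, and the 16-bytes-per-parameter figure is a common rule of thumb for mixed-precision training with an Adam-style optimizer (2 bytes of weights, 2 of gradients, 12 of FP32 optimizer state). The report does not say which optimizer, precision, or sharding scheme was used, and the estimate ignores activations entirely.

```python
def full_trainable_params(d_in: int, d_out: int) -> int:
    """Full-parameter tuning: every entry of the d_out x d_in matrix is updated."""
    return d_in * d_out


def adapter_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Low-rank adapter: the base matrix stays frozen and only two small
    factors (d_out x rank and rank x d_in) are trained."""
    return rank * (d_in + d_out)


def training_state_bytes(n_params: int, bytes_per_param: int = 16) -> int:
    """Rough memory for weights, gradients, and optimizer state.

    ASSUMPTION: 16 bytes per parameter (mixed precision plus Adam-style
    optimizer). Activations and communication buffers are not counted.
    """
    return n_params * bytes_per_param


if __name__ == "__main__":
    # Hypothetical layer size and adapter rank, chosen only for illustration.
    d_in = d_out = 8192
    rank = 16

    full = full_trainable_params(d_in, d_out)
    adapter = adapter_trainable_params(d_in, d_out, rank)
    print(f"full-parameter: {full:,} trainable weights in this layer")
    print(f"rank-{rank} adapter: {adapter:,} trainable weights in this layer")
    print(f"ratio: {full // adapter}x")

    # Figures from the article: 1.6 trillion parameters, 1,000 chips.
    # ASSUMPTION: state is sharded perfectly evenly across all chips.
    total = training_state_bytes(1_600_000_000_000)
    print(f"total training state: {total / 1e12:.1f} TB")
    print(f"per chip across 1,000 chips: {total / 1000 / 1e9:.1f} GB")
```

Under those assumptions, updating every weight means holding tens of terabytes of training state across the cluster, while an adapter of this size would train a fraction of a percent of the same layer. That gap is why a full-parameter run is the more demanding test of the hardware.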

Post-training is essentially the “tuning” stage that follows the much larger pre-training phase. Pre-training builds a model's core capabilities by working through enormous text corpora, and DeepSeek's documentation puts V4-Pro's pre-training corpus at more than 32 trillion tokens.


Post-training then shapes behavior through instruction-following, safety alignment, and task-specific data. Completing it on Ascend silicon is a genuine result for the platform, but it doesn’t demonstrate that the chips can pre-train a frontier model from scratch, which is the heavier and costlier job.

Back in August, reports indicated that DeepSeek couldn’t complete a single successful training run for its R2 model on Ascend chips, even with Huawei engineers on site. The failures were attributed to unstable performance, slow chip-to-chip interconnects, and gaps in Huawei's CANN software stack, its substitute for Nvidia's CUDA. The company fell back on Nvidia GPUs for training and left Ascend on inference. DeepSeek-V4-Pro, released in April, was the first DeepSeek model built around Ascend from the outset.

As for the claim coming out of Shenzhen, it carries no benchmarks and gives no figures for how long the run took, how it compared to the same job on Nvidia hardware, or how efficiently the 1,000-chip cluster was used. It’s ultimately just another addition to a series of dubious claims that have come from the Chinese state without anything to back them up. DeepSeek itself hasn’t commented.



Luke James is a freelance writer and journalist. Although his background is in legal, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.
