How DeepSeek's new way to train advanced AI models could disrupt everything - again




ZDNET's key takeaways

  • DeepSeek debuted Manifold-Constrained Hyper-Connections, or mHCs.
  • They offer a way to scale LLMs without incurring huge costs.
  • The company postponed the release of its R2 model in mid-2025.

Just before the start of the new year, the AI world was introduced to a potentially game-changing new method for training advanced models.

A team of researchers from Chinese AI firm DeepSeek released a paper on Wednesday outlining what it called Manifold-Constrained Hyper-Connections, or mHC for short, which may provide a pathway for engineers to build and scale large language models without the huge computational costs that are typically required.

Also: Is DeepSeek's new model the latest blow to proprietary AI?

DeepSeek leapt into the cultural spotlight one year ago with its release of R1, a model that rivaled the capabilities of OpenAI's o1 and that was reportedly trained at a fraction of the cost. The release came as a shock to US-based tech developers, because it showed that access to huge reserves of capital and computing resources wasn't necessarily required to train cutting-edge AI models. 

The new mHC paper could turn out to be the technological framework for DeepSeek's forthcoming model, R2, which was expected in the middle of last year but was postponed, reportedly due to China's limited access to advanced AI chips and to concerns from the company's CEO, Liang Wenfeng, about the model's performance.

The challenge

Posted to arXiv, a popular preprint server where researchers share results that have yet to be peer-reviewed, DeepSeek's new paper attempts to bridge a complex and important technical gap hindering the scalability of AI models.

Also: Mistral's latest open-source release bets on smaller models over large ones - here's why

LLMs are built on deep neural networks, which have to carry signals across many stacked layers. The problem is that as more layers are added, the signal can become attenuated or degraded, and the risk grows that it turns into noise. It's a bit like playing a game of telephone: the more people join the chain, the higher the chances that the original message gets confused and altered.
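To make the attenuation concrete, here is a toy numerical sketch. The layer sizes and scaling factor are illustrative assumptions of ours, not anything from DeepSeek's paper; the point is simply that a stack of layers that each slightly damp their input will collapse the signal as depth grows.

```python
# Toy sketch (illustrative assumptions, not DeepSeek's method): a signal
# passed through many layers that each damp it slightly fades away.
import numpy as np

rng = np.random.default_rng(0)
dim, num_layers = 512, 64

x = rng.standard_normal(dim)
x /= np.linalg.norm(x)  # start with a unit-norm "signal"

for _ in range(num_layers):
    # Each layer is a random linear map scaled slightly below 1, a
    # stand-in for a transformation that mildly damps its input.
    W = rng.standard_normal((dim, dim)) * (0.95 / np.sqrt(dim))
    x = W @ x

print(f"signal norm after {num_layers} layers: {np.linalg.norm(x):.2e}")
# Roughly 0.95**64 ~ 0.04 of the original strength -- the deep-network
# version of a message degrading in a long game of telephone.
```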

The core challenge, then, is to build models that can conserve their signals across as many layers as possible -- or to "better optimize the trade-off between plasticity and stability," as the DeepSeek researchers describe it in their new paper.

The solution

The authors of the new paper -- who include DeepSeek CEO Liang Wenfeng -- built on hyper-connections, or HCs, a framework introduced in 2024 by researchers at ByteDance that multiplies the channels through which the layers of a neural network can share information with one another. HCs introduce a risk, however: the original signal can get lost in translation. (Again, think of more and more people joining a game of telephone.) They also carry high memory costs, making them difficult to implement at scale.
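For the technically curious, here is a minimal sketch of the hyper-connections idea in PyTorch. The class name, shapes, and mixing scheme are simplified assumptions for illustration, not ByteDance's exact formulation, but they capture the two key points: several residual streams run in parallel and are mixed by small learnable matrices at every layer, and activation memory grows roughly with the number of streams.

```python
# Simplified hyper-connections sketch (illustrative, not the exact
# ByteDance formulation): n parallel residual streams, mixed by
# learnable weights around each layer.
import torch
import torch.nn as nn

class HyperConnectionBlock(nn.Module):
    def __init__(self, dim: int, n_streams: int = 4):
        super().__init__()
        self.layer = nn.Linear(dim, dim)  # stand-in for an attention or MLP block
        # Learnable connection weights: how streams are read into the layer,
        # how streams mix with one another, and how the output is written back.
        self.read = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
        self.mix = nn.Parameter(torch.eye(n_streams))
        self.write = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, dim) -- carrying n residual streams
        # instead of one is exactly why activation memory grows ~n-fold.
        layer_in = torch.einsum("s,sbd->bd", self.read, streams)
        layer_out = self.layer(layer_in)
        mixed = torch.einsum("st,tbd->sbd", self.mix, streams)
        return mixed + self.write[:, None, None] * layer_out

x = torch.randn(2, 64)                                   # (batch, dim)
streams = x.unsqueeze(0).expand(4, -1, -1).contiguous()  # replicate into 4 streams
out = HyperConnectionBlock(dim=64, n_streams=4)(streams)
print(out.shape)                                         # torch.Size([4, 2, 64])
```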

Also: DeepSeek may be about to shake up the AI world again - what we know

The mHC architecture aims to solve this by constraining the hyper-connections within a model, preserving the informational richness that HCs enable while sidestepping the memory problem. That, in turn, could make training highly complex models practical and scalable even for smaller, more cash-strapped developers.
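What "manifold-constrained" might look like in practice: one illustrative possibility -- our assumption for the sake of a sketch, not necessarily the construction in DeepSeek's paper -- is to project each learnable mixing matrix onto the set of doubly stochastic matrices (rows and columns summing to 1). Such a matrix is a convex combination of permutations, so its spectral norm never exceeds 1 and the mixing step cannot amplify the streams.

```python
# Hypothetical sketch of a manifold constraint on the mixing matrix
# (our illustrative assumption, not necessarily DeepSeek's construction):
# a few Sinkhorn normalization steps push an unconstrained matrix toward
# a doubly stochastic one.
import torch

def sinkhorn_project(logits: torch.Tensor, n_iters: int = 10) -> torch.Tensor:
    """Map an unconstrained square matrix of logits to a (near) doubly
    stochastic matrix by alternating row and column normalization."""
    M = logits.exp()  # make all entries positive
    for _ in range(n_iters):
        M = M / M.sum(dim=1, keepdim=True)  # normalize rows
        M = M / M.sum(dim=0, keepdim=True)  # normalize columns
    return M

mix_logits = torch.randn(4, 4)           # unconstrained parameters
mix = sinkhorn_project(mix_logits)
streams = torch.randn(4, 2, 64)          # (n_streams, batch, dim)
mixed = torch.einsum("st,tbd->sbd", mix, streams)

# The constrained mix keeps the update well-behaved: no stream blows up
# purely because of the mixing step.
print(mix.sum(dim=1), mix.sum(dim=0))    # both ~1 after projection
```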

Why it matters

Just as with the January 2025 release of R1, the debut of the mHC framework could hint at a new direction for the evolution of AI.

Thus far in the AI race, the prevailing wisdom has mostly been that only the biggest, most deep-pocketed companies can afford to build frontier models. But DeepSeek has continually shown that workarounds are possible, and that breakthroughs can be achieved solely through clever engineering. 

Because the company has published its mHC research openly, the method could become widely embraced by smaller developers, particularly if it ends up powering the much-anticipated R2 model (whose release date has not been officially announced).
