OpenAI Claims DeepSeek Took All of its Data Without Consent

22 hours ago

A black and white geometric knot logo is on the left, and a stylized blue and white whale logo is on the right. Both logos are set against a plain white background.

OpenAI says that the viral Chinese AI app DeepSeek may have siphoned off massive amounts of data to build its models.

The Sam Altman-led company, which itself harvested large amounts of data from copyright holders without permission, says it has “some evidence” that DeepSeek used the output of OpenAI’s models to train its own in a method known as “distillation.”

Bloomberg reports that security researchers at Microsoft, which has a 49 percent stake in OpenAI, noticed last fall that actors linked to DeepSeek had exfiltrated a large amount of data using OpenAI’s application programming interface (API).

“We know that groups in the People’s Republic of China are actively working to use methods, including what’s known as distillation, to replicate advanced U.S. A.I. models,” OpenAI spokeswoman Liz Bourgeois tells The New York Times.

“We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models, and will share information as we know more. We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the U.S. government to protect the most capable models being built here.”

DeepSeek sent shock waves through the tech world this week after its chatbot surged to the top of the App Store charts. Its makers claim that it was built on a relatively small budget of $6 million. The U.S. restricts sales of advanced Nvidia chips to China, and it was assumed that, because of this, start-ups in China were years behind their U.S. counterparts. However, the release of DeepSeek R1 challenges those assumptions.

Ironic

The news that DeepSeek may have stolen OpenAI’s IP is music to the ears of some who believe that OpenAI is reaping what it has sown.

“It is, as many have already pointed out, incredibly ironic that OpenAI, a company that has been obtaining large amounts of data from all of humankind largely in an ‘unauthorized manner,’ and, in some cases, in violation of the terms of service of those from whom they have been taking from, is now complaining about the very practices by which it has built its company,” writes 404 Media.

OpenAI: THEY STOLE FROM US
Everyone: pic.twitter.com/1h5GR53Qay

— Sabine Hossenfelder (@skdh) January 30, 2025

OpenAI, which operates ChatGPT and the AI image generator DALL-E, has said that it is impossible to train models without using copyrighted content. The company is embroiled in several copyright lawsuits.
