Meta claimed in a court filing this week that, despite torrenting an 82 TB dataset of pirated, copyrighted material from shadow libraries to train its LLaMA AI models, its employees "took precautions not to 'seed' any downloaded files".
In torrenting terminology, seeding refers to sharing a file with other users during (or, more commonly, after) downloading it. Since torrenting is a peer-to-peer system, every user downloading a file can also upload parts of it to other users.
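The peer-to-peer behavior described above can be illustrated with a toy model. This is a minimal sketch and not a real BitTorrent client: the `Peer` class, its `allow_upload` flag, and the four-piece file are all hypothetical names invented for illustration. It shows only that a peer holding part of a file can already hand pieces to others before its own download is complete.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical toy peer; not modeled on any real client's API."""
    name: str
    pieces: set = field(default_factory=set)  # indices of pieces this peer holds
    uploaded: int = 0                         # number of pieces sent to others
    allow_upload: bool = True                 # assumed switch: False means never share

    def request_piece(self, index, source):
        """Fetch one piece from `source` if it holds it and is willing to upload."""
        if index in source.pieces and source.allow_upload:
            self.pieces.add(index)
            source.uploaded += 1
            return True
        return False


TOTAL_PIECES = 4  # assumed size of the toy file

seeder = Peer("seeder", pieces=set(range(TOTAL_PIECES)))
leecher_a = Peer("leecher_a")
leecher_b = Peer("leecher_b")

# leecher_a downloads only the first two pieces from the seeder.
for i in (0, 1):
    leecher_a.request_piece(i, seeder)

# leecher_b then gets piece 0 from leecher_a, whose download is still incomplete.
leecher_b.request_piece(0, leecher_a)

print(leecher_a.uploaded)                    # 1
print(len(leecher_a.pieces) < TOTAL_PIECES)  # True
```

In this sketch, a peer created with `allow_upload=False` never sends anything. Real clients expose their own upload settings, which this toy flag does not attempt to reproduce.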
Meta's lawyers claim that there are "no facts to show that Meta seeded Plaintiffs' books". This means that the company's defense is pinning its hopes on the fact that there is currently no proof that Meta shared the material during the torrenting process.
Though Meta claims that there is no evidence of seeding, Michael Clark, an executive at Meta in charge of project management, testified that the configuration settings being used were modified "so that the smallest amount of seeding possible could occur".
When Clark was then asked why Meta chose to minimize seeding, attorney-client privilege was invoked so that he could not answer.
Interestingly, Clark's statement shows that Meta sought methods to minimize seeding, but the company has yet to offer any indication that it entirely prevented the seeding of copyrighted material.
Additionally, an internal message from Frank Zhang, a Meta researcher, could point toward alleged concealment of potential seeding from Meta's servers in order to avoid "risk of tracing back the seeder/downloader" to Facebook servers.
Meta's defense seems to hinge on the lack of evidence that it shared the large amount of data it allegedly downloaded to train its AI models. Should Meta win on this defense and establish that downloading copyrighted content isn't illegal but distributing it is, the ruling could shake up future cases of piracy and unauthorized distribution of copyrighted content.
The defense's reliance on torrenting terminology could also be a way for Meta to try to trip up the courts. Focusing on seeding could further muddy the claim that Meta knew it was violating laws by torrenting copyrighted material.
Meta has yet to respond to questions about whether it knew that it was sharing data during the download process.
Authors of the copyrighted material, which Meta allegedly obtained without prior licensing agreements, have pointed to "Meta's decision to bypass lawful acquisition methods and become a knowing participant in an illegal peer-to-peer piracy network".
With the court battle expected to continue, no final decision in the case has been made. Even after a ruling, Meta is expected to appeal if it loses, meaning that a final judgment could be a long while away.
But similar cases do exist. OpenAI was sued by novelists in 2023, and the New York Times has also sued OpenAI and Microsoft over "millions" of copied news articles. As the long list of LLM-related litigation continues to grow, this is unlikely to be the last we hear of Meta's specific case.