Reddit sues Anthropic for scraping its users' content without consent

5 months ago 54

The list of lawsuits against AI companies is growing: Reddit has joined in with a suit against Anthropic.

On Wednesday, the company filed a complaint in California stating that Anthropic -- developer of Claude -- ignores Robots Exclusion Protocol (REP), or robots.txt, which blocks AI crawlers from scraping a site's content. Research indicates that other AI companies are also engaging in this practice: In March, Columbia's Tow Center found that multiple chatbots, including Perplexity, could still retrieve articles from publishers that had blocked their crawlers using REP.

Also: Anthropic's popular Claude Code AI tool now included in its $20/month Pro plan

The complaint states that "Anthropic is in fact intentionally trained on the personal data of Reddit users without ever requesting their consent," which is a violation of Reddit's user privacy agreement. In July 2024, when Reddit publicly criticized Anthropic for misusing its content, the complaint continues, "Anthropic's bots continued to hit Reddit's servers over 100,000 times" despite insisting that it had stopped its bots from crawling the site.

The lawsuit is the latest in the ongoing clash between sites that create and host content -- including publishers, news organizations, and user forums like Reddit -- and the AI companies that scrape that content to use as training data. In late 2023, The New York Times became the first publisher to sue OpenAI and Microsoft for using its content to train its models without permission or payment. In April, Ziff Davis, the parent company of this publication, sued OpenAI for copyright violation, citing similar instances of the AI company crawling Ziff Davis sites despite being blocked. Authors and creatives have also sued OpenAI and Meta on similar grounds.

What sets Reddit apart here is that it is also a tech company, unlike the publishers behind the lawsuits that predate this one. Reddit has licensing agreements with OpenAI and Google.

Also: Reddit's new Google-powered AI search tool makes finding answers faster than ever

Other publishers, including Dotdash Meredith, Financial Times, and the AP, have taken a different approach, proactively entering into licensing agreements with AI companies that allow them to access some or all of their content in exchange for internal AI tools and preferential citation placements in chatbot responses. However, research shows that chatbots still struggle to accurately cite and favor stories from publishers, meaning it is still unclear whether those benefits are being realized.

Want more stories about AI? Sign up for Innovation, our weekly newsletter.

Editorial standards

Read Entire Article