Last week, tech companies notched several victories in the fight over their use of copyrighted text to create artificial intelligence products.
Anthropic: A US judge has ruled that the use of books by Anthropic, maker of the Claude chatbot, to train its artificial intelligence system – without permission of the authors – did not breach copyright law. Judge William Alsup compared the Anthropic model’s use of books to a “reader aspiring to be a writer.”
And the next day, Meta: The US district judge Vince Chhabria, in San Francisco, said in his decision on the Meta case that the authors had not presented enough evidence that the technology company’s AI would cause “market dilution” by flooding the market with work similar to theirs.
The same day that Meta received its favorable ruling, a group of writers sued Microsoft, alleging copyright infringement in the creation of that company’s Megatron text generator. Judging by the rulings in favor of Meta and Anthropic, the authors are facing an uphill battle.
These three cases are skirmishes in the wider legal war over copyrighted media, which rages on. Three weeks ago, Disney and NBCUniversal sued Midjourney, alleging that the company’s namesake AI image generator and forthcoming video generator made illegal use of the studios’ iconic characters like Darth Vader and the Simpson family. The world’s biggest record labels – Sony, Universal and Warner – have sued two companies that make AI-powered music generators, Suno and Udio. On the textual front, the New York Times’ suit against OpenAI and Microsoft is ongoing.
The lawsuits over AI-generated text were filed first, and, as their rulings emerge, the next question in the copyright fight is whether decisions about one type of media will apply to the next.
“The specific media involved in the lawsuit – written works versus images versus videos versus audio – will certainly change the fair-use analysis in each case,” said John Strand, a trademark and copyright attorney with the law firm Wolf Greenfield. “The impact on the market for the copyrighted works is becoming a key factor in the fair-use analysis, and the market for books is different than that for movies.”
To Strand, the cases over images seem more favorable to copyright holders, as the AI models are allegedly producing images identical to the copyrighted ones in the training data.
A bizarre and damning fact was revealed in the Anthropic ruling, too: the company had pirated and stored some 7m books to create a training database for its AI. To remediate its wrongdoing, the company bought physical copies, diced them up and scanned them to digitize the text, Ars Technica reports. Left with 7m physical books that no longer held any utility for it, Anthropic threw them away. There are less destructive ways to digitize books, but they are slower. The AI industry is here to move fast and break things.
Anthropic laying waste to millions of books presents a crude literalization of the ravenous consumption of content necessary for AI companies to create their products.