AI guzzled millions of books without permission. Authors are fighting back.

David Baldacci, the author of best-selling legal thrillers, watched his son ask ChatGPT to craft a plot in the style of a David Baldacci novel. Within five seconds, he told U.S. senators at a hearing this week on artificial intelligence and copyright, the chatbot spat out a pastiche of characters, settings and plot twists that were uncannily familiar.

“It truly felt like someone had backed up a truck to my imagination and stolen everything I’d ever created,” he said.

Baldacci is among a group of authors suing OpenAI and Microsoft over the companies’ use of their work to train the AI software behind tools such as ChatGPT and Copilot without permission or payment — one of more than 40 lawsuits against AI companies advancing through the nation’s courts. He and other authors this week appealed to Congress for help standing up to what they see as an assault by Big Tech on their profession and the soul of literature.

They found sympathetic ears at a Senate subcommittee hearing Wednesday, where lawmakers expressed outrage at the technology industry’s practices. Their cause gained further momentum Thursday when a federal judge granted class-action status to another group of authors who allege that the AI firm Anthropic pirated their books.

“I see it as one of the moral issues of our time with respect to technology,” Ralph Eubanks, an author and University of Mississippi professor who is president of the Authors Guild, said in a phone interview. “Sometimes it keeps me up at night.”

Lawsuits have revealed that some AI companies had used legally dubious “torrent” sites to download millions of digitized books without having to pay for them.

Book authors and publishers are among the many groups of creative professionals and copyright holders suing tech companies and pushing for laws to prevent use of published works for AI projects without permission. Artists, musicians, newspapers, photographers and bloggers have also filed claims.

Tech industry leaders claim that the practice is allowed under copyright law as “fair use” and crucial to their attempts to build AI that’s smarter than any human. Some have said that if they aren’t allowed to keep using copyrighted content, the United States will lose ground in its AI race with China.

Eubanks and several other authors were in the audience Wednesday as Baldacci testified at a hearing convened by Sen. Josh Hawley (R-Missouri), chair of the Senate Judiciary subcommittee on crime and counterterrorism.

“Today’s hearing is about the largest intellectual property theft in American history,” Hawley said.

It was the first congressional hearing to focus on the plight of authors and followed recent federal court rulings in cases brought by authors and publishers against Meta and Anthropic.

The two companies have not disputed using online repositories to download pirated books. But the firms argued they were within their rights to use the material internally to create cutting-edge “large language models” such as Meta’s Llama or Anthropic’s Claude.

One key question for the courts is whether those AI tools compete with the books used to make them. The judges in the Meta and Anthropic cases last month broadly accepted the companies’ argument that training their models on copyrighted material could qualify as “fair use.”

That’s an encouraging sign for the AI industry, said James Grimmelmann, a professor of digital and information law at Cornell University, and a blow to creators and publishers hoping companies will be forced to pay to use their works.

But portions of the Anthropic case were allowed to proceed, with U.S. District Judge William Alsup finding the company might have violated copyright law in the process of obtaining the books, even if the training itself was fair use. On Thursday, Alsup also granted the suit class-action status, meaning every author whose books were part of the allegedly pirated dataset could be eligible to collect damages from the company if it is found liable.

Anthropic spokeswoman Jennifer Martinez said at the time that the company appreciated the court’s ruling on fair use. She added that the company trained its models on works not to replicate them but to “turn a hard corner and create something different.” Martinez said Friday that Anthropic disagrees with the decision to grant the rest of the lawsuit class-action status and is “exploring all avenues for review.”

In the Meta case, U.S. District Judge Vince Chhabria dismissed most of the authors’ arguments, ruling that they had failed to show they were harmed by Meta’s use of their works.

Meta spokesman Christopher Sgro said at the time that the company appreciated the decision. “Fair use of copyright material is a vital legal framework for building this transformative technology,” he said, claiming that AI such as Meta’s powered transformative innovations for individuals and companies.

But in his ruling, Chhabria also laid out what Grimmelmann called a “road map” that plaintiffs could use in future cases to demonstrate such harms. He suggested that authors and other creators could claim that AI tools and chatbots will undermine sales of their original work by flooding the market with cheap imitations.

That argument has not been tested in copyright cases, Grimmelmann said. He predicts it will likely take years — and the Supreme Court — to settle how copyright law applies to AI.

That’s one reason some activists are hoping lawmakers will step in. At Wednesday’s hearing, Hawley, an outspoken critic of Meta and its CEO Mark Zuckerberg, said he found it galling that a court considered the company’s unauthorized use of books to train its AI models to be fair use.

If a huge and valuable company “can come take an individual author’s work like Mr. Baldacci, lie about it, hide it, profit off of it, and there’s nothing our law does about that,” Hawley said, “we need to change the law.”

Sen. Peter Welch (D-Vermont) touted a bill he co-wrote with Sen. Marsha Blackburn (R-Tennessee) called the Train Act that would allow creators and copyright holders to use the courts to find out whether a company has used their work to create its AI models. Proving that an AI tool was trained on any given work can be difficult because the datasets used are so vast and the training processes opaque.

Baldacci told Welch that in his case, OpenAI and Microsoft had acknowledged using 44 of his books without permission.

“That’s astonishing,” Welch said. “We just can’t allow that. That’s really wrong.” The Washington Post has a content partnership with OpenAI.

Edward Lee, a law professor at Santa Clara University, offered a more industry-friendly perspective at the hearing. He said the judges in the Meta and Anthropic cases rightly recognized that the use of books to train AI models is genuinely “transformative” — a key test of fair use.

Lee cautioned lawmakers against legislating before courts have had their say, adding that the U.S. has a vital interest in the success of its AI industry.

Sen. Dick Durbin (D-Illinois) said he wanted to find a balance between promoting innovation and protecting and encouraging artists and creatives. “How can creators compete with AI companies that generate content at the push of a button, particularly when the content might mimic or even reproduce their own work?” he asked.

Speaking after the hearing, Eubanks said his experience teaching college students who use tools such as ChatGPT makes him worry that AI will erode not only the market for books but the craft of writing. He increasingly sees signs of students using AI tools to help with essay writing, undermining his intention for the assignments to stimulate people to develop their own opinions.

While lawmakers from both parties lent sympathetic ears on Wednesday, it seems unlikely new laws will come soon. Many committee members missed the hearing because of other drama on Capitol Hill that day, suggesting AI and copyright might not be at the top of Congress’s agenda.

蘇思鴻, Attorney
