Lawyer's Column: ▲ Attorney Su on AI Copyright Litigation

Is using copyrighted works to train generative AI models fair use?
Opinions are divided; this article takes the affirmative view.
Amid the growing wave of AI-related litigation, the New York Times has filed a copyright infringement lawsuit against Microsoft and OpenAI. Among other allegations, the Times claims that Microsoft and OpenAI infringe its copyright when they train their large language models (LLMs) on material copyrighted by the Times.

OpenAI has responded that “training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents.” In a blog post about the case, OpenAI cites the position of the Library Copyright Alliance (LCA) that “based on well-established precedent, the ingestion of copyrighted works to create large language models or other AI training databases generally is a fair use.” LCA explained this position in our submission to the US Copyright Office's notice of inquiry on copyright and AI, and in the LCA Principles for Copyright and AI.

LCA is not involved in any of the AI lawsuits. But as champions of fair use, free speech, and freedom of information, libraries have a stake in maintaining the balance of copyright law so that it is not used to block or restrict access to information. We drafted the principles on AI and copyright in response to efforts to amend copyright law to require licensing schemes for generative AI that could stunt the development of this technology and undermine its utility to researchers, students, creators, and the public. The LCA principles hold that copyright law, as applied and interpreted by the Copyright Office and the courts, is flexible and robust enough to address issues of copyright and AI without amendment. The LCA principles also draw a careful and critical distinction between the input used to train an LLM and the output. The output could potentially infringe if it is substantially similar to an original expressive work.

On the question of whether ingesting copyrighted works to train LLMs is fair use, LCA points to the history of courts applying the US Copyright Act to AI. For instance, under the precedent established in Authors Guild v. HathiTrust and applied in Authors Guild v. Google, the US Court of Appeals for the Second Circuit held that mass digitization of a large volume of in-copyright books in order to distill and reveal new information about the books was a fair use. While these cases did not concern generative AI, they did involve machine learning. The courts now hearing the pending challenges to ingestion for training generative AI models are fully capable of applying these precedents to the cases before them.

Why are scholars and librarians so invested in protecting the precedent that training LLMs on copyright-protected works is a transformative fair use? Rachael G. Samberg, Timothy Vollmer, and Samantha Teremi (of UC Berkeley Library) recently wrote that continuing to treat the training of AI models as fair use is “essential to protecting research,” including non-generative, nonprofit educational research methodologies like text and data mining (TDM). If fair use rights were overridden and licenses restricted researchers to training AI on public domain works, scholars would be limited in the scope of inquiries they could pursue with AI tools. Works in the public domain are not representative of the full scope of culture, and training AI only on public domain works would leave studies of contemporary history, culture, and society out of the scholarly record, as Authors Alliance and LCA described in a recent petition to the US Copyright Office. Hampering researchers’ ability to interrogate modern in-copyright materials through a licensing regime would make research less relevant and useful to the concerns of the day.

As the lawsuits illustrate, the availability of generative AI trained on datasets that include copyrighted material has raised questions about the intersection of copyright law and AI. But as discussed above, many of these questions have already been litigated. Nick Garcia, policy counsel at Public Knowledge, pointed out during a recent Chamber of Progress panel on AI, art, and copyright that concerns about web crawling to collect data (a practice the Times takes issue with in its lawsuit) have been around for decades, and that courts have found web crawling to be a fair use.

New York Times v. Microsoft et al. is, of course, just one legal battle through which the courts will interpret copyright law in the US, and it may be years before these cases are settled. Copyright law as it applies to AI will also be informed by the US Copyright Office Study, which will culminate in a report this year. LCA will monitor these lawsuits and pursue opportunities to advance the interests of scholars, educators, students, and the public via selected amicus briefs and discussions of the issues and the range of library concerns with legislators and regulators.

Attorney 蘇思鴻
