Recently, a bevy of lawsuits demanding compensation from AI companies has been filed in the U.S. and Europe. The plaintiffs include authors, artists and major media organizations who have consistently expressed concern about AI stealing their work and producing mediocre derivatives.
An open letter from the Authors Guild -- signed by more than 8,500 authors, including Margaret Atwood, Dan Brown and Jodi Picoult -- urges tech companies responsible for generative AI applications, such as ChatGPT and Bard, to cease using their works without proper authorization or compensation. The authors want companies to pay for the data they scraped for training -- the "food" for AI systems, endless meals for which there has been no bill.
Authors also express concern that generative AI threatens their profession by flooding the market with machine-written content based on their work. This was a problem in recent months as Amazon took action against AI authors spamming the bestseller list with generated works.
Prior to the release of the Authors Guild letter, two North American authors -- Mona Awad and Paul Tremblay -- filed a lawsuit against OpenAI, claiming the organization breached copyright law. The suit argued that because ChatGPT generated accurate summaries of the authors' works, it must have been trained on those works. They aren't the only ones. Author and comedian Sarah Silverman is also suing OpenAI and Meta for reproducing her memoir, The Bedwetter, without permission. But the argument that accurate summaries prove a work was used in training may not hold up in court because of the way generative AI works.
Individual authors and artists aren’t the only plaintiffs. In December 2023, The New York Times became the first major American news publication to sue OpenAI for using copyrighted works in AI development.
Generative AI is the technology that powers ChatGPT and Bard. Text-based generative AI uses algorithms to predict the likely next words in text and generates that text based on a prompt from the user. ChatGPT knows what to generate because it was trained on a large corpus of publicly available data from the internet. It learned patterns from the training and matches those patterns to prompts from the user.
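The idea of predicting the likely next word can be illustrated with a deliberately tiny sketch. It is not how ChatGPT or Bard is built: production systems use large neural networks over tokens, while this toy simply counts which word follows which in a made-up corpus. The corpus, function names and parameters below are all hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(model, prompt, max_new_words=5):
    """Extend the prompt one predicted word at a time."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        next_word = predict_next_word(model, words[-1])
        if next_word is None:
            break  # no pattern learned for this word, so stop generating
        words.append(next_word)
    return " ".join(words)


# Hypothetical toy corpus, invented for this illustration only.
TOY_CORPUS = [
    "the court dismissed the case",
    "the court heard the appeal",
    "the court dismissed the claim",
]

model = train_bigram_model(TOY_CORPUS)
print(predict_next_word(model, "court"))
print(generate(model, "the court", max_new_words=2))
```

Even at this scale, the output depends entirely on patterns in the training text, which is why the contents of the training data matter so much in the disputes described here.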
Generative AIs are usually black box AI systems, meaning nobody -- not even the programmers -- understands the exact steps the machine takes to go from input to output. Input goes in, the magic happens and output comes out.
All machine learning and generative AI tools use preexisting works of some kind.
People are suing AI companies over copyright. ChatGPT was trained on data from the internet, and that training happened without permission from the data creators. For example, GPT-3 was trained on Wikipedia and Reddit, among other sources. However, a model's ability to summarize a copyrighted work does not by itself prove it was trained on that work: conversations about and segments of copyrighted works could exist in the training material and give large language models enough context to accurately summarize them.
On a larger scale, people are suing because AI is a black box, and it's impossible to know how it works on a granular level. The fear is that people will use AI to avoid taking responsibility for their decisions or the things it produces.
"If AI companies are allowed to market AI systems that are essentially black boxes, they could become the ultimate ends-justify-the-means devices," Matthew Butterick, one of the lawyers behind several of the lawsuits, wrote in his blog. "Before too long, we will not delegate decisions to AI systems because they perform better. Rather, we will delegate decisions to AI systems because they can get away with everything that we can't."
Numerous cases have been brought against generative AI companies regarding copyright and misuse. Here are some of the companies being sued.
A class-action suit involving GitHub's Copilot tool was filed against Microsoft, GitHub and OpenAI. The tool predictively generates code based on what the programmer has already written. The plaintiffs allege that Copilot copies and republishes code from GitHub without abiding by the requirements of the open source licenses that govern that code, such as providing attribution. The complaint also includes claims related to GitHub's mishandling of personal data and information, as well as claims of fraud. The complaint was filed in November 2022. Microsoft and GitHub have repeatedly tried to get the case dismissed.
A complaint against AI image generator providers Stability AI, Midjourney and DeviantArt was filed in January 2023. The plaintiffs alleged the systems directly infringe on their copyrights by training on works they created and by producing unauthorized derivative works. The complaint also takes issue with the fact that the tools can be used to generate work in the style of specific artists. The judge on the case, William Orrick, said he was inclined to dismiss the lawsuit.
In January 2023, Getty Images filed a complaint against Stability AI for allegedly copying and processing millions of images.
Authors Mona Awad and Paul Tremblay are suing OpenAI for allegedly infringing on authors' copyrights. Butterick is one of the attorneys representing the authors. The complaint estimated that more than 300,000 books were copied into OpenAI's training data. The suit seeks an unspecified amount of money. The case was filed in June 2023.
The New York Times is suing OpenAI and Microsoft for copyright infringement. The case, filed in December 2023, alleges that millions of New York Times articles were used to train and develop OpenAI’s chatbot and other technology, which now competes with the news organization as a source of reliable information. The case also alleges that OpenAI’s language models mimic the Times’ style and recite its content verbatim. The Times is the first major American news outlet to sue OpenAI and Microsoft for copyright infringement. The Times approached the companies earlier in the year to discuss the copyright issue but never reached an agreement.
Eight other newspapers filed a lawsuit against OpenAI and Microsoft on April 30, 2024, alleging they've purloined millions of copyrighted news articles to train their AI. Newspapers included in the suit are The New York Daily News, Chicago Tribune, Denver Post, Mercury News, Orange County Register, St. Paul Pioneer-Press, Orlando Sentinel and South Florida Sun Sentinel.
Sarah Silverman's lawsuit against Meta and OpenAI alleged copyright infringement and said ChatGPT and Large Language Model Meta AI (Llama) were trained on illegally acquired data sets that contained her work. The suit alleges the books were acquired from shadow libraries, such as Library Genesis, Z-Library and Bibliotik, where the books can be torrented. Torrenting is a common method of downloading files without proper legal permission. Specifically, Meta's language model, Llama, was trained on a data set called the Pile, which uses data from Bibliotik, according to a paper from EleutherAI, the organization that assembled the Pile. The suit was filed in July 2023.
A class-action lawsuit is being brought against Google for alleged misuse of personal information and copyright infringement. Some of the data specified in the lawsuit includes photos from dating websites, Spotify playlists, TikTok videos and books used to train Bard. The lawsuit, filed in July 2023, said Google could owe at least $5 billion. The plaintiffs have elected to remain anonymous.
These copyright cases against big tech companies aren't the first of their kind. The Authors Guild sued Google in 2005 for making digital copies of millions of books and providing snippets of them to the public. In 2015, a federal appeals court ultimately favored Google, saying the use was transformative and did not provide a market substitute for the books.
Sony Music Entertainment, Universal Music Group and Warner Records filed lawsuits against AI song-generator start-ups Suno and Udio in June 2024 for alleged copyright infringement. One lawsuit describes how Suno-generated songs sound very similar to Chuck Berry’s “Johnny B. Goode,” using prompts such as “1950s rock and roll,” “12-bar blues” and “energetic male vocalist.” The Udio lawsuit alleges something similar, saying many outputs sounded like Mariah Carey’s “All I Want for Christmas is You.” The record labels are seeking up to $150,000 for each work that was copied without permission.
The above lawsuits will be important in answering the following questions:
As the cases continue to take shape and answers emerge, companies involved with generative AI tools should watch for guidance around the intersection of AI and intellectual property and check to see if they need risk mitigation strategies.