The 27 Dec 2023 NYT v. OpenAI et al. lawsuit alleges that liability for an AI’s output of copyrighted training material falls on the AI developers or AI providers. Notably, the most damning Exhibits attached to the complaint omit most of the user prompts that The NY Times used to generate the allegedly infringing output, output which appears at best to be extremely difficult to recreate innocently.
Ira P. Rothken is a renowned legal advisor and litigator, experienced in arguing for novel doctrines in advanced technology-related areas of law where precedent is hard to identify. Rothken wants to shift the liability from AI developers and providers to AI users, drawing a parallel to the 1984 Sony Doctrine. Knowing something of the technology underlying LLMs, I find this a far better match with reality. Here’s the essence of Rothken’s article:
Input: Fair Use and Training of LLMs
On the potential for copyright infringement via AI training, Rothken emphasizes that the training of LLMs aligns squarely with the principles of fair use. It involves a transformative process that extends beyond mere replication of copyrighted texts, focusing instead on harnessing non-copyrightable elements of language to create new and valuable content.
Output: Dual-Use Nature of AI Technologies
On the potential for copyright infringement via an AI’s output, Rothken draws parallels with the Sony Doctrine, framing AI LLMs as “dual-use” technologies capable of both infringing and non-infringing uses. The 1984 U.S. Supreme Court case "Sony Corp. of America v. Universal City Studios, Inc." held that, while the Betamax VCR could be used for purposes that included copyright infringement, it was also used for significant non-infringing purposes, such as recording programs to watch at a later time, which the Court deemed fair use.
Proposed TAO Doctrine
The central proposal in Rothken’s article is the “Training And Output” (TAO) Doctrine.
This AI Doctrine recognizes that if a good-faith AI LLM engine is trained using copyrighted works, where (1) the original work is not replicated but rather used to develop an understanding, and (2) the outputs generated are based on user prompts, then the responsibility for any potential copyright infringement should lie with the user, not the AI system.
Role of User Intent and Responsibility
The TAO Doctrine highlights the role of user intent and responsibility in the use of AI LLMs. Should copyright infringement involving LLM output depend on how the user makes use of the AI-produced content downstream? On The NY Times complaint, Rothken writes:
> Did the team of “users” involved in the NY Times v. OpenAI complaint go too far and game the technology or act with unclean hands when they formulated aggressive prompts? Should the law tolerate such transient and rare gaming in favor of the greater good that LLMs have to offer?
First time I'm hearing of Rothken, but this is the most sober voice on the case that I've heard so far. Thanks for the quick distillation.
> the responsibility for any potential copyright infringement should lie with the user
With VCRs there is very little concern that people will unintentionally infringe copyright. LLMs seem to present a different class of problem, though: who knows whether your prompt will happen to regurgitate some memorized training data wholesale. This particular case might narrowly address the issue of adversarial prompting, but in the general case it could have a chilling effect on usage if responsibility for infringement falls wholly on users.
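To make the concern concrete, here is a minimal sketch (my own illustration, nothing from Rothken or the complaint) of the kind of check a cautious user would have to run to even notice verbatim regurgitation: scan the model's output for long word-for-word runs that also appear in a source text the user happens to possess. The function name, threshold, and placeholder texts are all assumptions for illustration.

```python
# Hypothetical illustration: flag long verbatim overlaps between an LLM's
# output and a reference text. Naive substring matching; the 15-word
# threshold is an arbitrary assumption, and word-boundary false positives
# are possible. Not any official test from the case or the article.

def longest_shared_run(output: str, reference: str, min_words: int = 15) -> str:
    """Return the longest run of consecutive words from `output` that also
    appears verbatim in `reference`, if it is at least `min_words` long;
    otherwise return an empty string."""
    out_words = output.split()
    ref_text = " ".join(reference.split())  # normalize whitespace
    best = ""
    for i in range(len(out_words)):
        for j in range(i + min_words, len(out_words) + 1):
            candidate = " ".join(out_words[i:j])
            if candidate in ref_text:
                if len(candidate) > len(best):
                    best = candidate
            else:
                break  # longer runs starting at i cannot match either
    return best

# Placeholder usage; in practice the user would need the copyrighted text
# on hand to compare against, which is exactly why this is hard in general.
llm_output = "..."   # text returned by the model
source_text = "..."  # a copyrighted article the user already has
overlap = longest_shared_run(llm_output, source_text)
if overlap:
    print(f"Possible verbatim regurgitation ({len(overlap.split())} words)")
```

The point of the sketch is that an ordinary user has neither the source texts nor the tooling to run this kind of comparison, which is what makes pushing all infringement responsibility onto users uncomfortable.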