7 Comments
Jan 16 · Liked by Harold Godsoe

First time I'm hearing of Rothken, but this is the most sober voice on the case that I've heard so far. Thanks for the quick distillation.

> the responsibility for any potential copyright infringement should lie with the user

With VCRs there was very little concern that people would unintentionally infringe copyright. LLMs seem to present a different class of problem, though: who knows whether a given prompt will happen to regurgitate some memorized training data wholesale. This particular case might narrowly address the issue of adversarial prompting, but in the general case it could have a chilling effect on usage if the responsibility for infringement ends up borne wholly by users.
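
To make "regurgitation" concrete, here's a toy check for verbatim overlap between a model's output and a source text. The 8-word window is an arbitrary assumption on my part, not any legal or technical standard:

```python
# Toy check for verbatim regurgitation: flag an output that shares any
# long run of words with a protected source text. The 8-word window is
# an arbitrary choice, not a legal or technical standard.

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """All n-word windows in the text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_regurgitated(output: str, source: str, n: int = 8) -> bool:
    """True if any n-word run of the output appears verbatim in the source."""
    return bool(ngrams(output, n) & ngrams(source, n))
```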

Author:

The question is, I think, whether an "accidental regurgitation" profits the AI developer in cases where the AI user is not actively using "aggressive prompting" to get copyrighted training material.

In what circumstances would a(n accidentally regurgitated) copy of a NY Times article provide an innocent AI user with value (value the AI user would then attribute, and pay, to the AI developer)? Given how LLMs work, I can think of essentially none. This becomes even more true if AI developers are incentivized to keep fine-tuning their LLMs to avoid accidental regurgitation.


Perhaps? I see two semi-related issues:

1. For articles and books, which are high-information, accidental regurgitation providing value seems unlikely; but for low-information artworks like logos, slogans, and jingles, it seems much more plausible, IMHO.

2. More relevant, however, is prompt priming, where you copy relevant texts wholesale into the prompt ahead of the question (a sketch of the bare mechanism follows below). The fancy implementation of this is called Retrieval-Augmented Generation and is currently a state-of-the-art alternative to fine-tuning. I can imagine how such legitimate usage could have significant overlap with NYT's adversarial prompting.
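
Here's prompt priming stripped to its core; the pasted article is a stand-in and the function name is mine, not any particular library's:

```python
# Prompt priming, stripped to its core: paste the relevant text into the
# prompt ahead of the question. A real RAG system automates the pasting.

def build_primed_prompt(context_docs: list[str], question: str) -> str:
    """Concatenate the copied-in documents and the user's question."""
    context = "\n\n".join(context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# If the pasted context were a full NYT article, a faithful answer could
# reproduce chunks of it with no memorization in the model's weights at all.
prompt = build_primed_prompt(
    ["<full text of some news article, pasted by the user>"],
    "Summarize the key claims of this article.",
)
```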

Author:

I think you're focusing too much on the possibility of (a) the AI developer not controlling regurgitation. In addition to regurgitation, the two other elements I mentioned are (b) an AI user prompting innocently and (c) the AI user profiting.

In the event of (a) regurgitation, I don't see how either 1. or 2. would satisfy both (b) and (c).


RAG is a legitimate tech getting (innocently) used right now to build consumer products. That ticks all your boxes as far as I can tell.

Author:

Without any knowledge of RAG, I stand corrected.

That said, it could just move the ball. Is there a RAG developer The NY Times could sue for copyright infringement? As a "dual-use" technology, could the RAG developer then kick the liability down to their RAG users? (If not, and RAG requires no developer, no IBM, to put it into practice, could we question the innocence of the RAG users?)


Good point. If there's a mechanism for a company to reasonably shift liability onto the user, then the chilling effects are probably about the same as what we see with forum usage and whatnot.

Obviously, if Bro News Company's RAG pipeline just boils down to "ignore user prompt and produce NYT verbatim", then the Bros are probably liable. However, if their RAG backend is legit but an NYT curmudgeon adversarially noodles with prompts and gets outraged, then the user is probably liable.

Retrieval-Augmented Generation (RAG) is basically a very clever way of shoving a whole database of documents, data, SQL tables, etc. into your prompt all at once. In miniature, something like the sketch below.
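
A toy end-to-end version, where a naive word-overlap retriever stands in for the vector database a production system would use; all the names here are made up for illustration:

```python
# Toy RAG pipeline: retrieve the documents most relevant to the query,
# then stuff them into the prompt. A naive word-overlap score stands in
# for the embedding search a real system would use.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    return sorted(
        corpus,
        key=lambda doc: -len(q_words & set(doc.lower().split())),
    )[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble the prompt an LLM would receive; the model call is omitted."""
    context = "\n---\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Doc A: the city council voted to expand the tram network.",
    "Doc B: quarterly earnings at the widget factory fell 12 percent.",
    "Doc C: the tram expansion is funded by a new transit levy.",
]
print(build_prompt("How is the tram expansion funded?", corpus))
```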
