Technology @beehaw.org Arthur Besse @lemmy.ml 1 yr. ago

Sarah Silverman and other authors are suing OpenAI and Meta for copyright infringement, alleging that they're training their LLMs on books via Library Genesis and Z-Library

www.thedailybeast.com Sarah Silverman Sues ChatGPT Creator for Copyright Infringement

“If a user prompts ChatGPT to summarize a copyrighted book, it will do so,” the suit claims.

129 comments

I tested this by asking ChatGPT 3.5 specific questions about The Bedwetter, and it seems it was not trained on the full text of the book. I asked it for the first sentence, and then for the second paragraph, and it gave plausible but incorrect answers. I asked it for the table of contents, and then whether a specific chapter was in the book, and it said "my responses are generated based on pre-existing data and do not have real-time access to specific book content". I asked who wrote the foreword and who wrote the afterword. It said Patton Oswalt wrote the foreword and that there is no afterword. In reality, Sarah wrote the foreword and God wrote the afterword.

ChatGPT conversation
Table of contents and first chapter from Google Books.
- LLMs compress data; there’s no way ChatGPT could remember every detail of the book alongside all the other information it stores in its encodings. The issue isn’t whether the entire text of the book is contained within the encodings; it’s whether it was trained on the book in the first place.
  
  GPT3 is 800GB, while the entirety of the English Wikipedia is around 10GB compressed. So yeah, it doesn't store every detail of everything, but LLMs do memorize a lot of things verbatim. Also see https://bair.berkeley.edu/blog/2020/12/20/lmmem/
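
For anyone who wants to check the "verbatim" part more carefully than eyeballing answers, here is a rough sketch of one way to do it: take whatever the model outputs and find the longest run of characters it shares with the real text. This is not the method from the linked post, just a simple illustration using Python's standard `difflib`. The function names and the sample strings are made up for the example; nothing here is from the book or from an actual ChatGPT reply.

```python
from difflib import SequenceMatcher


def longest_verbatim_overlap(model_output: str, reference: str) -> str:
    """Return the longest run of characters that appears in both strings."""
    # autojunk=False so that frequent characters are not ignored in long texts.
    matcher = SequenceMatcher(None, model_output, reference, autojunk=False)
    match = matcher.find_longest_match(0, len(model_output), 0, len(reference))
    return model_output[match.a : match.a + match.size]


def overlap_ratio(model_output: str, reference: str) -> float:
    """Fraction of the model output covered by its longest verbatim run."""
    if not model_output:
        return 0.0
    shared = longest_verbatim_overlap(model_output, reference)
    return len(shared) / len(model_output)


if __name__ == "__main__":
    # Placeholder strings (assumptions for illustration only).
    reference = "the quick brown fox jumps over the lazy dog"
    output = "I think it says: quick brown fox jumps, or similar"
    print(repr(longest_verbatim_overlap(output, reference)))
    print(overlap_ratio(output, reference))
```

A long shared run would suggest memorized text, while a short one (like the plausible but wrong first sentence above) suggests the model is making it up. It is slow on very long inputs, so it is better suited to comparing a single answer against a single chapter than against a whole library.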
