Sarah Silverman and other authors are suing OpenAI and Meta for copyright infringement, alleging that they're training their LLMs on books via Library Genesis and Z-Library
I'm actually surprised by the comments in here. This technology is incredibly disruptive to authors. If they are correct that their intellectual property has been misused by these companies to train LLMs, then they absolutely should have the right to prevent that.
You can be both pro-AI and pro-advancement and still respect creators' intellectual property rights, including the right not to have all content stolen by megacorporations and used to create profits while decimating entire industries.
The fact that the AI can summarize these works in detail is proof that they were trained on copyrighted material without permission (which is not fair use). Sarah Silverman is obviously not going to be hurt financially by this, but there are hundreds of thousands of authors who definitely will be affected. They have every right to sue.
If a user prompts ChatGPT to summarize a copyrighted book, it will do so.
So will a human. Let's stop extending copyright law. Also, how do you know it read the book, and not a summary of it, of which there are loads on the internet?
Now that's interesting. I really have been waiting for something like this. I wonder if the LLM companies will now actually have to explain where their models get the detailed information about the book from, or if they can get away with stating that they have no idea how their own system works.
A lot of these comments are missing a major point, which is that, if the claim is true, the books are being pirated and then effectively used for a commercial application.
So the authors are losing money through this process and did not give their permission for their work to be used in a commercial way.
The decision of this case will be wildly important for the development of AI.
My pie-in-the-sky hope is that copyright somehow becomes less stringent after all of this.
Don't get me wrong, I want protections for creators and support reasonable copyright (life of the author + 25 years, with the possibility of a 15-year extension), but letting a company lord over an IP for damn near a century isn't ideal for anyone.
It seems very improbable that they scraped a pirate website with forced registration and tight daily download limits (10 books a day max?) to get content that's often mislabeled and not presented in a homogeneous way.
It's probably just using the excerpt from Amazon (which is much easier to get with paid API access) as a prompt and building on it.
I tested this by asking ChatGPT 3.5 specific questions about The Bedwetter, and it seems like it was not trained on the full text of the book. I asked it what the first sentence is, and then what the second paragraph is, and it gave plausible but incorrect answers. I asked it for the table of contents, and then whether a specific chapter was in the book, and it said "my responses are generated based on pre-existing data and do not have real-time access to specific book content". I asked who wrote the foreword and who wrote the afterword. It said Patton Oswalt wrote the foreword and that there is no afterword. In reality, Sarah wrote the foreword and God wrote the afterword.