In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.
Both filings make a broader case against AI, claiming that, by definition, the models pose a risk under the Copyright Act because they are trained on huge datasets that potentially contain copyrighted material.
They've got a point.
If you ask AI to summarize something, it needs to know what it's summarizing. Reading other summaries might be legal, but then why not just read those summaries first?
If the AI "reads" the work first, then it would have needed to pay for it. And how do you deal with that? Is a chatbot treated like one user? Or does it need to pay for a copy for each human that asks for a summary?
I think if they'd paid for a single ebook library subscription they'd be fine. However, the article says they used pirate libraries so the model could read anything on the fly.
Pointing an AI at pirated media is going to be hard to defend in court. And a class action full of authors and celebrities isn't going to be a cakewalk. They've got a lot of money to fight with, and plenty of contacts who know copyright law. I'm sure all the publishers are pissed too.
Everyone is going after AI money these days; this seems like the rare case where it's justified.
I like her and I get why creatives are panicking because of all the AI hype.
However:
In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.
A summary is not a copyright infringement. If anything is a case for fair use, it's a summary.
The comic's suit questions if AI models can function without training themselves on protected works.
A language model does not need to be trained on the text it is supposed to summarize. She clearly does not know what she is talking about.
I feel like, if confronted about a "stolen comedy bit," a lot of the people complaining would argue that "no work is entirely unique, everyone borrows from what already existed before." But now they're all coming out of the woodwork for a payday or something... It's kinda frustrating, especially if they kill any private use too...
AI is a double-edged sword. On one hand, you have an incredible piece of technology that can greatly improve the world. On the other, you have technology that can be easily misused to a disastrous degree.
I think most people can agree that an ideal world with AI is one where it is a tool to supplement innovation/research/creative output. Unfortunately, that is not the mindset of venture capitalists and technology enthusiasts. The tools are already extremely powerful, so these parties see them as replacements for actual humans/workers.
The saddest example has to be graphic designers/digital artists. It’s not some job that “anyone can do.” It’s an entire profession that takes years to master and perfect. AI replacement doesn’t just mean taking away their jobs, it renders years of experience worthless. The frustrating thing is it’s doing all of this with their works, their art. Even with more regulations on the table, companies like Adobe and DeviantArt are still using shady practices to con users into unknowingly building their AI algorithms (quietly instating automatic opt-in and making opt-out difficult). It’s sort of like forcing a man to dig his own grave.
You can’t blame artists for being mad about the whole situation. If you were in their same position, you would be just as angry and upset. The hard truth is that a large portion of the job market could likely be replaced by AI at some point, so it could happen to you.
These tools need to be TOOLS, not replacements. AI has its downfalls, and expert knowledge should be used as a supplement to improve both the tools and the final product. There was a great video covering some of those fundamental issues (such as the model not actually “knowing” or understanding what a certain object/concept is), but I can’t find it right now. I think the best results come when everyone cooperates.
She's going to lose the lawsuit. It's an open and shut case.
"Authors Guild, Inc. v. Google, Inc." is the precedent case, in which the Second Circuit held that transformative digitization of copyrighted material inside a search engine constitutes fair use (the Supreme Court declined to review the ruling), and text used for training LLMs is arguably even more transformative than book digitization, since it is near impossible to reconstitute the original work barring extreme overtraining.
You have to understand why styles can't, and shouldn't, be copyrightable; that would honestly be a horrifying prospect for art.
If the models were trained on pirated material, the companies here have stupidly opened themselves to legal liability and will likely lose money over this, though I think they're more likely to settle out of court than lose. In terms of AI plagiarism in general, I think that could be alleviated if an AI had a way to cite its sources, i.e. point back to where in its training data it obtained information. If AI cited its sources and did not word for word copy them, then I think it would fall under fair use. If someone then stripped the sources out and paraded the work as their own, then I think that would be plagiarism again, where that user is plagiarizing both the AI and the AI's sources.
On information and belief, the reason ChatGPT can accurately summarize a certain copyrighted book is because that book was copied by OpenAI and ingested by the underlying OpenAI Language Model (either GPT-3.5 or GPT-4) as part of its training data.
While it strikes me as perfectly plausible that the Books2 dataset contains Silverman's book, this quote from the complaint seems obviously false.
First, even if the model never saw a single word of the book's text during training, it could still learn to summarize it from reading other summaries which are publicly available, such as the book's Wikipedia page.
Second, it's not even clear to me that a model which saw only the text of a book during training, but no descriptions or summaries of it, would be particularly good at producing a summary.
We can test this by asking for a summary of a book which is available through Project Gutenberg (which the complaint asserts is Books1 and therefore part of ChatGPT's training data) but for which there is little discussion online. If the source of the ability to summarize is having the book itself in the training data, the model should be equally able to summarize the rare book and Silverman's book.
I chose "The Ruby of Kishmoor" at random. It was added to PG in 2003. ChatGPT with GPT-3.5 hallucinates a summary that doesn't even identify the correct main characters. The GPT-4 model refuses to even try, saying it doesn't know anything about the story and it isn't part of its training data.
If ChatGPT's ability to summarize Silverman's book comes from the book itself being part of the training data, why can it not do the same for other books?
As the commenter points out, I could recreate this result using a smaller offline model and an excerpt from the Wikipedia page for the book.
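To make the point concrete without any LLM at all: even a toy frequency-based extractive summarizer can produce a plausible "summary" given only a secondary description (a Wikipedia-style blurb), never the book itself. This is a minimal sketch, not an LLM, and the excerpt text below is an illustrative stand-in I wrote, not a quote from Wikipedia:

```python
from collections import Counter
import re

def extractive_summary(text: str, n_sentences: int = 2) -> str:
    """Score sentences by word frequency and keep the top n, in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)  # crude short-word filter

    def score(sentence: str) -> int:
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Emit chosen sentences in their original order for readability.
    return " ".join(s for s in sentences if s in top)

# Illustrative stand-in for a plot-summary paragraph about the book.
excerpt = (
    "The Bedwetter is a memoir by the comedian Sarah Silverman. "
    "The memoir recounts Silverman's childhood struggles with bedwetting and depression. "
    "Silverman also describes her early career in stand-up comedy. "
    "Critics praised the memoir for its mix of humor and candor."
)
print(extractive_summary(excerpt, 2))
```

The output is always a subset of the secondary description, which is the whole point: nothing about the ability to summarize requires the underlying work to have been ingested.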
The comic's suit questions if AI models can function without training themselves on protected works.
I doubt a human can compose chat responses without having trained at school on prior language. Copyright favors the rich, powerful, and established, like Silverman.
VC backed AI makers and billionaire-ran corporations should definitely pay for the data they use to train their models. The common user should definitely check the licences of the data they use as well.
Copyright laws are a recent phenomenon and should never have been a thing, imo. The only reason they exist is not to "protect creators" but to make sure the upper classes extract as much wealth over as long a time as possible.
Music piracy has shown that copyright has too many holes in it to be effective, and now AI is exposing its redundancy as it uses data to give better results.
It stifles creativity to the point that it makes us inhuman. Hell, Chinese writers used to praise others for using a line or two from other writers.
It's like when the record labels sued every music-sharing platform in the early days. Adapt. They're always afraid of new things, but in the end nobody can stop it. Think, learn, work with it, not against it.
The plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.
That's an interesting angle. All these lawsuits are good for shaking the dirt around these things. They should be tested in the real world before they become lost in the background of everyday life.
We already have a defense against these programs scraping a site. I asked ChatGPT once how it gets around captchas on websites, and it told me that if there is one, it just doesn't go any further.
If that's actually true or not is another question though.
Personally, I find this stupid. If we have robots walking around, are they going to be sued every time they see something that's copyrighted?
Is this what will stop progress that could save us from environmental collapse? That a robot could summarize your shitty comedy?
Copyright is already a disgusting mess, and still nobody cares about models being created specifically to manipulate people en masse. "What if it learned from MY creations," asks every self-obsessed egoist in the world.
Doesn't matter how many people this tech could save after another decade of development. Somebody think of the [lucky few artists that had the connections and luck to make a lot of money despite living in our soul crushing machine of a world]
All of the children growing up abused and in pain with no escape don't matter at all. People who are sick or starving or homeless don't matter. Making progress to save the world from imminent environmental disaster doesn't matter. Let Canada burn more and more every year. As long as copyright is protected, all is well.