The lawsuit says ChatGPT “recites Times content verbatim.”
The New York Times is suing OpenAI and Microsoft for copyright infringement, claiming the two companies built their AI models by “copying and using millions” of the publication’s articles and now “directly compete” with its content as a result.
As outlined in the lawsuit, the Times alleges OpenAI and Microsoft’s large language models (LLMs), which power ChatGPT and Copilot, “can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style.” This “undermine[s] and damage[s]” the Times’ relationship with readers, the outlet alleges, while also depriving it of “subscription, licensing, advertising, and affiliate revenue.”
The complaint also argues that these AI models “threaten high-quality journalism” by hurting the ability of news outlets to protect and monetize content. “Through Microsoft’s Bing Chat (recently rebranded as “Copilot”) and OpenAI’s ChatGPT, Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment,” the lawsuit states.
The trained model includes vast swathes of copyrighted material. It's the rights holders who get to decide whether someone can use it.
Just because it makes it inconvenient or harder for someone to train an AI model does not justify wholesale stealing.
A lot of models are even trained on large amounts of pirated material, like books downloaded from pirate sites. I guarantee you OpenAI and others didn't even buy a lot of the material they use to train the AI models on.
I guarantee you OpenAI and others didn't even buy a lot of the material they use to train the AI models on.
My hunch is that if they did actually buy or properly license that material, they would have been bankrupt before the first version of ChatGPT came online. And if that's true, then OpenAI owes its entire existence to its piracy.
No it doesn't, the training data isn't inside the LLM.
So firstly, even if those claims are true, you are suing the wrong business; you would need to sue whoever made the training data. They, however, are usually protected by laws for science, because they are "non-profit research".
Therefore this is completely ridiculous.
Btw, the copyright part is only a thing if it's a significant portion of the thing, which it clearly isn't in this case (it's below 1% of it), making it even more ridiculous.
Also, if you can get the information on the internet, you are again suing the wrong party; you should be going after the provider, not the automatic data-grabbing system, as they can and will argue that they can't control what their crawler algorithm takes. There is a way to mark content as "don't use" for machines, but most people don't do that and will lose in court because they don't understand it...
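For anyone wondering what that "don't use" marking looks like in practice: assuming the comment is referring to robots.txt (it doesn't say so explicitly), here is a rough sketch using Python's standard `urllib.robotparser`. The rules and the example.com URL are made up for illustration, and `is_allowed` is just a helper name I picked. GPTBot is the user agent OpenAI has published for its crawler. Keep in mind robots.txt is only a request; it works only if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the GPTBot crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    print(is_allowed(ROBOTS_TXT, "GPTBot", url))       # crawler named in the rules
    print(is_allowed(ROBOTS_TXT, "SomeBrowser", url))  # any other agent
```

A well-behaved crawler would run a check like this before fetching each page and skip anything that comes back disallowed.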
Lastly, the training wouldn't be harder; the problem is the gathering of data. You can't manually look through all of it, and it's idiotic to think that it's reasonable to demand such a thing.
To be fair, some of the chat bots are effectively just that. They have "scraped" their data and are outputting it in a way that makes it seem like you are having a conversation with the "bot".