2 authors say OpenAI 'ingested' their books to train ChatGPT. Now they're suing, and a 'wave' of similar court cases may follow.

www.businessinsider.com

Two authors sued OpenAI, accusing the company of violating copyright law. They say OpenAI used their work to train ChatGPT without their consent.

137 comments
  • If I read a book to inform myself, put my notes in a database, and then write articles, it is called "research". If I write a computer program to read a book to put the notes in my database, it is called "copyright infringement". Is the problem that there just isn't a meatware component? Or is it that the OpenAI computer isn't doing a good enough job of following the "three references" rule to avoid plagiarism?

  • AI fear is going to be the trojan horse for even harsher and stupider 'intellectual property' laws.

  • I think this is exposing a fundamental conceptual flaw in LLMs as they're designed today. They can't seem to simultaneously respect intellectual property / licensing and be useful.

    Their current best use case - that is to say, a use case where copyright isn't an issue - is dedicated instances trained on internal organization data. For example, Copilot Enterprise, which can be configured to use only the enterprise's data, without any public inputs. If you're only using your own data to train it, then copyright doesn't come into play.

    That's been implemented where I work, and the best thing about it is that you get suggestions already tailored to your company's coding style. And its suggestions improve the more you use it.

    But AI for public consumption? Nope. Too problematic. In fact, public AI has been explicitly banned in our environment.

  • I'd love to know the source for the works that were allegedly violated. Presuming OpenAI didn't scour zlib/libgen for the books, where on the net were the cleartext copies of their writings stored?

    Being stored in cleartext publicly on the net does not grant OpenAI the right to misuse their art, but the authors need to go after the entity that leaked their works.

  • There's an additional question: who holds the copyright on the output of an algorithm? I don't think that is copyrightable at all. The bot doesn't really add anything to the output, it's just a fancy search engine. In the US, in particular, the agency in charge of Copyrights has been quite insistent that a copyright can only be given to the output of a human.

    So when an AI incorporates parts of copyrighted works into its output, how can that not be infringement?

  • Can’t reply directly to @OldGreyTroll@kbin.social because of that “language” bug, but:

    The problem is that they then sell the notes in that database for giant piles of cash. Props to you if you’re profiting off your research the way OpenAI can profit off its model.

    But yes, the lack of meat is an issue. If I read that article right, it’s not the one being contested here though. (IANAL and this is the only article I’ve read on this particular suit, so I may be wrong).

  • They definitely should follow through with this, but this is a broader issue where we need to be able to prevent data scraping in general. Though that is a significantly harder problem.

  • If you're doing research, there are actually some limits on the use of the source material and you're supposed to be citing said sources.

    But yeah, there's plenty of stuff where there needs to be a firm line between what a random human can do versus an automated intelligent system with potentially unlimited memory/storage and processing power. A human can see where I am in public. An automated system can record it for permanent record. An integrated AI can tell you detailed information about my daily activities, including inferences, which - even if legal - is a pretty slippery slope.

  • I was actually thinking about this the other day for some reason. AI scraping my own original stuff and doing whatever with it. I can see the concern and I'm curious where this goes and how a court would rule on a pretty technical topic like this.

  • I have a post-consumerism pipe dream that one day we will collectively realize all the stupid shit we waste time and resources on is not worth it and we enter a future like Star Trek.

    As a species we waste so much simply making sure that those less privileged either by money or means, are not allowed to take from those with either. It's stupid.

    Edit - if we spent half the energy helping our brothers and sisters to succeed as we did to keep them down the world would be a better place. And by help them succeed I don't mean money. Money is the lowest possible threshold.

  • Capitalism hit a massive roadblock with the dawn of the internet. Information has a tendency to want to be free and easily accessible, but corporations need to own our productive output to maximize profits. In the age of the internet, our productive output more and more becomes our ideas and thoughts manifested as code or other forms of digital information.

    Capitalists somewhat fought off the first wave of this, but AI will be a second and more challenging wave to overcome. I hope the capitalists fail and we don't restrict the learning and power of AI so corporations can maximize profits again, but I recognize there's a world where they successfully slow down or even entirely halt these learning systems and stop the technology from developing.

    We already see people like Tucker Carlson calling for bans on AI because it'll put people out of work. Of course, we should be trying to reduce the amount of work needed, but the natural tendency of capitalism in this environment is to maximize efficiency in favor of capital owners. Once workers aren't needed anymore, the best thing (from a capitalist perspective) to do is let them starve in the streets instead of "giving them stuff for just existing". We already live in a world where millions of people die from hunger a year, and almost a billion people are dangerously underfed, because global capitalism dictates these people don't deserve enough food.

  • To be honest, I hope they win. While my passion is technology, I am not a fan of artificial intelligence at all! Decision-making is best left up to the human being. I can see where AI has its place, like in gaming or some other things, but to mainstream it and use it to decide whose resume is going to be viewed and/or who will be hired; hell no.

  • The only question I have for content creators of any kind who are worried about AI...do you go after every human who consumed your content when they create anything remotely connected to your work?

    I feel like we have a bias towards humans, that unless you're actively trying to steal someone's idea or concepts we ignore the fact that your content is distilled into some neurons in their brain and a part of what they create from that point forward. Would someone with an eidetic memory be forbidden from consuming your work as they could internally reference your material when creating their own?

  • Can’t reply directly to @OldGreyTroll@kbin.social because of that “language” bug, as well. This is an interesting argument. I would imagine that the AI does not have the ability to follow plagiarism rules. Does it even credit sources? I've seen plenty of complaints from students getting in trouble because anti-cheating software flags their original work as plagiarism. More importantly, I really believe we need to take a firm stance on what is ethical to feed into ChatGPT. Right now it's the wild west.

  • Good, hope they win.

  • I don't really understand why people are so upset by this. Except for people who train networks based on someone's stolen art style, people shouldn't be getting mad at this. OpenAI has practically the entire internet as its source, so GPT is going to have so much information that any specific author barely has an effect on the output. OpenAI isn't stealing people's art because they are not copying the artwork, they are using it to train models. Imagine getting sued for looking at reference artwork before creating artwork.
