Actually, I'd prefer it if individual users pirating were considered fair use, but corporations pirating were not. So their pirating is not fine, but ours should be.
I do wonder how it shakes out. If the case establishes that a license must be acquired to use copyrighted material, then maybe the license I'm setting on my comments might land commercial AI companies in hot water too, which I'd love. Open-source AI models FTW
That license would require the AI model to only output content under the same license. Not sure if you realize, but commercial use is part of the Open Source Definition.
AI is just too hyped. Every company invests millions into AI, and every new product needs to "have AI". And then everybody also needs to file lawsuits. Rightly so, I mean, if Meta just pirated the books, but that's not an AI problem, just plain old piracy.
I was pretty sure OpenAI and Meta didn't license gigabytes of books correctly for use in their commercial products. Nice that Meta has now admitted to it. I hope their "fair use" argument works, and that in the future we can all "train AI" with our "research dataset" of 40 GB of ebooks. Maybe I'll even buy another hard disk and see if I can train an AI on 6 TB of TV series, all the Marvel movies, and a broad mp3 collection.
Btw, there was never any denial anyway. Meta wrote a scientific paper about their LLaMA model in March of last year, and it clearly listed all of their sources, including Books3. Other companies aren't that transparent, and even less so as of today.
That's not the takeaway you should be having here. It's that a megacorp felt it should be allowed to create new content from someone else's work, both without their permission and without paying.
Ok, fair; but do consider the context that the models are open weight: you can download them and use them for free.
There is a slight catch though, which I'm very annoyed at: it's not actually Apache. It's this weird license where you can use the model commercially up until you hit 700M monthly users, after which you have to request a custom license from Meta. Ok, I kinda understand them not wanting companies like ByteDance or Google using their models just like that, but Mistral has their models open weight under Apache-2.0, so the context should definitely be reconsidered, especially for Llama 3.
It's kind of a thing right now: publishers don't want models trained on their books "because it breaks copyright", even though the model doesn't actually remember copyrighted passages from the book. Many arguments hinge on the publishers being mad that you can prompt the model to repeat a copyrighted passage, which it can do. IMO this is a bullshit reason
anyway, it will be an interesting two years as (hopefully) copyright gets turned inside out :)
I'm pretty sure "admits" implies an attempt to hide it. They've explicitly said in the model's initial publication that the training set includes Books3.
We live in a clown world where nonprofit book and knowledge preservationists are constantly under threat by the copyright mafia, but big AI companies can freely and openly steal the same stuff for profit.
What a bunch of losers, thinking they are making the future… by stealing from as many artists as they can. How do you convince yourself you're doing the right thing when what you're doing is scaling up the theft of art from small artists to a tech-company-sized operation?
And how much oxygen has been wasted over the years by music companies pushing the narrative that "stealing" from artists via torrenting is wrong? This is so much worse than stealing (and a million times worse than torrenting), because the point of the theft is to destroy the livelihood of the artist who was stolen from, and to turn their art into a cheap commodity that can be sold as a service with the artist seeing none of the monetary or cultural reward for their work.
Did you just make a contradictory argument for both sides?
Is your distinction that piracy by individuals gives cultural recognition while that of corporations doesn't?
If you think piracy is warranted at the cost of artists/creators, how is a generalized AI that makes their work available and more accessible, as an abstracted cultural good, any different?
I'm going to imagine it's because that abstracted cultural good is then put behind a paywall, which OP will then also pirate, thus fulfilling the prophecy.
Because I don't see a strong argument that piracy comes at a direct, unavoidable cost to artists. I also don't see a strong argument that piracy reduces the chance fans will pay for art when the art is reasonably easy to purchase and sold at a fair price. Of course there are complexities to this discussion, but ultimately, when you compare it to massive corporations wholesale stealing massive amounts of art with the specific intention of undercutting and destroying its value by commodifying it, I think the difference is pretty clear. One of these things is a morally arguable choice by one individual; the other is class warfare by the rich.
Joe Schmo torrents an album from a band they like; maybe they buy the album in the future, or go to a concert and buy merch. Joe Schmo hasn't mined some economic gain out of a band and then moved on; Joe Schmo has become more of a committed fan because they love the album. Meta steals from a band so that they can build an algorithm that produces knockoff versions of the band's music, which Meta can sell to, say, a company making a commercial that wants music in that style but would prefer not to pay an actual human artist a fair price for it. These are not the same.
(AI can't necessarily create convincing fake songs yet, but you get my point as it applies to other art that AI can create convincing examples of, books and writing being prime examples.)
Meta stealing intellectual property and exploiting it for corporate gain is not the same as normal users pirating content. They are so far apart that it warrants its own discussion and cannot be lumped together.