A judge has dismissed the majority of claims in a copyright lawsuit filed by developers against GitHub, Microsoft, and OpenAI.
The lawsuit was initiated by a group of developers in 2022 and originally made 22 claims against the companies, alleging copyright violations related to the AI-powered GitHub Copilot coding assistant.
Judge Jon Tigar’s ruling, unsealed last week, leaves only two claims standing: one accusing the companies of an open-source license violation and another alleging breach of contract. This decision marks a substantial setback for the developers who argued that GitHub Copilot, which uses OpenAI’s technology and is owned by Microsoft, unlawfully trained on their work.
...
Despite this significant ruling, the legal battle is not over. The remaining claims regarding breach of contract and open-source license violations are likely to continue through litigation.
The judge also noted that the cited study itself mentions that GitHub Copilot “rarely emits memorised code in benign situations.”
"Rarely" is not zero. This looks like it's opening a loophole for copying open source code with strong copyleft licenses like the GPL:
1. Find OSS code you want to copy
2. Set up conditions for Copilot to reproduce code
3. Copy code into your commercial product
4. When sued, just claim Copilot generated the code
Depending on how good your lawyers are, 2 is optional. And bingo! All the OSS code you want without those pesky restrictive licenses.
In fact, I wonder if there's a way to automate step 2. Some way to analyze an OSS GitHub repo to generate inputs for Copilot that will then regurgitate that same repo.
It doesn't work like that. A copy is a copy. You can only get away with that if you can credibly show that you independently produced the same code. Hence clean-room implementations: they're not strictly necessary, but they deter lawsuits.
Apparently there's some confusion here about what the judge ruled. This particular part is about claims under the DMCA, not copyright infringement. The relevant sections can be seen here: https://www.copyright.gov/title17/92chap12.html
[edit: link fixed. The claim was that "copyright management information" was removed; prohibited under these sections.]
Immediately lose the case because nobody is claiming that when copilot does emit copyrighted code verbatim it is magically stripped of copyright protections.
This is an aspect of the German court system that is LEAGUES more sensible than the US - they have certified subject matter experts in a ton of domains that work with courts to help meaningfully inform judicial decisions. The system isn’t perfect (no system is), but it’s a damn sight better than what the US generally does. I'm categorically unable to name a justice or court jurisdiction anywhere in the US that consistently makes well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.
I am usually not wont to defend the dysfunction presently found in the USA federal (and state-level) judiciary, but I think this comparison to the German courts requires a bit more context. Generally speaking, the USA federal courts and US States adopt the adversarial system, originally following the English practice in both common law and equity. This means the judge takes on a referee role, and a plaintiff and a defendant will make their best, most convincing arguments.
I should clarify that "common law" in this context refers to the criminal matters (akin to public law), and "equity" refers to person-versus-person disputes (akin to private law), such as contracts.
For the adversarial system to work, the plaintiff and defendant need to be sufficiently motivated (and nowadays, well-monied) to put on good arguments, or else they're just wasting the court's time. Hence, there is a requirement (known as "standing") where -- grossly oversimplifying -- the plaintiff must be the person with the most to gain, and the defendant must be the person with the most to lose. They are interested parties who will argue vigorously.
Of course, that's legal fiction, because oftentimes, a defendant might be unable to afford excellent legal counsel. Or plaintiffs will half-ass or drag out a lawsuit, so that it's more an annoyance to the opposite party.
In an adversarial system, it is each party's responsibility to obtain subject-matter experts and their opinions to present to the court. The judge is just there to listen and evaluate the evidence -- exception: criminal trials leave the evaluation of evidence to the jury.
Why is the USA like this? For the USA federal courts, it's because it's part of our constitution, in the Case or Controversy Clause. One of the key driving forces for drafters of the USA Constitution was to restrict the powers of government officials and bureaucrats, after seeing the abuses committed during the Colonial Era. The Clause above is meant to constrain the unelected judiciary -- which otherwise has awe-inducing powers such as jailing people, undoing legislation, and assigning wardship or custody of children -- from doing anything unless some controversy actually needed addressing.
With all that history in mind, if the judiciary kept their own in-house subject-matter experts, then that could be viewed as more unelected officials trying to tip the scale in matters of science, medicine, computer science, or any other field. Suddenly, landing a position as the judiciary's go-to expert could have broad reaching impacts, despite no one in the federal judiciary being elected.
In a sense, because of the fear of officials potentially running amok, the USA essentially "privatizes" subject matter experts, to be paid by the plaintiff or defendant, rather than employed by the judiciary. The adversarial system is thus an intentional value judgement, rather than a "whoopsie" type of thing that we walked into.
Small note: the federal executive (the US President and all the agencies) does keep subject matter experts, for the limited purpose of implementing regulations (aka secondary legislation). But at least they all report indirectly to the US President, who is term-limited and only stays 4 years at a time.
This system isn't perfect, but it's also not totally insane.
I mean I get what you’re saying on a theoretical level, but all of that breaks down once you fill the judiciary with rank incompetents and political hacks.
You should emphasize more that the difference between the adversarial and inquisitorial systems exists in criminal law only. In civil/private matters, e.g. copyright disputes like in this instance, continental Europe handles matters much the same.
Judge William Alsup. Um, now ask me to name another.
Biden or Harris could do the US a favor and name, say, Shayon Ghosh to the federal bench. He's not quite as qualified as Alsup: whilst he's also from Jackson, MS, he strangely chose to go to Carnegie Mellon over Alsup's choice of Mississippi State.
I mean sure you can cherry pick examples that are outstanding justices in that regard. But that’s never going to hold a candle to implementing a systemic norm that essentially says “a judge ruling on a case primarily concerned with <specialized domain here> can tap a pool of certified experts on <specialized domain here> to make the most informed decision possible”. An enhancement to that would be “the pool of experts may also flag decisions made by justices that a majority of said experts deem inappropriate”.
I’m not saying this hypothetical system would be perfect, or that it wouldn’t need further tweaking and iteration, but specifically including feedback mechanisms like that would probably (hopefully) steer things towards a reasonably decent trajectory.
I’m categorically unable to name a justice or court jurisdiction anywhere in the US that consistently makes well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.
Can you name one in Germany? Just asking.
Anyway, at this stage of the trial only legal experts are involved. The judge examines if the legal arguments are sound, assuming the allegations are true. Whether the allegations are actually true will only be determined in the future. That's also when Fair Use comes in. At that point, you need outside experts to advise on the non-legal aspects.
Not a specific one, but I was kind of citing the German judicial system writ large as a model that appeared meaningfully more effective than the model the US uses.
Consistently? Not that I can think of either, but there was that one judge in the Oracle v. Google Java case who I believe learned enough programming to call BS on Oracle's claims.
Ahh, all that sweet source-available Windows OS code that universities can use for studying and whatever, fed through a pipe like this. Would be fun to see them defend it then.
Very curious what the final output of this will be... if they can finally just train their models on everything with no repercussions, I wonder what kind of loopholes that will open for say music. "I didn't share your music, I shared a model that happens to output music trained from your input. Yes, it happens to be byte for byte".
Copyleft licenses are explicitly leveraging copyright laws.
So if the output of an “AI” is not subject to copyright, and the input material is also not subject to copyright, I can train a model so it outputs a byte-for-byte copy of, say, a Marvel movie, and said copy is copyright free, yes?
I think this community accepts posts that are weeks, months, and even years old. I think it also accepts repeat articles. It's just that the format of the article makes it seem like it was published today, not months ago. It's an ongoing legal case, and has progressed further since The Verge's reporting of the order from 06/24/2024.