Skip Navigation

The Xz Backdoor Highlights the Vulnerability of Open Source Software—and Its Strengths

www.404media.co The Xz Backdoor Highlights the Vulnerability of Open Source Software—and Its Strengths

The backdoor highlights the politics, governance, and community management of an ecosystem exploited by massive tech companies and largely run by volunteers.

The Xz Backdoor Highlights the Vulnerability of Open Source Software—and Its Strengths

xz backdoor
The Xz Backdoor Highlights the Vulnerability of Open Source Software—and Its Strengths

Jason Koebler · Mar 30, 2024 at 3:27 PM

The backdoor highlights the politics, governance, and community management of an ecosystem exploited by massive tech companies and largely run by volunteers.

Image: Zulian Firmansyah, Unsplash

Friday afternoon, Andres Freund, a software developer at Microsoft, sent an email to a listserv of open source software developers with the subject line “backdoor in upstream xz/liblzma leading to ssh server compromise.” What Freund had stumbled upon was a malicious backdoor in xz Utils, a compression utility used in many major distributions of Linux, that increasingly seems like it was purposefully put there by a trusted maintainer of the open source project. The “xz backdoor” has quickly become one of the most important and most-discussed vulnerabilities in recent memory.

Ars Technica has a detailed writeup of the technical aspects of the backdoor, which intentionally interfered with SSH encryption, which is a security protocol that allows for secure connections over unsecured networks. The specific technical details are still being debated, but basically, a vulnerability was introduced into a very widely-used utility that chains into a type of encryption that is used by many important internet servers. Luckily, this specific backdoor seems like it was caught before it was introduced into the code of major Linux distributions.

Alex Stamos, the chief trust officer of SentinelOne and a lecturer at Stanford’s Internet Observatory called the discovery of this backdoor “the most interesting hack of the year.”

This is because the mechanism of the attack highlights both the strengths and weaknesses of open source software and the ecosystem under which open source software is developed, and the extent to which the internet and massive tech companies rely on an ecosystem that is largely run by volunteers.

In this case, the vulnerabilities were introduced by a coder who goes by the name Jia Tan (JiaT75 on GitHub) who was a “maintainer” of the xz Utils codebase, meaning they could make commits (update the software’s code) without oversight from others. Critically, Tan has been one of the maintainers of xz Utils for almost two years and also maintains other critical open source projects. This raises the possibility, of course, that they have always been a bad actor and could have been introducing vulnerabilities into earlier versions of xz Utils and other open source projects.

“Given the activity over several weeks, the committer is either directly involved or there was some quite severe compromise of their system,” Freund wrote in his initial email.

The open source community is now doing a mix of collaborative damage control, soul searching, and infighting over the backdoor, how to respond to it, and what it means for the broader open source ecosystem. The xz backdoor was seemingly caught before it made its way into major Linux distributions, which hopefully means that there will not be widespread damage caused by the backdoor. But it is, at best, a close call that Freund himself said was essentially “accidentally” discovered.

This is all important because huge parts of the internet and software infrastructure rely on free and open source software that is often maintained by volunteer software developers. This has always been a controversial and complicated state of affairs, because big tech companies take this software, use it in their products, and make a lot of money from them. Many of these open source codebases are maintained by a small number of people doing it on a volunteer basis, and many of these projects have complicated politics about who is allowed to be a maintainer and how a project should be maintained and developed. If a trusted maintainer of a critical open source codebase is actually a malicious hacker, vulnerabilities could be introduced into widely used, critical software and chaos could ensue.

Stamos noted that the backdoor “proves what everybody suspected about the supply-chain risks of OSS. Should hopefully drive some serious investment by the companies that profit from open-source to look for back doors using scalable means.”

The backdoor highlights open source software’s strengths and its weaknesses in that, well, everything is happening in the open.

While a malicious maintainer can commit code that introduces a backdoor, the community can also actively analyze the code and trace exactly what was introduced, when it was introduced, who did it, and what the code does. The project can (and is) rolling back its codebase to an earlier distribution before the vulnerability was introduced. The coding history and email arguments of that user can be traced over time, and the broader developer community can make educated guesses about how this all happened. As I’m writing this, coders are analyzing Jia Tan’s contributions to other projects and the political discussions in listservs that led to them becoming a trusted maintainer in the first place.

On the open source software security listserv, developers are trying to make sense of what happened, and are debating about how and when the discovery of the vulnerability should have been made public (the discovery was made one day before it was distributed to the broader listserv). Tavis Ormandy, a very famous white hat hacker and security researcher who works for Google, wrote on the listserv, “I would have argued for immediately discussing this in the open.”

“We’re delaying everybody else’s ability to react,” he added. Others argued that making the vulnerability known immediately could have incentivized attackers to exploit the bug, or could have allowed others to do so. On Mastodon, software developers are criticizing Microsoft and GitHub for taking down some of the affected code repositories as people are trying to analyze it.

“Hey, it’s totally cool that Microsoft GitHub blocked access to one of the repositories in the very center of the xz backdoor saga,” Michal Woźniak, a white hat hacker who was part of a team that discovered DRM in a Polish train earlier this year wrote on Mastodon. “It’s not like a bunch of people are scrambling to try to make sense of all the right now, or that specific commits got linked to directly from media and blogposts and the like. Cool, cool.” Other coders mused that Copilot, a subscription AI coding assistant created by GitHub, could have integrated some of the malicious code into its training data.

All of this discussion and many of these issues are not normally possible when a vulnerability is discovered in closed source software, which is kept private by the company and whose governance is determined by the companies releasing a product. And that’s what makes all of this so interesting. Not only is vulnerability mitigation being managed in public, but so is the culture, politics, supply chain, and economics that governs this type of critically important software. About the author

Jason is a cofounder of 404 Media. He was previously the editor-in-chief of Motherboard. He loves the Freedom of Information Act and surfing.
More from Jason Koebler

44
44 comments
  • I'll tell you what it highlights: giant companies like Google, Microsoft and all the others making billions using free software a few dudes maintain for them for free on their own time. Instead of speaking of the vulnerability of open source software, the profiteers should pay them to ensure they have the time and resources to secure their supply chain.

  • “Hey, it’s totally cool that Microsoft GitHub blocked access to one of the repositories in the very center of the xz backdoor saga,” Michal Woźniak, a white hat hacker who was part of a team that discovered DRM in a Polish train earlier this year wrote on Mastodon. “It’s not like a bunch of people are scrambling to try to make sense of all the right now, or that specific commits got linked to directly from media and blogposts and the like. Cool, cool.”

    Security teams that break stuff to mitigate risk and call it fixed is exactly what Linus's Do No Harm plea is about.

    Edit: It's still disabled

    Access to this repository has been disabled by GitHub Staff due to a violation of GitHub's terms of service.

  • I wonder what sort of mitigations we can take to prevent such kind of attacks, wherein someone contributes to an open-source project to gain trust and to ultimately work towards making users of that software vulnerable. Besides analyzing with bigger scrutiny other people's contributions (as the article mentioned), I don't see what else one could do. There are many ways vulnerabilities can be introduced and a lot of them are hard to spot (especially in C with stuff like undefined behavior and lack of modern safety features) , so I don't think "being more careful" is going to be enough.

    I imagine such attacks will become more common now, and that these kind of attacks could become very appealing for governments.

    • As far as I understand, in this case opaque binary test data was gradually added to the repository. Also the built binaries did not correspond 1:1 with the code in the repo due to some buildchain reasons. Stuff like this makes it difficult to spot deliberately placed bugs or backdors.

      I think some measures can be:

      • establish reproducible builds in CI/CD pipelines
      • ban opaque data from the repository. I read some people expressing justification for this test-data being opaque, but that is nonsense. There's no reason why you couldn't compress+decompress a lengthy creative commons text, or for binary data encrypt that text with a public password, or use a sequence from a pseudo random number generator with a known seed, or a past compiled binary of this very software, or ... or ... or ...
      • establish technologies that make it hard to place integer overflows or deliberately miss array ends. That would make it a lot harder to plant a misbehavement in the code without it being so obvious that others note easily. Rust, Linters, Valgrind etc. would be useful things for that.

      So I think from a technical perspective there are ways to at least give attackers a hard time when trying to place covert backdoors. The larger problem is likely who does the work, because scalability is just such a hard problem with open source. Ultimately I think we need to come together globally and bear this work with many shoulders. For example the "prossimo" project by the Internet Security Research Group (the organisation behind Let's Encrypt) is working on bringing memory safety to critical projects: https://www.memorysafety.org/ I also sincerely hope the german Sovereign Tech Fund ( https://www.sovereigntechfund.de/ ) takes this incident as a new angle to the outstanding work they're doing. And ultimately, we need many more such organisations and initiatives from both private companies as well as the public sector to protect the technology that runs our societies together.

    • In the sysadmin world, the current approach is to follow a zero-trust and defense-in-depth model. Basically you do not trust anything. You assume that there's already a bad actor/backdoor/vulnerability in your network, and so you work around mitigating that risk - using measures such as compartmentalisation and sandboxing (of data/users/servers/processes etc), role based access controls (RBAC), just-enough-access (JEA), just-in-time access (JIT), attack surface reduction etc.

      Then there's network level measures such as conditional access, and of course all the usual firewall and reverse-proxy tricks so you're never exposing a critical service such as ssh directly to the web. And to top it all off, auditing and monitoring - lots of it, combined with ML-backed EDR/XDR solutions that can automatically baseline what's "normal" across your network, and alert you of any abnormality. The move towards microservices and infrastructure-as-code is also interesting, because instead of running full-fledged VMs you're just running minimal, ephemeral containers that are constantly destroyed and rebuilt - so any possible malware wouldn't live very long and would have to work hard at persistence. Of course, it's still possible for malware to persist in a containerised environment, but again that's where the defense-in-depth and monitoring comes into play.

      So in the case of xz, say your hacker has access to ssh - so what? The box they got access to was just a jumphost, they can't get to anywhere else important without knowing what the right boxes and credentials are. And even if those credentials are compromised, with JEA/JIT/MFA, they're useless. And even if they're compromised, they'd only get access into a very specific box/area. And the more they traverse across the network, the greater the risk of leaving an audit trail or being spotted by the XDR.

      Naturally none of this is 100% bullet-proof, but then again, nothing is. But that's exactly what the zero-trust model aims to combat. This is the world we live in, where we can no longer assume something is 100% safe. Proprietary software users have been playing this game for a long time, it's about time we OSS users also employ the same threat model.

    • @Faresh 1.) Making it easier to analyze. There are multiple steps in the whole process which may be hiding an exploit. The "tarball-not-same-as-git" is a clear example. Sure, reviewing will still be necessary and it will still be difficult, but it doesn't have to be as difficult as today. 2.) stop giving maintainer rights, fork instead. That's what pull requests are for. 3.) we should be careful if our critical infrastructure depends on a hobby project - either pay, or don't depend.

  • software developers are criticizing Microsoft and GitHub for taking down some of the affected code repositories

    Surely it's sensible of Github to take down malicious code? It's not just honest, hardworking people trying to make sense of this that have eyes, it's others looking for inspiration from what appears to be a sophisticated and very dangerous supply chain attack.

    • It does make sense, to prevent automated tools from pulling it's code. But I do wish they kept it around, maybe I my viewable if you're logged in or something like that, but it seems they don't have the tools to do this.

  • Wrong. The XZ backdoor highlights the value of enterprise-style releases vs the supply-chain exploits attacking the source stream. Backporting fixes is hard; but the diffs are smaller and this kind of trojan stands out.

  • so do i just need to update this in my update manager

  • I hope you didn't just copypasta the whole article of a cooperative journalism website that is trying to not get their stuff eaten by LLMs by not having their articles open on the internet?

  • Naturally closed source for profit software is so much better and would never contain anything malicious. We know this for certain because the PR department affirmed us that there is nothing malicious or illegal within their code. There internal investigations found no proof of hacking from external sources, All code changes where done with the full legal permission of our Ceo and Overlord Marz Kucherberg (tm)

  • Before pointing to vulnerabilities of open source software in general, please always look into the details, who -and if so - "without any need" thus also maybe "why" introduced the actual attack vector in the first place. The strength of open source in action should not be seen as a deficit, especially not in such a context.

    To me it looks like an evilish company has put lots of efforts over many years to inject its very own overall steady attack-vector-increase by "otherwise" needless increase of indroduction of uncounted dependencies into many distros.

    such a 'needless' dependency is liblzma for ssh:

    https://lwn.net/ml/oss-security/20240329155126.kjjfduxw2yrlxgzm@awork3.anarazel.de/

    openssh does not directly use liblzma. However debian and several other distributions patch openssh to support systemd notification, and libsystemd does depend on lzma.

    ... and that was were and how the attack then surprisingly* "happened"

    I consider the attack vector here to have been the superlfous systemd with its excessive dependency cancer. Thus result of using a Microsoft-alike product. Using M$-alike code, what would one expect to get?

    *) no surprises here, let me predict that we will see more of their attack vectors in action in the future: as an example have a look at the init process, systemd changed it into a 'network' reachable service. And look at all the "cute" capabilities it was designed to "need" ;-)

    however distributions free of microsoft(-ish) systemd are available for all who do not want to get the "microsoft experience" in otherwise security driven** distros

    **) like doing privilege separation instead of the exact opposite by "design"

    • How is SystemD a Microsoft product?

      • looking at the official timeline it is not completely a microsoft product, but...

        1. microsoft hated all of linux/open source for ages, even publicly called it a cancer etc.
        2. microsoft suddenly stopped it's hatespeech after the long-term "ineffectivenes" (as in not destroying) of its actions against the open source world became obvious by time
        3. systemd appeared on stage
        4. everything within systemd is microsoft style, journald is literally microsoft logging, how services are "managed" started etc is exactly the flawed microsoft service management, how systemd was pushed to distributions is similar to how microsoft pushes things to its victi... eh.. "custumers", systemd breaks its promises like microsoft does (i.e. it has never been a drop-in-replacement, like microsoft claimed its OS to be secure while making actual use of separation of users from admins i.e. by filesystem permissions first "really" in 2007 with the need of an extra click, where unix already used permissions for such protection in 1973), systemd causes chaos and removes the deterministic behaviour from linux distributions (i.e. before systemd windows was the only operating system that would show different errors at different times during installtion on the very same perfectly working hardware, now on systemd distros similar chaos can be observed too). there AFAIK still does not exist a definition of the 'binary" protocol of journald, every normal open source project would have done that official definition in the first place, systemd developers statement was like "we take care for it, just use our libraries" wich is microsoft style saying "use our products", the superflous systems features do harm more than they help (journald's "protection" from log flooding use like 50% cpu cycles for huge amount of wanted and normal logs while a sane logging system would be happily only using 3%cpu for the very same amount of logs/second whilst 'not' throwing away single log lines like journald, thus journald exhaustively and pointlessly abuses system resources for features that do more harm where they are said to help with in the first place), making the init process a network reachable service looks to me like as bad as microsoft once put its web rendering enginge (iis) into kernelspace to be a bit faster but still beeing slower than apache while adding insecurity that later was an abused attack vector. systemd adding pointless dependencies all along the way like microsoft does with its official products to put some force on its customers for whatever official reason they like best. systemd beeing pushed to distributions with a lot of force and damage even to distributions that had this type of freedom of choice to NOT force their users to use a specific init system in its very roots (and the push to place systemd inside of those distros even was pushed furzher to circumvent the unstable->testing->stable rules like microsoft does with its patches i.e.), this list is very far from complete and still no end is in sight.
        5. "the" systemd developer is finally officially hired by microsoft

        i said that systemd was a microsoft product long before its developer was then hired by microsoft in 2022. And even if he wasn't hired by them, systemd is still a microsoft-style product in every important way with all what is wrong in how microsoft does things wrong, beginning with design flaws, added insecurities and unneeded attack vectors, added performance issues, false promises, usage bugs (like i've never seen an already just logged in user to be directly be logged off in a linux system, except for when systemd wants to stop-start something in background because of it's 'fk y' and where one would 'just try to login again and dont think about it" like with any other of microsofts shitware), ending in insecure and instable systems where one has to "hope" that "the providers" will take care for it without continueing to add even more superflous features, attack vectors etc. as they always did until now.

        systemd is in every way i care about a microsoft product. And systemd's attack vectors by "needless dependencies" just have been added to the list of "prooven" (not only predicted) to be as bad as any M$ product in this regard.

        I would not go as far to say that this specific attack was done by microsoft itself (how could i ?), but i consider it a possibility given the facts that they once publicly named linux/open source a "cancer" and now their "sudden" change to "support the open source world" looks to me like the poison "Gríma" used on "Théoden" as well as some other observations and interpretations. however i strongly believe that microsoft secretly actually "likes" every single damage any of systemd's pointlessly added dependencies or other flaws could do to linux/open source very much. and why shouldn't they like any damage that was done to any of their obvious opponents (as in money-gain and "dictatorship"-power)? it's a us company, what would one expect?

        And if you want to argue that systemd is not "officially" a product of the microsoft company.... well people also say "i googled it" when they mean "i used one of the search engines actually better than google.com" same with other things like "tempo" or "zewa" where i live. since the systemd developer works for microsoft and it seems he works on systemd as part of this work contract, and given all the microsoft style flaws within from the beginning, i consider systemd a product of microsoft. i think systemd overall also "has components" of apple products, but these are IMHO none of technical nature and thus far from beeing part of the discussion here and also apple does not produce "even more systemd" also apple has -as of my experience- very other flaws i did not encounter in systemd (yet?) thus it's clearly not an apple product.

You've viewed 44 comments.