- cross-posted to:
- programming@programming.dev
Why can’t we have nice things instead.
I mean, this kind of stuff was going to happen.
The more-important and more-widely-used open source software is, the more appealing supply-chain attacks against it are.
The world where it doesn’t happen is one where open source doesn’t become successful.
I expect that we’ll find ways to mitigate stuff like this. Run a lot more software in isolation, have automated checking stuff, make more use of developer reputation, have automated code analysis, have better ways to monitor system changes, have some kind of “trust metric” on packages.
Go back to the 1990s, and most everything I sent online was unencrypted. In 2024, most traffic I send is encrypted. I imagine that changes can be made here too.
Yeah, I think there is a lot of potential for code analysis. There’s a limited cross section of ways malware can do interesting things, but many permutations of ways to do that.
So look for the interesting things, like:
- accessing other programs’ address spaces
- reading/writing files
- deleting/moving files
- sending/receiving network traffic
- os system calls and console commands
- interacting with hardware
- spawning new processes
- displaying things on the screen
- accessing timing information
Obviously there’s legitimate uses for each of these, so that’s just the first step.
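To make that first step concrete, here's a rough sketch in Python that just flags which "interesting" capabilities a piece of source code imports. The `WATCHED_MODULES` table and the `find_capabilities` name are made up for illustration, and a real tool would need to look at calls and data flow too, not only imports.

```python
import ast

# Hypothetical watch list: top-level module name -> capability label.
WATCHED_MODULES = {
    "socket": "network",
    "urllib": "network",
    "subprocess": "process spawning",
    "shutil": "file delete/move",
    "ctypes": "native memory access",
    "time": "timing",
}


def find_capabilities(source: str) -> dict[str, set[int]]:
    """Map each capability label to the line numbers whose imports suggest it."""
    found: dict[str, set[int]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            label = WATCHED_MODULES.get(module.split(".")[0])
            if label is not None:
                found.setdefault(label, set()).add(node.lineno)
    return found


sample = "import socket\nfrom urllib.request import urlopen\nimport json\n"
report = find_capabilities(sample)
print(report)
```

A hit here only means "look closer", since plenty of legitimate code imports these modules.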
Next, analyze the data that is being used for that:
- what’s the source?
- what’s the destination?
- what kind of transformations are being applied to the data?
Then you can watch out for things like:
- is it systematically going through directories and doing some operation to all files? (Maybe ransomware, data scrubbing, or just maliciously deleting stuff?)
- is it grabbing data from somewhere and sending it somewhere else on the internet? (Stealing data?)
- is it using timing information to build data? (Timing attacks to figure out kernel data that should be hidden?)
- is it changing OS settings/setup?
Then generate a report of everything it is doing and see if it aligns with what the code is supposed to do. Or you could even build some kind of permissions system around that with more sophistication than the basic “can this app access files? How about the internet?”
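As a toy version of that "does the report match what the program is supposed to do" check, something like this could work. The `Observation` type, the `unexpected` function, the capability labels, and the example paths and hosts are all invented for the sketch. It assumes you already have a list of observed behaviours from some analysis step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    capability: str  # e.g. "file read", "network send"
    target: str      # path, host, etc.


def unexpected(observed: list[Observation],
               declared: dict[str, list[str]]) -> list[Observation]:
    """Observations whose target is not covered by a declared prefix."""
    flagged = []
    for obs in observed:
        allowed_prefixes = declared.get(obs.capability, [])
        if not any(obs.target.startswith(prefix) for prefix in allowed_prefixes):
            flagged.append(obs)
    return flagged


# Hypothetical manifest for a file sharing program.
declared = {
    "file read": ["/home/user/shared/"],
    "network send": ["tracker.example.org"],
}
observed = [
    Observation("file read", "/home/user/shared/movie.mkv"),
    Observation("file read", "/etc/passwd"),
    Observation("network send", "203.0.113.7"),
]
flagged = unexpected(observed, declared)
for obs in flagged:
    print("unexpected:", obs.capability, obs.target)
```

Plain prefix matching is too naive for real paths (think `..` tricks), so treat it as a shape for the idea rather than a permissions system.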
Computer programs can be complex, but they are ultimately made up of a series of simple operations. It’s possible to build an interpreter that performs those operations and follows everything through, so you can see exactly what is included in the massive amount of data a program sends over the network. That way you can tell that your file sharing program is also, for some reason, sending /etc/passwd to a random address, or listening for something to access a sequence of closed ports and then doing x, y, z if that ever happens. Back doors could be obvious with the right analysis tools, especially if the program is being built from source code (though I believe it’s still possible with binaries, just maybe a bit harder).
The Jia Tan xz backdoor attack did get flagged by some automated analysis tools – they had to get the analysis tools modified so that it would pass – and that was a pretty sophisticated attack. The people running the testing didn’t catch it, trusted the Jia Tan group that it was a false positive that needed to be fixed, but it was still putting up warning lights.
More sophisticated attackers will probably replicate the code analysis environments they know of online, run whatever analysis tools they can locally before making the code visible, and tweak the code until it passes. But I think that it definitely raises the bar.
Could have some analysis tools that aren’t made public but run against important public code repositories specifically to try to make this more difficult.
I believe you. There is no AI ever made that could have as bad a grammar as you. ;)
Because people have forgotten that bad actors exist.
I mean programming language package managers are just begging to be used as an attack vector. This is why package management should be an OS responsibility across the board and only trusted package sources and publishers should ever be allowed.
Or at the very fucking least require specific versions with checksums, like golang.
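For anyone who hasn't seen the idea, checksum pinning boils down to something like the following. This is a generic SHA-256 sketch, not the actual format Go uses in its `go.sum` file, and the function names and sample package contents are made up.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, pinned_digest: str) -> bool:
    """True only if the downloaded bytes hash to the pinned digest."""
    return hmac.compare_digest(sha256_hex(data), pinned_digest.lower())


# Hypothetical package contents, with the digest recorded when first reviewed.
original = b"module.exports = (n) => n % 2 === 0;\n"
pinned = sha256_hex(original)

# Same version number, different bytes: the check fails.
tampered = original + b"// injected payload\n"

print(verify_artifact(original, pinned))
print(verify_artifact(tampered, pinned))
```

This catches a published version being swapped out from under you. It does nothing if the version you pinned was malicious to begin with.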
I really think every package repository should be opt in, and every publisher should be required to verify their identity, along with checksum verification for the downloaded files.
There’s also the alternative of making your own library. I’m happy to use a minimal amount of 3rd-party code and just make my own instead.
I’m not sure I understand what you are saying. What part of the OS should manage the packages? The creators (aka Microsoft/Linux Foundation/Apple/Google), the distributor, or a kernel module? What about cross-platform package managers like NuGet, Gradle, npm?
What part of the OS should manage the packages?
The OS package manager. This is already a thing with Python in apt and pacman, where it will give you a fat warning if you try to install a package through `pip` instead of the actual OS package manager (i.e. `pacman -Syu python-numpy` instead of `pip install numpy`).
Tale as old as time.
🎶 Beauty and the beast
I don’t know much about NPM (having avoided JS as much as possible for my entire life), but golang seems to have a good solution: ‘vendoring’. One can choose to lock all external dependencies to local snapshots brought into a project, with no automatic updating, but with the option to manually update them when desired.
NPM has that as well. In fact most languages and build tools support that. It’s actually rare to not have support for that these days.
Yes. I can’t imagine being foolish enough to automatically update your external dependencies when you don’t need to.
Ah, good. I wonder why it isn’t used more often – this wouldn’t be such a huge problem then I would hope. (Let me guess – ‘convenience’, the archenemy of security.)
Because it doesn’t really solve much. After every update of external libraries, do you go through all the diffs to see if there is malicious code? Of course you don’t. And even if you would, it’s not even always possible to spot it. So all locking packages does is postpone the problem to when you eventually update. As an added bonus, you’re now vulnerable to all the legitimate issues that get fixed in those updates you’re not installing regularly.
As LiPoly said, it doesn’t really solve the problem. It’s not useless, it does accomplish something, but not that. Locking dependencies isn’t a security thing, it’s a reproducible builds thing. You can accomplish that by just using a traditional static version of everything, but now you’ve got a maintenance headache as you’re constantly needing to go in and update your dependency versions. You could instead use version ranging, but now you never actually know which version of a dependency any given build is going to end up using. Locking allows you to have the best of both worlds.
To understand how this works, let’s take a look at a hypothetical. Let’s say you have a code base and a CI/CD setup. Additionally, you’re using git-flow style release management, where your release version is in master, your active development is in develop, and your feature work is done in feature branches. What you do is set up your version ranges to cover what the semantic versions of things say should be compatible (generally locked major version, and possibly locked minor depending on the library, but unlocked patch). In your CI builds of develop and feature branches, you include a step that updates your lock file to the latest version of each library matching your version range. This ensures that your develop and feature branches are always using the latest version of dependencies. For your master branch, though, the CI job only builds; it never updates the lock file. This means that when you merge a release to master, your dependencies are effectively frozen. Every build from the master branch is guaranteed to use exactly the same versions of dependencies as when you merged it.
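Here's a small Python simulation of that workflow, in case it helps. Everything in it is hypothetical: the `registry` contents, the "range" being just a pinned major version, and the `ci_build` function standing in for the CI job. Real package managers have much richer range syntax.

```python
def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def latest_compatible(available: list[str], major: int) -> str:
    """Newest available version that keeps the given major version."""
    candidates = [v for v in available if parse(v)[0] == major]
    return max(candidates, key=parse)  # raises ValueError if nothing matches


def ci_build(branch: str, lock: dict[str, str], ranges: dict[str, int],
             registry: dict[str, list[str]]) -> dict[str, str]:
    """Return the lock file contents a build on `branch` would use."""
    if branch == "master":
        return dict(lock)  # release builds never touch the lock file
    # develop/feature builds refresh the lock within the allowed range
    return {name: latest_compatible(registry[name], major)
            for name, major in ranges.items()}


registry = {"famouslib": ["1.2.0", "1.2.1", "1.3.0", "2.0.0"]}
ranges = {"famouslib": 1}            # "any 1.x"
lock = {"famouslib": "1.2.0"}        # what was merged to master

develop_lock = ci_build("develop", lock, ranges, registry)
master_lock = ci_build("master", lock, ranges, registry)
print(develop_lock, master_lock)
```

The develop build picks up 1.3.0 (newest 1.x, never 2.0.0), while master keeps building against the 1.2.0 it was released with.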
I don’t think that that’s a counter to the specific attack described in the article:
The malicious packages have names that are similar to legitimate ones for the Puppeteer and Bignum.js code libraries and for various libraries for working with cryptocurrency.
That’d be a counter if you have some known-good version of a package and are worried about updates containing malicious software.
But in the described attack, they’re not trying to push malicious software into legitimate packages. They’re hoping that a dev will accidentally use the wrong package (which presumably is malicious from the get-go).
That won’t prevent typo squatting. This article is about people wanting to add a dependency on “famousLib” and instead typing “famusLib”.
What probably helps more in Go is the lack of a central repo, so you actually need to “go get github.com/whoever…”, which makes typo squatting a bit more complicated.
On the other hand, it would be an easy fix in NPM, since it’s centralized: simply add a check on library names and reject names that are too similar to existing ones.
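A name similarity check at publish time could look roughly like this. It's a sketch using plain Levenshtein edit distance with a threshold of one edit. I'm not claiming this is how NPM or any other registry does it, and `too_similar` plus the sample package set are made up.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def too_similar(candidate: str, existing: set[str],
                max_distance: int = 1) -> list[str]:
    """Existing names within max_distance edits (exact matches excluded)."""
    name = candidate.lower()
    return sorted(e for e in existing
                  if 0 < edit_distance(name, e.lower()) <= max_distance)


existing = {"famousLib", "puppeteer", "bignumber.js"}
print(too_similar("famusLib", existing))
print(too_similar("pupeteer", existing))
print(too_similar("totally-new-thing", existing))
```

The hard part in practice is the false positives: lots of legitimate packages have names one character apart, so a registry would probably want this to trigger a manual review, not an outright rejection.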
I’m with you but I have regrettably been sucked into the node-i-verse against my will.
Good thing people expect no promises of security from javascript anyway.
I just love async await
That’s also why I always use dev containers
Does anyone know how JSR and Deno would do in this type of attack?
This should kill off NPM
You’d be surprised to see how many common libraries have vulnerabilities every week.
As well as how many common JS libraries, while not malicious, have no business existing (e.g. IsEven).
Why stop there? Let’s just kill JS in its entirety.
Not really a language-specific problem. Like, there are numerous languages that have distribution mechanisms for libraries that might potentially be malicious.
Only way I can think that the language might be a factor would be if a language were designed to only run in a restricted mode.
Not really a language-specific problem, but why should that stop us from this goal?
Exactly
You must be very smart.