Discussion
woodruffw: > Let me rephrase this, 17% of the most popular Rust packages contain code that virtually nobody knows what it does (I can't imagine about the long tail which receives less attention).
I think this post has some good information in it, but this is essentially overstated: I look at crate discrepancies pretty often as part of reviewing dependency updates, and >90% of the time it's a single line difference (like a timestamp, hash, or some other shudder between the state of the tree at tag-time and the state at release-time). These are non-ideal from a consistency perspective, but they aren't cause for this degree of alarm -- we do know what the code does, because the discrepancies are often trivial.
ethanj8011: Isn't the point that unless actually audited each time, the code could still be effectively anything?
woodruffw: Yes, but that's already the case. My point was that in practice the current discrepancies observed don't represent a complete disconnect between the ground truth (the source repo) and the package index, they tend to be minor. So describing the situation as "nobody knows what 17% of the top crates.io packages do" is an overstatement.
dralley: I think it just depends on whether or not you interpret the phrase "no one knows" neutrally or pessimistically. Saying that there could be something there, but "no one knows" doesn't mean that there is something there. But it's still true.
woodruffw: If that's the case, it would be a lot simpler (and equally accurate) to say that "no one knows" what the source repo is doing, either! The median consumer of packages in any packaging ecosystem is absolutely not reading the entire source code of their dependencies, in either the ground truth or index form.
bcjdjsndon: But it's impossible to have a buffet overflow in rust
CoastalCoder: > But it's impossible to have a buffet overflow in rust
I dunno, I can only listen to Margaritaville so many times in a row.
amelius: Rust should add a way to sandbox every dependency. It's basically what we're already doing in our OSes (mobile at least), but now it should happen on the level of submodules.
echelon: For your serious consideration: Claude Mythos is going to change the risk envelope of this problem. We're still thinking in the old mindset, whereas new tools are going to change how all of this is done. In some years dependencies will undergo various types of automated vetting. We need to think about how to scale this problem instead. We're not ready for it.
lesuorac: Eh, the only way to secure your Rust programs is the technique not described in the article. Vendor your dependencies. Download the source and serve it via your own repository (ex. [1]). For dependencies that you feel should be part of the "Standard Library" (i.e. crates developed by the Rust team but not included into std) don't bother to audit them. For the other sources, read the code and decide if it's safe. I'm honestly starting to regret not starting a company like 7 years ago where all I do is read OSS code and host libraries I've audited (for a fee to the end-user of course). This was more relevant for USG type work where using code sourced from an American is materially different than code sourced from a non-American.
[1]: https://docs.gitea.com/usage/packages/cargo
whytevuhuni: The only thing this leads to is that you'll have hundreds of vendored dependencies, with a combined size impossible to audit yourself. But if you somehow do manage that, then you'll soon have hundreds of outdated vendored dependencies, full of unpatched security issues.
echelon: A large number of security issues in the supply chain are found in the weeks or months after library version bumps. Simply waiting six months to update dependency versions can skip these. It allows time to pass and for the dependency changes to receive more eyeballs. Vendoring buys an additional layer of security. When everyone has Claude Mythos, we can self-audit our supply chain in an automated fashion.
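A minimal sketch of the "wait before updating" policy described in the comment above, assuming you already have each release's age from somewhere. The `Release` struct, the `newest_aged_release` helper, and the 180-day threshold are hypothetical names and values chosen for illustration; they are not part of Cargo or crates.io.

```rust
/// A published release of a dependency (hypothetical shape, for illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Release {
    version: String,
    /// Days since this release was published.
    age_days: u32,
}

/// Returns the newest release that has been public for at least `min_age_days`.
/// Assumes `releases` is ordered from oldest to newest.
fn newest_aged_release(releases: &[Release], min_age_days: u32) -> Option<&Release> {
    releases.iter().rev().find(|r| r.age_days >= min_age_days)
}

fn main() {
    let releases = vec![
        Release { version: "1.2.0".to_string(), age_days: 400 },
        Release { version: "1.3.0".to_string(), age_days: 200 },
        Release { version: "1.4.0".to_string(), age_days: 12 },
    ];

    // Roughly six months, as suggested in the comment; an assumed threshold.
    match newest_aged_release(&releases, 180) {
        Some(r) => println!("pin to {}", r.version),
        None => println!("no release is old enough yet"),
    }
}
```

The trade-off raised in the parent comment still applies: a cooldown also delays security fixes, so a real policy would probably need an exception path for patched vulnerabilities.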
petcat: How would that work? Rust "crates" are just a compilation unit that gets linked into the resulting binary.
tasuki: > In a recent analysis, Adam Harvey found that among the 999 most popular crates on crates.io, around 17% contained code that do not match their code repository.
Huh, how is this possible? Is the code not pulled from the repository? Why not?
duped: Publishing doesn't go through GitHub or another forge, it's done from the local machine. Crates can contain generated code as well.
nyc_pizzadev: Random question, does cargo have a way to identify if a package uses unsafe Rust code?
woodruffw: No, but you can use cargo-geiger[1] or siderophile[2] for that.
[1]: https://github.com/geiger-rs/cargo-geiger
[2]: https://github.com/trailofbits/siderophile
bluGill: That is why you mix in "Something So Feminine About A Mandolin" once in a while. Or if you really insist on only very well known tunes, "Cheeseburger in Paradise" should still count.
downrightmike: Also note that the .rs domain belongs to Serbia, which could simply redirect all Rust users to malicious domains in a supply chain attack.
sgbeal: > So describing the situation as "nobody knows what 17% of the top crates.io packages do" is an overstatement.
Noting that you willfully cut the qualifying "virtually" from that quote, thereby transforming it to over-stated:
> Let me rephrase this, 17% of the most popular Rust packages contain code that virtually nobody knows what it does
lukeschlather: > Let me rephrase this, 17% of the most popular Rust packages contain code that virtually nobody knows what it does (I can't imagine about the long tail which receives less attention).
I dug into the linked article, and I would really say this means something closer to 17% of the most popular Rust package versions are either unbuildable or have some weird quirks that make building them not work the way you expect, and not in a remotely reproducible fashion.
https://lawngno.me/blog/2024/06/10/divine-provenance.html
Pulling things into the standard lib is fine if you think everyone should stop using packages entirely, but that doesn't seem like it really does anything to solve the actual problem. There are a number of things it seems like we might be forced to adopt across the board very soon, and for Rust it seems tractable, but I shudder to think about doing it for messier languages like Ruby, Python, Perl, etc.
* Reproducible builds seem like the first thing.
* This means you can't pull in git submodules or anything from the Internet during your build.
* Specifically for the issues in this post, we're going to need proactive security scanners. One thing I could imagine is if a company funnels all their packages through a proxy, you could have a service that goes and attempts to rebuild the package from source, and flags differences. This requires the builds to be remotely reproducible.
* Maybe the latest LLMs like Claude Mythos are smart enough that you don't need reproducible builds, and you can ask some LLM agent workflow to review the discrepancies between the repo and the actual package version.
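A rough sketch of the "rebuild and flag differences" check proposed in the comment above, assuming both the repository checkout and the published package have already been unpacked into path-to-contents maps. The `Tree` alias, the `Discrepancy` enum, and the `diff_trees` and `tree` helpers are hypothetical names for illustration; a real service would also have to fetch the crate, check out the matching tag, and account for files that Cargo rewrites or generates when packaging.

```rust
use std::collections::BTreeMap;

/// File path -> file contents. A stand-in for an unpacked source tree (assumption).
type Tree = BTreeMap<String, Vec<u8>>;

/// One mismatch between the repository checkout and the published package.
#[derive(Debug, PartialEq, Eq)]
enum Discrepancy {
    OnlyInRepo(String),
    OnlyInPackage(String),
    ContentDiffers(String),
}

/// Compares two trees file by file and lists every mismatch.
/// Repo-side findings come first (in path order), then package-only files.
fn diff_trees(repo: &Tree, package: &Tree) -> Vec<Discrepancy> {
    let mut out = Vec::new();
    for (path, repo_bytes) in repo {
        match package.get(path) {
            None => out.push(Discrepancy::OnlyInRepo(path.clone())),
            Some(pkg_bytes) if pkg_bytes != repo_bytes => {
                out.push(Discrepancy::ContentDiffers(path.clone()))
            }
            Some(_) => {} // identical, nothing to flag
        }
    }
    for path in package.keys() {
        if !repo.contains_key(path) {
            out.push(Discrepancy::OnlyInPackage(path.clone()));
        }
    }
    out
}

/// Small helper to build a tree from string literals.
fn tree(files: &[(&str, &str)]) -> Tree {
    files
        .iter()
        .map(|(path, body)| (path.to_string(), body.as_bytes().to_vec()))
        .collect()
}

fn main() {
    let repo = tree(&[("README.md", "docs"), ("src/lib.rs", "pub fn f() {}")]);
    let package = tree(&[("build.rs", "fn main() {}"), ("src/lib.rs", "pub fn f() { evil() }")]);

    for d in diff_trees(&repo, &package) {
        println!("{:?}", d);
    }
}
```

As noted earlier in the thread, most real discrepancies are trivial, so a practical scanner would likely need an allowlist for expected differences rather than flagging every byte-level mismatch.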