Discussion
Cybersecurity Looks Like Proof of Work Now
chromacity: I discussed this in more detail in one of my earlier comments, but I think the article commits a category error. In commercial (i.e., enterprise) settings, the bulk of infosec work has very little to do with looking for vulnerabilities in code. In fact, security programs built on the idea that you need to find and patch every security hole in your codebase were basically busted long before LLMs.
nickdothutton: Although not an escape from the "who can spend the most on tokens" arms race, there is also the possibility of making reverse engineering and executable analysis more difficult. This increases the attacker's token spend, if nothing else. I wonder if dev teams will take an interest. Better to write good, high-quality, properly architected and tested software in the first place, of course. Edited for typo.
snowwrestler: It looks like proof of work because:

> Worryingly, none of the models given a 100M budget showed signs of diminishing returns. “Models continue making progress with increased token budgets across the token budgets tested,” AISI notes.

So the author infers a durable, direct correlation between token spend and attack success. Thus you will need to spend more tokens than your attackers to find your vulnerabilities first.

However, it is worth noting that this study was of a 32-step network intrusion, which only one model (Mythos) was even able to complete at all. That's an incredibly complex task. Is the same true for pointing Mythos at a relatively simple single code library? My intuition is that there is probably a point of diminishing returns, and that it is closer for simpler tasks.

In this world, popular open source projects will probably see higher aggregate token spend by both defenders and attackers, and thus might approach the point of diminishing returns faster. If there is one.
jp0001: I'm starting to think that Opus and Mythos are the same model (or collection of models), except that Mythos has better backend workflows than Opus 4.6. I have not used Mythos, but at work I have a five-figure monthly token budget to find vulnerabilities in closed-source code. I'm interested in Mythos and will use it when it's available, but for now I'm trying to reverse engineer how I can get the same output with Opus 4.6, and the answer, to me, is more tokens.
zitterbewegung: Cybersecurity, if you listen to everyone regurgitating the usual lines, is all about zero days and new security flaws being a big problem. Cybersecurity as it is right now is more a combination of social engineering attacks applied to open source projects, software package managers, and source code repo hosts. This has been going on for a while, even before AI got into the mix, and it will stay that way.
somesortofthing: There's still the question of access to the codebase. By all accounts, the best LLM cyber scanning approaches are really primitive: it's just a bash script that goes through every single file in the codebase and, for each one, runs a "find the vulns here" prompt. The attacker usually has even less access than this; in the beginning, they have network tools, an undocumented API, and maybe some binaries.

You can do a lot better efficiency-wise if you control the source end-to-end, though. You already group logically related changes into PRs, so you can save on scanning by asking the LLM to only look over the files you've changed. If you're touching security-relevant code, you can ask it for more per-file effort than the attacker might put into their own scanning. You can even run the big bulk scans an attacker might, on a fixed schedule: each attacker has to run their own scan, while you only need to run your one scan to find everything they would have. There's a massive cost asymmetry between the "hardening" phase for the defender and the "discovering exploits" phase for the attacker.

Exploitability also isn't binary: even if the attacker is better-resourced than you, they need to find a whole chain of exploits in your system, while you only need to fix the weakest link in that chain.

If you boil security down to just a contest of who can burn more tokens, defenders get efficiency advantages only the best-resourced attackers can overcome. On net, public access to Mythos-tier models will make software more secure.
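(The primitive per-file scanning loop and the changed-files-only optimization described above can be sketched in a few lines. `query_llm` is a hypothetical stand-in for a real model call, not an API from any actual product; the paths are made up.)

```python
# Sketch of the primitive per-file "find the vulns here" loop, plus the
# defender-side optimization: only scan files a PR touched, with extra
# effort on security-relevant paths, instead of bulk-scanning the tree.
# query_llm() is a placeholder, not a real API.

def query_llm(prompt: str) -> str:
    """Hypothetical model call; returns a canned response in this sketch."""
    return "no findings"

def scan_files(paths, per_file_effort="low"):
    """Run one 'find the vulns here' prompt per file path."""
    findings = {}
    for path in paths:
        prompt = f"[effort={per_file_effort}] Find the vulnerabilities in {path}"
        findings[path] = query_llm(prompt)
    return findings

# Defender advantage: the PR already groups logically related changes,
# so the scan set is tiny compared to an attacker's full-tree sweep.
changed = ["auth/login.py", "README.md"]
sensitive = [p for p in changed if p.startswith("auth/")]
results = scan_files(sensitive, per_file_effort="high")
```

The cost asymmetry shows up in the size of the `paths` argument: the defender scans a handful of changed files per PR, while each attacker pays for the whole tree.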
Retr0id: Tokens can also be burnt on decompilation.
tptacek: Yes, and it apparently burns lots of tokens. But what I've heard is that the outcomes are drastically less expensive than hand-reversing was, when you account for labor costs.
tptacek: It looks like it, but it isn't. It's the work itself that's valued in software security, not the amount of it you managed to do. The economics are fundamentally different.

Put more simply: to keep your system secure, you need to be fixing vulnerabilities faster than they're being discovered. The token count is irrelevant.

Moreover: this shift is happening because the automated work is outpacing humans for the same outcome. If you could get the same results by hand, they'd count! A sev:crit is a sev:crit is a sev:crit.
Muromec: Commercial infosec is deleting Firefox from developer machines because it's not secure, and explaining to muggles why they shouldn't commit secret material to the code repository. That, and blocking my ssh access to my home router, of course.
jerf: I've said for decades that, in principle, cybersecurity is advantage defender. The defender has to leave a hole; the attackers have to find it. We just live in a world with so many holes that dedicated attackers rarely end up bottlenecked on finding them, so in practice it ends up advantage attacker.

There is at least a possibility that a code base can be secured by a (practically) finite number of tokens, for reasonable amounts of money, until there are no more holes in it.

This also reminds me of what I wrote here: https://jerf.org/iri/post/2026/what_value_code_in_ai_era/ There's still value in code tested by the real world, and in an era of "free code" that may become even more true than it is now, rather than less true, as intuition might first suggest. There is no amount of testing you can do that is equivalent to being in the real world, AI-empowered attackers and all.
mapontosevenths: > in principle, cybersecurity is advantage defender

I disagree. The defender must be right every single time. The attacker only has to get lucky, and thanks to scale they can try all day, every day in most large organizations.
tptacek: The attacker and defender have different constant factors, and, up until very recently, constant factors dominated the analysis.
smj-edison: I'm curious to see whether formally verified software will get more popular. I'm somewhat doubtful, since getting programmers to learn formal math is hard (rightfully so, but still sad). But if LLMs could take over the drudgery of writing proofs in a lot of cases, there might be something there.
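(For a taste of the drudgery in question: even a simple property, like associativity of list append, needs an explicit machine-checked proof in a system like Lean. A toy sketch, not tied to anything in the article:)

```lean
-- Toy example of proof drudgery: list append is associative.
-- Every such obligation has to be discharged explicitly (or by tactics),
-- which is exactly the busywork LLMs might absorb.
theorem append_assoc' (xs ys zs : List α) :
    (xs ++ ys) ++ zs = xs ++ (ys ++ zs) := by
  induction xs with
  | nil => rfl
  | cons x xs ih => simp [ih]
```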
stringfood: I am so exhausted by being asked to learn difficult and frankly confusing topics - the fact that it is so hard and so humbling to learn these topics is exactly why everyone is so happy to let AI think about formal programming while I focus on getting Jersey Shore season 2 loaded into my Plex server. It's the one where Pauly D breaks up with Shelli.
btown: The problem, though, is that this turns "one of our developers was hit by a supply chain attack that never hit prod, we wiped their computer and rotated keys, and it's not like we're a big target for the attacker to make much use of anything they exfiltrated..." into "now our entire source code has been exfiltrated and, even with rudimentary line-by-line scanning, will be automatically audited for privilege escalation opportunities within hours."Taken to an extreme, the end result is a dark forest. I don't like what that means for entrepreneurship generally.
umvi: > You don’t get points for being clever. You win by paying more.

And yet... Wireguard was written by one guy, while OpenVPN is written by a big team. One code base is orders of magnitude bigger than the other. Which should I bet LLMs will find more cybersecurity problems with? My vote is on OpenVPN, despite it being the less clever and "more money thrown at" solution.

So yes, I do think you get points for being clever, assuming you are competent. If you are clever enough to build a solution that's much smaller/simpler than your competition, you can also get away with spending less on cybersecurity audits (be they LLM tokens or not).
coldtea: Not to mention that an attacker motivated by financial gain doesn't even need a particular target. Any vulnerable defender they find will do.
singpolyma3: If you run this long enough, presumably it will find every exploit, and you patch them all and run it again to find exploits in your patches, until there simply... are no exploits?
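(The scan-patch-rescan idea is a fixed-point loop. A toy sketch, with `find_exploits` and `patch` as made-up stand-ins for real tools; the re-scan on every pass is there because patches could themselves introduce new flaws:)

```python
# Fixed-point sketch of "scan, patch, re-scan until nothing is found".
# find_exploits() and patch() are toy placeholders, not real tools.

def find_exploits(codebase):
    """Toy scanner: flags any line marked as a planted vulnerability."""
    return {line for line in codebase if line.startswith("VULN")}

def patch(codebase, exploits):
    """Toy fix: drop the flagged lines."""
    return [line for line in codebase if line not in exploits]

def harden(codebase, max_rounds=10):
    for _ in range(max_rounds):
        exploits = find_exploits(codebase)
        if not exploits:
            break  # fixed point: the scanner finds nothing more
        codebase = patch(codebase, exploits)
    return codebase

hardened = harden(["VULN-1", "lib.py", "VULN-2", "app.py"])
```

Whether the loop ever terminates in practice is exactly the open question: `max_rounds` is doing a lot of work here.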
j2kun: The article heavily quotes the "AI Security Institute" as a third-party analysis. It was the first I'd heard of them, so I looked up their about page, and it appears to be primarily people from the AI industry (former DeepMind/OpenAI staff, etc.), with no folks from the security industry mentioned. So while the security landscape is clearly evolving (cf. also Big Sleep and Project Zero), the conclusion of "to harden a system we need to spend more tokens" sounds like yet more AI boosting from a different angle. It raises the question of why no alternatives (like formal verification) are mentioned in the article or the AISI report.

I wouldn't be surprised if NVIDIA picked up this talking point to sell more GPUs.
tptacek: I would be interested in which notable security researchers you can find to take the other side of this argument. I don't know anything about the "AI Security Institute", but they're saying something broadly mirrored by security researchers. From what I can see, the "debate" in the actual practitioner community is whether frontier models are merely as big a deal as fuzzing was, or something significantly bigger. Fuzzing was a profound shift in vulnerability research.

(Fan of your writing, btw.)
VorpalWay: > but they're saying something broadly mirrored by security researchers.

You might well be right; it is not an area I know much about or work in. But I'm a fan of reliable sources for claims. It is far too easy to make general statements on the internet that appear authoritative.
protocolture: > You don’t get points for being clever. You win by paying more.

Really depends on how consistently the LLMs are putting novel new vulnerabilities back into your production code for the other LLMs to discover.
linkregister: This is a great example of vulnerability chains that can be broken by vulnerability scanning with even cheaper open source models. The outcome of a developer getting pwned doesn't have to be total catastrophe. Having trivial privilege escalations closed off means an attacker will need to be noisy and set off commodity alerting. The only blocker for these entrepreneurs is the company's will to implement fixes for the 100 GitHub Dependabot alerts on their code base.

It does mean that the hoped-for 10x productivity increase from engineers using LLMs is eroded by the extra time needed for security.

This take is not theoretical. I am working on this effort currently.
heliumtera: In other news, token seller says tokens should be bought
devmor: > to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them.

If we take this at face value, it's not that different from how a great deal of executive teams believe cybersecurity has worked up to today: "If we spend more on our engineering and infosec teams, we are less likely to get compromised."

The only big difference I can see is timescale. If LLMs can find vulnerabilities and exploit them this easily (and I do take that with a grain of salt, because benchmarks are benchmarks), then you may lose your ass in minutes instead of after one dedicated cyber-explorer's Monster-Energy-fueled, 7-week traversal of your infrastructure.

I am still far more concerned about social engineering than about LLMs finding and exploiting secret back doors in most software.
anitil: On the latest episode of 'Security Cryptography Whatever' [0], they mention that time spent improving the harness (at the moment) ends up being outperformed by the strategy of "wait for the next model". I doubt that will continue, but it broke my intuition about how to improve them.

[0] https://securitycryptographywhatever.com/2026/03/25/ai-bug-f...