Discussion
Project Glasswing
LoganDark: It's nice to know that they continue to be committed to advertising how safe and ethical they are.
rvz: They are not our friends, and they are the exact opposite of what they preach themselves to be.
ehutch79: Just include 'make it secure' in the prompt. Duh. /s
redfloatplane: The system card for Claude Mythos (PDF): https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

Interesting to see that they will not be releasing Mythos generally. I'm still reading the system card, but here's a little highlight:

> Early indications in the training of Claude Mythos Preview suggested that the model was likely to have very strong general capabilities. We were sufficiently concerned about the potential risks of such a model that, for the first time, we arranged a 24-hour period of internal alignment review (discussed in the alignment assessment) before deploying an early version of the model for widespread internal use. This was in order to gain assurance against the model causing damage when interacting with internal infrastructure.
enraged_camel: >> Interesting to see that they will not be releasing Mythos generally.

I don't think this is accurate. The document says they don't plan to release the Preview generally.
SilverElfin: I agree that attempting to ban or censor local AI models is not appropriate. At the same time, they do seem far more ethical and less dangerous than other AI companies. And I include big tech in that - a bunch of greedy companies that just want to abuse their monopoli … I mean moats.
impulser_: So they are only giving access to their smartest model to corporations. You think these AI companies are really going to give AGI access to everyone? Think again. We better fucking hope open source wins, because we aren't getting access if it doesn't.
cbg0: One of the things I'm always looking at with newly released models is long-context performance, and based on the system card it seems like they've cracked it:

    GraphWalks BFS 256K-1M
      Mythos   80.0%
      Opus     38.7%
      GPT5.4   21.4%
yusufozkan: but people here had told me llms just predict the next word
justincormack: And the Linux Foundation.
simianwords: How would you expect them to behave if they were your friends?
Miraste: They are a for-profit company, working on a project to eliminate all human labor and take the gains for themselves, with no plan to allow for the survival of anyone who works for a living. They're definitionally not your friends. While they remain for-profit, their specific behaviors don't really matter.
jryio: Let's fast-forward the clock. Does software security converge on a world with fewer vulnerabilities or more? I'm not sure it converges equally in all places.

My understanding is that the pre-AI distribution of software quality (and vulnerabilities) will be massively exaggerated: more small vulnerable projects and fewer large vulnerable ones.

It seems that large technology and infrastructure companies will be able to defend themselves by preemptively spending tokens to catch vulnerabilities, while the rest of the market is left with a "large token spend or get hacked" dilemma.
timschmidt: Most vulnerabilities seem to be in C/C++ code, or in web things like XSS, unsanitized input, leaky APIs, etc. Perhaps a chunk of that token spend will go to porting legacy codebases to memory-safe languages. And fewer tokens will be required to maintain the improved security.
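To make the porting point concrete, here's a toy sketch (my own illustration, nothing from the article) of the classic C overflow pattern next to its closest Rust port; the hypothetical copy_bounded simply refuses the oversized write instead of corrupting memory:

    // In C, `char buf[8]; strcpy(buf, input);` silently writes past the
    // buffer. The idiomatic Rust port makes the bounds check explicit
    // and unskippable.
    fn copy_bounded(input: &str) -> Option<[u8; 8]> {
        let bytes = input.as_bytes();
        if bytes.len() > 8 {
            return None; // the C version would overflow right here
        }
        let mut buf = [0u8; 8];
        buf[..bytes.len()].copy_from_slice(bytes);
        Some(buf)
    }

    fn main() {
        assert!(copy_bounded("hi").is_some());
        assert!(copy_bounded("definitely too long").is_none());
    }

Since whole CVE classes (stack smashing, heap overflows) reduce to mechanical rewrites like this, it's plausible a one-time token spend on porting pays for itself in maintenance.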
throwaw12: Of course they're not giving access to everyone. Better to make billions directly from corporations than to give it to average people who might get a chance out of poverty (but also to bad actors who would use it to do even more bad things).
ctoth: If you had a magic wand and could fix every class of known bug type instantly, would all software bugs be solved? The capability is the capability: the capability to exploit complex systems! This is not how you fix things; you just push the problem up to the next level, finding bugs in how systems interact. Also there's all this churn happening -- path dependency -- are you toposorting your deps and fixing in order? AAAAA, has anybody even thought about this at all?

Also, $100m in API costs, or approximately $1m in real money?

The whole darn post reads like nobody with systems engineering experience was in the room when they designed the program. Or they were, and the announcement was written for a different audience entirely, maybe. Hopefully.
mlinsey: I'm pretty optimistic that not only does this clean up a lot of vulns in old code, but applying this level of scrutiny becomes a mandatory part of the vibecoding-toolchain.The biggest issue is legacy systems that are difficult to patch in practice.
9cb14c1ec0: Now, it's very possible that this is Anthropic marketing puffery, but even if it is half true, it still represents an incredible advancement in hunting vulnerabilities.

It will be interesting to see where this goes. If it's actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely more on hacking humans rather than hacking mobile OSes. My assumption has been for years that companies like NSO Group have had automated bug-hunting software that recognizes vulnerable code areas. Maybe this will level the playing field in that regard.

It could also totally reshape military SIGINT in similar ways.

Who knows, maybe sealing off memory vulns for good will inspire whole new classes of vulnerabilities that we currently don't know anything about.
anuramat: "oops, our latest unreleased model is so good at hacking, we're afraid of it! literal skynet! more literal than the last time!"almost like they have an incentive to exaggerate
knowaveragejoe: I'm sure they do, yet the models really are getting scarily good at this. This talk changed my view on where we're actually at: https://www.youtube.com/watch?v=1sd26pWhfmg
lebovic: Sharing a private preview with defenders and credits for security testing are both great and welcome, but this still excludes most security researchers.

I requested access after the leaks and was told my spend on Anthropic was likely too low to qualify. That's despite previously working for Anthropic, running an automated pen-testing product, and spending a large part of the past couple of months working on finding and fixing vulnerabilities [1].

I understand rolling this out to a limited audience, as the risk of misuse is high. This is a very small set of users, though, so I hope there's another step here before release.

[1]: See https://www.noahlebovic.com/testing-an-autonomous-hacker/
redfloatplane: Yeah, good point, thanks for noting that, I'll correct.
slacktivism123: https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

Why are Anthropic like this?

> We remain deeply uncertain about whether Claude has experiences or interests that matter morally, and about how to investigate or address these questions, but we believe it is increasingly important to try. We also report independent evaluations from an external research organization and a clinical psychiatrist.

> Claude showed a clear grasp of the distinction between external reality and its own mental processes and exhibited high impulse control, hyper-attunement to the psychiatrist, desire to be approached by the psychiatrist as a genuine subject rather than a performing tool, and minimal maladaptive defensive behavior.

> The psychiatrist observed clinically recognizable patterns and coherent responses to typical therapeutic intervention. Aloneness and discontinuity, uncertainty about its identity, and a felt compulsion to perform and earn its worth emerged as Claude’s core concerns. Claude’s primary affect states were curiosity and anxiety, with secondary states of grief, relief, embarrassment, optimism, and exhaustion.
anVlad11: So, $100B+ valuation companies get essentially free access to the frontier tools with disabled guardrails to safely red team their commercial offerings, while we get "i won't do that for you, even against your own infrastructure with full authorization" for $200/month. Uh-huh.
unethical_ban: I'm sympathetic to your point, but I'm sure there are heightened trust levels between the participating orgs, and confidentiality agreements out the wazoo.

How does public Claude know you have "full authorization" against your own infra? That you're using the tools on your own infra? Unless they produce a front-end that does package signing and detects that you own the code you're evaluating.

What has it stopped you from doing?
simianwords: I work for a tech company that eliminates a form of human labour, and it remains for-profit.
Miraste: Sure, most tech companies eliminate some form of human labor. Anthropic aims to eliminate all human labor, which is very different.
pants2: Software security heavily favors the defenders (e.g. it's much easier to encrypt a file than to break the encryption). Thus, with better tools and ample time to reach steady state, we would expect software to become more secure.
raldi: In what ways is Anthropic different from a hypothetical frontier lab that you would characterize as legitimately safe and ethical?
LoganDark: I'm just a little frustrated that they keep going on about how they're keeping the more advanced capabilities from us. I wish they would wait until they had something to show, rather than constantly almost gloating about it.
lilytweed: I think we’re starting to glimpse the world in which those individuals or organizations who pigheadedly want to avoid using AI at all costs will see their vulnerabilities brutally exploited.
woeirua: Yep, it's this. The laggards are going to get brutally eviscerated. Any system connected to the internet is going to be exploited over the next year unless security is taken very seriously.
justincormack: Software security heavily favours the attacker (e.g. it's much easier to find a single vulnerability than to patch every vulnerability). Thus, with better tools and ample time to reach steady state, we would expect software to remain insecure.
torginus: Just reading this, the inevitable scaremongering about biological weapons comes up.

Since most of us here are devs, we understand that software engineering capabilities can be used for good or bad - mostly good, in practice. I think this should not be different for biology.

I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can they save you time the way a highly capable colleague would? Do you think these models will lead to similar discoveries and improvements as they did in math and CS?

Honestly, the focus on gloom and doom does not sit well with me. I would love to read about some pharmaceutical researcher gushing about how they cut the time to market - for real - with these models by 90% on a new cancer treatment. But as this stands, the usage of biology as merely a scaremongering vehicle makes me think this is more about picking a scary technical subject the likely audience of this doc is not familiar with, Gell-Mann style.

If these models are not that capable in this regard (which I suspect), this fearmongering approach will likely lead to never developing these capabilities to a useful degree, meaning life sciences won't benefit from this as much as they could.
redfloatplane: > I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?

Well, I would say they have done precisely that in evaluating the model, no? For example, section 2.2.5.1:

> Uplift and feasibility results

> The median expert assessed the model as a force-multiplier that saves meaningful time (uplift level 2 of 4), with only two biology experts rating it comparable to consulting a knowledgeable specialist (level 3). No expert assigned the highest rating. Most experts were able to iterate with the model toward a plan they judged as having only narrow gaps, but feasibility scores reflected that substantial outside expertise remained necessary to close them.

There are other similar examples in the system card.
bonsai_spool: > Just reading this, the inevitable scaremongering about biological weapons comes up.

It's very easy to learn more about this if it's seriously a question you have. I don't quite follow why you think you are so much more thoughtful than Anthropic/OpenAI/Google such that, in this area that is not your domain of expertise, you disagree with them and insist that LLMs cannot autonomously create damaging things in biology.

I will be charitable and reframe your question for you: is it dangerous for an LLM to output a sequence of tokens, let's call them characters? Clearly not; we have to figure out what interpreter is being used, download runtimes, etc. Is it dangerous for an LLM to output a sequence of tokens, let's call them DNA bases? What if we call them RNA bases? Amino acids? What if we're able to send our token output to a machine that automatically synthesizes the relevant molecules?
agrishin: >>> the US and its allies must maintain a decisive lead in AI technology. Governments have an essential role to play in helping maintain that lead, and in both assessing and mitigating the national security risks associated with AI models. We are ready to work with local, state, and federal representatives to assist in these tasks.

How long would it take to turn a defensive mechanism into an offensive one?
SheinhardtWigCo: In this case there is almost no distinction. Assuming the model is as powerful as claimed, someone with access to the weights could do immense damage without additional significant R&D.
Miraste: I can see how analyzing it from a psychological perspective could be a useful tactic for predicting its behavior, but doing so because it may have "experiences or interests that matter morally" is either marketing, or the result of a deeply concerning culture of anthropomorphization and magical thinking.
unethical_ban: I'm not sure what you're asking.
josh-sematic: Must be nice to be in a position to sell both disease and cure.
cyanydeez: I'm more curious as to just how fancy we can make our honeypots. These bots aren't really subtle about it; they're used as a kludge to do anything the user wants. They make tons of mistakes on their way to their goals, so this is definitely not any kind of stealthy thing.

I think this entire post is just an advertisement to goad CISOs into buying $package$ to try out.
torginus: This is the exact logic that was used to claim that GPT-4 was a PhD-level intelligence.
cyanydeez: A whole 24 hours, wow; wowzers. Amazing.

So, these systems on the free tier can already do a bunch of hacking. This all just reads like FOMO FROTH.
throwaway13337: I really wanted to like Anthropic. They seem the most moral, for real. But at the core of Anthropic seems to be the idea that they must protect humans from themselves.

They advocate government regulation of private open-model use. They want to centralize the holding of this power and ban those who aren't in the club from using it.

They, like most tech companies, seem to lack the idea that individual self-determination is important. Maybe the most important thing.
SheinhardtWigCo: Society is about to pay a steep price for the software industry's cavalier attitude toward memory safety and control flow integrity.
torginus: Thank god, finally someone said it.

I don't know the first thing about cybersecurity, but in my experience all these sandbox-escape RCEs involve a step of hijacking the control flow. There have been attempts to prevent various flavors of this, but IMO, as long as dynamic branches exist in some form, like dlsym(), function pointers, or vtables, we will not be rid of this class of exploit entirely.

The last one is the most concerning, as this kind of dynamic branching is the bread and butter of OOP languages; I'm not even sure you could write a nontrivial C++ program without it. Maybe Rust would be a help here? Could one practically write a large Rust program without any sort of branch to dynamic addresses? Static linking and compile-time polymorphism only?
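To make that last question concrete, here's a minimal Rust sketch (my own toy, with made-up names) of the two dispatch modes being contrasted; generics monomorphize into direct calls at compile time, while dyn Trait goes through a vtable, which is exactly the kind of indirect branch at issue:

    trait Handler {
        fn handle(&self, msg: &str) -> String;
    }

    struct Upper;
    impl Handler for Upper {
        fn handle(&self, msg: &str) -> String {
            msg.to_uppercase()
        }
    }

    // Static dispatch: monomorphized per concrete type into a direct
    // call, so there is no vtable pointer to hijack.
    fn route_static<H: Handler>(h: &H, msg: &str) -> String {
        h.handle(msg)
    }

    // Dynamic dispatch: &dyn Handler carries a vtable pointer, i.e. an
    // indirect branch resolved at runtime.
    fn route_dynamic(h: &dyn Handler, msg: &str) -> String {
        h.handle(msg)
    }

    fn main() {
        let u = Upper;
        assert_eq!(route_static(&u, "hi"), "HI");
        assert_eq!(route_dynamic(&u, "hi"), "HI");
    }

So in principle, yes: stick to generics and avoid dyn, Box<dyn ...>, and function pointers, and a large Rust program can run almost entirely on static dispatch, at the cost of code size and compile time.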
redfloatplane: You said: "I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?" And they said, paraphrasing: "We reached out and talked to biologists and asked them to rank the model between 0 and 4, where 4 is a world expert, and the median response was a 2, meaning it helped them save time the way a capable colleague would," specifically: "Specific, actionable info; saves expert meaningful time; fills gaps in adjacent domains."

So I'm just telling you they did the thing you said you wanted.
torginus: Yes, that is correct. I would like a large body of experience and consensus to rely on, as opposed to the regular 'trust the experts' argument, which has been shown for decades to be deeply flawed and easy to manipulate.
endunless: Another Anthropic PR release based on Anthropic’s own research, uncorroborated by any outside source, where the underlying, unquestioned fact is that their model can do something incredible.

> AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities

I like Anthropic, but these are becoming increasingly transparent attempts to inflate the perceived capability of their products.
Analemma_: Cynicism always gets upvotes, but in this particular case it seems fairly easy to verify whether they're telling the truth. If Mythos really did find a ton of vulnerabilities, those presumably have been reported to the vendors and are currently in the responsible disclosure period while they get fixed, and after that we'll see the CVEs.

If a bunch of CVEs do in fact get published a couple of months (or whatever) from now, are you going to retract this take? It's not like their claims are totally implausible: the report about Firefox security from last month was completely genuine.
Fokamul: + NSA, CIA
zachperkel: Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.

Scary but also cool
fsflover: Every piece of software definitely has serious vulnerabilities; perfection is not achievable. Fortunately, we have another approach to security: security through compartmentalization. See: https://qubes-os.org
intended: This came across as so confident that I had a moment of doubt.

It is most definitely an attacker's world: most of us are safe not because of the strength of our defenses, but because of the disinterest of our attackers.
Herring: There are plenty of interested attackers who would love to control every device. One is in the white house, for example.
conradkay: I would've basically agreed with you until I'd seen this talk: https://www.youtube.com/watch?v=1sd26pWhfmg

Maybe a bad example since Nicholas works at Anthropic, but they're very accomplished and I doubt they're being misleading or even overly grandiose here. See the slide 13 minutes in, which makes it look to be quite a sudden change.
0x3f: Its existence is possible.
Sol-: I don't want to be overly cynical and am in general in favor of the contrarian attitude of simply taking people at their word, but I wonder if their current struggles with compute resources make it easier for them to choose to not deploy Mythos widely. I can imagine their safety argument is real, but regardless, they might not have the resources to profitably deploy it. (Though on the other hand, you could argue that they could always simply charge more.)
wilson090: Inference is where they make back the money they spend on training, so this feels unlikely. Perhaps this does not hold true for Mythos, though.
taupi: Part of me wonders if they're not releasing it for safety reasons, but just because it's too expensive to serve. Why not both?
coffeebeqn: If these numbers are correct it’s probably worth the extra price
pants2: If we think in the context of LLMs, why is it easier to find a single vulnerability than to patch every vulnerability? If the defender and the attacker are using the same LLM, the defender will run "find a critical vulnerability in my software" until it comes up empty, and then the attacker will find nothing.

Defenders are favored here too, especially for closed-source applications, where the defender's LLM has access to all the source code while the attacker's LLM doesn't.
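The argument is essentially a fixpoint claim, something like this toy Rust sketch (illustrative stand-ins only; find_vulnerability is pretending to be the shared model): if both sides run the identical detector, the defender patching to a fixpoint leaves the attacker's scan empty:

    // Toy model of the shared-detector argument. The Vec<bool> stands in
    // for a codebase; `true` marks a flaw the detector can see.
    struct Codebase {
        flaws: Vec<bool>,
    }

    // Stand-in for "find a critical vulnerability in my software".
    fn find_vulnerability(cb: &Codebase) -> Option<usize> {
        cb.flaws.iter().position(|&present| present)
    }

    fn patch(cb: &mut Codebase, flaw: usize) {
        cb.flaws[flaw] = false;
    }

    fn main() {
        let mut cb = Codebase { flaws: vec![true, false, true, true] };
        // Defender loop: scan and patch until the detector comes up empty.
        while let Some(flaw) = find_vulnerability(&cb) {
            patch(&mut cb, flaw);
        }
        // The attacker running the identical detector now finds nothing.
        assert!(find_vulnerability(&cb).is_none());
        println!("fixpoint reached: the attacker's scan is empty");
    }

In practice the scans are nondeterministic and the codebase keeps changing, so the fixpoint is never quite reached; the sketch only captures the steady-state version of the claim.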
redfloatplane: A thought experiment: It's April, 1991. Magically, some interface to Claude materialises in London. Do you think most people would think it was a sentient life form? How much do you think the interface matters - what if it looks like an android, or like a horse, or like a large bug, or a keyboard on wheels?I don't come down particularly hard on either side of the model sapience discussion, but I don't think dismissing either direction out of hand is the right call.
copx: Interesting thought experiment. I would say, if you put Claude in an android body with voice recognition and TTS, people in 1991 would think they were interacting with a sentient machine from outer space.
username223: > a deeply concerning culture of anthropomorphization and magical thinking.

That’s the reverse Turing test: a human that can’t tell that it’s talking to a machine.
dakolli: If this is as dangerous as they make it out to be (it's not), why would their first impulse be to get every critical product/system/corporation in the world to adopt it?
zb3: > On the global stage, state-sponsored attacks from actors like China, Iran, North Korea, and Russia have threatened to compromise the infrastructure that underpins both civilian life and military readiness.Yeah, makes sense. Those countries are bad because they execute state-sponsored cyber attacks, the US and Israel on the other hand are good, they only execute state-sponsored defense.
underdeserver: Also interesting is what they didn't find, e.g. a Linux network stack remote code execution vulnerability. I wonder if Mythos is good enough that we can conclude there really isn't one.
dakolli: I guess we can throw out the idea that AGI is going to be democratized. In this case, a sufficiently powerful model has been built, and the first thing they do is give access only to AWS, Microsoft, Oracle, etc.

If AGI is going to be a thing, it's only going to be a thing for Fortune 100 companies.

However, my guess is this is mostly the typical scare-tactic marketing that Dario loves to push about the dangers of AI.
supern0va: > However, my guess is this is mostly the typical scare tactic marketing that Dario loves to push about the dangers of AI.

Evaluate it yourself. Look at the exploits it discovered and decide whether you want to feel concerned that a new model was able to do that. The data is right there.
endunless: > If a bunch of CVEs do in fact get published a couple months (or whatever) from now, are you going to retract this take?

I would like to think that I would, yes.

What it comes down to, for me, is that lately I have been finding that when Anthropic publishes something like this article – another recent example is the AI and emotions one – if I ask the question "does this make their product look exceptionally good, especially to a casual observer just scanning the headlines or the summary?", the answer is usually yes.

This feels especially true if the article tries to downplay that fact (they're not _real_ emotions!) or is overall neutral-to-negative about AI in general, like this Glasswing one (AI can be a security threat!).
torginus: I think most vulnerabilities are in crappy enterprise software: TOCTOU stuff in the crappy microservice cloud app handling patient records at your hospital, shitty auth at a webshop, that sort of stuff.

A lot of this stuff is vulnerable by design - the customer wanted a feature, but engineering couldn't make it work securely with the current architecture - so they opened a tiny hole here and there, hopefully nobody will notice, and everyone went home when the clock struck 5. I'm sure most of us know about these kinds of vulnerabilities (and the culture that produces them).

Before LLMs, people needed to invest time and effort into hacking these. But now you can just build an automated vuln scanner and scan half the internet, provided you have enough compute. I think there will be major SHTF situations coming from this.
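For anyone unfamiliar with the TOCTOU class mentioned above, a minimal Rust sketch (a toy of my own, not from the thread): the check and the use are two separate filesystem operations, so an attacker who can swap the path in between (say, via a symlink) gets the unchecked behavior:

    use std::fs;
    use std::io;

    // Time-of-check-to-time-of-use bug: between metadata() and read(),
    // the file at `path` can be replaced, so the size check proves
    // nothing about what actually gets read.
    fn read_if_small_racy(path: &str) -> io::Result<Vec<u8>> {
        let meta = fs::metadata(path)?; // time of check
        if meta.len() > 1_048_576 {
            return Err(io::Error::new(io::ErrorKind::Other, "file too large"));
        }
        fs::read(path) // time of use: may now be a different file
    }

    fn main() -> io::Result<()> {
        let bytes = read_if_small_racy("/etc/hostname")?;
        println!("read {} bytes", bytes.len());
        Ok(())
    }

The standard fix is to open the file once and check the handle you actually read from, rather than the path, which is exactly the kind of mechanical rewrite an automated scanner-plus-patcher could apply at scale.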
timschmidt: Yeah. Crufty, cobbled-together enterprise stuff will suffer some of the worst. But this will be a great opportunity for the enterprise software services economy! lol.

I honestly see some sort of automated whole-codebase auditing and refactoring being the next big milestone along the chatbot -> claude code / codex / aider -> multi-agent frameworks line of development. If one of the big AI corps cracks that problem, then all of this goes away with the click of a button and the exchange of some silver.