Discussion
Claude Opus 4.7
nathanielherman: Claude Code hasn't updated yet it seems, but I was able to test it using `claude --model claude-opus-4-7`
yanis_t: > where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally. Users should re-tune their prompts and harnesses accordingly.

Interesting.
u_sama: Excited to use 1 prompt and have my whole 5-hour window at 100%. They can keep releasing new ones, but if they don't solve their whole token shrinkage and gaslighting it is not gonna be interesting to see.
lbreakjai: Solve? You solve a problem, not something you introduced on purpose.
rvz: Introducing a new slot machine named "Claude Opus" in the Anthropic casino. You are in for a treat this time: it is the same price as the last one [0] (if you are using the API). But it is slightly less capable than the other slot machine named 'Mythos', the one which everyone wants to play around with. [1]

[0] https://claude.com/pricing#api
[1] https://www.anthropic.com/news/claude-opus-4-7
dbbk: If you're building a standard app Opus is already good enough to build anything you want. I don't even know what you'd really need Mythos for.
hackerInnen: I just subscribed again this month because I wanted to have some fun with my projects. Tried out Opus 4.6 a bit and it is really, really bad. Why do people say it's so good? It cannot come up with any half-decent VHDL, no matter the prompt. I'm very disappointed. I was told it's a good model.
constantius: Not related to this release, but is anyone aware of what's happening with Deepseek? The usual cascade of synced releases has been lacking this frontier lab whale for a while now.
cupofjoakim: > Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type.

caveman [0] is becoming more relevant by the day. I already enjoy reading its output more than vanilla, so suits me well.

[0] https://github.com/JuliusBrussee/caveman/tree/main
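A rough sketch of what that 1.0–1.35× range does to API spend, assuming the per-token price stays flat (the price and workload size below are made-up illustration numbers, not Anthropic's actual rates; only the expansion range comes from the quoted release notes):

```python
# Rough cost impact of the tokenizer change, assuming per-token pricing
# is unchanged. PRICE_PER_MTOK and OLD_TOKENS are hypothetical numbers;
# only the 1.0-1.35x range comes from the release notes quoted above.
PRICE_PER_MTOK = 15.0    # hypothetical $ per million input tokens
OLD_TOKENS = 800_000     # tokens a workload used under the old tokenizer

for factor in (1.0, 1.15, 1.35):
    new_tokens = OLD_TOKENS * factor
    cost = new_tokens / 1_000_000 * PRICE_PER_MTOK
    print(f"x{factor:.2f}: {new_tokens:,.0f} tokens -> ${cost:.2f}")
```

At the worst-case 1.35× the same workload bills for 35% more tokens, which is why terse prompting styles get more attractive.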
buildbot: Too late. Personally, after how bad 4.6 was the past week, I was pushed to Codex, which seems to mostly work at the same level from day to day. Just last night I was trying to get 4.6 to look up how to do some simple tensor-parallel work, and the agent used 0 web fetches and just hallucinated 17K very wrong tokens. Then the main agent decided to pretend to implement TP, and just copied the entire model to each node...
alvis: I haven't noticed much of a quality drop from 4.6. But I also notice that I use Codex more often these days than Claude Code.
johntopia: is this just mythos flex?
oliver236: someone tell me if i should be happy
nickmonad: Did you try asking the model?
skerit: ~~That just changes it to Opus 4, not Opus 4.7~~ My statusline showed _Opus 4_, but it did indeed accept this line. I did change it to `/model claude-opus-4-7[1m]`, because otherwise it would pick the non-1M-context model.
nathanielherman: Oh good call
OtomotO: Another supply-chain attack waiting? Have you tried just adding an instruction to be terse? Don't get me wrong, I've tried out caveman as well, but these days I am wondering whether something this popular will be hijacked.
jmsdnns: How is it too late if you were using Claude yesterday? Just try it.
mesmertech: Not showing up in Claude Code by default on the latest version. Apparently this is how to set it:

/model claude-opus-4.7

Coming from Anthropic's support page, so hopefully they didn't hallucinate the docs, because the model name in Claude Code says:

/model claude-opus-4-7
⎿ Set model to Opus 4

> what model are you?

I'm Claude Opus 4 (model ID: claude-opus-4-7).
klipitkas: It does not work, it says Claude Opus 4 not 4.7
rvz: > Not related to this release, but is anyone aware of what's happening with Deepseek?

Given that no one is talking about DeepSeek, I assume it is coming this month. They are still releasing research papers, and that is what really matters, not the .1-increment model releases that massage benchmarks or create hype.
cmrdporcupine: There's been months of "DeepSeek v4 next week!" rumours and none have panned out. They're either stuck/dead, or they're sitting on something really fantastic that they only want to release once they've perfected it. My realistic side thinks the former; my optimistic side hopes for the latter. In the meantime, GLM 5.1 is actually really good.
benleejamin: For anyone who was wondering about Mythos release plans:

> What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models.
jampa: Mythos release feels like Silicon Valley's "don't take revenue" advice: https://www.youtube.com/watch?v=BzAdXyPYKQo

"If you show the model, people will ask 'HOW BETTER?' and it will never be enough. The model that was the AGI is suddenly the +5% bench dog. But if you have NO model, you can say you're worried about safety! You're a potential pure play... It's not about how much you research, it's about how much you're WORTH. And who is worth the most? Companies that don't release their models!"
mchinen: Does it run for you? I can select it this way but it says 'There's an issue with the selected model (claude-opus-4-7). It may not exist or you may not have access to it. Run /model to pick a different model.'
nathanielherman: Weird, yeah it works for me
aliljet: Have they effectively communicated what a 20x or 10x Claude subscription actually means? And with Claude 4.7 increasing token usage by up to 1.35x, does that mean a 20x plan is now really a ~15x plan (no token increase on the subscription, and 20 / 1.35 ≈ 14.8) or a 27x plan (more tokens granted to compensate for the higher compute cost) relative to Claude Opus 4.6?
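The two readings of that question come down to plain arithmetic (the 20x plan label and the 1.35x worst-case expansion are from this thread; nothing below is an official Anthropic number):

```python
# Two readings of what a "20x" plan means under a 1.35x token expansion.
PLAN_X = 20.0       # nominal plan multiplier (from the comment above)
EXPANSION = 1.35    # worst-case tokenizer expansion quoted in the release notes

# Reading 1: the token grant is unchanged, so every request burns up to
# 1.35x more of it -> effective capacity shrinks to ~14.8x.
fixed_grant_effective = PLAN_X / EXPANSION

# Reading 2: the grant is scaled up to compensate -> 27x raw tokens,
# which buys roughly the same amount of work as the old 20x.
scaled_grant_tokens = PLAN_X * EXPANSION

print(f"fixed grant:  ~{fixed_grant_effective:.1f}x effective capacity")
print(f"scaled grant: {scaled_grant_tokens:.0f}x raw tokens")
```

Unless Anthropic says which reading applies, the plan label alone doesn't tell you how much work the subscription actually buys.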
queuep: Before Opus released we also saw a huge backlash about it being dumber. Perhaps they need the compute for training.
recursivegirth: Consumerism... if it ain't the best, some people don't want it.
Barbing: Time/frustration. If it’s all slop, the smallest waste of time comes from the best thing on the market.
not_ai: Oh look, it was too powerful to release, and now it’s just a matter of safeguards. This story sounds a lot like GPT-2.
tabbott: The original blog post for Mythos did lay out this safeguard testing strategy as part of their plan.
aurareturn: Funny, because many people here were so confident that OpenAI was going to collapse because of how much compute they pre-ordered. But now it seems like it's a major strategic advantage. They're 2x'ing usage limits on Codex plans to steal CC customers, and it seems to be working. It seems like 90% of Claude's recent problems are strictly a lack of compute.
energy123: Is that 2x still going on? I thought that ended in early April.
lawgimenez: It’s for Pro users only, I think the 2x is up to May 31.
jameson: How should one compare benchmark results? For example, SWE-bench Pro improved ~11% compared with Opus 4.6. Should one interpret that as 4.7 being able to solve more difficult problems, or as 11% fewer hallucinations?
grandinquistor: Quite a big improvement in coding benchmarks, doesn’t seem like progress is plateauing as some people predicted.
bsaul: I tried to find API pricing for GLM 5.1 but couldn't find any on the homepage. How are you using it?
cmrdporcupine: Per-token via DeepInfra, who host it as one of their models.

https://deepinfra.com/zai-org/GLM-5.1