Discussion
1M context is now generally available for Opus 4.6 and Sonnet 4.6
8cvor6j844qw_d6: Oh nice, does it mean less game of /compact, /clear, and updating CLAUDE.md with Claude Code?
convenwis: Is there a writeup anywhere on what this means for effective context? I think that many of us have found that even when the context window was 100k tokens the actual usable window was smaller than that. As you got closer to 100k performance degraded substantially. I'm assuming that is still true but what does the curve look like?
tyleo: I mentioned this at work but context still rots at the same rate. 90k tokens consumed has just as bad results in 100k context window or 1M.Personally, I’m on a 6M+ like codebase and had no problems with the old window.
wewewedxfgdf: The weirdest thing about Claude pricing is their 5X pricing plan is 5 times the cost of the previous plan.Normally buying the bigger plan gives some sort of discount.At Claude, it's just "5 times more usage 5 times more cost, there you go".
johnwheeler: This is incredible. I just blew through $200 last night in a few hours on 1M context. This is like the best news I've heard all year in regards to my business.
minimaxir: Claude Code 2.1.75 now no longer delineates between base Opus and 1M Opus: it's the same model. Oddly, I have Pro where the change supposedly only for Max+ but am still seeing this to be case.EDIT: Don't think Pro has access to it, a typical prompt just hit the context limit.The removal of extra pricing beyond 200k tokens may be Anthropic's salvo in the agent wars against GPT 5.4's 1M window and extra pricing for that.
auggierose: No change for Pro, just checked it, the 1M context is still extra usage.
aliljet: Are there evals showing how this improves outputs?
dimitri-vs: The big change here is:> Standard pricing now applies across the full 1M window for both models, with no long-context premium. Media limits expand to 600 images or PDF pages.For Claude Code users this is huge - assuming coherence remains strong past 200k tok.
MikeNotThePope: Is it ever useful to have a context window that full? I try to keep usage under 40%, or about 80k token to avoid the dumb zone.No vibes allowed: https://youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ
hagen8: Did u use the API or subscription?
margorczynski: What about response coherence with longer context? Usually in other models with such big windows I see the quality to rapidly drop as it gets past a certain point.
vicchenai: The no-degradation-at-scale claim is the interesting part. Context rot has been the main thing limiting how useful long context actually is in practice — curious to see what independent evals show on retrieval consistency across the full 1M window.
gaigalas: I'm getting close to my goal of fitting an entire bootstrappable-from-source system source code as context and just telling Claude "go ahead, make it better".
zmmmmm: Noticed this just now - all of a sudden i have 1M context window (!!!) without changing anything. It's actually slightly disturbing because this IS a behavior change. Don't get me wrong, I like having longer context but we really need to pin down behaviour for how things are deployed.
phist_mcgee: Anthropic is famous for changing things under your feet. Claude code is basically alpha software with a global footprint.
islewis: The quality with the 1M window has been very poor for me, specifically for coding tasks. It constantly forgets stuff that has happened in the existing conversation. n=1, ymmv
comboy: Quality noticeably declined for me during the last few days, if that was the rollout that explains it but how do I get my claude back.
hagen8: Well, the question is what is contributing to the usage. Because as the context grows, the amount of input tokens are increasing. A model call with 800K token as input is 8 times more expensive than a model call with 100K tokens as input. Especially if we resume a conversation and caching does not hit, it would be very expensive with API pricing.
fnordpiglet: I’ve been using 1M for a while and it defers it and makes it worse almost when it happens. Compacting a context that big loses a ton of fidelity. But I’ve taken to just editing the context instead (double esc). I also am planning to build an agent to slice the session logs up into contextually useful and useless discarding the useless and keeping things high fidelity that way. (I.e., carve up with a script the jsonl and have subagent haiku return the relevant parts and reconstructing the jsonl)
dominotw: til you can edit context. i keep a running log and /clear /reload log
ogig: When running long autonomous tasks it is quite frequent to fill the context, even several times. You are out of the loop so it just happens if Claude goes a bit in circles, or it needs to iterate over CI reds, or the task was too complex. I'm hoping a long context > small context + 2 compacts.
dominotw: rarely go over 25 percent in codex but i hit 80 on claude code in just a short time.
dominotw: can someone tell me how to make this instruction work in claude code"put high level description of the change you are making in log.md after every change"works perfectly in codex but i just cant get calude to do it automatically. I always have to ask "did you update the log".
FartyMcFarter: Isn't transformer attention quadratic in complexity in terms of context size? In order to achieve 1M token context don't these models have to be employing a lot of shortcuts?I'm not an expert but maybe this explains context rot.
pixelpoet: Compared to yesterday my Claude Max subscription burns usage like absolutely crazy (13% of weekly usage from fresh reset today with just a handful prompts on two new C++ projects, no deps) and has become unbearably slow (as in 1hr for a prompt response). GGWP Anthropic, it was great while it lasted but this isn't worth the hundreds of dollars.
Spooky23: Yeah, morning eastern time Claude is brutal.
operatingthetan: I think they are both subsidized so both are a great deal.
MikeNotThePope: I haven't figured out how to make use of tasks running that long yet, or maybe I just don't have a good use case for it yet. Or maybe I'm too cheap to pay for that many API calls.
auggierose: It is not the plan they want you to buy. It is a pricing strategy to get you to buy the 20x plan.
radley: 5x Max is the plan I use because the Pro plan limits out so quickly. I don't use Claude full-time, but I do need Claude Code, and I do prefer to use Opus for everything because it's focused and less chatty.
auggierose: Sure, I get it. For me a 2x Max would be ideal and usually enough. Now, guess why they are not offering that?
apetresc: Those sorts of volume discounts are what you do when you're trying to incentivize more consumption. Anthropic already has more demand then they're logistically able to serve, at the moment (look at their uptime chart, it's barely even 1 9 of reliability). For them, 1 user consuming 5 units of compute is less attractive than 5 users consuming 1 unit.They would probably implement _diminishing_-value pricing if pure pricing efficiency was their only concern.
SequoiaHope: Yep I have an autonomous task where it has been running for 8 hours now and counting. It compacts context all the time. I’m pretty skeptical of the quality in long sessions like this so I have to run a follow on session to critically examine everything that was done. Long context will be great for this.
boredtofears: All of those things are smells imo, you should be very weary of any code output from a task that causes that much thrashing to occur. In most cases it’s better to rewind or reset and adapt your prompt to avoid the looping (which usually means a more narrowly defined scope)
dimitri-vs: It's kind of like having a 16 gallon gas tank in your car versus a 4 gallon tank. You don't need the bigger one the majority of the time, but the range anxiety that comes with the smaller one and annoyance when you DO need it is very real.
a_e_k: I've been using the 1M window at work through our enterprise plan as I'm beginning to adopt AI in my development workflow (via Cline). It seems to have been holding up pretty well until about 700k+. Sometimes it would continue to do okay past that, sometimes it started getting a bit dumb around there.(Note that I'm using it in more of a hands-on pair-programming mode, and not in a fully-automated vibecoding mode.)
apetresc: Improves outputs relative to what? Compared to previous contexts of 1M, it improves outputs by allowing them to exist (because previously you couldn't exceed 200K). Compared to contexts of <200K, it degrades outputs rather than improves them, but that's what you'd expect from longer contexts. It's still better than compaction, which was previously the alternative.
apetresc: I don't think they're claiming "no degradation at scale", are they? They still report a 91.9->78.3 drop. That's just a better drop than everyone else (is the claim).
chatmasta: So a picture is worth 1,666 words?
SkyPuncher: Yes. I've recently become a convert.For me, it's less about being able to look back -800k tokens. It's about being able to flow a conversation for a lot longer without forcing compaction. Generally, I really only need the most recent ~50k tokens, but having the old context sitting around is helpful.
vessenes: This is super exciting. I've been poking at it today, and it definitely changes my workflow -- I feel like a full three or four hour parallel coding session with subagents is now generally fitting into a single master session.The stats claim Opus at 1M is about like 5.4 at 256k -- these needle long context tests don't always go with quality reasoning ability sadly -- but this is still a significant improvement, and I haven't seen dramatic falloff in my tests, unlike q4 '25 models.p.s. what's up with sonnet 4.5 getting comparatively better as context got longer?
mattfrommars: Random: are you personally paying for Claude Code or is it paid by you employer?My employer only pays for GitHub copilot extension
ashdksnndck: My change cuts across multiple systems with many tests/static analysis/AI code reviews happening in CI. The agent keeps pushing new versions and waits for results until all of them come up clean, taking several iterations.
chaboud: Awesome.... With Sonnet 4.5, I had Cline soft trigger compaction at 400k (it wandered off into the weeds at 500k). But the stability of the 4.6 models is notable. I still think it pays to structure systems to be comprehensible in smaller contexts (smaller files, concise plans), but this is great.(And, yeah, I'm all Claude Code these days...)
steve-atx-7600: It seems possible, say a year or two from now that context is more like a smart human with a “small”, vs “medium” vs “large” working memory. The small fellow would be able to play some popular songs on the piano , the medium one plays in an orchestra professionally and the x-large is like Wagner composing Der Ring marathon opera. This is my current, admittedly not well informed mental model anyway. Well, at least we know we’ve got a little more time before the singularity :)
chrisweekly: weary (tired) -> wary (cautious)
grafmax: A person has a supervision budget. They can supervise one agent in a hands-on way or many mostly-hands-off agents. Even though theres some thrashing assistants still get farther as a team than a single micromanaged agent. At least that’s my experience.
scwoodal: Except after 4 gallons it might as well be pure oil, mucking everything up.
saaaaaam: Wary, not weary. Wary: cautious. Weary: tired.
prettyblocks: I imagine you can do this with a hook that fires every time claude stops responding:https://code.claude.com/docs/en/hooks-guide
saaaaaam: That video is bizarre. Such a heavy breather.
arjie: This is fantastic. I keep having to save to memory with instructions and then tell it to restore to get anywhere on long running tasks.
steve-atx-7600: You can pin to specific models with —-model. Check out their doc. See https://support.claude.com/en/articles/11940350-claude-code-.... You can also pin to a less specific tag like sonnet-4.5[1m] (that’s from memory might be a little off).
hombre_fatal: Also, when you hit compaction at 200k tokens, that was probably when things were just getting good. The plan was in its final stage. The context had the hard-fought nuances discovered in the final moment. Or the agent just discovered some tiny important details after a crazy 100k token deep dive or flailing death cycle.Now you have to compact and you don’t know what will survive. And the built-in UI doesn’t give you good tools like deleting old messages to free up space.I’ll appreciate the 1M token breathing room.
tudelo: I mean if you don't have your company paying for it I wouldn't bother... We are talking sessions of 500-1000 dollars in cost.
steve-atx-7600: Did it get better? I used sonnet 4.5 1m frequently and my impression was that it was around the same performance but a hell of a lot faster since the 1m model was willing to spends more tokens at each step vs preferring more token-cautious tool calls.
vlovich123: Nope, there’s no tricks unless there’s been major architectural shifts I missed. The rot doesn’t come from inference tricks to try to bring down quadratic complexity of the KV cache. Task performance problems are generally a training problem - the longer and larger the data set, the fewer examples you have to train on it. So how do you train the model to behave well - that’s where the tricks are. I believe most of it relies on synthetically generated data if I’m not mistaken, which explains the rot.
steve-atx-7600: Backup your config and ask Claude. I’ve done this for all kinds of things like mcp and agent config.
johnwheeler: Max subscription and "extra usage" billing
steve-atx-7600: That sounds high. I mean, if you paid for the 20x max plan you’d be capped at around 200/month and at least for me as a professional engineer running a few Claude’s in parallel all day, I haven’t exceeded the plans limits.
ricksunny: Since I'm yet to seriously dive into vibe coding or AI-assisted coding, does the IDE experience offer tracking a tally of the context size? (So you know when you're getting close or entering the "dumb zone")?
quux: OpenCode does this. Not sure about other tools
furyofantares: [delayed]
stevula: Most tools do, yes.
aragonite: Do long sessions also burn through token budgets much faster?If the chat client is resending the whole conversation each turn, then once you're deep into a session every request already includes tens of thousands of tokens of prior context. So a message at 70k tokens into a conversation is much "heavier" than one at 2k (at least in terms of input tokens). Yes?
roygbiv2: I've found compactation kills the whole thing. Important debug steps completely missing and the AI loops back round thinking it's found a solution when we've already done that step.
thunkle: Just have to ask. Will I be spending way more money since my context window is getting so much bigger?
celestialcheese: Both. Employer pays for work max 20x, i pay for a personal 10x for my side projects and personal stuff.
jasondclinton: If you use context cacheing, it saves quite a lot on the costs/budgets. You can cache 900k tokens if you want.
dathery: That's correct. Input caching helps, but even then at e.g. 800k tokens with all of them cached, the API price is $0.50 * 0.8 = $0.40 per request, which adds up really fast. A "request" can be e.g. a single tool call response, so you can easily end up making many $0.40 requests per minute.
nujabe: > Since I'm yet to seriously dive into vibe coding or AI-assisted codingUnless you’re using a text editor as an IDE you probably have already
Wowfunhappy: Prior to this announcement, all 1M context use consumed "extra usage", it wasn't included in a normal subscription plan.
twodave: It’s more like the size of the desk the AI has to put sheets of paper on as a reference while it builds a Lego set. More desk area/context size = able to see more reference material = can do more steps in one go. I’ve lately been building checklists and having the LLM complete and check off a few tasks at a time, compacting in-between. With a large enough context I could just point it at a PLAN.md and tell it to go to work.
esperent: [delayed]
alienbaby: is this the market played in front of our eyes slice by slice: ok, maybe not, but watching these entities duke it out is kinda amusing? There will be consequences but may as well sit it out for the ride, who knows where we are going?
garciasn: For me, Claude was like that until about 2m ago. Now it rarely gets dumb after compaction like it did before.
mgambati: 1m context in OpenAI and Gemini is just marketing. Opus is the only model to provide real usable bug context.
twodave: I mean, try using copilot on any substantial back-end codebase and watch it eat 90+% just building a plan/checklist. Of course copilot is constrained to 120k I believe? So having 10x that will blow open up some doors that have been closed for me in my work so far.That said, 120k is pleeenty if you’re just building front-end components and have your API spec on hand already.
iknowstuff: Hmm I’ve felt the dumb zone on codex
vessenes: Opus 4.6 is wayy better than sonnet 4.5 for sure.
not_kurt_godel: Just curious, what kind of work are you doing where agentic workflows are consistently able to make notable progress semi-autonomously in parallel? Hearing people are doing this, supposedly productively/successfully, kind of blows my mind given my near-daily in-depth LLM usage on complex codebases spanning the full stack from backend to frontend. It's rare for me to have a conversation where the LLM (usually Opus 4.6 these days) lasts 30 minutes without losing the plot. And when it does last that long, I usually become the bottleneck in terms of having to think about design/product/engineering decisions; having more agents wouldn't be helpful even if they all functioned perfectly.
hu3: Source? I ask because I use 500k+ context on these on a daily basis.Big refactorings guided by automated tests eat context window for breakfast.
nemo44x: Has anyone started a project to replace Linux yet?