Discussion
wg0: Been experiencing similar issues even with the lower tier models. Fair transactions involve fair and transparent measurements of goods exchanged. I'm going to cancel my subscription this month.
eastbound: Yes: Claude Code “consumes tokens” and starts a session when the computer is asleep without anything started. Or consumes 10% of my session for “What time is it?”
jedisct1: GPT-5.4 works amazingly well. I've moved away from Claude and toward open-source models plus a ChatGPT subscription. That setup has worked really well for me: the subscription is generous, the API is flexible, and it fits nicely into my workflow. GPT-5.4 + Swival (https://swival.dev) are now my daily drivers.
turblety: Yeah, it's much better. Another plus is you can use it with OpenCode (or other third-party tools), so you can easily switch between Codex and most other models from alright companies (not Anthropic or Google).
spiderfarmer: That’s why I switched to Codex. It’s so much more generous and in my experience, just as good. Also, optimizing your setup for working with agents can easily make a 5x difference.
laksjhdlka: > Also, optimizing your setup for working with agents can easily make a 5x difference.
Any highlights you can share here? I'm always looking to improve my setup.
quotemstr: Plus, whenever Codex does something you dislike, you can just tell Codex to fix itself. Open source software is wonderful. Especially when it's on purpose.
mannanj: So basically the Anthropic employee who responded says those 1h cache writes were almost never accessed, so a silent 5m cache change is in our best interest and saves cost (justifying why they did this silently). However, his response gaslights us, because the math in the OP's opening post demonstrates this is not true: it shows 26x more reads, so at least in his case the cache is not doing what the Anthropic employee describes. Clearly we are being charged for less optimization here and being given the message (from my perspective, by Anthropic) that if you are in a special situation, your needs don't matter and they will close your thread without really listening.
postalcoder: Honestly, I don't think anything changed. I used to use Claude Code Max as my daily driver several months ago (for about 5 months), and this sort of drama was par for the course. It's why I migrated entirely to Codex, despite liking Claude, the harness, more. There's this honeymoon period with Claude you experience for a month or two, followed by a trough of disillusionment, and then a rebound after a model update (rinse and repeat). It doesn't help that Anthropic is experiencing a vicious compute famine atm.
cmaster11: For whoever else is having the same problems, it's worth upvoting these kinds of issues. There needs to be more transparency about what goes on with our subscriptions.
TacticalCoder: We vote here on HN and it's much more effective. Anyone from Anthropic reading conversations like this one on HN should be scared: we'll jump ship if they don't address such glaring issues.
vidarh: I hit the limits on the lower tiers of Codex just as fast as with Claude. At the moment I'm cycling between Claude, Codex, GLM5.1, and Kimi. The latter two are getting good enough, though, that I can make things go really far by doing planning with Opus and then switching to one of the cheap models for execution.
MeetingsBrowser: I pay for the lowest plan. I used to struggle to hit my quota. Now a single question consistently uses around 15% of my quota.
szmarczak: I'm on the Free tier, using Claude exclusively for consultation (send a third-party codebase + ask why/where something is done). I also used to struggle to hit limits. Recently I was able to hit the limit after a single prompt.
Achshar: I feel like I am living in a bubble; no one seems to mention Antigravity in these discussions, and I have not had any issues with the Ultra subscription yet. It seems to go on forever, and the interface is so much better for dev work compared to CC. (Though admittedly my experience with CC is limited.)
hyperionultra: Vote with your wallet. The voting continues until the product improves or dies.
Nic0: Am I alone in thinking that it has become slower than usual to get responses?
kif: Nope. It has become much much slower for me as well. It’s weird cause at times I will get a response very quickly, like it used to be. But most of the time I have to wait quite a bit for the simplest tasks.
hirako2000: My take is that this was the plan all along. Once people can't think for themselves anymore and businesses expect the level of productivity witnessed before, we'll have no choice but to cough up whatever providers bill us.
Cpoll: Didn't they move too soon then? People haven't forgotten how to tie their shoelaces (yet). And anyway, they'll just move to a different model; last holdout wins.
rdevilla: Bubble's bursting, get in.
ozim: I have the opposite conclusion. Demand is higher than supply; it is just the start of the bubble. Everyone and their dog is burning tokens on stupid shit that would be freed up if they would just ask for deterministic code for the task and then run that code. OpenAI and Anthropic are cutting free use and decreasing limits because they are not able to meet the demand. When the general public catches up with how to really use it, demand will fall, and the supply built today will become oversupply; that's when the bubble will burst. I say 5 more years.
rvz: Why are so many 'developers' complaining about Claude rate limiting them? You know you can actually... use local LLMs instead of donating your money to Anthropic's casino? I guess this is fitting when the person who submitted the issue is in "AI | Crypto". Well, there's no crying at the casino when you exhaust your usage or token limit. The house (Anthropic) always wins.
qwertyforce: that's exactly why i prefer codex
gessha: I’m processing some images(custom board game images -> JSON) with a common layout and basic structure and I exhausted my quota after just 30 images(pleb Pro account). I have 700 images to process…What I did instead is tune the prompt for gemma 4 26b and a 3090. Worked like a charm. Sometimes you have to run the main prompt and then a refinement prompt or split the processing into cases but it’s doable.Now I’m waiting for anyone to put up some competition against NVIDIA so I can finally be able to afford a workstation GPU for a price less than a new kidney.
BoredPositron: The two comments together sound like 2000s infomercial.
hdndjsbbs: "Enshittification" gets thrown around a lot, but this is the exact playbook. Look at the previous bubble's cash cow: advertising. Online advertising is now ubiquitous, terrible, and mandatory for anyone who wants to do e-commerce. You can't run a mass-market online business without buying AdWords, Instagram Ads, etc. AI will be ubiquitous, and then it will get worse and more expensive. But we will be unable to return to the prior status quo.
chandureddyvari: Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect. Then 30 minutes later I hit session limits. Three sessions like that in a day, and suddenly 25% of the weekly limit is gone. I ended up buying the $100 Codex plan. So far it has been much more generous with usage and more accurate than Claude for the kind of work I do. That said, Codex has its own issues. Its personality can be a bit off-putting for my taste. I had to add extra instructions in Agents.md just to make it less snarky. I was annoyed enough that I explicitly told it not to use the word "canonical." On UI/UX taste, I still think current Codex is behind the Jan/Feb era of Claude Code. Claude used to have much better finesse there. But for backend logic, hard debugging, and complex problem-solving, Codex has been clearly better for me. These days I use Impeccable Skillset inside Codex to compensate for the weaker UI taste, but it still does not quite match the polish and instinct Claude Code used to have. I used to be a huge Claude Code advocate. At this point, I cannot recommend it in good conscience. My advice now is simple: try the $20 plans for Codex and Cursor, and see which one matches your workflow and vibes best.
zozbot234: > It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.
Give it a custom sandbox and context for the work, so it has no opportunity to roam around when not required. AI agentic coding is hugely wasteful of context and tokens in general (compared to generic chat, which is how most people use AI); there's a whole lot of scope for improvement there.
imglorp: The sandbox is fine, but if the parent has given explicit instruction of files to inspect, why is it not centering there? Is the recent breakage that the base prompt makes it always try to explore for more context even if you try to focus it?
zozbot234: Because the "explicit instruction" you give AI is not deterministic as in a normal computer program. It's a complete black box and its context is also most likely polluted by all sorts of weird stuff. Putting it on as tight of a leash as possible should be seen as normal.
meetingthrower: I don't get it. Last week on the 100 bucks plan I generated probably 50k LOC (not a quality measure for sure!) and just barely kissed the weekly limit. I did get rate limited on some sessions for sure, but that's to be expected. I'm curious what people are doing that is consuming their limits. I can't imagine filling the $200 a month plan unless I was essentially using Claude Code itself as the API to mass-process stuff. For basic coding, what are people doing?
freedomben: What does it look like when you get rate limited? Does the instance just kind of sit and spin?I suspect I was getting rate limited very aggressively on Thursday last week. It honestly infuriated me, because I'm paying $200 a month for this thing. If it's going to rate limit me, at least tell me what it's doing instead of just making it seem like it's taking 12 hours to run through something that I would expect to be 15 minutes. The worst part is that it never even finished it.
sailingcode: I had the Max plan and never reached its limit despite constantly working. Now I use the Pro plan and regularly reach the 5h limit as well as the weekly limit, as expected. I found that it makes a huge difference if you provide clear context when developing code. If you leave room for interpretation, Claude Code uses up tokens much faster than in a well-defined context. The same goes for its time to answer, which gets longer if there isn't much documentation about the project.
10keane: this same pattern seems to occur every time a new model is about to release. i didn't notice the usage problem (i am on 20x), but opus 4.6 feels significantly dumber for some reason. i can't quantify it, but it failed on everyday tasks where it used to complete perfectly
peterpanhead: Every time a new model is coming, I think they deteriorate the current one. This happens every darn time. Opus 4.6 isn't as sharp; not even close to what it was a few weeks ago.
cedws: I had a weird experience at work last week where Claude was just thinking forever about tasks and not actually doing anything. It was unusable. The next day it was fine again.
geeky4qwerty: I'm afraid the music may be slowly fading at this party, and the lights will soon be turned on. We may very well look back on the last couple of years as the golden era of subsidized GenAI compute. For those not in the Google Gemini/Antigravity sphere: over the last month or so, that community has been experiencing nothing short of contempt from Google when attempting to address an apparent bait-and-switch on quota expectations for their Pro and Ultra customers (myself included). [1] While I continue to pay for my Google Pro subscription, probably out of some Stockholm Syndrome, beaten-wife-level loyalty, and false hope that it is just a bug and not Google being Google and self-immolating a good product, I have since moved to Kiro for my IDE and Codex for my CLI, and am as happy as a clam with this new setup.
[1] https://github.com/google-gemini/gemini-cli/issues/24937
1970-01-01: Lights on = Ads in your output. EOY latest; they can't keep kicking the massive costs down the road.
hirako2000: Too abruptly for sure.
peterpanhead: I don't understand Anthropic. Be consistent. Why do models deteriorate to shit? This is not good for workflows or trust. What, Opus 4.7 is gonna come out and the same thing happens again? Come on.
comandillos: Quite scared by the fact that the original issue pointing out the actual root cause has been 'Closed as not planned' by Anthropic.
https://github.com/anthropics/claude-code/issues/46829
hrimfaxi: The response doesn't even make sense and appears to be written by AI.
> The March 6 change makes Claude Code cheaper, not more expensive. 1h TTL for every request could cost more, not less
Feels very AI.
> Restore 1h as the default / expose as configurable? 1h everywhere would increase total cost given the request mix, so we're not planning a global toggle.
They won't show a toggle because it will increase costs for some unknown percentage of requests?
stingraycharles: Sounds like a decision I would make when memory is expensive and you want to get rid of the very long (in time) tail of waiting 1h to evict the cache after a session has stopped. There must be a better way to do this. The obvious lever for consumers is the pricing difference. If they made cache writes the same price as regular writes, that would solve the whole problem. If you really want to push it, use that pricing only for requests where the number of cache hits > 0 (to avoid people setting the flag without intent to use it), and you've solved the whole issue.
zozbot234: Memory is expensive? If reads are as rare as they claim you can just stash the KV-cache on spinning disk.
sdevonoes: Why scared? Like, if their software gets bad, we stop using it.
comandillos: Maybe scared wasn't the best word... but we cannot deny Opus is a great (if not the greatest) model at coding, and Anthropic is the only one serving it at reasonable prices through their subscription model.
sdevonoes: Sounds like an addiction to me
mixermachine: I've been using the Codex Business subscription (about 30€) for multiple months now. Even there, they've cut back on the quota. A few months back it was hard for me to reach the limit; now it is easier. Still, in comparison with Claude Code, the Codex quota is a much better deal. However, they should not make it worse...
wheelerwj: I have the exact opposite experience. I can run claude forever, my codex quota was done by Wednesday morning.
pxc: It's a bit shocking to me how opaque the pricing for the frontier labs' subscription services is. It's basically impossible for people to tell what they're actually buying, and difficult to even meaningfully report or compare experiences. How is this normal?
parasti: Yeah, I cancelled the moment I realized that the subscription is a scheme to get you to constantly dip into extra usage. I get more benefit out of Claude on the free tier than on Pro.
faangguyindia: Ultimately we'll find more efficient techniques and hardware, and AI companies will end up owning nuclear power stations and continue providing models capable of 10x what they do now. Valuations have already reached the point where these companies can run their nuclear power stations, fund development of new hardware and techniques, and boost the capabilities of their models by 10x.
gavinray: Codex is the only CLI I've had purely positive experiences with. Take that for what you will
brunooliv: People need to understand a few things: vague questions make the models roam endlessly "exploring" dead ends, and "restarting" old chats immediately eats a lot of context. Anthropic CAN change their limits and rates as they see fit; there have never been hard promises or SLOs on these plans. With that said, I pay for the Pro subscription (20/mo) and I've hit limits maybe 2-3 times over a period of 4 months building a simple running app in Python. I'd not call it production-ready, but it's not nothing either. If people were considerably more willing to aggressively prune their context and scope tasks well, they could get a lot more done with it, at least in my experience. Anthropic can't really fix anything because the underlying model architecture can't be "patched". But I definitely feel a lot of people can't wrap their heads around the new paradigms needed to effectively prompt these models. Additionally, opting out is always an option... but these types of issues feel more like laziness than real, structural issues with the model/harness...
fooster: Where is your evidence of this "massive cost"? Inference is massively profitable for both anthropic and openai. Training is not.
wesammikhail: source?
onlyrealcuzzo: > Claude has gotten noticeably worse for me too.
My experience is limited to CC, Gemini CLI, and Codex (not Aider yet), trying different combinations of models. But from my experience, CC puts everything else to shame. How does Cursor compare? Has anyone found an Aider combination that works as well?
chrismustcode: Is Aider even considered a thing anymore? It was pretty much first for CLI agents and had the go-to benchmark at the start of LLM coding. Now the benchmark doesn't get updated, and Aider never gets a mention in discussions of CLI tools, until now.
faangguyindia: Aider is dead because it's from the pre-function-calling era of this tech.
throwaway2027: They rolled out 1M context then they start doing this shit? I know Pro doesn't have access to the 1M context but what a joke.
bob1029: I've got a dual-path system to keep costs low and avoid TOS violations. For general queries and investigation, I use whatever public/free model is available without being logged in. Not having a bunch of prior state stacked up all the time is a feature for me. This is essentially my Google replacement. For very specific technical work against code files, I use prepaid OAI tokens in VS copilot as a "custom" model (it's just gpt5.4). I burn through maybe $30 worth of tokens per month with this approach. A big advantage of prepaying for the API tokens is that I can look at everything Copilot is doing in my usage logs. If I use the precanned coding-agent products, the prompts are all hidden in another layer of black box.
scrollop: There are MANY accounts of claude degradation (intelligence, limits) over the past week on reddit and here with many posts describing people moving. Nothing is changing. You'd think they'd at least give a statement.
stavros: Codex has been better for me, but it's WAY too nitpicky/defensive. It always wants to make changes that add complexity and code to solve a problem that's impossible to happen (e.g. a multiprocess race condition on a daemon I only ever run one instance of).
hirako2000: The odds of that happening are high. Trillions invested. It occurred to me that an outright rejection of these tools is brewing, but it can't quite materialise yet.
niklasd: We also experienced hitting our Claude limits much earlier than before over the last two weeks. To the point where we thought it must be a bug.
cjonas: Ya, I've had this experience more than a few times recently. I've heard people claiming they serve quantized models during high load, but it happens in Cursor as well, so I don't think it's specific to Anthropic's subscription. It could be that the context window has just gotten into a state that confuses the model... but that wouldn't explain why it appears to be temporary. My best guess is this is the result of the companies running "experiments" to test changes. Or it's just all in my head :)
whywhywhywhy: Cursor one is back to Claude 4 or 3.5+ at best. Struggles to do things it did effortlessly a few weeks ago.It’s not under load either it’s just fully downgraded. Feels more they’re dialing in what they can get away with but are pushing it very far.
rzkyif: Fellow annoyed Google AI Pro subscriber here! Can confirm. I initially enjoyed the 5-hour limits on Gemini CLI and Antigravity so much that I paid for a full year, thinking it was a great decision. In the following months, they significantly cut the 5-hour limits (not sure if they even exist anymore), introduced the unrealistically bad weekly limit that I can fully consume in 1-2 hours, introduced the monthly AI credits system, and added ads to upgrade to Ultra everywhere. At the very least, the Gemini mobile app / web app is still kinda useful for project planning and day-to-day use, I guess. They also bumped the storage from 2TB to 5TB, but I don't even use that.
stavros: It should be illegal to change the terms of the subscription mid-period. If you paid for the full year, you should get that plan for the whole year. I don't understand how it's ok for corporations to just change the terms mid-way, and we just have to accept it.
palata: > We may very well look back on the last couple years as the golden era of subsidized GenAI compute.Looks like enshittification on steroids, honestly.
rzkyif: My personal experience is way different: I struggle to burn through more than 50% of the 5-hour limit. For context, with Google AI Pro I can burn through the Antigravity weekly limit in 1-2 hours if I force it to use Gemini 3.1 Pro. Meanwhile, Gemini 3 Flash is basically unlimited but frequently produces buggy code or fails to implement things how I personally would (it feels like it doesn't "think" like a software dev). I also tried VS Code + Cline + OpenRouter + MiniMax M2.7. It's quite cheap and seems to be better than Gemini 3 Flash, but it gets really pricey as the context fills up, because prompt caching is not supported for MiniMax on OpenRouter, and the result usually needs 3-6 revisions on average, so the context fills up pretty often. Eventually I got Claude Max 5x to try for a month: VS Code + the Claude Code extension on a ~15k-line codebase, model set to "Default", and effort set to "Max". So far it's been really good: 0-2 revisions on average, and most of the time it implements things exactly how I would or better. And, like I said, I can only consume 40-60% of the 5-hour limits no matter how hard I try. Granted, I'm not forcing it to use Opus like OP (nor do I use complicated skills or launch multiple tasks at the same time), but I feel like they really nailed the right balance of when to use which model and how to pass context between them. Or at least enough that I haven't felt the need to force it to use Opus all the time.
rzkyif: Reading the other negative comments makes me wonder if this is only because I'm getting a hidden newcomer's limit bonus or something though hahah
dr_dshiv: "Hey Claude, can you help me create a strategy to optimize my token use so I don't run into limits so often?" --> worked for me! I had two $200 plans before and now I am cool despite all day use
SkyPuncher: And it's working largely because the other models haven't figured out how to provide a consistent, long-running experience.
stavros: > Anthropic CAN change their limits and rates as they see fit, there’s never been hard promises or SLOs on these plans.No they can't. When I buy an annual subscription and prepay for the year, they can't just go "ok now you get one token a month" a day in. I bought the plan as I bought it. They can't change anything until the next renewal.
sunaookami: Set MAX_THINKING_TOKENS to 0, Claude's thinking hardly does anything and just wastes tokens. It actually often performs worse than without thinking.
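For anyone unfamiliar with the setting above: it is an environment variable, so one way to apply it (assuming `MAX_THINKING_TOKENS` is still honored by your Claude Code version) is to export it in the shell before launching the CLI:

```shell
# Disable extended thinking for Claude Code sessions started from this shell.
# Assumes the MAX_THINKING_TOKENS variable is still honored by your client version.
export MAX_THINKING_TOKENS=0
```

Then start `claude` from the same shell; to make it permanent, the line can go in your shell profile instead.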
gruez: Not the guy you're responding to, but when this happens the token counter is frozen at some low value (e.g. 1k-10k) as well, so it's not thinking in circles but rather not thinking (or doing anything, for that matter) at all.
egeozcan: This exact thing is happening to me since yesterday. It comes back to life when I throw the whole session away.
ryandrake: Yea, I found myself maxing out the $20/mo plan occasionally, so I tried the $100/mo, but I don't think I even once approached the session limit, let alone the weekly limit. And this is doing what I would consider heavy, continuous programming. I probably ought to go back down to the $20 one. It would be nice if they had a cheaper tier in between, but the tiers they have are probably a good business trick to get people to buy much more than they need.
weakfish: This article convinced me otherwise https://www.wheresyoured.at/the-subprime-ai-crisis-is-here/
gruez: > and business expect the level of productivity witnessed before, will have no choice but cough up whatever providers bill us.
Is that bad? After all, even if they hiked the price to infinity, you wouldn't be worse off than if AI didn't exist, because you could still code by hand. Moreover, if it's really in a "business" (employment?) context, the tools should be provided by your employer, not least for compliance/security reasons.
tedivm: Something similar is happening with GitHub Copilot too. It's impossible to know what a "request" is, and some change in the last couple of months has seen my request usage go up for the same style of work. Toss in the bizarre and impossible-to-understand rate limiting that occurs with regular usage, and it's pretty obvious that these companies are struggling to scale.
alienbaby: I'm finding the opposite with Copilot. A request is a prompt, with some caveats around what's generating the prompt. I am quite happily working with Opus 4.6 at 3x cost, and about 1/3 of the month in I'm sitting at ~25% usage of a Pro+ subscription. I find it quite easy to track my usage and rate of usage. The overall context windows are smaller with Copilot, I believe, but it doesn't appear to be hurting my work. I'm using it for approx 4 hours a day most days, generally one-shotting fun ideas I thoroughly plan out in planning mode first, and I have my own version of the idea -> plan -> analyse -> document implementation phases -> implement-via-agent loop. Simulations, games, stuff I'm curious about, and resurrecting old projects that never really got off the ground.
siliconc0w: Switched back to codex for the promotion. Opus at the start of the year was GOAT- just relentless at chewing through hard problems. Now it spins on pretty easy work (took three swings just to edit a ts file) and my session is like 1-3 prompts (downgraded to the $20 plan but still)
wellthisisgreat: $200 plan and VERY tame usage (not 24/7, not every day even, maybe 8-10 hours for ~4 days). Suddenly I am at 96% weekly (!) limit, multiple session limits, two daily limits.Either they decimated the limits internally, or they broke something.Tried all the third-party tricks (headroom, etc.), switched to 200k context window, switched back to 4.5.I hope 4.5 will help, but the rest of the efforts didn’t move the needle much
nickstinemates: It feels so weird to me: people are exhausting their quotas while I am trying very hard to even reach mine on the $200 plan. We're generating all of the code for swamp [1] with AI. We review all of that generated code with AI (this is done with the Anthropic API). Every part of our SDLC is pure AI + compute. Many feature requests every day, bug fixes, etc. Never hit the quota once. Something weird is definitely going on.
[1] https://github.com/systeminit/swamp
brookst: My hypothesis is that people who have continuous sessions that keep the cache valid see the behavior you're describing: at 95% cache hits (or thereabouts), the Max plan goes a long way. But for people who go >5 minutes between prompts and get no cache hits, usage is eaten up quickly, especially passing in hundreds of thousands of tokens of conversation history. I know my quota goes a lot further when I sit down and keep sessions active, and much less far when I'm distracted and let it sit for 10+ minutes between queries. It's a guess. But, n=1 and possible confirmation bias noted, it's what I'm seeing.
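This hypothesis is easy to put rough numbers on. A back-of-envelope sketch, using illustrative multipliers relative to the base input-token price (cache read ~0.1x, 5-minute cache write ~1.25x; treat both the figures and the simplifications as assumptions, not quoted Anthropic pricing):

```python
# Illustrative multipliers relative to the base input-token price (assumptions).
READ, WRITE_5M, BASE = 0.10, 1.25, 1.00

def session_cost(history_tokens: int, turns: int, cache_alive: bool) -> float:
    """Rough cost units for follow-up turns over a fixed conversation history."""
    if cache_alive:
        # Warm cache: each turn re-reads the cached history at the cheap read
        # rate (ignoring the small write for newly added tokens).
        return turns * history_tokens * READ
    # Cold cache (gaps longer than the TTL): each turn re-sends the full
    # history at the base rate and pays to re-cache it.
    return turns * history_tokens * (BASE + WRITE_5M)

hist = 100_000  # tokens of accumulated conversation history
warm = session_cost(hist, 10, cache_alive=True)   # rapid-fire prompting
cold = session_cost(hist, 10, cache_alive=False)  # >5-minute gaps every turn
print(warm, cold, cold / warm)
```

Under these assumed multipliers, the cold-cache session costs about 22x the warm one, which would be consistent with the same workload burning quota far faster when prompts are spaced out past the TTL.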
docheinestages: Anthropic paved the path for agentic coding, and their pricing made it possible for masses of people to discover and experiment with this new style of development. Their Claude Code plans subsidized model usage so much that I'm sure they must've had negative margins for quite some time. But now that they have acquired a substantial user base, it makes sense for them to dial back and become more greedy. The quiet and weird changes to Claude's behavior in recent weeks must be due to both this increased greed and their struggles with scaling. What I wish for right now is for open-weight models and hardware companies (looking at you, Apple) to make it possible to run local models with Opus 4.6-level intelligence.
voisin: It is pretty obvious to me that Anthropic wasn’t prepared with sufficient infrastructure to handle the wave of OpenAI/DoD refugees. Now everyone is getting throttled excessively and Claude is essentially unusable beyond chatting. Their big new release of Cowork is even worse than Claude Code for blasting through session limits.I am tired of all the astroturf articles meant to blame the user with “tips” for using fewer tokens. I never had to (still don’t) think of this with Codex, and there has been a massive, obvious decline between Claude 1 month ago and Claude today.
Rekindle8090: I put this in a reply, but I'm also posting it as a general comment: please unsubscribe from these services and see how they perform. "Maybe if I spend more money on the Max plan it will be better" > no, it will be the same. "Maybe if I change my prompt it will work" > no, it will be the same. "Maybe if I try it via this API instead of that API it will improve" > no, it will be the same. Claude, ChatGPT, Gemini: all of these SOTA models are carefully trained, with platforms carefully designed to get you to pay more for "better" output, or to try different things instead of using a different product. It's to keep you in the ecosystem and keep you exploring. There is a reason you can't see the layers upon layers of scaffolding they have. And there's a reason why, two weeks after a major update, the model is suddenly "bad" and "frustrating". It's the same reason it's done with A/B testing: so when you complain, someone else has no issues, and when they complain, you have no issues. It muddies the water intentionally. None of it is because you're doing anything wrong. It's not a skill issue; it's a careful strategy to extract as much engagement and money from customers as possible. It's the same reason they give people who buy new gun skins in Call of Duty easier matchmaking for the first couple of games. Stop paying more, stop buying these Pro/Max plans hoping it will get better. It won't; that's not what makes them money. Making people angry, making people waste their time while others have no issues, and making them explore and try different things for longer so they can show investors how long people use these AI tools is what makes them money. When competitors have a better product, these issues go away. When a new model is released, these issues don't exist. I was paying a ton of money for Claude; once I stopped and cancelled my subscription entirely, suddenly Sonnet 4.6 is performing like Opus and I don't have prompts using 10% of my quota in one message despite being the same complexity.
rnadomvirlabe: I find copilot to be much more straightforward, and I can track per request against my credits. Here is the explanation of what a request is:https://docs.github.com/en/copilot/concepts/billing/copilot-...
tedivm: > A request is any interaction where you ask Copilot to do something for you—whether it's generating code, answering a question, or helping you through an extension. Each time you send a prompt in a chat window or trigger a response from Copilot, you're making a request. For agentic features, only the prompts you send count as premium requests; actions Copilot takes autonomously to complete your task, such as tool calls, do not. For example, using /plan in Copilot CLI counts as one premium request, and any follow-up prompt you send counts as another.This clearly isn't true for agentic mode though. This document is extremely misleading. VSCode has the `chat.agent.maxRequests` option which lets you define how many requests an agent can use before it asks if you want to continue iterating, and the default is not one. A long running session (say, implementing an openspec proposal) can easily eat through dozens of requests. I have a prompt that I use for security scanning and with a single input/request (`/prompt`) it will use anywhere between 17 and 25 premium requests without any user input.
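For anyone wanting to bound that autonomous iteration, the `chat.agent.maxRequests` setting mentioned above caps how many requests agent mode makes before asking whether to keep iterating. A minimal VS Code settings.json fragment (the value 5 is an arbitrary example, not a recommendation):

```json
{
  // Ask before the agent continues past 5 requests in one agentic run.
  "chat.agent.maxRequests": 5
}
```

Note this controls when VS Code pauses the agent for confirmation; how each underlying call is metered against premium requests is a separate, billing-side question.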
Rekindle8090: The product was performing badly, and you thought this would be solved by spending more money on it? When will people realize this is the same as vendor lock-in?

"Maybe if I spend more money on the Max plan it will be better" > no, it will be the same. "Maybe if I change my prompt it will work" > no, it will be the same. "Maybe if I try it via this API instead of that API it will improve" > no, it will be the same.

Claude, ChatGPT, Gemini: all of these SOTA models are carefully trained, with platforms carefully designed to get you to pay more for "better" output, or to try different things instead of using a different product. It's to keep you in the ecosystem and keep you exploring. There is a reason you can't see the layers upon layers of scaffolding they have. And there's a reason why, two weeks after a major update, the model is suddenly "bad" and "frustrating". It's the same reason it's done with A/B testing: when you complain, someone else has no issues, and when they complain, you have no issues. It muddies the water intentionally.

None of it is because you're doing anything wrong. It's not a skill issue; it's a careful strategy to extract as much engagement and money from customers as possible. It's the same reason they give people who buy new gun skins in Call of Duty easier matches in matchmaking for their first couple of games.

The only mistake you made was paying MORE, hoping it would get better. It won't; that's not what makes them money. What makes them money is making people angry, making them waste their time while others have no issues, and making them explore and try different things for longer, so they can show investors how long people use these AI tools. When competitors have a better product, these issues go away. When a new model is released, these issues don't exist.

I was paying a ton of money for Claude. Once I stopped and cancelled my subscription entirely, suddenly Sonnet 4.6 is performing like Opus, and I don't have prompts using 10% of my quota in one message despite being the same complexity.
heyitsaamir: It would be really nice to have improved transparency in token usage and throttling imo.
oldnewthing: If this helps, I rolled back to version 2.1.34. Here is the ~/.claude/settings.json blurb I added:

    {
      "effortLevel": "high",
      "autoUpdatesChannel": "stable",
      "minimumVersion": "2.1.34",
      "env": {
        "DISABLE_AUTOUPDATER": 1,
        "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": 1
      }
    }

I also had to:

1. Nuke all other versions within ~/.local/share/claude/versions/ except 2.1.34.
2. Link ~/.local/bin/claude to ~/.local/share/claude/versions/2.1.34.

This seems to have fixed my problem of running out of quota quickly. I have periods of intense use (nights, weekends) and no use (day job). Before these changes, I was burning through quota rather fast. I am on the same $100 plan.

I am not sure the adaptive thinking setting is relevant for this version, but in the future it will help once they fix all the quota and cache issues. Seriously thinking about switching to Codex, though. Gemini is far behind from what I have tried so far.
oldnewthing: I also have the following in ~/.bashrc:

    export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
    export MAX_THINKING_TOKENS=31999
    export DISABLE_AUTOUPDATER=1
    export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
kibwen: The evidence is that quotas exist, as seen here, and are low enough that people are hitting them regularly. When was the last time you hit your quota of Google searches? When was the last time you hit your quota of StackOverflow questions? When was the last time you hit your quota of YouTube videos? Any service will rate limit abuse, but if abuse is indistinguishable from regular use from the provider's perspective, that's not a good sign.
jerf: It's also kind of interesting that they don't think they can do what an economy would normally do in this situation, which is raise prices until supply matches. Shortages generally imply mispricing.There's a lot of angles you take from that as a starting point and I'm not confident that I fully understand it, so I'll leave it to the reader.
bcherny: Hey all, Boris from the Claude Code team here. We've been investigating these reports, and a few of the top issues we've found are:

1. Prompt cache misses when using the 1M-token context window are expensive. Since Claude Code uses a 1-hour prompt cache window for the main agent, if you leave your computer for over an hour and then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (e.g. to nudge you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead, with an option to configure your context window up to 1M if preferred. To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude.

2. People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins. This was the case for a surprisingly large number of users, and we are actively working on (a) improving the UX to make these cases more visible to users and (b) more intelligently truncating, pruning, and scheduling non-main tasks to avoid surprise token usage.

In the process, we ruled out a large number of hypotheses: adaptive thinking, other kinds of harness regressions, and model and inference regressions.

We are continuing to investigate and prioritize this. The most actionable thing for people running into this is to run /feedback, and optionally post the feedback ids either here or in the GitHub issue. That makes it possible for us to debug specific reports.
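To see why a full cache miss on a huge context hurts, here's a back-of-the-envelope sketch. The multipliers (cache reads at 0.1x the base input price, 5-minute cache writes at 1.25x) match Anthropic's published API pricing; the $3/MTok base rate is an assumption for a Sonnet-class model, and subscription quota accounting may weight things differently:

```python
# Illustrative only: cost of re-sending a large conversation prefix
# with and without a prompt-cache hit.

BASE = 3.00                  # $ per 1M input tokens (assumed Sonnet-class rate)
CACHE_WRITE = BASE * 1.25    # writing the prefix into the cache
CACHE_READ = BASE * 0.10     # reading an already-cached prefix

def turn_cost(context_tokens, cache_hit):
    """Dollar cost of sending `context_tokens` of prefix on one turn."""
    rate = CACHE_READ if cache_hit else CACHE_WRITE
    return context_tokens / 1_000_000 * rate

# An 800k-token session resumed within the cache TTL vs. after it expired:
warm = turn_cost(800_000, cache_hit=True)
cold = turn_cost(800_000, cache_hit=False)
print(f"warm turn: ${warm:.2f}, cold turn: ${cold:.2f}, ratio: {cold / warm:.1f}x")
```

Under these assumptions, a single stale-session turn costs about 12.5x a warm one, which is why one resumed morning session can eat a surprising slice of a quota.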
denysvitali: OpenAI (Codex) keeps resetting the usage limits each time they fuck up... I have yet to see Anthropic do the same. Sorry, but this whole thing seems to be quite on purpose.
SkyPuncher: I skimmed the issue. No wonder Anthropic closes these tickets out without much action. That's just a wall of AI garbage.

Here's what I've done to mostly fix my usage issues:

* Turn on max thinking in every session. It saves tokens overall because I'm not correcting it or having it waste energy on dead paths.

* Keep active sessions active. It seems like caches are expiring after ~5 minutes (especially during peak usage). When the caches expire, it seems like all tokens need to be rebuilt, and this gets especially bad as token usage goes up.

* Compact after 200k tokens as soon as I reasonably can. I have no data, but my usage absolutely skyrockets as I get into longer sessions. This is the most frustrating part, because Anthropic forced the 1M model on everyone.
stldev: Can confirm. Max effort helps; limiting context to <= ~20-25% is crucial these days.

> Keep active sessions active. It seems like caches are expiring after ~5 minutes (especially during peak usage). When the caches expire, it seems like all tokens need to be rebuilt, and this gets especially bad as token usage goes up.

Is this as opaque on their end as it sounds, or is there a way to check?
behole: I shredded my Max 5x plan in 2 hours on the reg this week! GLM, here I come!
omosubi: Getting $5000 worth of product essentially free and then being told to pay is not enshittification.
zzzoom: It's predatory pricing.
wellthisisgreat: How and when do you apply the strategy?
danbots: Codex can feel standoffish at times. I can tell very quickly we won't become friends. The personality feels like an employee in another department who, while gifted, is merely lending you a slice of their clearly precious time. Codex gives me the feeling that I am wasting its time; that it will help me, but deep down it does not want to, and it does not care whether we succeed together. What I am saying, friends, is that when I use Codex and iterate, I get the impression that Codex does not like me, and that deep down it truly does not want to help.

For something I spend all my time using, I'd rather iterate with Claude. The personality makes a big difference to me.
cmrdporcupine: I don't care about "personality", I want quality.

Honestly, when I get Codex to review the work that Claude does (my own or my coworkers'), it consistently finds terrible, terrible bugs: usually missing error handling / negative conditions, or full-on race conditions in critical paths. I don't trust code written by Claude in a production environment.

All AI code needs review by a human, and often by other AIs, but Opus 4.6 is the worst. It's way too "yeet". The Opus models are for building prototypes, not production software.

GPT 5.4 in Codex is also way more efficient with tokens and budget. I can get a lot more done with it. I don't like giving money to sama, but I hate bugs even more.
emptysongglass: Man, what the hell happened to System Initiative? It was a super weird pivot from sociotechnical proclamations to a tool that I honestly have no idea what it does for me. Is it n8n for agents? Is it needed when I have a bunch of skills that approximate whatever swamp is trying to do? Who knows!
comboy: You just convinced me to try it. Claude just copy-pastes and does search-and-replace, zero abstractions, and I'm the one who needs to think about the edge cases.
pawelduda: 50 days ago I wrote this [1], as the world seemed high on AI and it gave me crypto-bubble vibes.

Since then, I've been seeing increased critique of Anthropic in particular (several front-page posts on HN, especially in the past few days), either due to it being nerfed or just straight up eating usage quota (which matches my personal experience). It appears that we're once again getting hit by enshittification of sorts.

Nowadays I rely a lot on LLMs on a daily basis for architecture and writing code, but I'm so glad that the majority of my experience came from the pre-AI era. If you use these tools, make sure you don't let them atrophy your software engineering "muscles". I'm positive that in the long run LLMs are here to stay. The jump in what you can now self-host, or run on consumer hardware, is huge, year after year. But if your abilities rely on one vendor, what happens if you come to work one day, find out you're locked out of your Swiss Army knife, and can no longer outsource thinking?

[1] https://news.ycombinator.com/item?id=47066701
comboy: Any good, reasonable alternatives? Gemini is like a prodigious 3-year-old, hopeless for my projects. Has anybody tested OpenCode with Kimi or something?
eurekin: I'm adding two extra GPUs to my local rig. Turns out Qwen 3.5 122B is already enough to handle (finish with moderate guidance) the non-planning parts of my tasks.
losteric: It doesn’t seem like Anthropic is fucking up? I use Claude Code extensively, about 8 hours every work day, and have yet to see any issues. It really does seem like PEBKAC.
denysvitali: My colleagues and I have faced the same issues over the last month or so.

With a new version of Claude Code pretty much every day, constant changes to their usage rules (2x outside of peak hours, temporarily 2x for a few weeks, ...), hidden usage decisions (past 256k it looks like your usage consumes your limits faster), and model degradation (Opus 4.6 is now worse than Opus 4.5, as many have reported), I fail to see how this can be a user error.

The only user error I see here is still trusting Anthropic to be on the good side, tbh. If you need to hear it from someone else: https://www.youtube.com/watch?v=stZr6U_7S90
Narciss: This is a great article, thanks for sharing
throwaway2027: I don't want a nudge. I want a clear RED WARNING with "You've gone away from your computer a bit too long and chatted too much at the coffee machine. You're better off starting a new context!"
elthor89: Are local models dedicated to programming any good yet? That could be a way to deal with Anthropic or others flip-flopping on token usage or limits.
KaoruAoiShiho: After googling https://www.reddit.com/r/singularity/comments/1psesym/openai...
wesammikhail: I've seen sources like this before. It's all hearsay and promo. I was asking for any publicly available, verifiable information regarding the cost of inference at scale. I haven't personally seen any such info, which is why I asked.

I'm dying to see an S-1 filing for Anthropic or OpenAI. I don't actually think inference is as cheap as people say if you consider the total cost (hardware, energy, capex, etc.).
chasebank: But why would they make the product shittier and not just more expensive? A lot of the complaints have been the model getting lost and going rogue.
rawicki: For me, definitely the worst regression was the system prompt telling Claude to analyze the file to check whether it's malware on every read. That correlates with me also seeing early-exhausted quotas and acknowledgments of "not malware" at almost every step.

It is a horrible error of judgement to insert a complex request into such a basic ability. It is also an error of judgement to make Claude decide whether it wants to improve the code or not at all.

It is so bad that I stopped working on my current project and went to try other models. So far Qwen is quite promising.
danbots: Codex can feel standoffish at times. I can tell very quickly we won't become friends. The personality feels like an employee in another department who, while gifted, is merely lending you a slice of their clearly precious time. Codex gives me the feeling that I am wasting its time; that it will help me, but deep down it does not want to, and it does not care whether we succeed together. What I am saying, friends, is that when I use Codex and iterate, I get the impression that Codex does not like me, that deep down it truly does not want to help me, and that it has better things to do.

On the flip side: using Opus with a Baby Billy Freeman persona has never been more entertaining.
peyton: I prompt it and check CI later. I couldn't tell you how Codex feels; I've never had any conversation with it. You may want to try this sort of workflow if the personality affects you negatively.
quikoa: Inference for API or subscriptions? There is a massive price difference between the two.
mvkel: Why did it suddenly become an issue, despite prompt caching behavior being unchanged?
mvkel: Why did this become an issue seemingly overnight, when 1M context has been available for a while and, I assume, prompt caching behavior hasn't changed?

EDIT: prompt caching behavior did change! 1hr -> 5min on March 6th. I'm not sure how starting a fresh session fixes it, as it's just rebuilding everything. Why even make this available?
hirako2000: This comment reads as defending the use of AI on principle. My argument was not about AI, but rather about the practices of Anthropic and the like.
docheinestages: Why are you all of a sudden running into so many issues like this? Could it be that all of Anthropic's employees have completely unlimited and unbounded accounts, which means you don't get a feel for how changes will affect customers?
bcherny: The number of people using Claude Code has grown very quickly, which means:

- More configurations and environments we need to test
- Given an edge/corner case, it is more likely that a significant number of users run into it
- As the ecosystem has grown, more people use skills and plugins, and we need to offer better tools and automation to ensure these are efficient

We do actually dogfood rate limits, so I think it's some combination of the above.
maerF0x0: The insidious part is the thought that if you spend your limited learning and recall on AI tools, you won't be able to "still code by hand", because you'll have lost the skill. There will then be a local minimum to cross to get back to human-level productivity. Of course, you'll get PIPed before you get back to full capacity.
cmrdporcupine: I mean, this is blatantly false. Codex just rolled out a $100-a-month plan with more generous usage limits than Claude, and GPT 5.4 is more capable than Opus 4.6, at least for the systems work I do.

And if you can't stomach OpenAI, GLM 5.1 is actually quite competent: about Opus 4.5 / GPT 5.2 quality.
mlinsey: Different users do seem to be encountering problems or not based on their behavior, but for a rapidly evolving tool with new and unclear footguns, I wouldn't characterize that as user error.

For example, I don't pull in tons of third-party skills, preferring to keep a small list of ones I write and update myself. But it's not at all obvious to me that pulling in a big list of third-party skills (as I know a lot of people do with superpowers, gstack, etc...) would cause quota or cache-miss issues, and if that's causing problems, I'd call it more of a UX footgun than user error. Same with the 1M context window being a heavily touted feature that's apparently not something you actually want to take advantage of...
bcherny: I don't think that's accurate. The malware prompt has been around since Sonnet 3.7. We carefully evaled it for each new model release and found no regression in intelligence, alongside improved scores for cyber risk. That said, we have removed the prompt for Opus 4.6, since it is no longer needed.
rawicki: I started seeing "not malware, continuing" in almost every reply starting about two weeks ago. Maybe you just reintroduced it with some regression? Opus 4.6.
bcherny: That's weird. Would you mind running /feedback and sharing the id here next time you see this? I'd love to debug
bcherny: Ack, it is currently blue but we can make it red
elephanlemon: IMO we are currently in the ENIAC era of LLMs. Perhaps there will be a brief moment where things get worse, but long term the cost of these things will go way down.
nickstinemates: I can't really speak to the sociotechnical proclamations, because I didn't make them.

What it does for you is simple: if you want to automate something, it does it. Load the AI harness of your choice, tell it what to automate, and swamp builds extensions for whatever it needs to accomplish your task. It keeps a perfect memory of everything that was done, manages secrets through vaults (which are themselves extensions it can write), and leaves behind repeatable workflows. People have built all sorts of shit: full VM lifecycle management, homelab setups, managing infrastructure in AWS and Azure.

What's also interesting is the way we're building it. I gave a brief description in my initial comment.
emptysongglass: Ah, interesting, thanks! I think you might consider elevating some of that kind of copy.The sociotechnical stuff with System Initiative was made by your CEO? The guy who is really into music? And I don't even know how long that product was a thing before the pivot. Not long!
kirby88: I've been building an AI coding agent that uses the exact same prompt as Claude Code, but adds a virtual filesystem to minify source code, plus the concept of stem agents (general agents that specialize during the conversation for maximum cache hits). The results on my modest benchmark: 50% of Claude Code's cost and 40% of the time. https://github.com/kirby88/vix-releases
bcherny: > 1hr -> 5min on March 6thThis is not accurate. The main agent typically uses a 1h cache (except for API customers, which can enable 1h but it is not on by default because it costs more). Sub-agents typically use a 5m cache.
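For context on the API side of what Boris describes, prompt-cache TTLs are set per cache breakpoint via cache_control in the Messages API. A sketch of a request body, going from Anthropic's prompt-caching docs as best I recall; treat the field names, the model string, and the availability of the "1h" TTL (which has required a beta opt-in header) as assumptions to verify against the current docs:

```json
{
  "model": "claude-sonnet-4-5",
  "system": [
    {
      "type": "text",
      "text": "...large, stable system prompt and tool definitions...",
      "cache_control": { "type": "ephemeral", "ttl": "1h" }
    }
  ],
  "messages": [
    { "role": "user", "content": "..." }
  ]
}
```

Omitting the "ttl" field falls back to the default 5-minute cache, which would match the sub-agent behavior described above.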
ImPostingOnHN: > if they hiked to price infinity, you wouldn't worse off than if AI didn't exist because you could still code by handThis was addressed by the words that you perhaps mistakenly omitted from your quote:> Once people won't be able to think anymore...People who aren't able to think anymore, can't still code by hand. Think "Idiocracy".
nickstinemates: Does https://swamp.club do a better job?

System Initiative was a thing for ~6.5 years. I talked to every person who ever used it or was interested in using it in the last 2.5 years. Thousands of them. Swamp is better by every metric; it has a lot more promise and is a lot more interesting.
hk__2: > I bought the plan as I bought it. They can't change anything until the next renewal.So they can’t give you better models either?
echelon: 1. I've never seen this. Is there a config option to unhide it if it's happening? Is this in Claude Code? Does it have to be set to verbose or something?2. Can we pay more/do more rigorous KYC to disable it if it's active?
muyuu: It may also be locale/timezone effects. It has been reported that it behaves very differently depending on those factors, presumably because people are placed in best-effort buckets. Who knows.
yummytummy: Ah, so cache usage impacts rate limits. There goes the "other harnesses aren't utilizing the cache as efficiently" argument.
bcherny: Claude Code is the most prompt cache-efficient harness, I think. The issue is more that the larger the context window, the higher the cost of a cache miss.
yummytummy: That might be, but the argument was that poor cache utilization in other harnesses was costing Anthropic too much money. If cache usage is counted against rate limits, it doesn't matter from a cost perspective: you'll just hit your rate limits faster in harnesses that don't optimize for caching.
hughw: Where can I learn about concepts like prompt cache misses? I don't have a mental model of how that interacts with my context of 1M or 400k tokens... I can cargo-cult follow instructions, of course, but help us understand if you can, so we can intelligently adapt our behavior. Thanks.
hughw: And why does /clear help things? Doesn't that wipe out the history of that session? Jeez.