Discussion
Anthropic admits Claude Code users hitting usage limits 'way faster than expected'
elephanlemon: Yesterday (pro plan) I ran one small conversation in which Claude did one set of three web searches, a very small conversation with no web search, and I added a single prompt to an existing long conversation. I was shocked to see after the last prompt that I had somehow hit my limit until 5:00pm. This account is not connected to an IDE or Code, super confusing.
master_crab: Tool calls (particularly fetching for context) eat the context window heavily. I explicitly send MCP calls to sub agents because they are so “wordy”.
bensyverson: Everyone who has not hit this bug thinks it’s user error… It’s not. It happened to me a few days ago, and the speed at which I tore through my 5 hour usage cap was easily 10x faster than normal. Also: sub agents do not get you free usage. They just protect your main context window.
stavros: Anthropic went about this in a really dishonest way. They had increased demand, fine, but their response was to ban third-party clients (clients they were fine with before), and to semi-quietly reduce limits while keeping the price the same. Unilaterally changing the deal to give customers less for the same price should not be legal, but companies have slowly boiled the frog in such a way that now we just go "welp, it's corporations, what can you do", and forget that we actually used to have some semblance of justice in the olden days.
ZeroCool2u: I'm finishing my annual paid Pro Gemini plan, so I'm on the free plan for Claude and I asked one (1) single question, which admittedly was about a research plan, using the Sonnet 4.6 Extended thinking model and instantly hit my limit until 2 PM (it was around 8 or 9 AM). Just a shockingly constrained service tier right now.
shafyy: What is the best way to get started with open weight models? And are they a good alternative to Claude Code?
wolvoleo: Just install ollama. And no, they're not as capable as SOTA models. Not by far. However they can help reduce your token expenditure a lot by routing the low-hanging fruit to them. Summaries, translations, stuff like that.
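The routing idea above can be sketched as a simple classify-and-dispatch step in front of your model calls. This is a minimal illustration, not ollama's actual API: the task categories and model names here are assumptions.

```python
# Route cheap, low-stakes tasks to a local model and keep the expensive
# hosted model for hard ones. Model names and task categories below are
# illustrative assumptions, not real APIs.

LOCAL_TASKS = {"summarize", "translate", "classify", "reformat"}

def pick_model(task_kind: str) -> str:
    """Return which backend should handle a task of this kind."""
    if task_kind in LOCAL_TASKS:
        return "local/llama"   # e.g. a small model served by ollama
    return "hosted/sota"       # e.g. Claude, for complex coding work
```

In a real setup the classification itself can also be done by the local model, so the hosted subscription only sees the requests that genuinely need it.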
kneel: I asked it to complete ONE task:

You've hit your limit · resets 2am (America/Los_Angeles)

I waited until the next day to ask it to do it again, and then:

You've hit your limit · resets 1pm (America/Los_Angeles)

At which point I just gave up
jdefr89: Over reliance on LLMs is going to become such a disaster in a way no one would have thought possible. Not sure exactly what, who, when, or where.. Just that having your entire product or repo dependent on a single entity is going to lead to some bad times…
xnx: > on a single entity

Contrary to the popular opinion here, there are other services beyond Claude Code. These usage limits might even prompt (har har) people to notice that Gemini is cheaper and often better.
nprateem: I literally ran out of tokens on the antigravity top plan after 4 new questions the other day (opus). Total scam. Not impressed.
master_crab: Yes, sorry. I meant it more as a descriptor of how many tokens it consumes. You are still stuck burning money.
piva00: Don't they consume less of the token quota in case the subagents are running cheaper models like Sonnet and Haiku compared to Opus?
bensyverson: Correct—I just wouldn't want folks to mistakenly think that the context fill % corresponds 1:1 with session token use.
dinakernel: This turned out to be a bug. https://x.com/om_patel5/status/2038754906715066444?s=20

One reddit user reverse-engineered the binary and found that it was a cache invalidation issue. They are doing some hidden string replacement if the Claude Code conversation talks about billing or tokens. It looks like that invalidates the cache at that point. If that string appears anywhere in the conversation history, I think the starting text is replaced and your entire cache rebuilds from scratch. So, nothing devious, just a bug.
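The failure mode described reads like classic prompt-prefix caching: cached work is keyed on a hash of the conversation prefix, so rewriting the opening text changes every key and forces a full rebuild. Here is a toy sketch of that mechanism; the "billing" substitution rule is the commenter's speculation, not Anthropic's actual code.

```python
import hashlib

cache: dict[str, bool] = {}

def prefix_key(prefix: list[str]) -> str:
    # Prompt caches are typically keyed on a hash of the prefix so far.
    return hashlib.sha256("\x00".join(prefix).encode()).hexdigest()

def maybe_rewrite(messages: list[str]) -> list[str]:
    # Hypothetical hidden substitution, as the comment speculates:
    # mentioning billing swaps out the conversation's opening text.
    if any("billing" in m for m in messages):
        return ["[hidden policy note] " + messages[0]] + messages[1:]
    return messages

def serve(messages: list[str]) -> int:
    """Return how many leading messages were served from the prefix cache."""
    effective = maybe_rewrite(messages)
    reused, counting = 0, True
    for i in range(len(effective)):
        key = prefix_key(effective[: i + 1])
        if counting and key in cache:
            reused = i + 1
        else:
            counting = False
        cache[key] = True  # cache every prefix we just computed
    return reused
```

Resending the same conversation reuses the cached prefixes, but the moment any turn mentions "billing", the rewritten opening text changes every prefix hash and the whole conversation reprocesses from scratch, which would burn tokens far faster than expected.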
kif: Anecdotally when Claude was error 500'ing a few days ago, its retries would never succeed, but cancelling and retrying manually worked most of the time.
jorvi: For a second I hoped you were gonna comment on how LLMs are going to rot out our skillset and our brains. Like some people already complaining they "have to think" when ChatGPT or Claude or Grok is down. Oh well.
Retr0id: The other day I was doing some programming without an LSP, and I felt lost without it. I was very familiar with the APIs I was using, but I couldn't remember the method names off the top of my head, so I had to reference docs extensively. I am reliant on LSP-powered tab completions to be productive, and my "memorizing API methods" skill has atrophied. But I'm not worried about this having some kind of impact on my brain health because not having to memorize API methods leaves more room for other things.It's possible some people offload too much to LLMs but personally, my brain is still doing a lot of work even when I'm "vibecoding".
dude250711: How can automatic slop-prevention be a disaster? It's a feature.
ahsillyme: I read that as implied.
aliljet: There's a weird 'token anxiety' you get on these platforms. And you basically don't know how much of this 'limit' you may consume at any time. And you actually don't even know what the 'limit' is or how it's calculated. So far, people have just assumed Anthropic will do the kind thing and give you more than you could ever use...
lukewarm707: please tell me if i'm crazy. i just refuse to use openai/google/anthropic subscriptions, i only use open source models with ZDR tokens.

- i like privacy in my work, and i share when i wish. somehow we accepted that our prompts and work may be read and moderated by employees. would you accept people moderating what you write in excel, google docs, apple pages?
- i want a consistent tool, not something that is quantised one day, slow one day, a different harness one day, stops randomly.
- unless i am missing something, the closed source models are too slow for me to watch what they are doing. i feel comfortable monitoring something, usually at about 200-300tps on GLM 5. above that it might even be too fast!
susupro1: You are not crazy, you are just waking up from the SaaS delusion. We somehow allowed the industry to convince us that paying $20/month to rent volatile compute, have our proprietary workflows surveilled, and get throttled mid-thought is an 'upgrade'. The pendulum is swinging violently back to local-native tools. Deterministic, privately owned, unmetered—buying your execution layer instead of renting it is the only way to build actual leverage.
staticassertion: No one was convinced to spend money to do the things you're saying. That's just disingenuous. People rent models because (a) it moves compute elsewhere (b) they provide higher quality models.
1970-01-01: This has been verified as a bug. Naturally, people should see some refunds or discounts, but I expect there won't be anything for you unless you make a stink. https://old.reddit.com/r/ClaudeCode/comments/1s7zg7h/investi...
scottcha: We offer multiple SOTA models at https://portal.neuralwatt.com at very generous pricing, since we have options to bill per kWh instead of per token. Recipes for your favorite tools here: https://github.com/neuralwatt/neuralwatt-tools
wutwutwat: So, like, GitHub then?
gonzalohm: Or Cloudflare or AWS
dewey: If this is reasonable or not is pretty hard to judge without any info on that "ONE" task.
notyourwork: Free is free. Want more, fork over money.
Forgeties79: They are saying even for free it is very constrained. This isn’t productive.
ZeroCool2u: Yes, exactly my point.
muskstinks: It's a question of price, quality and other factors. If my company pays for it, I do not care. If I have a hobby project where it's about converting an idea in my spare time into what I want, I'm happily paying $20. I just did something like this on the weekend over a few hours. I really enjoy having small tools based on a single html page with javascript and json as a data store (I ask it to also add an import/export feature so I can literally edit it in the app and then save it and commit it). For the main agent, I'm waiting for the one which will read my emails and will have access to systems. I would love a local setup, but just buying some hardware today still costs a grand and a lot of energy. It's still significantly cheaper to just use a subscription. Not sure what you mean regarding speed though, they are super fast. I do not have a setup at home which can run 200-300 tps.
lukewarm707: i don't use local models, i just use the APIs of cloud providers (e.g. fireworks, together, friendli, novita, even cerebras or groq). you can get subscriptions to use the APIs from synthetic, ollama, or fireworks.
pxtail: Recently, after noticing how quickly limits are consumed and reading others' complaints about the same issue on reddit, I was wondering how much of this is a real error or bug hidden somewhere, and how much is about testing what threshold of constrained limits will be tolerated without people cancelling accounts. Eventually, in a "shit hits the fan" situation, it can always be dismissed by waving hands and apologizing (or not) about some abstract "bug". The lack of transparency and accountability behind all of this is incredible, in my perception.
joshuafuller: This feels a lot like the same playbook we’re seeing with dynamic pricing in retail, just applied to compute instead of products. You never really know what you’re getting, and the rules shift under you. What makes it worse is the lack of transparency. If there were clear, hard limits, people could plan around it. Instead it’s this moving target that makes it impossible to trust for real work. At some point it stops feeling like a bug and starts feeling like a pricing experiment on users.
tartoran: What a horrid glimpse of the future. I hope we won't get there and we all collectively fight back with our wallets.
muskstinks: I'm quite aware of my dependency and I've been balancing this in and out regularly over the last 10 years. Owning is expensive. Not owning is also expensive. Energy in Germany is at 35 cents/kWh and skyrocketed to 60 when we had the Russian problem. I'm planning to buy a farm and add cheap energy, but this investment will still take a little bit of time. Until then, space is sparse.
pier25: https://xcancel.com/om_patel5/status/2038754906715066444
akdev1l: Ironically this is one of my main use cases for LLMs:

“Can you give me an example of how to read a video file using the Win32 API like it’s 2004?” - me trying to diagnose a Windows game crashing under Wine
adolph: I don't get this pov, maybe b/c I'm not a heavy Claude Code user, just a dabbler. Any LLM tool that can selectively use part of a code base as part of the input prompt will be useful as an augmentation tool. Note the word "any." Like cloud services, there will be unique aspects of a tool, but just like cloud services there is a shared basic value proposition that allows for migration from one to another and competition among them. If Gemini or OpenAI or Ollama running locally becomes a better choice, I'll switch without a care. Subscription sprawl is likely the more pressing issue (just remembered I should stop my GH CoPilot subscription since switching to Claude).
bitwize: AI will totally rot our brains, just like television, video games, and the internet all did before.
toss1: Unsurprising that people complain.

"Thinking is the hardest work there is, which is why so few people do it" — attrib Henry Ford

Now we have tools that can appear to automate your thinking for you. (They don't really think, but they do appear to, so...)
nprateem: c) It's turnkey instead of requiring months/years of custom dev and on-going maintenance.
nicce: Are they going to pay it back if the subscription was paid but the token limit was less than advertised? Is there some fine print somewhere preventing people from just suing or pulling the money back through credit card chargebacks?
jadar: Part of the issue is that they don't actually advertise what the token limit is. Just some vague, "this is 5x more than free, and 5x more than pro". They seem to be free to change the basis however they please, because most of us are more than happy to use what they give us at the discounted subscription pricing.
spongebobstoes: try codex, it's really good and doesn't have the same limits issues
Tade0: I'm worried that the present is actually living off a line of credit that will be spent/closed soon.
jakobloekke: “Thinking is to humans as swimming is to cats. They can do it, but they prefer not to.” - Kahneman
bigbinary: On-premise LLMs are also getting better and likely won’t stop; as costs go up with the technical improvements, I would imagine cost-saving methods will also improve.
horsawlarway: I still think it's basically unavoidable that most people who might pay for API access will end up on-prem. Fixed costs, exact model pinning, outage resistance, enshittification resistance, better security, better privacy, etc... There are just so many compelling reasons to be on-prem instead of dependent on a 3rd party hoovering up all your data and prompts and selling you overpriced tokens (which eventually they MUST be, because these companies have to make a profit at some point). If the only counterbalance is "well the API is cheaper than buying my own hardware"... that's a short term problem. Hardware costs are going to drop over time, and capabilities are going to continue improving. It's already pretty insane how good of a model I can run on two old RTX-3090s locally. Is it as good as modern Claude? No. Is it as good as Claude was 18 months ago? Yes. Give it a decade to see companies really push into the "diminishing returns" of scaling and new models... combined with new hardware built with these workloads in mind... and I think on-prem is the pretty clear winner.
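The "API is cheaper than buying hardware" trade-off above is just breakeven arithmetic. Here is a back-of-envelope sketch; every number in it (hardware price, power cost, API spend) is an illustrative assumption, not a quote.

```python
# Back-of-envelope breakeven: months until owned hardware beats API rental.
# All numbers below are illustrative assumptions, not real prices.

HARDWARE_COST = 3000.0     # two used GPUs plus a box, in dollars (assumed)
POWER_COST_MONTH = 40.0    # electricity per month at heavy use (assumed)
API_COST_MONTH = 200.0     # equivalent API/subscription spend (assumed)

def breakeven_months(hardware: float, power: float, api: float) -> float:
    """Months until cumulative API spend exceeds hardware plus power."""
    saving_per_month = api - power
    if saving_per_month <= 0:
        return float("inf")  # renting stays cheaper indefinitely
    return hardware / saving_per_month
```

Under those made-up numbers the box pays for itself in well under two years; with a cheap subscription or expensive electricity the breakeven pushes out or never arrives, which is exactly why the answer differs so much between commenters.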
delphic-frog: The token usage differs day to day - that's the most frustrating part. You can't effectively plan a development session if you aren't sure how far you'll likely get into a feature.
kaoD: I only asked Claude to rewrite Linux in Rust.
kombine: I'd ask it to rewrite Claude code in Rust, but its creator apparently wrote a book on Typescript..
muskstinks: What's the big difference then? You can get a lot of tokens for $20, and not everything I'm doing is a state secret. But if I would use some API stuff, probably OpenRouter, isn't that easier to switch around, and doesn't it also have zero-knowledge safety?
lukewarm707: i think that privacy is good for wellbeing. it may be this is a dying point of view.
ChrisArchitect: Source: https://old.reddit.com/r/ClaudeCode/comments/1s7zg7h/investi... (https://news.ycombinator.com/item?id=47582671)
aperture_hq: There are no transparent metrics on the token usage count; they just compare their plans with their other plans.
zackify: After using it all week on the pro plan it worked fine for me. Hit limits a couple times. But if I was doing deep coding on the pro plan it would have sucked. You can't expect to use massive context windows for $20.
kakugawa: gemini-cli has not been usable for weeks. The API endpoint it uses for subscription users is so heavily rate-limited that the CLI is non-functional. There are many reports of this issue on Github. [1]

[1] https://github.com/google-gemini/gemini-cli/issues?q=is%3Ais...
tasuki: I use Gemini-CLI at work, and haven't noticed anything. I use Google Jules (free tier) on a toy project much more heavily and can't complain. I think sometimes the prompts take longer than they used to, but I couldn't care less. I'm not in a hurry.