Discussion
CoolGuySteve: I'm finding Qwen 27B is comparable to Sonnet, but my self-hosting has about five more 9s than whatever Anthropic is vibe coding. I also don't have to worry about the quality of the model I'm being served from day to day. Probably the most damning fact about LLMs is just how poorly written their parent companies' systems are.
tills13: What do you run it on? And even then, I'm guessing your tokens per second are not great?
CoolGuySteve: I get about 35-40 tok/sec on a 3090. It's actually about the same effective speed once you account for how much more responsive my system is than Anthropic's SaaS infrastructure.
jasonjmcghee: People keep saying this and I don't know what I'm doing wrong. I'm using q8_0 on all the latest and greatest local models, and they just don't come close to Sonnet. I've tried different harnesses, building my own, etc. They are reasonably close to Haiku? Maybe?
Lucasoato: The real issue isn’t that Claude is down; that can happen. The problem is that the status page doesn’t report anything, even though it has been impossible to log in for the past hour. Status pages should be trustworthy, connected to real metrics, not fake PR stuff :/
esafak: You need to use a user-reported status page; the incentives are broken for self reporting.
wojciem: What are decent alternatives to ClaudeCode?
rongenre: I've found minimax to be quite good
inglor_cz: Interesting. I just fixed something using Claude Code. But I am located in Central Europe.
pixl97: I've not looked into it, but I'd assume they have more than one data center.
chis: Just to make one obvious critique: your cost per token is probably about 1000x higher than what they provide. I'm pretty sympathetic to Anthropic/OpenAI just because they are scaling a pretty new technology by 10x every year. It is too bad Google isn't trying to compete on coding models, though; I feel like they'd do way better on the infra and stability side.
CoolGuySteve: I've owned this GPU for 5 years already, it's fine
theanonymousone: How much is remaining until the last 9 is gone too?
rishabhaiover: The downtime forces me to take another look at my utterly dependent relationship with agentic assistance. The inertia to begin engaging with my code is higher than it has ever been.
matheusmoreira: Yeah. It's actually starting to make me anxious. I think I got addicted to these agents.
chermi: This has consistently pissed me off. It seems like we all just accepted that whatever they define as "functioning"/"OK" is suitable. I see the status shows now, but there should be a very loud third party ruthlessly running continuous tests against all of them. Ideally it would also provide proof of the degradation we all seem to agree happens (looking at you, Gemini). Something like a leaderboard focused on actual live performance. Of course they'd probably quickly game that too. But something showing time to first response, "global capacity reached", etc. The one that pissed me off the most was the Gemini API very clearly showing 1) "user cancelled request" in the Gemini chat app and 2) "user quota reached" in the API. Both were blatant lies. In the latter case, you could find the actual global-quota cause later in the error message. I don't know why there isn't more outrage. I'm guessing this sort of behavior is not new, but it's never been so visible to me.
user-: I am a believer that everyone should keep their main flow model/provider agnostic at a high level. I often run out of Claude tokens and use GLM-5 as a backup. https://gist.github.com/ManveerBhullar/7ed5c01a0850d59188632... is a simple script I use to toggle which backend my Claude Code is using.
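The linked gist isn't reproduced here, but the general mechanism can be sketched. This assumes Claude Code honors the ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN environment variables; the endpoint URL, the `GLM_API_KEY` variable, and the `use_backend` helper name are all placeholders for illustration:

```shell
# Hedged sketch of toggling Claude Code's backend via environment variables.
# Assumes Claude Code reads ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN;
# the endpoint below is a placeholder, not a real GLM URL.
use_backend() {
  case "$1" in
    glm)
      export ANTHROPIC_BASE_URL="https://example.com/api/anthropic"  # placeholder endpoint
      export ANTHROPIC_AUTH_TOKEN="$GLM_API_KEY"                     # placeholder key variable
      ;;
    claude)
      # Clear the overrides to fall back to the normal Claude login.
      unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN
      ;;
    *)
      echo "usage: use_backend {glm|claude}" >&2
      return 1
      ;;
  esac
  echo "backend: ${ANTHROPIC_BASE_URL:-anthropic-default}"
}
```

Source the script in your shell and call `use_backend glm` before launching `claude`; `use_backend claude` clears the overrides again.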
fastball: [delayed]
cyanydeez: Interesting; do you find they actually react the same way to the harness?
rvz: > The problem is that the status page doesn’t report anything, even if it has been impossible to log in during the past hour.

When Claude took an extra day off, he forgot to log on the dashboard the hours he'd be unavailable/unresponsive, which is probably why people here are complaining about the lack of a status update. Wonder where I have seen that before?
ChrisArchitect: Link for up top: https://status.claude.com/incidents/vfjv5x6qkd4j
chermi: Wtf. Was this just scrubbed/pushed down from frontpage?
hgoel: IIRC threads that are just "yup, seeing this too" are not seen as being valuable here. There isn't (or at least wasn't) much discussion happening.
Danielzzzz: Seems to be good now; just logged in successfully. "Can't live without Claude nowadays" is the life lesson I took from my own downtime retro, lol.
kccqzy: But do you actually treat LLMs as glorified autocomplete, or as puzzle solvers you can give difficult tasks beyond your own intellect? Recently I wrote a data transformation pipeline and added a note that the whole pipeline should be idempotent. I asked Claude to prove it or find a counterexample. It found one after 25 minutes of thinking; I reasonably estimate that it would have taken me far longer. I couldn’t care less about using Claude to type code I already know how to write.
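Idempotence here means applying the pipeline twice yields the same result as applying it once (f(f(x)) = f(x)). A minimal sketch of checking the property mechanically, with `sort -u` standing in for the real (unspecified) pipeline stage:

```shell
# Check idempotence: transform(transform(x)) should equal transform(x).
# `sort -u` is a stand-in for the actual pipeline; any sample input works.
transform() { sort -u; }

input=$(printf 'b\na\nb\n')
once=$(printf '%s\n' "$input" | transform)
twice=$(printf '%s\n' "$once" | transform)

if [ "$once" = "$twice" ]; then
  echo "idempotent on this input"
else
  echo "counterexample found"
fi
```

Note that a passing input only fails to disprove idempotence; a single differing input is a counterexample, which matches the "prove it or find a counterexample" framing in the comment.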
shimman: This says more about you than about the "intellect" of these nondeterministic probability programs. Can you provide actual context on what was beyond your ability, and how you were able to determine the solution was correct? I'm finding that all these comments referencing the "magical incantation" tend to be full of hot air. Maybe yours is different.
boleary-gl: If you still need access, we balance across Claude and AWS via https://kilo.ai/docs/gateway - and you can BYOK for many providers.
CoolGuySteve: > give them difficult tasks beyond your own intellect?

Lol, no, I've yet to find a model with those properties. Sounds like a fast track to AI psychosis. The domain I work in doesn't have enough public documentation for these models to be particularly helpful without a lot of handholding, though.
hombre_fatal: I've been working on a luks+btrfs+systemd tool (for managing an encrypted RAID1 pool). While I have worked with each individually, it's not obvious what kinds of cases you have to handle when composing them together. A lot of it is simply emergent, and the status quo has been to do your best and then see what actually happens at runtime. Documentation is helpful for describing high-level intentions, but the beauty is when you have access to source code. A good model can derive behavior from the implementation instead of from docs, which are inherently limited. I implemented the luks+btrfs part by hand a few years ago and resurrected the project a couple of months ago. Using the source code for local reference, Claude discovered so many major cases I had missed, especially in the unhappy-path scenarios, even in my own hand-written tests. It also helped me set up an amazing NixOS VM test system, including reproduction tests against the libraries to see what they do in weird, undocumented cases. So I think "tasks beyond our intellect (and/or time and energy)" can be fitting. Otherwise I'd only be capable of polishing this project if luks+btrfs+systemd were specifically my day job; I just can't fit that much in my head and working memory.
zekica: And it can fail in spectacular ways. Latest example: I asked Claude for a non-trivial backup-and-recovery script using restic. I gave it the whole restic repo, and it still made up parameters that don't exist in the code (but do exist in a pull request that has been sitting unmerged for 10+ months).
hombre_fatal: Interesting. I don't think I've seen hallucinations at that level when it's referencing source code. Though my workflow always starts in plan mode, where Claude is clearly more thorough (which is the reason it takes 10x as long as going straight to implementation); I rarely skip it.