Discussion
verdverm: so far, we're still learning how to use this new tool, which is also getting better with each release
dude250711: I agree, it was about 10.29% earlier this year, now we are standing at least at 10.35% or something.
verdverm: The last one that made the rounds was negative, so we have moved more than 10% in less than 1/2 a year
arisAlexis: because the human may be the bottleneck soon
rybosworld: > Planning, alignment, scoping, code review, and handoffs—the human parts of the SDLC—remain largely untouched
Seems likely that process is holding things back. Planning has always been a "best-guess". There's lots you can't account for until you start a task.
Code review mostly exists because the cost of doing something wrong was high (because human coding is slow). If you can code faster, you can replace bad code faster. I.e., LLMs have cheapened the cost of deployment.
We can't honestly assess the new way of doing things when we bring along the baggage of the old way of doing things.
felipeerias: Planning might end up being a lot more predictable thanks to coding agents: if you want to estimate how long a task would take, send an agent to do it.
If the agent comes back in a few minutes with a tiny fix, it is probably a small task.
If the agent produces a large solution that would need careful review, it is at least a medium task.
If the agent gets stuck, runs into architectural constraints, etc., then it is definitely a hard task.
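A minimal Python sketch of the estimation heuristic above, assuming an agent's attempt can be summarized by whether it got stuck, how many lines it changed, and how long it took. The `AgentRun` type, `estimate_task_size`, and both thresholds are hypothetical illustrations, not part of any real agent tool.

```python
from dataclasses import dataclass

# Hypothetical summary of one agent attempt; field names are illustrative assumptions.
@dataclass
class AgentRun:
    got_stuck: bool        # agent hit architectural constraints or could not finish
    lines_changed: int     # size of the diff it produced
    minutes_taken: float   # wall-clock time for the attempt

# Assumed cutoffs for a "tiny fix in a few minutes"; tune for your own codebase.
SMALL_DIFF_LINES = 50
QUICK_MINUTES = 10

def estimate_task_size(run: AgentRun) -> str:
    """Map an agent's attempt to a rough size bucket: small, medium, or hard."""
    if run.got_stuck:
        return "hard"
    if run.lines_changed <= SMALL_DIFF_LINES and run.minutes_taken <= QUICK_MINUTES:
        return "small"
    # Larger or slower solutions need careful review, so treat them as at least medium.
    return "medium"

print(estimate_task_size(AgentRun(got_stuck=False, lines_changed=12, minutes_taken=3)))
```

The thresholds carry all the judgment here; in practice "tiny" and "large" would depend on the codebase and on how much review the diff actually needs.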
eucyclos: That's got to be more about processes around it than the tool itself though, right?
eucyclos: It might be more accurate to say humans will only work at the bottlenecks soon, unless I've misunderstood the vector of your commentary.
enraged_camel: >> November 2024 through February 2026
Yeah, listen... I'm glad these types of studies are being conducted. I'll say this though: the difference between pre- and post-Opus 4.5 has been night and day for me.
From August 2025 through November 2025 I led a complex project at work where I used Sonnet 4.5 heavily. It was very helpful, but my total productivity gains were around 10-15%, which is pretty much what the study found. Once Opus came out in November though, it was like someone flipped a switch. It was much more capable at autonomous work and required way less hand-holding, intervention or course-correction. 4.6 has been even better.
So I'm much more interested in reading studies like this over the next two years, where the start period coincides with Opus 4.5's release.
jackschultz: Very much agree. I gave a presentation on AI to a group earlier this week and spent a third of the time talking about the Opus 4.5 inflection point in AI history. The first time I used that model, the day it was released, it was clear that it knew what it was doing at a different level. People still jump between different models, tools, or time frames when talking about AI and its usefulness, but those comparisons have no meaning if they're not using the Opus 4.5 and 4.6 models and Anthropic's harnesses, Claude Code or Cowork.
I'm interested in these studies, along with the history of AI, and in whether they will recognize that as the point when things changed, because for us devs, that was the moment.
naasking: Sounds reasonable, but gains will go up. There is a ceiling somewhere, but we don't know where it is.