Discussion
anovikov: If AI makes people so much more productive, why aren't there many more apps on the Apple App Store? Mobile apps involve a lot of dirty, boring scaffolding work, which AI easily automated first thing, 2 years ago. It should have been the very first place where the productivity boost was evident, a year ago at least. But it's just not there. Why not?
dudewhocodes: App Store releases are increasing due to a new gold rush on subscription apps. Review times have gotten longer as the review team at Apple is being spammed. Most of these apps are rudimentary habit trackers, time management apps, etc., so not much creativity, many more recycled ideas. More code != better ideas, though. https://www.a16z.news/i/185469925/app-store-engage https://42matters.com/ios-apple-app-store-statistics-and-tre...
whstl: Also a lot more clone ideas these days. AI has definitely empowered people to write things from scratch, either as a product to sell or as internal projects inside companies.
vjk800: We've had the AI tools for maybe two years, and they have only gotten really good in the past half a year or so. For fuck's sake, adopting electricity took like 50 years; why would you expect to see any kind of effect from AI so quickly? The tools are still developing - rapidly - and people are still figuring out the best usage patterns for them.
AugustoCAS: DORA released a report last year: https://dora.dev/research/2025/dora-report/ The gains are a ~17% increase in individual effectiveness, but with ~9% extra instability. In my experience using AI-assisted coding for a bit longer than 2 years, the benefit is close to what DORA reported (maybe a bit higher, around 25%). Nothing close to an average of 2x, 5x, 10x. There's a 10x in some very specific tasks, but also a negative factor in others, as seemingly trivial but high-impact bugs get to production that would normally have been caught very early in development or in code reviews. Obviously it depends what one does. Using AI to build a UI to share cat pictures has a different risk appetite than building a payments backend.
IshKebab: These sorts of things are really hard to study. Combine that with the fact that the AI landscape is so varied and fast-moving... it's easy to see why there aren't many studies on it. There is a mountain of things that we reasonably know to be true but haven't done studies on. Is it beneficial for programming languages to support comments? Are regexes error-prone? Does static typing improve productivity on large projects? Is distributed version control better than centralised (lock-based)? Etc. Also, you can't just say "AI improves productivity". What kind of AI? What are you using it for? If you're making static landing pages... yeah, obviously it's going to help. Writing device drivers in Ada? Not so much.
chrisjj: > Why there are no actual studies that show AI is more productive?

Beats me. With "AI" being so good at faking stuff, there should by now be a ton of such studies :)
whstl: I agree. I'd also argue that local productivity effects were already visible from the start of ChatGPT. I was already using it a lot back then for writing tests and as "smarter scaffolding", even before Copilot and such, often cutting the time of doing something from half an hour to a few seconds. IMO the bottleneck remains the same: doing proper engineering is more than writing code. Even 20 years ago, a big corp would spend a few years writing something that a startup would do in weeks (and yes: even 20 years ago), just because of laser-focused requirements, better processes/less bureaucracy, using the right tools for the job, and having less friction in tooling. That hasn't changed.
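The kind of "smarter scaffolding" being described is the table-driven boilerplate that an LLM can churn out in seconds from a one-line description. A minimal sketch (the `slugify` function and the test cases are made up for illustration):

```python
import re

def slugify(title: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace into hyphens."""
    title = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"[\s_]+", "-", title).strip("-")

# Table-driven cases: the boring-but-necessary enumeration that
# AI tools are good at generating, and humans are slow to type out.
CASES = [
    ("Hello, World!", "hello-world"),
    ("  spaces  everywhere  ", "spaces-everywhere"),
    ("Already-Slugged", "already-slugged"),
    ("", ""),
]

def test_slugify():
    for raw, expected in CASES:
        assert slugify(raw) == expected, (raw, expected)
```

The value isn't in any single case but in getting the enumeration written at all, which is exactly the half-hour-to-seconds saving mentioned above.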
danr4: because you can just look at the commit log
Stronz: It might also depend on how the tools are used. In practice a lot of value seems to come from reducing small bits of friction rather than dramatically increasing output.
therouwboat: Yeah, AI people like to talk about how their kid made a Mario Bros game in a weekend, so big adults should be doing Crysis clones in the same time, right?
heraldgeezer: So... you want a study to prove your ready made hypothesis?
Nevermark: I think most major efficiency improvements involve more adaptation costs than expected. Those that can "see" the potential clearly push through the adaptation period over time, but it can be much longer than anyone expects. Depending on how forward-looking a group is, that is either a problem or a pure win. But external measurements won't be able to distinguish between fast-accumulating forward-looking returns/value and little or negative benefit, for some time. I also wonder what the demise of non-adaptive firms does to these numbers. When underlying value lags despite top-line returns, and then disappears due to failure, is that serious, previously masked lack of "efficiency" ever accounted for? This is the dual of measuring running productivity/effort without taking into account the long-term accumulation of tech debt. If/when technical debt becomes an obvious drag on performance, it suddenly goes from invisible to overriding significance.
graeber_28927: The electricity analogy is fair play, but ChatGPT had something like 110% global adoption 5 minutes after its release. Back then the infrastructure and the electrical appliances had to catch up, but the Internet is all built out already. So I think it's fair to be looking at results a few years in. Andrej Karpathy famously mentioned in an interview with Dwarkesh Patel [0] that the computer doesn't show up in GDP numbers; there's no noticeable jump or change in slope. Even if Excel is so damn fast, people are likely not drawing on its full potential, and institutions are likely actively resisting change anyway. My take is that the general population hasn't found the productive levers yet; they're at the stage where they're happy to drag down and auto-generate the date list in Excel, but don't know to adjust diagrams or read function docs, not to even mention VBS scripting. And the enthusiast (dev) community, I'd say, is starting adoption with internal tools and shot-in-the-dark apps, but big successes need time to mature in all the other ways (design, reliability, user feedback, marketing...), which comes back to what you said: that needs time. Product-market fit isn't happening automatically by chance or good prompting, I would like to think. [0] https://youtu.be/lXUZvyajciY?is=CBJI4hIr6w_UHVs9
teew: [delayed]
unsupp0rted: For me it is a 2x or 5x or something, "but high impact bugs get to production that would have normally be caught very early in development on in code reviews" is what takes it back down to a 1.5x.There are genuinely weeks where I go 5x though, and others where I go 0.5x.
lysecret: Because we are incapable of measuring developer productivity.
blitzar: Ask HN: Why are there no actual studies that show the sky is green and the earth is at the centre of the universe?
otabdeveloper4: Just trust the vibe, bro. One trillion market cap cannot be wrong.
stephbook: "the computer doesn't show up on GDP numbers, there's no noticeable jump or change in slope."

That's certainly an interesting take. Where do these people think the 1-2% annual growth came from — steam engine late adopters?
charcircuit: Because the data is private, and such studies often aren't measuring solely the part that AI makes more productive. And measuring productivity in general is a very hard problem, so the results of whatever study are often meaningless in practice. Pair this with studies today still being based on ancient models like GPT-4o and it's even more meaningless. If you are familiar with AI, it's obvious how it increases productivity. When bugs get fixed with 0 human time, it's plain as day that it was productive compared to a human making the fix.
smackeyacky: The code was never the bottleneck. It’s always the org around it.
bawolff: > Many won't care unless you show them an actual study

Why are the pro-AI people so obsessed with proving the AI skeptics wrong? Is AI working for you? Great. Go make great things. Isn't that the point, after all? Who cares who believes you if the results speak for themselves?
massysett: Heh, I guess Apple needs to better use AI to review all the AI-written apps.
chrysoprace: Self-reported productivity does not equate to actual productivity. People have all sorts of biases that make such assessments fairly pointless. They only gauge how you feel about your productivity, which is not necessarily a bad thing, but it doesn't mean you're actually more productive.
rienbdj: GitHub has their own study using Copilot but given the obvious conflict of interest I would discount it.
squidbeak: > Why are the pro AI people so obsessed with proving the AI skeptics wrong.

It seems to me the pro-AI types just want to be free to enjoy a transformative tech and discuss the implications of its development and innovations - without being badgered and henpecked by naysayers insisting the results the pro-AI types see are some kind of mass delusion.
aragilar: How do you know you're more productive? Humans are excellent at fooling themselves, and absent a metric (or multiple metrics) by which you can measure your productivity, you can't be sure you're actually being more productive.
esperent: [delayed]
actionfromafar: Yes. If only we could measure teams, against themselves, against others, and against some kind of baseline... but we don't, AFAIK.
charcircuit: I think you are underestimating the number of low-priority issues that exist that don't need alignment around fixing. In the past there was little upside to actually fixing them, but as the cost of fixing them trends towards $0, you might as well fix them.
ltning: Why are we even discussing this before the theft problem has been solved? Or the energy consumption? If anything, there need to be studies done on:
- the drop in creative, novel output from actual people (due to theft and loss of jobs)
- the energy cost per pax in relevant industries, pre/post LLMs being adopted
mikkupikku: Because I am long past pretending to give a shit about intellectual property when the corps don't, or caring about the energy expenditure of my hobby when all the car guys don't. When it all comes down to brass tacks, I think the technology must be judged by what it can do for me, not according to some misguided principles that don't actually serve my interests in the grand scheme of things (IP), or quasi-ideological matters like how much energy I'm morally entitled to use. Screw all of that; frankly, I file it all under cope used by people who want to go back to the old methods, to justify their decision to ignore and not learn one of the most amazing technologies created during our lifetimes. I suggest you get real: for all their faults, the tools work too well for us to turn back the clock on any of this. This stuff isn't going to blow over, so you should be learning to make the best of it. My two cents.
blitzar: Lines of code pushed... obviously /s

Unironically, AI evaluating the impact of those lines might be getting close to a metric that would measure output.
shawntwin: Surely current tools like OpenClaw have shown AI is productive. More and more ordinary people are using it to change their lives. Amazing.
IdontKnowRust: If you didn't know how to do something with code at first, then the code was a bottleneck.
re-thc: > but as the cost of fixing them trends towards $0

It's not. In a proper org the cost is the testing, the release process, the coordination, the planning, etc. Any scope creep, even if it fixes something, often gets shouted down.
charcircuit: AI can take over testing and release planning / coordination. This is the allure of AI. Being able to fully close the loop of releasing software without needing a human.
arzke: We're incapable of putting an accurate, standardized value on developer productivity, yet there often seems to be consensus among senior engineers about who the high performers and the low performers are. I can certainly tell this about the people I work with.
kqr: We are definitely not. Point at a problem, and measure the cost of solving it. That's developer productivity.We only avoid doing it at scale because it's expensive. In particular if we want the measurement to generalise out of sample.(In particular in this case, where once we're done, proponents will claim our data is too old to be a useful guide to tomorrow.)
duncanfwalker: It's not so valuable to assess the current state - what the impact of using AI is today. From personal experience, it feels like the overall impact on productivity was not positive a couple of years ago, might be positive now, and will be positive in a couple of years. That means by assessing the current state of impact on productivity we're just finding where we are on that change curve. If we accept that the trend is happening, then we know at some point it will pass (or has passed) the threshold where our companies will fall behind if they're not using it. We also know it takes a while to get up to speed and make sure we're making the most of it, so the earlier we start the better. The counter-argument is that we could wait for a later wave to jump on, but that's risky, and the only potential reward is a small percentage short-term productivity gain.
felipeerias: Most people seem to be expecting some kind of quantitative analysis: N developers undertook M tasks with and without access to a given AI tool, here is the statistical evidence that shows (or fails to show) the effect, and this result is valid across other projects and tools. In practice, arriving at this ideal scenario can be very challenging. Actually feasible experiments will necessarily be narrow, with the expectation that their results can be (roughly) extrapolated outside of their specific experimental setup. Another valid approach would be to carry out qualitative research, for example a case study. This typically requires studying one (or a few) developers and their specific contexts in great detail. The idea is that a deep understanding of how one person navigates their work and their tools would provide us with insights that might relate to our own situation. Personally, in this particular area, I tend to prefer detailed qualitative accounts of how other developers are working on similar projects and with similar tools as me. But in any case, both approaches are valid and complementary.
ghostlyInc: I think the productivity gain from AI is mostly micro-friction reduction.Things like generating boilerplate, quick test scaffolding or documentation lookups. Each one is small, but they compound during the day.That’s probably why it’s hard to capture in traditional studies.Curious: has anyone seen studies measuring task-level productivity instead of overall output?
ChicagoDave: I can report all kinds of productivity using Claude AI and Claude Code:
- built an AWS dashboard to identify and manage internal resources in a few hours
- solved several production problems connecting Claude to devops APIs in near real-time
- identified solutions for feature requests or bugs in existing internal applications, including detailed source changes
- built Ledga.us
- built sharpee.net and its associated GitHub repo
- building mach9 poker iOS and Android apps
- working on an undisclosed app that might disrupt a huge Internet sector

We're still in the early stages of LLM-influenced development, and reporting productivity will take time.
mikkupikku: I don't know if it's made me more productive, but I do know that for the past ten years I've been thinking about making an immediate-mode GUI toolkit for MPV user scripts, rendered with ASS subtitles and with a full suite of composable widgets. For ten years I kept putting it off because it seemed like it would be a big quagmire of difficult-to-diagnose rendering errors (based on far more modest forays into making one-off GUIs this way). And I know that yesterday I decided to explain my idea to Claude, and now it just fucking works, after just a few hours of easy, casual back and forth. I don't know, man, could just be in my head. I'd better defer judgement, put aside all my own opinions about what happened, and let some researchers with God knows what axe to grind make that decision for me.
muvlon: So you're saying instead of assessing the current capabilities of the technology, we should imagine its future capabilities, "accept" that they will surely be achieved and then assess those?
queuep: Yeah, I mean, when we've been on tight timelines we might even have opted out of creating admin interfaces for stuff. Now creating those admin interfaces doesn't even take much time. Also, where I'm at currently I'm in a lot of meetings and have 0 dev time allocated in my role. But now with AI it's easier to get some tickets going and easier to maintain context switching. I don't need any research to see that my productivity increased: from 0 tickets completed to being able to contribute.
metalman: I believe that individual productivity in most areas peaked long ago. Industrial production is still scaling up, and this is the model that applies to AI, or as it really is, the automation of "management", but as this is NOT a linear mechanical process (almost, oh so almost mechanical), it is not quite working. For exactly the same reason that industry cannot make you one, let's say, car that is green on one side but orange on the other, with six headlights but only one seat, industry can't scale down: minimum order is 250,000 units, it will take 3 years, pay us now! I deal with this every week. Something small (smol) breaks in a large corporate environment. They work in millions, they have teams and departments, but the little handle thing on a set of automated front doors facing a main street in a significant asset has failed, and I watch the whole corporate apparatus convulse as they try to figure out how to pay for something smaller than a rounding error to a company that barely exists, a purchase that needs to be passed higher and higher to be approved because there is no button for it, just like a major corporate deal. People can't figure this out; AI never will. And I am exploring just how to exploit this scaling problem to my advantage.
lnsru: It's not the company. It's always the 10x developer who uses the tools to increase his output. My buddies report a new corporate AI policy at least once a month. All of them are bollocks, written by someone who never wrote any code.
lucasluitjes: The full report can be found here: https://services.google.com/fh/files/misc/2025_state_of_ai_a...That 17% increase is in self-reported effectiveness. The software delivery throughput only went up 3%, at a cost of that 9% extra instability. So you can build 3% faster with 9% more bugs, if I'm reading those numbers right.
yorwba: Those aren't even percentage increases, but standardized effect sizes. So if you take an individual survey respondent and all you know is that they self-reported higher AI usage, you can guess their answers to the self-reported individual effectiveness slightly more accurately, but most of the variation will be due to unrelated factors.The question that people are actually interested in, "After adopting this specific AI tool, will there be a noticeable impact on measures we care about?" is not addressed by this model at all, since they do not compare individual respondents' answers over time, nor is there any attempt to establish causality.
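To put a number on "most of the variation will be due to unrelated factors": if we read the report's ~0.17 figure as a Cohen's-d-style standardized effect (an assumption on my part, since the report's exact model differs), the share of variance it explains is tiny. A quick sketch using the standard d-to-r conversion for equal group sizes:

```python
def d_to_r_squared(d: float) -> float:
    """Convert a standardized mean difference (Cohen's d) into the
    share of variance explained (r^2), assuming equal group sizes:
    r = d / sqrt(d^2 + 4)."""
    r = d / (d**2 + 4) ** 0.5
    return r * r

# An effect of d = 0.17 explains well under 1% of the variation
# between respondents, which is why an individual's answers are
# only "slightly" more predictable.
print(round(d_to_r_squared(0.17), 4))  # → 0.0072
```

That is consistent with the point above: the effect is real but small relative to everything else that drives self-reported effectiveness.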
hennell: What's the best car? If you're trying to go fast it's one answer, if you're trying to carry as much load as possible it's another, if you're buying for your just-qualified teen it's another. But "best" is obviously subjective, so what about safest? I don't know specifics there, but if you're in the EU the "safest" car would be very different from the "safest" in the US, because their safety studies measure very different things. Which is the issue with almost all studies and statistics: what they mean depends entirely on what you're measuring. I can program very, very fast if I only consider the happy path, hard-code everything, and don't bother with things like writing tests, defining types, or worrying about performance under expected scale. It's all much faster right up until the point it isn't - and then it's much slower. AI isn't quite so obviously bad, but it can still hide long-term problems behind short-term gains, and the long term is what studies tend to focus on, since the short term doesn't usually require a study to observe. I think AI is similar to outsourcing staff to cheaper countries, replacing ingredients with cheaper alternatives, and other MBA-style ideas. It's almost always instantly beneficial, but the long-term issues are harder to predict and can have far more varied outcomes depending on weird specifics of the business.
ltning: The issue with creative and novel output from people is neither about intellectual property nor energy, though. So even someone who has nothing (personal) to lose by adopting these techs should be able to reflect on how that will make things look 5, 10, 20 years from now.And I'm not talking about climate or poor starving artists here. But of course, if everyone thinks like you seem to do we might just give up on having a livable planet in 50 years. Or any significant scientific or artistic progress.
mikkupikku: Yeah that's nice, but I don't care and it's not going to stop this train. The future I envision coming is one where even local models are sufficiently capable to give common people the ability to control their own computers in a way that previously would have required them to hire a team of professionals, or to devote years of their life to study. Frontier models aren't quite good enough yet for normies to use in this way, let alone local models, but this stuff is all still very new and there's a lot of competition to improve it. I think we'll get there, and in any case the upsides are big enough already to squash all the whining objections. You can't stop this tech, all you can do is stop yourself from benefiting from it while others do.
hypeatei: Productivity was never about the lines of code written. I thought the industry as a whole had collectively decided that metric was a joke before the age of LLMs. The bottlenecks are the same: office politics, coordinating teams, consulting subject matter experts and coherent system design. AI is not a swiss army knife that results in devs becoming their own island; LLMs cannot tell me if something would jive well with our customer base -- I need people in the company who actually interact with them, for example.
georgefrowny: People say AI is great for tests. I say that is horseshit, as someone wading through 10k lines per module of thoughtless test spam that's extremely redundant and riddled with bad assumptions. It over-covers some paths at different levels and leaves others untested. There are huge chunks of repeated fixtures with subtle differences. It's slow. Huge numbers of checks are combined into single test cases that you can't run separately. But it looks like a fantastic, thorough test suite from a distance. The more you look, the more you realise it's dog shit. Tests are hard to write, and anyone who thinks they are the easy bit you can automate is a cretin. They've built an iron house of cards that's simultaneously fragile and halfway to collapse, while also putting the code into a straitjacket that means you can't do anything with it until you carefully peel back the shitty tests and replace them with something that thinks about the interface semantics and what you're actually testing and why. Of course, because the house of cards is actually standing for now, fixing it is "a waste of time". The person who wrote it gets to crow about their AI productivity, and the person who fixes it up gets to grind for a week to undo it and make no actual progress in terms of story points.
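The "checks you can't run separately" complaint, sketched with hypothetical names (the `User` type and fields are made up for illustration): one fused mega-test versus the same checks split so each can fail, pass, and be selected on its own.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    is_active: bool
    email: str

def make_user() -> User:
    return User("alice", True, "alice@example.com")

# Anti-pattern: unrelated checks fused into one test. If the first
# assertion fails, the later ones never run, and none of them can be
# re-run or reported individually.
def test_user_everything():
    user = make_user()
    assert user.name == "alice"
    assert user.is_active
    assert user.email.endswith("@example.com")

# One behaviour per test: each can fail independently and be selected
# on its own (e.g. `pytest -k test_user_active`).
def test_user_name():
    assert make_user().name == "alice"

def test_user_active():
    assert make_user().is_active
```

The fused version is the shape a lazy generator tends to emit, since it reads as thorough while hiding which behaviour actually broke.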
jokoon: Funny how much money is invested, yet there's no proof those investments will yield profits. It's all make-believe.
andrewstuart: Because software development - anything to do with software development is incredibly hard to quantify.And no, no-one is waiting for a “study” to believe in AI, they’re out doing it.
PunchyHamster: And 3% difference is at "the new coffee in office is kinda shit and developers are annoyed" level of difference
PunchyHamster: I do remember some of them showed some productivity improvement, but it pretty much dove off a cliff with the complexity of the tasks involved, or the small improvement on medium-difficulty tasks was eaten by the time spent waiting for responses. Note that most of them were focused on programming tasks aimed at shipping a product, not other use cases like "prototype a dozen ideas quickly before we pick a direction" or "write/update documentation about this feature", where AI might be significantly more productive than at just programming.
PunchyHamster: > If AI makes people so much more productive, why aren't there much more apps on the Apple store

There are more apps, and webpages, and software, and a whole lot of stuff. It's just not good.
PunchyHamster: The "badgering and henpecking" "problem" was created entirely by AI bros hyping AI to everyone and forcing it into every possible channel and avenue. You're literally trying to blame the victim. Put "don't show AI content" on every major platform and the henpecking will stop, but (aside from the technical annoyances of doing it) that won't happen, because companies want to force AI down our throats.
PunchyHamster: But the drop in original human-created output will be worse for you. Even if you are fine with consuming AI slop, its quality will go down with worse inputs.
PunchyHamster: Or it might be horribly bad at it, as with nearly every other problem people claim "AI might be good at".
mikkupikku: Slop custom-made for me individually, on demand, to fit my personal needs like a glove: that's the dream. Also, this slop is substantially slicker and more polished than the software I would have made myself, for myself. Judge away, but when I write something myself, for myself, I take shortcuts and find little excuses to give myself less work. XDG-compliant config? That can wait... Animations? Pfft, skip it. Tooltips on every interactive element? That'll never happen. But with a coding agent doing my bidding, these niceties become realities.
xigoi: > Point at a problem, and measure the cost of solving it.

The problem with this is that AI will create worse code that is going to cause more problems in the future, but the measurements won't take that into account.
make_it_sure: I'm very, very sure. Based on my last 15 years of coding experience, I can estimate fairly accurately how long a task takes. With AI I can finish the task 2x-4x faster (this includes testing, edge case handling, etc.).
devilkin: I have a coworker who is obsessed with LLMs and keeps reiterating that he is super productive with them. Yet I have yet to see a first delivery or codebase from that same person. (I am not his manager.) I lean toward the LLM-skeptic camp. I know they're great for some things (never for outsourcing your thinking, which unfortunately a lot of people do), but I'd like to see some studies, because the business press reports a lot of net negatives, or at most up to 10% improvement.
orwin: I think for myself, it's close to 25% if I only count my role as a dev. If I count my 'senior' role it's less, because I spend way more time in reviews or in prod incident meetings. Three months ago, with Opus 4.5, I would have said that the productivity improvement was ~10% for my whole team. I now have to contradict myself: juniors and even experienced new hires with little domain knowledge don't improve as fast as they used to. I still have to write new tasks/issues like I would for someone we just hired, after 8 months. I still catch the same issues in reviews that we caught three months ago. Basically, experience doesn't improve productivity as fast as it used to. On easy stuff it doesn't matter (like frontend changes, where the productivity gains are extremely high, probably 10x), and on specific subjects like red teaming, where a quantity of small tools beats an integrated solution, I think it can be even better than that. But I'm in a netsec tooling team, we do hard automation work to solve hard engineering issues, and it's starting to be a problem if juniors don't level up fast.
sph: > Why are the pro AI people so obsessed with proving the AI skeptics wrong.

Cognitive dissonance. "Why are people claiming they do not see any benefit when I do? That is unacceptable; they must be wrong." I have to admit cognitive dissonance works both ways.
sph: Yes, code is a bottleneck for those that do not know how to code.Learning to write code always was the easy part, learning to write good software is what takes the rest of our careers to get better at.
eudamoniac: I can tell you that at Cisco they just released an internal AI study that measured just about everything related to AI at Cisco except tangible gain. No mention of productivity, but tons of other data about who uses it, how long, why or why not, what correlates to usage or non usage, etc. I can only assume what that means.
duncanfwalker: I would assess the directionality and rate of the trend. If it's getting better fast and we don't see a limit to that trend then it will eventually pass whatever threshold we set for adoption.
squidbeak: > Put "don't show AI content" on every major platform and the henpecking will stop

Your argument then is: "Ban the subject of AI from your platforms or we're coming at you with pitchforks. And don't say anything to us when we do, because we are the sad ones here." Correct?
kqr: The measurements should take that into account, yes. (There are ways to estimate this.)
Thunderer: Measure or estimate? What ways? Honest question, because virtually all AI discussions _conveniently_ become vague a few steps short of actually answering the question.
edanm: 1. We're very bad at measuring developer productivity. We've been trying to do it for a long time and really have very little to show for it, from my POV.

2. That said, almost all the people who "want to see a study" don't make sense to me. I don't remember anyone insisting on seeing a study showing that writing Python is more productive than C; people just used it and largely agreed that it was. How many studies show that git (or other DVCSs) are better than the things that preceded them? I don't know if any exist. I do know that nobody was looking for studies before switching to git.

I don't ever remember seeing a new technology in software development for which people demanded studies before adopting it. They just assumed that if the professional developers they trusted to build their software said something was better, then it was — a correct assumption, IMO.

Now we're seeing a technology which most professional developers — those that have used it seriously, at least — insist is orders of magnitude better than anything that's come before it. And suddenly developers can't be trusted? Suddenly, when the claimed effect is orders of magnitude bigger than that of almost any other new technology, developers are biased and incapable of making this kind of determination?

I really don't think that's a serious position to hold.
000ooo000: > Now, we're seeing a technology which most professional developers — that have used it seriously, at least — insist is orders of magnitude better than anything else that's come before it.

You can't just assert this. I could equally baselessly say most professional developers have used LLMs and find them, overall, more trouble than they're worth. Except it's not totally baseless, because I think that was actually the result of a study, IIRC.
khuedoan: But we didn't have pressure to switch from C to Python, shoved down our throats by management or by social media telling us that if we didn't use Python we'd be left behind, did we? In the C vs. Python case, we know the technical trade-offs and when to use what, but in AI productivity narratives we keep pretending that the technical or cognitive debt created by AI doesn't exist. Sure, person A can be 20% "faster" and suggest that this tool increases productivity by a magnitude, but if it costs person B 50% more time to review A's slop or clean up A's mess, the team's productivity doesn't really increase.