Discussion
Codegen is not productivity
some_random: You can write cope like this all you want, but it doesn't change the fact that I can ship a feature in a few days that previously would have taken me a few weeks.
nyrulez: Bold claim that writing code was never the bottleneck. It may not be the only bottleneck, but we conveniently move the goalposts now that there is a more convenient mechanism and our profession is under threat.
gzread: Can you provide some examples?
jwpapi: I have to be honest. I've written a lot of pro-AI / dark-software articles and I think I'm due an update, because it worked great, till it didn't. I could write a lot about what I've tried and learnt, but so far this article is a very based view and matches my experience.

I definitely suffered under the unnecessary complexity and at moments wished I'd never used AI. Even with Opus 4.6 I could feel how it was confused and couldn't really understand business objectives. It became way faster to jump into the code, clean it up, and fix it myself. I'm not sure yet where the line is and where it will be.
jwilliams: > Humans and LLMs both share a fundamental limitation. Humans have a working memory, and LLMs have a context limit.

But there's a more important difference: I can't spin up 20 decent human programmers from my terminal.

The argument that "code was never the bottleneck" is genuinely appealing, but it hasn't matched my experience at all. I'm getting through dramatically more work now. This is true for my colleagues too.

My non-technical niece recently built a pretty solid niche app with AI tools. That would have been inconceivable a few years ago.
felipellrocha: I guess that what people debate here is what "decent" means. From my experience, these LLMs spit out dog shit code, so 20 agents equal 20x more dog shit.
greggyb: I'd argue the claim is not that bold, and in fact it has been common in our industry for most of the short time our industry has been around. I included a few articles and studies with time breakdowns of developer activity that I think help to illustrate this.

If an activity (getting code into source files) used to take up <50% of a programmer's time, then removing that bottleneck cannot even double the throughput of the process. This is not taking into account non-programmer roles involved in software development. It is akin to Amdahl's law when we talk about the benefits of parallelism.

I made no argument with regard to threat to the profession, and I make none here.
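The Amdahl-style bound behind this comment can be sketched in a few lines (the 50% figure is the comment's illustrative assumption, not a measurement):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is sped up by factor s.

    Classic Amdahl's law: speedup = 1 / ((1 - p) + p / s).
    """
    return 1.0 / ((1.0 - p) + p / s)

# If typing code is at most half the job (p = 0.5), even an effectively
# infinite speedup on that half (huge s) caps total throughput at 2x:
print(amdahl_speedup(0.5, 1e9))  # just under 2.0

# A more modest 2x on the coding half yields only ~1.33x overall:
print(amdahl_speedup(0.5, 2))
```

The point stands regardless of the exact fraction: the smaller the share of time spent physically writing code, the smaller the ceiling on what faster codegen alone can deliver.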
jwpapi: There is a saying that you need to write an essay three times. The first time it's puked out, the second is decent, and the third is good.

It's quite similar with code, and with code less is more, for tries 1 and 2.
greggyb: Unfortunately, this post was published at the puked-out phase (; (author here)
vinceguidry: I recently started using AI for personal projects, and I find it works really well for 'spike' type tasks, where what you're trying to do is grow your knowledge about a particular domain. It's less good at discovering the correct way of doing things once you've decided on a path forward, but still more useful than combing through API docs and manpages yourself.

It might not actually deliver working things all that much faster than I could, but I don't feel mentally drained by the process either. I used to spend a lot of time reading architecture docs in order to understand available solutions; now I can usually get a sense for what I need to know just from asking ChatGPT how certain things might be done using X tool.

In the last few days, I've stood up syncthing, tailscale with a headscale control plane, and started making working indicators and strategies in Pine Script on TradingView's trading platform. Things I had no energy for, or that would have been weeklong projects, take hours or a day or so. AI's strengths synergize really well with how humans want to think.

I just paste an error message in, and ChatGPT figures out what I'm trying to do from context, then gives me not just a possible resolution but also why the error is happening. The latter is just as useful as the former. It's wrong a lot, but it's easy to suss out.
swalsh: Speak for yourself, I have never thrown away code at this rate in my entire career. I couldn't keep up this pace without AI codegen.
avabuildsdata: honestly the thing that trips me up is when codegen makes me feel productive but I haven't actually validated anything. like I'll have claude write a whole data pipeline in 20 minutes and then spend 2 hours debugging edge cases it didn't think about because it doesn't know our data

the speed is real but it mostly just moves where I spend my time. less typing, more reading and testing. which is... fine? but it's not the 10x thing people keep claiming
nubg: Would getting to the same edge-case-free outcome have taken you less than 2h20min if you didn't have AI?I think it would typically have taken you longer.
lkjdsklf: > I think it would typically have taken you longer.

That's actually highly doubtful to me. There are tons of studies and writing about how reading and debugging code is wildly more time-consuming than writing it. That time goes up even more when you're not the one who wrote the code in the first place. It's why we've spent decades on how to write readable/maintainable code.

So either all this shit about reading/maintaining code being difficult was lies and we've spent decades wasting our time, or AIs can only improve productivity if you stop verifying/debugging code.

So I find it very unlikely that it would have taken more than a couple of hours to just write it the first time.
greggyb: Hey, author here. Never thought I'd see my pokey little blog on HN and all that.

Happy to discuss further.
demorro: Hey, I like your writing. You got an rss feed or anything?
greggyb: https://www.antifound.com/atom.xml
demorro: Thanks!
foolserrandboy: XD
agent5ravi: The most useful reframe I've found: codegen changes the cost structure of writing code, not the cost structure of knowing what to write.Before, if you had a vague spec you'd write a small prototype to clarify your thinking. Now you can have a complete implementation in minutes — but you still have an unclear spec. You've just moved the uncertainty forward in the process, where it's more expensive to catch.The teams I've seen use LLMs well treat the output as a rough draft that requires real review, not a finished product. The teams that get into trouble treat generation speed as the goal. Both groups produce the same lines of code. Very different results.
dahart: > codegen changes the cost structure of writing code, not the cost structure of knowing what to write.

Yes, and knowing what to write has always been the more important challenge, long before AI. But one thing I've noticed is that in some cases, LLMs can help me try out and iterate on more concepts and design ideas than I was doing before. I can try the thing I thought was going to work, see the downsides I didn't anticipate, and then fix it or tear it down and try something else. That was always possible, but with LLMs this cycle feels much easier and faster, going through more rough-draft iterations than I used to. I'm trying more ideas than I would have otherwise, and it feels like it's leading in many cases to a stronger foundation on which to take the draft through review to production. There's far more reviewing and testing than before, but in short: there might be an important component of the speed of writing code that feeds into figuring out what to write. Yes, we should absolutely focus on priorities, requirements, and quality, but we also shouldn't underestimate the impact that iteration speed can have on those goals.
glhaynes: Yes. I'll go down a wrong path in 20 minutes that'd have taken me half a day to go down by hand, and I keep having to remind myself that code is cheap now (and the robot doesn't get tired) so it's best to throw it away and spend 10 more minutes and get it right.
ChicagoDave: I continue to jump into these discussions because I feel like these upvoted posts completely miss what's happening…

- Guardrails are required to generate useful results from GenAI. This should include clear instructions on design patterns, testing depth, and iterative assessments.
- Architecture decision records are one useful way to prevent GenAI from being overly positive.
- Very large portions of code can be completely regenerated quickly when scope and requirements change (skip debugging; just regenerate the whole thing with updated criteria).
- GenAI can write thorough functional and behavioral unit tests. This is no longer a weakness.
- You must suffer the questions and approvals. At no time can you let agents run for extended periods on progressive sets of work. You must watch what is generated. One thing that concerns me about the new 1M context on Claude Code is that many will double down on agent freedom. You can't. You must watch the results and examine functionality regularly.
- No one should care about actual code ever again. It's ephemeral. The role of software engineering is now molding features and requirements into functional results. Choosing Rust, C#, Java, or TypeScript might matter depending on the domain, but then you stop caring and focus on measuring success.

My experience is rolled up in https://devarch.ai/ and I know I get productive and testable results using it every day on multiple projects.
ip26: > No one should care about actual code ever again. It's ephemeral.

Caveat: it still works best in a codebase that is already good. So while any one line of code is ephemeral, how is the overall codebase trending? Towards a bramble, or towards a bonsai?

If the software is small and not mission-critical, it doesn't matter if it becomes a bramble, but not all software is like that.
greggyb: The post is about using LOC as a metric when making any sort of point about AI. Nowhere do I suggest someone shouldn't use it, nor that they should expect negative results if they opt to.
ChicagoDave: No one I've ever worked with in 40 years has ever seriously used LOC as a measurement of progress or success. I honestly don't know where this comes from.
eleventhborn: I feel there is a set of codebases in which LLMs aren't showing the 2-10x lift in productivity. There is also a set of codebases in which LLMs are one-shotting the most correct code and even finding edge cases that would've been hard to find in human reviews. At a surface level, it seems obvious that legacy codebases tend to fall in the first category and more greenfield work falls in the second.

Perhaps this signals an area of study where we make codebases more LLM-friendly. It needs more research and a catchy name.

Also, certain things that we worry about as software artisans, like abstractions, reducing repeated code, naming conventions, argument ordering,... are not a concern for LLMs, as long as the LLMs are consistent in how they write code. For example, one was taught that it is bad to have multiple "foo()" implementations. In the LLM world, it isn't _that_ bad. You can instruct the LLM to "add feature x and fix all the affected tests" (or even better, "add feature x to all foo()"), and if feature x relies on "foo()", it fixes every foo() method. This is a big deal.
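A toy sketch of the "add feature x to all foo()" point. All names and the feature itself are illustrative, not from any real codebase:

```python
# Two near-duplicate implementations. Traditional style guidance says
# "extract the shared logic"; the comment's argument is that an LLM can
# instead be told to apply a change to every copy and do so consistently.
# Here the hypothetical "feature x" is the include_tax flag, added to
# both siblings in the same way.

def foo_orders(items, include_tax=False):
    """Total an order; optionally apply a flat 20% tax (illustrative)."""
    total = sum(i["price"] for i in items)
    return total * 1.2 if include_tax else total

def foo_invoices(lines, include_tax=False):
    """Total an invoice; the same flag, applied identically."""
    total = sum(l["amount"] for l in lines)
    return total * 1.2 if include_tax else total
```

Whether tolerating the duplication is actually wise is the open question; the sketch only shows why consistent multi-site edits make it cheaper than it used to be.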
demorro: A well-considered article, despite the author categorizing it as a rant. I appreciate the appendix quotations, as well as the acknowledgement that they are appeals to authority.

Whilst the author clearly has a belief that falls down on one side of the debate, I hope folks can engage with the "Should we abandon everything we know" question, which I think is the crux of things. Evidence that AI-driven development is a valuable paradigm shift is thin on the ground, and we've done paradigm shifts before which did not really work out, despite massive support for them at the time (Object-Oriented-Everything, Scrum, etc.).
greggyb: I didn't set out to teach you anything, change your behavior, or give you practical takeaways, so it's a rant (: Emotions can be expressed with citations.

I am fully on board with gen AI representing a paradigm shift in software development. I tried to be careful not to take a stance on other debates in the larger conversation. I just saw too many people talking about how much code they're generating as proof statements when discussing LLMs. I think that, specifically (i.e., using LOC generated as the basis of any meaningful argument about effectiveness or productivity), is a silly thing to do. There are plenty of other things we should discuss besides LOC.
demorro: I guess I over-diagnosed your stance; apologies.

I wonder if you have a take on measuring productivity in light of the potential difficulty of achieving good outcomes across the general population? You mention in the second appendix (which I skipped on my first read) that you are a rather experienced LLM user, with experience in all the harnesses and context management that are touted as "best practice" nowadays. Given the effort this seems to take, do you think we're vulnerable to mis-measuring?

My mind always goes to arguments about Agile, or even Communism. "True Communism has never been tried" and "Agile works great when you do it right" are still thrown about in the face of evidence that these things seem impossible, or at least very difficult, to actually implement successfully across the general population. How would we know if AI-driven development had a theoretically higher maximum "productivity" (substitute "value", "virtue", "the general good", whatever you want here) than non-AI-driven development, but still a lower actual productivity due to problems in adoption of the overall paradigm?
slopinthebag: > No one should care about actual code ever again. It's ephemeral. The role of software engineering is now molding features and requirements into functional results. Choosing Rust, C#, Java, or TypeScript might matter depending on the domain, but then you stop caring and focus on measuring success.

I think this has always been the case. "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." Perhaps you mean that they shouldn't worry about structures and relationships either, but I think that is a fool's errand. To be fair, neither of those needs to be codified in the code itself, but ignore them at your own peril...
galbar: This article describes the body of knowledge I was taught when I joined the industry, and it parallels my experience with and thoughts about AI.

I have come to the realization that most people in the industry don't know this body of knowledge, or even that it exists. I'm now seeing the same people trying to solve their ineffectiveness with AI.

I don't know what to think about this situation. My intuition hints at it not being good.
greggyb: Yes, but it comes up in conversations of LLMs a lot. Thus, the rant in question. I think we are in agreement, or at least we lack disagreement, because that is the only stance I endeavored to take in the post.
jwpapi: I think it works great in codebases that are good, but I think it will degrade the quality of the codebase compared to what it was before.

A good codebase depends on the business context, but in my case it's an agile one that can react to discovered business cases. I've written great typed helpers that practically allow me to have typed mongo operators for most cases. It makes all operations really smooth. AI keeps finding creative ways of avoiding my implementations, and over time there are more edge cases, thin wrappers, lint-ignore comments, and other funny exceptions. All the while, I'm losing the guarantees I built...
nubg: For me it's simple:

1. Assume you're to work on product/feature X.
2. If God were to descend and give you a very good, reality-tested spec:
3. Would you be done faster? Of course, because as every AI doomer says, writing code was never the bottleneck!!1!
4. So the only bottleneck is getting to the spec.
5. Guess what AI can help you with as well, because you can iterate out multiple versions with little mental effort and no emotional sunk-cost investment?

ergo coding is a solved problem
sarchertech: And then it turns out God wrote the spec in code because that’s what any spec sufficient to produce the same program from 2 different teams/LLMs would be.
zer00eyz: I went to look at some of the author's other posts and found this: https://www.antifound.com/posts/advent-of-code-2022/

So much of our industry has spent the last two decades honing itself into a temple built around the idea of "leet code", from the interview to things like Advent of Code. Solving brain teasers and knowing your algorithms cold in an interview was always a terrible idea, and the sort of engineers it invited to the table and the kinds of thinking it propagated were bad for our industry as a whole. LLMs make this sort of knowledge moot.

The complaints about LLMs lack any information about the domains being worked in, the means of integration (deep in your IDE vs. cut-and-paste into vim), and what you're asking it to do (in a very literal sense); these are the critical factors that remain unaired in these sorts of laments.

It's just hubris. The question not being asked is "Why are you getting better results than me? Am I doing something wrong?"
greggyb: > The complaints about LLMs lack any information about the domains being worked in, the means of integration (deep in your IDE vs. cut-and-paste into vim), and what you're asking it to do (in a very literal sense); these are the critical factors that remain unaired in these sorts of laments.

I'm not sure if this is a direct response to the article or a general point. The article includes an appendix about my use of LLMs and the domains I have used them in.
hypeatei: Not GP, but your appendix about LLM usage matches exactly how I use it too: mainly for rubber ducking and research. The codegen it's useful for (that I've found) is generating unit tests. Code coverage tools and a quick skim are more than sufficient for quality checks since unit tests are mostly boilerplate and you want to make sure that different branches are being covered.
greggyb: I've had a large project recently which has biased my view on unit testing from LLMs. It includes a lot of parsing and other workflows that require character-specific placement in strings for a lot of tests. Due to how tokenization works, that is a gnarly use case for LLMs. I am trying not to form too many strong opinions about LLM-driven-TDD based on it. My forays into other domains show better results for unit tests, but the weight of my experience is in this particular low point lately.
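A toy illustration of the kind of character-position-sensitive test described. The parser helper and its names are hypothetical; the point is only that the expected value depends on exact character columns, which token-based models handle poorly:

```python
# A test where correctness hinges on counting individual characters.
# Trivial for a human with a cursor; awkward for an LLM, since its
# tokens rarely align one-to-one with characters in a string.

def first_bad_char(s, allowed="abc "):
    """Return the 0-based index of the first disallowed character, or -1."""
    for i, ch in enumerate(s):
        if ch not in allowed:
            return i
    return -1

assert first_bad_char("ab cab!") == 6   # the '!' sits at column 6
assert first_bad_char("abc abc") == -1  # every character is allowed
```

Generating or maintaining dozens of such assertions means getting every column index exactly right, which is where tokenization works against the model.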