Discussion

Tom Johnell

simonw: I wonder if it's more or less tiring to work with LLMs in YOLO/--dangerously-skip-permissions mode.I mostly use YOLO mode which means I'm not constantly watching them and approving things they want to do... but also means I'm much more likely to have 2-3 agent sessions running in parallel, resulting in constant switching which is very mentally taxing.

anthonySs: llms aren’t exhausting it’s the hype and all the people around itsame thing happened with crypto - the underlying technology is cool but the community is what makes it so hated

cglan: I find LLMs so much more exhausting than manual coding. It’s interesting. I think you quickly bump into how much a single human can feasibly keep track of pretty fast with modern LLMs.I assume until LLMs are 100% better than humans in all cases, as long as I have to be in the loop there will be a pretty hard upper bound on what I can do and it seems like we’ve roughly hit that limit.Funny enough, I get this feeling with a lot of modern technology. iPhones, all the modern messaging apps, etc make it much too easy to fragment your attention across a million different things. It’s draining. Much more draining than the old days

hombre_fatal: I think the upper limit is your ability to decide what to build among infinite possibilities. How should it work, what should it be like to use it, what makes the most sense, etc.The code part is trivial and a waste of time in some ways compared to time spent making decisions about what to build. And sometimes even a procrastination to avoid thinking about what to build, like how people who polish their game engine (easy) to avoid putting in the work to plan a fun game (hard).The more clarity you have about what you’re building, then the larger blocks of work you can delegate / outsource.So I think one overwhelming part of LLMs is that you don’t get the downtime of working on implementation since that’s now trivial; you are stuck doing the hard part of steering and planning. But that’s also a good thing.

SchemaLoad: I've found writing the code massively helps your understanding of the problem and what you actually need or want. Most times I go into a task with a certain idea of how it should work, and then reevaluate having started. While an LLM will just do what you ask without questing, leaving you with none of the learnings you would have gained having done it. The LLM certainly didn't learn or remember anything from it.

clickety_clack: I’d love to see what you’ve built. Can you share?

veryfancy: In agent-mode mode, IMO, the sweet spot is 2-3 concurrent tasks/sessions. You don’t want to sit waiting for it, but you don’t want to context-switch across more than a couple contexts yourself.

SchemaLoad: That sounds exhausting having to non stop prompt and review without a second to stop and think.

colecut: There is nothing dictating how long you stop and think for.

siliconc0w: I mostly do 2-3 agents yoloing with self "fresh eyes" review

dinkumthinkum: Does anyone else see this as dystopian? Someone is unironically writing about how exhausted they are and up at night thinking about how they can be a better good-boy at prompting the LLM and reminding us how we shouldn't cope by blaming the AI or its supposed limitations (context size, etc). This is not a dig at the author. It just seems crazy that this is an unironic post. It's like we are gleefully running to the "Laughterhouse" and each reminding our smiling fellow passengers not to be annoyed at the driver if he isn't getting us there fast enough, without realizing the Slaughterhouse (yes, I am stealing the reference).Another way you can read this is as a new cult member that his chiding himself whenever he might have an intrusive thought that Dear Leader may not be perfect, after all.

rednafi: I have always enjoyed the feeling of aporia during coding. Learning to embrace the confusion and the eventual frustration is part of the job. So I don’t mind running in a loop alongside an agent.But I absolutely loathe reviewing these generated PRs - more so when I know the submitter themselves has barely looked at the code. Now corporate has mandated AI usage and is asking people to do 10k LOC PRs every day. Reviewing this junk has become exhausting.I don’t want to read your code if you haven’t bothered to read it yourselves. My stance is: reviewing this junk is far more exhausting. Coding is actually the fun part.

jeremyjh: Most people reading this have probably had the experience of wasting hours debugging when exhausted, only to find it was a silly issue you’ve seen multiple times, or maybe you solve it in a few minutes the next morning.Working with an agent coding all day can be exhilarating but also exhausting - maybe it’s because consequential decisions are packed more tightly together. And yes cognition still matters for now.

bluebarbet: How is "llms" pronounced?

Apocryphon: I mean, how often do we feel the same thing about the compiler?

jeremyjh: In some cases, yes. But I’ve been doing this awhile now and there is a lot of code that has to be written that I will not learn anything from. And now, I have a choice to not write it.

anonzzzies: I always wonder where HNers worked or work; we do ERP and troubleshooting on legacy systems for medium to large corps; PRs by humans were always pretty random and barely looked at as well, even though the human wrote it (copy/pasted from SO and changed it somewhat); if you ask what it does they cannot tell you. This is not an exception, this is the norm as far as I can see outside HN. People who talk a lot, don't understand anything and write code that is almost alien. LLMs, for us, are a huge step up. There is a 40 nested if with a loop to prevent it from failing on a missing case in a critical Shell ERP system. LLMs would not do that. It is a nightmare but makes us a lot of money for keeping things like that running.

somewhereoutth: Of course. Any scenario where you are expected to deliver results using non-deterministic tooling is going to be painful and exhausting. Imagine driving a car that might dive one way or the other of its own accord, with controls that frequently changed how they worked. At the end of any reasonably sized journey you would be an emotional wreck - perhaps even an actual wreck.

raincole: If you care at code quality of course it is exhausting. It's supposed to be. Now there is more code for you to assure quality in the same length of time.

bombdailer: Until that becomes the metric measured in performance reviews.

grey-area: Maintenance is the hard part, not writing new code or steering and planning.

alchemism: L-L-Mms

razorbeamz: LLMs do not actually make anything better for anyone. You have to constantly correct them. It's like having a junior coder under your wing that never learns from its mistakes. I can't imagine anyone actually feeling productive using one to work.

jatora: You need to learn to use the tool better, clearly, if you have such an unhinged take as this.

vishnugupta: el el ems

Sirental: No to be fair I do see what he's saying. I see a major difference between the more expensive models and the cheaper ones. The cheaper (usually default) ones make mistakes all the damn time. You can be as clear as day with them and they simply don't have the context window or specs to make accurate, well reasoned desicions and it is a bit like having a terrible junior work alongside you, fresh out of university.

razorbeamz: The only people who use LLMs "as a tool" are those who are incapable of doing it without using it at all.

senectus1: I imagine code reviewing is a very different sort of skill than coding. When you vibe code (assuming you're reading teh code that is written for you) you become a coder reviewer... I suspect you're learning a new skill.

pessimizer: The way I've tried to deal with it is by forcing the LLM to write code that is clear, well-factored and easy to review i.e. continually forcing it to do the opposite of what it wants to do. I've had good outcomes but they're hard-won.The result is that I could say that it was code that I myself approved of. I can't imagine a time when I wouldn't read all of it, when you just let them go the results are so awful. If you're letting them go and reviewing at the end, like a post-programming review phase, I don't even know if that's a skill that can be mastered while the LLMs are still this bad. Can you really master Where's Waldo? Everything's a mess, but you're just looking for the part of the mess that has the bug?I'm not reviewing after I ask it to write some entire thing. I'm getting it to accomplish a minimal function, then layering features on top. If I don't understand where something is happening, or I see it's happening in too many places, I have to read the code in order to tell it how to refactor the code. I might have to write stubs in order to show it what I want to happen. The reading happens as the programming is happening.

qudat: It’s easier to write code than read it.

bmurphy1976: I don't know what to think about comments like this. So many of them come from accounts that are days or at most weeks old. I don't know if this is astroturfing, or you really are just a new account and this is your experience.As somebody who has been coding for just shy of 40 years and has gone through the actual pain on learning to run a high level and productive dev team, your experience does not match mine. Even great devs will forget some of the basics and make mistakes and I wish every junior (hell even seniors) were as effective as the LLMs are turning out to be. Put the LLM in the hands of a seasoned engineer who also has the skills to manage projects and mentor junior devs and you have a powerful accelerator. I'm seeing the outcome of that every day on my team. The velocity is up AND the quality is up.

jplusequalt: I don't feel this? When my code breaks, I'm more likely to get frustrated with myself.The only time I've felt something akin to this with a compiler is when I was learning Rust. But that went away after a week or two.

nightpool: I would hope that most people who are technically competent enough to be on HN are technically competent enough to quit orgs with coding standards that bad. Or, they're masochists who have taken on the chamllenge of working to fix them

sarchertech: I currently work at one of the biggest tech companies. I’ve been doing this for over 20 years, and I’ve worked at scrappy startups, unicorns, and medium size companies.I’ve certainly seen my share of what I call slot driven development where a developer just throws things at the wall until something mostly works. And plenty if cut and paste development.But it’s far from the majority. It’s usually the same few developers at a company doing it, while the people who know what they’re doing furiously work to keep things from falling apart.If the majority of devs were doing this nothing would work. My worry is that AI lets the bad devs produce this kind of work on a massive scale that overwhelms the good devs ability to fight back or to even comprehend the system.

shiandow: The one thing I don't quite get is how running a loop alongside an agent is any different from reviewing those PRs.

rubyrfranklin2: This resonates deeply. At heyvid we're constantly prompting LLMs for video scripts, metadata, thumbnails — and the inconsistency is genuinely draining. Some days it's magic, other days you spend 45 minutes coaxing a model into something a junior copywriter would nail in 5. I've started thinking of it less like using a tool and more like managing a brilliant but unpredictable contractor. You learn their quirks, you keep notes, you celebrate the wins. Doesn't make it less exhausting, but at least it reframes the frustration.

quantum_state: It seems to me that LLM is a tool after all. One needs to learn to use it effectively.

razorbeamz: Who would I possibly be astroturfing for? The entire industry is all-in on LLMs.

qudat: > The velocity is up AND the quality is up.This is not my experience on a team of experienced SWEs working on a product worth 100m/year.Agents are a great search engine for a codebase and really nice for debugging but anytime we have it write feature code it makes too many mistakes. We end up spending more time tuning the process than it takes to just write the code AND you are trading human context with agent context that gets wiped.

coffeefirst: Oh, entirely. But the hype cycle is such that people like this are blaming themselves for legitimate issues with the workflow design or hard limits on human cognition.My pet theory is we haven't figured out what the best way to use these tools are, or even seen all the options yet. But that's a bigger topic for another day.

anonzzzies: Tech companies. How about massive non software tech companies. I don't know where it is not the norm and I have been in very many of them as supplier for the past 30 years. Tech companies are a bit different as they usually have leadership that prioritizes these things.

olejorgenb: I find working more asynchronous with the agents help. I've disabled the in-your-face agent-is-done/need-input notifications [1]. I work across a few different tasks at my own pace. It works quite well, and when/if I find a rhythm to it, it's absolutely less intense than normal programming.You might think that the "constant" task switching is draining, but I don't switch that frequently. Often I keep the main focus on one task and use the waiting time to draft some related ideas/thoughts/next prompt. Or browse through the code for light review/understanding. It also helps to have one big/complex task and a few simpler things concurrently. And since the number of details required to keep "loaded" in your head per task is fewer, switching has less cost I think. You can also "reload" much quicker by simply chatting with the agent for a minute or two, if some detail have faded.I think a key thing is to NOT chase after keeping the agents running at max efficiency. It's ok to let them be idle while you finish up what your doing. (perhaps bad of KV cache efficiency though - I'm not sure how long they keep the cache)(And obviously you should run the agent in a sandbox to limit how many approvals you need to consider)[1] I use the urgent-window hint to get a subtle hint of which workspace contain an agent ready for input.

adi_kurian: Not at all.

bmurphy1976: I can't speak for you specifically, it's just a trend I'm seeing and unfortunately your 2 day old account falls into that bucket. There's a lot of people who have a lot to lose or who are very afraid of what LLMs will do. There's plenty of incentive to do this.I would be curious to see if I'm just imaging this or it really is a trend.

j3k3: At the same time you have astro-turfing from LLM producers though, so...

sigbottle: I am rewriting an agent framework from scratch because another agent framework, combined with my prompting, led to 2023-level regressions in alignment (completely faking tests, echoing "completed" then validating the test by grepping for the string "completed", when it was supposed to bootstrap a udp tunnel over ssh for that test...).Many top labs [1] [2] already have heavily automated code review already and it's not slowing down. That doesn't mean I'm trusting everything blindly, but yes, over time, it should handle less and less "lower level" tasks and it's a good thing if it can.[1] https://openai.com/index/harness-engineering/ [2] https://claude.com/blog/code-reviewFurther I want to vent about two things:- Things can be improved.- You are allowed to complain about anything, while not improving things yourself.I think the mid 2010s really popularized self improvement in a way that you can't really argue with (if you disagree with "put in more effort and be more focused", you're obviously just lazy!). It's funny because the point of engineering is to find better solutions, but technically yes, an always valid solution is just "suck it up".But moreover, if you do not allow these two premises, what ends up happening in practice for a lot of people, is that basically you can just interpret any slightly pushback as "oh they're just a whiner", and if they're not doing something to fix their problem this instant, that "obviously" validates your claim (and even if they are, it doesn't count, they should still not be a "debbie downer", etc.).Sometimes a premise can sound extreme, but people forget that premises are not in a complete logical vaccuum, you actually live out and believe said premises, and by taking on a certain position, it's often more about what follows downstream from the behavior than the actual words themselves.

j3k3: Id argue the read-write procedures are happening simultaneously as one goes along, writing code by hand.

j3k3: There's nothing more annoying than the feeling of "oh FFS why you doing that?!".Its amazing how right and wrong LLMs can be in the output produced. Personally the variance for me is too much... I cant stand when it gets things wrong on the most basic of stuff. I much prefer doing things without output from an LLM.

Reader /

Discussion

LLMs can be absolutely exhausting | Tom Johnell

tomjohnell.com

/ pin · @ user · Ctrl+Enter

No discussions yet

Discover

youtube.com

BANDIT a 32bit baremetal computer that runs Color Forth - YouTube

Baremetal computer runs ColorForth baremetal, 100% artisanal work by yours truly.links:https://dscf.co.uk/BANDIThttps://…

1 1

github.com

GitHub - fabraix/playground: A live environment to stress-test AI agent defenses…

A live environment to stress-test AI agent defenses through adversarial play 🧠 - fabraix/playground

1 2

msn.com

MSN

1 4

github.com

GitHub - EvanZhouDev/openai-oauth: Free OpenAI API access with your ChatGPT acco…

Free OpenAI API access with your ChatGPT account. Contribute to EvanZhouDev/openai-oauth development by creating an acco…

1 6

keubiko.substack.com

Nasdaq's Shame - Keubiko’s Musings

How to rig an index to appease a billionaire

1 24

quickchat.ai

I'm Too Lazy to Check Datadog Every Morning, So I Made AI Do It | Quickchat…

A programmer's laziness is a virtue. I connected Datadog to Claude Code and built a cron job that triages production ale…

LLMs can be absolutely exhausting | Tom Johnell

More from tomjohnell.com

Pike - Solving the "should we stop here or gamble on the next exit" pr…

Discover

BANDIT a 32bit baremetal computer that runs Color Forth - YouTube

GitHub - fabraix/playground: A live environment to stress-test AI agent defenses…

MSN

GitHub - EvanZhouDev/openai-oauth: Free OpenAI API access with your ChatGPT acco…

Nasdaq's Shame - Keubiko’s Musings

I'm Too Lazy to Check Datadog Every Morning, So I Made AI Do It | Quickchat…

LLMs can be absolutely exhausting | Tom Johnell

More from tomjohnell.com

Pike - Solving the &quot;should we stop here or gamble on the next exit&quot; pr…

Discover

BANDIT a 32bit baremetal computer that runs Color Forth - YouTube

GitHub - fabraix/playground: A live environment to stress-test AI agent defenses…

MSN

GitHub - EvanZhouDev/openai-oauth: Free OpenAI API access with your ChatGPT acco…

Nasdaq&#x27;s Shame - Keubiko’s Musings

I&#39;m Too Lazy to Check Datadog Every Morning, So I Made AI Do It | Quickchat…

Pike - Solving the "should we stop here or gamble on the next exit" pr…

Nasdaq's Shame - Keubiko’s Musings

I'm Too Lazy to Check Datadog Every Morning, So I Made AI Do It | Quickchat…