Discussion
denysvitali: Article: https://openai.com/index/introducing-gpt-5-4/

gpt-5.4
Input: $2.50 /M tokens
Cached: $0.25 /M tokens
Output: $15 /M tokens

---

gpt-5.4-pro
Input: $30 /M tokens
Output: $180 /M tokens

Wtf
elliotbnvl: Looks like it's an order of magnitude off. Misprint?
dpoloncsak: > "GPT‑5.4 is priced higher per token than GPT‑5.2 to reflect its improved capabilities"

That's just not how pricing is supposed to work...? Especially for a 'non-profit'. You're charging me more so I know I have the better model?
FergusArgyll: Maybe it's finally a bigger pretrain?
glerk: Looks like fair price discovery :)
elicash: Can't you continue to use the older model, if you prefer the pricing? But they also claim this new model uses fewer tokens, so it still might ultimately be cheaper even if the per-token cost is higher.
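The claim above is simple break-even arithmetic: a higher per-token price can still mean a lower total bill if the new model needs proportionally fewer tokens. A minimal sketch, where the new-model output price is the $15 /M quoted in this thread but the older-model price is a hypothetical placeholder (the thread doesn't quote GPT‑5.2's price):

```python
# Break-even arithmetic for "fewer tokens but pricier per token".
old_price = 10.0   # hypothetical older-model output price, $ per M tokens
new_price = 15.0   # gpt-5.4 output price quoted in this thread, $ per M tokens

# The new model is cheaper in total once it emits at least this fraction
# fewer tokens than the old one for the same task.
break_even = 1 - old_price / new_price
print(f"break-even token reduction: {break_even:.0%}")  # prints 33%
```

Under these assumed prices, any task where GPT‑5.4 uses more than a third fewer output tokens comes out cheaper overall despite the higher rate.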
gavinray: The "RPG Game" example on the blog post is one of the most impressive demos of autonomous engineering I've seen. It's very similar to "Battle Brothers", and the fact that RPG games require art assets, AI for enemy moves, and a host of other logical systems makes it all the more impressive.
bazmattaz: Anyone else feel that it’s exhausting keeping up with the pace of new model releases? I swear every other week there’s a new release!
coffeemug: Why do you need to keep up? Just use the latest models and don't worry about it.
paxys: "Here's a brand new state-of-the-art model. It costs 10x more than the previous one because it's just so good. But don't worry, if you don't want all this power you can continue to use the older one."A couple months later:"We are deprecating the older model."
dpoloncsak: I feel like that would have been highlighted then. "As this is a bigger pretrain, we have to raise prices".They're framing it pretty directly "We want you to think bigger cost means better model"
davnicwil: If you think about it, there shouldn't really be a reason to care as long as things don't get worse.

Presumably this is where it'll evolve to, with the product just being the brand with a pricing tier and you always get {latest} within that, whatever that means (you don't have to care). They could even shuffle models around internally using some sort of auto-like mode for simpler questions. Again, why should I care as long as average output is not subjectively worse?

Just as I don't want to select resources for my SaaS software to use, or have that explicitly linked to pricing, I don't want to care what my OpenAI model or Anthropic model is today. I just want to pay and for it to hopefully keep getting better, but at a minimum not get worse.
7777777phil: 83% win rate over industry professionals across 44 occupations. I'd believe it on those specific tasks. Near-universal adoption in software still hasn't moved DORA metrics. The model gets better every release. The output doesn't follow. Just had a closer look at those productivity metrics this week: https://philippdubach.com/posts/93-of-developers-use-ai-codi...
twitchard: Not sure DORA is that much of an indictment. "Change Failure Rate", for instance, is subject to tradeoffs. Organizations likely have a tolerance level for change failure rate: if changes are failing too often, they slow down and invest; if changes aren't failing that much, they speed up. So saying "change failure rate hasn't decreased, obviously AI must not be working" is a little silly.

"Change Lead Time" I would expect to have sped up, although I can tell stories for why AI-assisted coding would have an indeterminate effect here too. Right now at a lot of orgs, the bottleneck is the review process, because AI is so good at producing complete draft PRs quickly. Because reviews are scarce (and not just reviews but also manual testing passes), this ironically creates an incentive to group changes into larger batches. So the definition of what a "change" is has grown too.
jbellis: You can, until they turn it off.Anthropic is pulling the plug on Haiku 3 in a couple months, and they haven't released anything in that price range to replace it.
NiloCK: This March 2026 blog post is citing a 2025 study based on Sonnet 3.5 and 3.7 usage. Given that the organization that ran the study [1] has a terrifying exponential as its landing page, I think they'd prefer that its results be interpreted as a snapshot of something moving rather than a constant.

[1] - https://metr.org/
7777777phil: Good catch, thanks (I really wrote that myself.) Added a note to the post acknowledging the models used were Claude 3.5 and 3.7 Sonnet.