Discussion
The Isolation Trap
anonymous_user9: This seems interesting, but the sheer density of LLM-isms makes it hard to get through.
aeonfox: A real interesting read as someone who spends a bit of time with Elixir. Wasn't aware of the atomic and counter Erlang features that break isolation. Though they do say that race conditions are purely mitigated by discipline at design time, but then mention race conditions found via static analysis:
> Maria Christakis and Konstantinos Sagonas built a static race detector for Erlang and integrated it into Dialyzer, Erlang’s standard static analysis tool. They ran it against OTP’s own libraries, which are heavily tested and widely deployed.
> They found previously unknown race conditions. Not in obscure corners of the codebase. Not in exotic edge cases. In the kind of code that every Erlang application depends on, code that had been running in production for years.
I imagine that the 4th issue of protocol violation could possibly be mitigated by a typesafe abstracted language like Gleam (or Elixir when types are fully implemented).
cyberpunk: Eh, maybe. I work on a big, mature, production Erlang system which has millions of processes per cluster, and while the author is right in theory, these are quite extreme edge cases and I’ve never tripped over them.
Sure, if you design a shit system that depends on ETS for shared state there are dangers, so maybe don’t do that?
I’d still rather be writing this system in Erlang than in another language, where the footguns are bigger.
WJW: > They found previously unknown race conditions. Not in obscure corners of the codebase. Not in exotic edge cases. In the kind of code that every Erlang application depends on, code that had been running in production for years.
If these race conditions are in code that has been in production for years and yet the race conditions are "previously unknown", that does suggest to me that it is in practice quite hard to trigger these race conditions. Bugs that happen regularly in prod (and maybe I'm biased, but especially bugs that happen to erlang systems in prod) tend to get fixed.
aeonfox: True. And that the subtle bugs were then picked up by static analysis makes the safety proposition of Erlang even better.
> Bugs that happen regularly in prod
It depends on how regular and reproducible they are. Timing bugs are notoriously difficult to pin down. Pair that with let-it-crash philosophy, and it's maybe not worth tracking down. OTOH, Erlang has been used for critical systems for a very long time, plenty long enough for such bugs to be tracked down if they posed real problems in practice.
kamma4434: The 4th issue is a feature: it’s what allows zero-downtime hot updates.
lukeasrodgers: I don’t have much experience with Pony, but it seems like it addresses the core concerns in this article by design (https://www.ponylang.io/discover/why-pony/). I wish it were more popular.
johnisgood: > This isn’t obviously wrong
I thought it was obviously wrong. Server A calls Server B, and Server B calls Server A. In what way is it not wrong? Because when I read the code my first thought was that it is circular. Is it really not obvious? Am I losing my mind?
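For anyone who wants to see the circular case concretely, here is a minimal Python sketch (not Erlang, and not the article's code). It assumes each server handles one request at a time and that calls are synchronous with a timeout; the `Server` class and the "start", "callback" and "leaf" request names are made up for illustration.

```python
import queue
import threading


class Server:
    """Toy single-threaded server: one mailbox, one request handled at a time."""

    def __init__(self, name):
        self.name = name
        self.mailbox = queue.Queue()
        self.peer = None
        threading.Thread(target=self._loop, daemon=True).start()

    def call(self, request, timeout=0.5):
        """Synchronous call: enqueue the request, then block waiting for the reply."""
        reply_box = queue.Queue(maxsize=1)
        self.mailbox.put((request, reply_box))
        try:
            return reply_box.get(timeout=timeout)
        except queue.Empty:
            return "timeout"

    def _loop(self):
        while True:
            request, reply_box = self.mailbox.get()
            reply_box.put(self.handle(request))

    def handle(self, request):
        if request == "start":
            # While handling "start", this server blocks on its peer...
            return self.peer.call("callback")
        if request == "callback":
            # ...and the peer calls back into the server that is still blocked.
            return self.peer.call("leaf")
        return "ok"  # "leaf": answered without calling anyone


a, b = Server("A"), Server("B")
a.peer, b.peer = b, a

print(a.call("leaf"))                # ok (no cycle)
print(a.call("start", timeout=5.0))  # timeout (A waits on B, B waits on A)
```

Each handler looks reasonable on its own; the cycle only appears when you follow the call graph across both servers, and here the only thing that breaks it is the timeout.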
bluGill: It is too common / useful. Not everything is a tree.
thesz: Erlang has a "die and be restarted" philosophy towards process failures, so these "bugs that happen to erlang systems in prod" may not be fixed at all, if they are rare enough.
boxed: I think at this point comments like this are equivalent to saying "I didn't like this article, because it's written in too good English".
andrelaszlo: I would edit sentences like this:
"Erlang is the strongest form of the isolation argument, and it deserves to be taken seriously, which is why what happens next matters."
It doesn't add much, and it has this condescending and pretentious LLM tone. For me as a reader, it distracts from an otherwise interesting article.
trashburger: It shows a lack of care for the reader. Use your own words.
loloquwowndueo: Sorry, good English is good grammatically and structurally while being unique and feeling creative, and AI-written English is not good. It’s correct but totally repetitive, formulaic and circular. It’s like expecting a pizza and finding it’s made of cardboard.
loloquwowndueo: It wasn't obvious to the AI that wrote the article. There’s still hope for humans :)
allreduce: Most things are a DAG tho. :)
JackC: The article argues that shared memory and message passing are the same thing because they share the same classes of potential failure modes.
Isn't it more like, message passing is a way of constraining shared memory to the point where it's possible for humans to reason about most of the time?
Sort of like Rust and C. Yes, you can write code with 'unsafe' in Rust that makes any mistake C can make. But the rules outside unsafe blocks, combined with the rules at module boundaries, greatly reduce the m * n polynomial complexity of a given size of codebase, letting us reason better about larger codebases.
bluGill: Most is not all. And those exceptions are annoying.
rando1234: I actually disagree, thought it read reasonably well and didn't feel LLMy at all.
toast0: As of now, the post you're replying to says "bugs that regularly happen ... in prod"
Now, if it crashes every 10 years, that is regular, but I think the meaning is that it happens often. Back when I operated a large dist cluster, yes, some rare crashes happened that never got noticed or the triage was 'wait and see if it happens again' and it didn't happen. But let it crash and restart from a known good state is a philosophy about structuring error checking more than an operational philosophy: always check for success and if you don't know how to handle an error fail loudly and return to a good state to continue.
Operationally, you are expected to monitor for crashes and figure out how to prevent them in the future. And, IMHO, be prepared to hot load fixes in response... although a lot of organizations don't hot load.
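A rough Python sketch of that structure, assuming a single worker whose state can be rebuilt from scratch. This is not OTP's supervisor; `supervise`, `add_to_total` and `max_restarts` are hypothetical names. The worker handles the one case it understands and raises on anything else; the supervising loop records the crash for later triage and restarts from a known good state.

```python
def supervise(init_state, step, inputs, max_restarts=3):
    """Run step over inputs; on a crash, log it and restart from init_state()."""
    state = init_state()
    crash_log = []
    for item in inputs:
        try:
            state = step(state, item)
        except Exception as exc:
            # Record the crash so someone can look at it later.
            crash_log.append((item, repr(exc)))
            if len(crash_log) > max_restarts:
                raise  # too many crashes: escalate instead of looping forever
            state = init_state()  # return to a known good state and continue
    return state, crash_log


def add_to_total(total, item):
    """Worker logic: handle the case it understands, fail loudly otherwise."""
    if not isinstance(item, int):
        raise ValueError(f"don't know how to handle {item!r}")
    return total + item


total, crash_log = supervise(lambda: 0, add_to_total, [1, 2, "oops", 4])
print(total)           # 4 (the running total was reset after the crash)
print(len(crash_log))  # 1
```

The crash log is the operational half: restarting keeps the system running, but the recorded crashes are what you are expected to monitor and eventually fix.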