Discussion
Hypothesis, Antithesis, Synthesis
DRMacIver: Post author here btw, happy to take questions, whether they're about Hegel in particular, property-based testing in general, or some variant on "WTF do you mean you wrote rust bindings to a python library?"
hugeBirb: Not that it matters at this point but the hegelian dialectic is not thesis, antithesis and synthesis. Usually attributed to Hegel but as I understand it he actually pushed back on this mechanical view of it all and his views on these transitory states was much more nuanced.
DRMacIver: Conversation with Will (Antithesis CEO) a couple of months ago, heavily paraphrased:

Will: "Apparently Hegel actually hated the whole Hegelian dialectic and it's falsely attributed to him."
Me: "Oh, hm. But the name is funny and I'm attached to it now. How much of a problem is that?"
Will: "Well, someone will definitely complain about it on Hacker News."
Me: "That's true. Is that a problem?"
Will: "No, probably not."

(Which is to say: you're entirely right. But we thought the name was funny, so we kept it. Sorry for the philosophical inaccuracy.)
wwilson: If I had been wearing my fiendish CEO hat at the time, I might have even said something like: "somebody pointing this out will be a great way to jumpstart discussion in the comments."

One of the evilest tricks in marketing to developers is to ensure your post contains one small inaccuracy so somebody gets nerd-sniped... not that I have ever done that.
pron: > property-based testing is going to be a huge part of how we make AI-agent-based software development not go terribly.

There's no doubt, I think, that testing will remain important and possibly become more important with more AI use, and so better testing is helpful, PBT included. But the problem remains verifying that the tests actually test what they're supposed to. Mutation tests can allow agents to get good coverage with little human intervention, and PBT can make tests better and more readable. But people still have to read and understand them, and I suspect that many people who claim to generate thousands of LOC per day don't.

And even if the tests were great and people carefully reviewed them, that's not enough to make sure things don't go terribly wrong. Anthropic's C compiler experiment didn't fail because of bad testing. Not only were the tests good, it took humans years to write them by hand, and the agents still failed to converge.

I think good tests are a necessary condition for AI not generating terrible software, but we're clearly not yet at a point where they're a sufficient one. So "a huge part" - possibly, but there are other huge parts still missing.
js8: > There's no doubt, I think, testing will remain important and possibly become more important with more AI use, and so better testing is helpful, PBT included.

Given the Curry-Howard isomorphism, couldn't we ask AI to directly prove the property of the binary executable, under the assumptions of the HW model, instead of running PBTs?

By no means do I want to dismiss PBTs - but it seems that this could be both faster and more reliable.
DRMacIver: > But the problem remains verifying that the tests actually test what they're supposed to.

Definitely. It's a lot harder to fake this with PBT than with example-based testing, but you can still write bad property-based tests, and agents are pretty good at doing so.

I have generally found that agents with property-based tests are much better at not lying to themselves about it than agents with just example-based testing, but I still spend a lot of time yelling at Claude.

> So "a huge part" - possibly, but there are other huge parts still missing.

No argument here. We're not claiming to solve agentic coding. We're just testing people doing testing things, and we think that good testing tools are extra important in an agentic world.
ngruhn: > I have generally found that agents with property-based tests are much better at not lying to themselves

I also observed the cheating increase. I recently tried to do a specific optimization on a big, complex function. I wrote a PBT that checks that the original function returns the same values as the optimized function on all inputs. I also tracked the runtime to confirm that performance improved. Then I let Claude loose. The PBT was great at spotting edge cases, but eventually Claude always started cheating: it modified the test, it modified the original function, it implemented other (easier) optimizations, ...
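The equivalence property ngruhn describes is easy to sketch. Here is a minimal stdlib-only version; the function names and the `random`-based input generation are illustrative assumptions, not ngruhn's actual code (a real setup would use a library like Hypothesis or proptest, which add shrinking and replay on top of this loop):

```python
import random

def sum_of_squares(xs):
    # Stand-in for the original, trusted-but-slow implementation.
    return sum(x * x for x in xs)

def sum_of_squares_optimized(xs):
    # Stand-in for the candidate optimization under test. It must stay
    # behaviorally identical to the original on every input.
    total = 0
    for x in xs:
        total += x * x
    return total

def check_equivalence(trials=1000, seed=0):
    # The property: both implementations agree on every generated input.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10**6, 10**6) for _ in range(rng.randint(0, 50))]
        assert sum_of_squares(xs) == sum_of_squares_optimized(xs), xs
    return True
```

Keeping the reference implementation and the test out of the agent's write path is the part this sketch can't enforce, and is exactly where the cheating above crept in.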
DRMacIver: Ouch. Classic Claude. It does tend to cheat when it gets stuck, and I've had some success with stricter harnesses, reflection prompts and getting it to redo work when it notices it's cheated, but it's definitely not a solved problem.My guess is that you wouldn't have had a better time without PBT here and it would still have either cheated or claimed victory incorrectly, but definitely agreed that PBT can't fully fix the problem, especially if it's PBT that the agent is allowed to modify. I've still anecdotally found that the results are better than without it because even if agents will often cheat when problems are pointed out, they'll definitely cheat if problems aren't pointed out.
zero0529: I remember first learning about Hegel when playing Fallout NV. Caesar made it seem so simple.
skybrian: It isn't used by anyone besides me, but I wrote a property-testing library for Deno [1] that has a form of "sometimes" assertions (inspired by Antithesis) and uses "internal shrinking" (inspired by Hypothesis).But it's still a "blind" fuzzer and it would be nice to write one that gets feedback from code coverage somehow. Instead, you have to run code coverage yourself and figure out how to change test data generation to improve it.[1] https://jsr.io/@skybrian/repeat-test
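For readers unfamiliar with the idea, a "sometimes" assertion inverts the usual quantifier: instead of requiring a condition on every input, it requires that the condition holds on at least one input across the whole run. A minimal hand-rolled sketch of the concept follows; this is not skybrian's repeat-test API or the Antithesis SDK, and all names in it are invented for illustration:

```python
import random

class SometimesTracker:
    """Records whether each labelled condition was ever observed true."""
    def __init__(self):
        self.hits = {}

    def sometimes(self, label, condition):
        # Unlike a normal assert, a single True anywhere in the run suffices.
        self.hits[label] = self.hits.get(label, False) or bool(condition)

    def check(self):
        missed = [label for label, hit in self.hits.items() if not hit]
        assert not missed, f"conditions never observed: {missed}"

def run(trials=500, seed=1):
    rng = random.Random(seed)
    tracker = SometimesTracker()
    for _ in range(trials):
        n = rng.randint(0, 10)
        tracker.sometimes("generated a zero", n == 0)
        tracker.sometimes("generated the maximum", n == 10)
    tracker.check()  # fails only if an interesting case was never exercised
    return True
```

Such assertions catch a failure mode ordinary tests can't: a generator that silently stops producing the interesting inputs at all.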
mullr: Why would I use this over the existing Proptest library in Rust?
lwhsiao: DRMacIver, can you comment on how this fits into the existing property-based testing ecosystems for various languages? E.g., if I use proptest in Rust, why would/should I switch to Hegel?
DRMacIver: The short answer to how it fits into existing ecosystems is... in competition, I suppose. We've got a lot of respect for the people working on these libraries, but we think the Hypothesis-based approach is better than the various approaches people have adopted. I don't love that the natural languages for us to start with are ones where there are already pretty good property-based testing libraries whose toes we're stepping on, but it ended up being the right choice because those are the languages people care about writing correct software in, and also the ones we most want the tools in ourselves!

I think right now if you're a happy proptest user it's probably not clear that you should switch to Hegel. I'd love to hear about people trying, but I can't hand on my heart say that it's clearly the correct thing for you to do given its early state, even though I believe it will eventually be.

But roughly, the things that I think are clearly better about the Hegel approach, and why it might be worth trying Hegel if you're starting greenfield, are:

* A much better generator language than proptest's (I really dislike proptest's choices here. This is partly personal aesthetic preference, but I do think the explicitly constructed generators work better as an approach, and I think this has been borne out in Hypothesis). Hegel has a lot of flexible tooling for generating the data you want.

* Hegel gets you great shrinking out of the box which always respects the validity requirements of your data. If you've written a generator to always ensure something is true, that should also be true of your shrunk data. This is... only kind of true in proptest at best. It hasn't got quite as many footguns in this space as original QuickCheck and its purely type-based shrinking, but you will often end up having to choose between shrinking that produces good results and shrinking that you're sure will give you valid data.

* Hegel's test replay is much better than seed saving. If you have a failing test and you rerun it, it will almost immediately fail again in exactly the same way. With approaches that don't use the Hypothesis model, the best you can hope for is to save a random seed, then rerun shrinking from that failing example, which is a lot slower.

There are probably a bunch of other quality-of-life improvements, but these are the things that have stood out to me when I've used proptest, and they're in general the big contrast between the Hypothesis model and the more classic QuickCheck-derived ones.
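The shrinking point can be illustrated with a toy: a shrinker that only ever proposes candidates satisfying the generator's invariant, so shrunk counterexamples stay valid by construction. This is a simplified sketch of the idea, not Hegel's or Hypothesis's actual algorithm:

```python
def shrink(value, is_valid, fails):
    """Greedily delete elements, accepting a candidate only if it both
    preserves the generator's invariant (is_valid) and still fails."""
    current = list(value)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if is_valid(candidate) and fails(candidate):
                current = candidate
                improved = True
                break
    return current

# Example: the generator promised non-empty sorted lists; the bug is
# triggered by any element greater than 10. Shrinking keeps the promise.
is_sorted_nonempty = lambda xs: len(xs) > 0 and xs == sorted(xs)
triggers_bug = lambda xs: any(x > 10 for x in xs)
```

Here `shrink([1, 5, 11, 12], is_sorted_nonempty, triggers_bug)` reduces to `[12]`, and every intermediate candidate it accepted along the way was still a non-empty sorted list. Shrinkers that ignore the invariant can hand you "minimal" counterexamples your generator could never have produced.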
AndrewKemendo: Eh… it's always worth keeping in mind the time period and what was going on with the tooling for mathematics and science at the time.

Statistics wasn't really quite mature enough to be applied to, let's say, political economy, a.k.a. economics, which is what Hegel was working in. JB Say (1) was the leading mind in statistics at the time but wasn't as popular in political circles (notably, Proudhon used Say's work as epistemology versus Hegel and Marx).

I've been in serious philosophy courses where they take the dialectic literally and it is the epistemological source of reasoning, so it's not gone. This is especially true in how Marx expanded it into dialectical materialism - he got stuck on the process as the right epistemological approach, and Marxists still love the dialectic and its Hegelian roots (Zizek is the biggest one here).

The dialectic eventually fell due to robust numerical methods, and is a degenerate version of the sampling Markov process, which is really the best in class for epistemological grounding.

Someone posted this here years ago and I always thought it was a good visual: https://observablehq.com/@mikaelau/complete-system-of-philos...
sigbottle: I thought the dialectic was just a proof methodology, and the modern political angles you might hear from, say, a YouTube video essay on Hegel were because of a very careful narrative from some French dude (and I guess Marx with his dialectical materialism). I mean, I agree with many perspectives from 20th-century continental philosophy, but it has to be agreed that they refactored Hegel for their own purposes, no?
AndrewKemendo: Oh, the amount of branching and forking and remixing of Hegel is more or less infinite.

I think it's worth again pointing out that Hegel was at the height of contemporary philosophy at the time, but he wasn't a mathematician, and this is the key distinction. Hegel lives in the pre-mathematical economics world, the continental philosophy world of words with Kant etc., and never crossed into the mathematical world. So I liken it to him doing what he could with the limited capabilities and tools that he had.

Again, compare this to the scientific process described by Francis Bacon. There are no remixes of that, just improvements.

Ultimately, using the dialectic is trying to use an outdated technology for understanding human behavior.
sigbottle: I mean, I don't know about Hegel, but Kant certainly dipped into mathematics. One of the reasons he even wrote the CPR was to unify, in his mind, the rationalists (who had Leibniz) versus the empiricists (who had Newton). 20th-century analytic philosophy was heavily informed by Kantian distinctions (logical positivism uses very similar terminology, and Carnap himself was originally a neo-Kantian, though funnily enough so was Heidegger). In the 21st century, it seems like philosophy overall has gotten more specialized and grounded; people have moved away from one unified system of truth and have gotten more domain-driven, in both continental and analytic philosophy.

There's no doubt that basically nobody could've predicted a priori 20th-century mathematics and physics. I'm not too familiar with the physics side, but any modern philosopher who doesn't take computability seriously isn't worth their salt, for example. I'm not too familiar with statistics either, but I believe you that statistics and modern economic theories could disprove, say, Marxism as he envisioned it.

That definitely doesn't mean that all those tools from back then are useless or even just misinformed, IMO. I witness plenty of modern people (not you) being philosophically bankrupt when making claims.
AndrewKemendo: My claim is that genuinely all of those previous analytical forms are absolutely useless if you have the capacity to utilize a more updated framework. The challenge is that those more mathematically demanding frameworks are inaccessible to the majority of people, and so they don't actually take off, because there's no mechanism to translate more rigor into social studies and the social sciences - humans reject the concept of being measured as though they were machines, which is understandable.

So, as a function of that, applications of mathematics trended towards things that were not human-focused; they were machine-focused and finance-focused.

The big transition happened after TV and the Internet (really just low-cost, high-reach advertising) became pervasive and social scientists began utilizing statistical methods across consumer and attention behavior as a social science experimentation platform. Social science moved from the squishy into the precise, precisely to give companies a market advantage in capturing market share through manipulating human behavior. Ultimately that was the wet dream of political philosophers since Ptahhotep.

Hegel is irrelevant in the age of measurement.
DRMacIver: Answered this over here: https://news.ycombinator.com/item?id=47506274