Hi David, congratulations on the release! I'm excited to play around with Hypothesis's bitstream-based shrinking. As you're aware, prop_flat_map is a pain to deal with, and I'd love to replace some of my proptest-based tests with Hegel.
I spent a little time looking at Hegel last week and it wasn't quite clear to me how I'd go about having something like a canonical generator for a type (similar to proptest's Arbitrary). I've found that to be very helpful while generating large structures to test something like serialization roundtripping against — in particular, the test-strategy library has derive macros that work very well for business logic types with, say, 10-15 enum variants each of which may have 0-10 subfields. I'm curious if that is supported today, or if you have plans to support this kind of composition in the future.
edit: oh I completely missed the macro to derive DefaultGenerator! Whoops
tybug [3 hidden]5 mins ago
Yep, `#[derive(DefaultGenerator)]` and `generators::default<T>()` are the right tools here.
This is one of the areas we've dogfooded the least, so we'd definitely be happy to get feedback on any sharp corners here!
I think `from_type` is one of Hypothesis's most powerful and ergonomic strategies, and that while we probably can't get quite to that level in rust, we can still get something that's pretty great.
sunshowers [3 hidden]5 mins ago
Thank you! I have some particularly annoying proptest-based tests that I'll try porting over to Hegel soon. (Thanks for writing the Claude skill to do this.)
LoganDark [3 hidden]5 mins ago
Using Python from other languages is terrible. I love this kind of testing but this implementation is not for me. I was so excited before learning it depends on Python.
pron [3 hidden]5 mins ago
> property-based testing is going to be a huge part of how we make AI-agent-based software development not go terribly.
There's no doubt, I think, that testing will remain important and possibly become more important with more AI use, and so better testing is helpful, PBT included. But the problem remains verifying that the tests actually test what they're supposed to. Mutation tests can allow agents to get good coverage with little human intervention, and PBT can make tests better and more readable. But still, people have to read them and understand them, and I suspect that many people who claim to generate thousands of LOC per day don't.
And even if the tests were great and people carefully reviewed them, that's not enough to make sure things don't go terribly wrong. Anthropic's C compiler experiment didn't fail because of bad testing. Not only were the tests good, it took humans years to write the tests by hand, and the agents still failed to converge.
I think good tests are a necessary condition for AI not generating terrible software, but we're clearly not yet at a point where they're a sufficient one. So "a huge part" - possibly, but there are other huge parts still missing.
zoogeny [3 hidden]5 mins ago
> It took humans years to write the tests by hand, and the agents still failed to converge.
I think there is some hazard in assuming that what agents fail at today they will continue to fail on in the future.
What I mean is, if we take the optimistic view that agents continue improving on their current trajectory for another year or two, then it is worthwhile considering what tools and infrastructure we will need for them. Companies that start building those now, for the future they assume is coming, are going to be better positioned than people who wake up to a new reality in two years.
pmbauer [3 hidden]5 mins ago
> I think there is some hazard in assuming that what agents fail at today they will continue to fail on in the future.
I think there is some hazard in assuming a seemingly exponential curve has no asymptotes, otherwise known as faith.
zoogeny [3 hidden]5 mins ago
That is what the market is for!
I'm just pointing out that "we don't need this right now" isn't necessarily an argument for "we don't need this at all".
There is a saying that isn't perfect but may apply: better to have it and not need it than to need it and not have it.
Here is another way of looking at it. Let's say agents don't meet the hyped up expectations and we build all of this robust tooling for nothing. So we have all of this work towards creating autonomous testing systems but we don't have the autonomous agents. That still seems like a decent outcome.
When we plan around optimistic views of the future, we tend to build generally useful things.
js8 [3 hidden]5 mins ago
> There's no doubt, I think, testing will remain important and possibly become more important with more AI use, and so better testing is helpful, PBT included.
Given the Curry-Howard isomorphism, couldn't we ask AI to directly prove the property of the binary executable, under the assumption of the HW model, instead of running PBTs?
By no means do I want to dismiss PBTs - but it seems that this could be both faster and more reliable.
skybrian [3 hidden]5 mins ago
Proofs are a form of static analysis. Static analysis can find interesting bugs, but how a system behaves isn't purely a property of source code. It won't tell you whether the code will run acceptably in a given environment.
For example, if memory use isn't modelled, it won't tell you how big the input can be before the system runs out of memory. Similarly, if your database isn't modelled then you need to test with a real database. Web apps need to test with a real web browser sometimes, rather than a simplified model of one. Databases and web browsers are too complicated to build a full-fidelity mathematical model for.
When testing with real systems there's often the issue that the user's system is different from the one you use to test. You can test with recent versions of Chrome and Firefox, etc, which helps a lot, but what about extensions?
Nothing covers everything, but property tests and fuzzers actually run the code in some test environment. That's going to find different issues than proofs will.
js8 [3 hidden]5 mins ago
> Databases and web browsers are too complicated to build a full-fidelity mathematical model for.
I disagree - thanks to the Curry-Howard isomorphism, the full-fidelity mathematical model of a database or web browser is its binary itself.
We could have compilers provide theorems (with proof) of correctness of the translation from source to machine code, and library functions could provide useful theorems about the resource use.
Then, if the AI can reason about the behavior of the source code, it can also build the required proof of correctness along with it.
Groxx [3 hidden]5 mins ago
And how do you know if it has proven the property you want, instead of something that's just complicated looking but evaluates to true?
js8 [3 hidden]5 mins ago
The AI would build a proof of correctness, which would be then verified in a proof checker (not AI).
groby_b [3 hidden]5 mins ago
> Given Curry-Howard isomorphism, couldn't we ask AI to directly prove the property of the binary executable under the assumption of the HW model, instead of running PBTs?
Yes, in principle. Given unlimited time and a plentiful supply of unicorns.
Otherwise, no. It is well beyond the state of the art in formal proofs for the general case, and it doesn't become possible just because we "ask AI".
And unless you provide a formal specification of the entire set of behavior, it's still not much better than PBT -- the program is still free to do whatever the heck it wants that doesn't violate the properties formally specified.
DRMacIver [3 hidden]5 mins ago
> But the problem remains verifying that the tests actually test what they're supposed to.
Definitely. It's a lot harder to fake this with PBT than with example-based testing, but you can still write bad property-based tests and agents are pretty good at doing so.
I have generally found that agents with property-based tests are much better at not lying to themselves about it than agents with just example-based testing, but I still spend a lot of time yelling at Claude.
> So "a huge part" - possibly, but there are other huge parts still missing.
No argument here. We're not claiming to solve agentic coding. We're just testing people doing testing things, and we think that good testing tools are extra important in an agentic world.
pron [3 hidden]5 mins ago
> We're not claiming to solve agentic coding. We're just testing people doing testing things, and we think that good testing tools are extra important in an agentic world.
Yeah, I know. Just an opportunity to talk about some of the delusions we're hearing from the "CEO class". Keep up the good work!
ngruhn [3 hidden]5 mins ago
> I have generally found that agents with property-based tests are much better at not lying to themselves
I also observed the cheating to increase. I recently tried to do a specific optimization on a big complex function. Wrote a PBT that checks that the original function returns the same values as the optimized function on all inputs. I also tracked the runtime to confirm that performance improved. Then I let Claude loose. The PBT was great at spotting edge cases but eventually Claude always started cheating: it modified the test, it modified the original function, it implemented other (easier) optimizations, ...
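For reference, the shape of such an equivalence property is simple to sketch in plain Python with stdlib `random` (both functions below are hypothetical stand-ins for the real code; a PBT library adds shrinking and replay on top of this loop):

```python
import random

def original(xs):
    # Hypothetical stand-in for the big complex function.
    total = 0
    for x in xs:
        total += x * x
    return total

def optimized(xs):
    # Hypothetical stand-in for the optimized rewrite under test.
    return sum(x * x for x in xs)

def check_equivalence(trials=1000, seed=0):
    """Property: the optimized function agrees with the original on all inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        assert original(xs) == optimized(xs), f"mismatch on input {xs}"

check_equivalence()
```

One mitigation for the cheating is to keep the property and the original function outside the agent's write access, so "modify the test" is off the table.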
DRMacIver [3 hidden]5 mins ago
Ouch. Classic Claude. It does tend to cheat when it gets stuck, and I've had some success with stricter harnesses, reflection prompts and getting it to redo work when it notices it's cheated, but it's definitely not a solved problem.
My guess is that you wouldn't have had a better time without PBT here and it would still have either cheated or claimed victory incorrectly, but definitely agreed that PBT can't fully fix the problem, especially if it's PBT that the agent is allowed to modify. I've still anecdotally found that the results are better than without it because even if agents will often cheat when problems are pointed out, they'll definitely cheat if problems aren't pointed out.
tybug [3 hidden]5 mins ago
I actually think there's another angle here where PBT helps, which wasn't explored in the blog post.
That angle is legibility. How do you know your AI-written slop software is doing the right thing? One would normally read all the code. Bad news: that's not much less labor-intensive than not using AI at all.
But, if one has comprehensive property-based tests, they can instead read only the property-based tests to convince themselves the software is doing the right thing.
By analogy: one doesn't need to see the machine-checked proof to know the claim is correct. One only needs to check the theorem statement is saying the right thing.
pron [3 hidden]5 mins ago
Right, I said that property based tests are easier to read, and that's good. But people still have to actually read them. Also, because they still work best at the "unit" level, to understand them, the people reading them need to know how all the units are connected (e.g. a single person cannot review even PBTs required for 10KLOC per day [1]).
My point isn't so much about PBT, but about how we don't yet know just how much agents help write real software (and how to get the most help from them).
[1]: I'm only using that number because Garry Tan, CEO of YC, claimed to generate 10K lines of text per day that he believes to be working code and developers working with AI agents know they can't be.
DRMacIver [3 hidden]5 mins ago
Post author here btw, happy to take questions, whether they're about Hegel in particular, property-based testing in general, or some variant on "WTF do you mean you wrote rust bindings to a python library?"
Chinjut [3 hidden]5 mins ago
You mention in the post that there are design differences between Hegel/Hypothesis and QuickCheck, partly due to attitude differences between Python/non-Haskell programmers and Haskell programmers. As someone coming from the Haskell world (though by no means considering Haskell a perfect language), could you expand on what kinds of differences these are?
DRMacIver [3 hidden]5 mins ago
So I think a short list of big API differences are something like:
* Hypothesis/Hegel are very much focused on using test assertions rather than a single property that can be true or false. This naturally drives a style that is much more like "normal" testing, but also has the advantage that you can distinguish between different types of failing test. We don't go too hard on this, but both Hegel and Hypothesis will report multiple distinct failures if your test can fail in multiple ways.
* Hegelothesis's data generation and how it interacts with testing is much more flexible and basically fully imperative. You can basically generate whatever data you like wherever in your test you like, freely interleaving data generation and test execution.
* QuickCheck is very much type-first, with explicit generators as an afterthought. I think this is mostly a mistake even in Haskell, but in languages where "just wrap your thing in a newtype and define a custom implementation for it" will get you a "did you just tell me to go fuck myself?" response, it's a nonstarter. Hygel is generator first, and you can get the default generator for a type if you want but it's mostly a convenience function with the assumption that you're going to want a real generator specification at some point soon.
From an implementation point of view, and what enables the big conveniences, Hypothesis has a uniform underlying representation of test cases and does all its operations on them. This means you get:
* Test caching (if you rerun a failing test, it will immediately fail in the same way with the previously shrunk example)
* Validity guarantees on shrinking (your shrunk test case will always be one your generators could have produced. It's a huge footgun in QuickCheck that you can shrink to an invalid test case)
* Automatically improving the quality of your generators, never having to write your own shrinkers, and a whole bunch of other quality of life improvements that the universal representation lets us implement once and users don't have to care about.
The validity thing in particular is a huge pain point for a lot of users of PBT, and is what drove a lot of the core Hypothesis model to make sure that this problem could never happen.
The test caching is because I personally hated rerunning tests and not knowing whether it was just a coincidence that they were passing this time or that the test case had changed.
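A toy sketch of that uniform representation (nothing like Hypothesis's actual implementation, just the idea): a test case is the sequence of integer choices the generators made, so replay is just reusing the sequence, and shrinking mutates the sequence and re-runs the generator, which is why shrunk values always satisfy the generator's invariants.

```python
import random

class TestCase:
    """A test case is just the list of integer choices made while generating."""
    def __init__(self, prefix=None, seed=0):
        self.prefix = list(prefix or [])   # choices to replay
        self.choices = []                  # choices actually made
        self.rng = random.Random(seed)

    def draw_int(self, lo, hi):
        i = len(self.choices)
        v = self.prefix[i] if i < len(self.prefix) else self.rng.randint(lo, hi)
        v = max(lo, min(hi, v))  # clamp, so replayed/shrunk choices stay valid
        self.choices.append(v)
        return v

def gen_sorted_pair(tc):
    # Generator invariant: always returns a <= b.
    a = tc.draw_int(0, 100)
    b = tc.draw_int(a, 100)  # the second draw's range depends on the first
    return a, b

def fails(choices):
    # A "bug" that triggers whenever b >= 10.
    a, b = gen_sorted_pair(TestCase(choices))
    return b >= 10

def shrink(choices, failing):
    """Greedily lower each choice while the test keeps failing."""
    current = list(choices)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            while current[i] > 0:
                candidate = current[:i] + [current[i] - 1] + current[i + 1:]
                if failing(candidate):
                    current = candidate
                    improved = True
                else:
                    break
    return current
```

Starting from a failing sequence like `[37, 80]`, this shrinks to `[0, 10]`, and replaying that through `gen_sorted_pair` still yields a valid pair with `a <= b`: the shrinker never touches raw values, only choices, so it cannot produce data the generator couldn't.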
anentropic [3 hidden]5 mins ago
TBH reading the first few words of that section I was definitely expecting it to continue "so we used Claude to rewrite Hypothesis in Rust..." so that was quite a surprise!
DRMacIver [3 hidden]5 mins ago
It's on the agenda! We definitely want to rewrite the Hegel core server in rust, but not as much as we wanted to get it working well first.
My personal hope is that we can port most of the Hypothesis test suite to hegel-rust, then point Claude at all the relevant code and tell it to write us a hegel-core in rust with that as its test harness. Liam thinks this isn't going to work, I think it's like... 90% likely to get us close enough to working that we can carry it over the finish line. It's not a small project though. There are a lot of fiddly bits in Hypothesis, and the last time I tried to get Claude to port it to Rust the result was better than I expected but still not good enough to use.
mullr [3 hidden]5 mins ago
Why would I use this over the existing Proptest library in Rust?
It isn't used by anyone besides me, but I wrote a property-testing library for Deno [1] that has a form of "sometimes" assertions (inspired by Antithesis) and uses "internal shrinking" (inspired by Hypothesis).
But it's still a "blind" fuzzer and it would be nice to write one that gets feedback from code coverage somehow. Instead, you have to run code coverage yourself and figure out how to change test data generation to improve it.
> But it's still a "blind" fuzzer and it would be nice to write one that gets feedback from code coverage somehow
There have been simplistic attempts at this, e.g. instead of performing 100 tests, just keep going as long as coverage increases.
The Choice Gradient Sampling algorithm from https://arxiv.org/pdf/2203.00652 feels like a nice way to steer generators in a more nuanced way. That paper uses it to avoid discards when rejection-sampling; but I have a feeling it could be repurposed to "reward" based on new coverage instead/as-well.
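The simplistic version is easy to sketch: instrument the code under test to record which branches ran, and keep generating inputs until coverage stops improving (the manual branch markers below stand in for real coverage instrumentation):

```python
import random

covered = set()

def classify(x):
    # Toy function under test, manually instrumented with branch markers.
    if x < 0:
        covered.add("negative")
        return "negative"
    if x == 0:
        covered.add("zero")
        return "zero"
    if x % 2 == 0:
        covered.add("even")
        return "even"
    covered.add("odd")
    return "odd"

def test_until_coverage_plateaus(stale_limit=500, seed=0):
    """Instead of a fixed number of runs, stop only after `stale_limit`
    consecutive inputs that exercised no new branch."""
    rng = random.Random(seed)
    runs, stale = 0, 0
    while stale < stale_limit:
        before = len(covered)
        classify(rng.randint(-3, 3))
        runs += 1
        stale = 0 if len(covered) > before else stale + 1
    return runs

runs = test_until_coverage_plateaus()
```

Something like the paper's gradient idea could, roughly, replace the blind `rng.randint` here with draws biased toward choices that previously led somewhere new.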
lwhsiao [3 hidden]5 mins ago
DRMacIver, can you comment on how this fits into the existing property-based testing ecosystems for various languages? E.g., if I use proptest in Rust, why would/should I switch to Hegel?
DRMacIver [3 hidden]5 mins ago
The short answer to how it fits into existing ecosystems is... in competition I suppose. We've got a lot of respect for the people working on these libraries, but we think the Hypothesis-based approach is better than the various approaches people have adopted. I don't love that the natural languages for us to start with are ones where there are already pretty good property-based testing libraries whose toes we're stepping on, but it ended up being the right choice because those are the languages people care about writing correct software in, and also the ones we most want the tools in ourselves!
I think right now if you're a happy proptest user it's probably not clear that you should switch to Hegel. I'd love to hear about people trying, but I can't hand on my heart say that it's clearly the correct thing for you to do given its early state, even though I believe it will eventually be.
But roughly the things that I think are clearly better about the Hegel approach and why it might be worth trying Hegel if you're starting greenfield are:
* Much better generator language than proptest (I really dislike proptest's choices here. This is partly personal aesthetic preferences, but I do think the explicitly constructed generators work better as an approach and I think this has been borne out in Hypothesis). Hegel has a lot of flexible tooling for generating the data you want.
* Hegel gets you great shrinking out of the box which always respects the validity requirements of your data. If you've written a generator to always ensure something is true, that should also be true of your shrunk data. This is... only kindof true in proptest at best. It's not got quite as many footguns in this space as original quickcheck and its purely type-based shrinking, but you will often end up having to make a choice between shrinking that produces good results and shrinking that you're sure will give you valid data.
* Hegel's test replay is much better than seed saving. If you have a failing test and you rerun it, it will almost immediately fail again in exactly the same way. With approaches that don't use the Hypothesis model, the best you can hope for is to save a random seed, then rerun shrinking from that failing example, which is a lot slower.
There are probably a bunch of other quality of life improvements, but these are the things that have stood out to me when I've used proptest, and are in general the big contrast between the Hypothesis model and the more classic QuickCheck-derived ones.
I've talked with lots of people in the PBT world who have always seen something like this as the end goal of the PBT ecosystem. It seemed like a thing that would happen eventually, someone just had to do it. I'm super excited to actually be doing it and bringing great PBT to every and any language.
It doesn't hurt that this is coming right as great PBT in every language is suddenly a lot more important thanks to AI code!
rdevilla [3 hidden]5 mins ago
This is the first time in my HN membership where I was excited to read about the dialectic, only to be disappointed upon finding out the article is about Rust.
PBT is for sure the future - which is apparently now? 10 years ago when I was talking about QuickCheck [0] all the JS and Ruby programmers in my city just looked at me like I had two heads.
TBF PBT has been the present in Python for a while now.
10 years ago might have been a little early (Hypothesis 1.0 came out 11 years ago this coming Thursday), but we had pretty wide adoption by year two and it's only been growing. It's just that the other languages have all lagged behind.
It's by no means universally adopted, but it's not a weird rare thing that nobody has heard of.
hugeBirb [3 hidden]5 mins ago
Not that it matters at this point, but the Hegelian dialectic is not thesis, antithesis, and synthesis. That framing is usually attributed to Hegel, but as I understand it he actually pushed back on this mechanical view of it all, and his views on these transitory states were much more nuanced.
jjgreen [3 hidden]5 mins ago
"Not that it matters ...", What? Of course it matters! I only come to HN for extended arguments on the meaning of the Dialectic.
AndrewKemendo [3 hidden]5 mins ago
I gave you one in a sibling ;)
DRMacIver [3 hidden]5 mins ago
Conversation with Will (Antithesis CEO) a couple months ago, heavily paraphrased:
Will: "Apparently Hegel actually hated the whole Hegelian dialectic and it's falsely attributed to him."
Me: "Oh, hm. But the name is funny and I'm attached to it now. How much of a problem is that?"
Will: "Well someone will definitely complain about it on hacker news."
Me: "That's true. Is that a problem?"
Will: "No, probably not."
(Which is to say: You're entirely right. But we thought the name was funny so we kept it. Sorry for the philosophical inaccuracy)
wwilson [3 hidden]5 mins ago
If I had been wearing my fiendish CEO hat at the time, I might have even said something like: "somebody pointing this out will be a great way to jumpstart discussion in the comments."
One of the evilest tricks in marketing to developers is to ensure your post contains one small inaccuracy so somebody gets nerdsniped... not that I have ever done that.
1-more [3 hidden]5 mins ago
A sort of broadening of Cunningham's Law (the fastest way to get an answer online is not by posting the question, but by posting the wrong answer—very true in my experience). If there's no issue of fact at hand, then you end up getting some engagement about the intentional malapropism/misattribution/mistake/whatever and then the forum rules tend to herd participants back to discussing the matter at hand: your company.
Trump did this a lot with the legacy media in his first term. He would make inaccurate statements to the media on the topic he wanted in the spotlight, and the media would jump to "fact check" him. Guess what, now everyone is talking about illegal immigration, tariffs, or whatever subject Trump thought was to his advantage.
amalcon [3 hidden]5 mins ago
"No such thing as bad publicity" is a very old idea. That quote is usually attributed to PT Barnum, but the idea is much older than him.
dfabulich [3 hidden]5 mins ago
If that's not motivation enough for you to rename it, well, TypeScript already has a static type checker called Hegel. https://hegel.js.org/ (It's a stronger type system than TypeScript.)
DRMacIver [3 hidden]5 mins ago
We looked at it and given that the repo was archived nearly two years ago decided it wasn't a problem.
cmrdporcupine [3 hidden]5 mins ago
I think it's more that Hegel was fine with "dialectics" but that the antithesis/synthesis stuff is not actually what's going on in his dialectic. It's a bit of a popular misconception about the role of negation and "movement" in Hegel.
Also if you're not being complained about on HN, are you even really nerd-ing?
sigbottle [3 hidden]5 mins ago
From what I understand, it's a proof technique (other techniques include Kant's Transcendental Deduction or Descartes's pure doubt) that requires generating new concepts via internal contradiction and showing that each category necessarily leads to the next.
The necessity thing is the big thing - why unfold in this way and not some other way. Because the premises in which you set up your argument can lead to extreme distortions, even if you think you're being "charitable" or whatever. Descartes introduced mind-body dualisms with the method of pure doubt, which at a first glance seemingly is a legitimate angle of attack.
Unfortunately that's about as nuanced as my knowledge gets. Importantly, this rules out a wide swath of "any conflict that ends in a resolution validates Hegel"-style sophistry.
viccis [3 hidden]5 mins ago
>other techniques include Kant's Transcendental Deduction or Descartes's pure doubt
This is not quite accurate. Kant says very explicitly in the (rarely studied) Transcendental Doctrine of Method (Ch 1 Section 4, A789/B817) that this kind of proof method (he calls it "apagogic") is unsuitable to transcendental proofs.
You might be thinking of the much more well studied Antinomies of Pure Reason, in which he uses this kind of proof negatively (which is to say, to circumscribe the limits of reason) as part of his argument against the way the metaphysical arguments of the philosophers of his time (what he called the "dogmatic" use of reason) about the nature of the cosmos were posed.
The method he used in his Deduction is a transcendental argument, which is typically expressed using two things, X and Y. X is problematic (can be true but not necessarily so), and Y is dependent on X. So then if Y is true, then X must necessarily be true as well.
sigbottle [3 hidden]5 mins ago
Sorry, I meant "proof method" as more like "this was this guy's angle of attack", not that they would've thought each other's angles were valid at all or that they're commensurable with, say, 20th century formal proof logic (or Aristotelian logic for example). Descartes and Leibniz were squarely the rationalists that Kant wanted to abolish, and Hegel rejected Kant's distinction between noumena and phenomena entirely, so they're already starting from very different places.
I guess it would be more accurate to state Kant's actual premises here as making the distinction between appearance and thing-in-itself rather than the deduction, but the deduction technique itself was fascinating when I first learned it so that's what I associate most with Kant lol.
I guess I have not thought critically about why we couldn't use a Transcendental argument to support Descartes. I just treated it as a vague category error (to be fair I don't actually know Descartes' philosophy that well, even less than I know Kant's lol). Could be a fun exercise when I have time.
viccis [3 hidden]5 mins ago
>I guess I have not thought critically why we couldn't use a Transcendental argument to support Descartes.
The previous section within the Transcendental Dialectic that focuses on the nature of the soul goes into a refutation of Descartes' statement. Kant basically finds "I think therefore I am" to be a tautology that only works by equivocating on the "I" in each clause: "I think" pretends that the "I" there is an object in the world, which it then compares to the "I" of "I am" as an object in the world. Kant argues that "I think" does not actually demonstrate an "I" that is an object, but rather a redundant qualification of thinking.
I am being a bit imprecise, so here is SEP's summary:
>For in each case, Kant thinks that a feature of self-consciousness (the essentially subjectival, unitary and identical nature of the “I” of apperception) gets transmuted into a metaphysics of a self (as an object) that is ostensibly “known” through reason alone to be substantial, simple, identical, etc. This slide from the “I” of apperception to the constitution of an object (the soul) has received considerable attention in the secondary literature, and has fueled a great deal of attention to the Kantian theory of mind and mental activity.
>The claim that the ‘I’ of apperception yields no object of knowledge (for it is not itself an object, but only the “vehicle” for any representation of objectivity as such) is fundamental to Kant’s critique of rational psychology.
I remember first learning about Hegel when playing Fallout NV. Caesar made it seem so simple.
biggestlou [3 hidden]5 mins ago
This is 100% true and a major pet peeve of mine.
AndrewKemendo [3 hidden]5 mins ago
Eh… it’s always worth keeping in mind the time period and what was going on with the tooling for mathematics and science at the time.
Statistics wasn’t really quite mature enough to be applied to let’s say political economy a.k.a. economics which is what Hegel was working in.
JB Say (1) was the leading mind in statistics at the time but wasn't as popular in political circles. (Notably, Proudhon used Say's work as epistemology, versus Hegel and Marx.)
I’ve been in serious philosophy courses where they take the dialectic literally and treat it as the epistemological source of reasoning, so it's not gone.
This is especially true in how Marx expanded it into dialectical materialism - he got stuck on the process as the right epistemological approach, and Marxists still love the dialectic and its Hegelian roots (Zizek is the biggest one here).
The dialectic eventually fell to robust numerical methods; it is a degenerate version of the sampling Markov process, which is really best in class for epistemological grounding.
I thought the dialectic was just a proof methodology, and that the modern political angles you might hear from, say, a YouTube video essay on Hegel were because of a very careful narrative from some French dude (and I guess Marx with his dialectical materialism). I mean, I agree with many perspectives from 20th century continental philosophy, but it has to be agreed that they refactored Hegel for their own purposes, no?
AndrewKemendo [3 hidden]5 mins ago
Oh the amount of branching and forking and remixing of Hegel is more or less infinite
I think it’s worth again pointing out that Hegel was at the height of contemporary philosophy at the time but he wasn’t a mathematician and this is the key distinction.
Hegel lives in the pre-mathematical economics world, the continental philosophy world of words with Kant etc., and never crossed into the mathematical world. So, to be fair to him, he was working with the limited capabilities and tools that he had.
Again, compare this to the scientific process described by Francis Bacon. There are no remixes of that, just improvements.
Ultimately using the dialectic is trying to use an outdated technology for understanding human behavior
ux266478 [3 hidden]5 mins ago
> The continental philosophy world of words with Kant
Interestingly, a lot of arguments and formulations Kant had were lifted from Leibniz and reframed with a less mathematical flavor. I remember in particular his argument against infinite regress was pretty much pound for pound just reciting some conjecture from Leibniz (without attribution)
sigbottle [3 hidden]5 mins ago
I mean, I don't know about Hegel, but Kant certainly dipped into mathematics. One of the reasons he even wrote the CPR was to unify, in his mind, the rationalists (who had Leibniz) and the empiricists (who had Newton). 20th century analytic philosophy was heavily informed by Kantian distinctions (Logical Positivism uses very similar terminology, and Carnap himself was originally a Neo-Kantian, though funnily enough so was Heidegger). In the 21st century, it seems like philosophy overall has gotten more specialized and grounded, and people have moved away from one unified system of truth and gotten more domain-driven, in both continental and analytic philosophy.
There's no doubt that basically nobody could've predicted 20th century mathematics and physics a priori. Not too familiar with the physics side, but any modern philosopher who doesn't take computability seriously isn't worth their salt, for example. Not too familiar with statistics either, but I believe you that statistics and modern economic theories could disprove, say, Marxism as he envisioned it.
That definitely doesn't mean that all those tools from back then are useless or even just misinformed IMO. I witness plenty of modern people (not you) being philosophically bankrupt when making claims.
AndrewKemendo [3 hidden]5 mins ago
My claim is that genuinely all of those previous analytical forms are absolutely useless if you have the capacity to utilize a more mathematical framework
The problem is that those more mathematically demanding frameworks are inaccessible to the majority of people,
so they don't actually take off. There's no mechanism to translate more rigor into social studies and the social sciences, in large part because humans reject the concept of being measured and experimented with, which is understandable if not optimal.
So, as a result, applications of mathematics trended toward things that were machine-focused and finance-focused rather than human-focused.
So the big transition happened after TV and Internet (really just low cost high reach advertising) became pervasive and social scientists began utilizing statistical methods across consumer and attention action as social science experimentation platforms
Social science moved from the squishy into the precise precisely to give companies a market advantage in capturing market share through manipulating human behavior
Ultimately, that was the wet dream of political philosophers since Ptahhotep.
Hegel is irrelevant in the age of measurement
sigbottle [3 hidden]5 mins ago
Oh interesting. I've basically quotiented out all social science all my life and stuck strictly to STEM, so my stack is a lot of analytic philosophy plus philosophy of science, a lot of pure math and CS (all across the stack), and recently physics because of my job. I try not to comment on social issues (though Continental vibes generally seem more right to me the more I study).
But I've never thought critically (in a long time) about applying it back to social science / political philosophy. Mind discussing more about what you're reading and targeting? I've personally avoided a lot of studies in this area because I didn't think they were actually rigorous but I probably just don't know where to look.
jmalicki [3 hidden]5 mins ago
Microeconometrics tends to be quite rigorous and easy to validate.
They won't hold up to physics levels of rigor, of course - probably a bit more at the medical studies level of rigor.
David Card, Gary Becker, McFadden, etc.
Rigor is also... there's something about letting perfect be the enemy of the good.
If no one will apply math unless you can 100% reliably reproduce controlled experiments in a lab, the only thing left is people just talking about dialectics.
The challenge is how to get as much rigor as possible.
For instance, David Card saw New Jersey increase minimum wage. You generally can't truly conduct large-scale controlled social experiments, but he saw this as interesting.
He looked at the NJ/PA area around Philadelphia as a somewhat unified labor market, but half of it just had its minimum wage increased - which he looked at to study as a "natural" experiment, with PA as the control group and NJ as the experimental group, to investigate what happened to the labor market when the minimum wage increased. Having a major metro area split down the middle allowed for a lot of other concerns to be factored out, since the only difference was what side of the river you happened to be on.
He had lots of other studies looking at things like that, trying to find ways to get controlled-experiment like behavior where one can't necessarily do a true controlled experiment, but trying to get as close as possible, to be as rigorous as is possible.
Is that as ideal as a laboratory experiment? Hell no. But it's way closer than just arguing dialectics.
AndrewKemendo [3 hidden]5 mins ago
Well if you’re interested in the history of it the best start is really just Jeremy Bentham’s consequentialism.
To be clear I don’t believe in consequentialism
He built what was called the felicific calculus (iirc), which would allow you to more or less take measurements of decisions. It was a mess and it obviously doesn't work, but it's kind of the first serious attempt to bring mathematical rigour to political philosophy.
You could argue that the Tao Te Ching does this in the way it's utilized, in the sense that you have a set of things you measure to give you predictive capabilities, but that's closer to mysticism and tarot card reading. Still, it's worth acknowledging the input, as it's the basis for like half the human population.
I have my own perspective on this, which I wrote out in a fairly lengthy document (General Theory of Cohesion) on my website if you wanna go read it. Warning: it's not particularly scrutable if you're not already pretty deep into cybernetics and systems theory.
seamossfet [3 hidden]5 mins ago
Oh my god, the rust developers are writing tests with Hegelian dialectics.
And even if the tests were great and people carefully reviewed them, that's not enough to make sure things don't go terribly wrong. Anthropic's C compiler experiment didn't fail because of bad testing. Not only were the tests good, it took humans years to write the tests by hand, and the agents still failed to converge.
I think good tests are a necessary condition for AI not generating terrible software, but we're clearly not yet at a point where they're a sufficient one. So "a huge part" - possibly, but there are other huge parts still missing.
I think there is some hazard in assuming that what agents fail at today they will continue to fail on in the future.
What I mean is, if we take the optimistic view that agents will continue to improve on their current trajectory for one or two years, then it is worthwhile to consider what tools and infrastructure we will need for them. Companies that start building that now, for the future they assume is coming, are going to be better positioned than people who wake up to a new reality in two years.
I think there is some hazard in assuming a seemingly exponential curve has no asymptotes, otherwise known as faith.
I'm just pointing out "we don't need this right now" isn't necessarily an argument against "we don't need this".
There is a saying that isn't perfect but may apply: better to have it and not need it than to need it and not have it.
Here is another way of looking at it. Let's say agents don't meet the hyped up expectations and we build all of this robust tooling for nothing. So we have all of this work towards creating autonomous testing systems but we don't have the autonomous agents. That still seems like a decent outcome.
When we plan around optimistic views of the future, we tend to build generally useful things.
Given the Curry-Howard isomorphism, couldn't we ask AI to directly prove the property of the binary executable, under the assumptions of the hardware model, instead of running PBTs?
By no means do I want to dismiss PBTs, but it seems that this could be both faster and more reliable.
For example, if memory use isn't modelled, it won't tell you how big the input can be before the system runs out of memory. Similarly, if your database isn't modelled then you need to test with a real database. Web apps need to test with a real web browser sometimes, rather than a simplified model of one. Databases and web browsers are too complicated to build a full-fidelity mathematical model for.
When testing with real systems there's often the issue that the user's system is different from the one you use to test. You can test with recent versions of Chrome and Firefox, etc, which helps a lot, but what about extensions?
Nothing covers everything, but property tests and fuzzers actually run the code in some test environment. That's going to find different issues than proofs will.
I disagree - thanks to the Curry-Howard isomorphism, the full-fidelity mathematical model of a database or web browser is its binary itself.
We could have compilers provide theorems (with proof) of correctness of the translation from source to machine code, and library functions could provide useful theorems about the resource use.
Then, if the AI can reason about the behavior of the source code, it can also build the required proof of correctness along with it.
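For anyone unfamiliar with the correspondence being invoked here, a one-line Lean sketch of the idea: under Curry-Howard, writing a (total) program with a given type is the same act as proving the corresponding proposition.

```lean
-- A program of this type is, literally, a proof that
-- from A and (A → B) we may conclude B (modus ponens).
def modusPonens {A B : Prop} (ha : A) (hab : A → B) : B :=
  hab ha
```

Scaling this act from one-liners to the observable behavior of a full binary is, of course, the hard part.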
Yes, in principle. Given unlimited time and a plentiful supply of unicorns.
Otherwise, no. It is well beyond the state of the art in formal proofs for the general case, and it doesn't become possible just because we "ask AI".
And unless you provide a formal specification of the entire set of behavior, it's still not much better than PBT -- the program is still free to do whatever the heck it wants that doesn't violate the properties formally specified.
Definitely. It's a lot harder to fake this with PBT than with example-based testing, but you can still write bad property-based tests and agents are pretty good at doing so.
I have generally found that agents with property-based tests are much better at not lying to themselves about it than agents with just example-based testing, but I still spend a lot of time yelling at Claude.
> So "a huge part" - possibly, but there are other huge parts still missing.
No argument here. We're not claiming to solve agentic coding. We're just testing people doing testing things, and we think that good testing tools are extra important in an agentic world.
Yeah, I know. Just an opportunity to talk about some of the delusions we're hearing from the "CEO class". Keep up the good work!
I've also observed this kind of cheating increase. I recently tried to do a specific optimization on a big, complex function. I wrote a PBT checking that the original function returns the same values as the optimized function on all inputs, and I also tracked the runtime to confirm that performance improved. Then I let Claude loose. The PBT was great at spotting edge cases, but eventually Claude always started cheating: it modified the test, it modified the original function, it implemented other (easier) optimizations, ...
My guess is that you wouldn't have had a better time without PBT here and it would still have either cheated or claimed victory incorrectly, but definitely agreed that PBT can't fully fix the problem, especially if it's PBT that the agent is allowed to modify. I've still anecdotally found that the results are better than without it because even if agents will often cheat when problems are pointed out, they'll definitely cheat if problems aren't pointed out.
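The setup described above can be sketched in a few lines of plain Python (toy functions and a hand-rolled loop for illustration; a real harness would use proptest or Hegel and get shrinking for free):

```python
import random

def original(xs):
    # Reference implementation: sum of squares via an explicit loop.
    total = 0
    for x in xs:
        total += x * x
    return total

def optimized(xs):
    # Candidate "optimized" rewrite whose behavior must not change.
    return sum(x * x for x in xs)

def check_equivalence(trials=200, seed=0):
    # The property: for all inputs, both versions agree.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        assert original(xs) == optimized(xs), f"mismatch on {xs!r}"
    return trials
```

One practical guard against the cheating described above is to keep the property test and the reference implementation in files the agent isn't allowed to edit.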
That angle is legibility. How do you know your AI-written slop software is doing the right thing? One would normally read all the code. Bad news: that's not much less labor-intensive than not using AI at all.
But, if one has comprehensive property-based tests, they can instead read only the property-based tests to convince themselves the software is doing the right thing.
By analogy: one doesn't need to see the machine-checked proof to know the claim is correct. One only needs to check the theorem statement is saying the right thing.
My point isn't so much about PBT, but about how we don't yet know just how much agents help write real software (and how to get the most help from them).
[1]: I'm only using that number because Garry Tan, CEO of YC, claimed to generate 10K lines of text per day that he believes to be working code, and developers working with AI agents know they can't be.
* Hypothesis/Hegel are very much focused on using test assertions rather than a single property that can be true or false. This naturally drives a style that is much more like "normal" testing, but also has the advantage that you can distinguish between different types of failing test. We don't go too hard on this, but both Hegel and Hypothesis will report multiple distinct failures if your test can fail in multiple ways.
* Hegelothesis's data generation, and how it interacts with testing, is much more flexible and basically fully imperative. You can generate whatever data you like, wherever in your test you like, freely interleaving data generation and test execution.
* QuickCheck is very much type-first, with explicit generators as an afterthought. I think this is mostly a mistake even in Haskell, but in languages where "just wrap your thing in a newtype and define a custom implementation for it" will get you a "did you just tell me to go fuck myself?" response, it's a nonstarter. Hegel is generator-first: you can get the default generator for a type if you want, but it's mostly a convenience function, with the assumption that you're going to want a real generator specification at some point soon.
From an implementation point of view, and what enables the big conveniences, Hypothesis has a uniform underlying representation of test cases and does all its operations on them. This means you get:
* Test caching (if you rerun a failing test, it will immediately fail in the same way with the previously shrunk example)
* Validity guarantees on shrinking (your shrunk test case will always be one your generators could have produced; it's a huge footgun in QuickCheck that you can shrink to an invalid test case)
* Automatically improving the quality of your generators, never having to write your own shrinkers, and a whole bunch of other quality of life improvements that the universal representation lets us implement once and users don't have to care about.
The validity thing in particular is a huge pain point for a lot of users of PBT, and is what drove a lot of the core Hypothesis model to make sure that this problem could never happen.
The test caching is because I personally hated rerunning tests and not knowing whether it was just a coincidence that they were passing this time or that the test case had changed.
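As a toy illustration of that uniform representation (not Hypothesis's actual internals): if every generator draws its primitive choices through a recorder, the resulting choice log *is* the test case, and replay, caching, and valid shrinking all become operations on that log.

```python
import random

class ChoiceRecorder:
    """Routes every primitive choice through one place, logging it so a
    whole test case can be replayed (or shrunk) as a list of integers."""

    def __init__(self, seed=0, prerecorded=None):
        self.rng = random.Random(seed)
        self.prerecorded = list(prerecorded) if prerecorded else None
        self.log = []

    def draw_int(self, lo, hi):
        if self.prerecorded:
            # Replay mode: consume the recorded choice, clamped into
            # range so a shrunk/edited log still yields a *valid* value.
            n = max(lo, min(hi, self.prerecorded.pop(0)))
        else:
            n = self.rng.randint(lo, hi)
        self.log.append(n)
        return n

def gen_list(rec):
    # A generator built on the recorder: one length choice, then elements.
    return [rec.draw_int(0, 100) for _ in range(rec.draw_int(0, 5))]
```

Replaying a saved log reproduces the exact same value (the test-caching point), and replacing the log with smaller integers shrinks it while the clamping keeps every draw in range (the validity point).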
My personal hope is that we can port most of the Hypothesis test suite to hegel-rust, then point Claude at all the relevant code and tell it to write us a hegel-core in rust with that as its test harness. Liam thinks this isn't going to work, I think it's like... 90% likely to get us close enough to working that we can carry it over the finish line. It's not a small project though. There are a lot of fiddly bits in Hypothesis, and the last time I tried to get Claude to port it to Rust the result was better than I expected but still not good enough to use.
But it's still a "blind" fuzzer and it would be nice to write one that gets feedback from code coverage somehow. Instead, you have to run code coverage yourself and figure out how to change test data generation to improve it.
[1] https://jsr.io/@skybrian/repeat-test
There have been simplistic attempts at this, e.g. instead of performing 100 tests, just keep going as long as coverage increases.
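That simplistic version fits in a few lines of stdlib Python; here is a sketch using `sys.settrace` as a crude line-coverage probe, with a made-up `classify` function as the target:

```python
import random
import sys

def run_with_coverage(fn, arg):
    # Record which lines of fn execute during fn(arg), via sys.settrace.
    lines = set()
    def tracer(frame, event, _arg):
        if event == "line" and frame.f_code is fn.__code__:
            lines.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        fn(arg)
    finally:
        sys.settrace(None)
    return lines

def classify(x):
    if x < 0:
        return "neg"
    if x == 0:
        return "zero"
    if x > 10:
        return "big"
    return "small"

def fuzz(stall_limit=200, seed=1):
    # Keep generating inputs as long as coverage keeps growing; stop
    # after stall_limit inputs in a row that find nothing new.
    rng = random.Random(seed)
    covered, stall = set(), 0
    while stall < stall_limit:
        x = rng.randint(-5, 20)
        new = run_with_coverage(classify, x) - covered
        if new:
            covered |= new
            stall = 0
        else:
            stall += 1
    return covered
```

A real implementation would also check properties on each run and use the coverage signal to mutate interesting inputs, but the "run until coverage stalls" stopping rule is the whole trick here.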
The Choice Gradient Sampling algorithm from https://arxiv.org/pdf/2203.00652 feels like a nice way to steer generators in a more nuanced way. That paper uses it to avoid discards when rejection-sampling; but I have a feeling it could be repurposed to "reward" based on new coverage instead/as-well.
I think right now if you're a happy proptest user it's probably not clear that you should switch to Hegel. I'd love to hear about people trying, but I can't hand on my heart say that it's clearly the correct thing for you to do given its early state, even though I believe it will eventually be.
But roughly the things that I think are clearly better about the Hegel approach and why it might be worth trying Hegel if you're starting greenfield are:
* Much better generator language than proptest (I really dislike proptest's choices here. This is partly personal aesthetic preferences, but I do think the explicitly constructed generators work better as an approach and I think this has been borne out in Hypothesis). Hegel has a lot of flexible tooling for generating the data you want.
* Hegel gets you great shrinking out of the box which always respects the validity requirements of your data. If you've written a generator to always ensure something is true, that should also be true of your shrunk data. This is... only kind of true in proptest at best. It's not got quite as many footguns in this space as original QuickCheck and its purely type-based shrinking, but you will often end up having to make a choice between shrinking that produces good results and shrinking that you're sure will give you valid data.
* Hegel's test replay is much better than seed saving. If you have a failing test and you rerun it, it will almost immediately fail again in exactly the same way. With approaches that don't use the Hypothesis model, the best you can hope for is to save a random seed, then rerun shrinking from that failing example, which is a lot slower.
There are probably a bunch of other quality of life improvements, but these are the things that have stood out to me when I've used proptest, and are in general the big contrast between the Hypothesis model and the more classic QuickCheck-derived ones.
I've talked with lots of people in the PBT world who have always seen something like this as the end goal of the PBT ecosystem. It seemed like a thing that would happen eventually, someone just had to do it. I'm super excited to actually be doing it and bringing great PBT to every and any language.
It doesn't hurt that this is coming right as great PBT in every language is suddenly a lot more important thanks to AI code!
PBT is for sure the future - which is apparently now? 10 years ago when I was talking about QuickCheck [0] all the JS and Ruby programmers in my city just looked at me like I had two heads.
[0] https://github.com/ryandv/chesskell/blob/master/test/Test/Ch...
10 years ago might have been a little early (Hypothesis 1.0 came out 11 years ago this coming Thursday), but we had pretty wide adoption by year two and it's only been growing. It's just that the other languages have all lagged behind.
It's by no means universally adopted, but it's not a weird rare thing that nobody has heard of.
Will: "Apparently Hegel actually hated the whole Hegelian dialectic and it's falsely attributed to him."
Me: "Oh, hm. But the name is funny and I'm attached to it now. How much of a problem is that?"
Will: "Well someone will definitely complain about it on hacker news."
Me: "That's true. Is that a problem?"
Will: "No, probably not."
(Which is to say: You're entirely right. But we thought the name was funny so we kept it. Sorry for the philosophical inaccuracy)
One of the evilest tricks in marketing to developers is to ensure your post contains one small inaccuracy so somebody gets nerdsniped... not that I have ever done that.
https://meta.wikimedia.org/wiki/Cunningham%27s_Law
Trump did this a lot with the legacy media in his first term. He would make inaccurate statements to the media on the topic he wanted to be in the spotlight, and the media would jump to "fact check" him. Guess what, now everyone is talking about illegal immigration, tariffs, or whatever subject Trump thought was to their advantage.
I believe (unless my memory is broken) they get into this a bunch in Ep 15 of my favourite podcast "What's Left Of Philosophy": https://podcasts.apple.com/gb/podcast/15-what-is-dialectics-...
Also if you're not being complained about on HN, are you even really nerd-ing?
The necessity thing is the big thing: why unfold in this way and not some other way? Because the premises with which you set up your argument can lead to extreme distortions, even if you think you're being "charitable" or whatever. Descartes introduced mind-body dualism with the method of pure doubt, which at first glance seems like a legitimate angle of attack.
Unfortunately, that's about as nuanced as my understanding gets. Importantly, this rules out a wide swath of the "any conflict that ends in a resolution validates Hegel" kind of sophistry.
This is not quite accurate. Kant says very explicitly in the (rarely studied) Transcendental Doctrine of Method (Ch 1 Section 4, A789/B817) that this kind of proof method (he calls it "apagogic") is unsuitable to transcendental proofs.
You might be thinking of the much more well-studied Antinomies of Pure Reason, in which he uses this kind of proof negatively (which is to say, to circumscribe the limits of reason) as part of his argument against the way metaphysical claims about the nature of the cosmos were posed by philosophers of his time (what he called the "dogmatic" use of reason).
The method he used in his Deduction is a transcendental argument, which is typically expressed using two things, X and Y. X is problematic (it can be true, but not necessarily so), and Y is dependent on X. So if Y is true, then X must necessarily be true as well.
I guess it would be more accurate to state Kant's actual premises here as the distinction between appearance and thing-in-itself rather than the deduction, but the deduction technique itself was fascinating when I first learned it, so that's what I associate most with Kant lol.
I guess I have not thought critically about why we couldn't use a transcendental argument to support Descartes. I just treated it as a vague category error (to be fair, I don't actually know Descartes' philosophy that well, even less than I know Kant's lol). Could be a fun exercise when I have time.
The previous section within the Transcendental Dialectic, which focuses on the nature of the soul, goes into a refutation of Descartes' statement. Kant basically finds "I think therefore I am" to be a tautology that only works by equivocating on the "I" in each clause: it pretends that the "I" of "I think" is an object in the world, which it then identifies with the "I" of "I am." Kant argues that "I think" does not actually demonstrate an "I" that is an object, but rather a redundant qualification of thinking.
I am being a bit imprecise, so here is SEP's summary:
>For in each case, Kant thinks that a feature of self-consciousness (the essentially subjectival, unitary and identical nature of the “I” of apperception) gets transmuted into a metaphysics of a self (as an object) that is ostensibly “known” through reason alone to be substantial, simple, identical, etc. This slide from the “I” of apperception to the constitution of an object (the soul) has received considerable attention in the secondary literature, and has fueled a great deal of attention to the Kantian theory of mind and mental activity.
>The claim that the ‘I’ of apperception yields no object of knowledge (for it is not itself an object, but only the “vehicle” for any representation of objectivity as such) is fundamental to Kant’s critique of rational psychology.
[1] https://plato.stanford.edu/entries/kant-metaphysics/#SouRatP...
Statistics wasn't really mature enough to be applied to, let's say, political economy, a.k.a. economics, which is what Hegel was working in.
JB Say (1) was the leading mind in statistics at the time but wasn't as popular in political circles (notably, Proudhon used Say's work as epistemology, versus Hegel and Marx).
I've been in serious philosophy courses where they take the dialectic literally, as the epistemological source of reasoning, so it's not gone.
This is especially true in how Marx expanded it into dialectical materialism: he got stuck on the process as the right epistemological approach, and Marxists still love the dialectic and its Hegelian roots (Žižek is the biggest one here).
The dialectic eventually fell due to robust numerical methods; it is a degenerate version of the sampling Markov process, which is really the best in class for epistemological grounding.
Someone posted this here years ago and I always thought it was a good visual: https://observablehq.com/@mikaelau/complete-system-of-philos...
I think it’s worth again pointing out that Hegel was at the height of contemporary philosophy at the time but he wasn’t a mathematician and this is the key distinction.