Discussion
jeffbee: This readme, this header, do not seem to discuss the tradeoff in any way, which is that you're paying by the same factor in median latency to buy lower tail latency. Nobody thinks of a load as taking 800 cycles, but that is the baseline load latency here.

Also, having sacrificed my own mental health to watch the disgustingly self-promoting hour-long video that announces this small git commit, I can confidently say that "Graviton doesn't have any performance counters" is one of the wrongest things I've heard in a long time.

Overall, I give it an F.
PunchyHamster: The video was about how rowhammer works; the lib was a byproduct.
lauriewired: Nope, there isn’t a tradeoff; median latency isn’t affected. I don’t think you understand the code. The p50 is identical between a single read and the hedged strategy.

The clflush is there because the technique targets data that will miss the cache anyway. If your working set fits in L1, you don’t need this.

Also, AWS Graviton instances absolutely do not expose per-channel memory controller PMU counters. That’s why you have to use timing-based channel discovery.

The IBM z-system is neat! But my technique works on commodity hardware in userspace, and you can easily sacrifice only half the space if you accept 2-way instead of 8+-way hedging. It’s entirely up to you how many channel copies you want to use.

Your reply was quite rude, but I hope this is informative.
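The p50/p99 claim above follows from basic order statistics: if most reads are fast and only a small fraction are slow, taking the minimum of two independent reads leaves the median untouched while squaring the tail probability. A toy simulation of that effect (the latencies and tail probability here are made up for illustration; this is not the library's benchmark code):

```python
import random

random.seed(0)
N = 100_000
FAST, SLOW = 100, 1000   # hypothetical latencies, arbitrary units
P_SLOW = 0.05            # hypothetical probability of a slow (tail) read

def read_once():
    """One simulated memory read: usually fast, occasionally slow."""
    return SLOW if random.random() < P_SLOW else FAST

# Single read vs. 2-way hedge (issue two reads, take the first to finish).
single = sorted(read_once() for _ in range(N))
hedged = sorted(min(read_once(), read_once()) for _ in range(N))

def pct(samples, q):
    """Empirical quantile of a sorted sample list."""
    return samples[int(q * len(samples))]

print("p50 single/hedged:", pct(single, 0.50), pct(hedged, 0.50))
print("p99 single/hedged:", pct(single, 0.99), pct(hedged, 0.99))
```

With a 5% tail, 2-way hedging drops the slow fraction to roughly 0.25%, so p99 falls from the slow latency to the fast one while p50 stays the same — which is the shape of the claim being made, independent of the actual hardware numbers.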
hedgehog: I was just trying to reconcile his reply with the charts. Have you tested how this scales down for smaller systems, as one might find on the management side of a network switch?
jagged-chisel: Oh: Tail Slayer. Not Tails Layer. My brain took longer to parse that than I’d have wanted.
jeffbee: I won't be tone-policed by a person who is clearly trying to mislead and confuse people. I leave it to the other HNers to read your benchmark code and see for themselves that it is an exercise in absurdity, a work-around for its own library that doesn't measure anything other than the fact that, with N threads, by the laws of probability, this technique of reading timestamps as fast as possible and cramming them into a vector yields lower measurements with higher N.
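The statistical effect being alleged here is real in the abstract, whether or not it applies to the benchmark in question: a minimum taken over more samples can only shrink. A purely synthetic illustration (numbers invented for the example, unrelated to the actual benchmark code):

```python
import random

random.seed(1)

# One shared pool of synthetic "measured intervals" (arbitrary units),
# as if many threads were recording timestamp deltas as fast as possible.
pool = [random.gauss(100, 10) for _ in range(16_000)]

# The minimum over a superset is never larger than over a subset, so a
# "best observed latency" reported across N threads shrinks as N grows,
# regardless of what is actually being measured.
for n_threads in (1, 4, 16):
    samples = pool[: n_threads * 1000]
    print(n_threads, "threads -> min interval", round(min(samples), 2))
```

This is why min-of-many-samples is a suspect summary statistic for a multi-threaded latency benchmark: it improves with thread count by construction.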