Discussion
Amit Limaye
coppsilgold: You mentioned SECCOMP_RET_TRACE, but there is also SECCOMP_RET_TRAP which appears to perform better. There is also KVM. Both of these are options for gVisor: <https://github.com/google/gvisor>
monocasa: There's also SECCOMP_RET_USER_NOTIF, which is typically used by container runtimes for their sandboxing.
foota: Hah, I've been looking into something amusingly similar to track mmap syscalls for a process :)
coppsilgold: SECCOMP_RET_USER_NOTIF seems to involve sending a struct over an fd on each syscall. Do they really use it? Performance ought to suffer. Also, gVisor (aka runsc) is a container runtime as well, and it doesn't gatekeep syscalls but chooses to re-implement them in userland.
jmillikin: This might be a very dumb question, but if the process is being run under KVM to catch `int 0x03` then couldn't you also use KVM to catch `syscall` and execute the original binary as-is? I don't understand what value the instruction rewriting is providing here.
ozgrakkurt: Really informative writing, thank you. How secure does this make a binary? For example, would you be able to run untrusted binary code inside a browser using a method like this? Could websites then just use C++ instead of JavaScript, for example?
im3w1l: What about int 80h?
JSR_FDED: Love the detailed write-up, thanks! This is the kind of foundation that I would feel comfortable running agents on. It's not the whole solution of course ("yes, agent, you're allowed to delete this email but not that one" can't be solved at this level)… let me know when you tackle that next :-)
hparadiz: I've been thinking of making a kernel patch that disables eBPF for certain processes as a privacy tool. Everyone is using eBPF now.
lmz: They already can use C++ if they want to. Emscripten? Jslinux?
xelaboi: You either have a writing style that is uncannily similar to what an LLM generates, or this article was substantially written by an LLM. I don't know what it is about the style, but I just find it a bit exhausting, like an overfit on "engaging writing" that strips away sincerity.
ozgrakkurt: I mean just distributing the regular compiled x86_64 binary and then running it as a normal executable on the client side but just using that syscall shim so it is safe.
renewiltord: It’s clearly LLM written, but the idea was interesting enough that I read it. I suspect, based on the username, that the writer is cleaning up their voice. I think the idea of sharing the raw prompt traces is good. Then I can feed that to an LLM and get the original information prior to expansion.
rep_lodsb: Yes, that seems unnecessary. The overhead of trapping and rewriting each syscall instruction the first time it executes can't be (much) greater than that required for rewriting them all at the start either. Even if you disallow executing anything outside of the .text section, you still need the syscall trap to protect against adversarial code which hides the instruction inside an immediate value:

    foo:
        mov eax, 0xc3050f  ; return a perfectly harmless constant
        ret
        ...
        call foo+1

(This could be detected if the tracing went by control flow instead of linearly from the top, but what if it's called through a function pointer?)
szmarczak: > It can’t detect the interception

What's stopping the process from reading its own memory and seeing that the syscall was patched?
direwolf20: If you think about the fundamentals involved here, what you actually need is for the OS to refuse to implement any syscalls, and not share an address space.A process is already a hermetically sealed sandbox. Running untrusted code in a process is safe. But then the kernel comes along and pokes holes in your sandbox without your permission.On Linux you should be able to turn off the holes by using seccomp.
nonameiguess: The name sounds very likely not an English speaker's. And the one reply here to a top-level comment is extremely obvious. I think it's unfortunate that people who write English poorly feel the need to do it, but I get it at least. The person behind this probably has a real interest and knowledge in the space but feels they can't communicate it without assistance.

It is too bad, though. People bad at English will themselves be reading this forever now and think this is the way real people write, speak, or are supposed to.

It's many things. The relentless enthusiasm about everything. Prefacing any answer to a question with an affirmation that it was a good question first. And yes, sorry, pedants of the web who feel witch-hunted because you knew how to employ keyboard shortcuts and used em-dashes in 2015 and have the receipts to prove it -- you never used 17 in the span of a single page. I think that was the first I can remember ever using, and I had to contrive a way to do it where a semicolon wouldn't clearly work better.
Thaxll: It's pretty much what gVisor does. https://gvisor.dev/
ghoul2: Isn't that exactly what gvisor does?
xuhu: SECCOMP_RET_USER_NOTIF appears to switch between the tracee and tracer processes on each syscall. Using SECCOMP_RET_TRAP to trigger a SIGSYS for every syscall while unpacking the glibc sources with tar introduces about 5% overhead (and avoids a separate tracer). I wonder if there's any mechanism that works for intercepting statically linked ELFs like Go programs and such.
pocksuppet: Why not just use ptrace?
rep_lodsb: Thinking a bit more about it (and reading TFA more carefully), what's the point of rewriting the instructions anyway? I first assumed it was redirecting them to a library in user mode somehow, but actually the syscall is replaced with "int3", which also goes to the kernel. The whole reason the "syscall" instruction was introduced in the first place was that it's faster than the old software-interrupt mechanism, which has to load segment descriptors. So why not simply use KVM to intercept syscall (as well as int 80h), and then emulate its effect directly, instead of replacing the opcode with something else? That should be both faster and less obviously detectable.
jacobgorm: I think the point here is optimizing for the common case; the untrusted code is still running inside a VM, so you can still trap malicious or corner cases using a more heavy-handed method. The blog post does mention "self-healing" of JIT-generated code, for instance. It is possible to restrict the control-flow graph to avoid the case you described; the canonical references here are the CFI and XFI papers by Ulfar Erlingsson et al. In XFI they/we did have a binary rewriter that tried to handle all the corner cases, but I wouldn't recommend going that deep; instead you should just patch the compiler (which, funnily, we couldn't do, because the MSVC source code was kept secret even inside MSFT, and the GCC source code was strictly off-limits due to being GPL-radioactive...)
qbane: There is even a table copy-pasted into a paragraph without noticing.> What’s needed is something different:> Requirement ptrace seccomp eBPF Binary rewrite Low overhead per syscall No (~10-20µs) Yes Yes Yes [...]
jcalvinowens: Yeah, I had the same question. But I'd guess they probably disable IA32 completely.
amitlimaye: The follow-on posts describe where I plan to run the binaries. The idea is to run in a guest with no kernel and everything running at ring 0, which makes sysret a dangerous thing to call; we don't have anything running at ring 3. Also, the syscall instruction clobbers some registers. All in all, between the int3 and syscall paths I counted around 20 extra instructions in my runtime (this is a guess, me trying to figure out what would happen). That is why int3 becomes faster for what I am trying to build. The toolchain approach suffers from the diversity of options you have to support, even ignoring the issues you encountered. It might be easier with LLVM-based toolchains, but there are still too many things to patch, and the moment you tell people "use my build environment" it meets resistance. I am currently aiming for Python, which is easy to do. The JIT work is for when I want to do JavaScript, which I keep pushing out because once I go down that path I have to worry about threading as well. Something I want to chase, but right now I'm trying to get something working.
monocasa: They use a seccomp filter to decide which syscalls get sent to the other process for processing.
amitlimaye: Int 0x80 is a great idea, but int3 is what I landed on when I was looking, and at this point I'm just trying to get something working. The good thing about int 0x80 is that it's a 2-byte instruction, I believe, rather than the int3 + nop pair I am emitting right now.
amitlimaye: AMA: I am the author of that blog. I have some working code, just not something I want to share right away. Right now I am chasing density, but yes, security is something I will get to eventually; the issue is what to implement first :). This is the first of a series of blog posts I am writing; you can check my Substack. The next step is to show a density/launch-speed demo, hopefully by the middle of next week.
amitlimaye: Yes, that is the goal, though C++ is not something I am targeting in the short term. The idea is to be able to run untrusted binaries in a VM with no kernel. That saves memory, makes for faster loads, and the binary cannot escape the VM, so it can never compromise your host.
amitlimaye: ptrace costs at least 2 context switches per syscall, which makes it pretty slow.
amitlimaye: Actually, you are right, nothing is stopping it from reading its own memory, but that does not help it escape. That matters if you are worried about something adversarial that tries to detect it's in a sandbox, but that is not what we are trying to protect against. The idea is to follow the same model as a container, with something that is more secure and has less surface area to protect or attack.
notepad0x90: I think it's better to just adapt to this. A lot of people write the content their own way and get AI to rewrite it so that it is more readable and free from errors. Content over appearance and all. I think the problem is you consider this auto-completion tool insincere. Many do as well, because they anthropomorphize LLMs; it feels like a different sentient entity wrote it than the person posting it. But in reality that isn't the case; it's more like a spellchecker that helped the person communicate their idea.

The purpose of language is to communicate meaning and intent, not to sound or feel a particular way, unless you're reading for entertainment or enjoyment.

This is the second post I'm commenting on within a span of like 30 minutes where someone did some really good work and shared it, but the top comments are complaining about AI usage. Either LLM-assisted content needs to be banned entirely (it might be), or complaining about it should be considered a breach of etiquette at sites like HN that are tech-centric.
xelaboi: Appearance and style is content, and it always was. The way you write is fundamentally a part of how a reader interprets meaning and intent. Calling it a spellchecker is simply wrong if you give an LLM some bullet points and then instruct it to write an article. I find it more insincere because it's an extra layer between the author and the reader which substantially affects every aspect of the piece of writing, not just the spelling of individual words or Microsoft Word nagging you to avoid passive voice.

If OP is not a native English speaker and is using an LLM to create a reasonable prose, then it might be the best way for them to try and communicate their ideas. It's probably better than Google translate. It affects how the reader interprets the writing, though.

My other point, which I also stand by, is that I find the default writing style of current LLMs exhausting to read. It feels like a college student has submitted an assignment on engaging writing and decided to use every technique they could find in their textbook, because they want to get top marks. It just feels forced to me.

--------------------------------

As an example, I asked Claude to make my argument more "clear". See how it wrote it:

Style isn't separate from content — it is content. The way something is written shapes how a reader interprets its meaning, and that's always been true. Calling an LLM a "spellchecker" only holds if it's catching typos. The moment you hand it bullet points and ask it to produce an article, it's not correcting your writing — it's replacing it. That's a fundamentally different thing.

I'll grant one exception: if someone isn't a fluent English speaker and uses an LLM to bridge that gap, that's a legitimate trade-off, even if it still changes how the reader experiences the piece.

But my broader complaint stands independent of that debate: current LLMs produce a recognizable, exhausting prose style. Every sentence is engineered to be "engaging."
Every paragraph hits the expected beats. It reads like someone who learned to write from a listicle about writing — technically compliant, but hollow. The effort to sound compelling ends up undercutting any sense that a real person with a real perspective is behind it.
foota: Yeah, this wasn't something like "I want to debug a program"; rather, I wanted to be able to track mmapping for later cleanup. Fortunately libc doesn't mmap that much internally, so I think I can get away alright with interposing libc's mmap call.
notepad0x90: > If OP is not a native English speaker and is using an LLM to create a reasonable prose, then it might be the best way for them to try and communicate their ideas. It's probably better than Google translate. It affects how the reader interprets the writing, though.

That's just crazy. Do you think people don't get discriminated against because of that? They'll probably get flagged and blacklisted from HN just for sharing a post riddled with grammar mistakes; it will look like spam to many. If they get lucky, the top comments would be correcting their grammar mistakes, not discussing the content.

If you didn't talk to me before today, you don't know how I talk. You don't know what sincere is like. The term you're looking for is authentic, not sincere. Questioning the sincerity of the OP is just wrong. You don't like people having control over how what they have to say is conveyed to others, because you have some irrational bias against the usage of a particular tool.

You argue, and even use AI (you don't mind being insincere? I'd like to get your own original arguments, how about that?), to dismiss content because of style, thereby justifying the need for people to be careful about the style of the post they share. Have you considered that had they not used AI, you or others would be dismissing their post for other style-related reasons? Because you care about style so much.

But you're right, style is content; it was wrong of me to claim otherwise. What I meant was probably "meaning". The writing style affects how you read the content; in this case you don't like how it forces you to read it, but the meaning OP is trying to communicate (what I meant by "content") is being glossed over.

The takeaway for me from this discussion is that people need to use better prompts and better models, not that they shouldn't use an LLM, because even when their grammar and spelling are wrong, they get nitpicked in this way.

> The effort to sound compelling ends up undercutting any sense that a real person with a real perspective is behind it.

That's a fault and a bias of the reader, in my opinion. I didn't even think it was LLM-written; I wasn't looking for it (we tend to find what we're looking for?). My focus was on what was done, validating the claims made, and analyzing the implications. I didn't care how they sounded, because I was able to actually read the content and understand what they were saying. If it were the other way around, and I was the OP, I would want people to focus on what I was saying, and appreciate that I took some action to ensure my post is readable.

I think they can use better prompts to make it sound and feel better, but it's a real shame that they have to. It is this sort of interaction that makes me wish we had more LLMs making decisions instead of humans out there. Things like accents, writing styles, even last names and spelling mistakes decide the fate of many today. The real value people bring, the real human potential, is dismissed (not in this case, just making a general observation); cosmetic and performative factors override all else.

> it's not correcting your writing — it's replacing it. That's a fundamentally different thing.

It is my writing, in that I agreed the meaning of the rewritten content is what I intended to communicate. People get to have agency over how their meaning is conveyed. You don't have any say over that. Your criticism of how it feels, although I disagree, is legitimate; but your criticism based solely on the fact that AI rewrote the content is entirely invalid.

Let's imagine OP had a human copywriter editing and rewriting the entire content. Would that change anything? If not, why are we talking about LLMs instead of the specifics of what bothered you uniquely, so that people reading this thread can use better prompts to avoid those annoying pitfalls?

I didn't even pick up on this being AI-rewritten; I'm only taking yours and others' word for it. My biggest concern these days is that kids are growing up interacting with LLMs a lot, and their original work will be dismissed by older people because it sounds like an LLM. There are many cases of students having their work and exams dismissed, even facing disciplinary actions leading up to lawsuits, where teachers/academics wrongly claimed it was LLM-generated content (and why I keep feeling that perhaps LLMs should replace those biased academics and teachers if possible).

LLM usage isn't going away. Perhaps prompts and models will improve, but more likely than not, it is more economical and practical for humans to be forced to adapt, one way or the other, to regular LLM usage by other humans. If you skip through books or news stories in 50-year increments, you'll see how the writing style and "feel" is very different. There is a very distinctive "feel" to how people on HN write, compared to reddit, gaming Discord servers, twitter, bluesky, or the comments section of some conservative site. You'll see some groups use terms like "bro" and "bruh" a lot, others end everything with "lol", others yet include emoji in everything. All of this would feel very weird and inappropriate to someone from the 1800s. I am not saying all that to dismiss your observations, but to say that this stuff isn't all that important.

If you didn't think the cause of the annoying writing style was an LLM, I doubt you would have commented on it, so my suggestion is: don't comment about it at all. There was no writing-style offense so egregious that we need to talk about it instead of the actual work OP is sharing.