Discussion
Want to Write a Compiler? Just Read These Two Papers.
downbad_: https://news.ycombinator.com/item?id=10786842
itsmemattchung: It's been about 4 years since I took a compilers course (from OMSCS, the graduate program) and I still shudder ... it was, hands down, the most difficult (yet rewarding) class I've taken.
msla: fanf2 on Dec 25, 2015: I quite like "Understanding and Writing Compilers" by Richard Bornat - written in the 1970s using BCPL as the implementation language, so rather old-fashioned, but it gives a friendly, gentle overview of how to do it, without excessive quantities of parsing theory.
msla: https://news.ycombinator.com/item?id=10791521
krtkush: I wonder if it makes sense for an absolute beginner to do the nand2tetris course, since it too includes compiler creation.
msla: It made me drink myself Venetian blind.
rdevilla: Almost 20 years old. This knowledge is obsolete in the AI era.
GCUMstlyHarmls: The Nanopass paper link seems to be dead, but the paper can be found here at least: https://stanleymiracle.github.io/blogs/compiler/docs/extra/n...
kunley: Also: https://www.cambridge.org/core/journals/journal-of-functiona...
jll29: *Donald Knute -> Donald Ervin Knuth is the author of the book "The Art of Computer Programming" (in progress for a couple of decades; volume 4c is currently being written). It is quite advanced, and it will likely not cover compilers anymore (Addison-Wesley had commissioned a compiler book from Knuth when he was a doctoral candidate; now he is retired and has stated that his goal for the series has changed).

I disagree with the author's point: Chapter 2 of the "Dragon book" ("Compilers: Principles, Techniques, and Tools" by Aho et al.) is a self-sufficient introduction to compilers from end to end, and it can be read on its own, ignoring the rest of the excellent book.

Another fantastic intro to compiler writing is Niklaus Wirth's short little book "Compiler Construction", which explains, and contains the surprisingly short source code of, a complete compiler (the whole book is highly understandable - pristine clarity, really), all in under 100 pages (99).

(I learned enough from these two sources to write a compiler in high school.)
Tor3: I like that book too. I bought it many decades ago and learned enough to write a transpiler that converted Fortran source code to Turbo Pascal.
morphle: Compiler writing has progressed a lot, notably in meta compilers [1] written in a few hundred lines of code, adaptive compilation [3], and just-in-time compilers.

[1] OMeta: https://tinlizzie.org/VPRIPapers/tr2007003_ometa.pdf
[2] Other OMeta papers: https://tinlizzie.org/IA/index.php/Papers_from_Viewpoints_Re...
[3] Adaptive compilation: https://youtu.be/CfYnzVxdwZE?t=4575
soegaard: "An Incremental Approach to Compiler Construction", Abdulaziz Ghuloum: http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf

Abstract: Compilers are perceived to be magical artifacts, carefully crafted by the wizards, and unfathomable by the mere mortals. Books on compilers are better described as wizard-talk: written by and for a clique of all-knowing practitioners. Real-life compilers are too complex to serve as an educational tool. And the gap between real-life compilers and the educational toy compilers is too wide. The novice compiler writer stands puzzled facing an impenetrable barrier, "better write an interpreter instead."

The goal of this paper is to break that barrier. We show that building a compiler can be as easy as building an interpreter. The compiler we construct accepts a large subset of the Scheme programming language and produces assembly code for the Intel-x86 architecture, the dominant architecture of personal computing. The development of the compiler is broken into many small incremental steps. Every step yields a fully working compiler for a progressively expanding subset of Scheme. Every compiler step produces real assembly code that can be assembled then executed directly by the hardware. We assume that the reader is familiar with the basic computer architecture: its components and execution model. Detailed knowledge of the Intel-x86 architecture is not required.

The development of the compiler is described in detail in an extended tutorial. Supporting material for the tutorial such as an automated testing facility coupled with a comprehensive test suite are provided with the tutorial. It is our hope that current and future implementors of Scheme find in this paper the motivation for developing high-performance compilers and the means for achieving that goal.
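The first increment the abstract describes - a compiler for a tiny language subset that still emits real assembly - can be sketched roughly like this. This is a Python stand-in with hypothetical names, not the paper's own code (the paper builds its compiler in Scheme); it shows the shape of "step 1", where the whole source language is a single integer literal:

```python
# A minimal sketch of an "incremental compiler" first step: the entire
# source language is one integer literal, compiled to x86 assembly text
# that returns it from a function. Later steps would grow the accepted
# language while keeping the compiler fully working at every step.

def compile_program(expr):
    """Compile an integer literal to x86 assembly (as a string)."""
    if not isinstance(expr, int):
        raise ValueError("step 1 only handles integer literals")
    return (
        "    .globl scheme_entry\n"
        "scheme_entry:\n"
        f"    movl ${expr}, %eax\n"
        "    ret\n"
    )

print(compile_program(42))
```

Each subsequent increment (immediate constants, unary primitives, conditionals, ...) extends `compile_program` while the test suite keeps passing, which is the paper's central discipline.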
KnuthIsGod: Some of us enjoy intellectual challenges... The Bornat book looks great. The fact that it is in BCPL is ultra cool.
rdevilla: Liking intellectual challenges on Hacker News is very 2008. It's 2026; the AI will write a compiler in 5 minutes, no headache required. If you're still playing with Rubik's cubes you're going to get left behind.
guitarlimeo: > you're going to get left behind.

What's wrong with that? Why do you fear getting left behind? This is just fearmongering. Mind you, these are legitimate interests of people and in most cases probably not related to professional work.

And lol @ "It's 2026, the AI will write a compiler in 5 minutes, no headache required." No, it will not - have you seen Anthropic's post about Claude writing the "C compiler"?
wartywhoa23: I highly recommend nand2tetris to everyone. For me, nothing ever explained the whole domain from logic gates and inner workings of a CPU to compilers better than this course.
wartywhoa23: On a side note, why is imrozim's comment dead? What in the world is wrong with it? It's perfectly fine IMO.
omcnoe: Been working on a toy compiler for fun recently. I have ignored all the stuff about parsing theory, parser generators, custom DSLs, formal grammars etc. and instead have just been using the wonderful Megaparsec parser combinator library. I can easily follow the parsing logic, it's unambiguous (only one successful parse is possible, even if it might not be what you intended), it's easy to compose and re-use parser functions (which was particularly helpful for whitespace-sensitive parsing/line-fold handling), and it removes the tedious lexer/parser split you get with traditional parsing approaches.
petcat: And Nystrom's book
vlaaad: Yeah, I really enjoyed Crafting Interpreters, wholeheartedly recommend!
voidUpdate: I've been having a look at the Crenshaw series, and it seems pretty good, but one thing that kinda annoys me is the baked-in line wrapping. Is there a way to unwrap the text so it's not all in a small area on the left of my screen?
armchairhacker: Nowadays I've heard Crafting Interpreters recommended (https://craftinginterpreters.com). The Nanopass paper link doesn't work.
gobdovan: Compilers are broad enough that when someone recommends a "compiler book", it's rarely exactly the slice you wanted. So this made me do a runnable cheat sheet for Crafting Interpreters. I keep the parsing demonstrative, and the AST is a little more Lisp-y than the book's.

Disclaimer: it's meant to convey the essence of what you'll learn; it is NOT by any means a replacement for the book. I'd also describe the book as more of an experience (including some things Nystrom clearly enjoyed, like the visitor pattern) than a compilers manual. If anyone's interested, I can do a separate visitor-pattern cheat sheet too, also in Python.

I turned it into a 'public-facing artifact' from private scripts with an AI agent.

[0] https://ouatu.ro/blog/crafting-interpreters-cheat-sheet/
voidUpdate: I've been handwriting my website since forever. That way I don't have to also load megabytes of JS libraries, and it keeps it nice and lean (ish)
tjarjoura: What did you find more painful about compilers than other forms of programming?
kuboble: I think there are a million ways to make a compilers course. The course I did was organized perfectly, with big parts of the compiler boilerplate already written; I only had to implement the parser/lexer rules and the translation of language structures into assembly instructions. Also, it was a compiler for a language designed just for this course, with the intention of it being specifically easy to write a compiler for, not easy to program in. Without all this I can imagine it being a painful experience.
NordStreamYacht: Also you could go to Andy Keep's site: https://nanopass.org/
77: https://xmonader.github.io/letsbuildacompiler-pretty/
voidUpdate: Perfect!
mrkeen: I'll push back and say that the lexer/parser split is well worth it. And the best thing about the parser combinator approach is that each phase is just a kind of parser, something like:

    type Lexer  = ParsecT e ByteString m [Token]
    type Parser = ParsecT e [Token] Expr

All the usual helper functions like many or sepBy work equally well in the lexing and parsing phases. It really beats getting to the parentheses-interacting-with-ordering-of-division-operations stage and still having to think "have I already trimmed off the whitespace here or not?"
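For readers without Haskell, the combinator idea the last two comments describe can be sketched by hand in a few lines of Python (a toy stand-in for Megaparsec, all names hypothetical): a parser is just a function from (input, position) to a (value, new position) pair or None on failure, and combinators like `many` and `seq` compose small parsers into bigger ones:

```python
# Toy parser combinators: each parser is a function (s, i) -> (value, i') or None.

def one_of(chars):
    """Parser matching a single character from `chars`."""
    def p(s, i):
        if i < len(s) and s[i] in chars:
            return s[i], i + 1
        return None
    return p

def char(c):
    return one_of(c)

def many(p):
    """Apply `p` zero or more times, collecting results (always succeeds)."""
    def q(s, i):
        out = []
        while True:
            r = p(s, i)
            if r is None:
                return out, i
            v, i = r
            out.append(v)
    return q

def seq(*ps):
    """Apply parsers in sequence; fail if any one fails."""
    def q(s, i):
        out = []
        for p in ps:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            out.append(v)
        return out, i
    return q

number = many(one_of("0123456789"))
add = seq(number, char("+"), number)   # parses e.g. "12+3"
print(add("12+3", 0))
```

Real combinator libraries add error messages, backtracking control, and the token-stream parametrization mrkeen shows (the same combinators running over characters in the lexer and over tokens in the parser), but the compositional core is this small.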
stupefy: One nice piece of advice that I received is that books are like RAM: you do not have to go through them sequentially, but can do random access to the parts you need. With this in mind I find it doable to get one of the thick books and only read the part that I need for my task.

But, to be fair, the above random access method does not work when you don't know what you don't know. So I understand why having a light but good introduction to the topic is important, and I believe that's what the author is pointing out.
ac50hz: Sadly, Richard died in January this year. This book is available to download from one of his academic pages:

https://www.eis.mdx.ac.uk/staffpages/r_bornat/#compilerbook
https://www.eis.mdx.ac.uk/staffpages/r_bornat/books/compilin...
https://en.wikipedia.org/wiki/Richard_Bornat
orthoxerox: Crafting Interpreters is great. I wish it had a companion book that covered:

- types and typing
- optimization passes
- object files, executables, libraries and linking

Then the two of them would be sufficient for writing a compiler.
armchairhacker: It seems to me LL and LR parser generators are overrated, and hand-written recursive descent is best in practice. I understand why academics teach them, but not why some spend so long on different parsing techniques, nor why hobbyists who just want to compile their toy language are directed to them.

I work in PL, and from my first compiler to today, I have always found recursive descent easiest, most effective (fewer bugs, better error diagnostics, fast enough), and most intuitive. Many popular language compilers use recursive descent: I know of at least C# (Roslyn) and Rust, but I believe most do, except Haskell (GHC) and OCaml.

The LR algorithm was simple once I learned it, and yacc-like LR (and antlr-like LL) parser generators were straightforward once I learned how to resolve conflicts. But recursive descent (at least to me) is simpler and more straightforward. LR being more expressive than LL has never mattered. A hand-written recursive descent parser is most expressive: it has unlimited lookahead, and it can modify parsed AST nodes (e.g. reordering for precedence, converting if into if-else).

The only solution that comes close is tree-sitter, because it implements GLR, provides helpful conflict messages, and provides basic IDE support (e.g. syntax highlighting) almost for free. But it's a build dependency, while recursive descent parsers can be written in most languages with zero dependencies and minimal boilerplate.
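As a rough illustration of the hand-written style described above (a toy sketch in Python, not taken from any compiler mentioned in the thread): recursive descent typically dedicates one function to each precedence level, and the tree is built directly, with precedence falling out of which function calls which:

```python
# Toy recursive-descent parser for arithmetic expressions.
# Grammar:  expr := term (('+'|'-') term)*
#           term := atom (('*'|'/') atom)*
#           atom := NUMBER | '(' expr ')'
import re

def tokenize(src):
    return re.findall(r"\d+|[()+*\-/]", src)

class Parser:
    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, expected=None):
        t = self.peek()
        if expected is not None and t != expected:
            raise SyntaxError(f"expected {expected!r}, got {t!r}")
        self.pos += 1
        return t

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            node = (op, node, self.term())   # left-associative
        return node

    def term(self):
        node = self.atom()
        while self.peek() in ("*", "/"):
            op = self.eat()
            node = (op, node, self.atom())
        return node

    def atom(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        return int(self.eat())

print(Parser(tokenize("1+2*3")).expr())   # ('+', 1, ('*', 2, 3))
```

Note how "*" binding tighter than "+" needs no precedence table: `expr` simply delegates to `term`, so multiplications are grouped before additions ever see them. Error diagnostics and AST rewriting hook in naturally because it's all ordinary code.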
_old_dude_: Parser generators are great in Python (Lark, for me), so you can iterate fast and get a runnable spec of your grammar. A hand-written recursive descent parser is something you do later, when you start to industrialize your code: to get better error messages, make the parser incremental, etc. Bison/ANTLR are code generators; they do not fit well in that model.