Dear sir, you have built a compiler (2022)

380 points · 224 comments · 7 months ago

quantadev

In software development it's pretty important to know when to build "on top" of something else, and when to start from scratch.

Lots of developers will find it much more interesting, challenging, rewarding and just plain fun to develop something from scratch, even when there are better things that already exist.

They'll cleverly manipulate and convince the boss, against the better judgment of their elder developers, that they can do it, and if they're one of the better developers, the boss won't want to risk losing them, so they'll agree to the escapade.

Then said escapade turns into a shambles, as predicted by the elder devs, and the developer who created the mess simply quits and moves to some other job, in search of more fun and greener pastures. Any developer with decades of experience has probably seen this same pattern multiple times.


pjungwir

I've seen this a lot when someone wants to add "workflow automation" or "scripting" to their app. The most success I've had is embedding either Lua or JavaScript (preferably Lua) with objects/functions from the business domain available to the user's script. This is what games do too. I think it's a great way to dodge most of the work: for free you get flow control, arbitrary boolean expressions, math, etc.
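A minimal sketch of that approach, using Python's own `exec` in place of an embedded Lua or JavaScript engine so the example stays self-contained. The `Order` type, the `order_total`/`notify` functions, and the rule text are hypothetical stand-ins for a business domain. Emptying `__builtins__` only limits which names the script sees; it is not a security sandbox, so this exact shape only suits scripts from trusted users.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical business object exposed (indirectly) to user scripts.
    items: list[float]
    customer_tier: str


def run_user_rule(script: str, order: Order) -> list[str]:
    """Run a user-written rule against an order and return the messages it emitted.

    The script gets flow control, boolean logic and arithmetic from the host
    language for free; we only expose a few domain-level names.
    """
    messages: list[str] = []

    def order_total() -> float:
        return sum(order.items)

    def notify(msg: str) -> None:
        messages.append(msg)

    # Only these names are visible to the script. NOTE: not a sandbox;
    # Python's exec cannot safely run untrusted code.
    env = {
        "__builtins__": {},
        "order_total": order_total,
        "notify": notify,
        "tier": order.customer_tier,
    }
    exec(script, env)
    return messages


rule = """
if tier == "gold" and order_total() > 100:
    notify("free shipping")
elif order_total() > 500:
    notify("manual review")
"""

print(run_user_rule(rule, Order(items=[60.0, 70.0], customer_tier="gold")))  # prints ['free shipping']
```

The host application never parses or evaluates the `if`/`elif`/`and` itself; that is the work the embedded language does for you.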


burnt-resistor

<old-guy-high-school-glory-days-and-nobody-today>

Reminds me of the pain of intentionally building a Java 2 (subset) to MIPS compiler by writing out each AST node class by hand. And I did it twice, once in C++03 with bison and flex and again in Java 2 with CUP and JFlex... Each was developed to build and run as a host portably across Solaris (SPARC), Linux (x86), HP-UX (68k), SGI (MIPS), and Windows (x86), with the compiled targets run on the SPIM emulator. It did have dead code, dead string, and dead variable elimination, but that was as far as my optimization passes went. I recall the only build tool I used for each was the portable subset of make without GNU extensions.

Speaking of reinventing the wheel: in 1998, I built a flexible almost-framework for a "portable" generic installer using Java 2, JWT (native GUI controls), and JNI on Windows to create a program group and desktop shortcut icon. The hilarious part was shipping a full JRE on a CD. It took forever to load, but the extra time seemed impressive for expensive, niche software, in a way similar to the now-fake "loading..." delayed progress bar.

</old-guy-high-school-glory-days-and-nobody-today>


iamthepieman

I get the solution for this and I know what all the terms mean. But I don't understand the problem. Whether it's facetious or hyperbole or whatever, I just don't get who or what circumstances this is addressing.

This is written like a Jeopardy answer. I just don't know what the question is.

Can anyone enlighten me?


DHaldane

It's OK to build a compiler sometimes; it's just very important to make that choice intentionally.


Pedro_Ribeiro

Having recently built 90% of a compiler by mistake, I felt like this post was written specifically about me. Hilarious writing, congrats to the author.


PittleyDunkin

I don't think building compilers is that bad, tbh. It's very difficult to do this without realizing it.

I've written a dozen different programs that might be considered compilers; some very simple, others very complex and whose life continued once I left the organization. Writing a functional compiler that meets the needs of the organization where existing tooling doesn't takes discipline and focus on what you actually want to accomplish. I don't know what "defining a struct inside a loop" might mean, and this strikes me as, very obviously, having no clue what you actually want to build.

Perhaps the issue is not building a compiler but rather the lack of focus to begin with.

bsenftner

Back in the earlier days of AI (not that early, but the late '80s), I was the lead developer for an AI research program being jointly conducted by 3 business professors from MIT, Harvard, and Boston University. We were working on "frame based knowledge representation" - frame-of-reference-based links between nodes containing something: a number, a word, a sentence, or a "function that combines linked nodes into a new frame of reference".

Long story short, we thought we were making a new type of N-dimensional spreadsheet, but after 3 semesters of work one of the advisors at MIT told us we need to meet his colleague, and that guy informed us we had a working compiler for a hybrid of Lisp and C.

pvg

Thread at the time https://news.ycombinator.com/item?id=29891428

swyx

I wrote a similar piece recently: Oops! you built a database https://news.ycombinator.com/item?id=34941650

direct link https://dx.tips/oops-database


praptak

The conclusion is similar to the Greenspun quote

"Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."


teaearlgraycold

I know of someone that did this for a bespoke form definition language to drive onboarding. Tens of thousands of lines, months of delays, and a bus factor of 1 later it was all eventually ripped out and replaced with plain old page templates. When your 10 question onboarding flow has a back-end class named “PredicateEvaluator” something is wrong.

habitue

So, we should build more compilers. The only limiting factor, I think, is that it's damn hard to design a good DSL that wraps your domain well and is neither too flexible (increasing boilerplate) nor too rigid (increasing workarounds and escape hatch usage).

But generically, a compiler is the exact kind of thing you want when you're doing "Take this data structure and transform it into this other data structure". In a traditional compiler, we usually deserialize the first data structure from a string (parsing), call that data structure a CST, validate the data structure (syntax & type checking), do the transform, then serialize the output.

This kind of validate and transform pattern is all over programming though. And it's pretty easy to test with things like property tests. So yeah, we should build little compilers more as abstraction boundaries in our code.
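To make the "validate and transform, then property-test it" shape concrete, here is a minimal sketch under assumed names. The input is a tiny expression tree built from nested tuples (so the parsing step is skipped), `validate` plays the role of type checking, `compile_expr` transforms the tree into instructions for a toy stack machine, and a hand-rolled property test using stdlib `random` (standing in for a library such as Hypothesis) checks that compiled output agrees with a direct interpreter.

```python
import random
from typing import Union

# Hypothetical source data structure: an int, or ("+" | "*", left, right).
Expr = Union[int, tuple]


def validate(e: Expr) -> None:
    """Reject malformed trees before transforming them (the 'type check' step)."""
    if isinstance(e, bool) or not isinstance(e, (int, tuple)):
        raise TypeError(f"bad node: {e!r}")
    if isinstance(e, tuple):
        if len(e) != 3 or e[0] not in ("+", "*"):
            raise ValueError(f"bad operator node: {e!r}")
        validate(e[1])
        validate(e[2])


def compile_expr(e: Expr) -> list[tuple]:
    """Transform the tree into postfix stack-machine instructions."""
    if isinstance(e, int):
        return [("push", e)]
    op, left, right = e
    return compile_expr(left) + compile_expr(right) + [(op, None)]


def run(code: list[tuple]) -> int:
    """Execute the compiled instructions on a simple stack."""
    stack: list[int] = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "+" else a * b)
    return stack.pop()


def interpret(e: Expr) -> int:
    """Reference semantics: evaluate the tree directly."""
    if isinstance(e, int):
        return e
    op, left, right = e
    lhs, rhs = interpret(left), interpret(right)
    return lhs + rhs if op == "+" else lhs * rhs


def random_expr(rng: random.Random, depth: int = 4) -> Expr:
    if depth == 0 or rng.random() < 0.3:
        return rng.randint(-10, 10)
    return (rng.choice("+*"), random_expr(rng, depth - 1), random_expr(rng, depth - 1))


# Property: compiling then running agrees with interpreting, for many random trees.
rng = random.Random(0)
for _ in range(500):
    tree = random_expr(rng)
    validate(tree)
    assert run(compile_expr(tree)) == interpret(tree)
print("property held for 500 random trees")
```

Having a second, obviously-correct implementation (`interpret`) to compare against is what makes this kind of boundary cheap to test.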

neilv

I had this kind of risk in mind when I wrote a server-side "HTML template" feature for Racket.

The template language intentionally only handles static chunks of HTML, escaping of values, and a few safety guards.

Everything else (including the usual template language behavior like iterating over a collection/stream, such as from a database query result) is done with arbitrary normal Racket language, which the template feature's implementation doesn't have to know about nor handle specially.

https://www.neilvandyke.org/racket/html-template/
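A rough Python analog of that design, not the Racket library's actual API: the hypothetical `render` helper knows only two things, passing static chunks (wrapped in a hypothetical `Raw` marker type) through untouched and escaping every other value with the standard library's `html.escape`. Looping over a collection is left entirely to ordinary Python.

```python
from html import escape


class Raw(str):
    """Marks text as trusted HTML: a static template chunk or already-rendered output."""


def render(*parts) -> Raw:
    """The whole 'template language': static Raw chunks plus escaping of values.

    Lists and tuples are flattened so ordinary Python loops can produce pieces.
    """
    out = []
    for part in parts:
        if isinstance(part, Raw):
            out.append(part)
        elif isinstance(part, (list, tuple)):
            out.append(render(*part))
        else:
            out.append(escape(str(part)))  # escapes &, <, > and quotes
    return Raw("".join(out))


def user_list(users: list[dict]) -> Raw:
    # Iteration is plain Python, not template syntax.
    return render(
        Raw("<ul>"),
        [render(Raw("<li>"), user["name"], Raw("</li>")) for user in users],
        Raw("</ul>"),
    )


print(user_list([{"name": "Ada"}, {"name": "<script>"}]))
# prints <ul><li>Ada</li><li>&lt;script&gt;</li></ul>
```

Because the helper never grows loops or conditionals of its own, there is no template grammar to parse and nothing that can turn into a compiler.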

More recently (for employability reasons, or under-resourced startup pragmatics), doing Python with Flask, JavaScript with SvelteKit, and Swift with SwiftUI, I still miss the clean simplicity and available power that I had with Scheme/Racket.

vishnugupta

There’s an insider joke at Uber that if you start out building configuration manager you’ll end up with a full blown version control system.

dgfitz

Man, the Yocto framework could do with a read-over of this.


taeric

I am not clear on why reaching for an existing compiler's AST would ever be top of the list?

Don't get me wrong. I think many language design points should be used more. But starting from scratch makes a ton of sense. Skip the parsing stage and build up supported AST-style constructs of your own.

Done simply, this is basically the command pattern. Keep execution separate from declaration and you should be fine?
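One way to read this suggestion, as a minimal sketch with hypothetical command names: each construct is a small immutable data class (the "declaration"), and a separate `execute` function walks them. Programs are built directly as data, so there is no parser at all, and a second consumer of the same declarations (a validator, a pretty-printer, a dry run) can be added without touching them.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical workflow constructs, declared as plain data.
@dataclass(frozen=True)
class SetVar:
    name: str
    value: int


@dataclass(frozen=True)
class AddTo:
    name: str
    amount: int


@dataclass(frozen=True)
class When:
    name: str
    at_least: int
    then: tuple  # nested commands


Command = Union[SetVar, AddTo, When]


def execute(program: tuple[Command, ...], env: dict[str, int]) -> dict[str, int]:
    """Executor kept separate from the declarations above."""
    for cmd in program:
        if isinstance(cmd, SetVar):
            env[cmd.name] = cmd.value
        elif isinstance(cmd, AddTo):
            env[cmd.name] = env.get(cmd.name, 0) + cmd.amount
        elif isinstance(cmd, When):
            if env.get(cmd.name, 0) >= cmd.at_least:
                execute(cmd.then, env)
        else:
            raise TypeError(f"unknown command: {cmd!r}")
    return env


program = (
    SetVar("points", 90),
    AddTo("points", 15),
    When("points", 100, then=(SetVar("tier", 2),)),
)
print(execute(program, {}))  # prints {'points': 105, 'tier': 2}
```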

Sure, you may want a parser for a dedicated serialization language some day. Hard to think you need to start there?

But starting with the full AST of an existing language feels like a terrible idea. In any world.

tda

So what do you use to know if you need to build it yourself or if there is already something out there? Not being able to find a tool for the problem does not mean it doesn't exist, just that you haven't found it. Especially when you lack the familiarity with the problem to know the correct keywords.

I find ChatGPT to be a great help for exploring the area and finding relevant keywords or the name of the research domain. Sometimes you really need to know exactly what you are looking for before you can find the link to that one super helpful GitHub library that solves your problem. Then of course the next step is figuring out if you want to take on the dependency or not...

I have wasted hours searching for an (analytical) inverse kinematics library for robotic arms. There are tons of slow non-analytical libraries out there, and some horrible ones like IKFast, which is effectively a code generator that spits out C that can be compiled with Python bindings. I eventually did find https://github.com/Jmeyer1292/opw_kinematics, which someone ported to Rust (for which it was easy to create Python bindings).

einpoklum

A point to note here, is that even if you're working on a software system that already _is_ a compiler, you might still find you're building a small, different compiler somewhere else within that project.

https://imgflip.com/i/9be66w

tn1

Many older .NET applications saved programmers from this by providing "C# scripts". The framework includes the compiler and then it's trivial to use the compiled artifact. You can still do it by including the Roslyn libraries. I don't see it as much anymore, or it's some half-baked Python or Lua interface.


casey2

It's presented like a magic fact of life. In reality, people do what they are taught and are quite impotent without knowledge. Most universities have some sort of compiler course, probably using the Dragon Book or a derivative copy, and these students proceed to go out into the real world and write more or less the same implementation they saw in school, with the same mistakes.

Compilers are interesting, but there is literally no proof that they are optimal for any of their popular applications, which is what I think you are trying to imply with this narrative you have constructed of people constantly reinventing compilers. This is just the same propagandist argument lispweenies make to claim that their language is special.

layer8

But can it send email?

ndesaulniers

I'm reminded of the (broken) C parser in the Linux kernel used for modversions. God forbid you just use libclang.

brunospars

Every config parser is a compiler. If platforms (e.g. programming languages) made run-time plugins easier, we wouldn't even have config files.

Imagine a config file with type checking and control flow. You have it: it's your programming language. You just need to load the code at runtime, like Erlang.
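A minimal sketch of the idea in Python rather than Erlang: the "config file" is just a Python module executed at load time with `runpy.run_path`, and its top-level names are checked against a hypothetical `ServerConfig` dataclass. Like any executable config, this runs arbitrary code, so it only suits files you trust.

```python
import runpy
import tempfile
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class ServerConfig:
    # Hypothetical settings the application expects.
    host: str
    port: int
    workers: int


def load_config(path: str) -> ServerConfig:
    """Execute a Python 'config file' and type-check the names it defines.

    The file may use loops, conditionals and arithmetic; only the resulting
    values are checked against the dataclass field types.
    """
    ns = runpy.run_path(path)
    values = {}
    for f in fields(ServerConfig):
        if f.name not in ns:
            raise KeyError(f"missing setting: {f.name}")
        value = ns[f.name]
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name} should be {f.type.__name__}, got {type(value).__name__}")
        values[f.name] = value
    return ServerConfig(**values)


config_source = '''
regions = ["eu", "us"]
host = "127.0.0.1"
port = 8000 + 80
workers = 2 * len(regions) if len(regions) > 1 else 1
'''

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "app_config.py"
    path.write_text(config_source)
    print(load_config(str(path)))  # prints ServerConfig(host='127.0.0.1', port=8080, workers=4)
```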


stmw

I have seen this happen countless times at companies large and small. The article is brilliant in humorously highlighting the denial (usually) or the lack of knowledge (sometimes) that leads engineers down this path again and again.

benrutter

This was a fun read! It has a link at the bottom to "if architects had to work like software engineers" which sounds fun, but the link no longer works, and searching doesn't bring anything up.

Anyone here know where I can find it?


ok123456

Goes along with, "Dear sir, you have built a lisp"---usually from first principles, ad hoc, and with glaring defects.

torginus

I do not understand this rant. If you have even the vaguest pretensions of being an actual software engineer, and your file format isn't brain-dead simple, the way to parse it is tokenize -> grammar-based parser -> AST binding phase. ASTs are simple recursive data structures: if you handle them correctly, it doesn't matter whether they contain 50 or 5000 nodes or how they nest, as long as the code is correct.
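A minimal sketch of that pipeline for a hypothetical boolean filter language: a regex-based tokenizer, a hand-written recursive-descent parser whose methods mirror the grammar rules (one common way to do grammar-based parsing without a parser generator), and an evaluator that walks the resulting AST recursively, so ordinary nesting needs no special cases.

```python
import operator
import re
from dataclasses import dataclass
from typing import Union

# Grammar for a hypothetical filter language (one parser method per rule):
#   expr := conj ("or" conj)*          conj := cmp ("and" cmp)*
#   cmp  := atom (("==" | ">=" | "<") atom)?    atom := NUMBER | STRING | NAME | "(" expr ")"
TOKEN_RE = re.compile(r'\s*(?:(\d+)|"([^"]*)"|(==|>=|<|\(|\))|([A-Za-z_]\w*))')
KINDS = ("num", "str", "op", "name")  # one kind per capturing group in TOKEN_RE
COMPARE = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}


def tokenize(src: str) -> list[tuple[str, str]]:
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected input at {pos}: {src[pos:]!r}")
        tokens.append((KINDS[m.lastindex - 1], m.group(m.lastindex)))
        pos = m.end()
    return tokens


# AST node types: a small recursive data structure.
@dataclass
class Lit:
    value: Union[int, str]


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Lit, Var, BinOp]


class Parser:  # recursive descent over the token list
    def __init__(self, tokens: list[tuple[str, str]]):
        self.tokens, self.i = tokens, 0

    def peek(self) -> tuple:
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self) -> tuple:
        self.i += 1
        return self.tokens[self.i - 1] if self.i <= len(self.tokens) else (None, None)

    def parse(self) -> Node:
        node = self.expr()
        if self.i != len(self.tokens):
            raise SyntaxError(f"trailing input: {self.tokens[self.i:]}")
        return node

    def chain(self, keyword: str, operand) -> Node:
        node = operand()
        while self.peek() == ("name", keyword):  # left-associative "or" / "and"
            self.take()
            node = BinOp(keyword, node, operand())
        return node

    def expr(self) -> Node: return self.chain("or", self.conj)
    def conj(self) -> Node: return self.chain("and", self.cmp)

    def cmp(self) -> Node:
        node = self.atom()
        if self.peek()[0] == "op" and self.peek()[1] in COMPARE:
            node = BinOp(self.take()[1], node, self.atom())
        return node

    def atom(self) -> Node:
        kind, value = self.take()
        if kind in ("num", "str"):
            return Lit(int(value) if kind == "num" else value)
        if kind == "name" and value not in ("and", "or"):
            return Var(value)
        if (kind, value) == ("op", "("):
            node = self.expr()
            if self.take() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def evaluate(node: Node, env: dict) -> object:
    if isinstance(node, Lit):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if node.op == "and":
        return evaluate(node.left, env) and evaluate(node.right, env)
    if node.op == "or":
        return evaluate(node.left, env) or evaluate(node.right, env)
    return COMPARE[node.op](evaluate(node.left, env), evaluate(node.right, env))


ast = Parser(tokenize('age >= 18 and (country == "NL" or vip)')).parse()
print(evaluate(ast, {"age": 30, "country": "DE", "vip": True}))  # prints True
```

Python's recursion limit still caps extremely deep nesting, but for realistic inputs the node count and shape do not change the code.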

SSA is a nice-ish format for representing program code, but it's not the only choice and may or may not be appropriate for your domain. For example, if your language describes data instead of control flow, IMO SSA is a bad choice.

I have done this and if you take care to do things right, you won't need to bother with these hacky corner cases.

ris

Terraform

w10-1

CIS-10: encoding is the essence of the data+algorithms dyad

WORK-n: write an interpreter

WORK-n+1: add indirection

...

akshayshah

I enjoyed the article, but the unintentional Easter egg at the end left me in stitches: the link to “If Architects had to work like Programmers” just 404s, which feels spot on.


fragmede

So at what point does Kubernetes become justified?
