Thoughts on Generating C

spankalee

I'm not a C programmer - having coded in high-level languages only for the past 20 years - but I've been doing a lot of WASM recently, and eagerly looking forward to the stack switching proposal so I don't have to implement an asincify-type transform for an async/await feature.

If it's true that a C program doesn't have control of the stack, what does that mean for supporting the stack switching in Wastrel? Can you not reify the stack and replace it with another from a suspended async function? Do you need some kind of userland stack for all stacks once you support WASM stack switching?

20k

Static inline functions can sometimes serve as an optimisation barrier to compilers. Its very annoying. I've run into a lot of cases when targeting C as a compilation target where swapping something out into an always-inline function results in worse code generation, because compilers have bugs sadly

There's also the issue in that the following two things don't have the same semantics in C:

    float v = a * b + c;

    static_inline float get_thing(float a, float b) {
        return a*b;
    }

    float v = get_thing(a, b) + c;

This is just a C-ism (floating point contraction) that can make extracting things into always inlined functions still be a big net performance negative. The C spec mandates it sadly!

uintptr_t's don't actually have the same semantics as pointers either. Eg if you write:

    void my_func(strong_type1* a, strong_type2* b);

a =/= b, and we can pull the underlying type out. However, if you write:

    void my_func(some_type_that_has_a_uintptr_t1 ap, some_type_that_has_a_uintptr_t2 bp) {
        float* a = get(ap);
        float* b = get(bp);
    }

a could equal b. Semantically the uintptr_t version doesn't provide any aliasing semantics. Which may or may not be what you want depending on your higher level language semantics, but its worth keeping the distinction in mind because the compiler won't be able to optimise as well

show comments

titzer

I think I may end up coming full circle on Virgil. Circa 2005 Virgil I compiled to C and then with avr-gcc to AVR. I did that because who the heck wants to write an AVR backend? Circa 2009 I wrote a whole new compiler for Virgil III and since then it has JVM, x86, x86-64, wasm, wasm-gc and (incomplete) arm64.

I like compiler backends, but truth be told, I grow weary of compiler backends.

I have considered generating LLVM IR but it's too quirky and unstable. Given the Virgil wasm backend already has a shadow stack, it should now be possible for me to go back to square one and generate C code, but manage roots on the stack for a precise GC.

Hmm....

show comments

whizzter

Having done this for a dozen of experiments/toys I fully agree with most of the post, would be nice if the the addition of must_tail attribute could be reliable across the big 3 compilers, but it's not something that can be relied on (luckily Clang seems to be fairly reliable on Windows these days).

2 additional points,

1: The article mentions DWARF, even without it you can use #line directives to give line-numbers in your generated code (and this goes a very long way when debugging), the other part is local variables and their contents.

For variables one can get a good distance by using a C++ subset(a subset that doesn't affect compile time, so avoid any std:: namespaced includes) instead and f.ex. "root/gc/smart" ptr's,etc (depending on language semantics), since the variables will show up in a debugger when you have your #line directives (so "sane" name mangling of output variables is needed).

2: The real sore point of C as a backend is GC, the best GC's are intertwined with the regular stack-frame so normal stack-walking routines also gives everything needed for accuracte GC (required for any moving GC designs, even if more naive generation collectors are possible without it).

Now if you want accurate somewhat fast portable stack-scanning the most sane way currently is to maintain a shadow-stack, where you pass prev-frame ptrs in calls and the prev-frame ptr is a ptr to the end of a flat array that is pre-pended by a magic ptr and the previous prev-frame ptr (forming a linked list with the cost of a few writes, one extra argument with no cleanup cost).

Sadly, the performant linked shadow-stack will obfuscate all your pointers for debugging since they need to be clumped into one array instead of multiple named variables (and restricts you from on-stack complex objects).

Hopefully, one can use the new C++ reflection support for shadow-stacks without breaking compile times, but that's another story.

show comments

kazinator

Generators don't have to put out portable code. You document what compilers are required for the output and that's something you can change with any given release of your generator. Then the generated code uses whatever works with those compilers. If you use the output with some other compiler, then that's undefined behavior w.r.t. the documentation of the generator; you are on your own. "Whatever works" could be something undocumented that works de facto.

Joker_vD

> And finally, source-level debugging is gnarly. You would like to be able to embed DWARF information corresponding to the code you residualize; I don’t know how to do that when generating C.

I think emitting something like

    #line 12 "source.wasm"

for each line of your source before the generated code for that line does something that GDB recognizes well enough.

show comments

kccqzy

I’ve done something similar during my intern days as well. We had a Haskell-based C AST library that supports the subset of C we generate, and an accompanying pretty printing library for generating C code that has good formatting by default. It really was a reasonable approach for good high-level abstraction power and good optimizations.

jbreckmckye

I wonder what Zig would be like as an ILR. Easy cross compilation, plus, you can compile with runtime checks to help debug your compiler output. Might be fun for a sideproject

sph

Has anyone defined a strict subset of C to be used as target for compilers? Or ideally a more regular and simpler language, as writing a C compiler itself is fraught with pitfalls.

show comments

WalterBright

I've thought of doing that, but it's too much fun writing an optimizer and code generator!

(My experience with "compile to C" is with cfront, the original C++ implementation that compiled to C. The generated code was just terrible to read.)

uecker

What language features would make C better as a target language for compilers?

show comments

rirze

Love how he put a paragraph for someone asking, "why not generate Rust?". Beautiful.

show comments

themafia

> it could be that you end up compiling a function with, like 30 arguments, or 30 return values; I don’t trust a C compiler to reliably shuffle between different stack argument needs at tail calls to or from such a function.

Yet you trust it to generate the frame for this leviathan in the first place. Sometimes C is about writing quality code, apparently, sometimes it's about spending all day trying to outsmart the compiler rather than take advantage of it.

bjourne

Last I checked static inline was merely a hint that compilers need not take. They all do, but by definition it's not a zero cost abstraction.

FpUser

This is weird. As soon as I thought about the subject the relevant article showed up on HN.

I was thinking about how to embed custom high level language into my backend application written in C++. Each individual script would compile to native shared lib loadable on demand so that the performance stays high. For this I was contemplating exactly this approach. Compile this high level custom language with very limited feature set to plain C and then have compiler that comes with Linux finish the job.

show comments

yearolinuxdsktp

Java JIT compilers perform function inlining across virtual function boundaries… this is why JIT’d Java can outperform the same C or C++ code. Couple it with escape analysis to transfer short-lived allocations to be stack-allocated (avoiding GC).

Often times virtual functions are implemented in C to provide an interface (such as filesystem code in the Linux kernel) via function pointers—-just like C++ vtable lookups, these cannot be inlined at compile time.

What I wonder is whether code generated in C can be JIT-optimized by WASM runtimes with similar automatic inlining.

yxhuvud

"static inline", the best way of getting people doing bindings in other languages to dislike your library (macros are just as bad, FWIW).

I really wish someone on the C language/compiler/linker level took a real look at the problem and actually tried to solve it in a way that isn't a pain to deal with for people that integrate with the code.

show comments