Memory Safe Inline Assembly

166 points | 39 comments | 3 days ago
rurban

> If FilPizlonator determined that the inline assembly is not safe, then it'll replace it with a Fil-C panic. That panic will provide diagnostics about why the assembly was rejected.

Most stupid thing I've ever heard. If a safety violation is known at compile time, you error at compile time. You might never hit it in a test, and then the panic fires at the customer's site. They will be pleased.

mananaysiempre

> While reviewing folks' C and C++ code, I've found the following reasons for inline assembly, where 1 is most common:

3a. rdpru (similar issues to cpuid) or rdpmc perhaps surrounded with lfence or cpuid inside the same assembly chunk

For obvious reasons, this is somewhat niche and may not even make it into production code, but it’s also important when you do need it. It’s also memory safe. I guess in such cases you’d use fast C rather than Fil-C though.

4a. rseq

Probably even less feasible than atomics TBH, as such blocks will usually also contain control flow (at least that implied by the nature of rseqs).

> Before the advent of AI, writing a parser for x86_64 assembly would have been such an annoying task that I might have never gotten around to implementing support for memory safe inline assembly [...].

It is annoying, but even before the advent of AI that didn’t stop the developers of TCC for instance.

With that said, given Fil-C is Clang/LLVM-based, shouldn’t an assembly parser, at least, be already available somewhere? I was under the impression that Clang (unlike GCC) actually parsed asm blocks.

anitil

I find it charming that to distinguish Fil-C from the K&R language they use the term 'Yolo-C'. I have never used inline asm before, and I didn't realise how much behaviour it specifies! (When I've needed asm, it was on non-GCC compilers.)

Edit to add: If I'm understanding this correctly, we should be able to run this against projects and detect asm violations. I feel like it would be very valuable to feed these back to maintainers.

jdw64

What I find more frightening than the memory-safe inline assembly itself is that this level of implementation is achievable not with a SOTA model, but with a cost-effective model like KIMI. There was human judgment involved along the way; my reading of the process from the article is as follows:

1. A developer identified the necessity of inline assembly.

2. Defined the safety boundaries for 'memory-safe' inline assembly.

3. Established strict policies for memory access.

4. Curated an allowlist of permissible instructions.

5. Set rigorous test criteria and 'done' conditions.

In short, with the overall guardrails in place, a sub-agent loop was run, and this level of code was produced. This raises a number of interesting points about how we should use AI. I haven't looked at all the code, but the idea of passing assembly through safe zones without memory access, and using that as a foundation to achieve this level of implementation through AI, is quite impressive.

anitil

Do we have a sense for how many of the programs that work [0] are now detected as having asm violations?

[0] https://fil-c.org/programs_that_work

IAmLiterallyAB

I wonder if an adversarial user could bypass the checks and achieve memory corruption / code execution. Maybe not a practical attack in most situations but a fun exercise.

> This includes things like asm volatile("" : : : "memory"), which is an old-school way of saying atomic_signal_fence(memory_order_seq_cst).

Not quite. AIUI, the first is just a barrier for the compiler, while the second is also a CPU memory barrier. Godbolt seems to confirm that.

https://godbolt.org/z/a844zKej8
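A minimal sketch for checking this kind of claim yourself: the constructs under discussion placed side by side, so their generated code can be compared in a compiler explorer. The function names are hypothetical. C11 specifies `atomic_signal_fence` for ordering between a thread and a signal handler running in that same thread, and `atomic_thread_fence` for ordering between threads; which of them emits an instruction depends on the target and compiler.

```c
#include <stdatomic.h>

/* Variant 1: empty asm with a "memory" clobber (GCC/Clang extension). */
int store_with_asm_barrier(int *a, int *b) {
    *a = 1;
    __asm__ volatile("" : : : "memory");
    *b = 2;
    return *a + *b; /* reads after the barrier keep the function observable */
}

/* Variant 2: C11 signal fence. */
int store_with_signal_fence(int *a, int *b) {
    *a = 1;
    atomic_signal_fence(memory_order_seq_cst);
    *b = 2;
    return *a + *b;
}

/* Variant 3: C11 thread fence. */
int store_with_thread_fence(int *a, int *b) {
    *a = 1;
    atomic_thread_fence(memory_order_seq_cst);
    *b = 2;
    return *a + *b;
}
```

Compiling with optimisation enabled and diffing the three function bodies shows which variants differ on a given target.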

torginus

> we can validate if an inline assembly expression is safe by ... Ensuring that the assembly's effects are fully captured by the constraints. For example, if an assembly instruction modifies a register, then the constraints must capture that register mutation...

I mean, I'm not sure whether LLVM parses the assembly (I strongly suspect it does; I remember GCC inline assembly allowing stuff like referencing variables in asm), but shouldn't LLVM be able to figure out that the asm modifies things it's not supposed to?

If your asm clobbers a register the compiler has stored something in, your code certainly won't work right.
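For context, GCC-style extended asm relies on the constraint and clobber lists, not on an analysis of the template string, to tell the compiler what the block touches. A minimal sketch, assuming x86-64 and a GCC or Clang compiler (with a plain C fallback elsewhere); `double_via_scratch` is a hypothetical name. The asm uses `rcx` as a scratch register and modifies the flags, so both are declared as clobbers.

```c
#include <stdint.h>

/* Returns x + x, computed through a scratch register inside the asm. */
uint64_t double_via_scratch(uint64_t x) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t out;
    __asm__("movq %1, %%rcx\n\t"    /* rcx = x */
            "addq %%rcx, %%rcx\n\t" /* rcx += rcx, also changes flags */
            "movq %%rcx, %0"        /* out = rcx */
            : "=r"(out)             /* output operand */
            : "r"(x)                /* input operand */
            : "rcx", "cc");         /* scratch register and flags are clobbered */
    return out;
#else
    return x + x; /* fallback for other targets */
#endif
}
```

Dropping `"rcx"` from the clobber list would leave the compiler free to keep a live value in `rcx` across the block, which is the kind of mismatch the quoted validation step is meant to catch.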

petesergeant

Unrelated to this specific post, but on the subject of Fil-C and Programs That Work[0].

Let's say I compile curl using Fil-C, and later an exploitable memory bug is found in curl. The implication here is that my Fil-C-compiled curl will crash safely, rather than be exploitable? And the "cost" to me is that my curl executable will be slower than the standard one?

0: https://fil-c.org/programs_that_work

dataflow

Unrelated question but since you're here: what's the state of support for Boost?

sureglymop

While we're at it, does anyone else want something like a good LSP but for assembly?

I mean one that infers as much context as possible and tries to help as much as possible.

This has to be assembler specific of course. For example, I use fasm which has higher level macros. An LSP could suggest struct fields and other stuff.

ozgrakkurt

You are not thinking straight if you are making out of bounds errors in inline asm.

Inline asm should take 10x or more effort compared to writing the surrounding C++ code and should be tested with protected pages at the edges if possible. It should always have assertions before/after that check invariants too.
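A minimal sketch of the protected-page idea, assuming a POSIX system with `mmap` and `mprotect`; `alloc_before_guard` is a hypothetical helper. It places a test buffer so that its last byte sits immediately before an inaccessible page, so any read or write past the end faults instead of silently succeeding.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns a buffer of `len` bytes that ends exactly at a PROT_NONE page.
 * The whole mapping is reported through `mapping`/`mapping_len` so the
 * caller can munmap it afterwards. Returns NULL on failure. */
unsigned char *alloc_before_guard(size_t len, void **mapping, size_t *mapping_len) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t ps = (size_t)page;

    size_t data_pages = (len + ps - 1) / ps;
    if (data_pages == 0) data_pages = 1;
    size_t total = (data_pages + 1) * ps; /* data pages plus one guard page */

    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;

    unsigned char *guard = base + data_pages * ps;
    if (mprotect(guard, ps, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }

    *mapping = base;
    *mapping_len = total;
    return guard - len; /* buffer's end coincides with the guard page's start */
}
```

Running the asm routine on such a buffer turns a one-byte overrun into an immediate fault; a mirrored layout with the guard page first would catch underruns.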

Also there are a lot of cases where this won’t work. One example is implementing strlen using avx512 where you want to align the address down to a multiple of 64 and run until the end of the page, so you can do simd while avoiding segfault.
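The address arithmetic behind that strlen trick can be sketched in plain C; the helper names are hypothetical and no vector instructions are involved. Because the page size is a multiple of 64, a 64-byte load from a 64-byte-aligned address never crosses a page boundary, so it cannot fault as long as the page holding the string is mapped. It can, however, read bytes outside the string's object, which is exactly what a bounds check would reject.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 64u /* width of one 512-bit vector in bytes */

/* Rounds an address down to the previous multiple of 64. */
uintptr_t align_down_64(uintptr_t addr) {
    return addr & ~(uintptr_t)(VEC_BYTES - 1);
}

/* Number of bytes from addr up to, but not including, the next page boundary. */
size_t bytes_to_page_end(uintptr_t addr, size_t page_size) {
    /* page_size must be a nonzero power of two for the mask to be valid */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return page_size - (size_t)(addr & (page_size - 1));
}
```

For a string starting at 0x1007, the first aligned load would start at 0x1000 and the seven bytes before the string would have to be masked out of the comparison result.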

Another example is just handling loop remainders with masking in avx512.
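A minimal sketch of how such a remainder mask is typically computed, with one mask bit per byte lane of a 64-byte vector; `tail_mask_64` is a hypothetical name. As I understand the ISA, a masked AVX-512 load does not fault on masked-off lanes, so the instruction's nominal 64-byte operand can extend past the end of the buffer while only the valid bytes are touched. A checker that reasons about operand width alone would see that as out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns a mask with the low `remaining` bits set (0 <= remaining <= 64). */
uint64_t tail_mask_64(size_t remaining) {
    assert(remaining <= 64);
    if (remaining == 64) {
        return UINT64_MAX; /* shifting a 64-bit value by 64 is undefined in C */
    }
    return (UINT64_C(1) << remaining) - 1;
}
```

For a loop over `n` bytes, the final iteration would use `tail_mask_64(n % 64)` as the lane mask.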

Also it is pretty naive to think an LLM got this right.

Overall it seems like a huge waste of time.

If you are writing inline asm and want to make it better, just get as many LLMs or, even better, humans to review it. LLMs are really good at finding mistakes in inline asm, with a high false positive rate though, so you have to understand the concept.

For example, one bug I had was about not consuming the inputs before writing to the outputs. The compiler can assign the same register to an input and an output unless the output is marked with & (the early-clobber modifier). It was super frustrating to debug this until I asked an LLM and it found the problem.
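A minimal sketch of that class of bug, assuming x86-64 and a GCC or Clang compiler (with a plain C fallback elsewhere); `add_twice` is a hypothetical function. The output is written by the first instruction while the second input is still needed, so the output is marked `=&r`. Without the `&`, the compiler may place `out` and `b` in the same register and the result becomes wrong.

```c
#include <stdint.h>

/* Returns a + 2*b. */
uint64_t add_twice(uint64_t a, uint64_t b) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t out;
    __asm__("movq %1, %0\n\t" /* out = a: the output is written here... */
            "addq %2, %0\n\t" /* ...while b is still to be read */
            "addq %2, %0"
            : "=&r"(out)      /* '&': out must not share a register with any input */
            : "r"(a), "r"(b)
            : "cc");          /* addq changes the flags */
    return out;
#else
    return a + 2 * b; /* fallback for other targets */
#endif
}
```

Whether the unmarked version actually misbehaves depends on the register allocator's choices, which is part of what makes this bug hard to reproduce.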