I’ve been trying out various LLMs for working on assembly code in my toy OS kernel for a few months now. It’s mostly low-level device setup and bootstrap code, and I’ve found they’re pretty terrible at it generally. They’ll often generate code that won’t quite assemble, they’ll hallucinate details like hardware registers, and very often they’ll come up with inefficient code. The LLM attempt at an AP bootstrap (real mode to long mode) was almost comical.
All that said, I’ve recently started a RISC-V port, and I’ve found they’re actually quite good at porting bits of low-level init code from x86 (NASM) to RISC-V (GAS) - I guess because it’s largely a simple translation job and they already have the logic to work from.
userbinator
I wonder how many demoscene productions it was trained on. Probably not many, because stuff like this sticks out like a sore thumb:
Might be interesting to try this in ARM assembly where it's a lot less likely to be existing code in the training set.
broken_broken_
The x64 assembly would probably work natively on the Mac, no need for Docker, provided the two syscall numbers (write and exit) are adjusted, which LLMs can likely do.
If it’s an ARM Mac, under Rosetta. Otherwise directly.
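For reference, a minimal sketch of the adjustment being described, done here through libc's syscall() wrapper from Python rather than in assembly (Linux x86-64 assumed; the constants are the actual ABI numbers, the rest is illustrative):

```python
# Hedged sketch (Linux x86-64 assumed): invoking write(2) by raw syscall
# number via libc's syscall() wrapper. macOS x86-64 uses the BSD numbers
# plus a 0x2000000 "class" offset, so swapping these two constants is the
# whole adjustment the port needs.
import ctypes

LINUX_WRITE, LINUX_EXIT = 1, 60                  # Linux x86-64 syscall numbers
MACOS_WRITE, MACOS_EXIT = 0x2000004, 0x2000001   # 0x2000000 + BSD write(4)/exit(1)

libc = ctypes.CDLL(None, use_errno=True)
msg = b"hello, syscall\n"
written = libc.syscall(LINUX_WRITE, 1, msg, len(msg))  # write(1, msg, len)
assert written == len(msg)
```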
revskill
LLMs are useless in a real-world codebase. Tons of hallucination and nonsense. Garbage everywhere. The dangerous thing is that they mess things up randomly, with no consistency at all.
It is fine to treat them as a better autocompletion tool.
worldsayshi
Given the price of Claude Code, I'm surprised that more people don't go the route of using Claude through aider with Copilot or something like that. Is Claude Code the tool worth the extra expense?
piker
I actually expected the struggle to continue based on experience. Though these things can produce some magical results sometimes.
ur-whale
The code seems to be doing its calculations with integers instead of floats.
It can also do a pretty good 3d star field: https://godbolt.org/z/a7v4xnbef
First try worked but didn't use the correct terminal size.
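A sketch of that particular fix, assuming the effect draws plain text to stdout: query the terminal instead of hard-coding 80x24 (`shutil.get_terminal_size()` falls back to the COLUMNS/LINES environment variables, then to 80x24, when stdout isn't a real terminal):

```python
# Size the ASCII output to the actual terminal rather than assuming 80x24.
import shutil

cols, rows = shutil.get_terminal_size()
width, height = cols, rows - 1  # keep one row free for the shell prompt
print(width, height)
```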
Not to be confused with the excellent Mandelbook[0] and related work on the Mandelbrot[1] by Claude Heiland-Allen :)
[0]: https://mathr.co.uk/mandelbrot/book-draft-2017-11-10.pdf
[1]: https://mathr.co.uk/web/mandelbrot.html
Googling "Mandelbrot set in assembly" returns a bunch of examples of this.
OP may want to test this setup here [0]. This is a bit more challenging than replacing a Google query with an LLM pipeline.
[0] https://code.golf/mandelbrot#assembly
If so, why?
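One plausible answer (a guess - I haven't read the generated code): fixed-point arithmetic, a demoscene staple that avoids FPU setup in assembly. A quick sketch of the same escape-time loop in floats and in fixed point; the 16.16 format is my assumption, not taken from the generated code:

```python
# Same Mandelbrot escape-time test in floats and in 16.16 fixed point.
# In fixed point, every multiply is followed by a shift to drop the
# extra fraction bits - cheap integer ops, no floating-point unit needed.
FRAC = 16
ONE = 1 << FRAC  # 1.0 in 16.16 fixed point

def escape_float(cr, ci, max_iter=100):
    zr = zi = 0.0
    for i in range(max_iter):
        if zr * zr + zi * zi > 4.0:
            return i
        zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
    return max_iter

def escape_fixed(cr, ci, max_iter=100):
    cr = int(cr * ONE)  # convert the float inputs to fixed point once
    ci = int(ci * ONE)
    zr = zi = 0
    for i in range(max_iter):
        zr2 = (zr * zr) >> FRAC  # zr^2, rescaled back to 16.16
        zi2 = (zi * zi) >> FRAC
        if zr2 + zi2 > 4 * ONE:
            return i
        zr, zi = zr2 - zi2 + cr, ((2 * zr * zi) >> FRAC) + ci
    return max_iter

# The two loops agree at these sample points.
assert escape_float(0.0, 0.0) == escape_fixed(0.0, 0.0) == 100
assert escape_float(1.0, 1.0) == escape_fixed(1.0, 1.0) == 2
```

Near the set boundary the two can disagree by an iteration or two due to rounding, which is usually invisible in ASCII output.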
DeepSeek actually does this in one go.