> sp.h is written in C99, and it compiles against any compiler and libc imaginable. It works on Linux, on Windows, on macOS. It works under a WASM host. It works in the browser. It works with MSVC, and MinGW, it works with or without libc, or with weird ones like Cosmopolitan. It works with the big compilers and it works with TCC.
> And, best of all, it does all all of that because it’s small, not because it’s big.
vs
> Non-goals
> Obscure architectures and OSes
> I write code for x86_64 and aarch64. WASM is becoming more important, but is still secondary to native targets. I don’t care to bloat the library to support a tiny fraction of use cases.
> That being said, if you’re interested in using the library on an unsupported platform, I’m more than happy to help, and if we can make the patch reasonable, to merge it.
Those are contradictory. Either the code is extremely portable, or it can't support "obscure" platforms, but not both.
Zig, one of the giants upon whose shoulders this library stands, coined a name for this
almost-but-not-quite-UTF encoding: WTF-8 and WTF-16. These encodings mean, simply, the
same as their UTF counterpart but allowing unpaired surrogates to pass through.
To give credit where credit is due, both WTF-8 and WTF-16 were devised by Simon Sapin [1] and Zig simply picked them up.
Wait, is a compound literal an l-value in that sense (as opposed to, just being able to take its reference)?! Take a look at the C99 standard Oh my, it indeed is (C99 §6.5.2.5 p5). Good to know!
show comments
p4bl0
First, thanks for sharing this link, it was an interesting read! A few remarks below.
I had a hard time reading the wc code in the article. First I had to go to the GitHub to understand that "da" stands for dynamic array, and then understand that what the author calls wc is not at all the wc linux commands, which by default gives you the number of lines, words, and characters in a file, not the count of occurrences of each word in the file, which is what the proposed code does.
Also, since I had to read the GitHub README, another remark: it says that sp_io uses pthreads rather than fork and exec. Both of those approach (but especially pthreads) are contradictory to the explicit goals of programming against lowest level interfaces. I believe the lowest level syscall is clone3 [1], which gives you more fine grained control on what is shared between the parent and child processes, allowing to implement fork or threads.
I agree that pointer and length is better than null-terminated strings (although it is difficult in C, and as they mention you will have to use a macro (or some additional functions) to work this in C).
Making the C standard library directly against syscalls is also a good idea, although in some cases you might have an implementation that needs to not do this for some reason, generally it is better for the standard library directly against syscalls.
FILE object is sometimes useful especially if you have functions such as fopencookie and open_memstream; but it might be useful (although probably not with C) to be able to optimize parts of a program that only use a single implementation of the FILE interface (or a subset of its functions, e.g. that does not use seeking).
show comments
teo_zero
Interesting project! I'm eagerly reading through it.
Probably I would have made different choices. For example, I'd rather have many modules that can be individually included, than one giant file.
Also from a purely aesthetic point of view, I would have opted for more readable function and type names: no sp_ prefix, recognizable names like dict istead of ht, vec instead of da, etc.
And I know there are compilers out there still stuck in the 90s, but I would have targeted C23, these days.
But that would be my highly opinionated library!
P.S. be aware that word frequency is not what the standard 'wc' does.
Retr0id
> Program directly against syscalls
Works nicely on Linux where the syscall interface is explicitly stable, but on many (most?) other platforms this is not the case.
> There Is No Heap
I don't understand what this means, when it's followed by the definition of a heap allocation interface. The paragraph after the code block conveys no useful information.
> Null-terminated strings are the devil’s work
Agreed! I also find the stance regarding perf optimization agreeable.
show comments
skybrian
My impression of the sample programs is that they're unreadably noisy, but maybe this would be a good compiler target if you're writing your own language?
show comments
tialaramex
> I’ve been working on fixing C by giving it a high quality, ultra portable standard library
If the only problem with C was that the stdlib is terrible that would be a very different situation.
There are much more fundamental problems with the language. Problems that are entirely understandable in K&R C but aren't acceptable half a century later. A "high quality" standard library can't fix these problems. In some cases it can paper over them though not others, and even then the actual problem wasn't fixed it's just not obvious with superficial examination any more.
First, the type system is crap. The array types don't work across function boundaries, there's no Empty type at all, you are provided with a user defined product type with names, but not one without names etc. There is no fat pointer type, slice reference, nothing like that.
Second, naming is also crap. There's no namespacing feature provided so you're left with the convention of picking a few letters as a prefix and hoping it doesn't overlap and yet is succinct enough to not be annoying.
Third, everything coerces, all the coercions you could want if you like coercions, and then ten times that many on top. Some people really like coercions, C will see them learn that actually they don't like them that much.
show comments
Panzerschrek
It's a disadvantage, that it's header-only. It needs to include <windows.h> and a bunch of other stuff, which slow-downs compilation. Splitting it into a couple of files (a header and an implementation) would be much better.
show comments
pjmlp
We should have left C in the 90's already, but then FOSS happened,
"Using a language other than C is like using a non-standard feature: it will cause trouble for users. Even if GCC supports the other language, users may find it inconvenient to have to install the compiler for that other language in order to build your program. So please write in C."
How does this library work in programs with parts still requiring libc?
How does it deal with code executing before main? Libc does a bunch of necessary stuff, like calling initializers for global variables.
show comments
504118318
Just taking a quick look at the atomics section:
First, (on unix) it's wrapping pthread mutex. That's part of libc! (Technically it might not be libc.so, but it's still the standard library.)
Also, none of the atomics talk about the memory model. You don't _have_ to use the C11 memory model (Linux, for example, doesn't). But if you're not using the C11 memory model and letting the compiler insert fences for you, you definitely need to have fence instructions, yourself.
While C11 atomics do rely on libgcc, so do the __sync* functions that this library uses (see https://godbolt.org/z/bW1f7xGas) for an example.
Oops... apparently this is vibecoded. Welp, I just wasted ten minutes of my life reviewing slop that I'm not going to get back.
How do they all know that it is vibe-coded?
I missed the meeting where they were handing out vibe-code-detectors?
Please, describe..
P.S. sad to see that HN becomes a witch hunting place
show comments
charcircuit
I do not want to include and compile a standard library for every file that includes it.
Why do standard library headers always have to be insane?
show comments
KnuthIsGod
"The library’s stance, to put it simply, that the juice ain’t worth the squeeze when it comes to low level, compute-bound performance.
Designing software and data structures for performance against unknown use cases on unknown hardware is extremely difficult and the resulting code is much more complicated. Even then, it’s often better to use code written against your actual use case and hardware when performance is that critical.
Things that are off the table might be:
SIMD
A highly optimized hash table rewrite
Figuring out where inlining or LIKELY causes the compiler to produce better code."
LOL...
Classic vibe coder.
TZubiri
> Every language that depends on third party libraries, like js and python, is getting massively infected with supply chain worms
> Only couple of languages not affected are those that don't have a culture of downloading third party code, like C and C++
> Ex js and python developer publishes a 'library'
> Library is vibe coded
> Published on github amidst GitHub being hit by supply chain attacks, had their source code leaked.
The timing is terrible for starters, and I don't trust the vibe coded code at all. Imagine a pandemic and the cities are on fire, and you arrive to a rural town asking to kiss people.
> Principles
> Be extremely portable
> sp.h is written in C99, and it compiles against any compiler and libc imaginable. It works on Linux, on Windows, on macOS. It works under a WASM host. It works in the browser. It works with MSVC, and MinGW, it works with or without libc, or with weird ones like Cosmopolitan. It works with the big compilers and it works with TCC.
> And, best of all, it does all all of that because it’s small, not because it’s big.
vs
> Non-goals
> Obscure architectures and OSes
> I write code for x86_64 and aarch64. WASM is becoming more important, but is still secondary to native targets. I don’t care to bloat the library to support a tiny fraction of use cases.
> That being said, if you’re interested in using the library on an unsupported platform, I’m more than happy to help, and if we can make the patch reasonable, to merge it.
Those are contradictory. Either the code is extremely portable, or it can't support "obscure" platforms, but not both.
https://spader.zone/sp/#null-terminated-strings-are-the-devi...
Pointer/length is not just for strings - but for all arrays.
See my proposal:
https://www.digitalmars.com/articles/C-biggest-mistake.html
[1] https://wtf-8.codeberg.page/
Wait, is a compound literal an l-value in that sense (as opposed to, just being able to take its reference)?! Take a look at the C99 standard Oh my, it indeed is (C99 §6.5.2.5 p5). Good to know!First, thanks for sharing this link, it was an interesting read! A few remarks below.
I had a hard time reading the wc code in the article. First I had to go to the GitHub to understand that "da" stands for dynamic array, and then understand that what the author calls wc is not at all the wc linux commands, which by default gives you the number of lines, words, and characters in a file, not the count of occurrences of each word in the file, which is what the proposed code does.
Also, since I had to read the GitHub README, another remark: it says that sp_io uses pthreads rather than fork and exec. Both of those approach (but especially pthreads) are contradictory to the explicit goals of programming against lowest level interfaces. I believe the lowest level syscall is clone3 [1], which gives you more fine grained control on what is shared between the parent and child processes, allowing to implement fork or threads.
[1] https://manpages.debian.org/trixie/manpages-dev/clone3.2.en....
I agree with most of the criticisms they make.
I agree that pointer and length is better than null-terminated strings (although it is difficult in C, and as they mention you will have to use a macro (or some additional functions) to work this in C).
Making the C standard library directly against syscalls is also a good idea, although in some cases you might have an implementation that needs to not do this for some reason, generally it is better for the standard library directly against syscalls.
FILE object is sometimes useful especially if you have functions such as fopencookie and open_memstream; but it might be useful (although probably not with C) to be able to optimize parts of a program that only use a single implementation of the FILE interface (or a subset of its functions, e.g. that does not use seeking).
Interesting project! I'm eagerly reading through it.
Probably I would have made different choices. For example, I'd rather have many modules that can be individually included, than one giant file.
Also from a purely aesthetic point of view, I would have opted for more readable function and type names: no sp_ prefix, recognizable names like dict istead of ht, vec instead of da, etc.
And I know there are compilers out there still stuck in the 90s, but I would have targeted C23, these days.
But that would be my highly opinionated library!
P.S. be aware that word frequency is not what the standard 'wc' does.
> Program directly against syscalls
Works nicely on Linux where the syscall interface is explicitly stable, but on many (most?) other platforms this is not the case.
> There Is No Heap
I don't understand what this means, when it's followed by the definition of a heap allocation interface. The paragraph after the code block conveys no useful information.
> Null-terminated strings are the devil’s work
Agreed! I also find the stance regarding perf optimization agreeable.
My impression of the sample programs is that they're unreadably noisy, but maybe this would be a good compiler target if you're writing your own language?
> I’ve been working on fixing C by giving it a high quality, ultra portable standard library
If the only problem with C was that the stdlib is terrible that would be a very different situation.
There are much more fundamental problems with the language. Problems that are entirely understandable in K&R C but aren't acceptable half a century later. A "high quality" standard library can't fix these problems. In some cases it can paper over them though not others, and even then the actual problem wasn't fixed it's just not obvious with superficial examination any more.
First, the type system is crap. The array types don't work across function boundaries, there's no Empty type at all, you are provided with a user defined product type with names, but not one without names etc. There is no fat pointer type, slice reference, nothing like that.
Second, naming is also crap. There's no namespacing feature provided so you're left with the convention of picking a few letters as a prefix and hoping it doesn't overlap and yet is succinct enough to not be annoying.
Third, everything coerces, all the coercions you could want if you like coercions, and then ten times that many on top. Some people really like coercions, C will see them learn that actually they don't like them that much.
It's a disadvantage, that it's header-only. It needs to include <windows.h> and a bunch of other stuff, which slow-downs compilation. Splitting it into a couple of files (a header and an implementation) would be much better.
We should have left C in the 90's already, but then FOSS happened,
"Using a language other than C is like using a non-standard feature: it will cause trouble for users. Even if GCC supports the other language, users may find it inconvenient to have to install the compiler for that other language in order to build your program. So please write in C."
The GNU Coding Standard in 1994, http://web.mit.edu/gnu/doc/html/standards_7.html#SEC12
How does this library work in programs with parts still requiring libc?
How does it deal with code executing before main? Libc does a bunch of necessary stuff, like calling initializers for global variables.
Just taking a quick look at the atomics section:
First, (on unix) it's wrapping pthread mutex. That's part of libc! (Technically it might not be libc.so, but it's still the standard library.)
Also, none of the atomics talk about the memory model. You don't _have_ to use the C11 memory model (Linux, for example, doesn't). But if you're not using the C11 memory model and letting the compiler insert fences for you, you definitely need to have fence instructions, yourself.
While C11 atomics do rely on libgcc, so do the __sync* functions that this library uses (see https://godbolt.org/z/bW1f7xGas) for an example.
Oops... apparently this is vibecoded. Welp, I just wasted ten minutes of my life reviewing slop that I'm not going to get back.
Bun and Anthropic wants to know your location.
This doesn't look good:
since There should be a fallback for very long paths.I love how hyper-opinionated this is.
Best library name.
not one mention of Zig on the whole page?
DJB was saying similar things in the 1990s -- eg https://cr.yp.to/proto/netstrings.txt
How do they all know that it is vibe-coded? I missed the meeting where they were handing out vibe-code-detectors?
Please, describe..
P.S. sad to see that HN becomes a witch hunting place
I do not want to include and compile a standard library for every file that includes it.
Why do standard library headers always have to be insane?
"The library’s stance, to put it simply, that the juice ain’t worth the squeeze when it comes to low level, compute-bound performance.
Designing software and data structures for performance against unknown use cases on unknown hardware is extremely difficult and the resulting code is much more complicated. Even then, it’s often better to use code written against your actual use case and hardware when performance is that critical.
Things that are off the table might be:
SIMD A highly optimized hash table rewrite Figuring out where inlining or LIKELY causes the compiler to produce better code."
LOL...
Classic vibe coder.
> Every language that depends on third party libraries, like js and python, is getting massively infected with supply chain worms
> Only couple of languages not affected are those that don't have a culture of downloading third party code, like C and C++
> Ex js and python developer publishes a 'library'
> Library is vibe coded
> Published on github amidst GitHub being hit by supply chain attacks, had their source code leaked.
The timing is terrible for starters, and I don't trust the vibe coded code at all. Imagine a pandemic and the cities are on fire, and you arrive to a rural town asking to kiss people.
Wonderful !
Yet another slop coded library.
What could possibly go wrong...