I have lived my whole professional life with this being 'beyond obvious'... It's hard to imagine a generation where it's not. But then again, I did work with EBCDIC for awhile and we were reading and translating ASCII log tapes (ITT/Alcatel 1210 switch, phone calls, memory dumps).
I once got drunk with my elderly Unix supernerd friend and he was talking about TTYs and how his passwords contained embedded ^S and ^Q characters; he had traced the login process and learned they were just stalling the tty, not actually used to construct the hash. No one else at the bar got the drift. He patched his system to use 'raw' instead of 'cooked' mode for login passwords. He also used backspaces (^?, ^H) as part of his passwords. He was a real security tiger. I miss him.
dcminter
It doesn't seem to have been mentioned in the comments so far, but as a floppy-disk era developer I remember my mind was blown by the discovery that DEL was all-bits-set because this allowed a character on paper tape and punched card to be deleted by punching any un-punched holes!
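A sketch of why all-ones is the one code any mistake can become: a punched hole is a 1-bit that can't be un-punched, so the only edit available on tape is OR-ing in more ones (function name is mine, for illustration):

```python
DEL = 0b1111111  # 0x7F: all seven tape channels punched

def rub_out(code: int) -> int:
    """Punching the remaining holes is an OR with all-ones: the result is DEL."""
    return code | DEL

# Whatever was on the tape, the reader now sees DEL and skips it.
assert all(rub_out(c) == DEL for c in range(128))
```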
fix4fun
For me it was interesting that all digits in ASCII start with 0x3, e.g. 0x30 is '0', 0x31 is '1', ..., 0x39 is '9'. I thought it was accidental, but in reality it was intended: it made it possible to build simple counting/accounting machines with minimal circuit logic using BCD (Binary Coded Decimal). That was a wow for me ;)
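The low nibble of an ASCII digit is the digit's value, which is what made the BCD circuitry cheap; a small Python illustration (function names are mine):

```python
def digit_value(ch: str) -> int:
    # The low nibble of '0'..'9' (0x30..0x39) is the digit itself.
    return ord(ch) & 0x0F

def bcd_to_ascii(nibble: int) -> str:
    # Going the other way, OR-ing 0x30 onto a BCD nibble yields the ASCII digit.
    return chr(0x30 | nibble)

assert [digit_value(c) for c in "0123456789"] == list(range(10))
assert bcd_to_ascii(7) == "7"
```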
kazinator
This is by design, so that case conversion and folding is just a bit operation.
The idea that SOH/1 is "Ctrl-A" or ESC/27 is "Ctrl-[" is not part of ASCII; that idea comes from the way terminals provided access to the control characters, via a Ctrl key that just masked out a few bits.
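Both points are easy to illustrate in Python (helper names are mine, not from any standard):

```python
def fold_upper(ch: str) -> str:
    # Upper- and lowercase letters differ only in bit 5 (0x20).
    return chr(ord(ch) & ~0x20) if "a" <= ch <= "z" else ch

def ctrl(ch: str) -> int:
    # What the terminal's Ctrl key did: keep only the low five bits.
    return ord(ch) & 0x1F

assert fold_upper("q") == "Q"
assert ctrl("A") == 1    # SOH
assert ctrl("[") == 27   # ESC
```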
kazinator
If Unicode had used a full 32 bits from the start, it could have usefully reserved a few bits as flags that would divide it into subspaces, and could be easily tested.
Imagine a Unicode like this:
8:8:16
- 8 bits of flags.
- 8 bit script family code: 0 for BMP.
- 16 bit plane for every script code and flag combination.
The flags could do useful things like indicate character display width, case, and other attributes (specific to a script code).
Unicode peaked too early and applied an economy of encoding that rings false now, in an age in which consumer devices have two-digit-gigabyte memories and multiple terabytes of storage, and high-definition video is streamed over the internet.
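For illustration only, the commenter's hypothetical 8:8:16 layout sketched as bit-packing in Python — this is not real Unicode, and the flag name is invented:

```python
FLAG_WIDE = 0x01  # imaginary "double-width" flag, for illustration

def pack(flags: int, script: int, point: int) -> int:
    # 8 bits of flags, 8-bit script family code, 16-bit point within it.
    assert flags < 256 and script < 256 and point < 65536
    return (flags << 24) | (script << 16) | point

def unpack(cp: int) -> tuple:
    # Each field can be tested with a shift and a mask.
    return (cp >> 24) & 0xFF, (cp >> 16) & 0xFF, cp & 0xFFFF

cp = pack(FLAG_WIDE, 0x02, 0x4E2D)
assert unpack(cp) == (FLAG_WIDE, 0x02, 0x4E2D)
```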
taejavu
For whatever reason, there are extraordinarily few references that I come back to over and over, across the years and decades. This is one of them.
california-og
I made an interactive viewer some time ago (scroll down a bit):
https://blog.glyphdrawing.club/the-origins-of-del-0x7f-and-i...
It really helps understand the logic of ASCII.
Some of this elegance discussed from a programmatic point of view:
https://www.pixelbeat.org/docs/utf8_programming.html
I came across this a week ago when I was looking at some LLM generated code for a ToUpper() function. At some point I “knew” this relationship, but I didn’t really “grok” it until I read a function that converted lowercase ascii to uppercase by using a bitwise XOR with 0x20.
It makes sense, but it didn’t really hit me until recently. Now, I’m wondering what other hidden cleverness is there that used to be common knowledge, but is now lost in the abstractions.
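A sketch of such a ToUpper in Python: XOR with 0x20 toggles case rather than setting it, so it is only correct with a guard that the byte really is a lowercase letter (otherwise '@' would become '`', and so on):

```python
def to_upper_ascii(s: str) -> str:
    # XOR toggles bit 5, so guard for actual lowercase letters;
    # non-letters pass through unchanged.
    return "".join(chr(ord(c) ^ 0x20) if "a" <= c <= "z" else c for c in s)

assert to_upper_ascii("hello, World!") == "HELLO, WORLD!"
```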
jez
I have a command called `ascii-4col.txt` in my personal `bin/` folder that prints this out:
https://github.com/jez/bin/blob/master/ascii-4col.txt
It's neat because it's the only command I have that uses `tail` for the shebang line.
dveeden2
Also easy to see why Ctrl-D works for exiting sessions.
rbanffy
This is also why the Teletype layout has parentheses on 8 and 9, unlike modern keyboards that have them on 9 and 0 (a layout popularised by the IBM Selectric). The original Apple IIs had this same layout, with a "bell" on top of the G.
gpvos
Back in early times, I used to type ctrl-M in some situations because it could be easier to reach than the return key, depending on what I was typing.
seyz
This is why Ctrl+C is 0x03 and Ctrl+G is the bell. The columns aren't arbitrary. They're the control codes with bit 6 flipped. Once you see it, you can't unsee it. Best ASCII explainer I've read.
Related. Others?
Four Column ASCII (2017) - https://news.ycombinator.com/item?id=21073463 - Sept 2019 (40 comments)
Four Column ASCII - https://news.ycombinator.com/item?id=13539552 - Feb 2017 (68 comments)
If Ctrl sets bit 6 to 0, and Shift sets bit 5 to 1, the logical extension is to use Ctrl and Shift together to set the top bits to 01. Surely there must be a system somewhere that maps Ctrl-Shift-A to !, Ctrl-Shift-B to " etc.
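No such system is confirmed here, but the arithmetic the comment describes is easy to check in Python (the function name is hypothetical):

```python
def ctrl_shift(ch: str) -> str:
    # Force the top two bits of the 7-bit code to 01.
    return chr((ord(ch) & 0x1F) | 0x20)

assert ctrl_shift("A") == "!"   # 0x41 -> 0x21
assert ctrl_shift("B") == '"'   # 0x42 -> 0x22
```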
ezekiel68
I love this stuff. It's the kind of lore that keeps getting forgotten and re-discovered by swathes of curious computer scientists over the years. So easy to assume many of the old artifacts (such as the ASCII table) had no rhyme or reason to them.
renox
I still find it weird that they didn't put A, B, ... just after the digits; that would make binary-to-hexadecimal conversion more efficient.
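The gap shows up as the extra branch in any value-to-hex-digit conversion; a minimal sketch:

```python
def hex_digit(n: int) -> str:
    # Because 'A' (0x41) doesn't follow '9' (0x39), a branch
    # (or a lookup table) is unavoidable.
    return chr(ord("0") + n) if n < 10 else chr(ord("A") + n - 10)

assert "".join(hex_digit(n) for n in range(16)) == "0123456789ABCDEF"
```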
mac3n
credit to William Crosby, "Note on an ASCII-Octal Code Table", CACM 8.10, Oct 1965: https://dl.acm.org/doi/epdf/10.1145/365628.365652 (it also defined a 6-bit ASCII subset)
anyone remember 005 ENQ (also called WRU, "who are you") and its effect on a teletype?
meken
Very cool.
Though the 01 column is a bit unsatisfying because it doesn’t seem to have any connection to its siblings.
y42
First I was like "What, but why? You don't save any space, so what's this exercise about?" Then I read it again and it blew my mind. I thought I knew everything about ASCII. What a fool I am; Socrates was right. Always.
msarnoff
On early bit-paired keyboards with parallel 7-bit outputs, possibly going back to mechanical teletypes, I think holding Control literally tied the upper two bits to zero. (citation needed)
Also explains why there is no difference between Ctrl-x and Ctrl-Shift-x.
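A small check of why Shift is irrelevant under that masking (helper name is mine):

```python
def ctrl_code(ch: str) -> int:
    # Zeroing the upper two bits discards the case bit (0x20) too,
    # so Shift cannot change the result.
    return ord(ch) & 0x1F

assert ctrl_code("x") == ctrl_code("X") == 24  # CAN
```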
joshcorbin
Just wait until someone finally gets why CSI (aka the "other escape" from the 8-bit ANSI realm, now eternalized in the Unicode C1 block) is written ESC [ in 7-bit systems, such as the equally eternal UTF-8 encoding.
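The 7-bit spelling falls out of the C1-to-escape-sequence mapping, where each C1 control corresponds to ESC plus the code minus 0x40; a one-line check in Python:

```python
ESC, CSI = 0x1B, 0x9B
# In a 7-bit stream each C1 control 0x80..0x9F becomes ESC plus (code - 0x40).
assert CSI - 0x40 == ord("[")  # hence CSI is spelled ESC [
```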
SUDEEPSD25
Love this!
timonoko
where does this character set come from? It looks different on xterm.
for x in range(0x0,0x20): print(chr(x),end=" ")
Aardwolf
Imho ascii wasted over 20 of its precious 128 values on control characters nobody ever needs (except perhaps the first few years of its lifetime) and could easily have had degree symbol, pilcrow sign, paragraph symbol, forward tick and other useful symbols instead :)