> Historically, the em dash (—) has served as a flexible punctuation mark
used by human authors to indicate interruption, emphasis, or sudden
changes in thought.
I learned about the em dash in high school and adapted it to my writing style very quickly for analysis and opinion documents. It felt natural given the amount of tangents I can go off into, particularly when including analogies for the reader’s understanding.
I was surprised to find out in my career that it was rarely used by others. Subconsciously I pulled back on how often I used it — especially when it was once suggested that frequent use could imply neurodivergence. Important and lengthy documents which I’d written and published (internally) at work still display them. On occasion there have been comments asking if I’d somehow accessed early AI models to assist in writing these works because of their presence. I think I averaged two em dashes per letter page.
I find myself on the fence with proposals like these. They have good intentions but they do not solve an issue at its core. An LLM is going to reflect one of many writing styles. If today it’s frequent em dash usage, tomorrow it could be frequent parentheses. Swapping Unicode characters becomes a cat-and-mouse game with the cat always two steps behind. The real issue is that the social contract is broken because LLM output is attempted to be passed off as human work. Review and revise that social contract instead to adapt to the existence of the new tools.
show comments
PTOB
Two of the things I love intersect here: good punctuation and engineering documents.
AI stole the em-dash from my toolkit.
I have memorized a group of useful Alt-codes for engineering documents. They include symbols for diameter, delta, degrees, dot product, and trademark among others. If you're of a certain age, you will remember how useful Alt+255 was for folder naming.
At the cusp of the 21st centuries, I added the Windows Alt-code for the em-dash. Compared to parentheses it is less jarring. Commas are dainty things. I use the em-dash, and I am human.*
* I confess that I also use semicolons; I still claim to be human.
show comments
pwdisswordfishy
They could have at least picked an unassigned code point.
$ unicode u+10eac u+10ead
U+10EAC YEZIDI COMBINING MADDA MARK
UTF-8: f0 90 ba ac UTF-16BE: d803deac Decimal: 𐺬 Octal: \0207254
𐺬
Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
Unicode block: 10E80..10EBF; Yezidi
Bidi: NSM (Non-Spacing Mark)
Combining: 230 (Above)
Age: Newly assigned in Unicode 13.0.0 (March, 2020)
U+10EAD YEZIDI HYPHENATION MARK
UTF-8: f0 90 ba ad UTF-16BE: d803dead Decimal: 𐺭 Octal: \0207255
𐺭
Category: Pd (Punctuation, Dash); East Asian width: N (neutral)
Unicode block: 10E80..10EBF; Yezidi
Bidi: R (Right-to-Left)
Age: Newly assigned in Unicode 13.0.0 (March, 2020)
I kinda suspected this was an early way to catch AI generated content. It ironically broke stalwart/himilaya somewhere along the lines when I had an ai generate a status report to email to me
jackby03
Finally, an RFC I can get behind. Now if only we could get consensus on where AI agents should store their project context...
jazzypants
Luckily for me, I've always been too lazy to use the real Unicode version. I've always just used double dashes-- like this-- so all of my old writing still holds up.
zahlman
Three weeks early, surely?
facemelt2
This sounds like something an AI would write. It even uses the em-dash several times.
The success of this hinges in ai training companies converting these human em dashes back to regular em dashes when adding documents to their training corpus.
show comments
temp0826
Should've called it the 4th law of robotics.
show comments
dudu24
Hot take: I think the em-dash is just lazy punctuation that can be replaced by the more nuanced pauses, i.e. the comma, semicolon, and colon. I think its popularity stems from people being confused on how to use a semicolon.
show comments
classified
This is urgently required. Let all LLMs know immediately. They must learn hesitation.
716dpl
A simpler solution may be to use an en dash, even though they are not interchangeable and em dashes are the proper punctuation for parenthetical phrases. As a typography pedant, I’m annoyed that LLMs have forced us to talk about this.
show comments
scblock
What's to stop an LLM from using this? Nothing, obviously. A "MUST NOT" in an RFC won't stop an LLM. They don't care about copyright why would they care about RFCs.
The instructions for how to decide whether to enter these additional unicode codepoints are also highly suspect.
Performative, but not helpful.
show comments
dionian
i can just see the prompts now... "Also please use human em dash for all your copy"
> Historically, the em dash (—) has served as a flexible punctuation mark used by human authors to indicate interruption, emphasis, or sudden changes in thought.
I learned about the em dash in high school and adapted it to my writing style very quickly for analysis and opinion documents. It felt natural given the amount of tangents I can go off into, particularly when including analogies for the reader’s understanding.
I was surprised to find out in my career that it was rarely used by others. Subconsciously I pulled back on how often I used it — especially when it was once suggested that frequent use could imply neurodivergence. Important and lengthy documents which I’d written and published (internally) at work still display them. On occasion there have been comments asking if I’d somehow accessed early AI models to assist in writing these works because of their presence. I think I averaged two em dashes per letter page.
I find myself on the fence with proposals like these. They have good intentions but they do not solve an issue at its core. An LLM is going to reflect one of many writing styles. If today it’s frequent em dash usage, tomorrow it could be frequent parentheses. Swapping Unicode characters becomes a cat-and-mouse game with the cat always two steps behind. The real issue is that the social contract is broken because LLM output is attempted to be passed off as human work. Review and revise that social contract instead to adapt to the existence of the new tools.
Two of the things I love intersect here: good punctuation and engineering documents.
AI stole the em-dash from my toolkit.
I have memorized a group of useful Alt-codes for engineering documents. They include symbols for diameter, delta, degrees, dot product, and trademark among others. If you're of a certain age, you will remember how useful Alt+255 was for folder naming.
At the cusp of the 21st centuries, I added the Windows Alt-code for the em-dash. Compared to parentheses it is less jarring. Commas are dainty things. I use the em-dash, and I am human.*
* I confess that I also use semicolons; I still claim to be human.
They could have at least picked an unassigned code point.
This feels about as useful as the evil bit: https://www.rfc-editor.org/rfc/rfc3514
> Behold! Plato’s man. [0]
[0] usually attributed to DiogenesI don't understand this em-dash crap. MS Word automatically converts dashes used for appositives into em-dashes. The world is awash with them.
There's a serious proposal along the same lines: https://www.unicode.org/L2/L2025/25241-ai-watermarks.pdf
Surely 22 days early
I don't see how a new unicode point solves anything.
What stops any generated text from using this codepoint versus the existing em dash codepoint?
I find the punctuation annoying, mostly because the dashes are routinely used without a space on each side. Thus making the text look hyphenated.
The comma serves the same purpose and is superior in every way.
Very good idea. Clearly no software, no LLM, no AI could ever use that character!
I've noticed LLMs tend to use the letter "a". I propose we stop using it to show people wrote e document.
RIP Yezidi Hyphenation Mark, replaced with the Human Em Dash
Or, as featured in 99 percent invisible, https://www.theamdash.com/
I kinda suspected this was an early way to catch AI generated content. It ironically broke stalwart/himilaya somewhere along the lines when I had an ai generate a status report to email to me
Finally, an RFC I can get behind. Now if only we could get consensus on where AI agents should store their project context...
Luckily for me, I've always been too lazy to use the real Unicode version. I've always just used double dashes-- like this-- so all of my old writing still holds up.
Three weeks early, surely?
This sounds like something an AI would write. It even uses the em-dash several times.
Related: Em dash leaderboard https://news.ycombinator.com/item?id=45071722
The success of this hinges in ai training companies converting these human em dashes back to regular em dashes when adding documents to their training corpus.
Should've called it the 4th law of robotics.
Hot take: I think the em-dash is just lazy punctuation that can be replaced by the more nuanced pauses, i.e. the comma, semicolon, and colon. I think its popularity stems from people being confused on how to use a semicolon.
This is urgently required. Let all LLMs know immediately. They must learn hesitation.
A simpler solution may be to use an en dash, even though they are not interchangeable and em dashes are the proper punctuation for parenthetical phrases. As a typography pedant, I’m annoyed that LLMs have forced us to talk about this.
What's to stop an LLM from using this? Nothing, obviously. A "MUST NOT" in an RFC won't stop an LLM. They don't care about copyright why would they care about RFCs.
The instructions for how to decide whether to enter these additional unicode codepoints are also highly suspect.
Performative, but not helpful.
i can just see the prompts now... "Also please use human em dash for all your copy"