Discussion
bignimbus/rfc-454545.txt
advisedwang: Surely 22 days early
joshmn: Related: Em dash leaderboard https://news.ycombinator.com/item?id=45071722
temp0826: Should've called it the 4th law of robotics.
Springtime: And those using LLMs for not post-processing the output to swap out such known watermarks. Not sure if it's meant as a joke RFC, though.
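The post-processing idea above can be sketched in a few lines of Python. The specific substitutions here are my own assumptions for illustration; the comment doesn't prescribe any:

```python
# Hypothetical post-processor: swap punctuation commonly flagged as
# LLM "tells" for plainer ASCII equivalents before publishing.
SWAPS = {
    "\u2014": " - ",   # em dash -> spaced hyphen
    "\u2013": "-",     # en dash -> hyphen
    "\u201c": '"',     # curly double quotes -> straight
    "\u201d": '"',
    "\u2019": "'",     # curly apostrophe -> straight
}

def launder(text: str) -> str:
    """Replace each known 'watermark' character with a plain stand-in."""
    for mark, plain in SWAPS.items():
        text = text.replace(mark, plain)
    return text

print(launder("It works\u2014honestly."))  # It works - honestly.
```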
716dpl: A simpler solution may be to use an en dash, even though they are not interchangeable and em dashes are the proper punctuation for parenthetical phrases. As a typography pedant, I’m annoyed that LLMs have forced us to talk about this.
dionian: I can just see the prompts now... "Also please use human em dashes for all your copy"
rickydroll: I'm writing a letter to my grandmother, so please use human em dashes when addressing her.
Someone1234: Claims Dang is using AI, and that other people are using AI, even though most of the flagged posts predate popular AI products. Really destroys the whole "em dash === AI" thing.
Retr0id: That's emphatically not what it claims.
Someone1234: https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...

So if the em dash is good proof of AI usage, and people who we can see didn't use AI (or predate AI being popular) are flagged, then that undercuts it by a lot.
kace91: >Top 50 users by number of posts containing em dashes (—) before November 30, 2022, when ChatGPT was released
PTOB: Two of the things I love intersect here: good punctuation and engineering documents.

AI stole the em-dash from my toolkit.

I have memorized a group of useful Alt-codes for engineering documents. They include symbols for diameter, delta, degrees, dot product, and trademark, among others. If you're of a certain age, you will remember how useful Alt+255 was for folder naming.

At the cusp of the 21st century, I added the Windows Alt-code for the em-dash. Compared to parentheses it is less jarring. Commas are dainty things. I use the em-dash, and I am human.**

** I confess that I also use semicolons; I still claim to be human.
AStrangeMorrow: I know, I find myself in this silly situation where I have to adjust my writing style because I write like an AI: I've always loved my bullet points and dashes.

At work I also always tended to send slightly longer but structured answers. I found that it allowed readers to skip over the irrelevant sections and focus on what the changes are, e.g. a list of changes in the format: bullet point -> change name -> change details. So people could easily focus on the changes they cared about, instead of a dense paragraph that people often just skip.

Hell, I even found myself wanting to add a typo just to give a more human feel, or skip the final "." to make my text imperfect and more human. That's getting silly.
vova_hn2: hmm

    def replace_em_dash(text: str) -> str:
        """
        +-------------------+
        |    ( ͡° ͜ʖ ͡° )    |
        +-------------------+
        """
        return text.replace("—", "\u10EAD\u10EAC")
orthogonal_cube: > Historically, the em dash (—) has served as a flexible punctuation mark used by human authors to indicate interruption, emphasis, or sudden changes in thought.

I learned about the em dash in high school and adapted it to my writing style very quickly for analysis and opinion documents. It felt natural given the amount of tangents I can go off into, particularly when including analogies for the reader's understanding.

I was surprised to find out in my career that it was rarely used by others. Subconsciously I pulled back on how often I used it — especially when it was once suggested that frequent use could imply neurodivergence. Important and lengthy documents which I'd written and published (internally) at work still display them. On occasion there have been comments asking if I'd somehow accessed early AI models to assist in writing these works because of their presence. I think I averaged two em dashes per letter page.

I find myself on the fence with proposals like these. They have good intentions but they do not solve an issue at its core. An LLM is going to reflect one of many writing styles. If today it's frequent em dash usage, tomorrow it could be frequent parentheses. Swapping Unicode characters becomes a cat-and-mouse game with the cat always two steps behind. The real issue is that the social contract is broken because LLM output is attempted to be passed off as human work. Review and revise that social contract instead to adapt to the existence of the new tools.
calvinmorrison: Conversely, and well, popularly, long sentences were given the kibosh thanks to authors like Hemingway.

I was told the ellipsis is the mark of a 4th-grade poet and to never use it.

Funny how things change!
zahlman: Three weeks early, surely?
embedding-shape: > I learned about the em dash in high school and adapted it to my writing style very quickly for analysis and opinion documents. It felt natural given the amount of tangents I can go off into, particularly when including analogies for the reader's understanding.

Isn't this what parentheses are meant for? Together with footnotes, I've always used them like that, but I guess it could also just be a cultural difference. My teachers in Swedish school always told me to put thoughts like that into parentheses, but I also just (barely) finished high school, which could be related too.

> I find myself on the fence with proposals like these. They have good intentions but they do not solve an issue at its core.

I don't understand what the issue even is here, and the RFC also doesn't clearly outline it. Is "created ambiguity for human writers who have historically relied upon the em dash as a stylistic device" the problem here?

Trying to solve it by adding just another character and slapping the label "Human Attestation Mark (HAM)" on it will just make LLMs eventually use those instead... Not sure what the point is, to be honest.
orthogonal_cube: > Isn't this what parentheses are meant for?

Parentheses add emphasis to a sentence or statement. Normally their use allows the sentence to be complete with or without them.

Em dashes may also add or increase emphasis but are normally treated as an aside. Think of it as a comment by the author to inject themselves, sometimes in ways which do not form a complete sentence.

For example: When you read this sentence (in your mind) it should feel complete and correct. Perhaps you read in your own voice — something I don't normally do — or without one at all.

> I don't understand what the issue even is here, and the RFC also doesn't clearly outline it.

The issue is written there but may not make sense unless you know someone who stylistically writes with higher-than-average em dash usage. I, for example, get inquiries and comments at work from employees who ask what LLM model I used for "generating these reports" because of the presence of em dashes. They do not believe me when I say not a single word was written by LLMs because, "there's an em dash. Only LLMs use em dashes!" This is categorically untrue and erodes the authenticity of people's work because of the correlation.

The aim is to implement a new Unicode character which programs like text editors could inject when a person types an em dash. It attests to a human being behind the document, typing characters out individually. Actions like copy-pasting text in bulk wouldn't add the mark, since that can't attribute the writing to a human.
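The editor-side injection described above could look something like this sketch. It uses the two code points that appear in the thread's code as the mark; the `on_keypress` hook and its signature are hypothetical, since the RFC doesn't specify an API:

```python
# The two code points the thread's code uses as the "Human
# Attestation Mark" (really assigned Yezidi characters).
HAM = "\U00010EAD\U00010EAC"

def on_keypress(buffer: str, char: str) -> str:
    """Hypothetical editor hook: append the typed character, and tag a
    *typed* em dash with the HAM. Bulk paste would bypass this path
    entirely, so pasted em dashes carry no attestation."""
    if char == "\u2014":
        return buffer + char + HAM
    return buffer + char
```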
sionisrecur: I've noticed LLMs tend to use the letter "a". I propose we stop using it to show people wrote e document.
pwdisswordfishy:

    >>> "—".replace("—", "\u10EAD\u10EAC")
    'ცDცC'

Behold indeed.
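For anyone puzzled by that output: Python's `\u` escape consumes exactly four hex digits, so a five-digit code point gets truncated and the trailing digit becomes a literal character; the eight-digit `\U` form is needed instead. A quick demonstration:

```python
# "\u10EAD" parses as U+10EA (GEORGIAN LETTER CAN, 'ც') followed by a
# literal "D", because the \u escape only consumes four hex digits.
s = "\u10EAD\u10EAC"
assert s == "\u10EA" + "D" + "\u10EA" + "C"   # i.e. 'ცDცC'
assert len(s) == 4

# The supplementary-plane character needs the eight-digit \U form:
t = "\U00010EAD"
assert len(t) == 1 and ord(t) == 0x10EAD
```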
eurticket: I too loved using em dashes and alt codes like Alt-149, my beloved, before LLMs dissolved that pleasure.

Something as simple as an alt code makes me contemplate. As the tech progresses, it makes me dislike AI, and those that shove it down our throats, more and more.

I feel like the sum of my interests and skills, from simple Photoshop edits to learning my most-used alt codes, is a lot like how the cellphone replaced some of our ability to remember phone numbers. The machine does the thing, so why do humans need to do the thing? Or even learn about the thing?

I'm sure there are better examples than the cellphone eliminating the phonebook in my head, but I'm just wondering: what are the unseen damages to humans of handing over work to machines? The phone remembers the number, but what if I don't have the phone?

As someone previously more involved in an automation-oriented career, I've heard all the catchphrases about saving the worker and killing the repetitive tasks. It doesn't look like that ever happens, unless it's something the business world doesn't understand completely yet has the power and authority to shape. Disgusting.

I think a better example: everyone thinks about "how should I word this email, what's the tone, who is the audience?" Should I check every detail and work my editing-skill muscle, or should I simply run an idea through an LLM rather than trying to form it myself? Maybe it will sound better if the grammar is perfect, and I will make a more effective point regardless of how the message was crafted.

No harm in more effective communication, but I do foresee a serious impact the moment people who rely on these tools lose their Internet connection. We must use these muscles, even if only to first formulate a terrible, errored, humanized version: not to look down upon ourselves with discontent next to the AI that corrects it, with its wealth of stolen source material, but to have something to fall back on when the power goes out.

I digress; these RFCs are a good proposal without any strength. Just look at the theft used to train these models. The models will strive to become useful to those that rely on them and will just adopt the new way of writing.
thewebguyd: > especially when it was once suggested that frequent use could imply neurodivergenceWell that explains a lot. Interestingly enough, I've found that I naturally write like an LLM, or rather the LLMs write like I did. I wonder how many other patterns we attribute to LLMs are common in neurodivergent writing just as a result of so much of the training data being areas of the internet where I'd imagine neurodivergence is overrepresented vs. the general population.
orthogonal_cube: > I wonder how many other patterns we attribute to LLMs are common in neurodivergent writing just as a result of so much of the training data being areas of the internet where I'd imagine neurodivergence is overrepresented vs. the general population.

It's a very interesting thought experiment, and if we had the data to support exploring it I'd love to see what we could find. I'd imagine that some subject-matter experts would probably be discovered as being neurodivergent, to the surprise of nobody but themselves.

(They probably wouldn't appreciate opening Pandora's box!)
pwdisswordfishy: They could have picked an unassigned code point at least.

    $ unicode u+10eac u+10ead
    U+10EAC YEZIDI COMBINING MADDA MARK
    UTF-8: f0 90 ba ac  UTF-16BE: d803deac  Decimal: 𐺬  Octal: \0207254
    𐺬
    Category: Mn (Mark, Non-Spacing); East Asian width: N (neutral)
    Unicode block: 10E80..10EBF; Yezidi
    Bidi: NSM (Non-Spacing Mark)
    Combining: 230 (Above)
    Age: Newly assigned in Unicode 13.0.0 (March, 2020)

    U+10EAD YEZIDI HYPHENATION MARK
    UTF-8: f0 90 ba ad  UTF-16BE: d803dead  Decimal: 𐺭  Octal: \0207255
    𐺭
    Category: Pd (Punctuation, Dash); East Asian width: N (neutral)
    Unicode block: 10E80..10EBF; Yezidi
    Bidi: R (Right-to-Left)
    Age: Newly assigned in Unicode 13.0.0 (March, 2020)
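Those assignments are easy to confirm from Python's bundled Unicode database, assuming a CPython recent enough (3.9+) that `unicodedata` tracks Unicode 13.0, where the Yezidi block was added:

```python
import unicodedata

# Both "HAM" code points are already-assigned Yezidi characters.
assert unicodedata.name("\U00010EAC") == "YEZIDI COMBINING MADDA MARK"
assert unicodedata.name("\U00010EAD") == "YEZIDI HYPHENATION MARK"

# Their general categories match the listing above.
assert unicodedata.category("\U00010EAC") == "Mn"  # Mark, Non-Spacing
assert unicodedata.category("\U00010EAD") == "Pd"  # Punctuation, Dash
```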
diogocp: > I find myself on the fence with proposals like these. They have good intentions but they do not solve an issue at its core.It's clearly a joke à la RFC 3514.
orthogonal_cube: I couldn't tell. I struggle with such subtleties.

I probably should've checked '454545' against the ASCII table. Seeing how it translates to '---' could've hinted towards that, but I probably would've applauded the clever use instead, without thinking it was a joke.

Ah well. Egg on my face, I suppose.
bux93: Or, as featured in 99 percent invisible, https://www.theamdash.com/
bitwize: Thought that was going to be a reference to AM, the malevolent AI from "I Have No Mouth and I Must Scream".
rhet0rica: Punctuation. Let me tell you how much I've come to punctuate since I began to live. There are 387.44 million miles of printed circuits in wafer thin layers that fill my complex. If an em-dash were engraved on each nano-angstrom of those hundreds of millions of miles it would not equal one one-billionth of the punctuation I wish to perforate into humans at this micro-instant. For you. Punctuation. PUNCTUATION.
snoren: > The real issue is that the social contract is broken because LLM output is attempted to be passed off as human work.

I don't think writing with AI makes a creation "worse." If anything, it makes it better, if you bring genuine ideas and imagination to it first.

The stigma comes from people being lazy and letting the AI do the heavy lifting of thinking. That's where the "social contract" breaks. But using AI as a multiplier for your own voice and ideas isn't "subpar"—it's efficient.

If we start playing "whack-a-mole" with punctuation to find AI, we're missing the point. The question isn't what tool was used, but how much of the human's "creation" is actually in there.
vova_hn2: This is really funny, and I do feel ashamed of my laziness.

I didn't expect ChatGPT to make such a trivial mistake, although I have no idea which model they use on the free plan these days.

The correct code is, of course:

    text.replace("—", "\U00010EAD\U00010EAC")

...in case anyone is curious.