Discussion
Protect your content from scrapers and AI bots
dec0dedab0de: Reminds me of when AOL broke all the script kiddy tools in 1996 by adding an extra space to the title of the window. I didn't have AOL, but my friend made one of those tools, and I helped him figure it out.
dwa3592: Nice. I have been working on something which utilizes obfuscation, honeypots, etc., and I have come to a few realizations:
- Today you don't have to be a dedicated/motivated reverse engineer; you just need Sonnet 4.6 and let it do the work.
- You need to throw constant/new gotchas at LLMs to keep them on their toes while they try to reverse engineer your website.
lich_king: You break highlighting and copy-and-paste. If I want to share or comment on a piece of your website... I can't. I guess this can be a "feature" in some rare cases, but a major usability pain otherwise.
well_ackshually: I too, hate people that:
* Copy text
* use a screen reader for accessibility purposes (not just on the web, but on mobile too. Your 'light' obfuscation is entirely broken with TalkBack on Android: individual words/characters are read, the text is not a single block)
* use an RSS feed
* use reader mode in their browser

If you don't want your stuff to be read, and that includes bots, don't put it online.
kevinsync: I'm surprised that you don't appear to be using it on obscrd.dev lol
mystraline: This is also what Facebook does. Same result: screen readers and assistive software are rendered useless. Basically, it is a sign of "I hate disabled people, and AI too"
larsmosr: Fair concern. obscrd actually preserves screen reader access. CSS flexbox order is a visual reordering property, so assistive tech follows the visual order and reads the text correctly. Contact components use sr-only spans with clean text and aria-hidden on the obfuscated layer. We target WCAG 2.2 AA compliance. Happy to have a11y experts poke at it and point out gaps.
PaulHoule: Accessibility APIs have long been the royal road to automation. If scrapers were well-written they'd be using this already, but of course if scrapers were well-written they would scrape your site and you'd never notice.
GaryBluto: > Your content, obscured.
Is that supposed to be a good thing?
gzread: Another thing you can do is to install a font with jumbled characters: "a" looks like "x", "b" looks like "n", and so on. Then instead of writing "abc" you write "jmw" and it looks like "abc" on the screen. This has been used as a form of DRM for eBooks. It breaks copy/paste and screen readers, but so does your idea.
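To make the font trick above concrete, here is a minimal sketch of the text side of it, assuming a custom font (not shown) whose glyphs are drawn to undo the substitution. The permutation in `SHUFFLED` and the names `buildMap` and `remap` are made up for illustration; only the "abc" to "jmw" pairing comes from the comment.

```javascript
// Hypothetical substitution table. A matching custom font would draw the
// glyph for "j" so that it looks like "a", "m" like "b", "w" like "c", etc.
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz';
const SHUFFLED = 'jmwqxzkvbnyphglsrtdufoeaic'; // assumed permutation, not from a real font

// Build a lookup table mapping each character of `from` to the one at the
// same position in `to`.
function buildMap(from, to) {
  const map = new Map();
  for (let i = 0; i < from.length; i++) {
    map.set(from[i], to[i]);
  }
  return map;
}

const ENCODE = buildMap(ALPHABET, SHUFFLED);
const DECODE = buildMap(SHUFFLED, ALPHABET);

// Substitute every character found in the table; leave everything else
// (spaces, digits, punctuation, uppercase) untouched.
function remap(text, table) {
  return [...text].map(ch => (table.has(ch) ? table.get(ch) : ch)).join('');
}

const stored = remap('abc', ENCODE);      // what would be written into the page
const recovered = remap(stored, DECODE);  // what a scraper with the table gets back
console.log(stored, recovered);
```

Anyone who recovers the table (for example by inspecting the font) can run the `DECODE` direction, so this raises the cost of scraping rather than preventing it.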
larsmosr: Font remapping is actually on the v2 roadmap. The reason v1 uses CSS ordering instead is it preserves screen reader access. Tradeoff is it's reversible (as another commenter just showed). Font remapping is stronger but breaks assistive tech. Solving both is the hard problem.
obsrcdsucks:
  function decodeObscrd(htmlOrElement) {
    let root;
    if (typeof htmlOrElement === 'string') {
      root = new DOMParser().parseFromString(htmlOrElement, 'text/html').body;
    } else {
      root = htmlOrElement || document;
    }
    const container = root.querySelector('[class*="obscrd-"]');
    if (!container) {
      return;
    }
    // Top-level word elements, sorted back into their original order.
    const words = [...container.children].filter(el => el.hasAttribute('data-o'));
    words.sort((a, b) => +a.dataset.o - +b.dataset.o);
    const result = words.map(word => {
      // Keep only leaf character elements (no nested [data-o] children).
      const chars = [...word.querySelectorAll('[data-o]')]
        .filter(el => el.querySelector('[data-o]') === null);
      chars.sort((a, b) => +a.dataset.o - +b.dataset.o);
      return chars.map(c => c.textContent).join('');
    }).join('');
    console.log(result);
    return result;
  }
larsmosr: Yep, that works. The data-o attributes are readable in the DOM so you can reverse it with custom code. That's in the threat model. The goal is raising the cost from "curl + cheerio" to "write a custom decoder per site." Most scrapers move on to easier targets.
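The cost argument above can be sketched without a DOM. This is not obscrd's actual implementation, just an illustration of the general idea under the assumption that each character carries its original index (the role `data-o` plays in the decoder posted above). The names `scramble`, `naiveRead`, and `decode` are hypothetical.

```javascript
// Shuffle the characters of `text`, tagging each with its original index `o`.
function scramble(text, random = Math.random) {
  const tagged = [...text].map((ch, o) => ({ o, ch }));
  // Fisher-Yates shuffle of the tagged characters.
  for (let i = tagged.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [tagged[i], tagged[j]] = [tagged[j], tagged[i]];
  }
  return tagged;
}

// What a generic scraper sees if it just concatenates text in source order.
function naiveRead(tagged) {
  return tagged.map(t => t.ch).join('');
}

// What a site-specific decoder does: sort by the stored index first.
function decode(tagged) {
  return [...tagged].sort((a, b) => a.o - b.o).map(t => t.ch).join('');
}

const tagged = scramble('premium content');
console.log(naiveRead(tagged)); // jumbled, varies per run
console.log(decode(tagged));
```

The generic reader gets jumbled text while the few extra lines in `decode` recover the original, which matches the "raise the cost, not make it impossible" framing in the comment.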
verse: couldn't read the hero text on my phone. it's white text and the shader background is also mostly white
larsmosr: Thanks, what phone/browser? I'll fix that.
larsmosr: For content you want public, no.
larsmosr: The TalkBack issue is useful feedback, thank you. I tested with NVDA and VoiceOver but not TalkBack on Android. If light mode is reading individual words instead of a continuous block that's a real bug I want to fix. On the broader point, I hear you, but I think there's a middle ground. Not all content is public knowledge. Some of it is premium, proprietary, or behind a paywall. The people publishing it should get to decide whether it becomes free training data.
larsmosr: Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; Official OpenAI documentation: https://platform.openai.com/docs/gptbot
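As a rough sketch of how a server might act on that string, the check below looks for the `GPTBot/<version>` token in a request's User-Agent header. The function name `isGptBot` is hypothetical, and the match is only as trustworthy as the header itself, since any client can send an arbitrary User-Agent.

```javascript
// Return true when the User-Agent contains a "GPTBot/<digit>" product token.
// Hypothetical helper; it does not verify that the request really comes from OpenAI.
function isGptBot(userAgent) {
  return typeof userAgent === 'string' && /\bGPTBot\/\d/.test(userAgent);
}

const quoted = 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3;';
console.log(isGptBot(quoted));
console.log(isGptBot('Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0'));
```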
yesitcan: The irony of building an anti-AI project but writing your marketing and HN post with AI.