Discussion
davidsojevic: I suspect part of the issue is that people are still using things like `acme.com` and `demo.com` as an example domain in their documentation and tests instead of relying on `example.com` which is reserved exactly for this purpose [0]. [0]: https://www.iana.org/domains/reserved
Frieren: > The LLM companies are not picking on me in particular, they are pounding every site on the net. Why isn't this a criminal offense? They are hurting businesses for profit (or for a higher valuation, as they probably have no profit at all). Why are corporations allowed to do with impunity what could land even a teenager years in prison? Is there no rule of law anymore? The five-year and ten-year penalties kick in only when the government can show the offense caused at least $5,000 in losses across all victims during a one-year period. https://legalclarity.org/what-are-the-punishments-for-a-ddos...
budududuroiu: Normative vs prerogative state [1]. See US v. Swartz compared to Meta's use of LibGen for Llama. [1] https://en.wikipedia.org/wiki/Dual_state_(model)
avazhi: Is what an offence lol? Bot scraper traffic? How do you think search engines work?
avazhi: > Someone really ought to do something about it. What is bro proposing here?
chupchap: Bot traffic is crazy even for smaller sites, but still manageable. I was getting 2,000 visitors a day on my infrequently updated website, but after I blocked all the bots via Cloudflare it went back to the normal double digit visitor count.
kristianp: > Nearly all of them were for non-existent pages. Do any webservers have a feature where they keep a list in memory of files/paths that exist?
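To make the idea in that question concrete, here is a minimal sketch of an in-memory index of existing paths, so requests for non-existent pages can be rejected without touching the disk. It is not a description of any particular webserver's feature; the function names `build_path_index` and `lookup` are hypothetical, and it assumes a static document root whose contents do not change after the index is built.

```python
import os


def build_path_index(root):
    """Walk the document root once and collect the URL paths of real files."""
    index = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Normalise to a URL-style path with forward slashes.
            index.add("/" + rel.replace(os.sep, "/"))
    return index


def lookup(index, url_path):
    """Return 200 for a known path, 404 otherwise, using only the in-memory set."""
    path = url_path.split("?", 1)[0]  # ignore any query string
    return 200 if path in index else 404
```

A set lookup is constant time on average, so a flood of requests for bogus URLs costs only a hash check each. A real deployment would also need to refresh the index when files change, which this sketch leaves out.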
will4274: It's a bit more like a physical business with a "public welcome" policy, like a coffee shop, going viral and then having tens of thousands of people walking in and taking pictures but not buying coffee. It's disruptive, but not illegal. Acme.com is welcome to require authentication for all pages but their home page, which would quickly cause the traffic to drop. They don't want to do this - like the coffee shop, they want to be open to the public, and for good reasons. Sometimes the use profile changes dramatically in a short time. 15 years ago, Netflix created the video streaming market, and shared bandwidth capacity that had been excessive before wasn't enough. 15 years before that, Google did the same thing when they created search and started driving tremendous traffic to text-based websites which had spread through word of mouth before. Turns out the micro transaction people probably had the right idea.
arjie: The only real solution is to put Anubis in front. For me, I just use Cloudflare in front and that suffices. But it's only a few thousand per hour by default. My homeserver can handle that quite well on its own.
dannyobrien: So, I knew Aaron and I definitely would not presume to predict what he would have thought, but I’d point out there is a sizeable state space where he should never have been prosecuted, and scraping by others including large commercial companies should not be prosecutable on the same grounds. I repeat what Aaron’s friends and lawyers said at the time: we were going to fight that case, and we were going to win.
maplethorpe: > Why are corporations allowed to do with impunity what could land even a teenager years in prison? Is there no rule of law anymore? Those laws are intended to protect corporations. If corporations are the ones doing the scraping, it doesn't make sense for the same laws to affect them.
spiderfarmer: I have 6M pages across 8 domains. I have 10 unique IP residential bots per second working hard to scrape every single page.
spiderfarmer: They work because they offer ways to opt out, they honor crawl delay, they let you set ideal scraping times, they support IndexNow, etc. And they give you real, valuable traffic in return.
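As an illustration of the opt-out and crawl-delay behaviour described there, here is a rough sketch of how a well-behaved crawler might consult robots.txt before fetching, using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the robots.txt contents, and the helper `crawl_policy` are all made up for the example. `Crawl-delay` is a widely used extension rather than a guaranteed standard directive, so not every crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is allowed with a delay,
# every other user agent is opted out of the whole site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow: /
"""


def crawl_policy(robots_txt, user_agent, url):
    """Return (allowed, delay_seconds) for this user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)
```

A polite crawler would skip the URL when `allowed` is false and sleep for `delay_seconds` between requests when a delay is given. The complaint in this thread is about bots that do neither.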
spiderfarmer: I have added a DB replica server just to keep my website from succumbing to AI bot traffic.