Discussion
keane: Beautiful work! This is an amazing resource to have online. Reminds me a little of greensdictofslang.com or of Webster’s 1913, a perennial HN favorite: https://news.ycombinator.com/item?id=29733648
yodon: The most important entry I found in my physical copy of the 1911 Britannica is for Eavesdropping[0], detailing the historical origins of the term and how it was thought about just before our modern era.
> Though the offence of eavesdropping still exists at common law, there is no modern instance of a prosecution or indictment.
Thanks for posting this resource, I've often wanted to share a link to this and other entries.
[0] https://britannica11.org/article/08-0867-eavesdrip/eavesdrip...
robin_reala: A seriously trivial bug report, but the font you’ve chosen doesn’t support ℔, making articles like https://britannica11.org/article/22-0688-s2/putting_the_shot look odd. It might be worth rewriting ℔ to the more normal (these days) lb?
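A minimal sketch of the substitution suggested above, assuming the article text is available as a plain Python string. The function name `replace_lb_bar` is hypothetical and not part of the site's actual pipeline, which is not described in this thread.

```python
# Hypothetical helper, not the site's actual code.
# U+2114 is the Unicode "L B BAR SYMBOL" (℔), an old abbreviation for pounds.
LB_BAR = "\u2114"


def replace_lb_bar(text: str) -> str:
    """Return text with every ℔ replaced by the plain abbreviation 'lb'."""
    return text.replace(LB_BAR, "lb")


if __name__ == "__main__":
    print(replace_lb_bar("a shot weighing 16 \u2114"))
```

A plain string replacement like this leaves the surrounding spacing untouched. Whether to normalise the stored text or only the rendered page is a separate design choice.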
ahaspel: I rebuilt the 1911 Encyclopædia Britannica into a clean, structured, navigable site: https://britannica11.org/
What it does:
– ~37k articles reconstructed from the original volumes
– section-level structure (contents are clickable within articles)
– cross-references extracted and linked
– contributors indexed and searchable
– original volume + page references preserved and shown while reading
– links to the original scans for each page
– ancillary material included (prefaces, abbreviations, etc.)
– topic index reproduced and cross-linked
– full-text search with article metadata (length, volume, etc.)
Most of the work was in parsing and reconstruction: headings, multi-page articles, tables, math, languages, footnotes, plates, and all the small edge cases that come up in a work like this.
The goal was to make something that feels like the original, but is actually usable.
I’d especially appreciate feedback on:
– search quality
– navigation (sections, cross-references)
– anything that looks structurally off
Happy to answer questions about the pipeline or data model.
gnerd00: A legal terms question here too: several major world economies operate under very different rules regarding datasets and publication rights. I am in the USA (California). Will there be terms for me, given that I am not a giant deep-pockets FAANG, just a book person? Commercial-use terms at "small business" scale?
ahaspel: The 1911 text itself is public domain, so anyone is free to use it. What I’ve built here is a structured edition: the parsing, reconstruction, linking, indexing, etc. I haven’t published a formal license for that yet. For casual or small-scale use there’s no issue at all. For bulk use (e.g. dataset / training / redistribution), I’d prefer people get in touch so I can figure out a sensible way to support that.
ahmedfromtunis: No entry on the Great War? Really?!!! Just kidding, of course. This is incredible and surprisingly nostalgic. Reading some of the entries took me right back to being a kid huddled in my room for hours poring over an encyclopedia or even the dictionary. And I still vividly remember the rush of installing Encarta for the first time on the family PC. I couldn't believe that I, a mere kid, now had access to iconic historical footage that I could watch anytime I felt like it. I can't describe how amazingly cool that felt at the time! It still gives me a hit of endorphins when I remember it today.
rustyhancock: I spent ages trying to work out if it would be possible to find a copy of the 2021 Encarta or Britannica. Pre-LLM and post-COVID, that is perhaps the best we can hope for before AI taints all the info. One of my prized possessions as a child was a CD-ROM-based encyclopedia (well before the internet was common). I don't know why I liked it so much, but on a rainy afternoon I'd kick up some of my favourite articles and read and learn more from them.
ahaspel: I know exactly what you mean; I had the same experience with CD-ROM encyclopedias. There’s something about just browsing and falling into articles that’s hard to replicate. Part of the motivation here was to bring that kind of exploration back, but with the original 1911 text and structure.
pawsocks: Do you happen to use a language model to translate or format your comments?
ahaspel: Just me. I spent a lot of time thinking about this, so I like talking about it.
neonscribe: You can discover beliefs that are shocking today, such as this excerpt from the article "Adolescence": "In the case of girls, let them run, leap and climb with their brothers for the first twelve years or so of life. But as puberty approaches, with all the change, stress and strain dependent thereon, their lives should be appropriately modified. Rest should be enforced during the menstrual periods of these earlier years, and milder, more graduated exercise taken at other times. In the same way all mental strain should be diminished. Instead of pressure being put on a girl’s intellectual education at about this time, as is too often the case, the time devoted to school and books should be diminished. Education should be on broader, more fundamental lines, and much time should be passed in the open air."
zozbot234: You can nowadays paste the text from pretty much anything that's in the public domain into a near-SOTA LLM such as Kimi or GLM and it will give you a pretty nice summary of what it's about in modern language (Extremely useful: the LLM tendency to go overboard on formatting nicely balances out the wall-of-text format from historical publications, which was aimed at saving paper and minimizing manual layout effort), and then gladly tell you about all the things in the historical text that would be absolutely beyond the pale today. (Sometimes you have to nudge it by prompting "How would this text be received today?" or something like it after it has put its nice summary in context, but once you do that it tends to be quite thorough.)
smallerize: You didn't really explain what that does for you. Why do you paste it into an LLM?
quamserena: You can also read the text yourself and draw your own conclusions...