Very, very cool. Hats off. I've considered attempting a more limited form of this for years.
For those who don't know, the 1911 Britannica is heralded for several reasons (and rightly criticized for regrettable others), but the most well-known is that it was the last encyclopedia before The Great War, and hence had a good amount of steam/optimism coming from the first and second industrial revolutions and the "Progressive Era", not sullied yet by thoughts of "the war to end all wars".
A question/idea for nice-to-haves, most respectfully. I don't know if it would be feasible. It's probably perfect as it is, simply linking to the image-page in unobtrusive text for each section. But I would love an option (emphasis on option) to see the text side by side with the page images. That parallel view would load all of the page images on the same page as the full article text. That way, I could "confirm" or "fact check" the faithfulness of the OCR, and also see the beautiful printing, at once, without opening each page separately and managing the images/windows myself. Most likely, I would use the site to jump to the articles, and read them mainly as images, only switching to the text form to verify what something said, or to copy-paste cleanly, etc. (As it is, initially, I thought I read the original images were available, but had to visit the page three (3!) times before finding where the side-links to them were.) Maybe thumbnails could be a middle-ground option (again, optional) for salience.
Very, very well done. And it's fast!
ahaspel
I rebuilt the 1911 Encyclopædia Britannica into a clean, structured, navigable site:
– ~37k articles reconstructed from the original volumes
– section-level structure (contents are clickable within articles)
– cross-references extracted and linked
– contributors indexed and searchable
– original volume + page references preserved and shown while reading
– links to the original scans for each page
– ancillary material included (prefaces, abbreviations, etc.)
– topic index reproduced and cross-linked
– full-text search with article metadata (length, volume, etc.)
Most of the work was in parsing and reconstruction: headings, multi-page articles, tables, math, languages, footnotes, plates, and all the small edge cases that come up in a work like this.
The goal was to make something that feels like the original, but is actually usable.
I’d especially appreciate feedback on:
– search quality
– navigation (sections, cross-references)
– anything that looks structurally off
Happy to answer questions about the pipeline or data model.
neonscribe
You can discover beliefs that are shocking today, such as this excerpt from the article "Adolescence":
"In the case of girls, let them run, leap and climb with their brothers for the first twelve years or so of life. But as puberty approaches, with all the change, stress and strain dependent thereon, their lives should be appropriately modified. Rest should be enforced during the menstrual periods of these earlier years, and milder, more graduated exercise taken at other times. In the same way all mental strain should be diminished. Instead of pressure being put on a girl’s intellectual education at about this time, as is too often the case, the time devoted to school and books should be diminished. Education should be on broader, more fundamental lines, and much time should be passed in the open air."
spudlyo
I'm curious how the information is structured under the hood. I just recently learned about how folks in the digital humanities use the XML-TEI format for semantic markup of works like this. I've recently been exploring the Latin-English Lewis & Short dictionary encoded in XML-TEI.
I've had a ton of fun learning about BaseX and XQuery to ask questions like "Which classical authors are responsible for writing words that appear only once in the entire corpus (hapax legomena)?" or "What are the longest hapax words?" (usually the funniest ones) and that kind of thing. Shout out to Tufts University for making this available!
I would love to be able to load the 1911 Britannica into BaseX and see what interesting things I could learn about it via XQuery!
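For anyone curious what a hapax query boils down to, here is a minimal sketch in plain Python rather than BaseX/XQuery. It assumes the citations have already been pulled out of the TEI markup as (author, word) pairs; the `CITATIONS` data, the author labels, and both function names are made up for illustration and are not taken from Lewis & Short.

```python
from collections import Counter

# Hypothetical toy data: (author, word) pairs standing in for the
# attributed citations a TEI-encoded dictionary would provide.
CITATIONS = [
    ("author_a", "res"),
    ("author_b", "res"),
    ("author_a", "longissimum"),
    ("author_b", "brevis"),
    ("author_b", "res"),
]


def hapax_by_author(citations):
    """Map each author to the words that occur exactly once in the corpus."""
    counts = Counter(word for _, word in citations)
    result = {}
    for author, word in citations:
        if counts[word] == 1:
            result.setdefault(author, []).append(word)
    return result


def longest_hapax(citations, n=3):
    """Return up to n hapax words, longest first (ties broken alphabetically)."""
    counts = Counter(word for _, word in citations)
    hapax = [word for word, count in counts.items() if count == 1]
    return sorted(hapax, key=lambda w: (-len(w), w))[:n]


print(hapax_by_author(CITATIONS))
print(longest_hapax(CITATIONS, n=1))
```

A real run would need tokenization and lemmatization first, which is where an XML database with the TEI structure intact has the advantage.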
bentley
The first article I looked up was New Mexico, because I knew, as does anyone familiar with New Mexico history, that it became a state in January 1912 (before which it was a territory). Arizona also became a state, in February. I was surprised to find both described as states of the United States in this 1911 encyclopedia. I suppose the editors just made a confident guess? The last sentence of both articles is, “In June 1910 the President approved an enabling act providing for the admission of Arizona and New Mexico as separate states.”
shantara
Interesting how different both the tone and the structure of the articles are compared to the modern texts.
Take the article about Copenhagen as an example: https://britannica11.org/article/07-0111-copenhagen/copenhag...
The geography and key points of interest are described very accurately, but the authors aren’t shy about inserting emotionally charged adjectives and personal opinions on what they consider interesting or curious. Also, the huge portion about the Battle of Copenhagen at the bottom is a complete departure and shifts the genre from a geographical description to a shot-by-shot narration of a naval battle.
robin_reala
A seriously trivial bug report, but the font you’ve chosen doesn’t support ℔, making articles like https://britannica11.org/article/22-0688-s2/putting_the_shot look odd. Potentially might be worth rewriting ℔ to a more normal (these days) lb?
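A rough sketch of the rewrite suggested above, assuming it would run as a text post-processing step before rendering. The function name and the default replacement are hypothetical, and nothing here reflects how the site's pipeline actually works.

```python
# U+2114 is the Unicode "L B BAR SYMBOL" (℔), which many fonts lack.
LB_BAR = "\u2114"


def replace_lb_bar(text: str, replacement: str = "lb") -> str:
    """Swap every ℔ for a plain-text replacement; other text is untouched."""
    return text.replace(LB_BAR, replacement)


print(replace_lb_bar("a weight of 16 \u2114"))
```

Adding a fallback font that does include the glyph would be the alternative if keeping the original typography matters more than matching modern usage.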
rustyhancock
I spent ages trying to work out if it would be possible to find a copy of the 2021 Encarta or Britannica.
Pre-LLM and post-COVID, it's perhaps the best we can hope for before AI taints all the info.
One of my prized possessions as a child was a CD-ROM-based encyclopedia (well before the internet was common). I don't know why I liked it so much, but on a rainy afternoon I'd kick up some of my favourite articles and read and learn more from them.
doctor_blood
Small world - I'm currently cleaning up scans of the EB 9th edition to put it online as a mediawiki site; I'm including all the illustrations and plates so I'm only a third of the way through.
I've been testing different OCR tools and so far I've been the most impressed with PaddleOCR - it correctly split the text columns, labeled the illustrations, and noted the margin text.
Still, it's not perfect, so I'm having to hand-edit some tables. I plan to put the source pages online as well so you can switch between the scanned page and the electronic text.
entrepy123
Bravo. People who like the 1911 Encyclopedia Britannica might like https://OldEncyc.com to dig into the volumes (by letter range) of 22 editions of old encyclopedias dated 1728-1926 (though not searchable like the OP).
sammy2255
This encyclopedia is racist:
Mentally the negro is inferior to the white, The remark of F. Manetta, made after a long study of the negro in America, may be taken as generally true of the whole race: “the negro children were sharp, intelligent and full of vivacity, but on approaching the adult period a gradual change set in. The intellect seemed to become clouded, animation giving place to a sort of lethargy, briskness yielding to indolence.
ternaryoperator
I have the hard copy of this edition and it does contain some curious things.
For example, if you look up "boiling," you might expect to read about what happens to a liquid when it's heated to a certain temperature, or perhaps a way of cooking foods, or sterilizing equipment. But the entry covers none of those. Instead, the only entry for boiling describes a punishment for persons convicted of poisoning who were, in England, dipped into a large cauldron of boiling water.
And, in the ways that violence and torture were wantonly reveled in centuries ago, they wouldn't just submerge the criminal and let him die there. Instead, they would lower him into the boiling water for a while and then pull him out. They'd repeat the process until they finally killed him. That is the EB 11th ed. entry for boiling. Yow!
lkm0
A very simple addition that makes casual browsing much more fun is to add a menu with adjacent articles, as is done in this reconstruction of Littré's 19th-century French dictionary: https://www.littre.org/ (see mots voisins)
Aardwolf
Very neat!
Some bugs I noticed:
Searching for Zurich takes you to the article for the canton of Zurich, not the city. Clicking the link "Zürich (city)" inside this article opens the same canton article again, rather than the actual article for the city.
When viewing an article, the search for articles (leftmost search box) doesn't seem to work at all for me (in Firefox). On the main page, it does work.
There's a small clickable 'home' button on the right, but muscle memory from how other websites work makes me expect that clicking the big title "Encyclopædia Britannica, 11th Edition" on the top left also goes to home.
golem14
It's very insightful to look up fission, fusion, atom and find yourself ... definitely before the great war.
As a time travel machine for the mind, this is great!
It would also be an invaluable resource for any Dungeon Master aspiring to lead a campaign at the end of the 19th century (Sherlock Holmes, or PG Wodehouse style, as it were), as doubtless many here are ...
keane
Beautiful work! This is an amazing resource to have online. Reminds me a little of greensdictofslang.com or of Webster’s 1913, a perennial HN favorite: https://news.ycombinator.com/item?id=29733648
yodon
The most important entry I found in my physical copy of the 1911 Britannica is for Eavesdropping[0], detailing the historical origins of the term and how it was thought about just before our modern era.
> Though the offence of eavesdropping still exists at common law, there is no modern instance of a prosecution or indictment.
Thanks for posting this resource, I've often wanted to share a link to this and other entries.
Some parts are ... amusing to read. For example the article on stars [0]...
"anything approaching a uniform distribution of the stars cannot extend indefinitely. It can be shown that, if the density of distribution of the stars through infinite space is nowhere less than a certain limit (which may be as small as we please), the total amount of light received from them (assuming that there is no absorption of light in space) would be infinitely great, so that the background of the sky would shine with a dazzling brilliancy ...."
This is good. I picked up a copy of the Encyclopaedia Britannica from 1973 and quite enjoy browsing that rather than the internet. The articles seem well written, and as mentioned here, you have the fact and the history and everything all mixed in to some articles, and it's super interesting.
I highly recommend getting an old set of volumes.
orsenthil
Love this. I couldn't have imagined the quality of this Encyclopædia in the form that you have presented. Plus, the contributors! I love the human race.
indigodaddy
Just as a random data point, I searched for Genghis and nothing came up. Was there not much knowledge on Genghis Khan in 1911 I wonder?
hax0ron3
Nice. Reading old books is a great way to be exposed to ways of thinking that have fallen out of fashion - some for (in my opinion) good reason, such as having been discovered to be incorrect or genuinely immoral, some for (in my opinion) bad reason, such as having become "politically incorrect", and some simply because they were forgotten.
But whatever the reason is why the ideas have fallen out of fashion, it can broaden the mind to encounter them.
peterldowns
I've been meaning to build ~exactly this experience, but for the 1952 Encyclopaedia Britannica Great Books of the Western World collection and its experimental index Syntopicon [0]. Would love to know more about how you OCR'd or otherwise ingested and parsed the raw material. I have a physical copy of the books, and I found some samizdat raw-image scans and started working on a custom OCR pipeline, but I'm wondering if maybe I could learn from your approach...
Reading medical texts from 1911 is a great way to see how far psychiatry has advanced. At the time, there was a widespread medical and societal belief that masturbation was harmful to physical and mental health. https://britannica11.org/article/14-0628-insanity/insanity?q...
zeckalpha
Note that the subsequent 12th edition (1922) may be in public domain in your jurisdiction.
I wanted to let everyone know that article search from articles is now working properly again. A path problem. Apologies.
Quitschquat
Read the sections on nebulae, since this book predates the discovery of galaxies.
throw253245235
Interesting that the articles on Euler and Gauss are so much shorter than the ones on Kant and Schopenhauer. I guess the authors of the Britannica were not very interested in mathematics.
bronlund
Interesting article about aether in there :)
ahmedfromtunis
No entry on the Great War? Really?!!!
Just kidding, of course. This is incredible and surprisingly nostalgic. Reading some of the entries took me right back to being a kid huddled in my room for hours poring over an encyclopedia or even the dictionary.
And I still vividly remember the rush of installing Encarta for the first time on the family PC.
I couldn't believe that I, a mere kid, now had access to iconic historical footage that I could watch anytime I felt like it. I can't describe how amazingly cool that felt at the time! It still gives me a hit of endorphins when I remember it today.
shevy-java
Already better than all AI wikipedias.
SilentM68
Do you have access to the original 1958 Edition of The Encyclopedia Americana Volume 2?
Just to confirm if this is real or Memorex or just another hoax?
I tried https://britannica11.org with an article chosen at random, Portuguese East Africa, and it quickly found and displayed it: https://britannica11.org/article/22-0177-portuguese-east-afr...
https://britannica11.org/
[0] https://britannica11.org/article/08-0867-eavesdrip/eavesdrip...
[0] https://britannica11.org/article/25-0806-star/star#section-1...
[0] https://en.wikipedia.org/wiki/A_Syntopicon
Excellent resource. Small bug to report: the table here is broken (BANTU NEGROIDS section) https://britannica11.org/article/01-0358-africa/africa#secti.... It's quite fascinating to read what they thought about Africans as an African.
https://imgbox.com/f7MDjbKs
Now someone please revive Microsoft Encarta ...
Please with the beige serif-font vibecoded sites......