Show HN: Large Scale Article Extract of Newspapers 1730s-1960s

zzleeper

Looks cool, congrats!

I've also worked with this data, but only for research purposes:

https://www.finhist.com/bank-runs/episodes/13895.html

https://www.finhist.com/bank-runs/index.html

Surprisingly, I found that layout was the trickiest part, since newspaper articles often have multiple levels of headers, span multiple columns, and so on.

Do you have a preferred solution on that?

brettnbutter

A few examples you can click on without having to authenticate or start the free trial (if you do try it, no credit card is needed, and I won't bother you or chase you with spam):

https://snewpapers.com/components/b2d40c08-db63-40e8-890f-09...

https://snewpapers.com/components/0fabc8e4-a60b-4f31-9ad1-b0...

https://snewpapers.com/components/cdde790f-4e97-4f2d-a2c2-95...

benwills

As someone who has done a lot of downloading/parsing, this is so awesome and impressive to see.

One thing to think about, which I also struggle with when it comes to large and complicated datasets, is the UI. Even after a long time in the search industry, I find it difficult to see concretely how I would use this.

I'd suggest taking a small sample of the dataset that reflects how people might use it, then making that segment public and immediately searchable without registering, e.g., one year of articles related to the Olympics.

What I've found is that it's hard for a lot of people to imagine how they would use something without actually using it. So giving people the actual experience of searching the archive and interacting with the results would go a long way.

Again, congrats. This is really impressive work.
