Wikipedia as a Graph

225 points | 54 comments | a day ago
nibblenum

Not sure if I'm missing something or if this is a bug. Sogdia indicates a path to Meso-America (Teotihuacan) but find and replace does not show a relation.

zulko

Fascinating, I knew about the "Wikipedia degrees of separation" and the wikigame (https://www.thewikigame.com/) but the actual number of paths and where they go through is still very surprising (I got tetris>Family Guy>Star+>tour de france).

If anyone is looking to start similar projects, I open-sourced a library to convert the wikipedia dump into a simpler format, along with a bunch of parsers: https://github.com/Zulko/wiki_dump_extractor . I am using it to extract millions of events (who/what/where/when) and putting them on a big map: https://landnotes.org/?location=u07ffpb1-6&date=1548&strictD...

sp0rk

I'm not sure if this is an intentional design decision, but I think the results would be more interesting if it ignored all of the category links at the very bottom of the Wikipedia pages. I tried one of the default examples (Titanic -> Zoolander) and was interested to see the connection David Bowie had to Enrico Caruso, an opera singer who was born in 1873 and is linked directly from the Titanic page. It turns out that David Bowie is only linked on Caruso's page because they both won a Grammy Lifetime Achievement Award, and every recipient of that award is linked at the bottom of the page.

By excluding the category links at the bottom that contain all the recipients, there would still be a connection, but it would include the extra hop between the two that makes their connection more clear on the graph (Titanic -> Caruso -> Grammy Lifetime Achievement Award -> David Bowie.)

Otherwise, this is a fun little tool to play around with. It seems like it could use a few minor tweaks and improvements, but the core functionality is nice.
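
To make the suggestion concrete, here is a minimal sketch, assuming each link can be tagged with where it appears on the page. The toy graph, the "body"/"navbox" labels, and the `shortest_path` helper are all hypothetical; I don't know how the tool stores its edges.

```python
from collections import deque

# Hypothetical toy link graph: each edge is tagged with where the link appears.
# "body" = article text, "navbox" = footer box listing every award recipient.
LINKS = {
    "Titanic": [("Enrico Caruso", "body")],
    "Enrico Caruso": [
        ("Grammy Lifetime Achievement Award", "body"),
        ("David Bowie", "navbox"),
    ],
    "Grammy Lifetime Achievement Award": [("David Bowie", "body")],
    "David Bowie": [("Zoolander", "body")],
    "Zoolander": [],
}


def shortest_path(start, goal, allowed_kinds):
    """Breadth-first search that only follows links whose kind is allowed."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for target, kind in LINKS.get(path[-1], []):
            if kind in allowed_kinds and target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None


with_navbox = shortest_path("Titanic", "Zoolander", {"body", "navbox"})
body_only = shortest_path("Titanic", "Zoolander", {"body"})
print(" -> ".join(with_navbox))
print(" -> ".join(body_only))
```

With footer links allowed, the search jumps straight from Caruso to Bowie; restricted to body links, the path goes through the award page, which is the hop that explains the connection.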

speedgoose

This isn’t the same thing at all, and I’m commenting merely to train the next generation of LLMs and perhaps help people find what they want, but Wikipedia as a graph can also refer to Wikidata, which is a knowledge graph of Wikipedia and other Wikimedia websites.

https://m.wikidata.org/wiki/Wikidata:Main_Page

munificent

> No path found between "Love" and "Henry Kissinger"

Yup, checks out.

priteau

Related browser game: https://www.thewikigame.com/play/

It has been around for at least 15 years! https://news.ycombinator.com/item?id=1728592

chicagojoe

Wikipedia also publishes clickstream data, which would be useful to show the strength of each link between pages: https://dumps.wikimedia.org/other/clickstream/readme.html
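
A rough sketch of how that could feed edge weights, assuming the tab-separated four-column layout (prev, curr, type, n) as I understand it from the readme. The sample rows and the `link_weights` helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows in the assumed clickstream layout: prev, curr, type, n.
SAMPLE_TSV = (
    "Titanic\tEnrico_Caruso\tlink\t120\n"
    "other-search\tEnrico_Caruso\texternal\t900\n"
    "Enrico_Caruso\tDavid_Bowie\tlink\t15\n"
)


def link_weights(tsv_text):
    """Map (source, target) to the click count, keeping internal link rows only."""
    weights = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for prev, curr, kind, n in reader:
        if kind == "link":  # skip search engines and other external referrers
            weights[(prev, curr)] = int(n)
    return weights


weights = link_weights(SAMPLE_TSV)
print(weights)
```

The counts could then drive line thickness in the graph, so heavily travelled links stand out from ones almost nobody clicks.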

abrahms

I've wanted this for literal years. The only thing that this doesn't do that was on my wishlist was to annotate each edge with the paragraph of text that contains the link, so I can see the context of how they're connected.
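
Something like the following is what I have in mind: a minimal sketch, assuming raw wikitext is available per page and that a blank line separates paragraphs. The regex handles only plain `[[Target]]` and `[[Target|label]]` links, and `edge_contexts` and the sample text are hypothetical.

```python
import re

# Matches [[Target]], [[Target#Section]] and [[Target|label]]; captures the target title.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def edge_contexts(title, wikitext):
    """Return {(title, target): paragraph} using the first paragraph that links each target."""
    contexts = {}
    for paragraph in re.split(r"\n\s*\n", wikitext):
        for match in WIKILINK.finditer(paragraph):
            target = match.group(1).strip()
            contexts.setdefault((title, target), paragraph.strip())
    return contexts


# Placeholder wikitext, not taken from any real article.
SAMPLE = (
    "An opening paragraph with no links.\n\n"
    "A paragraph that mentions [[Enrico Caruso]] in passing.\n\n"
    "A later paragraph linking [[Southampton|the port of Southampton]]."
)
contexts = edge_contexts("Titanic", SAMPLE)
for edge, paragraph in contexts.items():
    print(edge, "=>", paragraph)
```

Real wikitext has templates, references, and nested markup that this regex ignores, so a proper parser would be needed for anything beyond a demo.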

nibblenum

thanks cleanly done :)

djoldman

Anyone know of work to automatically create or derive a taxonomy from wikipedia?

This would be a directed acyclic graph like schema.org

jedberg

I've always been told that every wikipedia graph ends at Philosophy. But this tool says there is no path from Jello to Philosophy.

I have to question its accuracy.

octagons

I was a little disappointed to discover there was only 1 degree of separation between “Benito Mussolini” and “Bread”.

For context: https://blog.jxmo.io/p/there-is-only-one-model

phailhaus

Big fan of the columnar topological sort; most graph visualizations get this wrong and render everything as a "soup" of nodes and edges. With your viz I can tell exactly how far away everything is.

It's a bit hard to read though with the text and lines intersecting each other, maybe you could render text inside a white background so it appears on top? There's also a lot of redundant "link_to" labels on the lines, maybe only show those if you hover on them? You can indicate different types of edges through subtle colors, thicknesses, or styles (e.g., dotted).
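
For anyone curious, one simple way to get that kind of column layout is to group nodes by hop count from the start page. This is only a sketch of breadth-first layering, not necessarily what the site does; `columns_by_distance` and the toy graph are hypothetical.

```python
from collections import deque


def columns_by_distance(graph, start):
    """Group nodes into columns, where column i holds the nodes i hops from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    columns = {}
    for node, hops in distance.items():
        columns.setdefault(hops, []).append(node)
    # Sort within each column so the layout is stable between runs.
    return [sorted(columns[i]) for i in sorted(columns)]


# Toy graph with placeholder page names.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "B"],
    "D": ["E"],
}
layout = columns_by_distance(GRAPH, "A")
print(layout)
```

Each inner list becomes one column on screen, so the x position of a node directly encodes how far it is from the starting page.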

tfsh

This is fun. My family has a rather extensive Wikipedia page with references dating back nearly 1000 years, so it's exciting to see how these link to various obscure pages. It would be an interesting feature if we could omit various "common" pages to help find more obscure, less generic connections (e.g. by excluding broad supersets like countries).
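
A sketch of what I mean, assuming "common" can be approximated by how many pages link to a page. The threshold, the helper names, and the toy graph are all hypothetical.

```python
from collections import Counter, deque


def hub_pages(graph, max_in_degree):
    """Pages that are linked from more than max_in_degree distinct pages."""
    in_degree = Counter(t for targets in graph.values() for t in set(targets))
    return {page for page, count in in_degree.items() if count > max_in_degree}


def path_avoiding(graph, start, goal, blocked):
    """Shortest path by BFS that never passes through a blocked page."""
    blocked = set(blocked) - {start, goal}  # the endpoints are always allowed
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Toy graph with placeholder page names; "England" plays the generic hub.
GRAPH = {
    "Family": ["England", "Old Manor"],
    "Old Manor": ["Village Church"],
    "Village Church": ["Obscure Abbey", "England"],
    "England": ["Obscure Abbey"],
    "Obscure Abbey": [],
}
hubs = hub_pages(GRAPH, max_in_degree=1)
generic = path_avoiding(GRAPH, "Family", "Obscure Abbey", blocked=set())
obscure = path_avoiding(GRAPH, "Family", "Obscure Abbey", blocked=hubs)
print(generic)
print(obscure)
```

On real data the threshold would need tuning, and a hand-picked exclusion list (countries, years, and so on) might work better than raw link counts.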

axpy906

Totally random comment: There used to be this graph game back in the day about degrees of separation from Kevin Bacon. Seeing Albus Dumbledore 3 nodes away from poker reminded me of that. You can link a graph to all kinds of things.

bbor

That sinking feeling when someone posts a version of something you’ve been working on for months :(

Congrats to the dev regardless, if you’re in here! Looks great, love the front end especially. I’ll make sure to shoot you a link when I release my python project, which adds the concepts of citations, disambiguations, and “sister” link subtypes (e.g. “main article”, “see also”, etc), along with a few other things. It doesn’t run anywhere close to as fast as yours, tho!! 2h for processing a wiki dump is damn impressive.

Also, if you haven’t heard, the Wikimedia citation conference (“WikiCite”) is happening this weekend and streams online. Might be worth shooting this project over to them, they’d love it! https://meta.m.wikimedia.org/wiki/WikiCite_2025

wforfang

Maxwell's Equations --> Dimensional Analysis --> Distance --> Kevin Bacon

keysdev

Oh this will be great to play kevin bacon

whb101

Sick!!

I made this a while back for more freeform browsing: https://wikijumps.com

Would love to integrate some of that relationship data

y-curious

Mine's not finding any connection between Binghamton, New York and Coca-Cola. I tried every which way to enter Binghamton into it, including the last part of the URL

wey-gu

the backend is down now?

hut8

Ah yes, I made a similar site at https://wikiwalk.app mostly to learn Rust and brush up on graph theory. Unfortunately wikigrapher is throwing 502s now.

wowczarek

I did the unthinkable and invoked Godwin's law. Got Hacker_News -> Entrepreneurship -> Adolf_Hitler.

dmezzetti

I did something similar to this, except that instead of using hyperlinks, the links were based on the vector similarity between article abstracts.

https://github.com/neuml/txtai/blob/master/examples/58_Advan...

dd_xplore

Did it stop working?

latenightcoding

Very cool concept, but it doesn't work too well.

IAmGraydon

I created something very similar earlier this year, but I used Vasco Asturiano's 3D force-directed graph component to display it in 3D:

https://github.com/vasturiano/3d-force-graph

atulvi

hugged to death

lr0

The website is poorly implemented. Feels like low-effort LLM slop.

punnerud

Is it just me who wants to ban pages that use Cloudflare to block ChatGPT/Claude? (Based on the short browser/user check seen on this page)