Analyzing the Interconnectedness of Wikipedia Articles
I Made a Graph of Wikipedia... This Is What I Found
0:00 Intro
The video opens by introducing a graph of Wikipedia articles and the links between them. The creator promises that viewers will come away understanding the graph and knowing some interesting facts about it.
- Video introduces a graph representing Wikipedia articles and their links
- Creator promises viewers will understand the graph and learn interesting information
1:00 Communities
The creator builds a graph of Wikipedia articles from the links between them and lets an algorithm group the articles into communities. The communities reflect subjects such as politics, music, video games, space, and regional politicians. The grouping also reflects societal interests, such as the popularity of Indian and Korean cinema. One community shows a notable link between Canadian people and hockey. Sports articles, by contrast, are spread across several communities rather than gathered in one, which is not how a human would likely categorize them.
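The summary does not name the community-detection algorithm, so the following is only a minimal sketch of one common technique, label propagation, on a tiny made-up graph. The article names, the `links` dictionary, and the deterministic tie-breaking rule are all assumptions for illustration; real implementations usually break ties randomly and run on millions of nodes.

```python
from collections import Counter

# Hypothetical toy data: article -> articles it links to.
links = {
    "film_a": ["film_b", "film_c"],
    "film_b": ["film_c"],
    "film_c": ["game_a"],   # single bridge between the two groups
    "game_a": ["game_b", "game_c"],
    "game_b": ["game_c"],
    "game_c": [],
}

def undirected_neighbors(links):
    """Treat every link as a two-way connection."""
    neighbors = {article: set() for article in links}
    for source, targets in links.items():
        for target in targets:
            neighbors.setdefault(target, set())
            if target != source:
                neighbors[source].add(target)
                neighbors[target].add(source)
    return neighbors

def label_propagation(neighbors, max_rounds=20):
    """Each node repeatedly adopts the most common label among its neighbors."""
    labels = {node: node for node in neighbors}  # start with one label per node
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):
            if not neighbors[node]:
                continue
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            candidates = [label for label, c in counts.items() if c == best]
            # Assumed tie-break: keep the current label if it is tied for best,
            # otherwise take the alphabetically last candidate.
            new_label = labels[node] if labels[node] in candidates else max(candidates)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels

communities = label_propagation(undirected_neighbors(links))
for article, label in sorted(communities.items()):
    print(article, "->", label)
```

On this toy graph the three `film_*` articles end up sharing one label and the three `game_*` articles another, because each group is more densely linked internally than across the single bridge.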
4:07 Popular Articles
This chapter explains how the graph is drawn and what its structure reveals. Each circle's size is proportional to the number of incoming links to the corresponding article, which makes the most referenced articles on Wikipedia easy to spot. The video looks at the impact of specific articles, such as those on World War I, World War II, association football, and the United States, and what they show about how interconnected the content is. It also examines the contribution patterns of different countries to the English Wikipedia, relating the size of each country's dot to the number of links to its article.
- The size of each circle in the graph is proportional to the number of incoming links to its corresponding article.
- The most referenced articles on Wikipedia include those related to World War I, World War II, association football, and the United States.
- Contribution patterns of different countries to the English Wikipedia are examined, revealing a correlation between the size of dots representing countries and the number of links to their corresponding articles.
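To make the "incoming links" measure concrete, here is a small sketch that counts how many other articles link to each article. The `links` data is invented for illustration and does not reflect real Wikipedia link counts; ignoring self-links and duplicate links is also an assumption.

```python
from collections import Counter

# Hypothetical toy data: article -> articles it links to.
links = {
    "World War I": ["World War II", "United States"],
    "World War II": ["United States"],
    "Association football": ["United States"],
    "United States": ["World War II"],
}

def incoming_link_counts(links):
    """Count, for every article, how many distinct other articles link to it."""
    counts = Counter({article: 0 for article in links})
    for source, targets in links.items():
        for target in set(targets):      # count each linking article once
            if target != source:         # assumption: self-links are ignored
                counts[target] += 1
    return counts

counts = incoming_link_counts(links)
most_referenced = counts.most_common(1)[0]
print(most_referenced)
```

In a drawing, each circle's size would then be scaled in proportion to its entry in `counts`.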
7:38 Orphans & Dead Ends
This chapter introduces the Wikipedia race, a game in which participants navigate from one page to another using only internal links, and explains how orphaned articles (those with no incoming links) and dead-end articles (those with no outgoing links) affect both the game and the graphing process. Over 350,000 articles, approximately 5% of all Wikipedia articles, are identified as orphans, while about 6,000 are dead ends. More than 2,000 articles are both orphans and dead ends, which complicates the graphing algorithm.
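As a rough illustration of these two categories, the sketch below finds orphans (no incoming links) and dead ends (no outgoing links) in a tiny link table. The article names and the `links` structure are hypothetical, not the creator's data or code.

```python
# Hypothetical toy data: article -> articles it links to.
links = {
    "Hub": ["Leaf", "Hub2"],
    "Hub2": ["Hub"],
    "Leaf": [],
    "Lonely": ["Hub"],
    "Island": [],
}

def find_orphans_and_dead_ends(links):
    """Return (orphans, dead_ends, both) as sets of article names."""
    linked_to = set()
    for targets in links.values():
        linked_to.update(targets)
    orphans = {article for article in links if article not in linked_to}
    dead_ends = {article for article, targets in links.items() if not targets}
    return orphans, dead_ends, orphans & dead_ends

orphans, dead_ends, both = find_orphans_and_dead_ends(links)
print("orphans:", sorted(orphans))
print("dead ends:", sorted(dead_ends))
print("both:", sorted(both))
```

An article in `both` is invisible to a Wikipedia race: no page leads to it and it leads nowhere.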
10:23 6 Degrees of Wikipedia
The creator shows that most Wikipedia articles are reachable from one another within six degrees of separation. To visualize this, the video starts from a random Wikipedia page and plots the articles reached at each degree of separation. The number of articles reached grows rapidly over the first few degrees and slows after the sixth, with about 92% of all articles reachable within seven or eight degrees. A small percentage of articles are unreachable; some are orphans and others form orphan groups.
- Most articles on Wikipedia are reachable within six degrees of separation
- Rapid growth in the number of reached articles in the first few degrees, followed by a slowdown
- Approximately 92% of all articles are reachable within seven or eight degrees
- A small percentage of articles are unreachable, including orphans and orphan groups
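The degree-by-degree expansion described above corresponds to a breadth-first search over the link graph. The following is a minimal sketch on invented data; the article names and the `links` dictionary are assumptions, and the summary does not say how the creator actually implemented this step.

```python
from collections import deque

# Hypothetical toy data: article -> articles it links to.
links = {
    "Start": ["A", "B"],
    "A": ["C"],
    "B": ["C", "D"],
    "C": [],
    "D": ["E"],
    "E": [],
    "Orphan": ["Start"],   # links out, but nothing links to it
}

def degrees_from(links, start):
    """Breadth-first search: map each reachable article to its degree of separation."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        article = queue.popleft()
        for target in links.get(article, []):
            if target not in distance:
                distance[target] = distance[article] + 1
                queue.append(target)
    return distance

distance = degrees_from(links, "Start")

# Number of articles first reached at each degree.
per_degree = {}
for d in distance.values():
    per_degree[d] = per_degree.get(d, 0) + 1

reachable_share = len(distance) / len(links)
print(per_degree)
print(f"{reachable_share:.0%} of articles reachable")
```

Here "Orphan" is never reached because no article links to it, which is the same reason a share of real articles stays unreachable no matter how many degrees are explored.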
14:56 Longest Path on Wikipedia
The creator measures path lengths between Wikipedia articles. On average, the path between two articles is 4.8 links long, and about 8% of articles are unreachable from the main graph. Paths shorter than three links or longer than eight are extremely rare. The longest path found is 166 links long and connects the article on athletics at the 1953 Arab Games to a list of highways numbered 999. The creator describes this path as rare and tedious to navigate, much like an actual highway.
- Study findings: average path length between articles is 4.8 links
- Rare paths: paths shorter than three links or longer than eight were extremely rare
- Longest path found: 166 links long, connecting specific Wikipedia articles
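One way to obtain figures like these is to compute the shortest path between every ordered pair of articles and then take the average and the maximum. The sketch below does that on a four-article toy chain. It assumes that "longest path" means the longest of the shortest paths, which the summary does not state explicitly, and the data and function names are hypothetical.

```python
from collections import deque

# Hypothetical toy data: a simple chain A -> B -> C -> D.
links = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}

def shortest_lengths_from(links, start):
    """Breadth-first search giving the shortest link count to each reachable article."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        article = queue.popleft()
        for target in links.get(article, []):
            if target not in distance:
                distance[target] = distance[article] + 1
                queue.append(target)
    return distance

def path_statistics(links):
    """Return (average shortest path length, (longest length, source, target))."""
    lengths = []
    longest = (0, None, None)
    for source in links:
        for target, d in shortest_lengths_from(links, source).items():
            if target == source:
                continue
            lengths.append(d)
            if d > longest[0]:
                longest = (d, source, target)
    average = sum(lengths) / len(lengths) if lengths else 0.0
    return average, longest

average, longest = path_statistics(links)
print(f"average: {average:.2f}, longest: {longest}")
```

Unreachable pairs are simply left out of the average here. Running one search per article is fine for a toy graph but would need sampling or a more careful approach at Wikipedia's scale.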
17:06 FANTA CAKE
This chapter covers an unusual Wikipedia article, "Fanta cake", which was initially a disguised dead-end orphan page: it appeared to have a link, but that link pointed back to the article itself, and it had no other connections. The creator uses it to highlight the dynamic nature of Wikipedia, which is constantly evolving because anyone can contribute and update information.
- Discussion of a unique Wikipedia article called "Fanta cake"
- Explanation of a disguised dead-end orphan page
- Emphasis on the dynamic nature of Wikipedia
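A page like this slips past a naive orphan or dead-end check because its self-link counts as both an outgoing and an incoming link. The sketch below shows one possible way to detect such pages by ignoring self-links. Apart from "Fanta cake", the article names are invented, and the function is an illustration rather than the creator's actual check.

```python
# Hypothetical toy data: article -> articles it links to.
links = {
    "Fanta cake": ["Fanta cake"],   # only link points back to itself
    "Cake": ["Baking"],
    "Baking": [],
}

def is_disguised_dead_end_orphan(links, article):
    """True if the article has links only to and from itself."""
    outgoing = set(links.get(article, []))
    incoming = {source for source, targets in links.items() if article in targets}
    looks_connected = bool(outgoing) or bool(incoming)
    # Ignore self-links to see the article's real connections.
    real_outgoing = outgoing - {article}
    real_incoming = incoming - {article}
    return looks_connected and not real_outgoing and not real_incoming

disguised = [a for a in links if is_disguised_dead_end_orphan(links, a)]
print(disguised)
```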
19:20 Outro
The creator closes by thanking their GitHub sponsors, whose support for the channel makes videos like this one possible. The creator also encourages viewers to subscribe and like the video, noting that doing so helps the channel.