Methods for Finding Related Reddit Subreddits with Simple Set Theory (2024)

I recently wrote a post on how to visualize network graphs of Reddit subreddits.

One of the reasons I’ve been researching the topic is to find a good way to facilitate discovery of lesser-known subreddits, as Reddit is doing a terrible job at it (although they have been trying a few new experiments very recently). As it turns out, invoking graph theory is overkill. Even fancy machine learning approaches like collaborative filtering, while powerful, may not be required to help Redditors discover new things.

Let’s say we have two sets: Set A, the set of active users in a given subreddit, and Set B, the set of active users in another subreddit. The intersection of Sets A and B (A ∩ B) represents users who are active in both subreddits.

Using BigQuery, I can get the comment data from ALL public Reddit subreddits; this technique would not work well with any smaller subset. The network graph edgelist, obtained as described in my previous post, conveniently gives (A ∩ B): it contains the number of shared active users for every pair of subreddits (defining “active users” as users who have made a comment in at least 5 unique threads in a given subreddit within the past 6 months).
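
As a rough illustration of the “active user” definition, here is a minimal Python sketch. It is not the BigQuery query from the previous post: the (author, subreddit, thread_id) record layout and the function name are assumptions made for this example, and the comments are assumed to be already restricted to the past 6 months.

```python
from collections import defaultdict


def active_users_by_subreddit(comments, min_threads=5):
    """Map each subreddit to the set of its active users.

    `comments` is a hypothetical iterable of (author, subreddit, thread_id)
    tuples, assumed to be pre-filtered to the 6-month window.
    A user is active in a subreddit if they commented in at least
    `min_threads` distinct threads there.
    """
    # (subreddit, author) -> distinct thread ids the author commented in
    threads = defaultdict(set)
    for author, subreddit, thread_id in comments:
        threads[(subreddit, author)].add(thread_id)

    active = defaultdict(set)
    for (subreddit, author), thread_ids in threads.items():
        if len(thread_ids) >= min_threads:
            active[subreddit].add(author)
    return dict(active)
```

Counting distinct threads rather than raw comments means a user who posts many replies in a single thread does not qualify as active.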

In this case, we can filter the edgelist to keep only intersections with at least 10 active users; this excludes dead and personal subreddits.

We can run another similar query to get the number of active users for each subreddit.
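
The two inputs described so far (the filtered edgelist and the per-subreddit counts) could be sketched in Python as follows. This is only an illustration under assumptions: the actual work was done with BigQuery queries, `active` is a hypothetical dict mapping each subreddit to its set of active users, and the brute-force pairwise loop would be far too slow for every public subreddit.

```python
from itertools import combinations


def build_edgelist(active, min_overlap=10):
    """Return (edges, sizes) for a {subreddit: set_of_users} mapping.

    edges: {(a, b): |A ∩ B|} for pairs sharing at least `min_overlap`
           active users, with a < b alphabetically.
    sizes: {subreddit: |A|}, the number of active users per subreddit.
    """
    sizes = {sub: len(users) for sub, users in active.items()}
    edges = {}
    for a, b in combinations(sorted(active), 2):
        shared = len(active[a] & active[b])
        if shared >= min_overlap:
            edges[(a, b)] = shared
    return edges, sizes
```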

After that, for a given subreddit A, find:

(A ∩ B) / (B)

for all subreddits B where (A ∩ B) > 0 (i.e. only neighbors of A), where (A ∩ B) and (B) denote the number of active users in each set. This computation takes less than a second. Additionally, because the users in A ∩ B are a subset of the users in B, the output is always a percentage between 0% and 100%. For the visualizations, we plot the Top 15 subreddits with the highest overlap with the specified subreddit A (and color the bars with a nice viridis palette to provide another easy way to perceive relative magnitude of relatedness).
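
The ranking step can be sketched in a few lines of Python. The `edges` and `sizes` dictionaries and the function name are assumptions for this example (pair counts keyed by subreddit pairs, and active-user counts per subreddit); the original code may be organized differently.

```python
def related_subreddits(target, edges, sizes, top_n=15):
    """Rank the neighbors B of `target` by |A ∩ B| / |B|, as a percentage.

    edges: hypothetical {(a, b): shared_active_users} edgelist.
    sizes: hypothetical {subreddit: active_user_count}.
    """
    scores = {}
    for (a, b), shared in edges.items():
        if a == target:
            other = b
        elif b == target:
            other = a
        else:
            continue  # this pair does not involve the target subreddit
        scores[other] = 100.0 * shared / sizes[other]
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


edges = {("aww", "cats"): 12, ("aww", "pics"): 30}
sizes = {"aww": 100, "cats": 16, "pics": 120}
print(related_subreddits("aww", edges, sizes))
# [('cats', 75.0), ('pics', 25.0)]
```

Because the denominator is the size of B rather than A, a small subreddit whose active users mostly also post in A scores high even when the raw overlap is modest, as "cats" does in the toy data above.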

The methodology may sound arbitrary, but the results are very interesting. Here’s a chart of the top related subreddits for /r/aww, one of the most popular places on the internet for cat pictures.

I have honestly never heard of any of these subreddits before. Yet by analyzing public user activity alone, I found a few new places to get more cute pics.

This methodology is excellent for finding subreddit-specific subsubreddits which may not be documented. The related subreddits for /r/buildapc offer more places to get PC building advice.

Related subreddits for sport-specific subreddits, like /r/cfb (college football), include the corresponding teams.

The related subreddits for /r/food include a surprising number of subreddits dedicated to specific foods.

There is a surprising amount of depth to the /r/me_irl network.

The chart for /r/programming can tell you which subreddits exist for specific programming languages and technologies.

The methodology can also reveal a lack of related subreddits, by the large contrast between subreddits with high relatedness and low relatedness. For example, while /r/cfb may have large numbers of obviously-related subreddits as a sports subreddit, /r/golf has only 2.

You can view Related Subreddit charts for the Top 200 Subreddits in this GitHub repository.

Finding Similar Subreddits

Another method for finding related subreddits would be to find subreddits with similar communities. An academic approach to finding similarity between sets is the Jaccard Index. Using the same set A and set B definitions above, the formula now becomes:

(A ∩ B) / [(A) + (B) - (A ∩ B)]

which outputs the Jaccard Index, between 0 and 1. This formula only requires a few tweaks to the original code. The results from this computation tell a different story.
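
As a sketch of what those tweaks might look like in Python (the names, the `edges` and `sizes` inputs, and the `exclude` parameter are assumptions for this example, not the original code):

```python
def jaccard(shared, size_a, size_b):
    """Jaccard Index |A ∩ B| / (|A| + |B| - |A ∩ B|), from counts alone."""
    union = size_a + size_b - shared
    return shared / union if union else 0.0


def similar_subreddits(target, edges, sizes, exclude=frozenset(), top_n=15):
    """Rank the neighbors of `target` by Jaccard Index.

    edges: hypothetical {(a, b): shared_active_users} edgelist.
    sizes: hypothetical {subreddit: active_user_count}.
    exclude: subreddits to drop from the results.
    """
    scores = {}
    for (a, b), shared in edges.items():
        if a == target:
            other = b
        elif b == target:
            other = a
        else:
            continue  # this pair does not involve the target subreddit
        if other not in exclude:
            scores[other] = jaccard(shared, sizes[target], sizes[other])
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

Unlike the overlap ratio, the denominator here includes the size of A as well, so a tiny subreddit can no longer reach a high score just by being contained in a much larger one.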

Here are the most-similar subreddits to /r/aww:

In this implementation, the default Reddit subreddits must be removed from the results, as the communities of default subreddits are largely similar to most others by design. Even former defaults like /r/adviceanimals and /r/technology still have large numbers of holdout users who skew the results. As /r/aww is a mass-appeal subreddit, it makes sense that its community is similar to those of other mass-appeal subreddits.

The magnitude of the Jaccard Index measures the strength of the similarity. Most subreddit relationships have a low Jaccard Index, but the relative magnitude between all subreddit neighbors still illustrates comparisons for potential related subreddits (this is also the reason why the x-axis is not fixed across plots). The subreddit relationship with the highest absolute similarity is /r/arrow and /r/flashtv at 0.345, which makes sense given the massive overlap between the two CW television shows.

The Jaccard Index is more useful for finding similar subreddits to niche subreddits. Let’s try a few of the subreddits mentioned previously and see how the results changed.

/r/buildapc is a niche subreddit, and the output identifies well-established subreddits, unlike with the previous related-subreddit methodology.

The subreddit most similar to /r/cfb (college football) is /r/collegebasketball!

The subreddit most similar to /r/food is /r/cooking!

The subreddit most similar to /r/programming is /r/linux! (of course)

You can view the Similar Subreddit charts for the Top 200 Subreddits in this GitHub repository.

Again, Reddit has significantly better internal data for identifying user activity between subreddits, such as voting patterns and clickthrough tracking. But the results shown using these two set methodologies are pretty good for using public data. In fact, these two set approaches can theoretically work with any set of categorized, settable data, which may give me a few ideas for new blog posts in the future.

And there are still the fancy machine learning approaches to try.

As always, the full code used to process the comment data and generate the visualizations is available in this Jupyter notebook, open-sourced on GitHub.

If you do find any other interesting trends in the related/similar charts of other subreddits and write about them, it would be greatly appreciated if proper attribution is given back to this post and/or myself. Thanks!
