We recently held an internal Embedly hack week to play with ideas and build something cool. I made a Reddit discussion network visualization, powered by D3, with previews generated by Embedly jQuery and a UI built on Foundation.
I’ve been using Reddit a lot lately. My introduction to user-generated-link-site addiction was Hacker News. When I moved on to Reddit, the first thing I noticed was how deeply nested the conversations were. I found that interesting, especially because these deeply nested threads sometimes turn out to be hilarious. I like playing with social data, and figured I could visualize this nested structure as a network by linking each comment to the one it replies to, as a new way to browse Reddit. At the same time, I could see how different conversations are structured.
Getting Reddit data is fairly simple: you just add “.json” to the URL. You can see a previous post where we analyzed subreddits this way. Given a discussion, I recursively crawl the JSON response and pull out all of the comments, noting the username, ID, parent, and body of each one.
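Here is a minimal sketch of that recursive crawl, not the exact code behind the visualization. It assumes the usual shape of Reddit's JSON, where comments are entries of kind "t1" inside a Listing and each comment's `replies` field is either another Listing or an empty string. The function name `collectComments` and the sample data are made up for illustration.

```javascript
// Flatten a Reddit comment Listing into a list of plain comment records.
// collectComments is a hypothetical helper name, not part of any library.
function collectComments(listing, out = []) {
  // A comment with no replies has replies === "", so guard against non-Listings.
  if (!listing || !listing.data || !Array.isArray(listing.data.children)) {
    return out;
  }
  for (const child of listing.data.children) {
    // "t1" is a comment; "more" entries are collapsed stubs and are skipped here.
    if (child.kind !== "t1") continue;
    const c = child.data;
    out.push({
      id: c.id,
      author: c.author,
      parentId: c.parent_id, // e.g. "t3_<post id>" or "t1_<comment id>"
      body: c.body,
      score: c.score,
    });
    // Recurse into the nested replies of this comment.
    collectComments(c.replies, out);
  }
  return out;
}

// Hand-written sample shaped like the comment Listing of a ".json" response.
const sampleListing = {
  kind: "Listing",
  data: {
    children: [
      {
        kind: "t1",
        data: {
          id: "c1", author: "bob", parent_id: "t3_p1", body: "First!", score: 5,
          replies: {
            kind: "Listing",
            data: {
              children: [
                {
                  kind: "t1",
                  data: {
                    id: "c2", author: "alice", parent_id: "t1_c1",
                    body: "Welcome.", score: 2, replies: "",
                  },
                },
              ],
            },
          },
        },
      },
      { kind: "more", data: { children: ["c9"] } },
    ],
  },
};

const comments = collectComments(sampleListing);
console.log(comments.map((c) => c.id + " <- " + c.parentId));
```

For a real discussion, the ".json" response for a comments page should be a two-element array, with the submission in the first Listing and the comment tree in the second, so the second element is what you would pass in.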
The D3 force-directed layout needs an array of nodes and an array of links, so I add to those as the comments are parsed. The size of each node is based on the comment’s score. The original poster (OP) is orange, and all other nodes are black unless their author comments more than once, in which case that user is given a color.
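The node and link arrays can be sketched roughly like this. It is an illustration under assumptions, not the original code: the input is a flat list of comment records with `id`, `author`, `parentId`, and `score` fields, the square-root radius scale and the palette are arbitrary choices, and every comment by OP is drawn orange. `buildGraph` is a hypothetical name.

```javascript
// Arbitrary palette for users who comment more than once (an assumption).
const PALETTE = ["steelblue", "crimson", "seagreen", "purple", "goldenrod"];

// Build { nodes, links } for a force layout from a post and its flat comments.
function buildGraph(post, comments) {
  // Count comments per author to find repeat commenters.
  const counts = new Map();
  for (const c of comments) {
    counts.set(c.author, (counts.get(c.author) || 0) + 1);
  }

  const colorByAuthor = new Map();
  const colorFor = (author) => {
    if (author === post.author) return "orange"; // OP
    if ((counts.get(author) || 0) < 2) return "black"; // one-time commenter
    if (!colorByAuthor.has(author)) {
      colorByAuthor.set(author, PALETTE[colorByAuthor.size % PALETTE.length]);
    }
    return colorByAuthor.get(author);
  };

  // Assumed scale: radius grows with the square root of the score.
  const radiusFor = (score) => 3 + Math.sqrt(Math.max(score, 0));

  // Node 0 is the original post; comments follow in parse order.
  const nodes = [
    { id: post.id, author: post.author, color: "orange", radius: radiusFor(post.score) },
  ];
  const indexById = new Map([[post.id, 0]]);
  for (const c of comments) {
    indexById.set(c.id, nodes.length);
    nodes.push({ id: c.id, author: c.author, color: colorFor(c.author), radius: radiusFor(c.score) });
  }

  // Link each comment to its parent; Reddit prefixes parent ids with "t1_" or "t3_".
  const links = [];
  for (const c of comments) {
    const parent = c.parentId.replace(/^t\d_/, "");
    if (indexById.has(parent)) {
      links.push({ source: indexById.get(c.id), target: indexById.get(parent) });
    }
  }
  return { nodes, links };
}

// Made-up sample thread.
const graph = buildGraph(
  { id: "p1", author: "alice", score: 100 },
  [
    { id: "c1", author: "bob", parentId: "t3_p1", score: 16 },
    { id: "c2", author: "alice", parentId: "t1_c1", score: 4 },
    { id: "c3", author: "bob", parentId: "t1_c2", score: 1 },
    { id: "c4", author: "carol", parentId: "t3_p1", score: 0 },
  ]
);
console.log(graph.nodes.length, graph.links.length);
```

The links refer to nodes by their index in the `nodes` array, which D3's force layout accepts, so the two arrays can be handed to the layout as they are.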
To improve the user experience, I added buttons for the links on the Reddit front page, with previews from Embedly jQuery. Each link from the front page is run through Embedly to get an embed of the link. A hover event on each button displays the respective preview.
There are also link previews for the comments. On hovering over a comment, the body is parsed for URLs. Each URL found in the text is turned into a link (just wrap it with an `a` tag) and used to get the embed preview. You can see an example of this in the Arnold Schwarzenegger AMA below.
One common pattern for deeply nested threads is that a user shows up in alternating responses to their comment, reflecting a dialogue. You can see this in the thread beneath the original post (the large orange node) in the image below. Click the image to open up the network.
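The link-wrapping step could look something like the sketch below. The regular expression is a simple assumption about what counts as a URL, not the pattern the visualization actually uses, and `linkify` is a made-up name. It does not escape HTML in the comment body, so treat it as an illustration only.

```javascript
// Wrap every http(s) URL in a comment body with an anchor tag.
// linkify is a hypothetical helper; the URL pattern is deliberately simple.
function linkify(body) {
  return body.replace(/https?:\/\/[^\s<>"')\]]+/g, (match) => {
    // Keep sentence punctuation that trails the URL outside the link.
    const url = match.replace(/[.,!?;:]+$/, "");
    const trailing = match.slice(url.length);
    return `<a href="${url}">${url}</a>${trailing}`;
  });
}

const linked = linkify("My answer is here: http://imgur.com/abc123.");
console.log(linked);
```

Once the body contains real `a` tags, the anchors can be handed to a preview library such as Embedly jQuery.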
You can also see how conversations can get ‘derailed’ and focus on the top comment thread, instead of the original post, as seen in the thread to the right of the first post.
AskReddit threads have much more comment upvoting than other discussions. You can see this by how large the nodes are.
AMAs, as expected, have lots of comments from OP, as well as upvoting. Here is the top AMA from the last year:
and here is the infamous Morgan Freeman one. It was far less engaging, and you can see the difference:
And while we’re on AMAs, here is the Arnold Schwarzenegger one. I point it out because he used a unique answering method of handwriting the responses and posting them onto imgur.
This is a recent TIL I liked. You can see quite a few double (or more) comments by the same user along the threads. It was a particularly popular post, so there is more upvoting than I’ve normally seen.
Here is the top post over the past year in /r/programming, which showcases a project. It looks like an AMA, given the number of times OP responds.
If you haven’t already, you can play around with the network here. Paste a link to the comments section of a Reddit submission to see the network. The observations above are from looking at it over the past few days. I’d love to hear about other things you notice from browsing Reddit networks. And of course, here is the code; everything is client side.