Reddit Stats
Sunday, May 1, 2011
I thought it would be fun to track the development of the Factor programming language using Reddit. Of course, to do this, I wanted to use Factor.
A few months ago, I implemented a Reddit “Top” program. We can use the code for that as a basis for showing, using Reddit scores, how the activity (or popularity?) of a couple Factor blogs have changed over the last few years:
Stories
We start by creating a story
tuple that will hold all of the
properties that Reddit returns for each posted link (including the score
– up votes minus down votes – which we will be using).
TUPLE: story author clicked created created_utc domain downs
hidden id is_self levenshtein likes media media_embed name
num_comments over_18 permalink saved score selftext
selftext_html subreddit subreddit_id thumbnail title ups url ;
And a word to parse a “data” result into a story object:
: parse-story ( assoc -- obj )
"data" swap at \ story from-slots ;
Paging
The Reddit API provides results as a
series of “pages” (until all results are exhausted). This is pretty
similar to how the website works, so it should be fairly easy to
understand the mechanics. We will define a page
object that holds the
current URL, the results from the current page, and links to the pages
before and after the current page:
TUPLE: page url results before after ;
We can then build a word to fetch the page (as a JSON response), parse
it, and construct a page
object:
: json-page ( url -- page )
>url dup http-get nip json> "data" swap at {
[ "children" swap at [ parse-story ] map ]
[ "before" swap at [ f ] when-json-null ]
[ "after" swap at [ f ] when-json-null ]
} cleave \ page boa ;
An easy “paging” mechanism that, given a page of results, can return the next page:
: next-page ( page -- page' )
[ url>> ] [ after>> "after" set-query-param ] bi json-page ;
And, using the make vocabulary, construct all the results into a sequence:
: all-pages ( page -- results )
[
[ [ results>> , ] [ dup after>> ] bi ]
[ next-page ] while drop
] { } make concat ;
Domain Stats
We can retrieve all the results and produces a map of years that links were posted to a sum of the scores for each year’s links:
: domain-stats ( domain -- stats )
"https://api.reddit.com/domain/%s" sprintf json-page all-pages
[ created>> 1000 * millis>timestamp year>> ] group-by
[ [ score>> ] map-sum ] assoc-map ;
Finally, a word to display a chart of the scores across a given list of domains:
: domains. ( domains -- )
H{ } clone [
'[ domain-stats [ swap _ at+ ] assoc-each ] each
] keep >alist sort-keys <bar> 160 >>height chart. ;
The code for this is available on my GitHub.