Re: Factor

Factor: the language, the theory, and the practice.

Monday, January 17, 2011


Reddit has an API that exposes much of the information available through their website. We can retrieve a JSON list of recent stories posted to any subreddit by requesting the JSON URL for that subreddit's $NAME. You can experiment with this in the Factor listener. For example, to retrieve the top stories for the programming subreddit:

IN: scratchpad USING: http.client json.reader ;

IN: scratchpad ""
               http-get nip json> .

Someone once used the API to build a reddit-top program for monitoring top stories from the console. We will use Factor vocabularies to query Reddit and produce something similar.

We start by building a (subreddit) helper word that retrieves the JSON response for a particular subreddit, extracts the top stories, and returns an array of hashtables (one for each of the top stories).

: (subreddit) ( name -- seq )
    "" sprintf http-get nip
    json> { "data" "children" } [ swap at ] each
    [ "data" swap at ] map ;
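To see what the traversal in (subreddit) does without touching the network, here is a minimal sketch that runs the same two steps over a hand-built hashtable. The sample-response and stories words and the sample data are hypothetical; only the data, children, data nesting is taken from the word above.

```factor
USING: assocs kernel prettyprint sequences ;

! Hypothetical stand-in for a parsed response, shaped like the
! nesting that (subreddit) walks: data -> children -> data.
: sample-response ( -- assoc )
    H{
        { "data" H{
            { "children" {
                H{ { "data" H{ { "title" "First" } { "score" 10 } } } }
                H{ { "data" H{ { "title" "Second" } { "score" 7 } } } }
            } }
        } }
    } ;

! Same traversal as (subreddit), minus the HTTP request and parsing.
: stories ( assoc -- seq )
    ! Follow each key in turn, replacing the assoc with the value found.
    { "data" "children" } [ swap at ] each
    ! Each child wraps its story in another "data" entry.
    [ "data" swap at ] map ;

! Print the title of each extracted story.
sample-response stories [ "title" swap at . ] each
```

The each loop threads the current hashtable along the stack, so a longer key path only needs more entries in the array.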

We can then define a story tuple, with a slot for each attribute returned by the API.

TUPLE: story author clicked created created_utc domain downs
hidden id is_self levenshtein likes media media_embed name
num_comments over_18 permalink saved score selftext
selftext_html subreddit subreddit_id thumbnail title ups url ;

Once we have that, we can use the set-slots word from my previous post on setting attributes to build a subreddit word that retrieves the top stories as objects:

: subreddit ( name -- stories )
    (subreddit) [ story new [ set-slots ] keep ] map ;
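The set-slots word itself comes from the earlier post and is not reproduced here. As a rough idea of what such a word can look like, the sketch below copies an assoc into same-named tuple slots using the mirrors vocabulary. The names fill-slots and point are hypothetical, and this is not necessarily how set-slots is implemented.

```factor
USING: accessors assocs kernel mirrors prettyprint ;

! Hypothetical two-slot tuple used only for this illustration.
TUPLE: point x y ;

! Copy each key/value pair of the assoc into the slot of the same name.
! A mirror presents a tuple as an assoc keyed by slot name, so set-at
! on the mirror stores into the underlying object.
: fill-slots ( assoc obj -- )
    <mirror> swap [ swap pick set-at ] assoc-each drop ;

! Build an empty point, fill it from a hashtable, and print it.
H{ { "x" 1 } { "y" 2 } } point new [ fill-slots ] keep .
```

In this sketch a key with no matching slot raises an error, which is one reason the story tuple declares a slot for every attribute the API returns.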

That's all we need to build the subreddit-top word demonstrated at the beginning:

  1. Retrieve the top stories for a given subreddit.
  2. Loop over each story.
  3. Format and print the relevant attributes.

: subreddit-top ( subreddit -- )
    subreddit [
        1 + "%2d. " printf {
            [ title>> ]
            [ url>> ]
            [ score>> ]
            [ num_comments>> ]
            [
                created_utc>> unix-time>timestamp now swap time-
                duration>hours "%d hours ago" sprintf
            ]
            [ author>> ]
        } cleave
        "%s\n    %s\n    %d points, %d comments, posted %s by %s\n\n"
        printf
    ] each-index ;
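The formatting step relies on cleave leaving one value on the stack per quotation, in order, so that they line up with the directives in the format string. A minimal offline sketch of that pattern follows. The mini-story tuple and the hours-ago and summary words are hypothetical, and the timestamp is built locally rather than converted from created_utc.

```factor
USING: accessors calendar combinators formatting io kernel math ;

! Hypothetical cut-down story holding only the slots used below.
TUPLE: mini-story title score created ;

! Whole hours elapsed since a timestamp, as text.
: hours-ago ( timestamp -- str )
    now swap time- duration>hours >integer "%d hours ago" sprintf ;

! cleave applies each quotation to the same story and leaves the
! results on the stack in order, ready for the format string.
: summary ( story -- str )
    {
        [ title>> ]
        [ score>> ]
        [ created>> hours-ago ]
    } cleave "%s (%d points, posted %s)" sprintf ;

! Build a story created three hours ago and print its summary.
"Example" 42 3 hours ago mini-story boa summary print
```

Adding another attribute to the output means adding one quotation to the array and one directive to the format string, in the same position.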

This (and some code for users and comments) is available on my GitHub.