Re: Factor

Factor: the language, the theory, and the practice.

Trending GitHub Projects

Monday, October 25, 2010

#github #networking

This weekend someone posted an article describing how to use Python with YQL to scrape Repopular for a list of popular GitHub projects. Since mashups are all the rage these days, I thought I would implement this in Factor.

The Yahoo! way

The YQL that was used in the original article is:

use 'https://yqlblog.net/samples/data.html.cssselect.xml'
  as data.html.cssselect;

select * from data.html.cssselect
where url="repopular.com"
  and css="div.pad a"

We can use the http.client vocabulary to send this query to Yahoo!, parse the returned JSON data using json.reader, and extract the HREFs of the popular projects. We can then filter them for the links that point to GitHub.

USING: assocs http.client json.reader kernel sequences ;

: the-yahoo-way ( -- seq )
    "https://query.yahooapis.com/v1/public/yql?q=use%20'http%3A%2F%2Fyqlblog.net%2Fsamples%2Fdata.html.cssselect.xml'%20as%20data.html.cssselect%3B%20select%20*%20from%20data.html.cssselect%20where%20url%3D%22repopular.com%22%20and%20css%3D%22div.pad%20a%22&format=json&diagnostics=true&callback="
    http-get nip json> { "query" "results" "results" "a" }
    [ swap at ] each [ "href" swap at ] map 
    [ "https://github.com" head? ] filter ;
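
The extraction step can be tried without touching the network. The following is only a sketch: sample-response is a hypothetical, hand-written stand-in for the parsed YQL reply (the real payload has more fields, and the links here are made up), and at-path and sample-links are illustrative names that appear nowhere in the original code.

```factor
USING: assocs kernel prettyprint sequences ;
IN: scratchpad

! Hypothetical stand-in for the parsed JSON: nested hashtables
! ending in an array of anchors. The links are made up.
CONSTANT: sample-response
    H{ { "query" H{ { "results" H{ { "results" H{ { "a" {
        H{ { "href" "https://github.com/example/one" } }
        H{ { "href" "https://example.org/elsewhere" } }
    } } } } } } } } }

! Hypothetical helper: follow a sequence of keys into nested assocs.
: at-path ( assoc keys -- value )
    [ swap at ] each ;

! Same steps as the-yahoo-way, minus the HTTP request.
: sample-links ( -- seq )
    sample-response { "query" "results" "results" "a" } at-path
    [ "href" swap at ] map
    [ "https://github.com" head? ] filter ;

sample-links .
! prints { "https://github.com/example/one" }
```

Each key in the path replaces the assoc on the stack with the value stored under that key, so the final value is the array of anchors. the-yahoo-way applies the same steps to the live response.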

We can run the-yahoo-way to see which projects are currently trending on GitHub.

IN: scratchpad the-yahoo-way [ . ] each
"https://github.com/sinatra/sinatra"
"https://github.com/Sutto/barista"
"https://github.com/pypy/pypy"
"https://github.com/dysinger/apparatus"
"https://github.com/videlalvaro/Thumper"
"https://github.com/alunny/sleight"
"https://github.com/vimpr/vimperator-plugins"

The other way

We can use the html.parser vocabulary to do it another way. Given some knowledge of the HTML returned by Repopular (the project links sit between the first and last aside tags), we can extract the HREFs directly:

USING: accessors assocs html.parser http.client kernel sequences ;

: the-other-way ( -- seq )
    "https://repopular.com" http-get nip parse-html
    [ [ name>> "aside" = ] find drop ]
    [ [ name>> "aside" = ] find-last drop ]
    [ <slice> ] tri
    [ name>> "a" = ] filter
    [ attributes>> "href" swap at ] map
    [ "https://github.com" head? ] filter ;
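
The find, find-last, and tri combination may be easier to follow on a plain sequence. This is a minimal sketch in which strings stand in for parsed tags; sample-tags and between-asides are hypothetical names used only for illustration.

```factor
USING: arrays kernel prettyprint sequences ;
IN: scratchpad

! Hypothetical data: tag names standing in for parsed tags.
CONSTANT: sample-tags
    { "p" "aside" "a" "div" "a" "aside" "footer" }

! tri runs all three quotations on the same sequence: the first
! leaves the index of the first match, the second the index of
! the last match, and the third calls <slice> ( from to seq ).
: between-asides ( seq -- slice )
    [ [ "aside" = ] find drop ]
    [ [ "aside" = ] find-last drop ]
    [ <slice> ] tri ;

sample-tags between-asides >array .
! prints { "aside" "a" "div" "a" }
```

The slice starts at the first match and stops just before the last one. Both find and find-last return f when nothing matches, so this sketch assumes the sequence contains at least one match. In the-other-way, the same pattern narrows the parsed page to the tags between the first and last aside elements before the anchors are picked out.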

We can see that the-other-way produces the same results:

IN: scratchpad the-other-way [ . ] each
"https://github.com/sinatra/sinatra"
"https://github.com/Sutto/barista"
"https://github.com/pypy/pypy"
"https://github.com/dysinger/apparatus"
"https://github.com/videlalvaro/Thumper"
"https://github.com/alunny/sleight"
"https://github.com/vimpr/vimperator-plugins"