Re: Factor

Factor: the language, the theory, and the practice.

Select Random Lines

Tuesday, October 23, 2012


Ned Batchelder made an interesting blog post on selecting randomly from an unknown sequence. His Python version uses a “random replacement” strategy to choose each line with a 1/n probability, where n is the number of elements seen so far:

import random

def random_element(seq):
    """Return an element chosen at random from `seq`."""
    it = None
    for n, elem in enumerate(seq):
        if random.randint(0, n) == 0:
            it = elem
    return it


I wanted to show how a similar approach in Factor could be used to select a random line from a file, or any stream that supports a character-based stream protocol.

We can make a variant of each-line that calls a quotation with each line and its “line number” (indexed from 1 to n, the number of lines in the stream):

: each-numbered-line ( ... quot: ( ... line number -- ... ) -- ... )
    [ 1 ] dip '[ swap [ @ ] [ 1 + ] bi ] each-line drop ; inline

Then, it is pretty easy to select a random line (replacing each line with chance of 1/K probability where K is the current line number.

: random-line ( -- line/f )
    f [ random zero? [ nip ] [ drop ] if ] each-numbered-line ;


To efficiently select more than one random line from a stream, we will be using a technique called reservoir sampling. The idea is to select the first n elements, then randomly replace them with weighted probability n/K, where K is the current line number:

:: random-lines ( n -- lines )
    V{ } clone :> accum
    [| line line# |
        line# n <= [
            line accum push
        ] [
            line# random :> r
            r n < [ line r accum set-nth ] when
        ] if
    ] each-numbered-line accum ;

Note: we used local variables to implement this.

This is available in the io.random vocabulary in the Factor development branch.