## Select Random Lines

Tuesday, October 23, 2012

Ned Batchelder made an interesting blog post on selecting randomly from
an unknown
sequence.
His Python version uses a “random replacement”
strategy to choose each line with a `1/n`

probability, where `n`

is the
number of elements seen so far:

```
import random
def random_element(seq):
"""Return an element chosen at random from `seq`."""
it = None
for n, elem in enumerate(seq):
if random.randint(0, n) == 0:
it = elem
return it
```

### random-line

I wanted to show how a similar approach in Factor could be used to select a random line from a file, or any stream that supports a character-based stream protocol.

We can make a variant of
each-line
that calls a quotation with each line and its “line number” (indexed
from 1 to `n`

, the number of lines in the stream):

```
: each-numbered-line ( ... quot: ( ... line number -- ... ) -- ... )
[ 1 ] dip '[ swap [ @ ] [ 1 + ] bi ] each-line drop ; inline
```

Then, it is pretty easy to select a random line (replacing each line
with chance of `1/K`

probability where `K`

is the current line number.

```
: random-line ( -- line/f )
f [ random zero? [ nip ] [ drop ] if ] each-numbered-line ;
```

### random-lines

To efficiently select more than one random line from a stream, we will
be using a technique called reservoir
sampling. The idea is
to select the first `n`

elements, then randomly replace them with
weighted probability `n/K`

, where `K`

is the current line number:

```
:: random-lines ( n -- lines )
V{ } clone :> accum
[| line line# |
line# n <= [
line accum push
] [
line# random :> r
r n < [ line r accum set-nth ] when
] if
] each-numbered-line accum ;
```

Note: we used local variables to implement this.

This is available in the io.random vocabulary in the Factor development branch.