Re: Factor

Factor: the language, the theory, and the practice.

New Combinatoric Functions

Sunday, July 25, 2010

#math

Combinatorics can provide some useful functions when working with sequences. In Factor, these are mostly defined in the math.combinatorics vocabulary.

USE: math.combinatorics

Inspired by some functions from clojure.contrib, I recently contributed two additional combinatoric words to the Factor project (although not with the same lazy semantics that the Clojure version has).

all-subsets:

The first word, all-subsets, returns all subsets of a given sequence. This can be calculated by iteratively taking n combinations of items from the sequence, where n goes from 0 (the empty set) to length (the sequence itself).

First, we observe how this works by experimenting with the all-combinations word:

IN: scratchpad { 1 2 } 0 all-combinations .
{ { } }

IN: scratchpad { 1 2 } 1 all-combinations .
{ { 1 } { 2 } }

IN: scratchpad { 1 2 } 2 all-combinations .
{ { 1 2 } }

By running it with various n, we have produced all of the subsets of the { 1 2 } sequence. Using a [0,b] range (from 0 to the length of the sequence), we make a sequence of subsets:

: all-subsets ( seq -- subsets )
    dup length [0,b] [
        [ dupd all-combinations [ , ] each ] each
    ] { } make nip ;

The all-subsets word can then be demonstrated by:

IN: scratchpad { 1 2 3 } all-subsets .
{ { } { 1 } { 2 } { 3 } { 1 2 } { 1 3 } { 2 3 } { 1 2 3 } }

selections:

Another useful function, selections, returns all the ways of taking n (possibly the same) elements from a sequence.

First, we observe that there are two base cases:

  1. If we want all ways of taking 0 elements from the sequence, we have only { } (the empty sequence).
  2. If we want all ways of taking 1 element from the sequence, we essentially have a sequence for each element in the input sequence.

If we take more elements from the sequence, we need to apply the cartesian-product word (which returns all possible pairs of elements from two sequences) n-1 times. For example, if we wanted to see all possible selections of 2 elements from a sequence, run the cartesian-product once:

IN: scratchpad { 1 2 3 } dup cartesian-product concat .
{
    { 1 1 }
    { 1 2 }
    { 1 3 }
    { 2 1 }
    { 2 2 }
    { 2 3 }
    { 3 1 }
    { 3 2 }
    { 3 3 }
}

Using these observations, we can build the selections word:

: (selections) ( seq n -- selections )
    dupd [ dup 1 > ] [
        swap pick cartesian-product [
            [ [ dup length 1 > [ flatten ] when , ] each ] each
        ] { } make swap 1 -
    ] while drop nip ;

: selections ( seq n -- selections )
    {
        { 0 [ drop { } ] }
        { 1 [ 1array ] }
        [ (selections) ]
    } case ;

This can be demonstrated by:

IN: scratchpad { 1 2 } 2 selections .
{ { 1 1 } { 1 2 } { 2 1 } { 2 2 } }

Note: we have defined this to take element order into account, so { 1 2 } and { 2 1 } are different possible results. Also, it could be argued that the result for { 1 2 3 } 1 selections should be { 1 2 3 } [ 1array ] map – perhaps it should change to that in the future.

This was committed to the main repository recently.

Update: A comment by Jon Harper showed me a way to improve all-subsets. Based on that, I also made some changes to selections (perhaps it could be improved even more):

: all-subsets ( seq -- subsets )
    dup length [0,b] [ all-combinations ] with map concat ;

: (selections) ( seq n -- selections )
    [ [ 1array ] map dup ] [ 1 - ] bi* [
        cartesian-product concat [ { } concat-as ] map
    ] with times ;

: selections ( seq n -- selections )
    dup 0 > [ (selections) ] [ 2drop { } ] if ;