Re: Factor

Factor: the language, the theory, and the practice.


Friday, February 4, 2011

#math #sorting

For some Factor code I was writing recently, I needed to find the “maximum” and “minimum” of some sequences of values. The “obvious” way to do this is to use the supremum and infimum words from the sequences vocabulary. Those words were not quite what I wanted, so I thought I would write some thoughts about it:

First, an example to show how they work:

IN: scratchpad { 1 2 3 4 } supremum .

IN: scratchpad { 1 2 3 4 } infimum .

Let’s say we create a person tuple with a name and age.

TUPLE: person name age ;

Now, let’s say we have a list of people (e.g., a sequence of person objects):

    T{ person f "Jim" 30 }
    T{ person f "Sally" 27 }
    T{ person f "Rebecca" 32 }
    T{ person f "James" 28 }
    T{ person f "Benjamin" 22 }

What if we’d like to find out who the oldest person is? We could try using the supremum word:

IN: scratchpad PEOPLE supremum
Generic word <=> does not define a method for the person class.
Dispatching on object: T{ person f "Sally" 27 }

Whoops! The supremum word uses the max generic word, which in turn defaults to using the after? generic word, which in turn defaults to using the <=> generic word (from the math.order vocabulary) to compare two values for +lt+, +eq+, or +gt+.

So, let’s define that word for the person types:

M: person <=> [ age>> ] bi@ <=> ;

Okay, let’s try again:

IN: scratchpad PEOPLE supremum .
T{ person f "Rebecca" 32 }

Alright, now we also want to find the person with the shortest name. Well, we could use infimum, but it, like supremum, delegates to <=> which (from our previous definition) currently orders by a person’s age. Let’s redefine <=> to compare based on length of a person’s name:

M: person <=> [ name>> length ] bi@ <=> ;

Okay, let’s try it now:

IN: scratchpad PEOPLE infimum .
T{ person f "Jim" 30 }

It worked! But, this is quite awkward to change the definition of <=> for each purpose. The problem here is that supremum and infimum (and the <=> word they use) expect to be used for the natural ordering of objects, not for arbitrary orderings.

Instead, we can define two new words, max-by and min-by, that use a provided quotation to obtain a value that is then used to compare two objects (e.g., in our first example, age, and in our second, the length of name).

: max-by ( obj1 obj2 quot: ( obj -- n ) -- obj1/obj2 )
    [ bi@ [ max ] keep eq? not ] curry most ; inline

: min-by ( obj1 obj2 quot: ( obj -- n ) -- obj1/obj2 )
    [ bi@ [ min ] keep eq? not ] curry most ; inline

With these, we can define the words, maximum and minimum, which use these to retrieve the object with “maximum” or “minimum” characteristics (using the specified quotation).

: maximum ( seq quot: ( ... elt -- ... n ) -- elt )
    [ keep 2array ] curry
    [ [ first ] max-by ] map-reduce second ; inline

: minimum ( seq quot: ( ... elt -- ... n ) -- elt )
    [ keep 2array ] curry
    [ [ first ] min-by ] map-reduce second ; inline

Using these, we can easily find our answers:

IN: scratchpad PEOPLE [ age>> ] maximum .
T{ person f "Rebecca" 32 }

IN: scratchpad PEOPLE [ name>> length ] minimum .
T{ person f "Jim" 30 }

It’s worth noting that we could define both supremum and infimum in terms of our new words:

: supremum ( seq -- elt ) [ ] maximum ;

: infimum ( seq -- elt ) [ ] minimum ;

But, because of the wrapping and unwrapping that we do inside of maximum and minimum, it’s a little less efficient than the current implementation.