Re: Factor

Factor: the language, the theory, and the practice.

Roman Sort

Monday, May 12, 2025

#performance #sorting

Factor has included support for Roman numerals since at least 2008. Sometimes recent events – such as the election of Pope Leo XIV or the Super Bowl LIX – revive modern interest in how they work and how to computationally work with them.

There was a blog post a few days ago about sorting Roman numerals which pointed out that sorting them alphabetically worked pretty well. Given that we have a pretty good roman vocabulary, I thought we could explore different methods of sorting a sequence of strings representing Roman numbers.

Let’s first try the mostly-but-not-entirely-correct method suggested in the blog post:

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad sort .
{ "I" "II" "III" "IV" "IX" "V" "VI" "VII" "VIII" }

Well, that’s almost correct, but the number IX – the number 9 – sorts in the middle, rather than at the end. Could we fix this by using sort-by to convert the string to a number before calling compare to produce a sorted output?

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad [ roman> ] sort-by .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

That fixes it, but how often do we end up calling the conversion function?

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad SYMBOL: calls

IN: scratchpad [ calls inc roman> ] sort-by .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

IN: scratchpad calls get .
40

Wow, 40 times – that seems like a lot!

Perhaps we could try the Schwartzian transform – also known as decorate-sort-undecorate – at the expense of using intermediate storage by saving the keys for each element, then sorting, and then returning only the values:

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad [ [ roman> ] keep ] map>alist sort-keys values .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

That seems like a lot of code to do a simple thing. Instead, we can use the map-sort abstraction which implements the same method:

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad [ roman> ] map-sort .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

Does this make much of a difference? Let’s compare each method:

: roman-sort1 ( seq -- sorted-seq ) [ roman> ] sort-by ;

: roman-sort2 ( seq -- sorted-seq ) [ roman> ] map-sort ;

We can time sorting a list of 100,000 random Roman numbers under 1,000:

IN: scratchpad 100,000 1,000 [1..b] randoms [ >ROMAN ] map

IN: scratchpad [ [ roman-sort1 ] time ]
               [ [ roman-sort2 ] time ] bi
Running time: 4.164076625 seconds
Running time: 0.154227583 seconds

You can see that the decorate-sort-undecorate pattern is quite a bit faster in this case. This is not always true, but generally depends on how much work the key function is doing.