Roman Sort
Monday, May 12, 2025
Factor has included support for Roman numerals since at least 2008. Sometimes recent events – such as the election of Pope Leo XIV or the Super Bowl LIX – revive modern interest in how they work and how to computationally work with them.
There was a blog post a few days ago about sorting Roman numerals which pointed out that sorting them alphabetically worked pretty well. Given that we have a pretty good roman vocabulary, I thought we could explore different methods of sorting a sequence of strings representing Roman numbers.
Let’s first try the mostly-but-not-entirely-correct method suggested in the blog post:
IN: scratchpad 10 [1..b) [ >ROMAN ] map
IN: scratchpad sort .
{ "I" "II" "III" "IV" "IX" "V" "VI" "VII" "VIII" }
Well, that’s almost correct, but the number IX
– the number 9
–
sorts in the middle, rather than at the end. Could we fix this by using
sort-by to
convert the string to a number before calling
compare to
produce a sorted output?
IN: scratchpad 10 [1..b) [ >ROMAN ] map
IN: scratchpad [ roman> ] sort-by .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }
That fixes it, but how often do we end up calling the conversion function?
IN: scratchpad 10 [1..b) [ >ROMAN ] map
IN: scratchpad SYMBOL: calls
IN: scratchpad [ calls inc roman> ] sort-by .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }
IN: scratchpad calls get .
40
Wow, 40
times – that seems like a lot!
Perhaps we could try the Schwartzian transform – also known as decorate-sort-undecorate – at the expense of using intermediate storage by saving the keys for each element, then sorting, and then returning only the values:
IN: scratchpad 10 [1..b) [ >ROMAN ] map
IN: scratchpad [ [ roman> ] keep ] map>alist sort-keys values .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }
That seems like a lot of code to do a simple thing. Instead, we can use the map-sort abstraction which implements the same method:
IN: scratchpad 10 [1..b) [ >ROMAN ] map
IN: scratchpad [ roman> ] map-sort .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }
Does this make much of a difference? Let’s compare each method:
: roman-sort1 ( seq -- sorted-seq ) [ roman> ] sort-by ;
: roman-sort2 ( seq -- sorted-seq ) [ roman> ] map-sort ;
We can time sorting a list of 100,000
random Roman numbers under 1,000
:
IN: scratchpad 100,000 1,000 [1..b] randoms [ >ROMAN ] map
IN: scratchpad [ [ roman-sort1 ] time ]
[ [ roman-sort2 ] time ] bi
Running time: 4.164076625 seconds
Running time: 0.154227583 seconds
You can see that the decorate-sort-undecorate pattern is quite a bit faster in this case. This is not always true, but generally depends on how much work the key function is doing.