Next Combination
Friday, August 25, 2023
Eleven years ago, I blogged about computing the next permutation of a sequence.
As part of some performance improvements to the math.combinatorics vocabulary that I was making around the same time, I needed a way of calculating the indices of the next combination of a given sequence.
Implementation
Without going into a great explanation, we split the problem into a few steps, operating on a sequence of indices and modifying in place – either incrementing the indices after the “max index” or the last index.
: find-max-index ( seq n -- i )
over length - '[ _ + >= ] find-index drop ; inline
: increment-rest ( i seq -- )
[ nth-unsafe ] [ swap index-to-tail <slice-unsafe> ] 2bi
[ drop 1 + dup ] map! 2drop ; inline
: increment-last ( seq -- )
[ index-of-last [ 1 + ] change-nth-unsafe ] unless-empty ; inline
:: next-combination ( seq n -- )
seq n find-max-index [
1 [-] seq increment-rest
] [
seq increment-last
] if* ; inline
We can show how it works by using clone
to see the intermediate results of
“5 choose 3”:
IN: scratchpad { 0 1 2 } 10 [
[ clone 5 next-combination ] keep
] replicate nip .
{
{ 0 1 2 }
{ 0 1 3 }
{ 0 1 4 }
{ 0 2 3 }
{ 0 2 4 }
{ 0 3 4 }
{ 1 2 3 }
{ 1 2 4 }
{ 1 3 4 }
{ 2 3 4 }
}
This is used to implement our various combinations words:
each-combination
, map-combinations
, filter-combinations
,
all-combinations
, find-combination
, reduce-combinations
, etc.
Benchmarking
The benchmark I have chosen is one that intentionally creates a large array of almost 65 million items with the result of allocating all combinations of 200 items choosing 4 items at a time.
We can run this benchmark on an Apple Mac mini (2018) with a 3.2 GHz 6-Core Intel Core i7 processor and 16 GB of memory that is used as part of the Factor nightly build farm.
Factor 0.99 takes 8.3 seconds:
IN: scratchpad [ 200 <iota> 4 all-combinations length ] time .
Running time: 8.269610590999999 seconds
Additional information was collected.
dispatch-stats. - Print method dispatch statistics
gc-events. - Print all garbage collection events
gc-stats. - Print breakdown of different garbage collection events
gc-summary. - Print aggregate garbage collection statistics
64684950
Look at the gc-summary.
to see that more than half the time (60% or 4.9
seconds) is involved in garbage collection as part of allocating the almost 65
million results:
IN: scratchpad gc-summary.
Collections: 1,480
Cards scanned: 7,070,518
Decks scanned: 9,124
Code blocks scanned: 2
Total time: 4,910,722 µs
Card scan time: 3,256,711 µs
Code block scan time: 162 µs
Marking time: 0 µs
Data heap sweep time: 0 µs
Code heap sweep time: 0 µs
Data compaction time: 0 µs
Other Languages
On that same machine we can try some other programming languages and see how they compare, attempting to get the same behavior as the Factor example above.
Python 3.11.4 takes 8.4 seconds:
In [1]: from itertools import combinations
In [2]: %time print(len(list(combinations(range(200), 4))))
64684950
CPU times: user 5.64 s, sys: 2.77 s, total: 8.41 s
Wall time: 8.44 s
PyPy 7.3.12 (compatible with Python 3.10) takes 17 seconds:
>>>> from itertools import combinations
>>>> from time import time
>>>> t0 = time(); print(len(list(combinations(range(200), 4)))); time() - t0
64684950
17.00031805038452
Ruby 3.2.2 takes 37.4 seconds:
irb(main):001:0> measure
TIME is added.
=> nil
irb(main):002:0> (0...200).to_a.combination(4).to_a.length
processing time: 37.413635s
=> 64684950
Julia 1.9.2 takes 10.7 seconds:
julia> ]add Combinatorics
julia> using Combinatorics
julia> @time length(collect(combinations(range(1, 200), 4)))
10.668982 seconds (129.37 M allocations: 8.193 GiB, 37.74% gc time)
64684950
Crystal 1.9.2 takes 515 seconds (interpreted) or 13.7 seconds (compiled):
# Run crystal interpreted
$ crystal i
icr:1> t0 = Time.utc;
puts (0...200).to_a.combinations(4).to_a.size;
Time.utc - t0
64684950 => 00:08:34.728442000
# Make a crystal source file
$ echo "t0 = Time.utc
puts (0...200).to_a.combinations(4).to_a.size
puts (Time.utc - t0)" > combo.cr
# Run crystal compiled
$ crystal combo.cr
64684950
00:00:13.739838000
Clojure 1.11.1 takes 10.7 seconds:
user=> (ns user (:use [clojure.math.combinatorics]))
user=> (time (count (combinations (range 0 200) 4)))
"Elapsed time: 10685.687736 msecs"
64684950
Rakudo v2023.08 takes 808.9 seconds:
$ time raku -e "print (1..200).combinations(4).Array.elems"
759.45s user 44.32s system 99% cpu 13:28.86 total
64684950
Considering that we only take the length of the large array of combinations, it is a bit artificial as a benchmark, with room for various optimizations, but it does seem to highlight both algorithmic and garbage collector issues in various languages.
Factor compares pretty favorably!