Re: Factor

Factor: the language, the theory, and the practice.

Genuine Sieve of Eratosthenes

Monday, June 8, 2015

#math #performance #primes

Iain Gray posted on the mailing list about a paper called the Genuine Sieve of Eratosthenes by Melissa O’Neill, a Professor of Computer Science at Harvey Mudd College.

It begins with a discussion about some clever looking Haskell code that is just trial division and not the Sieve of Eratosthenes. At the end of the paper is an interesting discussion of performance improvements that I thought might be fun to implement in Factor as a followup to the three versions that I posted about recently.

Version 4

We are going to use wheel factorization as a way of reducing the search space by ignoring some multiples of small prime numbers. It is a bit similar to ignoring all even numbers in our previous “Version 3”. In this case, we want to ignore all multiples of the first four prime numbers.

To calculate the “2-3-5-7” wheel, we start with 11 and filter the next 210 numbers that are not divisible by 2, 3, 5, or 7.

CONSTANT: wheel-2-3-5-7 $[
    11 dup 210 + [a..b] [
        { 2 3 5 7 } [ divisor? ] with any? not
    ] B{ } filter-as differences
]

Note: We use 210 here because the “{ 2 3 5 7 } product” cycle is 210.

Next, we want a way to iterate across all the numbers that might be prime, calling a quotation on each number that is prime.

:: each-prime ( upto sieve quot -- )
    11 upto '[ dup _ <= ] [
        wheel-2-3-5-7 [
            over dup 2/ sieve nth [ drop ] quot if
            +
        ] each
    ] while drop ; inline

For each prime found, we will want to mark all odd multiples as not prime.

:: mark-multiples ( i upto sieve -- )
    i sq upto i 2 * <range> [| j |
        t j 2/ sieve set-nth
    ] each ; inline

We will use a bit-array that is sized in 210-bit blocks, so the number of bits needed is:

: sieve-bits ( n -- bits )
    210 /i 1 + 210 * 2/ 6 + ; inline

Finally, we calculate our sieve, first for 11 and then for all the primes above 11.

:: sieve4 ( n -- primes )
    n sieve-bits <bit-array> :> sieve
    t 0 sieve set-nth
    t 4 sieve set-nth
    n sqrt sieve [ n sieve mark-multiples ] each-prime
    V{ 2 3 5 7 } clone :> primes
    n sieve [
        dup n <= [ primes push ] [ drop ] if
    ] each-prime primes ;

Calculating prime numbers up to 10 million is getting even faster.

IN: scratchpad gc [ 10,000,000 sieve4 ] time
Running time: 0.08523477 seconds

Version 5

If you use the compiler.tree.debugger to inspect the optimized output, you can see some dynamic dispatch with branches for fixed-width (fixnum) and arbitrary-precision (bignum) integers.

It would be nice to perform fixnum specialization or improve type propagation in the compiler, but for now we are going to write some ugly code to force fixnum math for performance.

We make a version of each-prime that inlines the iteration:

:: each-prime2 ( upto sieve quot -- )
    11 upto >fixnum '[ dup _ <= ] [
        wheel-2-3-5-7 [
            over dup 2/ sieve nth-unsafe [ drop ] quot if
            fixnum+fast
        ] each
    ] while drop ; inline

And similarly for mark-multiples:

:: mark-multiples2 ( i upto sieve -- )
    i 2 fixnum*fast :> step
    i i fixnum*fast upto >fixnum '[ dup _ <= ] [
        t over 2/ sieve set-nth-unsafe
        step fixnum+fast
    ] while drop ; inline

And use them in our sieve computation:

:: sieve5 ( n -- primes )
    n sieve-bits <bit-array> :> sieve
    t 0 sieve set-nth
    t 4 sieve set-nth
    n sqrt sieve [ n sieve mark-multiples2 ] each-prime2
    V{ 2 3 5 7 } clone :> primes
    n sieve [
        dup n <= [ primes push ] [ drop ] if
    ] each-prime2 primes ;

Calculating prime numbers up to 10 million is super fast!

IN: scratchpad gc [ 10,000,000 sieve5 ] time
Running time: 0.033914378 seconds

We can compute all the 11,078,937 prime numbers up to 200 million in about a second!

IN: scratchpad gc [ 200,000,000 sieve5 ] time
Running time: 0.970876855 seconds

And all the 50,847,534 prime numbers up to 1 billion in about six seconds!

IN: scratchpad gc [ 1,000,000,000 sieve5 ] time
Running time: 6.141224805 seconds

Not quite primegen or primesieve, but not bad for a laptop!

This is available in the math.primes.erato.fast vocabulary on our nightly builds!