Sieve of Eratosthenes (Python, arrays)

From LiteratePrograms

Jump to: navigation, search
Other implementations: Alice ML | Bash | C++ | Forth | Haskell | Java | Python | Python, arrays | Scala

The Sieve of Eratosthenes is an algorithm for rapidly locating all the prime numbers in a certain range of integers. It operates by marking as composite all nontrivial multiples of each prime in sequence until only primes remain unmarked. It is most well-suited to implementation using arrays, which are compact and feature random access.


Contents

simple implementation

We follow the algorithm outlined in the introduction.

<<sieve>>=
  initialize the sieve
  test each possible factor in sequence
          mark nontrivial multiples
  return unmarked primes

First we set up an array of n2 flags, initially all set to 1, indicating primes.

<<initialize the sieve>>=
flags = zeros(n**2) + 1

Next we take each nontrivial factor, from [2,n) ...

<<test each possible factor in sequence>>=
for i in arange(2,n):

... and mark the nontrivial multiples.

  • we start at i2 because lower multiples will have been tested earlier in the sequence
  • by using a stride of i all higher multiples are set in the same statement
<<mark nontrivial multiples>>=
flags[i*i::i] = 0

As a result, the remaining unmarked elements are 0, 1, and the primes; we drop the first two to return only the primes.

<<return unmarked primes>>=
return flatnonzero(flags)[2:]

more complex implementation

We can attempt to speed the algorithm up by precalculating the first few passes. With a variant of wheel factorization, rather than initializing the sieve to all ones, we replicate a pattern without multiples of 2 and 3, and then reset the flags for the trivial multiples themselves.

<<initialize precalculated sieve>>=
flags = resize((0,1,0,0,0,1), (n**2,))
put(flags, (0,2,3), 1)

Because of the precalculation, we start testing from the next remaining prime, 5. Also, instead of testing each factor, prime or composite, we check the results calculated so far to test only the prime factors.

<<test each prime factor in sequence>>=
for i in arange(5,n,2):
    if flags[i]:

The remainder of the algorithm remains unchanged.

<<koskinon>>=
  initialize precalculated sieve
  test each prime factor in sequence
          mark nontrivial multiples
  return unmarked primes

putting it all together

Finally, we generate a list of the primes less than 9'999, and check that sieve and koskinon yield identical results.

<<sieve.py>>=
from numpy import *
def sieve(n):
sieve
def koskinon(n):
koskinon
if __name__ == "__main__":
        set_printoptions(edgeitems=100)
        print sieve(100)
        print all(koskinon(100)==sieve(100))

which we expect to produce:

[   2    3    5    7   11   13   17   19   23   29   31   37   41   43   47
   53   59   61   67   71   73   79   83   89   97  101  103  107  109  113
  127  131  137  139  149  151  157  163  167  173  179  181  191  193  197
  199  211  223  227  229  233  239  241  251  257  263  269  271  277  281
  283  293  307  311  313  317  331  337  347  349  353  359  367  373  379
  383  389  397  401  409  419  421  431  433  439  443  449  457  461  463
  467  479  487  491  499  503  509  521  523  541 ..., 9109 9127 9133 9137
 9151 9157 9161 9173 9181 9187 9199 9203 9209 9221 9227 9239 9241 9257 9277
 9281 9283 9293 9311 9319 9323 9337 9341 9343 9349 9371 9377 9391 9397 9403
 9413 9419 9421 9431 9433 9437 9439 9461 9463 9467 9473 9479 9491 9497 9511
 9521 9533 9539 9547 9551 9587 9601 9613 9619 9623 9629 9631 9643 9649 9661
 9677 9679 9689 9697 9719 9721 9733 9739 9743 9749 9767 9769 9781 9787 9791
 9803 9811 9817 9829 9833 9839 9851 9857 9859 9871 9883 9887 9901 9907 9923
 9929 9931 9941 9949 9967 9973]
True

επίλογος

Was the extra complexity of koskinon worthwhile? On the machine tested, it appears to be less than twice as fast as sieve.

Note: The reason it is not twice as fast as sieve is because the flags array in koskinon is just as large as the array in sieve. We want to remove as many unnecessary steps as possible (i.e. setting 0 to 0). Actually I can only see koskinon being faster because of the 'if flags[i]' statement and the 2 step in the arange, the former which should have been in sieve anyway. Also since flags only assumes values 1 and 0, we could use 31 bits per entry less, making the program much faster.

flags = zeros(n**2, bool) + 1


„Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk.“ — L Kronecker

Download code
Views