The Sieve of Erathostene
Some history
One of the oldest and simplest yet interesting algorithm that
involves arbitrary large data is the sieve of Erathostene.
Erathostene (?-?) was a mathematician of ancient Greece
[more info sought here; to be added later].
This algorithm was once used as a speed "benchmark" for languages and compilers
(which is lame in the first case, because by inlining arbitrary many steps
of the sieve, you could speed the whole thing arbitrarily anyway, and in the
second, as the sieve would eventually be specifically recognized by compilers
to speed up benchmarks).
Here we use it not to benchmark speed,
but to illustrate language expressiveness (well, lack of expressiveness, as
for existing languages).
The mathematical idea behind the sieve
Here is what Erathostene found centuries BC.
A natural number m is a multiple of some natural number
n if and only if
there is a natural number k such that m=k n.
We then say that n divides m,
or that m is divisible by n,
or that n is a divisor of m.
We also call k the multiplicity factor.
A natural number is said to be prime if and only if it is
divisible only by itself and 1.
Thus a natural number m is not a prime if and only if
it is divisible by a natural number strictly greater than 1 and strictly
lesser than m.
We also know that any number not prime will have a prime divisor:
take the least divisor strictly greater than 1.
Finally, we know that the square of this least prime divisor is
not greater than the number: when you divide the number by the prime
divisor, you obtain a new number stricly lesser than the original whose
divisors will also divide the original one; hence the least one will be
greater than the least of the original one.
Then, to find all primes lesser than some large (well, not too large
either) number N, just consider every number below N,
and eliminate all the multiples each considered number;
the remaining will be the primes.
Now, we see that every time a marked number, its multiples will also
be multiples of its least prime divisor; hence you need not mark them a
second time.
We also see that when a prime p is reached, all multiples below
its square are already marked, as the multiplicity factork being
lesser than p, it will already be marked as a multiple of
k's least prime divisor.
Thus, we can also stop marking when the square of the considered number
is greater than N.
Programming the sieve
The idea of the sieve is simple; but then, you can deduce infinitely
many programs based on this idea.
Existing computer languages would require you to rewrite each from scratch.
But in the same way as a human doesn't re-learn the sieve from scratch
each time he sees it, we can and should have computers understand it
once and for all. Here is how we could do:
- Take the program's most generic form,
that should work in any context.
- Tell the computer how to specialize this form into efficient ones
according to the context.
The sieve program in various programming languages and paradigms
lame "C" version
char * sieve (int max)
{
int n, n2, m;
char * sieved ;
/* malloc'ing */
sieved = malloc(max) ;
assert(sieved) ;
memset(sieved,0,max) ;
/* loop */
for (n = 2, n2 = 4; n2<=max ; n2+=(n++<<1)+1 ) {
if (!sieved[n]) {
out(n) ;
for (m=n2; m <= max ;m+=n) sieved[m] = 1 ;
}
}
return sieved ;
}
Here are many reasons why C sucks, that we can see in this program:
- In this example, the only parameter is the upper bound of a range to sieve
to find which integers inside are primes.
Instead, we could have used the very same algorithm to compute
the indefinite list of prime numbers,
a function that associates its lowest prime factor to each integer in a range,
or an indefinite sieving function.
But in C, these mean complete rewrite of the function.
- The caller must know the low-level implementation of the result
structure to use it.
- Every type or function definition being global in C, object-oriented
style is made difficult, as the programmer must manually resolve name clash,
and/or add indirection levels; this wastes the programmer's time and leads
to inefficient code. In C++, the problem is only partially solved,
as classes are still global, statically bound, and never value-dependent.
- the preprocessor could be used to increase the genericity of the function,
but its use is heavy, very unpowerful, static, and especially horrible if
multiple instances of preprocessed the function are to be used.
- If you want time or space efficiency, you must explicitly inline
the sieving function for a right quantity of small prime numbers needed to
achieve efficiency, and pack/unpack data from the array. Yeech !
- allocating memory is painful; either the caller or the callee must do it.
If the caller does it, it can sometime be optimized, but is painful everytime.
If the callee does it, it's painful only once (unless there's an exception
mechanism, but this would be extremely painful), but it can't be
optimized, and the caller must know the internals anyway, to have a consistent
deallocation policy.
- You must explicitly give the low-level implementation of arrays,
lists, inlined code, allocation procedure. You can't let the system optimize
them according to the way the function will actually be used.
- The caller must In this example, we assumed that out() was a function
to call whenever a prime number is detected. Using data-driven style in the
program that uses this information would require either complete rewrite
of the program, or explicit forking (or use of threads).
To conclude, it's impossible to express efficiently the generic algorithm
in a one C program.
Side-effect message-passing version
var n : int = 2 ;
sieved : int->bool = function x|->false ;
fun step () =
if not sieved (n) then
begin
out(n) ;
sieved := function x|-> old sieved x or n divides x
end ;
loop
begin
step () ;
increment(n)
end ;
Pure lazy functional version
let Erathostene's_Sieve =
let step n sieved =
let next_num = n+1 in
if not (sieved num)
then
let new_sieved m = (sieved m) or (n divides m) in
cons_stream n (step next_num new_sieved)
else step next_num sieved in
prime_stream = step 2 (fun x->false)
"Natural language" version
Consider knowing a natural number n to not be a prime as being false.
Consider number two.
A sieve step is
if the considered natural number is not known to not be a prime,
then
say that the considered natural number is prime, and
consider knowing a natural number to not be a prime as
previously knowing that and the considered natural number
not dividing this one.
To sieve is to loop on the sieve step, each time considering the next number.
Recursively export the verb to sieve.
Lesser natural language interpreters may require a more thorough quoting,
parenthesing, annotating and explicit typing than a greater one.
Variations on the sieve
Here is the things we could tell the computer so he can enhance the
implementation of the sieve:
- Efficiency annotations
- Eliminate multiples only from the square of the prime on.
- Thus stop when the square exceeds the numbers to sieve.
- Use a well-known egregious identity to compute the square
together with the
- Statically or dynamically limit (and change) the precision
of integer computations as needed.
- Use an array to implement the variable sieving function
if a low enough bound is (locally) known to the sieve.
- Else use the list of previously known primes as a parameter
for the sieving function instead of dynamically
producing (pseudo- or real) code to test each new prime.
- Arbitrary dynamic combinations of the former methods may be used !
- Expand the first steps of the sieve (i.e. inline primes
up to 2,3,5, or 7, etc) to speed up the whole thing:
typical programs only inline the first step to gain a factor
two on space (when using arrays) and time;
another factor two may be gained by inlining the four first
steps instead;
for growing zones, this factor can be dynamically increased,
etc.
- Input/Output annotations
- The out() (or "say") input procedure defines what to do with
a number known to be prime
- The function may be executed step-by-step, or prime-by-prime,
or continuously as needed.
- The sieve results may be buffered or not;
the buffer may be shared or not
between smaller or larger sets of programs and
running instanciations
of a same persistent interrupted program;
if needed results are not available, a sieve would of course
be re-run.
- A proof that the sieve actually yields the prime numbers shall
be included, exported, and this fact pointed to in the
standard list of algorithms, so that this algorithm will be
automatically used by whoever wants prime numbers without
explicitly programming.
To Do on this page
Add more annotations, particularly about generating all primes below N or
factorizing N.
Show how the real program would look like on our HLL.
Back to the list of HLL Examples.
Up to the HLL subproject.
Page Maintainer:
Faré
-- rideau@clipper.ens.fr