The Sieve of Erathostene
Some history
One of the oldest and simplest yet interesting algorithm that
involves arbitrary large data is the sieve of Erathostene.
Erathostene (?-?) was a mathematician of ancient Greece
[more info sought here; to be added later].
This algorithm was once used as a speed "benchmark" for languages and compilers
(which is lame in the first case, because by inlining arbitrary many steps
of the sieve, you could speed the whole thing arbitrarily anyway, and in the
second, as the sieve would eventually be specifically recognized by compilers
to speed up benchmarks).
Here we use it not to benchmark speed,
but to illustrate language expressiveness (well, lack of expressiveness, as
for existing languages).
The mathematical idea behind the sieve
Here is what Erathostene found centuries BC.
A natural number m is a multiple of some natural number
n if and only if
there is a natural number k such that m=k n.
We then say that n divides m,
or that m is divisible by n,
or that n is a divisor of m.
We also call k the multiplicity factor.
A natural number is said to be prime if and only if it is
divisible only by itself and 1.
Thus a natural number m is not a prime if and only if
it is divisible by a natural number strictly greater than 1 and strictly
lesser than m.
We also know that any number not prime will have a prime divisor:
take the least divisor strictly greater than 1.
Finally, we know that the square of this least prime divisor is
not greater than the number: when you divide the number by the prime
divisor, you obtain a new number stricly lesser than the original whose
divisors will also divide the original one; hence the least one will be
greater than the least of the original one.
Then, to find all primes lesser than some large (well, not too large
either) number N, just consider every number below N,
and eliminate all the multiples each considered number;
the remaining will be the primes.
Now, we see that every time a marked number, its multiples will also
be multiples of its least prime divisor; hence you need not mark them a
second time.
We also see that when a prime p is reached, all multiples below
its square are already marked, as the multiplicity factork being
lesser than p, it will already be marked as a multiple of
k's least prime divisor.
Thus, we can also stop marking when the square of the considered number
is greater than N.
Programming the sieve
The idea of the sieve is simple; but then, you can deduce infinitely
many programs based on this idea.
Existing computer languages would require you to rewrite each from scratch.
But in the same way as a human doesn't re-learn the sieve from scratch
each time he sees it, we can and should have computers understand it
once and for all. Here is how we could do:
- Take the program's most generic form,
that should work in any context.
- Tell the computer how to specialize this form into efficient ones
according to the context.
The sieve program in various programming languages and paradigms
lame "C" version
void sieve (int N)
{
var n : int = 2 ;
sieved : int->bool = function x|->false ;
fun step () =
if not sieved (n) then
begin
out(n) ;
sieved := function x|-> old sieved x or n divides x
end ;
loop
begin
step () ;
increment(n)
end ;
}
Here are a few reasons why C sucks, that you can see in this program:
You must explicitly use an array or pointers (or a mix of both)
according to your statically knowing the bounds or not.
You must explicitly inline the sieving function for a right quantity
of small prime numbers needed to achieve efficiency.
To conclude, it's impossible to express the generic algorithm in a
one C program.
Side-effect message-passing version
var n : int = 2 ;
sieved : int->bool = function x|->false ;
fun step () =
if not sieved (n) then
begin
out(n) ;
sieved := function x|-> old sieved x or n divides x
end ;
loop
begin
step () ;
increment(n)
end ;
Pure lazy functional version
let Erathostene's_Sieve =
let step n sieved =
let next_num = n+1 in
if not (sieved num)
then
let new_sieved m = (sieved m) or (n divides m) in
cons_stream n (step next_num new_sieved)
else step next_num sieved in
prime_stream = step 2 (fun x->false)
"Natural language" version
Consider knowing a natural number n to not be a prime as being false.
Consider number two.
A sieve step is
if the considered natural number is not known to not be a prime,
then
say that the considered natural number is prime, and
consider knowing a natural number to not be a prime as
previously knowing that and the considered natural number
not dividing this one.
To sieve is to loop on the sieve step, each time considering the next number.
Recursively export the verb to sieve.
Lesser natural language interpreters may require a more thorough quoting,
parenthesing, annotating and explicit typing than a greater one.
Variations on the sieve
Here is the things we could tell the computer so he can enhance the
implementation of the sieve:
- Efficiency annotations
- Eliminate multiples only from the square of the prime on.
- Thus stop when the square exceeds the numbers to sieve.
- Use a well-known egregious identity to compute the square
together with the
- Statically or dynamically limit (and change) the precision
of integer computations as needed.
- Use an array to implement the variable sieving function
if a low enough bound is (locally) known to the sieve.
- Else use the list of previously known primes as a parameter
for the sieving function instead of dynamically
producing (pseudo- or real) code to test each new prime.
- Arbitrary dynamic combinations of the former methods may be used !
- Expand the first steps of the sieve (i.e. inline primes
up to 2,3,5, or 7, etc) to speed up the whole thing:
typical programs only inline the first step to gain a factor
two on space (when using arrays) and time;
another factor two may be gained by inlining the four first
steps instead;
for growing zones, this factor can be dynamically increased,
etc.
- Input/Output annotations
- The out() (or "say") input procedure defines what to do with
a number known to be prime
- The function may be executed step-by-step, or prime-by-prime,
or continuously as needed.
- The sieve results may be buffered or not;
the buffer may be shared or not
between smaller or larger sets of programs and
running instanciations
of a same persistent interrupted program;
if needed results are not available, a sieve would of course
be re-run.
- A proof that the sieve actually yields the prime numbers shall
be included, exported, and this fact pointed to in the
standard list of algorithms, so that this algorithm will be
automatically used by whoever wants prime numbers without
explicitly programming.
To Do on this page
Add more annotations, particularly about generating all primes below N or
factorizing N.
Show how the real program would look like on our HLL.
Back to the
HLL subproject.
Page Maintainer:
Faré
-- rideau@clipper.ens.fr