Finding Prime Numbers - Part 2
In part 1 of this series we looked at the quick and easy way to determine if a number is prime using Python. Unfortunately, that method didn't scale to searching for primes in the millions. In this article, we will try a smarter algorithm and implement it in C instead of Python.
Switching to C will provide a performance boost. Since Python is an interpreted language, the interpreter causes some overhead that compiled languages like C do not have.
There is also a much smarter algorithm we can use. It's highly inefficient to calculate whether each number is prime separately. We can save the results of previous tests and use them to reduce the amount of numbers that must be checked. The algorithm we will use is called the Sieve of Eratosthenes.
The Sieve of Eratosthenes is conceptually very simple. We start with a table of bits, with a length equal to the highest number we want to search (in this case, 20,000). A one at a particular location of the table means that the number is composite. We first go through the table and set each multiple of 2 as composite. Then, it takes the next number not yet marked (3), and mark its multiples as composite. The process repeats with all numbers not yet marked as composite until it gets to the square root of the length of the table. Then, it simply goes through the table again. All the numbers not marked as composite are considered prime.
C version: 0.002 seconds
Let's give it more of a challenge. How about searching up to 20 million!
C version, search to 20 million: 2.771 seconds
We have now improved the prime finding algorithm significantly. Remember, the first algorithm took 7.6 seconds to search up to 20,000.
20000 / 7.649 seconds = 2614 per second 20000000 / 2.771 = 7217611 per second = 2800 times faster
A little optimization goes a long way!
We still however, have a problem. Even if we optimized this code to use 1 bit of memory per number tested, this algorithm's memory consumption is simply too large. The largest known prime number is 9,808,358 digits long. Even if such a large prime could be calculated by a single computer, 9,808,358 bits is approximately 101808345 terabytes. That's a little bit more memory than my computer has! In part 3 of this series, we will look into probabilistic primality tests.
You can download the code from this tutorial below: