Calculating prime numbers

This is just a simple benchmark tool to compare algorithms, programming languages and CPUs from microcomputers to workstations. I've been using this calculation since 1989. It is still usefull for microcontrollers like the T-Display with an esp32s2:

The code is only 14 lines short:

import math, time                    # v2.0.2022 in Python
last, found = 1000000, 4             # we start from 11, know already 2, 3, 5, 7
print(f"Calculating prime numbers to {last}.")
start = time.monotonic()
for number in range(11, last, 2):
    prime = True
    for divider in range(3, int(math.sqrt(number))+1, 2):
        if number % divider == 0:
            prime = False
            break
    if prime:
        found += 1
end = time.monotonic()
print(f"This took: {(end - start)} seconds. I found {found} prime numbers.")

Increasing the speed - four parameters determine the runtime of the calculation

The runtime depends on four variables. Multicore can be considered under algorithm and is represented in v4.0 and c6.0:

Used Algorithm (e.g. 34x for v0.9 to v5.0 in Python)
Used programming language (c. 16x vor v5.0 from Python to C)
CPU running the algorithm (c. 2.1x for v5.0 Python from i5-2520M 3.2GHz to i3-10100 4.2 GHz)
Used range to investigate (e.g. 23x longer for 10x larger range 100 million instead of 10 million in Python v5.0)

Better and faster algorithms

A faster way to calculate the prime numbers was already implemented 1991 in Omicron Basic on a Atari ST. By 2024 I collected 8 improvements to the algorithm.

v0.8.1991 compare result of division to division with truncated decimal places
v0.9.1991 use only divisors until the squareroot of the investigated number
v1.0.1991 start with 11 in steps of 2 and use odd dividers from 3 on
v1.1.2000 store result of the division, then check for decimal places
v2.0.2022 use modulo operator for comparison
v3.0.2023 put check for prime into a function
v4.0.2023 write code to execute it in parallel
v5.0.2023 calculate the prime numbers to the square root of the largest number, use these prime numbers as divisors for the remaining numbers
v6.0.2024 make v5.0 in parallel without problems in the racing conditions

Higher speed with other programming language

The same algorithm compiled in C for the prime numbers to 1,000,000 need just 0.049 seconds while Python3 needs 2.26 seconds to interpret the code and give the answer on my M1 Macbook.

With this simple example some 20x improvements can be seen. But the development does not stop, interpreters are getting faster and with special modules for special tasks a "slow" language might no longer be slow at all. While having the benefit of being easier to develop, prototype and maintain.

Faster with a different CPU

Interestingly the improvement in CPU performance for this specific task is not as large as the increase in the frequency these chips run. The range from 16MHz on the Atmel826P for the Arudino Uno to the 4200MHz for the i3-10100 brings a speed improvement of 262x. And 64bit instead of 8 bit can bring another 8x just from the data and instruction side. With long prediction pipelines the increase in IPC over the years is still significant.

Details to higher speed with a different programming language

language	prime numbers	factor
C	0.0498 s	51
Java	0.0663 s	34
Python3	2.2611 s	1
Swift
Javascript
Playgrounds (Swift)	7.0413 s	0.36

And the code from fast to slow:

C in 0.0498 seconds

#include <stdio.h>
#include <time.h>
#include <math.h>
#include <string.h>

int main() {
    int last = 1000000;
    int found = 4;   // we already know 2, 3, 5, 7
    const int arraylength = (int)(last / log(last));
    // printf("%d",arraylength);
    int primes[1000000] = {2, 3, 5, 7};
    clock_t start, end;
    double cpu_time_used;
    printf("Calculating prime numbers until %d\n", last);
    start = clock();
    for (int number = 11; number < last; number += 2) {
        int prime = 1;
        for (int divider = 3; divider < (int)(sqrt(number)) + 1; divider += 2) {
            if (number % divider == 0) {
                prime = 0;
                break;
            }
        }
        if (prime ==  1) {
            primes[found] = number;
            found += 1;
        }
    }
    end = clock();
    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
    for (int i = 0; i < found - 1; i++) {
        printf("%d, ",primes[i]);
    }
    printf("\nFound %d prime numbers.\n", found);
    printf("This took %f seconds.",cpu_time_used);
}

Java with OpenJDK 17.0.6 in 0.066340542 seconds

class prime {
  public static void main(String[] args) {
    int last = 1000000;
    int found = 4;
    System.out.println("Prime numbers to " + last);
    System.out.print("2, 3, 5, 7, ");
    long start = System.nanoTime();
    for(int number = 11; number < last; number = number + 2) {
      boolean prime = true;
      for(int divider = 3; divider < (int) Math.sqrt(number) + 1; divider = divider + 2) {
        if(number % divider == 0) {
          prime = false;
          break;
        }
      }
      if(prime) {
        found++;
        // System.out.print(", " + number);
      }
    }
    long end = System.nanoTime();
    System.out.print("\nI found " + found + " prime numbers.");
    System.out.print("This took me " + ((end - start) / 1000000000.0) + " seconds.");
  }
}

Python 3.9.12 in 2.2611 seconds

import math, time
last = 1000000
found = 4             # we start from 11, know 2, 3, 5, 7
print(f"Prime numbers to {last}")
start = time.monotonic()
for number in range(11, last, 2):
    prime = True
    for divider in range(3, int(math.sqrt(number))+1, 2):
        if number % divider == 0:
            prime = False
            break
    if prime:
        found += 1
end = time.monotonic()
print(f"This took: {(end - start)} seconds.")
print(f"I found {found} prime numbers.")

Microcontroller MCUs

Today even microcontroller can drive color-tft displays, like here with a Raspberry Pi Pico 2040 on a 240px display with SP7789, running circuitpython:

The code is just a few lines:

import math, time, board, busio, terminalio, displayio
from adafruit_st7789 import ST7789
displayio.release_displays()
spi = busio.SPI(clock=board.GP10, MOSI=board.GP11)
tft_cs = board.GP9
tft_dc = board.GP8
display_bus = displayio.FourWire(
    spi, command=tft_dc, chip_select=tft_cs, reset=board.GP12
)
display = ST7789(display_bus, width=240, height=240, rowstart=80)
#splash = displayio.Group()
#display.show(splash)

last = 500
disp = True
found = 4             # we start from 11, know 2, 3, 5, 7
print(f"Prime numbers to {last}")
print("2, 3, 5, 7", end="")

start = time.monotonic()
for number in range(11, last, 2):
    prime = True
    for divider in range(3, int(math.sqrt(number))+1, 2):
        if number % divider == 0:
            prime = False
            break
    if prime:
        found += 1
        if disp:
            print(number, sep=', ', end=' ')
end = time.monotonic()
print(f"\nThis took: {(end - start)} seconds.")
print(f"I found {found} prime numbers.")

Prime on MCUs

up to	primes	MHz	80	240	125	240	160	240	3200
	#	micro:bit	esp8266	ESP32	rp2040	ESP32 S2	ESP32 C3	ESP32 S3	M1
1,000	168	0.694	0.182	0.036	0.043	0.139	0.027	0.016	0.000
10,000	1.229	13.517	3.360	0.626	0.797	0.769	0.487	0.259	0.008
100,000	9.592	318.459	71.008	12.766	17.028	10.278	10.037	5.607	0.111
1,000,000	78.498		599.490	294.419	396.301	201.873	232.850	130.837	2.599

And now in Arduino C

range	prime numbers	esp8266 80 MHz	ESP32 240 MHz	rp2040 125 MHz	ESP32-S2 240 MHz	ESP32-C3 160 MHz	ESP32-S3 240 MHz	M1 3200 MHz
1,000	168	0.04	0.01	0.03		0.02	0.01
10,000	1,229	0.9	0.24	0.66		0.43	0.22
100,000	9,592	21.54	5.65	15.82		10.16	5.20
1,000,000	78,498	541.01	140.47	395.94		252.78	129.44	0.054
1,000,000	78.498	599.490	294.419	396.301	201.873	232.850	130.837

Does my code work?

A quick check to see if the code works as intended is the number of primes it finds in a certain range:

range	prime numbers
10	4
100	25
1,000	168
10,000	1,229
100,000	9,592
1,000,000	78,498
10,000,000	664,579
100,000,000	5,761,455
1,000,000,000	50,847,534
2,147,483,647	105,097,564
4,294,967,295	203,280,221

Faster with Multithreading - 4.9x faster

This should work even with Python. For comparison we start with the calculation to 10 million and find 664,579 prime numbers. On my i3-10100 this takes 63 seconds. This could be improved by putting the check for a prime number into a function and the loop into the main function. We are down to 42.4 seconds (33% faster) with the following code:

import math, time

last = 10000000
found = 4             # we start from 11, know 2, 3, 5, 7

def is_prime(number):
    global found
    flag_prime = True
    for divider in range(3, int(math.sqrt(number))+1, 2):
        if number % divider == 0:
            flag_prime = False
            break
    if flag_prime:
        found += 1

if __name__ == "__main__":
    print(f"Prime numbers to {last}")
    start = time.perf_counter()
    for number in range(11, last, 2):
        is_prime(number)
    end = time.perf_counter()
    print(f"This took: {(end - start)} seconds.")
    print(f"I found {found} prime numbers. For 10M it should be 664,579.")

For multiprocessing we need to add a few lines, but the code is now executed in 12.87 seconds - 3.296x faster.

import math, time, multiprocessing

last = 10000000
found = 4             # we start from 11, know 2, 3, 5, 7
queue = multiprocessing.Queue()

def is_prime(number):
    flag_prime = True
    for divider in range(3, int(math.sqrt(number))+1, 2):
        if number % divider == 0:
            flag_prime = False
            break
    if flag_prime:
        return number
    else:
        return 0

if __name__ == "__main__":
    print(f"Prime numbers to {last}")
    start = time.perf_counter()
    print("Create pool")
    pool = multiprocessing.Pool()
    results = pool.map(is_prime, range(11, last, 2))
    pool.close()
    for nr in results:
        if nr > 0:
            found += 1
    end = time.perf_counter()
    print(f"This took: {(end - start)} seconds.")
    print(f"I found {found} prime numbers. For 10 million it should be 664,579.")

And finally: For the original 1 million it takes 0.702 seconds. Not so far from the 0.128 seconds in C (5.5x slower). Which makes we wonder: How fast would it be in Rust or in C in parallel?

Even faster with the Sieve of Eratosthenes?

This algorithm is known and documented for almost 2000 years. How much faster would it be than the brute force method above? Time to find out.

One limitation could be the required memory for the primes. Let's say I want to know the primes until 2E32 or 4,294,967,295. With the previous algorithm I would need 32bit x 203,280,221 primes = 813,120,884 byte. That's 775 MByte. Using the Sieve of Eratosthenes I could only include the odd numbers (I know the only even prime number - two) and just use one bit to represent if a number is prime (1) or not (0). Then each byte gives 8 numbers, and we need 4,294,967,295 / 2 / 8 = 268,435,456 bute = 256 MByte. It is actually less RAM demanding!

Later 2024 I will give it a try.

Name		Name	Last commit message	Last commit date
Latest commit History 271 Commits
.github/workflows		.github/workflows
.vscode		.vscode
arduino		arduino
c		c
circuitpython		circuitpython
cpp		cpp
docs		docs
java		java
javascript		javascript
micropython		micropython
python		python
rust		rust
swift		swift
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Calculating prime numbers

Increasing the speed - four parameters determine the runtime of the calculation

Better and faster algorithms

Higher speed with other programming language

Faster with a different CPU

Details to higher speed with a different programming language

C in 0.0498 seconds

Java with OpenJDK 17.0.6 in 0.066340542 seconds

Python 3.9.12 in 2.2611 seconds

Microcontroller MCUs

Prime on MCUs

Does my code work?

Faster with Multithreading - 4.9x faster

Even faster with the Sieve of Eratosthenes?

About

Releases 5

Packages

Languages

License

kreier/prime

Folders and files

Latest commit

History

Repository files navigation

Calculating prime numbers

Increasing the speed - four parameters determine the runtime of the calculation

Better and faster algorithms

Higher speed with other programming language

Faster with a different CPU

Details to higher speed with a different programming language

C in 0.0498 seconds

Java with OpenJDK 17.0.6 in 0.066340542 seconds

Python 3.9.12 in 2.2611 seconds

Microcontroller MCUs

Prime on MCUs

Does my code work?

Faster with Multithreading - 4.9x faster

Even faster with the Sieve of Eratosthenes?

About

Resources

License

Stars

Watchers

Forks

Releases 5

Packages 0

Languages

Packages