Thread Pool and the Best Sorting Algorithm

The multiprocessing.dummy module is a great tool to manage multithread tasks in python. It replicates the API of multiprocessing but is actually a wrapper around the threading module. I would create a chart to sum things up.

chart

Thread pool

With multiprocessing.dummy we can create a thread pool. For example we can build a multithread web crawler:

from multiprocessing.dummy import Pool as ThreadPool

urls = [
    'https://github.com', 
    'https://www.kaggle.com'
]

def fun(i):
    """
    do something.
    """

# set the number of threads
pool = ThreadPool(processes=len(url))

# map the function
result = pool.map(fun, urls)

# manage the threads post mortem
pool.close()
pool.join()

Sleepsort

Sleepsort is a joke sorting algorithm that gained popularity on the internet.. We can implement the algoirithm with multiprocessing.dummy module.

from multiprocessing.dummy import Pool as ThreadPool
from time import sleep

def sleepsort(array):
    """
    sleepsort.
    """
    sorted_list = []
    interval = 0.001

    def task(n):
        sleep(n * interval)
        sorted_list.append(n)
    
    pool = ThreadPool(len(array))
    pool.map(task, array)
    pool.close()
    pool.join()
    return sorted_list

Accuracy

The algorithm relies heavily on the machine itself. Let’s see if it always works properly.

attempts = 100
inacc = 0

for i in range(attempts):
    sl = sleep_sort([5, 1, 3, 2, 4])
    if sl != sorted(sl):
        inacc += 1

print('Sleepsort is inaccurate {} times out of {} attempts.'.format(inacc, attempts))

And unfortunately…

Out[ ]: Sleepsort is inaccurate 7 times out of 100 attempts.

Let’s make some plots to see how accuracy shifts with array length and sleep intervals.

chart

chart

We can see the accuracy skyrockets as intervals become larger or array size decreases. The results support the logic that multiprocessing.dummy simply creates dummy threads instead of opening real CPU threads.

If we want to sort a lengthy array while giving a trivial sleeping time, those elements at the top of thread assignment queue will wake up from sleeping even before the others getting throught the queue.

chart

Surprisingly, the order of the given array has trivial impact on accuracy. It implies the threads were not allocated in the order of array index. (If so, an already sorted array will result in 100% accuracy no matter what.)

Reference