How to use ThreadPoolExecutor in Python

Optimize your code using thread pools.

If a Python program is heavy on the I/O side, running it in a sequential/synchronous pattern can take a lot of time, and the execution time here can be reduced many folds using threading.

In this article, we are going to talk about Python's ThreadPoolExecutor to execute function instances in threads.

About ThreadPoolExecutor¶

A normal Python program runs as a single process and a single thread but sometimes using multiple threads can bring lots of performance improvements.

Creating new threads and managing them can be daunting, thankfully there are a few solutions available.

The concurrent Python module is a part of the standard library collection. ThreadPoolExecutor provides an interface that abstracts thread management from users and provides a simple API to use a pool of worker threads. It can create threads as and when needed and assign tasks to them.

In I/O bound tasks like web scraping, while an HTTP request is waiting for the response, another thread can be spawned to continue scraping other URLs.

Submitting multiple tasks with `map()`¶

map(func, *iterables, timeout=None, chunksize=1)

func is executed asynchronously and several calls to func may be made concurrently.

Let's look at an example:

from concurrent.futures import ThreadPoolExecutor

urls = ["python-engineer.com",
        "twitter.com",
        "youtube.com"]

def scrape_site(url):
    res = f'{url} was scraped!'
    return res

pool = ThreadPoolExecutor(max_workers=8)

results = pool.map(scrape_site, urls) # does not block

for res in results:
    print(res) # print results as they become available

pool.shutdown()

First, create an instance of ThreadPoolExecutor. Next, we have to declare the number of worker threads. The default value of max_workers is min(32, os.cpu_count() + 4).

The map() method is used to assign tasks to worker threads. This action is non-blocking. It returns an iterable immediately, which on iteration returns the output of the target function, blocking the interpreter process. The results are available in the order that the tasks were submitted.

Finally, call shutdown() to signal the executor that it should free any resources that it is using when the currently pending futures are done executing.

The above code outputs the following:

python-engineer.com was scraped!
twitter.com was scraped!
youtube.com was scraped!

Submitting a single task with `submit()`¶

submit(fn, /, *args, **kwargs)

Schedules the callable, fn, to be executed as fn(*args, **kwargs) and returns a Future object representing the execution of the callable.

Let's look at an example:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)

future = pool.submit(my_task, argument) # does not block

value = future.result() # blocks

print(value)

pool.shutdown()

The submit() method is used to submit a task in the thread pool. This action is non-blocking. To get the actual result, use the result() method. This method is blocking.

Use ThreadPoolExecutor as context manager¶

The recommended way to use a ThreadPoolExecuter is as a context manager. This way shutdown() will be called automatically when the block has completed.

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(pow, 2, 15)
    print(future.result())

FREE VS Code / PyCharm Extensions I Use

✅ Write cleaner code with Sourcery, instant refactoring suggestions: Link*

Python Problem-Solving Bootcamp

🚀 Solve 42 programming puzzles over the course of 21 days: Link*

* These are affiliate link. By clicking on it you will not have any additional costs. Instead, you will support my project. Thank you! 🙏