When I first started writing Python scripts that interacted with networks or files, I noticed a common bottleneck: my programs spent a lot of time just waiting. Whether it was for an API to respond or a large file to be read from disk, the application would just sit there, blocked. This is a classic I/O-bound problem, and the solution is concurrency. In Python, the most common way to handle this is with the `threading` module.
Table of Contents
- 1.1 The Global Interpreter Lock (GIL): A Crucial Limitation
- 1.2 The Modern Way: ThreadPoolExecutor
- 1.3 Race Conditions and The Need for Locks
- 1.3.1 Solution: threading.Lock
- 1.4 Other Essential Synchronization Tools
- 1.5 Daemon Threads and Thread-Local Data
- 1.5.1 Daemon Threads
- 1.5.2 Thread-Local Data
- 1.6 Conclusion
- 1.7 More Topics
Multithreading allows your program to manage multiple tasks at once, significantly speeding up applications that are limited by I/O operations. However, it comes with its own set of rules and challenges, like the Global Interpreter Lock (GIL) and the risk of race conditions. In this guide, I’ll walk you through how I use threads effectively, from the modern `ThreadPoolExecutor` to the essential synchronization tools that keep your code safe.
The Global Interpreter Lock (GIL): A Crucial Limitation
Before we dive in, we have to talk about the Global Interpreter Lock, or GIL. The GIL is a mutex in the standard CPython interpreter that allows only one thread to execute Python bytecode at a time.
What does this mean in practice?
- For I/O-Bound Tasks, It’s Fine: When a thread is waiting for an I/O operation (like a network request), it releases the GIL, allowing another thread to run. This creates concurrency and is why threading is so effective for these tasks.
- For CPU-Bound Tasks, It’s a Bottleneck: If your task is performing heavy calculations (e.g., crunching numbers), threads won’t help. Since only one thread can execute Python bytecode at a time, you won’t get true parallelism on multi-core processors. In fact, the overhead of managing the threads can even make your code slower. For CPU-bound work, you should use the `multiprocessing` module instead. You can see the effect yourself with the sketch after this list.
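To make the difference concrete, here's a minimal sketch of my own (on a standard CPython build) that times a pure-Python busy loop run sequentially and then across two threads. Don't expect the threaded version to be faster:
```python
import threading
import time

def cpu_task(n):
    """Pure-Python busy loop: the GIL serializes this bytecode."""
    while n > 0:
        n -= 1

N = 10_000_000

# Sequential: run the task twice, back to back
start = time.perf_counter()
cpu_task(N)
cpu_task(N)
print(f"Sequential: {time.perf_counter() - start:.2f}s")

# Threaded: two threads, but only one executes bytecode at a time,
# so this typically takes about as long (or longer, due to overhead)
start = time.perf_counter()
t1 = threading.Thread(target=cpu_task, args=(N,))
t2 = threading.Thread(target=cpu_task, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"Threaded:   {time.perf_counter() - start:.2f}s")
```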
The Modern Way: ThreadPoolExecutor
While you can manage threads manually using the `threading.Thread` class, I’ve found that the `ThreadPoolExecutor` from the `concurrent.futures` module is a much cleaner and safer approach. It manages a pool of worker threads for you, so you can just submit tasks without worrying about the lifecycle of each thread.
I almost always use it as a context manager (`with`), which ensures the pool is properly shut down when I’m done.
```python
import concurrent.futures

import requests  # third-party: pip install requests

URLS = ['https://example.org/1', 'https://example.org/2', 'https://example.org/3']

def fetch_url(url):
    """A simple function to fetch a URL."""
    print(f"Fetching {url}...")
    response = requests.get(url)
    print(f"Fetched {url} with status {response.status_code}")
    return response.status_code

# Use ThreadPoolExecutor to manage a pool of worker threads
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # map() runs fetch_url on each URL concurrently and yields
    # results in input order; list() collects them
    results = list(executor.map(fetch_url, URLS))

print("All requests complete.")
```
In this example, the `executor.map` method runs `fetch_url` for each URL in the list concurrently. The three requests happen at roughly the same time, dramatically reducing the total execution time compared to running them sequentially.
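One thing worth knowing: `executor.map` yields results in input order. When I'd rather process results as soon as each one finishes, and handle per-task exceptions individually, I use `submit()` with `as_completed()`. A quick sketch, reusing `fetch_url` and `URLS` from above:
```python
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # submit() schedules each task and returns a Future immediately
    future_to_url = {executor.submit(fetch_url, url): url for url in URLS}
    # as_completed() yields each future as soon as it finishes
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            status = future.result()  # re-raises any exception from the task
            print(f"{url} -> {status}")
        except Exception as exc:
            print(f"{url} raised {exc!r}")
```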
Race Conditions and The Need for Locks
The biggest danger in multithreading comes from race conditions. This happens when multiple threads try to access and modify the same shared data at the same time, leading to unpredictable and incorrect results.
Here’s a classic example of a race condition. Imagine two threads trying to withdraw money from the same bank account:
```python
import threading
import time

balance = 1000

def withdraw(amount):
    global balance
    # Read the current balance
    current_balance = balance
    # A short pause here makes the thread switch (and the race) easy
    # to reproduce; the OS could preempt at this point anyway
    time.sleep(0.001)
    # Check if there are enough funds
    if current_balance >= amount:
        # Update the balance
        balance = current_balance - amount
        print(f"Withdrew {amount}")

# Race: both threads can pass the funds check before either writes
thread1 = threading.Thread(target=withdraw, args=(700,))
thread2 = threading.Thread(target=withdraw, args=(700,))
thread1.start()
thread2.start()
thread1.join()
thread2.join()

print(f"Final balance: {balance}")  # 300, yet BOTH withdrawals of 700 "succeeded"
```
The problem is that a thread can be paused by the OS at any point. One thread might read the balance, get paused, and then the second thread reads the same initial balance before the first one has a chance to update it. Both withdrawals then pass the funds check, and the account pays out 1,400 even though it only held 1,000.
Solution: threading.Lock
To solve this, we use a lock. A lock is a synchronization primitive that ensures only one thread can execute a critical section of code at a time. The best practice is to use a lock as a context manager with a `with` statement, which guarantees the lock is always released, even if an error occurs.
Here’s the fixed `withdraw` function:
```python
lock = threading.Lock()

def safe_withdraw(amount):
    global balance
    with lock:
        # This block can only be executed by one thread at a time
        current_balance = balance
        if current_balance >= amount:
            balance = current_balance - amount
```
By wrapping the critical section in `with lock:`, we make the read-check-update operation “atomic” and prevent the race condition.
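Re-running the earlier two-thread test with `safe_withdraw` (same `threading` import and `balance` variable as above) now gives a deterministic result:
```python
balance = 1000

thread1 = threading.Thread(target=safe_withdraw, args=(700,))
thread2 = threading.Thread(target=safe_withdraw, args=(700,))
thread1.start()
thread2.start()
thread1.join()
thread2.join()

# Exactly one withdrawal succeeds: the second thread sees 300 < 700
print(f"Final balance: {balance}")  # Always 300
```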
Other Essential Synchronization Tools
While `Lock` is the most common tool I use, the `threading` module provides others for more complex scenarios.
- `RLock` (Re-entrant Lock): A special lock that can be acquired multiple times by the same thread. This is useful in recursive functions or complex methods where one function that holds a lock needs to call another function that acquires the same lock.
- `Semaphore`: A counter that limits how many threads can access a resource at once. I use this when I need to throttle access to a service, like a database connection pool or an API with a rate limit (see the sketch after this list).
- `Event`: A simple way for one thread to signal one or more other threads. One thread can `wait()` for an event, and another can `set()` it, allowing all waiting threads to proceed. I find this useful for coordinating startup sequences.
- `Condition`: A more advanced tool that combines a lock with the ability for threads to `wait()` for a specific condition to become true and be `notify()`-ed by another thread when it happens. This is often used in producer-consumer scenarios.
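Here's the `Semaphore` sketch I promised. It's a toy example of my own (the "service call" is just a `sleep`), showing how a semaphore caps concurrent access at two workers:
```python
import threading
import time

# At most 2 threads may enter the guarded section at once
semaphore = threading.Semaphore(2)

def call_service(i):
    with semaphore:  # blocks while 2 other threads hold the semaphore
        print(f"Worker {i} calling service...")
        time.sleep(1)  # stand-in for a rate-limited request
        print(f"Worker {i} done")

threads = [threading.Thread(target=call_service, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```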
Daemon Threads and Thread-Local Data
Daemon Threads
Sometimes you have a background task, like logging or a health check, that should not prevent your main program from exiting. For this, I use daemon threads. By setting `thread.daemon = True`, you tell Python that the program can exit even if this thread is still running.
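A minimal sketch (the `heartbeat` task is just an illustration): the daemon loops forever, yet the program still exits when the main thread finishes:
```python
import threading
import time

def heartbeat():
    """Background task that never returns on its own."""
    while True:
        print("heartbeat...")
        time.sleep(1)

# daemon=True: this thread won't keep the process alive
t = threading.Thread(target=heartbeat, daemon=True)
t.start()

time.sleep(3)
print("Main thread done; the daemon thread dies with the process.")
```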
Thread-Local Data
While threads share memory, there are times when you need data to be specific to each thread. `threading.local()` creates a thread-local storage object. Any attribute you set on this object is only visible to the current thread, preventing data from leaking between threads. This is commonly used by ORMs like Peewee to manage per-thread database connections.
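A small sketch of my own showing the isolation: each worker writes to the same `local_data` object but only ever sees its own value:
```python
import threading

local_data = threading.local()

def worker(name):
    local_data.name = name  # visible only to this thread
    print(f"{threading.current_thread().name} sees: {local_data.name}")

threads = [threading.Thread(target=worker, args=(f"conn-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```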
Conclusion
Multithreading in Python is a powerful tool for speeding up I/O-bound applications. While the GIL prevents it from being a solution for CPU-intensive tasks, it excels at making programs that wait on networks or files feel much faster.
My advice is to always start with `ThreadPoolExecutor`, as it provides a modern and safe API. And most importantly, whenever you have multiple threads accessing shared data, you must use synchronization primitives like `threading.Lock` to prevent race conditions and ensure your application is thread-safe.
More Topics
- Python’s Itertools Module – How to Loop More Efficiently
- Python Multiprocessing – How to Use Multiple CPU Cores
- Python Asyncio – How to Write Concurrent Code
- Python Data Serialization – How to Store and Transmit Your Data
- Python Context Managers – How to Handle Resources Like a Pro
- Python Project Guide: NumPy & SciPy: For Science!
- Python Project Guide: Times, Dates & Numbers