The Python Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode at once. The GIL protects the interpreter's internal state from race conditions, which made it a simple solution for memory management in Python, and it has little impact on I/O-bound multi-threaded tasks because the GIL is released while a thread waits on I/O. However, for CPU-bound multi-threaded tasks the GIL can become a bottleneck, and the program behaves much like a single-threaded program.
In Python (CPython) 3.13, a new build configuration option, --disable-gil, was introduced to build Python without the GIL. In this blog post, we will discuss how to build Python without the GIL and the performance impact of disabling it.
Docker Official Python No-GIL Images
According to this issue for Docker, all the Docker official images for Python 3.13 were built with the GIL. So we have to build Python without the GIL from source ourselves.
Build Python With and Without GIL
The Docker official Python image Dockerfiles for Debian were used to build Python both with and without the build configuration option --disable-gil. In addition, for more accurate performance comparisons, the build configuration option --enable-optimizations was used for both builds.
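Once an image is built, it helps to confirm which kind of interpreter it actually contains. The following is a minimal sketch, assuming CPython 3.13 or later: `sysconfig.get_config_var("Py_GIL_DISABLED")` reports whether the build was configured with --disable-gil, and `sys._is_gil_enabled()` reports whether the GIL is active in the running process. The function name `describe_gil_status` is a hypothetical helper, not part of the original post.

```python
import sys
import sysconfig


def describe_gil_status():
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 for builds configured with --disable-gil,
    # and 0 or None for standard builds.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists in CPython 3.13+. On older interpreters
    # we assume the GIL is enabled, since it cannot be turned off there.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": bool(is_gil_enabled()),
    }


print(describe_gil_status())
```

On a free-threaded build the second value can still be true, for example when the process is started with the environment variable PYTHON_GIL=1.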
We will run the Python script using the Docker images for Python that were built with and without the GIL. We will also use top to monitor the CPU usage of the process (top should be used instead of htop).
When the GIL is enabled, the process consumes about 100% of the CPU (one core), which is what we expect when the GIL serializes the threads. When the GIL is disabled, the process consumes about 400% of the CPU (four cores), which shows that the threads are running in parallel.
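A minimal sketch of a script that can be watched in top this way is given below. It is not the script used for the numbers above; the names `spin` and `run_spinners`, the thread count, and the duration are all assumptions chosen for illustration. Each thread runs a pure-Python busy loop, so the work is CPU-bound and stays in Python bytecode.

```python
import threading
import time


def spin(duration_s, counts, index):
    """Busy-loop for roughly duration_s seconds and record the iteration count."""
    deadline = time.monotonic() + duration_s
    n = 0
    while time.monotonic() < deadline:
        n += 1
    counts[index] = n


def run_spinners(num_threads, duration_s):
    """Run num_threads busy-loop threads and return their iteration counts."""
    counts = [0] * num_threads
    threads = [
        threading.Thread(target=spin, args=(duration_s, counts, i))
        for i in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counts
```

Calling `run_spinners(4, 30.0)` keeps four threads busy for about thirty seconds, which leaves enough time to find the process in top and read its CPU usage.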
Factorial
The factorial example demonstrates a CPU-bound task. This time we will quantitatively measure the performance of the CPU-bound task.
    import math
    import queue
    import threading
    import time


    def compute_partial_factorial(start, end):
        # Helper not shown in the original listing; assumed to be the
        # product of the integers in [start, end).
        result = 1
        for i in range(start, end):
            result *= i
        return result


    def compute_factorial_multithread(num, num_threads):
        threads = []
        results = queue.Queue()
        chunk_size = num // num_threads
        for i in range(num_threads):
            start = i * chunk_size + 1
            # The last thread also takes the remainder of the range.
            end = num + 1 if i == num_threads - 1 else (i + 1) * chunk_size + 1
            thread = threading.Thread(
                target=lambda s, e: results.put(compute_partial_factorial(s, e)),
                args=(start, end))
            thread.start()
            threads.append(thread)
        for thread in threads:
            thread.join()
        # Multiply the partial products together.
        total_factorial = 1
        while not results.empty():
            total_factorial *= results.get()
        return total_factorial


    def main():
        num_repeats = 20
        num = 100000
        factorial_ground_truth = math.factorial(num)
        num_threads = 4
        factorial_multithread = compute_factorial_multithread(num, num_threads)
        assert factorial_ground_truth == factorial_multithread
        # Time the factorial computation
        start_time = time.time()
        for _ in range(num_repeats):
            factorial_multithread = compute_factorial_multithread(num, num_threads)
        end_time = time.time()
        print(
            f"Average Time Elapsed: {(end_time - start_time) / num_repeats:.2f}s")


    if __name__ == "__main__":
        main()
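The thread-and-queue pattern above can also be sketched with `concurrent.futures.ThreadPoolExecutor`, which returns each partial result directly instead of passing it through a queue. This is an alternative formulation, not the code that was benchmarked in this post; the names `partial_product` and `factorial_with_pool` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_product(start, end):
    """Product of the integers in [start, end), computed in a Python loop."""
    result = 1
    for i in range(start, end):
        result *= i
    return result


def factorial_with_pool(num, num_threads):
    """Compute num! by splitting 1..num into one chunk per worker thread."""
    chunk_size = num // num_threads
    bounds = []
    for i in range(num_threads):
        start = i * chunk_size + 1
        # The last chunk also takes the remainder of the range.
        end = num + 1 if i == num_threads - 1 else (i + 1) * chunk_size + 1
        bounds.append((start, end))
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = pool.map(lambda b: partial_product(*b), bounds)
        # Combine the partial products while the pool is still open.
        return math.prod(partials)
```

For example, `factorial_with_pool(100000, 4)` should agree with `math.factorial(100000)`. Whether it runs any faster than a single thread still depends on whether the GIL is enabled.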
We will run the Python script using the Docker images for Python that were built with and without GIL.
    $ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=1 python:3.13.0-gil-slim-bullseye python factorial.py
    Average Time Elapsed: 0.88s
    $ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=1 python:3.13.0-nogil-slim-bullseye python factorial.py
    Average Time Elapsed: 0.80s
    $ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=0 python:3.13.0-nogil-slim-bullseye python factorial.py
    Average Time Elapsed: 0.22s
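We can see that building Python without the GIL and disabling the GIL at runtime significantly improves the performance of the CPU-bound task: with four threads, the average time drops from 0.88s on the standard build to 0.22s, a speedup of about 4x.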
Reduce Sum
The reduce sum example demonstrates an I/O-bound task. It is I/O-bound in the sense that the work is dominated by constantly reading data from memory rather than by arithmetic. We will also quantitatively measure the performance of this task.
    import queue
    import threading


    def compute_sum(arr, start, end):
        # Sum arr[start:end] with an explicit loop.
        total = 0
        for i in range(start, end):
            total += arr[i]
        return total


    def compute_sum_multithread(arr, num_threads):
        n = len(arr)
        chunk_size = n // num_threads
        threads = []
        results = queue.Queue()
        for i in range(num_threads):
            start = i * chunk_size
            # The last thread also takes the remainder of the array.
            end = n if i == num_threads - 1 else (i + 1) * chunk_size
            thread = threading.Thread(
                target=lambda s, e: results.put(compute_sum(arr, s, e)),
                args=(start, end))
            thread.start()
            threads.append(thread)
        for thread in threads:
            thread.join()
        # Add up the partial sums.
        total = 0
        while not results.empty():
            total += results.get()
        return total
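The timing harness for this example is not listed. A minimal sketch of one, mirroring the factorial `main()` above, might look like the following. The helper `average_elapsed`, the single-threaded `sequential_sum` baseline, and the array size in the usage note are assumptions for illustration, not the settings behind the reported numbers. It uses `time.perf_counter()`, which is better suited to measuring short intervals than `time.time()`.

```python
import time


def average_elapsed(func, *args, num_repeats=20):
    """Return the average wall-clock time in seconds of func(*args)."""
    func(*args)  # Warm-up call, excluded from the measurement.
    start = time.perf_counter()
    for _ in range(num_repeats):
        func(*args)
    return (time.perf_counter() - start) / num_repeats


def sequential_sum(arr):
    """Single-threaded baseline: sum the array with an explicit loop."""
    total = 0
    for value in arr:
        total += value
    return total
```

For example, with `arr = list(range(1_000_000))`, `average_elapsed(sequential_sum, arr)` gives a single-threaded baseline that the multi-threaded timings can be compared against on each build.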
We will run the Python script using the Docker images for Python that were built with and without GIL.
    $ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=1 python:3.13.0-gil-slim-bullseye python reduce_sum.py
    Average Time Elapsed: 0.45s
    $ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=1 python:3.13.0-nogil-slim-bullseye python reduce_sum.py
    Average Time Elapsed: 0.72s
    $ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=0 python:3.13.0-nogil-slim-bullseye python reduce_sum.py
    Average Time Elapsed: 0.88s
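We can see that building Python without the GIL and disabling the GIL at runtime does not necessarily improve the performance of the I/O-bound task. Here it made the task slower: 0.45s on the standard build versus 0.72s on the no-GIL build with the GIL enabled and 0.88s with the GIL disabled.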
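To compare the two benchmarks on the same scale, each timing can be divided by the timing of the standard GIL build. The sketch below does this arithmetic with the numbers reported in this post; the helper name `relative_times` and the configuration labels are hypothetical.

```python
def relative_times(timings, baseline_key):
    """Return each timing divided by the baseline timing."""
    baseline = timings[baseline_key]
    return {config: elapsed / baseline for config, elapsed in timings.items()}


# Average times in seconds, copied from the two benchmark runs in this post.
FACTORIAL = {
    "gil build": 0.88,
    "nogil build, PYTHON_GIL=1": 0.80,
    "nogil build, PYTHON_GIL=0": 0.22,
}
REDUCE_SUM = {
    "gil build": 0.45,
    "nogil build, PYTHON_GIL=1": 0.72,
    "nogil build, PYTHON_GIL=0": 0.88,
}

for name, timings in (("factorial", FACTORIAL), ("reduce_sum", REDUCE_SUM)):
    for config, ratio in relative_times(timings, "gil build").items():
        print(f"{name:<10} {config:<26} {ratio:.2f}x of baseline time")
```

A ratio below 1 means the configuration is faster than the standard build, and a ratio above 1 means it is slower.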
In “It isn’t Easy to Remove the GIL”, Guido van Rossum, the creator of Python, set out the criteria for removing the GIL: “I’d welcome it if someone did another experiment along the lines of Greg’s patch (which I haven’t found online), and I’d welcome a set of patches into Py3k only if the performance for a single-threaded program (and for a multi-threaded but I/O-bound program) does not decrease.”
Disabling the GIL for this particular I/O-bound task does not meet these criteria. This is probably also why disabling the GIL is not the default option in Python 3.13, and why it will probably not become the default for a long time.