Python No-GIL

10-07-2024

Introduction

The Python Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode at once. The GIL prevents race conditions and ensures thread safety inside the interpreter. It was a simple solution for memory management in Python and has little impact on I/O-bound multi-threaded tasks. However, for CPU-bound multi-threaded tasks, the GIL can become a bottleneck, and the program behaves like a single-threaded program.

In Python (CPython) 3.13, a new build configuration option --disable-gil was introduced to build Python without the GIL. In this blog post, we will discuss how to build Python without the GIL and the performance impact of disabling it.

Docker Official Python No-GIL Images

According to this issue for Docker, all the Docker official images for Python 3.13 were built with the GIL. So we have to build Python without the GIL from the source code ourselves.

Build Python With and Without GIL

The Docker official Python image Dockerfiles for Debian are used to build Python with and without the build configuration option --disable-gil. In addition, for more accurate performance comparisons, the build configuration option --enable-optimizations is used in both builds.

Dockerfile for Python With GIL

The Dockerfile for building Python with GIL is exactly the same as the Docker official Python image Dockerfiles for Debian.

gil.Dockerfile

#
# NOTE: THIS DOCKERFILE IS GENERATED VIA "apply-templates.sh"
#
# PLEASE DO NOT EDIT IT DIRECTLY.
#

FROM debian:bullseye-slim

# ensure local python is preferred over distribution python
ENV PATH /usr/local/bin:$PATH

# runtime dependencies
RUN set -eux; \
	apt-get update; \
	apt-get install -y --no-install-recommends \
		ca-certificates \
		netbase \
		tzdata \
	; \
	rm -rf /var/lib/apt/lists/*

ENV GPG_KEY 7169605F62C751356D054A26A821E680E5FA6305
ENV PYTHON_VERSION 3.13.0

RUN set -eux; \
	\
	savedAptMark="$(apt-mark showmanual)"; \
	apt-get update; \
	apt-get install -y --no-install-recommends \
		dpkg-dev \
		gcc \
		gnupg \
		libbluetooth-dev \
		libbz2-dev \
		libc6-dev \
		libdb-dev \
		libexpat1-dev \
		libffi-dev \
		libgdbm-dev \
		liblzma-dev \
		libncursesw5-dev \
		libreadline-dev \
		libsqlite3-dev \
		libssl-dev \
		make \
		tk-dev \
		uuid-dev \
		wget \
		xz-utils \
		zlib1g-dev \
	; \
	\
	wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz"; \
	wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc"; \
	GNUPGHOME="$(mktemp -d)"; export GNUPGHOME; \
	gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "$GPG_KEY"; \
	gpg --batch --verify python.tar.xz.asc python.tar.xz; \
	gpgconf --kill all; \
	rm -rf "$GNUPGHOME" python.tar.xz.asc; \
	mkdir -p /usr/src/python; \
	tar --extract --directory /usr/src/python --strip-components=1 --file python.tar.xz; \
	rm python.tar.xz; \
	\
	cd /usr/src/python; \
	gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)"; \
	./configure \
		--build="$gnuArch" \
		--enable-loadable-sqlite-extensions \
		--enable-optimizations \
		--enable-option-checking=fatal \
		--enable-shared \
		--with-lto \
		--with-system-expat \
		--with-ensurepip \
	; \
	nproc="$(nproc)"; \
	EXTRA_CFLAGS="$(dpkg-buildflags --get CFLAGS)"; \
	LDFLAGS="$(dpkg-buildflags --get LDFLAGS)"; \
	LDFLAGS="${LDFLAGS:--Wl},--strip-all"; \
	make -j "$nproc" \
		"EXTRA_CFLAGS=${EXTRA_CFLAGS:-}" \
		"LDFLAGS=${LDFLAGS:-}" \
		"PROFILE_TASK=${PROFILE_TASK:-}" \
	; \
# https://github.com/docker-library/python/issues/784
# prevent accidental usage of a system installed libpython of the same version
	rm python; \
	make -j "$nproc" \
		"EXTRA_CFLAGS=${EXTRA_CFLAGS:-}" \
		"LDFLAGS=${LDFLAGS:--Wl},-rpath='\$\$ORIGIN/../lib'" \
		"PROFILE_TASK=${PROFILE_TASK:-}" \
		python \
	; \
	make install; \
	\
	cd /; \
	rm -rf /usr/src/python; \
	\
	find /usr/local -depth \
		\( \
			\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
			-o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' -o -name 'libpython*.a' \) \) \
		\) -exec rm -rf '{}' + \
	; \
	\
	ldconfig; \
	\
	apt-mark auto '.*' > /dev/null; \
	apt-mark manual $savedAptMark; \
	find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec ldd '{}' ';' \
		| awk '/=>/ { so = $(NF-1); if (index(so, "/usr/local/") == 1) { next }; gsub("^/(usr/)?", "", so); printf "*%s\n", so }' \
		| sort -u \
		| xargs -r dpkg-query --search \
		| cut -d: -f1 \
		| sort -u \
		| xargs -r apt-mark manual \
	; \
	apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \
	rm -rf /var/lib/apt/lists/*; \
	\
	export PYTHONDONTWRITEBYTECODE=1; \
	python3 --version; \
	pip3 --version

# make some useful symlinks that are expected to exist ("/usr/local/bin/python" and friends)
RUN set -eux; \
	for src in idle3 pip3 pydoc3 python3 python3-config; do \
		dst="$(echo "$src" | tr -d 3)"; \
		[ -s "/usr/local/bin/$src" ]; \
		[ ! -e "/usr/local/bin/$dst" ]; \
		ln -svT "$src" "/usr/local/bin/$dst"; \
	done

CMD ["python3"]

Dockerfile for Python Without GIL

The only difference between the Dockerfile for building Python with GIL and without GIL is the --disable-gil option in the ./configure command.

nogil.Dockerfile

#
# NOTE: THIS DOCKERFILE IS GENERATED VIA "apply-templates.sh"
#
# PLEASE DO NOT EDIT IT DIRECTLY.
#

FROM debian:bullseye-slim

# ensure local python is preferred over distribution python
ENV PATH /usr/local/bin:$PATH

# runtime dependencies
RUN set -eux; \
	apt-get update; \
	apt-get install -y --no-install-recommends \
		ca-certificates \
		netbase \
		tzdata \
	; \
	rm -rf /var/lib/apt/lists/*

ENV GPG_KEY 7169605F62C751356D054A26A821E680E5FA6305
ENV PYTHON_VERSION 3.13.0

RUN set -eux; \
	\
	savedAptMark="$(apt-mark showmanual)"; \
	apt-get update; \
	apt-get install -y --no-install-recommends \
		dpkg-dev \
		gcc \
		gnupg \
		libbluetooth-dev \
		libbz2-dev \
		libc6-dev \
		libdb-dev \
		libexpat1-dev \
		libffi-dev \
		libgdbm-dev \
		liblzma-dev \
		libncursesw5-dev \
		libreadline-dev \
		libsqlite3-dev \
		libssl-dev \
		make \
		tk-dev \
		uuid-dev \
		wget \
		xz-utils \
		zlib1g-dev \
	; \
	\
	wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz"; \
	wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc"; \
	GNUPGHOME="$(mktemp -d)"; export GNUPGHOME; \
	gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "$GPG_KEY"; \
	gpg --batch --verify python.tar.xz.asc python.tar.xz; \
	gpgconf --kill all; \
	rm -rf "$GNUPGHOME" python.tar.xz.asc; \
	mkdir -p /usr/src/python; \
	tar --extract --directory /usr/src/python --strip-components=1 --file python.tar.xz; \
	rm python.tar.xz; \
	\
	cd /usr/src/python; \
	gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)"; \
	./configure \
		--build="$gnuArch" \
		--enable-loadable-sqlite-extensions \
		--enable-optimizations \
		--enable-option-checking=fatal \
		--enable-shared \
		--with-lto \
		--with-system-expat \
		--with-ensurepip \
		--disable-gil \
	; \
	nproc="$(nproc)"; \
	EXTRA_CFLAGS="$(dpkg-buildflags --get CFLAGS)"; \
	LDFLAGS="$(dpkg-buildflags --get LDFLAGS)"; \
	LDFLAGS="${LDFLAGS:--Wl},--strip-all"; \
	make -j "$nproc" \
		"EXTRA_CFLAGS=${EXTRA_CFLAGS:-}" \
		"LDFLAGS=${LDFLAGS:-}" \
		"PROFILE_TASK=${PROFILE_TASK:-}" \
	; \
# https://github.com/docker-library/python/issues/784
# prevent accidental usage of a system installed libpython of the same version
	rm python; \
	make -j "$nproc" \
		"EXTRA_CFLAGS=${EXTRA_CFLAGS:-}" \
		"LDFLAGS=${LDFLAGS:--Wl},-rpath='\$\$ORIGIN/../lib'" \
		"PROFILE_TASK=${PROFILE_TASK:-}" \
		python \
	; \
	make install; \
	\
	cd /; \
	rm -rf /usr/src/python; \
	\
	find /usr/local -depth \
		\( \
			\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
			-o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' -o -name 'libpython*.a' \) \) \
		\) -exec rm -rf '{}' + \
	; \
	\
	ldconfig; \
	\
	apt-mark auto '.*' > /dev/null; \
	apt-mark manual $savedAptMark; \
	find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec ldd '{}' ';' \
		| awk '/=>/ { so = $(NF-1); if (index(so, "/usr/local/") == 1) { next }; gsub("^/(usr/)?", "", so); printf "*%s\n", so }' \
		| sort -u \
		| xargs -r dpkg-query --search \
		| cut -d: -f1 \
		| sort -u \
		| xargs -r apt-mark manual \
	; \
	apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \
	rm -rf /var/lib/apt/lists/*; \
	\
	export PYTHONDONTWRITEBYTECODE=1; \
	python3 --version; \
	pip3 --version

# make some useful symlinks that are expected to exist ("/usr/local/bin/python" and friends)
RUN set -eux; \
	for src in idle3 pip3 pydoc3 python3 python3-config; do \
		dst="$(echo "$src" | tr -d 3)"; \
		[ -s "/usr/local/bin/$src" ]; \
		[ ! -e "/usr/local/bin/$dst" ]; \
		ln -svT "$src" "/usr/local/bin/$dst"; \
	done

CMD ["python3"]

Build Docker Images

Building the Docker images for Python with and without GIL takes less than 10 minutes each on a machine with an Intel Core i9-9900K CPU.

$ docker build -f gil.Dockerfile --no-cache --tag python:3.13.0-gil-slim-bullseye .
$ docker build -f nogil.Dockerfile --no-cache --tag python:3.13.0-nogil-slim-bullseye .
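
Once the images are built, it may be worth confirming that each interpreter was configured as intended. The following is a minimal sketch and not part of the original build; the file name check_build.py and the helper name describe_build are hypothetical. It relies on sysconfig.get_config_var, which reports Py_GIL_DISABLED as 1 on builds configured with --disable-gil and typically exposes the ./configure arguments as CONFIG_ARGS on POSIX builds.

```python
import sysconfig


def describe_build():
    """Return a dict summarizing how the running interpreter was configured."""
    # Py_GIL_DISABLED is 1 on builds configured with --disable-gil,
    # and 0 or None otherwise.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # CONFIG_ARGS holds the ./configure command line on POSIX builds;
    # it may be missing on other platforms, so fall back to an empty string.
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        "free_threaded": free_threaded,
        "optimizations": "--enable-optimizations" in config_args,
        "config_args": config_args,
    }


if __name__ == "__main__":
    for key, value in describe_build().items():
        print(f"{key}: {value}")
```

Assuming the script is saved in the current directory, it could be run in either image in the same way as the examples in this post, for instance with docker run --rm -v $(pwd):/tmp -w /tmp python:3.13.0-nogil-slim-bullseye python check_build.py.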

Examples

Disabling GIL is mainly beneficial for CPU-bound tasks. All the following examples were run on a machine with an Intel Core i9-9900K CPU.

Burn CPU

The following Python script, which is completely CPU-bound, was created to burn CPU.

burn_cpu.py

import threading


def burn_cpu():
    while True:
        pass


def main():

    num_threads = 4
    threads = []
    for _ in range(num_threads):
        thread = threading.Thread(target=burn_cpu)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()


if __name__ == "__main__":

    main()

We will run the Python script using the Docker images for Python that were built with and without GIL. We will also use top to monitor the CPU usage of the process (top should be used instead of htop).

$ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=1 python:3.13.0-gil-slim-bullseye python burn_cpu.py
$ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=1 python:3.13.0-nogil-slim-bullseye python burn_cpu.py
$ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=0 python:3.13.0-nogil-slim-bullseye python burn_cpu.py

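When the GIL is enabled (the first two commands), the process consumes about 100% of the CPU in top, that is, one full core, which indicates that the GIL is serializing the threads. When the GIL is disabled (the third command), the process consumes about 400% of the CPU, one core for each of the four threads, which indicates that the GIL has been disabled.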
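
The same observation can be made from inside Python rather than from top. The sketch below is a bounded variant of burn_cpu.py written for illustration only: the function names and the two-second duration are assumptions. It spins for a fixed time and compares process CPU time with wall-clock time. A ratio near 1 suggests the threads were serialized, while a ratio near the thread count suggests they ran in parallel. sys._is_gil_enabled() exists only in Python 3.13 and later, so it is looked up defensively.

```python
import sys
import threading
import time


def spin(deadline):
    """Busy-loop until the wall-clock deadline passes."""
    while time.perf_counter() < deadline:
        pass


def measure_cpu_ratio(num_threads=4, duration=2.0):
    """Return process CPU seconds consumed per wall-clock second."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time summed over all threads
    deadline = wall_start + duration
    threads = [
        threading.Thread(target=spin, args=(deadline,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def gil_status():
    """Report whether the GIL is active; assume it is on older versions."""
    check = getattr(sys, "_is_gil_enabled", None)
    return check() if check is not None else True


if __name__ == "__main__":
    print(f"GIL enabled: {gil_status()}")
    print(f"CPU/wall ratio: {measure_cpu_ratio():.2f}")
```

Unlike burn_cpu.py, this variant terminates on its own, so it can be run under each of the three configurations without being interrupted by hand.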

Factorial

The factorial example demonstrates a CPU-bound task. This time we will quantitatively measure the performance of the CPU-bound task.

factorial.py

import math
import queue
import threading
import time


def compute_partial_factorial(start, end):

    partial_factorial = 1
    for i in range(start, end):
        partial_factorial *= i
    return partial_factorial


def compute_factorial_multithread(num, num_threads):

    threads = []
    results = queue.Queue()
    chunk_size = num // num_threads
    for i in range(num_threads):
        start = i * chunk_size + 1
        end = num + 1 if i == num_threads - 1 else (i + 1) * chunk_size + 1
        thread = threading.Thread(
            target=lambda s, e: results.put(compute_partial_factorial(s, e)),
            args=(start, end))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    total_factorial = 1
    while not results.empty():
        total_factorial *= results.get()
    return total_factorial


def main():

    num_repeats = 20
    num = 100000
    factorial_ground_truth = math.factorial(num)
    num_threads = 4
    factorial_multithread = compute_factorial_multithread(num, num_threads)
    assert factorial_ground_truth == factorial_multithread
    # Time the factorial computation
    start_time = time.time()
    for _ in range(num_repeats):
        factorial_multithread = compute_factorial_multithread(num, num_threads)
    end_time = time.time()
    print(
        f"Average Time Elapsed: {(end_time - start_time) / num_repeats:.2f}s")


if __name__ == "__main__":

    main()

We will run the Python script using the Docker images for Python that were built with and without GIL.

$ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=1 python:3.13.0-gil-slim-bullseye python factorial.py
Average Time Elapsed: 0.88s
$ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=1 python:3.13.0-nogil-slim-bullseye python factorial.py
Average Time Elapsed: 0.80s
$ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=0 python:3.13.0-nogil-slim-bullseye python factorial.py
Average Time Elapsed: 0.22s

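We can see that building Python without the GIL and disabling the GIL at runtime significantly improves the performance of the CPU-bound task: the average time drops from between 0.80s and 0.88s to 0.22s, close to a 4x speedup with four threads.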
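
The thread and queue bookkeeping in factorial.py can also be expressed with concurrent.futures.ThreadPoolExecutor from the standard library. This is a sketch of an alternative formulation using the same chunking scheme, not the code that produced the timings above; the names chunk_bounds, partial_product and compute_factorial_pool are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(num, num_chunks):
    """Split the integers 1..num into contiguous half-open ranges."""
    chunk_size = num // num_chunks
    bounds = []
    for i in range(num_chunks):
        start = i * chunk_size + 1
        # The last chunk absorbs the remainder when num is not divisible.
        end = num + 1 if i == num_chunks - 1 else (i + 1) * chunk_size + 1
        bounds.append((start, end))
    return bounds


def partial_product(bounds):
    """Multiply all integers in the half-open range [start, end)."""
    start, end = bounds
    product = 1
    # A pure-Python loop, as in factorial.py, so the work stays in bytecode.
    for i in range(start, end):
        product *= i
    return product


def compute_factorial_pool(num, num_threads):
    """Compute num! by multiplying per-chunk products from a thread pool."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = pool.map(partial_product, chunk_bounds(num, num_threads))
        return math.prod(partials)


if __name__ == "__main__":
    assert compute_factorial_pool(1000, 4) == math.factorial(1000)
    print("ok")
```

The pool takes care of starting and joining the worker threads, and pool.map returns the partial products in chunk order, so no queue is needed to collect them.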

Reduce Sum

The reduce sum example demonstrates an I/O-bound task. It is I/O-bound in the sense that it constantly reads data from memory, rather than from disk or the network. We will also quantitatively measure the performance of the I/O-bound task.

reduce_sum.py

import queue
import threading
import time


def compute_sum(arr, start, end):

    total = 0
    for i in range(start, end):
        total += arr[i]
    return total


def compute_sum_multithread(arr, num_threads):

    n = len(arr)
    chunk_size = n // num_threads
    threads = []
    results = queue.Queue()
    for i in range(num_threads):
        start = i * chunk_size
        end = n if i == num_threads - 1 else (i + 1) * chunk_size
        thread = threading.Thread(
            target=lambda s, e: results.put(compute_sum(arr, s, e)),
            args=(start, end))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    total = 0
    while not results.empty():
        total += results.get()
    return total


def main():

    num_repeats = 20
    num_elements = 10000000
    arr = list(range(1, num_elements + 1))
    sum_ground_truth = num_elements * (num_elements + 1) // 2
    num_threads = 4
    sum_multithread = compute_sum_multithread(arr, num_threads)
    assert sum_ground_truth == sum_multithread
    # Time the sum computation
    start_time = time.time()
    for _ in range(num_repeats):
        sum_multithread = compute_sum_multithread(arr, num_threads)
    end_time = time.time()
    print(
        f"Average Time Elapsed: {(end_time - start_time) / num_repeats:.2f}s")


if __name__ == "__main__":

    main()

We will run the Python script using the Docker images for Python that were built with and without GIL.

$ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=1 python:3.13.0-gil-slim-bullseye python reduce_sum.py
Average Time Elapsed: 0.45s
$ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=1 python:3.13.0-nogil-slim-bullseye python reduce_sum.py
Average Time Elapsed: 0.72s
$ docker run -it --rm -v $(pwd):/tmp -w /tmp -e PYTHON_GIL=0 python:3.13.0-nogil-slim-bullseye python reduce_sum.py
Average Time Elapsed: 0.88s

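We can see that building Python without the GIL and disabling the GIL does not necessarily improve the performance of the I/O-bound task. Here it actually makes the task slower: 0.45s on the default build, 0.72s on the no-GIL build with the GIL enabled, and 0.88s with the GIL disabled.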

In “It isn’t Easy to Remove the GIL”, Guido van Rossum, the creator of Python, laid out his criteria for removing the GIL: “I’d welcome it if someone did another experiment along the lines of Greg’s patch (which I haven’t found online), and I’d welcome a set of patches into Py3k only if the performance for a single-threaded program (and for a multi-threaded but I/O-bound program) does not decrease.”

Disabling the GIL for this particular I/O-bound task clearly does not meet the criteria. This is probably also why disabling the GIL is not the default option in Python 3.13, and it will probably not become the default for a long time.
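
The quoted criteria also mention single-threaded programs, which the measurements above do not cover. A minimal sketch of such a baseline follows. It assumes a smaller array than reduce_sum.py to keep the run short, the helper names sum_range, sum_multithread and time_call are hypothetical, and no timings are claimed here. Running it under each of the three configurations would show how the single-threaded path compares with the four-thread path.

```python
import threading
import time


def sum_range(arr, start, end):
    """Sum arr[start:end] with an explicit Python loop."""
    total = 0
    for i in range(start, end):
        total += arr[i]
    return total


def sum_multithread(arr, num_threads):
    """Sum arr by giving each thread one contiguous slice."""
    n = len(arr)
    chunk_size = n // num_threads
    partials = [0] * num_threads  # one slot per thread, no slot is shared

    def worker(index, start, end):
        partials[index] = sum_range(arr, start, end)

    threads = []
    for i in range(num_threads):
        start = i * chunk_size
        end = n if i == num_threads - 1 else (i + 1) * chunk_size
        thread = threading.Thread(target=worker, args=(i, start, end))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return sum(partials)


def time_call(fn, repeats=5):
    """Return the average wall-clock seconds per call of fn()."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    arr = list(range(1, 1_000_001))
    single = time_call(lambda: sum_range(arr, 0, len(arr)))
    multi = time_call(lambda: sum_multithread(arr, 4))
    print(f"single-threaded: {single:.3f}s, 4 threads: {multi:.3f}s")
```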

References

Python No-GIL

https://leimao.github.io/blog/Python-No-GIL/

Author

Lei Mao

Posted on

10-07-2024

Updated on

10-07-2024
