Python GIL

Introduction

The Python Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at the same time. The GIL is a “controversial” topic in the Python community because it makes Python effectively single-threaded and not suitable for CPU-bound tasks that require multiple threads to run in parallel.

In this blog post, I would like to discuss the history of the Python GIL, why it was introduced, and the possible future of the Python GIL.

Python Automatic Reference Counting and GIL

Python, more specifically the standard reference implementation CPython, uses automatic reference counting to manage the lifetime of objects. Similarly in C++, std::shared_ptr is a smart pointer that uses automatic reference counting to track the number of std::shared_ptr instances pointing to the object. When the reference count drops to zero, the object is automatically deallocated.

When Python was first created in 1991, CPUs did not have the concept of threads. So there would be no situations where multiple threads will try to modify the reference count of an object at the same time.

Later, CPUs started to have multiple threads. When an object is referenced in multiple threads, the reference count of the object will be modified by multiple threads at the same time, resulting in a racing condition. This became not thread-safe and Python needed a solution to this problem.

There were multiple solution candidates at that time, including:

  1. Create a lock for the reference count of each object.
  2. Create a global interpreter lock (GIL).

Note that using atomic operations to modify the reference count of the object was not an option at that time because atomic operations were not available for the CPUs.

Create a Lock for Each Individual Reference Count

Creating a lock for each individual reference count of each object seems to be straightforward to implement and it did solve the problem of the racing condition that the reference count of an object is modified by multiple threads at the same time. However, there are more complicated racing conditions from Python APIs in the context of multiple threads, such as modifying the reference counts of two different objects at the same time.

For example, in the following example, the reference count of the object of integer 1 and the object of integer 2 are modified at the same time when a = b is executed.

1
2
3
a = 1
b = 2
a = b

If there is a lock for each individual reference count, in the context of multiple threads, it could result in a deadlock.

Suppose for the assignment operation a = b, the implementation will acquire the lock for the object that a references first, then acquire the lock for the object that b references. If two threads are executing the assignment operation a = b and b = a at the same time, it could result in a deadlock. While modern programming languages do have APIs to acquire multiple locks at the same time to avoid deadlocks, such as C++ std::lock, I am not sure if that was available or possible at that time. Even if such APIs were available, given how frequently the reference count of an object is modified and how many objects are created and destroyed in Python, the overhead of acquiring and releasing locks for each individual reference count would be very high.

Therefore, creating a lock for each individual reference count was not a good solution for Python.

Global Interpreter Lock (GIL)

Instead of having a lock for each individual reference count, Python created a global interpreter lock (GIL) so that only one thread can execute Python bytecode at a time. The GIL was easy to implement and it effectively eliminated the racing conditions and make Python APIs thread-safe.

Therefore, GIL was adopted as the solution to the racing condition of the reference count of an object in Python. I think we could also say, having a GIL was due to Python uses automatic reference counting to manage the lifetime of objects.

The downside of having a GIL is that it makes Python effectively single-threaded. Even if you have multiple threads in Python, only one thread can execute Python bytecode at a time. This is why Python is not suitable for CPU-bound tasks that require multiple threads to run in parallel. For I/O-bound tasks, Python is still a good choice because the GIL is released when the thread is waiting for I/O operations to complete.

Removing GIL

Getting rid of the GIL has been mentioned in the Python community from time to time, because inevitably there could be CPU-bound tasks that the effectively single-threaded Python cannot handle and Python multi-processing cannot handle optimally.

To remove the GIL, we typically have two options:

  1. Change the Python memory management from automatic reference counting to garbage collection.
  2. Change the Python thread-safety mechanism for reference counts from GIL to something else.

Garbage Collection

If Python uses garbage collection to manage the lifetime of objects, instead of automatic reference counting. There is no need to keep the reference count of an object and the racing condition of modifying the reference count of an object by multiple threads at the same time is eliminated. Therefore, the GIL is not needed.

In fact, there exists Python implementations that use garbage collection to manage the lifetime of objects, such as Jython which is an implementation of Python that runs on the Java Virtual Machine (JVM).

One of the drawbacks of Python implementations that use garbage collection for memory management is that they are usually slower than CPython. To make it faster, often the time it has to introduce additional optimization techniques, such as Just-In-Time (JIT) compilation. But this might not be the most critical drawback.

Python has become so popular because it has a lot of C extensions. The libraries that the C extensions use, even if they are not thread-safe, can still be used in Python in a thread-safe way because of the GIL. The Python C extension implementations are relatively simple to create because it is straightforward to acquire and release the GIL. If Python uses garbage collection for memory management, the existing C extensions have to be significantly re-implemented to be thread-safe.

Alternative Thread-Safety Mechanism for Reference Counts

There are alternative thread-safety mechanisms for reference counts that can be used instead of GIL, such as the relatively “new” atomic operations I mentioned earlier. It does work and the existing C extensions do not have to be modified significantly. However, based on the profiling, Python becomes 30% slower.

According to the criteria from “It isn’t Easy to Remove the GIL” by Guido van Rossum, the creator of Python, “I’d welcome it if someone did another experiment along the lines of Greg’s patch (which I haven’t found online), and I’d welcome a set of patches into Py3k only if the performance for a single-threaded program (and for a multi-threaded but I/O-bound program) does not decrease.” So far, this solution has not been adopted to the Python standard reference implementation.

The Future of GIL

In my own opinion, GIL was and is still the correct solution for Python. In spite of some drawbacks it brings to the CPU-bound tasks, which should not really be performed by Python, it’s the simplest and most performant solution for Python to be thread-safe. If the Python community does not want to lower the performance criteria for removing the GIL, probably GIL will never be officially and completely removed from the Python standard reference implementation.

References

Author

Lei Mao

Posted on

09-30-2024

Updated on

09-30-2024

Licensed under


Comments