Build and Develop PyTorch
Introduction
If we would like to contribute code to PyTorch, we have to build the PyTorch main branch from source, apply our changes, and pass all the unit tests. Although we can rely on the PyTorch CI on GitHub when we create a pull request, the test suite that CI runs is large and comprehensive and takes a very long time to finish. Building, developing, and testing locally is therefore preferable.
In this blog post, I would like to discuss how to build, develop, and contribute code to PyTorch using a Docker container.
PyTorch Development Docker Container
Create Dockerfile
The major component of the Docker container for PyTorch development is CMake.
FROM nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
Build Docker Image
We could build the Docker image using different CMake versions.
$ CMAKE_VERSION=3.25.1
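One way to parameterize the image is to pass the CMake version as a Docker build argument. The sketch below assumes that the Dockerfile declares `ARG CMAKE_VERSION` and is saved as `Dockerfile` in the current directory; both are assumptions, and `build_torch_image` is a hypothetical helper name. The tag `torch-build:0.0.1` is the one used when running the container in this post.

```shell
# Hypothetical helper: build the development image for a given CMake version.
# Assumes ./Dockerfile declares `ARG CMAKE_VERSION`.
build_torch_image() {
    cmake_version=$1
    docker build \
        -f Dockerfile \
        --build-arg CMAKE_VERSION="$cmake_version" \
        --tag torch-build:0.0.1 \
        .
}

# Example: build_torch_image 3.25.1
```

Calling the helper with a different version rebuilds the image with that CMake version, provided the Dockerfile actually consumes the argument.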
Build PyTorch
Clone PyTorch Repositories
The PyTorch repository has to be cloned from GitHub. The torchvision and torchaudio libraries are optional, but some unit tests need them.
$ git clone --recursive https://github.com/pytorch/pytorch.git
Run Docker Container
Mount the working directory that contains the PyTorch repository into the Docker container.
$ docker run -it --rm --gpus all -v $(pwd):/mnt -w /mnt torch-build:0.0.1
Build PyTorch for Development
Build PyTorch from source in development mode.
$ cd /mnt/pytorch
With python setup.py develop, we don't have to rebuild the entire PyTorch library every time we change the PyTorch files.
When MAX_JOBS is not set or is too large, we will likely encounter the following error during the PyTorch build. The error message is extremely confusing and misleading.
internal compiler error: Segmentation fault
If the build fails because of this, we can kill the build and re-run the build command. The files that have already been built are cached, so the build does not start from the beginning again.
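Re-running the build by hand can be automated with a small retry loop. This is only a sketch: `retry_build` is a hypothetical helper, and the `MAX_JOBS=4` value in the example is an arbitrary placeholder. A genuine compile error fails on every attempt, so the number of attempts should stay small.

```shell
# Hypothetical helper: run a command up to N times, stopping at the first success.
retry_build() {
    attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $attempt of $attempts failed" >&2
        attempt=$(( attempt + 1 ))
    done
    return 1
}

# Example (inside the container, from /mnt/pytorch):
#   retry_build 3 env MAX_JOBS=4 python setup.py develop
```

Because the already-built files are cached, each retry resumes from where the previous attempt stopped.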
Although I have not investigated this, I suspect that it is a compiler implementation bug. When compilation runs in multiple threads, each thread allocates memory, and some of those allocations fail and return null pointers. Those null pointers are then used without being checked for validity, which results in a segmentation fault.
In my case, I have an Intel Core i9-9900K and 32 GB of memory. This error is very likely to happen when MAX_JOBS is large, say 16, and I am doing other things on the computer, such as watching videos, while building PyTorch. The chance of encountering this error is lower if MAX_JOBS is small or if the computer has much more memory, say 128 GB.
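A rough way to choose MAX_JOBS is to cap it by both the CPU count and the available memory. The sketch below is a heuristic, not an official PyTorch recommendation: `suggest_max_jobs` is a hypothetical helper, and the per-job memory budget (8 GB in the example) is a guess that should be tuned for the machine.

```shell
# Hypothetical helper: suggest_max_jobs MEM_GB CPUS GB_PER_JOB
# Prints min(CPUS, MEM_GB / GB_PER_JOB), but never less than 1.
suggest_max_jobs() {
    mem_gb=$1
    cpus=$2
    gb_per_job=$3
    jobs=$(( mem_gb / gb_per_job ))
    if [ "$jobs" -gt "$cpus" ]; then jobs=$cpus; fi
    if [ "$jobs" -lt 1 ]; then jobs=1; fi
    echo "$jobs"
}

suggest_max_jobs 32 16 8   # prints 4

# Example (inside the container, from /mnt/pytorch):
#   mem_gb=$(awk '/MemTotal/ {print int($2 / 1048576)}' /proc/meminfo)
#   MAX_JOBS=$(suggest_max_jobs "$mem_gb" "$(nproc)" 8) python setup.py develop
```

With 32 GB of memory and 16 hardware threads, an 8 GB budget per job yields MAX_JOBS=4, which leaves some headroom for other programs running on the host.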
Run Unit Test
Install Dependencies
Some unit tests require torchvision and torchaudio, so we build them if necessary.
$ cd /mnt/vision
$ cd /mnt/audio
There might be other Python dependencies required for the unit tests, but they should be easy to install via pip.
Run Unit Test for Development
To run unit tests, go to pytorch/test and run the selected Python unit test files. If a unit test file is too large, we can either comment out some of the unit tests or use pytest to run unit tests more selectively.
Run Code Formatting
PyTorch uses lintrunner for code formatting. Follow the official PyTorch lintrunner instructions to install and set it up. Before submitting a pull request to PyTorch, run lintrunner using
$ cd /mnt/pytorch
Known Issues
Host System Frozen
When building PyTorch, the host system might freeze. This happens because the Docker container uses all the CPU cores and memory. To avoid this, we can limit the number of CPU cores and the amount of memory that the Docker container can use.
We can limit the memory usage of the Docker container by adding the --memory option when running the Docker container. However, in my experience, the --memory option was not respected on my Linux system: even with --memory set, the Docker container still used all the memory on the host. When we use a large number of CPU threads to build PyTorch from source, especially while it is building the Flash-Attention CUDA kernels, the host system will run out of memory and freeze if it does not have sufficient memory. In my case, 32 GB of memory is still not enough to keep the host from freezing when I use 8 CPU threads to build PyTorch from source.
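The limits can be passed to `docker run` as shown in the sketch below. `--cpus` caps the CPU time the container may use, `--memory` caps its RAM, and setting `--memory-swap` to the same value as `--memory` prevents the container from using swap. The helper name `run_torch_container_limited` and the example values are hypothetical. Given the caveat above about `--memory`, this is something to experiment with rather than a guaranteed fix, and `--cpus` does not reduce the number of compiler processes, so lowering MAX_JOBS is still needed.

```shell
# Hypothetical helper: start the development container with CPU and memory caps.
run_torch_container_limited() {
    cpus=$1      # CPU limit, e.g. 8
    memory=$2    # memory limit, e.g. 24g
    docker run -it --rm --gpus all \
        --cpus "$cpus" \
        --memory "$memory" \
        --memory-swap "$memory" \
        -v "$(pwd)":/mnt -w /mnt torch-build:0.0.1
}

# Example: run_torch_container_limited 8 24g
```

Leaving a few gigabytes of memory and a couple of cores outside the container's limits gives the host a better chance of staying responsive during the build.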