Build and Develop PyTorch

Introduction

If we would like to contribute code to PyTorch, we have to build the PyTorch main branch from source, apply our changes, and pass all the unit tests. Although we can rely on the PyTorch CI on GitHub once we create a pull request, the test suite that CI runs is large and comprehensive, so it takes a very long time to finish. Building, developing, and testing locally is therefore preferable.

In this blog post, I would like to discuss how to build, develop, and contribute code to PyTorch using a Docker container.

PyTorch Development Docker Container

Create Dockerfile

The major component of the Docker container for PyTorch development is CMake.

torch-build.Dockerfile
FROM nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04

ARG CMAKE_VERSION=3.28.3
ARG NUM_JOBS=8

ENV DEBIAN_FRONTEND=noninteractive

# Install package dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        software-properties-common \
        autoconf \
        automake \
        libtool \
        libssl-dev \
        pkg-config \
        ca-certificates \
        wget \
        git \
        curl \
        libjpeg-dev \
        libpng-dev \
        language-pack-en \
        locales \
        locales-all \
        python3 \
        python3-py \
        python3-dev \
        python3-pip \
        python3-numpy \
        python3-pytest \
        python3-setuptools \
        libprotobuf-dev \
        protobuf-compiler \
        zlib1g-dev \
        swig \
        vim \
        gdb \
        valgrind && \
    apt-get clean

RUN cd /usr/local/bin && \
    ln -s /usr/bin/python3 python && \
    ln -s /usr/bin/pip3 pip && \
    pip install --upgrade pip setuptools wheel

# System locale
# Important for UTF-8
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8

# Install CMake
RUN cd /tmp && \
    wget https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz && \
    tar xzf cmake-${CMAKE_VERSION}.tar.gz && \
    cd cmake-${CMAKE_VERSION} && \
    ./bootstrap && \
    make -j${NUM_JOBS} && \
    make install && \
    rm -rf /tmp/*

# Install the Python build requirements from the PyTorch main branch
RUN cd /tmp && \
    wget https://raw.githubusercontent.com/pytorch/pytorch/main/requirements.txt && \
    pip install -r requirements.txt

RUN pip install lintrunner

Build Docker Image

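The CMake version defaults to the value set in the Dockerfile, and we can build the Docker image with a different version by overriding the CMAKE_VERSION build argument.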

$ CMAKE_VERSION=3.25.1
$ docker build -f torch-build.Dockerfile --build-arg CMAKE_VERSION=${CMAKE_VERSION} --tag torch-build:0.0.1 .

Build PyTorch

Clone PyTorch Repositories

The PyTorch repository has to be cloned from GitHub. The torchvision and torchaudio repositories are optional, but some unit tests need them.

$ git clone --recursive https://github.com/pytorch/pytorch.git
$ git clone --recursive https://github.com/pytorch/vision.git
$ git clone --recursive https://github.com/pytorch/audio.git

Run Docker Container

Mount the working directory that contains the PyTorch repository into the Docker container.

$ docker run -it --rm --gpus all -v $(pwd):/mnt -w /mnt torch-build:0.0.1

Build PyTorch for Development

Build PyTorch from source in development mode.

$ cd /mnt/pytorch
$ pip install -r requirements.txt
$ MAX_JOBS=4 DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=1 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 python setup.py develop

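With python setup.py develop, we don't have to rebuild the entire PyTorch library after changing PyTorch files. Changes to Python files take effect immediately, and after changing C++ or CUDA files we re-run the same command, which only rebuilds what was affected.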

When MAX_JOBS is not set or is too large, we are likely to encounter the following error during the PyTorch build. The error message is extremely confusing and misleading.

internal compiler error: Segmentation fault

If the build fails because of this, we can kill the build and re-run the build command. The files that have already been built are cached, so we do not have to start from the beginning again.
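The "re-run the build command" step can be automated with a small retry wrapper. The following is only a minimal sketch: the function name retry_build and the attempt limit are hypothetical and not part of PyTorch or of the original workflow.

```shell
# retry_build MAX_ATTEMPTS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or MAX_ATTEMPTS is exhausted.
# Hypothetical helper for illustration; it relies on the build cache so that
# each new attempt resumes instead of starting from scratch.
retry_build() {
    local max_attempts="$1"
    shift
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        echo "Build attempt ${attempt}/${max_attempts}" >&2
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "Build still failing after ${max_attempts} attempts" >&2
    return 1
}

# Usage inside the container (not executed here):
#   cd /mnt/pytorch
#   retry_build 3 env MAX_JOBS=4 DEBUG=1 USE_CUDA=1 python setup.py develop
```

A retry only helps with the intermittent failure described above. A build that fails for a real reason, such as a compile error in our own change, will fail on every attempt.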

Although I have not investigated this, I suspect that it is a compiler implementation bug. When compilation runs as many parallel jobs, each job allocates memory, and some of those allocations fail and return null pointers. If those null pointers are then used without being checked, a segmentation fault follows.

In my case, I have an Intel Core i9-9900K and 32 GB of memory. The error is very likely to occur when MAX_JOBS is large, say 16, and I am doing other things on the computer, such as watching videos, while building PyTorch. The chance of encountering it is lower if MAX_JOBS is small or if the computer has much more memory, say 128 GB.
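Since the failure seems tied to memory pressure, one way to choose MAX_JOBS is to cap it by both the core count and the memory we are willing to give the build. The sketch below is a rule of thumb only: the helper name suggest_max_jobs and the memory budget per job are assumptions of mine, not measured values, and they need tuning for each machine.

```shell
# suggest_max_jobs MEM_GB CORES GB_PER_JOB
# Print a MAX_JOBS value limited by both the core count and the memory budget.
# GB_PER_JOB is an assumed budget per compile job, not a measured number.
suggest_max_jobs() {
    local mem_gb="$1" cores="$2" gb_per_job="$3"
    local by_memory=$((mem_gb / gb_per_job))
    local jobs="$cores"
    if [ "$by_memory" -lt "$jobs" ]; then
        jobs="$by_memory"
    fi
    # Always allow at least one job.
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

# Usage on Linux (not executed here); MemAvailable is reported in kB:
#   mem_gb=$(awk '/MemAvailable/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
#   MAX_JOBS=$(suggest_max_jobs "$mem_gb" "$(nproc)" 4) python setup.py develop
```

With 32 GB, 16 hardware threads, and an assumed 4 GB per job, this prints 8. With 128 GB the core count becomes the limit and it prints 16.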

Run Unit Test

Install Dependencies

Some unit tests require torchvision and torchaudio, so we build and install them if the tests we want to run need them.

$ cd /mnt/vision
$ python setup.py install

$ cd /mnt/audio
$ python setup.py install

There might be some other Python dependencies required for unit tests. But they should be easy to install via pip.

Run Unit Test for Development

To run unit tests, go to pytorch/test and run the relevant Python unit test files. If a test file is too large to run in full, we can either comment out some of its tests or use pytest to run tests more selectively.
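Selecting tests with pytest can be sketched as follows. The wrapper run_selected_tests and its DRY_RUN switch are hypothetical conveniences of mine. The underlying call is plain pytest with its -k option, which keeps only the tests whose names match the given expression. The pattern test_add in the usage line is a placeholder, not a recommendation of a specific test.

```shell
# run_selected_tests TEST_FILE PATTERN
# Run only the tests in TEST_FILE whose names match PATTERN (pytest -k).
# With DRY_RUN=1 the command is printed instead of executed.
# Hypothetical wrapper for illustration.
run_selected_tests() {
    local test_file="$1" pattern="$2"
    # Reuse the positional parameters to hold the command line.
    set -- python -m pytest "$test_file" -k "$pattern" -v
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage inside the container (not executed here):
#   cd /mnt/pytorch/test
#   run_selected_tests test_torch.py test_add
```

This avoids editing the test file, which matters because commented-out tests are easy to commit by accident.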

Run Code Formatting

PyTorch uses lintrunner for linting and code formatting. Follow the official PyTorch lintrunner instructions to install it and set it up. Before submitting the code to PyTorch as a pull request, run lintrunner as follows.

$ cd /mnt/pytorch
$ lintrunner -a

Known Issues

Host System Frozen

When building PyTorch, the host system might freeze. This happens because the Docker container is using all the CPU cores and memory. To avoid this, we can limit the number of CPU cores and the amount of memory that the Docker container can use.

We can limit the memory usage of the Docker container by adding the --memory option when running the Docker container. However, in my experience, the --memory option was not respected on my Linux system: even with --memory set, the Docker container could still use all the memory on the host. When we build PyTorch from source with a large number of CPU threads, especially while the Flash-Attention CUDA kernels are being compiled, a host without sufficient memory will run out of memory and freeze. In my own case, 32 GB of memory is not enough to keep the host from freezing when I build PyTorch from source with 8 CPU threads.
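For reference, a resource-limited variant of the docker run command used earlier might look like the sketch below. The options --cpus, --memory, and --memory-swap are standard docker run options, but the wrapper run_limited_container, its DRY_RUN switch, and the example limits are my own assumptions. Setting --memory-swap to the same value as --memory prevents the container from using swap. Given the experience described above, it is worth verifying that the limits are actually enforced on your host before relying on them.

```shell
# run_limited_container CPUS MEMORY IMAGE
# Start the development container with CPU and memory limits.
# With DRY_RUN=1 the command is printed instead of executed.
# Hypothetical wrapper for illustration.
run_limited_container() {
    local cpus="$1" memory="$2" image="$3"
    # Reuse the positional parameters to hold the command line.
    set -- docker run -it --rm --gpus all \
        --cpus="$cpus" --memory="$memory" --memory-swap="$memory" \
        -v "$(pwd):/mnt" -w /mnt "$image"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage on the host (not executed here):
#   run_limited_container 4 16g torch-build:0.0.1
```

If the limits turn out not to be enforced, lowering MAX_JOBS inside the container remains the simpler way to reduce memory pressure.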

Author

Lei Mao

Posted on

02-28-2023

Updated on

03-28-2024
