Introduction

TensorRT is a high-performance SDK that accelerates deep learning inference on NVIDIA GPUs. While NVIDIA NGC releases Docker images for TensorRT monthly, sometimes we would like to build our own Docker image for a selected TensorRT version.
In this blog post, I would like to show how to build a Docker image for TensorRT.
TensorRT

Download TensorRT SDK

To download the TensorRT SDK, please go to the TensorRT website, log in with an NVIDIA Developer account if necessary, and download the TAR install packages for Linux into the downloads directory.
For example, the TAR install package for TensorRT 8.6.1.6 looks like the following.

```shell
$ ls downloads/
TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-12.0.tar.gz
```
Create Dockerfile

The installation of TensorRT inside Docker follows the TensorRT Installation Guide. The Dockerfile created based on the installation guide is shown below.
tensorrt.Dockerfile

```dockerfile
FROM nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04

ARG TENSORRT_VERSION=8.6.1.6
ARG CUDA_USER_VERSION=12.0
ARG CUDNN_USER_VERSION=8.9
ARG OPERATING_SYSTEM=Linux

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        autoconf \
        automake \
        libtool \
        pkg-config \
        ca-certificates \
        wget \
        git \
        curl \
        libjpeg-dev \
        libpng-dev \
        language-pack-en \
        locales \
        locales-all \
        python3 \
        python3-dev \
        python3-pip \
        python3-setuptools \
        libprotobuf-dev \
        protobuf-compiler \
        zlib1g-dev \
        swig \
        vim \
        gdb \
        valgrind \
        libsm6 \
        libxext6 \
        libxrender-dev \
        cmake && \
    apt-get clean

RUN cd /usr/local/bin && \
    ln -s /usr/bin/python3 python && \
    ln -s /usr/bin/pip3 pip && \
    pip install --upgrade pip setuptools wheel

ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8

COPY ./downloads/TensorRT-${TENSORRT_VERSION}.${OPERATING_SYSTEM}.x86_64-gnu.cuda-${CUDA_USER_VERSION}.tar.gz /opt

RUN cd /opt && \
    tar -xzf TensorRT-${TENSORRT_VERSION}.${OPERATING_SYSTEM}.x86_64-gnu.cuda-${CUDA_USER_VERSION}.tar.gz && \
    rm TensorRT-${TENSORRT_VERSION}.${OPERATING_SYSTEM}.x86_64-gnu.cuda-${CUDA_USER_VERSION}.tar.gz && \
    export PYTHON_VERSION=$(python3 --version 2>&1 | awk '{print $2}' | cut -d. -f1,2 | tr -d .) && \
    python3 -m pip install TensorRT-${TENSORRT_VERSION}/python/tensorrt-*-cp${PYTHON_VERSION}-none-linux_x86_64.whl && \
    python3 -m pip install TensorRT-${TENSORRT_VERSION}/python/tensorrt_lean-*-cp${PYTHON_VERSION}-none-linux_x86_64.whl && \
    python3 -m pip install TensorRT-${TENSORRT_VERSION}/python/tensorrt_dispatch-*-cp${PYTHON_VERSION}-none-linux_x86_64.whl && \
    python3 -m pip install TensorRT-${TENSORRT_VERSION}/onnx_graphsurgeon/onnx_graphsurgeon-*-py2.py3-none-any.whl

ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT-${TENSORRT_VERSION}/lib
ENV PATH=$PATH:/opt/TensorRT-${TENSORRT_VERSION}/bin
```
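The TensorRT Python wheels are tagged by Python version, so the Dockerfile derives the cp tag (for example, 310 for Python 3.10) from `python3 --version` using an awk/cut/tr pipeline. For illustration, the same transformation as a minimal Python sketch (the helper name is my own, not part of any API):

```python
def wheel_python_tag(version_output: str) -> str:
    """Mirror the Dockerfile's awk/cut/tr pipeline:
    'Python 3.10.12' -> '3.10.12' -> ['3', '10'] -> '310'."""
    version = version_output.split()[1]    # awk '{print $2}'
    major_minor = version.split(".")[:2]   # cut -d. -f1,2
    return "".join(major_minor)            # tr -d .

print(wheel_python_tag("Python 3.10.12"))  # -> 310
```

The resulting tag selects the matching `tensorrt-*-cp${PYTHON_VERSION}-none-linux_x86_64.whl` file in the extracted SDK.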
Build Docker Image

To build the Docker image, please run the following command.

```shell
$ TENSORRT_VERSION=8.6.1.6
$ CUDA_USER_VERSION=12.0
$ docker build -f tensorrt.Dockerfile --no-cache --build-arg TENSORRT_VERSION=$TENSORRT_VERSION --build-arg CUDA_USER_VERSION=$CUDA_USER_VERSION --tag tensorrt:$TENSORRT_VERSION .
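If the tarball is missing, the build only fails once it reaches the COPY step, so it can be worth checking for the file before invoking docker build. A hypothetical pre-flight helper, assuming the downloads layout described above (the function name is my own, not part of any tool):

```python
from pathlib import Path

def tarball_path(tensorrt_version: str, cuda_version: str) -> Path:
    """Path where the Dockerfile's COPY step expects the TAR package."""
    return Path("downloads") / (
        f"TensorRT-{tensorrt_version}.Linux.x86_64-gnu"
        f".cuda-{cuda_version}.tar.gz"
    )

path = tarball_path("8.6.1.6", "12.0")
if not path.is_file():
    print(f"Missing {path}; download it before running docker build.")
```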
Run Docker Container

To run the Docker container, please run the following command.

```shell
$ docker run -it --rm --gpus all -v $(pwd):/mnt -w /mnt tensorrt:$TENSORRT_VERSION
```
Examples

Build TensorRT Engine

To verify the TensorRT installation, please build a TensorRT engine from an ONNX model with trtexec using the following command.

```shell
$ wget -P /tmp/ https://github.com/onnx/models/raw/ddbbd1274c8387e3745778705810c340dea3d8c7/validated/vision/classification/mnist/model/mnist-12.onnx
$ trtexec --onnx=/tmp/mnist-12.onnx
```
If the TensorRT installation is correct, the output should look like the following.

```
[01/24/2024-20:02:59] [I] === Performance summary ===
[01/24/2024-20:02:59] [I] Throughput: 40624.2 qps
[01/24/2024-20:02:59] [I] Latency: min = 0.020874 ms, max = 0.354187 ms, mean = 0.0249536 ms, median = 0.0246582 ms, percentile(90%) = 0.0258789 ms, percentile(95%) = 0.0263672 ms, percentile(99%) = 0.0344238 ms
[01/24/2024-20:02:59] [I] Enqueue Time: min = 0.0117188 ms, max = 0.0712891 ms, mean = 0.0130411 ms, median = 0.0126343 ms, percentile(90%) = 0.0130005 ms, percentile(95%) = 0.0134277 ms, percentile(99%) = 0.0257568 ms
[01/24/2024-20:02:59] [I] H2D Latency: min = 0.00195312 ms, max = 0.244873 ms, mean = 0.00396986 ms, median = 0.00390625 ms, percentile(90%) = 0.0045166 ms, percentile(95%) = 0.00463867 ms, percentile(99%) = 0.00488281 ms
[01/24/2024-20:02:59] [I] GPU Compute Time: min = 0.0151367 ms, max = 0.348145 ms, mean = 0.0171428 ms, median = 0.0166016 ms, percentile(90%) = 0.0174561 ms, percentile(95%) = 0.0183105 ms, percentile(99%) = 0.0256348 ms
[01/24/2024-20:02:59] [I] D2H Latency: min = 0.00219727 ms, max = 0.0393066 ms, mean = 0.0038367 ms, median = 0.00402832 ms, percentile(90%) = 0.00439453 ms, percentile(95%) = 0.0045166 ms, percentile(99%) = 0.00476074 ms
[01/24/2024-20:02:59] [I] Total Host Walltime: 3.00006 s
[01/24/2024-20:02:59] [I] Total GPU Compute Time: 2.08927 s
[01/24/2024-20:02:59] [W] * GPU compute time is unstable, with coefficient of variance = 29.0605%.
[01/24/2024-20:02:59] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[01/24/2024-20:02:59] [I] Explanations of the performance metrics are printed in the verbose logs.
[01/24/2024-20:02:59] [I] &&&& PASSED TensorRT.trtexec [TensorRT v8601]
```
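Beyond eyeballing the summary, the throughput and latency numbers can be pulled out of the trtexec log programmatically, for example to compare performance across TensorRT versions. A minimal sketch that parses the kind of lines shown above (the function name and metric keys are my own):

```python
import re

# Two lines taken from the trtexec performance summary above
log = (
    "[01/24/2024-20:02:59] [I] Throughput: 40624.2 qps\n"
    "[01/24/2024-20:02:59] [I] Latency: min = 0.020874 ms, max = 0.354187 ms, "
    "mean = 0.0249536 ms, median = 0.0246582 ms"
)

def parse_trtexec_summary(text: str) -> dict:
    """Extract throughput (qps) and mean latency (ms) from a trtexec log."""
    metrics = {}
    throughput = re.search(r"Throughput: ([\d.]+) qps", text)
    if throughput:
        metrics["throughput_qps"] = float(throughput.group(1))
    mean_latency = re.search(r"Latency: .*mean = ([\d.]+) ms", text)
    if mean_latency:
        metrics["mean_latency_ms"] = float(mean_latency.group(1))
    return metrics

print(parse_trtexec_summary(log))
```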
GitHub

All the Dockerfiles and examples are available on GitHub.