TensorRT In Docker

Introduction

TensorRT is a high-performance deep learning inference SDK that accelerates deep learning inference on NVIDIA GPUs. While NVIDIA NGC releases Docker images for TensorRT monthly, sometimes we would like to build our own Docker image for selected TensorRT versions.

In this blog post, I would like to show how to build a Docker image for TensorRT.

TensorRT

Download TensorRT SDK

To download the TensorRT SDK, please go to the TensorRT website, login with NVIDIA Developer account if necessary, and download the TAR install packages for Linux into the downloads directory.

For example, the TAR install packages for TensorRT 8.6.1.6 looks like the following.

1
2
$ ls downloads/
TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-12.0.tar.gz

Create Dockerfile

The installation of TensorRT inside the Docker follows the TensorRT Installation Guide. The Dockerfile created based on the installation guide is shown below.

tensorrt.Dockerfile
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
FROM nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04

ARG TENSORRT_VERSION=8.6.1.6
ARG CUDA_USER_VERSION=12.0
ARG CUDNN_USER_VERSION=8.9
ARG OPERATING_SYSTEM=Linux

ENV DEBIAN_FRONTEND noninteractive

# Install package dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
autoconf \
automake \
libtool \
pkg-config \
ca-certificates \
wget \
git \
curl \
libjpeg-dev \
libpng-dev \
language-pack-en \
locales \
locales-all \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
libprotobuf-dev \
protobuf-compiler \
zlib1g-dev \
swig \
vim \
gdb \
valgrind \
libsm6 \
libxext6 \
libxrender-dev \
cmake && \
apt-get clean

RUN cd /usr/local/bin && \
ln -s /usr/bin/python3 python && \
ln -s /usr/bin/pip3 pip && \
pip install --upgrade pip setuptools wheel

# System locale
# Important for UTF-8
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8

COPY ./downloads/TensorRT-${TENSORRT_VERSION}.${OPERATING_SYSTEM}.x86_64-gnu.cuda-${CUDA_USER_VERSION}.tar.gz /opt
RUN cd /opt && \
tar -xzf TensorRT-${TENSORRT_VERSION}.${OPERATING_SYSTEM}.x86_64-gnu.cuda-${CUDA_USER_VERSION}.tar.gz && \
rm TensorRT-${TENSORRT_VERSION}.${OPERATING_SYSTEM}.x86_64-gnu.cuda-${CUDA_USER_VERSION}.tar.gz && \
export PYTHON_VERSION=$(python3 --version 2>&1 | awk '{print $2}' | cut -d. -f1,2 | tr -d .) && \
python3 -m pip install TensorRT-${TENSORRT_VERSION}/python/tensorrt-*-cp${PYTHON_VERSION}-none-linux_x86_64.whl && \
python3 -m pip install TensorRT-${TENSORRT_VERSION}/python/tensorrt_lean-*-cp${PYTHON_VERSION}-none-linux_x86_64.whl && \
python3 -m pip install TensorRT-${TENSORRT_VERSION}/python/tensorrt_dispatch-*-cp${PYTHON_VERSION}-none-linux_x86_64.whl && \
# Deprecated.
# python3 -m pip install TensorRT-${TENSORRT_VERSION}/uff/uff-*-py2.py3-none-any.whl && \
# python3 -m pip install TensorRT-${TENSORRT_VERSION}/graphsurgeon/graphsurgeon-*-py2.py3-none-any.whl && \
python3 -m pip install TensorRT-${TENSORRT_VERSION}/onnx_graphsurgeon/onnx_graphsurgeon-*-py2.py3-none-any.whl

ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT-${TENSORRT_VERSION}/lib
ENV PATH=$PATH:/opt/TensorRT-${TENSORRT_VERSION}/bin

Build Docker Image

To build the Docker image, please run the following command.

1
2
3
$ TENSORRT_VERSION=8.6.1.6
$ CUDA_USER_VERSION=12.0
$ docker build -f tensorrt.Dockerfile --no-cache --build-arg TENSORRT_VERSION=$TENSORRT_VERSION --build-arg CUDA_USER_VERSION=$CUDA_USER_VERSION --tag=tensorrt:$TENSORRT_VERSION .

Run Docker Container

To run the Docker container, please run the following command.

1
$ docker run -it --rm --gpus all -v $(pwd):/mnt tensorrt:$TENSORRT_VERSION

Examples

Build TensorRT Engine

To verify the TensorRT installation, please build a TensorRT engine from an ONNX model using trtexec using the following command.

1
2
$ wget -P /tmp/ https://github.com/onnx/models/raw/ddbbd1274c8387e3745778705810c340dea3d8c7/validated/vision/classification/mnist/model/mnist-12.onnx
$ trtexec --onnx=/tmp/mnist-12.onnx

If the TensorRT installation is correct, the output should look like the following.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
[01/24/2024-20:02:59] [I] === Performance summary ===
[01/24/2024-20:02:59] [I] Throughput: 40624.2 qps
[01/24/2024-20:02:59] [I] Latency: min = 0.020874 ms, max = 0.354187 ms, mean = 0.0249536 ms, median = 0.0246582 ms, percentile(90%) = 0.0258789 ms, percentile(95%) = 0.0263672 ms, percentile(99%) = 0.0344238 ms
[01/24/2024-20:02:59] [I] Enqueue Time: min = 0.0117188 ms, max = 0.0712891 ms, mean = 0.0130411 ms, median = 0.0126343 ms, percentile(90%) = 0.0130005 ms, percentile(95%) = 0.0134277 ms, percentile(99%) = 0.0257568 ms
[01/24/2024-20:02:59] [I] H2D Latency: min = 0.00195312 ms, max = 0.244873 ms, mean = 0.00396986 ms, median = 0.00390625 ms, percentile(90%) = 0.0045166 ms, percentile(95%) = 0.00463867 ms, percentile(99%) = 0.00488281 ms
[01/24/2024-20:02:59] [I] GPU Compute Time: min = 0.0151367 ms, max = 0.348145 ms, mean = 0.0171428 ms, median = 0.0166016 ms, percentile(90%) = 0.0174561 ms, percentile(95%) = 0.0183105 ms, percentile(99%) = 0.0256348 ms
[01/24/2024-20:02:59] [I] D2H Latency: min = 0.00219727 ms, max = 0.0393066 ms, mean = 0.0038367 ms, median = 0.00402832 ms, percentile(90%) = 0.00439453 ms, percentile(95%) = 0.0045166 ms, percentile(99%) = 0.00476074 ms
[01/24/2024-20:02:59] [I] Total Host Walltime: 3.00006 s
[01/24/2024-20:02:59] [I] Total GPU Compute Time: 2.08927 s
[01/24/2024-20:02:59] [W] * GPU compute time is unstable, with coefficient of variance = 29.0605%.
[01/24/2024-20:02:59] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[01/24/2024-20:02:59] [I] Explanations of the performance metrics are printed in the verbose logs.
[01/24/2024-20:02:59] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=/tmp/mnist-12.onnx

GitHub

All the Dockerfiles and examples are available on GitHub.

References

Author

Lei Mao

Posted on

02-05-2024

Updated on

02-05-2024

Licensed under


Comments