Nsight Systems In Docker

Posted on 06-01-2022, updated on 12-19-2023

Introduction

NVIDIA Nsight Systems is a low-overhead performance analysis tool designed to provide the insights developers need to optimize their software. Unbiased activity data is visualized within the tool to help users investigate bottlenecks, avoid inferring false positives, and pursue optimizations with a higher probability of performance gains.

In this blog post, I would like to discuss how to install and use Nsight Systems in a Docker container so that we can use it on any machine that has Docker installed.

Nsight Systems

Build Docker Image

It is possible to install Nsight Systems inside a Docker image and use it anywhere. The Dockerfile for building the Nsight Systems image is as follows.

nsight-systems.Dockerfile

FROM nvcr.io/nvidia/cuda:12.0.1-devel-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        apt-transport-https \
        ca-certificates \
        dbus \
        fontconfig \
        gnupg \
        libasound2 \
        libfreetype6 \
        libglib2.0-0 \
        libnss3 \
        libsqlite3-0 \
        libx11-xcb1 \
        libxcb-glx0 \
        libxcb-xkb1 \
        libxcomposite1 \
        libxcursor1 \
        libxdamage1 \
        libxi6 \
        libxml2 \
        libxrandr2 \
        libxrender1 \
        libxtst6 \
        libgl1-mesa-glx \
        libxkbfile-dev \
        openssh-client \
        wget \
        xcb \
        xkb-data && \
    apt-get clean

RUN cd /tmp && \
    wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2023_4_1_97/nsight-systems-2023.4.1_2023.4.1.97-1_amd64.deb && \
    apt-get install -y ./nsight-systems-2023.4.1_2023.4.1.97-1_amd64.deb && \
    rm -rf /tmp/*

To build the Docker image, please run the following command.

$ docker build -f nsight-systems.Dockerfile --no-cache --tag nsight-systems:2023.4 .
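
A quick way to confirm that the build worked is to ask the command-line tool inside the image for its version. The following is a minimal sketch, assuming the image tag used above and assuming that the `nsys` executable is on the `PATH` after the package is installed. The function name `check_nsys_image` is made up for illustration.

```shell
# Hypothetical helper: print the Nsight Systems CLI version from the image.
# Assumption: `nsys` is on the PATH inside the image built above.
check_nsys_image() {
    docker run --rm nsight-systems:2023.4 nsys --version
}
```

Calling `check_nsys_image` on the host should print a version string if the installation succeeded. No GPU flags are passed here because the check only starts the CLI.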

Upload Docker Image

To upload the Docker image, please run the following command.

$ docker tag nsight-systems:2023.4 leimao/nsight-systems:2023.4
$ docker push leimao/nsight-systems:2023.4

Pull Docker Image

To pull the Docker image, please run the following command.

$ docker pull leimao/nsight-systems:2023.4
$ docker tag leimao/nsight-systems:2023.4 nsight-systems:2023.4

Run Docker Container

To run the Docker container, please run the following command.

$ xhost +
$ docker run -it --rm --gpus all -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --cap-add=SYS_ADMIN --security-opt seccomp=unconfined -v $(pwd):/mnt -w /mnt --network host nsight-systems:2023.4
$ xhost -
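
The three commands above can also be wrapped in a small shell function so that the X server grant is always revoked when the container exits. This is only a sketch: the function name `run_nsight_container` is hypothetical, and it assumes the container runs as root, so it grants X access to the local root user only (`xhost +si:localuser:root`) rather than to every host as `xhost +` does.

```shell
# Hypothetical wrapper around the commands above; the name is illustrative.
# Usage: run_nsight_container [command inside the container...]
run_nsight_container() {
    # Assumption: the container runs as root, so only the local root user
    # is allowed to connect to the X server.
    xhost +si:localuser:root
    docker run -it --rm \
        --gpus all \
        -e DISPLAY="$DISPLAY" \
        -v /tmp/.X11-unix:/tmp/.X11-unix \
        --cap-add=SYS_ADMIN \
        --security-opt seccomp=unconfined \
        -v "$(pwd)":/mnt \
        -w /mnt \
        --network host \
        nsight-systems:2023.4 "$@"
    rc=$?
    # Revoke the grant once the container exits, then report docker's status.
    xhost -si:localuser:root
    return "$rc"
}
```

Running `run_nsight_container` with no arguments starts the image's default command, while `run_nsight_container nsys-ui` would launch the GUI directly.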

Run Nsight Systems

To run Nsight Systems with the GUI, please run the following command inside the Docker container.

$ nsys-ui

We can now profile applications running inside the Docker container, applications on the Docker host machine that are visible through the Docker mount, and applications on a remote host such as a remote workstation or an embedded device.
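
Besides the GUI, Nsight Systems also ships a command-line interface, `nsys`, which can collect a report without a display. The following is a minimal sketch of profiling an application inside the container. The function name `profile_app`, the report name and the application path are placeholders, and the traced APIs (`cuda,nvtx,osrt`) are just one reasonable choice.

```shell
# Hypothetical helper: profile a command with the Nsight Systems CLI.
# Usage: profile_app <report-name> <command> [args...]
profile_app() {
    report_name="$1"
    shift
    # Trace CUDA, NVTX and OS runtime calls, and overwrite any old report.
    nsys profile \
        --trace=cuda,nvtx,osrt \
        --force-overwrite=true \
        --output="$report_name" \
        "$@"
}
```

For example, `profile_app my_report ./my_app` would write a report file named after `my_report` into the current directory, which is the mounted `/mnt` directory when the container is started as shown earlier. The report can then be opened from `nsys-ui`.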

Examples

Pageable Memory VS Page-Locked Memory

To overlap data transfer and kernel execution using CUDA streams, we have to use page-locked (pinned) host memory. With pageable host memory, the data transfer and the kernel execution will not overlap.

I prepared two examples that try to use CUDA streams to overlap data transfer and kernel execution. One uses page-locked host memory and the other uses pageable host memory. The two examples are available on GitHub.

Profiling the two implementations with Nsight Systems, we can clearly see that there is no overlap between data transfer and kernel execution in the implementation that does not use page-locked memory. A timeline like this tells us that either we made a mistake or there is an optimization opportunity.

No Data Transfer Overlap with Non-Pinned Host Memory

By switching to page-locked memory, we can see the data transfer and the kernel execution overlap.

Data Transfer Overlap with Pinned Host Memory
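
The timeline view is the most direct way to see the overlap, but a collected report can also be summarized on the command line with `nsys stats`. The sketch below prints a trace of GPU kernels and memory copies with their start times and durations, from which overlapping intervals can be read off. The function name `summarize_gpu_activity` is hypothetical, and the report name `cuda_gpu_trace` is an assumption about the installed version: report names have changed between releases, and `nsys stats --help-reports` lists the ones that are actually available.

```shell
# Hypothetical helper: print GPU kernel and memory-copy activity from a report.
# Assumption: the installed nsys provides a report called `cuda_gpu_trace`.
# Usage: summarize_gpu_activity <report-file>
summarize_gpu_activity() {
    nsys stats --report cuda_gpu_trace "$1"
}
```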

GitHub

All the Dockerfiles and examples are available on GitHub.

Miscellaneous

NVIDIA Nsight Compute is an interactive kernel profiler for CUDA applications. To optimize the implementation of an individual CUDA kernel, we should use Nsight Compute instead of Nsight Systems. Nsight Compute can be installed and used in a Docker container in much the same way as Nsight Systems.

References

Nsight Systems In Docker

https://leimao.github.io/blog/Docker-Nsight-Systems/

Author

Lei Mao

Posted on

06-01-2022

Updated on

12-19-2023

CUDA, Docker
