TensorBoard on Docker
Introduction
TensorBoard was developed by Google to speed up the debugging of TensorFlow programs and to visualize the training process. However, I did not get a chance to use this tool until very recently. This post documents the most basic steps for viewing, from your local computer, a TensorBoard instance that runs inside a Docker container on a remote server.
Protocols
Add TensorBoard Components to Code
TensorBoard does not run on its own. You need to add new components to your TensorFlow code that tell TensorBoard which tensor values to keep track of.
Please check the official TensorBoard Tutorial for how to add such components.
Connect Ports of Docker Container to Server
This is usually done via the -p argument of the docker run command. TensorBoard uses port 6006 by default, so we map port 6006 (0.0.0.0:6006) of the Docker container to port 5001 (0.0.0.0:5001) on the server.
$ nvidia-docker run -it --name leimao-speech-instance -v /home/leimao/workspace:/workspace -p 5000:8888 -p 5001:6006 leimao/speech
To exit the Docker container while keeping it running in the background, press Ctrl + P followed by Ctrl + Q.
Run docker container ls to check whether the ports have been mapped successfully.
$ docker container ls
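Besides reading the PORTS column of docker container ls, the published port can also be probed directly. The following is a minimal sketch using only the Python standard library; the function name is_port_open is hypothetical, and it assumes the script runs on the server itself. A successful connection only suggests that something accepts TCP connections on the port; it does not prove that TensorBoard is already serving inside the container.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake and raises OSError on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: container port 6006 is published on server port 5001, as above.
    print(is_port_open("127.0.0.1", 5001))
```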
SSH to Server
To forward a local port to the server port, run the following in the local terminal:
$ ssh -L 127.0.0.1:16006:0.0.0.0:5001 username@server
We write out the full addresses, including the IP part, because the server terminal sometimes prints warnings if we do not.
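The argument of ssh -L has the form bind_address:port:host:hostport, where the first two fields describe the local side and the last two describe the destination as seen from the server. As an illustration only, the sketch below assembles the same command from its parts; the helper names local_forward_spec and ssh_tunnel_command are hypothetical, and the defaults simply mirror the addresses used in this post.

```python
def local_forward_spec(local_port: int, remote_port: int,
                       bind_address: str = "127.0.0.1",
                       remote_host: str = "0.0.0.0") -> str:
    """Build the argument of `ssh -L` as bind_address:port:host:hostport."""
    return f"{bind_address}:{local_port}:{remote_host}:{remote_port}"


def ssh_tunnel_command(user_at_server: str, local_port: int, remote_port: int) -> list:
    """Return the ssh command as an argument list, e.g. for subprocess.run."""
    return ["ssh", "-L", local_forward_spec(local_port, remote_port), user_at_server]


if __name__ == "__main__":
    # Prints: ssh -L 127.0.0.1:16006:0.0.0.0:5001 username@server
    print(" ".join(ssh_tunnel_command("username@server", 16006, 5001)))
```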
Restart Docker Container
To restart the Docker container, run the following in the server terminal (05ee0d5a5a0e is the container ID in our case):
$ docker start -i 05ee0d5a5a0e
Start TensorBoard Service
To start the TensorBoard service, run the following in the Docker container terminal:
$ tensorboard --logdir ./graphs/rnn/ &
We have to specify the TensorBoard record directory in the --logdir argument. Records from earlier runs may still be in that directory, so if we want to monitor only the new training process, it is better to clean the directory before the new training starts.
The & sign runs TensorBoard in the background.
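Cleaning the record directory before a new run can also be scripted. The sketch below is one possible way to do it with the standard library; reset_log_dir is a hypothetical helper, and it assumes nothing in the directory needs to be kept, since everything under it is deleted.

```python
import shutil
from pathlib import Path


def reset_log_dir(log_dir: str) -> Path:
    """Delete log_dir with all of its contents, then recreate it empty."""
    path = Path(log_dir)
    if path.exists():
        # Removes old record files and any subdirectories from earlier runs.
        shutil.rmtree(path)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling reset_log_dir("./graphs/rnn/") before training starts would leave an empty directory for the new records.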
After TensorBoard starts successfully, we will see a message like this:
TensorBoard 1.8.0 at http://05ee0d5a5a0e:6006 (Press CTRL+C to quit)
To kill the TensorBoard process running in the background if necessary, we first look up the PID (process ID) of TensorBoard using top:
$ top
In our case, the PID of TensorBoard is 56. We can then kill the process in the terminal:
$ kill -9 56
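The same step can be done from Python once the PID is known. This is a minimal sketch for POSIX systems only; force_kill is a hypothetical name, and, like kill -9, it sends SIGKILL, which gives the process no chance to clean up.

```python
import os
import signal


def force_kill(pid: int) -> None:
    """Send SIGKILL to the process with the given PID (same effect as `kill -9 pid`)."""
    # Raises ProcessLookupError if no process with this PID exists.
    os.kill(pid, signal.SIGKILL)
```

With the PID from above, the call would be force_kill(56).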
Run TensorFlow
Run the TensorFlow program in the Docker container terminal:
$ python main.py
If necessary, press Ctrl + P followed by Ctrl + Q to exit the Docker container while keeping it running in the background.
Monitor TensorBoard Locally
Open a web browser, such as Chrome, and go to the URL http://127.0.0.1:16006. We can now see TensorBoard and keep track of the training process on the remote server from our local computer.
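If the page does not load, it can help to check from a script whether anything answers on the forwarded port. The sketch below uses only the standard library; tensorboard_reachable is a hypothetical helper, and it assumes the SSH tunnel from the earlier step is still open and that a healthy TensorBoard answers a plain GET on its root URL with status 200.

```python
import http.client
import urllib.error
import urllib.request


def tensorboard_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on url is answered with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False


if __name__ == "__main__":
    # Assumption: local port 16006 is forwarded to the server as described above.
    print(tensorboard_reachable("http://127.0.0.1:16006"))
```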