TensorBoard has been developed by Google in order to accelerate the debugging process of TensorFlow and visualize the training process. However, I actually have not got a chance to use this tool until very recently. This blog documented the most basic protocols of visualizing TensorBoard running on the Docker on the remote server from your local computer.
Add TensorBoard Components to Code
TensorBoard does not run on its own. You would need to add new components to your TensorFlow code to ask TensorBoard keep tracking of the tensor values.
Please check the official TensorBoard Tutorial about how to add such components.
Connect Ports of Docker Container to Server
This is usually done via the
-p argument of
docker run command. TensorBoard uses port
6006 by default, so we connect the port
0.0.0.0:6006) on Docker container to the port
0.0.0.0:5001) on the sever.
$ nvidia-docker run -it --name leimao-speech-instance -v /home/leimao/workspace:/workspace -p 5000:8888 -p 5001:6006 leimao/speech
To exit the docker container while keep the container running in the backgroud, click
docker container ls to check if we have connected the port successfully.
$ docker container ls CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 05ee0d5a5a0e leimao/speech "/bin/bash" About an hour ago Up 8 seconds 0.0.0.0:5001->6006/tcp, 0.0.0.0:5000->8888/tcp leimao-speech-instance
SSH to Server
To connect the local port to the server port, in our local terminal:
$ ssh -L 127.0.0.1:16006:0.0.0.0:5001 username@server
We use the full port name because sometimes there are warnings from the server terminal if we do not do so.
Restart Docker Container
To restart the docker container, in our server terminal:
$ docker start -i 05ee0d5a5a0e
Start TensorBoard Service
To start the TensorBoard service, in our docker container terminal:
$ tensorboard --logdir ./graphs/rnn/ &
We have to specify the TensorBoard record directories in the
logdir argument. There might be remaining TensorBoard records in the directory. It would be better to clean the directory before getting new records from the new training, if we would like to monitor the new training process.
& sign is used to run TensorBoard in the background.
After starting TensorBoard successfully, we will receive such message:
TensorBoard 1.8.0 at http://05ee0d5a5a0e:6006 (Press CTRL+C to quit)
To kill TensorBoard process in the background if necessary, we first check the PID (Process ID) of Tensorboard using
$ top top - 16:24:57 up 69 days, 23:50, 0 users, load average: 0.01, 0.29, 0.50 Tasks: 3 total, 1 running, 2 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 0.1 sy, 0.0 ni, 99.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem : 63850140 total, 11136512 free, 2106536 used, 50607092 buff/cache KiB Swap: 4194300 total, 4191240 free, 3060 used. 60739944 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 20 0 18232 2016 1528 S 0.0 0.0 0:00.07 bash 56 root 20 0 87.717g 603244 220008 S 0.0 0.9 0:14.92 tensorboard 98 root 20 0 38752 1808 1284 R 0.0 0.0 0:00.01 top
In our case, the PID of Tensorboard is
56. We could kill the process in terminal:
$ kill -09 56
Run the TensorFlow program in Docker container terminal:
$ python main.py
Q to exit the docker container while keep the container running in the background if necessary.
Monitor TensorBoard Locally
Open a web browser, such as Chrome, and go to the url
http://127.0.0.1:16006. We could see the TensorBoard and keep track of the training process on the remote server on our local computer.