Although Python 3 uses UTF-8 as its default source encoding and treats strings as Unicode, I still sometimes get ASCII/UTF-8 errors inside Docker containers. Surprisingly, the same code runs without such errors on the native system. It turns out to be a system locale problem. On a native system, the locale is usually set properly from the GUI during installation. In a Docker container, the system locale is usually not set, so UTF-8 text cannot be read and displayed properly in the terminal.
In this blog post, I will talk about how to set the locale properly in a Docker container so that these UTF-8 problems go away.
Check System Locale
We can check the system locale using the locale command in the terminal.
$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
If these variables, LC_MESSAGES for example, are not set to UTF-8 locales, you are likely to have UTF-8 read and display issues when running programs.
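The same check can be scripted. The following is a minimal sketch, not a definitive tool: it assumes the usual POSIX precedence for the character-encoding category (LC_ALL overrides LC_CTYPE, which overrides LANG) and treats a locale as UTF-8 when its name carries a UTF-8 codeset suffix. The function names effective_ctype_locale and is_utf8_locale are made up for this illustration.

```python
import os

# Assumed precedence for the character-encoding category:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def effective_ctype_locale(env=None):
    """Return the locale name that governs character encoding, or None if unset."""
    env = os.environ if env is None else env
    for name in LOCALE_VARS:
        value = env.get(name)
        if value:  # an empty string is treated the same as unset
            return value
    return None


def is_utf8_locale(locale_name):
    """Return True if the name has a UTF-8 codeset suffix, e.g. en_US.UTF-8."""
    if not locale_name or "." not in locale_name:
        return False
    # Locale names look like language_TERRITORY.codeset@modifier.
    codeset = locale_name.split(".", 1)[1].split("@", 1)[0]
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    name = effective_ctype_locale()
    print(f"effective locale: {name!r}, UTF-8: {is_utf8_locale(name)}")
```

In a container where none of the three variables is set, effective_ctype_locale() returns None and is_utf8_locale(None) is False, which matches the situation described above.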
We can also check from Python which encoding the locale gives to standard output, using the following command.
$ python -c "import sys; print(sys.stdout.encoding)"
UTF-8
If the output is not UTF-8, you are likely to have UTF-8 read and display issues when running programs.
It should be noted that the following command, although somewhat similar to the one we used above, does not reflect the system locale. In Python 3 it always prints utf-8, whatever the locale is.
$ python -c "import sys; print(sys.getdefaultencoding())"
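To see the difference side by side, here is a small sketch that collects the encodings Python reports. The helper name encoding_report is made up for this example. It uses only the standard sys and locale modules, and the values it returns depend on the Python version and on the environment it runs in.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python reports, keyed by where they come from."""
    return {
        # Default str <-> bytes encoding; always utf-8 in Python 3.
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # Encoding of standard output; this is the one the locale influences.
        # getattr guards against a replaced stdout that has no encoding attribute.
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # Encoding the current locale prefers for text data.
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for source, encoding in encoding_report().items():
        print(f"{source}: {encoding}")
```

If the first line says utf-8 while the other two do not, the locale is the likely cause of the read and display issues.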
Set Locale Properly for Docker Container
It is actually simple to set the locale for a Docker container. When building the Docker image, add either one of the following snippets to the Dockerfile and you are all set.
# Option 1: install the locales package and generate only en_US.UTF-8.
RUN apt-get update
RUN apt-get install -y locales
RUN sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
    locale-gen
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US:en

# Option 2: also install locales-all, which ships the locales precompiled.
RUN apt-get update
RUN apt-get install -y locales locales-all
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
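After rebuilding the image, it is worth confirming that non-ASCII text can actually be written to the terminal. The sketch below is one possible sanity check, not part of the Dockerfile snippets above: the helper can_encode and the sample string are made up for this illustration, and the script only tests whether the encoding of standard output can represent the sample.

```python
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # UnicodeEncodeError: the codec cannot represent some character.
        # LookupError: the codec name is unknown.
        return False
    return True


if __name__ == "__main__":
    # Arbitrary non-ASCII sample, written with escapes so the file stays ASCII.
    sample = "h\u00e9llo, \u4e16\u754c"
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    ok = can_encode(sample, encoding)
    print(f"stdout encoding: {encoding}; non-ASCII output supported: {ok}")
```

Saved under a name of your choice and run inside the container, the script should report True once the locale is set to en_US.UTF-8.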