Lei Mao

Introduction

Although Python 3 uses Unicode strings and treats source files as UTF-8 by default, I still sometimes got ASCII/UTF-8 errors inside Docker containers. Surprisingly, the same code ran without any issue on the native system. It turned out to be a system locale problem. On a native system, the locale is usually set properly from the GUI during installation. In a Docker container, the system locale is usually not set, so UTF-8 text cannot be read or displayed properly in the terminal.
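To make the failure mode concrete, here is a minimal sketch of writing text through an ASCII stream, which is roughly what Python's standard output can fall back to when no UTF-8 locale is configured. The helper name `write_with_encoding` and the sample string are illustrative assumptions, not taken from any real program.

```python
import io


def write_with_encoding(text: str, encoding: str) -> bytes:
    """Write text through a text stream that uses the given encoding."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    # Raises UnicodeEncodeError if the encoding cannot represent the text.
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


sample = "café"

# A UTF-8 stream accepts the non-ASCII character.
print(write_with_encoding(sample, "utf-8"))

# An ASCII stream rejects it, which mirrors the errors seen in the container.
try:
    write_with_encoding(sample, "ascii")
except UnicodeEncodeError as error:
    print(f"ASCII stream failed: {error}")
```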


In this blog post, I will show how to set the locale properly in a Docker container so that these UTF-8 problems go away.

Check System Locale

We can check the system locale using the locale command.

$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

If LANG, LANGUAGE, and LC_MESSAGES are not set to UTF-8 locales, programs are likely to have trouble reading and displaying UTF-8 text.
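The same check can be scripted. The sketch below assumes the usual POSIX precedence for the character-encoding category, where LC_ALL overrides LC_CTYPE, which in turn overrides LANG. The function names `effective_ctype_locale` and `is_utf8_locale` are hypothetical helpers introduced only for illustration.

```python
import os


def effective_ctype_locale(env=None) -> str:
    """Return the locale name that decides the character encoding.

    Assumed precedence: LC_ALL, then LC_CTYPE, then LANG.
    Empty values count as unset; with nothing set, the "C" locale applies.
    """
    if env is None:
        env = os.environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


def is_utf8_locale(locale_name: str) -> bool:
    """Check whether a name such as en_US.UTF-8 declares a UTF-8 codeset."""
    # The codeset follows the first dot; an optional @modifier may trail it.
    _, _, codeset = locale_name.partition(".")
    codeset = codeset.split("@", 1)[0]
    return codeset.lower().replace("-", "") == "utf8"


current = effective_ctype_locale()
print(f"effective locale: {current}, UTF-8: {is_utf8_locale(current)}")
```

In a container without any locale configuration, this is expected to report the "C" locale, which is the situation the rest of this post fixes.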


In Python, we can also check which encoding the interpreter picked up from the system locale using the following command.

$ python -c "import sys; print(sys.stdout.encoding)"
UTF-8

If the output is not UTF-8, Python programs are likely to have trouble reading and displaying UTF-8 text.


Note that the following command, although it looks similar to the one above, does not reflect the system locale. In Python 3, sys.getdefaultencoding() always returns utf-8, regardless of how the locale is set.

$ python -c "import sys; print(sys.getdefaultencoding())"
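To see the different encodings side by side, a small report like the following can help. This is only a sketch: the function name `encoding_report` and the dictionary keys are assumptions made for illustration, while the underlying calls come from the standard `sys` and `locale` modules.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings Python uses in different places."""
    return {
        # Encoding used by print(); depends on the locale and the terminal.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding the locale suggests for text data.
        "locale preferred": locale.getpreferredencoding(False),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Default for str.encode() and bytes.decode(); not tied to the locale.
        "default": sys.getdefaultencoding(),
    }


for name, value in encoding_report().items():
    print(f"{name}: {value}")
```

Running this once on the native system and once inside the container should make it clear which values change with the locale and which do not.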

Set Locale Properly for Docker Container

Setting the locale for a Docker container is actually simple. When building a Debian- or Ubuntu-based Docker image, just add either of the following snippets to the Dockerfile and you are all set.

Method 1

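# Install the locales package; update and install share one layer so the package index is never stale.
RUN apt-get update && apt-get install -y locales
# Uncomment en_US.UTF-8 in /etc/locale.gen, then generate it.
RUN sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
    locale-gen
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US:en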

Method 2

# locales-all ships the locales pre-generated, so no locale-gen step is needed.
RUN apt-get update && apt-get install -y locales locales-all
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
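After building the image with either method, it can be worth confirming inside the container that the locale was really generated and not just named in the environment. The sketch below tries to activate a locale through the C library and restores the previous setting afterwards. The helper name `locale_is_available` is hypothetical, and the check assumes a glibc-based image such as Debian or Ubuntu; other C libraries may accept locale names they do not actually provide.

```python
import locale


def locale_is_available(name: str) -> bool:
    """Return True if the C library can activate the named locale."""
    # Query the current setting without changing it, so it can be restored.
    previous = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_ALL, previous)


if __name__ == "__main__":
    print("en_US.UTF-8 available:", locale_is_available("en_US.UTF-8"))
```

If this reports False, the ENV lines point at a locale that does not exist in the image, and the locale generation step is the first thing to check.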

References