
Lei Mao

Machine Learning, Artificial Intelligence. On the Move.



Python string formatting is widely used to insert variables into strings and to format them the way the user prefers. In practice, however, printed strings often still look untidy, for example because of poor text alignment or insufficient spacing.

In this blog post, I describe the general rules of Python string formatting and how to use them to print neatly aligned strings to the console in machine learning and data science projects.

Basic Python String Format Syntax


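Although the full Python format specification is more complicated, the following subset should be sufficient for most projects involving scientific computing.

{id:padding_char alignment sign width comma num_decimals data_type}

The spaces above only separate the tokens for readability; in a real format string the tokens are written back to back, as in `{pi:@^+25,.8f}`.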


Token         Optional  Explanation
id            Yes       The id (positional index or keyword name) of the placeholder.
                        If omitted, positional arguments are used in order.
padding_char  Yes       The character used to fill the padding around the value.
                        It can only be given together with an alignment.
                        If no character is given, a space is used.
alignment     Yes       `^` is align center; `<` is align left; `>` is align right.
                        If omitted, strings are left-aligned and numbers right-aligned.
sign          Yes       If `+` is used, + or - is shown for positive and negative values, respectively.
                        By default, only negative values get a sign.
width         Yes       The minimum width of the whole field. If the width is larger than the length
                        of the formatted value, the rest is filled with `padding_char`.
                        Longer values are not truncated.
comma         Yes       If `,` is used, a comma is inserted as the thousands separator.
num_decimals  Yes       The number of digits after the decimal point for floating-point numbers.
                        It has the form `.n`, where n is an integer.
data_type     Yes       `s` is string, `f` is fixed-point floating number, `d` is integer number,
                        `%` multiplies the number by 100 and appends a percent sign.
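
To see what each token does on its own, the following sketch formats a few arbitrary values (chosen for illustration, not taken from the original post) with one token at a time. The `|` delimiters in the printout make the padding spaces visible.

```python
# Arbitrary example values, exercising one token of
# {id:padding_char alignment sign width comma num_decimals data_type} at a time
examples = {
    # width alone: numbers are right-aligned by default
    "width (number)": "{:10}".format(42),
    # width alone: strings are left-aligned by default
    "width (string)": "{:10}".format("ab"),
    # alignment: center "ab" in a 10-character field
    "alignment": "{:^10}".format("ab"),
    # padding_char: fill with "*" instead of spaces (requires an alignment)
    "padding_char": "{:*>10}".format("ab"),
    # sign: "+" shows the sign of positive numbers too
    "sign": "{:+d} {:+d}".format(7, -7),
    # comma: thousands separator
    "comma": "{:,d}".format(1234567),
    # num_decimals with data_type f: two digits after the decimal point
    "num_decimals": "{:.2f}".format(1234.5),
    # data_type %: multiply by 100 and append "%"
    "percent": "{:.1%}".format(0.256),
    # id: keyword and positional placeholders
    "id": "{x}/{0}".format("pos", x="kw"),
}

for description, result in examples.items():
    # The | delimiters make padding spaces visible
    print(f"{description:<15}|{result}|")
```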


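If we run the following code in Python,

example_line = "|{pi:@^+25,.8f}|".format(pi=314159.26)
print(example_line)

the message printed to the console is

|@@@@+314,159.26000000@@@@|

The number is formatted with 8 decimals, a thousands separator, and a leading `+`, which gives 17 characters. These are centered in a 25-character field, so 4 `@` characters are added on each side.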
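
The format specification after the colon is not tied to `str.format`. As a sketch, assuming Python 3.6 or later for f-strings, the same specification gives the same result in an f-string and in the built-in `format()` function, and its parts can also be supplied at runtime through nested replacement fields.

```python
pi = 314159.26

# The format specification after the colon is identical in all three styles
spec = "@^+25,.8f"
via_method = "|{pi:@^+25,.8f}|".format(pi=pi)  # str.format, as in the example
via_fstring = f"|{pi:@^+25,.8f}|"              # f-string (Python 3.6+)
via_builtin = "|" + format(pi, spec) + "|"     # built-in format() with the bare spec

print(via_method)   # |@@@@+314,159.26000000@@@@|
print(via_fstring == via_method, via_builtin == via_method)

# Parts of the spec can be filled in at runtime with nested replacement fields
fill, align, width = "@", "^", 25
via_nested = f"|{pi:{fill}{align}+{width},.8f}|"
print(via_nested == via_method)
```

Nested fields are the same mechanism that the `{:^{}s}` placeholders in the training-statistics code use to receive the column width.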


Python String Format for Machine Learning and Data Science

For illustration, we use the following Python generator to produce fake machine learning training statistics.

# Generate fake training statistics
def gen_func(n):
    loss_max = 10000.0
    accuracy_max = 1.0
    for i in range(n):
        # epoch, training loss, training accuracy
        yield i, (1-(i+1)/n)*loss_max, (i+1)/n*accuracy_max 
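
To check what the generator yields, the sketch below repeats it (so that it runs on its own) and collects a short run with `n=4`, where every value happens to be exactly representable as a float.

```python
# Same fake-statistics generator as above, repeated so this snippet runs on its own
def gen_func(n):
    loss_max = 10000.0
    accuracy_max = 1.0
    for i in range(n):
        # epoch, training loss, training accuracy
        yield i, (1 - (i + 1) / n) * loss_max, (i + 1) / n * accuracy_max


stats = list(gen_func(n=4))
print(stats)
# [(0, 7500.0, 0.25), (1, 5000.0, 0.5), (2, 2500.0, 0.75), (3, 0.0, 1.0)]
```

A generator is exhausted after one pass, so a fresh one (such as `train_op` below) has to be created for each loop over the statistics.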

The following Python code prints the aligned training statistics to the console. The header adapts automatically to the variables header_items and width, while the row format assumes the three fields (epoch, loss, accuracy) yielded by gen_func.

train_op = gen_func(n=10)
header_items = ["Epoch", "Loss", "Accuracy"]
width = 60

# Split the total width evenly among the columns
column_width = width // len(header_items)
column_width_items = [column_width] * len(header_items)
# Interleave items and widths: [item_0, width_0, item_1, width_1, ...]
header_format_content = [None] * (len(header_items) + len(column_width_items))
header_format_content[::2] = header_items
header_format_content[1::2] = column_width_items
# One "{:^{}s}" field per column; the nested {} receives the column width.
# The asterisk unpacks the list into positional arguments.
header = ("{:^{}s}" * len(header_items)).format(*header_format_content)
print(header)
for (epoch, loss, accuracy) in train_op:
    line = "{:^{}d}{:^{}.4f}{:^{}.2%}".format(epoch, column_width, loss, column_width, accuracy, column_width)
    print(line)
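
The header above adapts to any number of header items, but the row format string still hard-codes three columns. One possible generalization, sketched with a hypothetical helper `make_row_format` (not part of the original code), builds one centered field per column from a list of per-column format specs. The specs `"d"`, `".4f"`, and `".2%"` are assumed to match the fields yielded by `gen_func`.

```python
def make_row_format(specs, column_width):
    # Hypothetical helper (not from the original post): returns a format string with
    # one centered field of column_width characters per spec, e.g. "{:^20d}{:^20.4f}".
    # In the f-string, "{{" and "}}" produce literal braces.
    return "".join(f"{{:^{column_width}{spec}}}" for spec in specs)


header_items = ["Epoch", "Loss", "Accuracy"]
# Assumed per-column format specs matching the (epoch, loss, accuracy) fields
column_specs = ["d", ".4f", ".2%"]
width = 60
column_width = width // len(header_items)

header_format = make_row_format(["s"] * len(header_items), column_width)
row_format = make_row_format(column_specs, column_width)

header = header_format.format(*header_items)
# First row of the fake statistics: epoch 0, loss 9000.0, accuracy 0.1
row = row_format.format(0, 9000.0, 0.1)
print(header)
print(row)
```

Because the column width is baked into the format string once, each row then only needs its values, e.g. `row_format.format(epoch, loss, accuracy)` inside the loop.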

The aligned training statistics printed to the console are

       Epoch                Loss              Accuracy      
         0               9000.0000             10.00%       
         1               8000.0000             20.00%       
         2               7000.0000             30.00%       
         3               6000.0000             40.00%       
         4               5000.0000             50.00%       
         5               4000.0000             60.00%       
         6               3000.0000             70.00%       
         7               2000.0000             80.00%       
         8               1000.0000             90.00%       
         9                 0.0000             100.00%
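
Centering keeps each column balanced, but it lets the decimal points drift: in the output above, the decimal point of `0.0000` sits one column to the left of those in the rows above it. A variation worth considering (not from the original post) is to right-align the columns, so that values printed with the same number of decimals line up on the decimal point. The sketch below uses f-strings with nested width fields and adds a dashed separator under the header.

```python
# Same fake-statistics generator as in the post, repeated so this snippet runs on its own
def gen_func(n):
    loss_max = 10000.0
    accuracy_max = 1.0
    for i in range(n):
        # epoch, training loss, training accuracy
        yield i, (1 - (i + 1) / n) * loss_max, (i + 1) / n * accuracy_max


header_items = ["Epoch", "Loss", "Accuracy"]
width = 60
column_width = width // len(header_items)

# ">" right-aligns every field; the nested {column_width} supplies the width at runtime
lines = ["".join(f"{item:>{column_width}s}" for item in header_items), "-" * width]
for epoch, loss, accuracy in gen_func(n=10):
    lines.append(
        f"{epoch:>{column_width}d}"
        f"{loss:>{column_width}.4f}"
        f"{accuracy:>{column_width}.2%}"
    )

print("\n".join(lines))
```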