# Binary VS Text Mode for File I/O Operations

## Introduction

When a program reads or writes files, there are usually two modes to choose from: text mode, which is usually the default, and binary mode. Roughly speaking, in text mode the program writes data to the file as text characters, and in binary mode it writes the raw bytes of the data as they are held in memory. While the distinction sounds trivial, people sometimes get confused. Since the computer only reads and writes binary data, where does text mode come from?

In this blog post, I am going to talk about the conceptual difference between text mode and binary mode, and discuss some caveats of using them.

## Example

We have a signed int -10000, an unsigned short 100, and a C string WE. Their binary representations on a 64-bit computer are as follows, alongside the representations of the two numbers when they are converted to strings.

signed int -10000:

std::string -10000:

unsigned short 100:

std::string 100:

C string WE:

Note that a C string always ends with a null byte (0).
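The representations listed above can be reproduced with a small sketch like the one below. It is not the code used for this post. It assumes a 4-byte signed int and a 2-byte unsigned short, modelled here with std::int32_t and std::uint16_t, and the helper names are made up for illustration. The bits are shown most significant bit first, which is the value's representation and not necessarily the byte order in memory.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bits of a 32-bit signed integer value, most significant bit first.
std::string bits_of_int32(std::int32_t value)
{
    return std::bitset<32>(static_cast<std::uint32_t>(value)).to_string();
}

// Bits of a 16-bit unsigned integer value, most significant bit first.
std::string bits_of_uint16(std::uint16_t value)
{
    return std::bitset<16>(value).to_string();
}

// Bits of each byte in a buffer, separated by spaces.
std::string bits_of_bytes(const char* data, std::size_t size)
{
    std::string result;
    for (std::size_t i = 0; i < size; ++i)
    {
        if (i > 0) result += ' ';
        result += std::bitset<8>(static_cast<unsigned char>(data[i])).to_string();
    }
    return result;
}

// Checks for the values used in this post.
void check_example_values()
{
    assert(bits_of_int32(-10000) == "11111111111111111101100011110000");
    assert(bits_of_uint16(100) == "0000000001100100");

    // sizeof counts the terminating null byte of the C string.
    const char we[] = "WE";
    assert(bits_of_bytes(we, sizeof(we)) == "01010111 01000101 00000000");

    // As text, a number takes one ASCII byte per character.
    const std::string text = "-10000";
    assert(bits_of_bytes(text.data(), text.size()) ==
           "00101101 00110001 00110000 00110000 00110000 00110000");
}
```

Calling check_example_values() passes silently when every pattern matches. Note that the string form of -10000 needs 6 bytes, while the integer form needs 4.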

When we save the three values to a file, the byte sequence of the file is simply the concatenation of the byte sequences of the values.

## Binary Mode

Binary mode is easy to understand. Every piece of data on the computer is represented as a binary sequence, whether in memory or on the hard drive.

### Writing File

To save the data in binary mode, we simply take the exact binary sequence representing the data, and save it to the file. Nothing fancy.

Because the saved file carries no information about the structure of its content, whoever reads data saved in binary mode has to implement the decoding method themselves.
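A minimal sketch of such a write, assuming std::ofstream opened with std::ios::binary and fixed-width types standing in for the signed int and unsigned short of the example. The function names and the file name are hypothetical, and this is not the binaryIO.cpp discussed in this post.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Write the three example values back to back by copying their in-memory bytes.
// Returns true if every write succeeded.
bool write_binary(const std::string& path)
{
    const std::int32_t a = -10000;
    const std::uint16_t b = 100;
    const char c[] = "WE";  // 3 bytes: 'W', 'E', '\0'

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(&a), sizeof(a));
    out.write(reinterpret_cast<const char*>(&b), sizeof(b));
    out.write(c, sizeof(c));
    return static_cast<bool>(out);
}

// Size of a file in bytes, or -1 if it cannot be opened.
long long file_size(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long long>(in.tellg());
}

// Example run: 4 + 2 + 3 = 9 bytes should land in the file.
void binary_write_example()
{
    const bool ok = write_binary("binary_sketch.bin");
    assert(ok && file_size("binary_sketch.bin") == 9);
    (void)ok;  // avoids an unused-variable warning when NDEBUG is defined
}
```

Nothing in the file records where one value ends and the next begins, which is why the reader has to know the layout in advance.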

### Expected Output of the Example Using Binary Mode

When the computer sees the binary sequence in the binary file, it has no way of knowing how to decode it back into the original values. It is the user’s responsibility to tell the computer that the first 4 bytes represent a signed int, the next 2 bytes represent an unsigned short, and the next 3 bytes represent a C string, so that the computer knows how to decode them.
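The decoding step described above might look like the following sketch. The Values struct and both functions are hypothetical names introduced for illustration, and the layout (4 bytes, then 2 bytes, then 3 bytes) is hard-coded in the reader exactly because the file does not describe itself. The sketch also assumes the file is read on a machine with the same byte order as the one that wrote it.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

struct Values
{
    std::int32_t a;
    std::uint16_t b;
    char c[3];  // "WE" plus the terminating null byte
};

// Writer used only to produce a file for the reader below.
bool write_values(const std::string& path, const Values& v)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(&v.a), sizeof(v.a));
    out.write(reinterpret_cast<const char*>(&v.b), sizeof(v.b));
    out.write(v.c, sizeof(v.c));
    return static_cast<bool>(out);
}

// The reader must know the layout: 4 bytes, then 2 bytes, then 3 bytes.
bool read_values(const std::string& path, Values& v)
{
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(&v.a), sizeof(v.a));
    in.read(reinterpret_cast<char*>(&v.b), sizeof(v.b));
    in.read(v.c, sizeof(v.c));
    return static_cast<bool>(in);
}

// Example run: the values survive a write followed by a read.
void round_trip_example()
{
    const Values written{-10000, 100, {'W', 'E', '\0'}};
    Values loaded{};
    const bool ok = write_values("values_sketch.bin", written) &&
                    read_values("values_sketch.bin", loaded);
    assert(ok && loaded.a == -10000 && loaded.b == 100);
    assert(std::string(loaded.c) == "WE");
    (void)ok;  // avoids an unused-variable warning when NDEBUG is defined
}
```

If the reader guessed a different layout, for example two bytes followed by four, the same nine bytes would decode to unrelated values without any error being reported.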

### Code Example

We implemented a code example binaryIO.cpp for saving data in binary mode.

We compiled the above code using the following command.

We ran the executable and got the following outputs.

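Be aware that the bytes of each numeric value are saved in reverse order, with the least significant byte first. This is not an artifact of C/C++. It reflects the little-endian byte order of the CPU (x86-64, for example), and the program copies the bytes to the file exactly as they sit in memory. Except for this, everything saved to the file matches our expectation.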
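The byte order can be checked directly by looking at how a value is laid out in memory. The sketch below uses hypothetical helper names and assumes the machine is either little-endian or big-endian.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the in-memory bytes of a 32-bit integer, lowest address first.
std::array<unsigned char, 4> memory_bytes(std::int32_t value)
{
    std::array<unsigned char, 4> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(value));
    return bytes;
}

// True when the least significant byte is stored at the lowest address.
bool is_little_endian()
{
    return memory_bytes(1)[0] == 1;
}

// Example: -10000 has the value bits 0xFFFFD8F0.
void endianness_example()
{
    const auto bytes = memory_bytes(-10000);
    if (is_little_endian())
    {
        // Least significant byte first: F0 D8 FF FF.
        assert(bytes[0] == 0xF0 && bytes[1] == 0xD8);
        assert(bytes[2] == 0xFF && bytes[3] == 0xFF);
    }
    else
    {
        // Assuming big-endian otherwise: FF FF D8 F0.
        assert(bytes[0] == 0xFF && bytes[3] == 0xF0);
    }
}
```

A binary file written on a little-endian machine and read on a big-endian one would therefore decode to different numbers unless the format fixes a byte order.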

The size of the saved file is exactly 9 bytes.

## Text Mode

Text mode does nothing special: it converts the data to a string and uses the binary representation of that string to represent the data.

### Writing File

Encoding and decoding methods for characters, such as ASCII and UTF-8, have already been implemented, so the user does not have to implement them and only has to tell the program which one to use. Data that can be converted to strings, such as numbers, can also be saved in text mode. Data that cannot be converted to strings cannot be saved in text mode.

When there is more than one value to be saved into the file, it is the user’s responsibility to parse the text. Usually, the user would use some special delimiters such as \n to separate different values.

Similarly, because the basic unit of text is the character, reading a file in text mode is straightforward. With a single-byte encoding such as ASCII, the program reads the file byte by byte and decodes each byte to a character using the encoding the user specified. With a variable-length encoding such as UTF-8, a single character may span several bytes.
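As a sketch of the delimiter approach, the functions below write each value on its own line and parse the lines back. The function names and file name are hypothetical, the string value is assumed to contain no newline, and this is not the textIO.cpp discussed in this post.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Write each value as text, one per line, with '\n' as the delimiter.
bool write_text(const std::string& path, int a, unsigned short b, const std::string& c)
{
    std::ofstream out(path, std::ios::trunc);
    out << a << '\n' << b << '\n' << c << '\n';
    return static_cast<bool>(out);
}

// Read the lines back and convert the numeric ones from text.
bool read_text(const std::string& path, int& a, unsigned short& b, std::string& c)
{
    std::ifstream in(path);
    std::string line_a;
    std::string line_b;
    if (!std::getline(in, line_a) || !std::getline(in, line_b) || !std::getline(in, c))
    {
        return false;
    }
    a = std::stoi(line_a);
    b = static_cast<unsigned short>(std::stoul(line_b));
    return true;
}

// Example run: the values survive a write followed by a parse.
void text_round_trip_example()
{
    int a = 0;
    unsigned short b = 0;
    std::string c;
    const bool ok = write_text("text_sketch.txt", -10000, 100, "WE") &&
                    read_text("text_sketch.txt", a, b, c);
    assert(ok && a == -10000 && b == 100 && c == "WE");
    (void)ok;  // avoids an unused-variable warning when NDEBUG is defined
}
```

The reader still has to know that the first two lines are numbers. The delimiter only marks where each value ends. One platform detail worth keeping in mind: on Windows, a stream opened without std::ios::binary writes '\n' as a carriage return followed by a line feed, so the file can be larger there than on Linux.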

### Expected Output of the Example Using Text Mode

When the computer sees the binary sequence in the text file, and assuming a single-byte encoding such as ASCII where each byte is a character, it simply decodes the bytes to characters one by one.
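For ASCII, that decoding is nothing more than reinterpreting each byte as a character. A minimal sketch, with a hypothetical function name and assuming every byte is a 7-bit ASCII code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Decode a sequence of bytes as ASCII text, one character per byte.
std::string decode_ascii(const std::vector<unsigned char>& bytes)
{
    std::string text;
    for (unsigned char byte : bytes)
    {
        // ASCII only defines the codes 0 to 127.
        assert(byte < 0x80 && "not a 7-bit ASCII byte");
        text += static_cast<char>(byte);
    }
    return text;
}
```

For instance, the bytes 0x2D 0x31 0x30 0x30 0x30 0x30 0x0A decode to the characters of -10000 followed by the \n delimiter, and std::stoi then turns that text back into the integer -10000.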

### Code Example

We implemented a code example textIO.cpp for saving data in text mode.

We compiled the above code using the following command.

We ran the executable and got the following outputs.

Note that 00001010 is the delimiter \n. Except for the three delimiters we inserted, everything else matches our expectations.

The size of the saved file is exactly 15 bytes. Even if we do not count the three delimiters inserted, the size would be 12 bytes.

## Conclusions

Writing data in binary mode usually takes less disk space or memory than writing the same data in text mode, especially for numeric data. That’s why large-scale data storage and low-latency file transmission often use binary formats.
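The size difference for a single integer can be estimated with a sketch like this one. The helper names are hypothetical, a 4-byte integer is assumed, and delimiters are not counted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Bytes needed to store a 32-bit integer as decimal text, without a delimiter.
std::size_t text_size(std::int32_t value)
{
    return std::to_string(value).size();
}

// Bytes needed to store the same integer as raw binary: always 4.
std::size_t binary_size(std::int32_t)
{
    return sizeof(std::int32_t);
}

// Example: text grows with the number of digits, binary stays fixed.
void size_comparison_example()
{
    assert(text_size(-10000) == 6 && binary_size(-10000) == 4);
    // The most negative value, -2147483648, needs 11 characters.
    assert(text_size(std::numeric_limits<std::int32_t>::min()) == 11);
    // For small values the text form can be shorter than the binary one.
    assert(text_size(7) == 1 && binary_size(7) == 4);
}
```

The binary size is fixed by the type, while the text size depends on the value, so the saving from binary grows with the magnitude of the numbers being stored.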

The shortcoming of binary mode is that you must know the structure of the data and the exact method for decoding it. Implementing a decoding method for each specific data structure is time-consuming. However, with the rise of libraries that handle binary encoding and decoding for different data structures, such as Google’s Protocol Buffers, reading and writing binary files has become much easier for most common data structures.

## References

Binary VS Text Mode for File I/O Operations

https://leimao.github.io/blog/File-IO-Binary-VS-Text/

Lei Mao

12-22-2019