C++ Data Alignment
Introduction
Data alignment plays a key role in how modern computer hardware accesses memory. The CPU reads and writes memory most efficiently when the data is naturally aligned, which generally means that the data’s memory address is a multiple of the data size. For instance, on a 32-bit architecture, a 4-byte datum is aligned if it is stored in four consecutive bytes and the first byte lies on a 4-byte boundary.
Beyond performance, many programming languages assume that data is aligned. Although languages take care of data alignment for us as much as possible, some low-level languages, such as C and C++, allow us to write code that performs misaligned data access, and the behavior of such access is undefined.
In this blog post, I would like to quickly talk about data alignment, including aligned memory addresses and aligned memory accesses, and how to ensure data alignment as much as possible in C++.
Data Alignment
A memory address $a$ is said to be $n$-byte aligned when $a$ is a multiple of $n$ (where $n$ is a power of 2). Suppose we have a piece of $m$-byte data and an $n$-byte aligned address. If $m$ is not a multiple of $n$, the $m$-byte data will be padded to $\lfloor \frac{m + n - 1}{n} \rfloor \times n$ bytes. Accessing $kn + 1$, $kn + 2$, $\cdots$, $(k+1)n$ bytes of data all have the same latency, because the CPU reads data from memory $n$ bytes at a time and the data will usually be cached in the CPU. That is to say, if data stored at an $n$-byte aligned address has a storage size $m$ that is not a multiple of $n$, some of the memory access bandwidth is wasted.
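The alignment test and the padding formula above can be sketched in a few lines. The helper names `is_aligned` and `padded_size` are assumptions made for illustration and do not come from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when address a is a multiple of n. n is assumed to be a power of 2,
// which allows the cheap bit-mask test instead of a division.
inline bool is_aligned(std::uintptr_t a, std::size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return (a & (n - 1)) == 0;
}

// Size of m bytes after padding up to the next multiple of n,
// i.e. floor((m + n - 1) / n) * n with integer division.
constexpr std::size_t padded_size(std::size_t m, std::size_t n)
{
    return (m + n - 1) / n * n;
}

static_assert(padded_size(6, 4) == 8, "6 bytes occupy two 4-byte units");
static_assert(padded_size(8, 4) == 8, "no padding when m is a multiple of n");
```

With $m = 6$ and $n = 4$, two of the eight bytes fetched carry no payload, which is the wasted bandwidth described above.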
A memory access is said to be aligned when the data being accessed is $n$ bytes long and the datum address is $n$-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition one-byte memory accesses are always aligned. In principle, the hardware can access $n$-byte data at an address that is not a multiple of $n$, at the cost of wasting even more memory access bandwidth. However, because the C and C++ standards assume aligned memory access, accessing an object through a misaligned address can result in undefined behavior.
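When bytes have to be read from an address that may not satisfy the alignment of the target type, one common way to stay within defined behavior is to copy them into a properly aligned object with `std::memcpy` rather than dereferencing a cast pointer. The following is a minimal sketch of that idea; the function names `load_u32` and `round_trip_at_offset_1` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value starting at an arbitrary byte address.
// std::memcpy places no alignment requirement on its arguments, so this is
// well defined even when p is not a multiple of alignof(std::uint32_t).
inline std::uint32_t load_u32(const unsigned char* p)
{
    assert(p != nullptr);
    std::uint32_t value;
    std::memcpy(&value, p, sizeof(value));
    return value;
}

// Stores x at a deliberately odd offset inside an aligned buffer and
// reads it back through load_u32.
inline bool round_trip_at_offset_1(std::uint32_t x)
{
    alignas(std::uint32_t) unsigned char buf[8] = {};
    std::memcpy(buf + 1, &x, sizeof(x)); // buf + 1 is misaligned for uint32_t
    return load_u32(buf + 1) == x;
}
```

By contrast, `*reinterpret_cast<const std::uint32_t*>(buf + 1)` would be a misaligned access of the kind described above.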
Data Alignment Requirement
alignof can be used to check the alignment requirement of a data type.
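A minimal sketch of querying alignment requirements with `alignof` follows. The struct names `Pair` and `Wide` and the helper `is_power_of_two` are assumptions made for illustration. Only relations that hold on mainstream compilers are asserted, since the exact values for types such as int are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>

struct Pair               // hypothetical aggregate
{
    char c;
    int i;
};

struct alignas(32) Wide   // hypothetical over-aligned type
{
    float data[4];
};

static_assert(alignof(char) == 1, "char has the weakest alignment");
static_assert(alignof(Pair) >= alignof(int), "a struct is at least as strict as its members");
static_assert(alignof(Wide) == 32, "alignas raises the requirement");
static_assert(sizeof(Wide) % alignof(Wide) == 0, "size is a multiple of alignment");

// Alignment values are always powers of 2; this helper checks that at run time.
inline bool is_power_of_two(std::size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```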
Memory Allocation
According to the GNU documentation, the address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems). The default memory address alignment of an array is determined by the alignment requirement of its element type.
It is possible to use custom data alignment for both static memory and dynamic memory. alignas(T) can be used to specify the byte alignment of a static array, and aligned_alloc can be used to specify the byte alignment of a buffer in dynamic memory.
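As a hedged sketch of both mechanisms: the names `static_buf`, `is_aligned_ptr`, and `alloc_cache_lines` are assumptions made for illustration, and 64 bytes is chosen only because it is a common cache line size. `aligned_alloc` comes from C11; glibc exposes it through `<stdlib.h>`, and standard C++ offers it as `std::aligned_alloc` from C++17 onwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // aligned_alloc, free

// Static storage aligned to 64 bytes.
alignas(64) static unsigned char static_buf[256];

// True when pointer p is a multiple of n.
inline bool is_aligned_ptr(const void* p, std::size_t n)
{
    return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}

// Allocates `count` blocks of 64 bytes starting on a 64-byte boundary.
// The requested size is kept a multiple of the alignment, as C11 requires.
// The caller releases the buffer with free().
inline void* alloc_cache_lines(std::size_t count)
{
    return aligned_alloc(64, 64 * count);
}
```

A caller would check the returned pointer against nullptr, use the buffer, and then pass it to free.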
$ g++ alloc.cpp -o alloc -std=c++11
Undefined Behaviors
Writing data to a static array or a dynamic buffer can lead to undefined behavior if the data alignment is incorrect. For example, if we create an object of type T on unsigned char buf[sizeof(T) / sizeof(char)], reading and writing it might be undefined behavior, especially when reinterpret_cast and pointer arithmetic produce misaligned addresses. The same is true for creating an object of type T on a dynamic buffer allocated using malloc. However, because the address returned by malloc on GNU systems is $8$-byte aligned on 32-bit architectures and $16$-byte aligned on 64-bit architectures, the alignment requirements of almost all data types, especially the primitive types, are satisfied. Undefined behavior is therefore unlikely in this case.
For example, the following data structure Bar has sizeof(Bar) == 6 and alignof(Bar) == 2.
struct Bar
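One layout that is consistent with the stated size of 6, the alignment of 2, and a single padding byte is sketched below. The member names and types are assumptions made for illustration, and the asserted values assume an ABI where sizeof(short) == 2 and alignof(short) == 2.

```cpp
#include <cassert>
#include <cstddef>

struct Bar            // hypothetical member layout
{
    char arr[3];      // offsets 0..2, alignment 1
                      // one padding byte at offset 3
    short s;          // offset 4, alignment 2
};

static_assert(sizeof(Bar) == 6, "3 data bytes + 1 padding byte + 2 data bytes");
static_assert(alignof(Bar) == 2, "strictest member alignment");
static_assert(offsetof(Bar, s) == 4, "s starts on a 2-byte boundary");
```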
The alignment requirement of a data structure is the strictest (largest) alignment requirement among its members, unless a stricter one is requested with alignas. It is always a power of 2. In the case of the data structure Bar, alignof(char) == 1 and alignof(short) == 2. Therefore, alignof(Bar) == max(alignof(char), alignof(short)) == 2.
If a Bar object in memory is $2$-byte aligned, the $1$-byte alignment requirement of its char members is automatically satisfied. Because of the padding byte, the $2$-byte alignment requirement of its short member is also automatically satisfied.
On my x86-64 computer, I could call malloc(sizeof(Bar)), and the pointer returned will be an address that is $16$-byte aligned, which satisfies the $2$-byte alignment requirement of the Bar data structure.
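A small check along these lines can confirm that a block returned by malloc satisfies the alignment requirement of a given type. The names `misalignment` and `malloc_satisfies_alignment` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Remainder of the address modulo n; 0 means the pointer is n-byte aligned.
inline std::size_t misalignment(const void* p, std::size_t n)
{
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % n);
}

// Allocates sizeof(T) bytes with malloc and reports whether the returned
// address is a multiple of alignof(T). Returns false if allocation fails.
template <typename T>
bool malloc_satisfies_alignment()
{
    void* p = std::malloc(sizeof(T));
    if (p == nullptr) return false;
    const bool ok = misalignment(p, alignof(T)) == 0;
    std::free(p);
    return ok;
}
```

Printing `misalignment(p, 16)` for a few allocations is a quick way to observe the 16-byte alignment mentioned above on a particular machine.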
To completely ensure there is no undefined behavior caused by data alignment, we should use T buf[N], alignas(T) unsigned char buf[N * sizeof(T) / sizeof(char)], or aligned_alloc(alignof(T), N * sizeof(T)) to allocate the memory before creating objects of type T on it. This also implies that when the compiler lays out the data structure, sizeof(T) must be a multiple of alignof(T). Otherwise, the elements of an array, starting from the second one, might become misaligned.
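The following sketch creates objects with placement new inside byte storage that carries the alignment of the element type. The type `Sample`, the constant `kCount`, and the function `construct_and_sum` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

struct Sample         // hypothetical element type
{
    double value;
    int id;
};

constexpr std::size_t kCount = 4;

// Raw byte storage with the alignment of Sample, suitable for placement new.
alignas(Sample) static unsigned char storage[kCount * sizeof(Sample)];

// Constructs kCount objects in the storage, sums their ids, and destroys them.
inline int construct_and_sum()
{
    Sample* items[kCount];
    for (std::size_t i = 0; i < kCount; ++i)
    {
        void* slot = storage + i * sizeof(Sample);
        // sizeof(Sample) is a multiple of alignof(Sample), so every slot is aligned.
        assert(reinterpret_cast<std::uintptr_t>(slot) % alignof(Sample) == 0);
        items[i] = new (slot) Sample{1.5 * i, static_cast<int>(i)};
    }
    int sum = 0;
    for (std::size_t i = 0; i < kCount; ++i)
    {
        sum += items[i]->id;
        items[i]->~Sample();   // objects created by placement new are destroyed manually
    }
    return sum;
}
```

Dropping the alignas specifier from `storage` would leave the buffer with the alignment of unsigned char only, and the assertion inside the loop could then fail.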
Conclusions
Ensuring data alignment manually can be error-prone. Therefore, for most use cases, we should use high-level interfaces and STL containers, such as new, delete, and std::vector, for dynamic memory allocation, and improve type safety by avoiding dangerous pointer type casts.
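As a small illustration, the elements of a std::vector are placed at suitably aligned addresses without any manual work. The helper name `elements_are_aligned` is an assumption made for illustration. Note that for over-aligned types, meaning those declared with an alignas stricter than alignof(std::max_align_t), new and std::allocator honor the requirement from C++17 onwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// True when every element of v sits at an address that is a multiple of alignof(T).
template <typename T>
bool elements_are_aligned(const std::vector<T>& v)
{
    for (const T& e : v)
    {
        const void* p = std::addressof(e);
        if (reinterpret_cast<std::uintptr_t>(p) % alignof(T) != 0)
            return false;
    }
    return true;
}
```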