C++ Data Alignment

Introduction

Data alignment is an important consideration on modern computer hardware. The CPU reads and writes memory most efficiently when the data is naturally aligned, which generally means that the data’s memory address is a multiple of the data size. For instance, on a 32-bit architecture, a 4-byte datum is aligned if it is stored in four consecutive bytes and the first byte lies on a 4-byte boundary.

In addition to performance, many programming languages assume that data is aligned. Although programming languages try to take care of data alignment for us as much as possible, some low-level languages such as C and C++ let us write code that performs misaligned data access, and the behavior of such access is undefined.

In this blog post, I would like to quickly talk about data alignment, including aligned memory addresses and aligned memory accesses, and how to ensure data alignment as much as possible in C++.

Data Alignment

A memory address $a$ is said to be $n$-byte aligned when $a$ is a multiple of $n$ (where $n$ is a power of 2). Suppose we have a piece of $m$-byte data stored at an $n$-byte aligned address. If $m$ is not divisible by $n$, the $m$-byte data will be padded to $\lfloor \frac{m + n - 1}{n} \rfloor \times n$ bytes, which is the smallest multiple of $n$ that is no less than $m$. Accessing data of $kn + 1$, $kn + 2$, $\cdots$, $(k+1)n$ bytes all has the same latency, because the CPU reads data from memory $n$ bytes at a time and that data will usually be cached in the CPU. That is to say, if the data stored at an $n$-byte aligned address has a storage size $m$ that is not a multiple of $n$, some of the memory access bandwidth is wasted.

A memory access is said to be aligned when the data being accessed is $n$ bytes long and the datum address is $n$-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition one-byte memory accesses are always aligned. In principle, it is possible to access $n$-byte data at a memory address that is not a multiple of $n$, at the cost of wasting even more memory access bandwidth. However, because the C and C++ standards assume aligned memory access, accessing data through a misaligned address can result in undefined behavior.

Data Alignment Requirement

alignof can be used to check the alignment requirement of a data type.

aligment_requirement.cpp
#include <cassert>

// Natural alignment: the same as the alignment of float.
struct float4_4_t
{
    float data[4];
};

// Every object of type float4_32_t will be aligned to a 32-byte boundary.
// Might be useful for SIMD instructions.
struct alignas(32) float4_32_t
{
    float data[4];
};

// alignas(1) is weaker than the natural 4-byte alignment of float, so it has
// no effect here and alignof stays 4. Strictly speaking, the standard treats
// a too-weak alignas as ill-formed, so some compilers may reject this.
struct alignas(1) float4_1_t
{
    float data[4];
};

// #pragma pack(1) forces 1-byte struct member alignment:
// size = 16, alignment = 1-byte, no padding for these struct members.
// An object of this type may live at an address that is not 4-byte aligned,
// so accessing its floats through a float* can result in undefined behavior.
#pragma pack(push, 1)
struct alignas(1) float4_1_ub_t
{
    float data[4];
};
#pragma pack(pop)

int main()
{
    assert(alignof(float4_4_t) == 4);
    assert(alignof(float4_32_t) == 32);
    assert(alignof(float4_1_t) == 4);
    assert(alignof(float4_1_ub_t) == 1);

    assert(sizeof(float4_4_t) == 16);
    assert(sizeof(float4_32_t) == 32);
    assert(sizeof(float4_1_t) == 16);
    assert(sizeof(float4_1_ub_t) == 16);
}

Memory Allocation

According to the GNU documentation, the address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems). The default memory address alignment of an array is determined by the alignment requirement of its element type.

It is possible to use custom data alignment for both static arrays and dynamically allocated memory. alignas(T) can be used to specify the byte alignment of a static array, and aligned_alloc can be used to specify the byte alignment of a buffer in dynamic memory.

alloc.cpp
#include <cstdint> // uintptr_t
#include <cstdlib>
#include <iostream>

int main()
{
    // A plain unsigned char array only has the 1-byte alignment of its
    // element type.
    unsigned char buf1[sizeof(int) / sizeof(char)];
    std::cout << "Default "
              << alignof(unsigned char[sizeof(int) / sizeof(char)]) << "-byte"
              << " aligned addr: " << static_cast<void*>(buf1) << std::endl;
    std::cout << reinterpret_cast<uintptr_t>(buf1) %
                     alignof(unsigned char[sizeof(int) / sizeof(char)])
              << std::endl;
    std::cout << reinterpret_cast<uintptr_t>(buf1) % alignof(int) << std::endl;

    // alignas(int) raises the alignment of the array to that of int.
    alignas(int) unsigned char buf2[sizeof(int) / sizeof(char)];
    std::cout << alignof(int)
              << "-byte aligned addr: " << static_cast<void*>(buf2)
              << std::endl;
    std::cout << reinterpret_cast<uintptr_t>(buf2) %
                     alignof(unsigned char[sizeof(int) / sizeof(char)])
              << std::endl;
    std::cout << reinterpret_cast<uintptr_t>(buf2) % alignof(int) << std::endl;

    // malloc only guarantees the default alignment of the platform.
    void* p1 = malloc(sizeof(int));
    std::cout << "Default "
              << "16-byte"
              << " aligned addr: " << p1 << std::endl;
    std::cout << reinterpret_cast<uintptr_t>(p1) % 16 << std::endl;
    std::cout << reinterpret_cast<uintptr_t>(p1) % 1024 << std::endl;
    free(p1);

    // C11 specifies that the size should be a multiple of the alignment.
    // This call relies on glibc being more permissive.
    void* p2 = aligned_alloc(1024, sizeof(int));
    std::cout << "1024-byte aligned addr: " << p2 << std::endl;
    std::cout << reinterpret_cast<uintptr_t>(p2) % 16 << std::endl;
    std::cout << reinterpret_cast<uintptr_t>(p2) % 1024 << std::endl;
    free(p2);
}
$ g++ alloc.cpp -o alloc -std=c++11
$ ./alloc
Default 1-byte aligned addr: 0x7ffd46d76304
0
0
4-byte aligned addr: 0x7ffd46d76300
0
0
Default 16-byte aligned addr: 0x559a6e1c42c0
0
704
1024-byte aligned addr: 0x559a6e1c4400
0
0

Undefined Behaviors

Writing data to a static array or a dynamic buffer can lead to undefined behavior if the data alignment is incorrect. For example, if we create an object of type T on unsigned char buf[sizeof(T) / sizeof(char)], reading and writing it might result in undefined behavior, especially when reinterpret_cast is combined with pointer increments that produce misaligned addresses. The same is true for creating an object of type T on a dynamic buffer allocated using malloc. However, because the address returned by malloc is $8$-byte aligned on 32-bit architectures and $16$-byte aligned on 64-bit architectures, and $8$-byte or $16$-byte alignment satisfies the alignment requirement of almost all data types, especially the primitive types, undefined behavior is unlikely in that case.

For example, the following data structure Bar has sizeof(Bar) == 6 and alignof(Bar) == 2.

struct Bar
{
    char arr[3]; // 3 bytes + 1 padded byte
    short s;     // 2 bytes
};

In the absence of alignas or packing directives, the alignment requirement of a data structure is the maximum alignment requirement among its members. An alignment requirement is always a power of 2. In the case of the data structure Bar, alignof(char) == 1 and alignof(short) == 2. Therefore, alignof(Bar) == max(alignof(char), alignof(short)) == 2.

If a Bar object in memory is $2$-byte aligned, the $1$-byte alignment requirement of its char members is automatically satisfied. Because of the padded byte, the short member sits at an offset that is a multiple of $2$, so its $2$-byte alignment requirement is also automatically satisfied.

On my x86-64 architecture computer, I could malloc(sizeof(Bar)) and the pointer returned will be an address that is $16$-byte aligned, which satisfies the $2$-byte alignment requirement of the Bar data structure.

To completely ensure there is no undefined behavior caused by data alignment, we should use T buf[N], alignas(T) unsigned char buf[N * sizeof(T) / sizeof(char)], or aligned_alloc(alignof(T), N * sizeof(T)) to allocate the memory before creating objects of type T on it. This also implies that when the compiler lays out the data structure, sizeof(T) must be a multiple of alignof(T). Otherwise, the elements of an array, starting from the second one, might become misaligned.

Conclusions

Ensuring data alignment manually can be error-prone. Therefore, for most use cases, we should use high-level interfaces for dynamic memory allocation, such as new and delete and STL containers like std::vector, and preserve type safety by avoiding dangerous pointer type casts.

Author

Lei Mao

Posted on

07-02-2022

Updated on

09-21-2023
