AddressSanitizer
Introduction
AddressSanitizer (ASan) is a fast memory error detector that can identify various memory errors, such as accesses to not addressable memory, use-after-free, and memory leaks. It is a compile-time instrumentation tool: the compiler inserts additional checks for memory accesses into the generated code, rather than modifying the source code itself.
In this blog post, I would like to discuss the AddressSanitizer algorithm and its usage.
AddressSanitizer Algorithm
The AddressSanitizer algorithm has been described in the AddressSanitizer GitHub Wiki. I am going to elaborate on some more details that I found during my study of the algorithm.
Application Memory and Shadow Memory Mapping
Fundamentally, AddressSanitizer maps 8 bytes of the application memory into 1 byte of the shadow memory.
There are only 9 different values for any aligned 8 bytes of the application memory:
- All 8 bytes in qword are unpoisoned (i.e. addressable). The shadow value is 0.
- All 8 bytes in qword are poisoned (i.e. not addressable). The shadow value is negative.
- First k bytes are unpoisoned, the rest 8-k are poisoned. The shadow value is k.
This is guaranteed by the fact that malloc returns 8-byte aligned chunks of memory.
The only case where different bytes of an aligned qword have different states is the tail of a malloc-ed region. For example, if we call malloc(13), we will have one fully unpoisoned qword and one qword where the first 5 bytes are unpoisoned.
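The encoding above can be sketched as a small function. This is only an illustration, not AddressSanitizer's own code: the name ShadowValueForQword is hypothetical, and the choice of -1 for a fully poisoned qword is an assumption (the text only requires the value to be negative).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: shadow value of the qword_index-th aligned qword
// of a region obtained from malloc(size).
//   0  -> all 8 bytes are addressable
//   k  -> only the first k bytes are addressable (tail of the region)
//   -1 -> no byte is addressable (assumed encoding for "negative")
constexpr std::int8_t ShadowValueForQword(std::size_t size, std::size_t qword_index) {
    return (qword_index + 1) * 8 <= size
               ? 0
               : (qword_index * 8 < size
                      ? static_cast<std::int8_t>(size - qword_index * 8)
                      : -1);
}

// malloc(13): one fully unpoisoned qword, then a qword with 5 usable bytes.
static_assert(ShadowValueForQword(13, 0) == 0, "first qword fully addressable");
static_assert(ShadowValueForQword(13, 1) == 5, "second qword: first 5 bytes addressable");
static_assert(ShadowValueForQword(13, 2) == -1, "past the region: poisoned");
```

The static_assert lines double as a usage example: they are checked at compile time and reproduce the malloc(13) case from the text.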
Here addressable means that the memory at the given address may legitimately be accessed. In some cases this is different from a conceptual out-of-bound (OOB) memory access. For example, suppose we have a malloc-ed region of 16 bytes and create a container that uses only the first 12 bytes, leaving the last 4 bytes unused and uninitialized. Accessing the last 4 bytes of the malloc-ed region is then addressable but out-of-bound for the container. Such an OOB access might cause undefined behavior in the program, but it cannot be detected by AddressSanitizer.
Having understood the difference between addressable and OOB memory access, we can see that for any aligned 8 bytes there can only be $8 + 1 = 9$ poison states. Within an aligned qword of memory allocated by malloc, not addressable bytes are never followed by addressable bytes. More generally, for any aligned $N$ bytes, there can only be $N + 1$ poison states. This means that 1 byte of shadow memory, which has $2^8 = 256$ possible states, can theoretically be used for representing up to 128 bytes of the application memory.
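To spell out the counting step behind the 128-byte figure: this is my own reading, and it assumes that the number of application bytes per shadow byte is restricted to a power of two so that the mapping remains a simple shift. One shadow byte can encode the $N + 1$ states of an aligned $N$-byte block as long as

$$
N + 1 \le 2^8 = 256 \quad\Longleftrightarrow\quad N \le 255,
$$

and the largest power of two that satisfies this bound is $N = 2^7 = 128$.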
The AddressSanitizer instrumentation for detecting memory errors looks like this:
// Given the address user provided, get the corresponding shadow memory address which points to the shadow value of the byte.
Taken literally, this simplified check has a blind spot. Suppose the user allocates only 8 bytes with malloc(8) and then accesses 16 bytes starting from the returned address using a vectorized access instruction. The shadow value for the first 8 bytes is 0, so a check that reads only that one shadow byte passes, although accessing 16 bytes from the address returned by malloc(8) is clearly illegal. One possible fix is to let 1 byte of shadow memory represent more than 8 bytes of application memory, for example 16 bytes, as discussed earlier, although the impact on the performance of AddressSanitizer would require further analysis and experiments. As far as I know, the LLVM implementation instead handles accesses wider than 8 bytes by loading a correspondingly wider shadow value (two shadow bytes for a 16-byte access), so this particular case is caught in practice.
The further check that uses the access size is done in the SlowPathCheck function, whose implementation looks like this:
// Check the cases where we access first k bytes of the qword
Here, address & 7 gives the offset of the byte pointed to by address within its aligned 8-byte qword; it is equivalent to address % 8. If last_accessed_byte >= shadow_value, some not addressable bytes are definitely accessed, and we should report a memory error.
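The two-step check described above can be sketched as follows. This is a minimal model under stated assumptions, not the actual instrumentation: the shadow value is passed in as a parameter instead of being loaded from shadow memory, accesses are assumed to be at most 8 bytes wide, and the wrapper name IsBadAccess is hypothetical. SlowPathCheck, shadow_value, address, and last_accessed_byte follow the names used in the text.

```cpp
#include <cstddef>
#include <cstdint>

// Slow path: only reached when the shadow value is non-zero.
// Returns true if the access touches a byte that is not addressable.
constexpr bool SlowPathCheck(std::int8_t shadow_value, std::uintptr_t address,
                             std::size_t access_size) {
    // Offset of the last accessed byte inside the aligned qword,
    // i.e. last_accessed_byte = (address & 7) + access_size - 1.
    return static_cast<std::int8_t>((address & 7) + access_size - 1) >= shadow_value;
}

// Hypothetical wrapper: a zero shadow value means the whole qword is
// addressable (fast path); anything else goes through SlowPathCheck.
constexpr bool IsBadAccess(std::int8_t shadow_value, std::uintptr_t address,
                           std::size_t access_size) {
    return shadow_value != 0 && SlowPathCheck(shadow_value, address, access_size);
}

// Qword whose first 5 bytes are addressable (shadow value 5).
static_assert(!IsBadAccess(5, 0x1000, 4), "bytes 0..3 are addressable");
static_assert(!IsBadAccess(5, 0x1004, 1), "byte 4 is addressable");
static_assert(IsBadAccess(5, 0x1004, 2), "byte 5 is poisoned");
static_assert(IsBadAccess(-1, 0x1000, 1), "fully poisoned qword");
```

Note that for a negative shadow value the comparison is always true, which is why a single signed comparison covers both the partially and the fully poisoned case.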
Shadow Memory Layout
On 64-bit Linux (x86-64), a memory address is mapped to its shadow memory address using the following formula.
Shadow = (Mem >> 3) + 0x7fff8000;
where Mem >> 3 is just Mem / 8, since AddressSanitizer maps 8 bytes of application memory to 1 byte of shadow memory, and 0x7fff8000 is the offset at which the shadow memory region starts.
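As a quick illustration of the formula, the mapping can be written as a constexpr function and checked at compile time. The names MemToShadow and kShadowOffset are chosen here for illustration; the constant is the one from the formula above.

```cpp
#include <cstdint>

constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;

// Shadow = (Mem >> 3) + 0x7fff8000
constexpr std::uint64_t MemToShadow(std::uint64_t mem) {
    return (mem >> 3) + kShadowOffset;
}

// All 8 bytes of an aligned qword share one shadow byte ...
static_assert(MemToShadow(0x1000) == MemToShadow(0x1007), "same qword, same shadow byte");
// ... and the next qword maps to the next shadow byte.
static_assert(MemToShadow(0x1008) == MemToShadow(0x1000) + 1, "next qword, next shadow byte");
// Address 0 maps to the very start of the shadow region.
static_assert(MemToShadow(0) == 0x7fff8000ULL, "shadow region starts at the offset");
```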
[0x10007fff8000, 0x7fffffffffff] | HighMem
[0x02008fff7000, 0x10007fff7fff] | HighShadow
[0x00008fff7000, 0x02008fff6fff] | ShadowGap
[0x00007fff8000, 0x00008fff6fff] | LowShadow
[0x000000000000, 0x00007fff7fff] | LowMem
With the typical 48-bit virtual address space of x86-64, valid memory addresses range from 0x000000000000 to 0xffffffffffff, where 0xffffffffffff is $2^{48} - 1$. However, the upper half of the address space is reserved for kernel use only, so valid user-space addresses range from 0x000000000000 to 0x7fffffffffff, where 0x7fffffffffff is $2^{47} - 1$. The shadow memory must therefore cover $\frac{2^{47}}{8} = 2^{44}$ bytes. Because the shadow memory itself lives in the same address space, it has a shadow region of its own, which shall never be accessed. This region is called the ShadowGap, and its size must be $\frac{2^{44}}{8} = 2^{41}$ bytes.
For reasons I am not completely sure about, AddressSanitizer places the shadow memory so that it divides the application memory into two parts, HighMem and LowMem. This introduces the shadow memory offset, and the value used in the formula above, 0x7fff8000, looks somewhat arbitrary to me. Given this offset, the shadow memory must end at 0x7fff8000 + 2^44 - 1 = 0x00007fff8000 + 0x100000000000 - 0x000000000001 = 0x10007fff8000 - 0x000000000001 = 0x10007fff7fff. Applying the mapping formula to the shadow memory itself, we find that the shadow gap starts at (0x7fff8000 >> 3) + 0x7fff8000 = 0x0ffff000 + 0x7fff8000 = 0x00008fff7000 and ends at (0x10007fff7fff >> 3) + 0x7fff8000 = 0x02000fffefff + 0x7fff8000 = 0x02008fff6fff. The shadow gap in turn divides the shadow memory into two parts, HighShadow and LowShadow.
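The boundary arithmetic can be double-checked mechanically. The following sketch recomputes every boundary from the mapping formula and classifies an address into one of the five regions; the identifiers (kUserSpaceEnd, Region, Classify, and so on) are hypothetical names introduced for this illustration, and the 47-bit user address space is the assumption stated in the text.

```cpp
#include <cstdint>

constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;
constexpr std::uint64_t kUserSpaceEnd = 0x7fffffffffffULL;  // 2^47 - 1

constexpr std::uint64_t MemToShadow(std::uint64_t mem) {
    return (mem >> 3) + kShadowOffset;
}

// The shadow memory is the image of the whole user address space.
constexpr std::uint64_t kShadowBegin = MemToShadow(0);
constexpr std::uint64_t kShadowEnd = MemToShadow(kUserSpaceEnd);
// The shadow gap is the image of the shadow memory itself.
constexpr std::uint64_t kGapBegin = MemToShadow(kShadowBegin);
constexpr std::uint64_t kGapEnd = MemToShadow(kShadowEnd);

static_assert(kShadowBegin == 0x00007fff8000ULL, "LowShadow starts here");
static_assert(kShadowEnd == 0x10007fff7fffULL, "HighShadow ends here");
static_assert(kGapBegin == 0x00008fff7000ULL, "ShadowGap starts here");
static_assert(kGapEnd == 0x02008fff6fffULL, "ShadowGap ends here");
static_assert(kGapEnd - kGapBegin + 1 == (1ULL << 41), "gap size is 2^41 bytes");

enum class Region { LowMem, LowShadow, ShadowGap, HighShadow, HighMem };

constexpr Region Classify(std::uint64_t addr) {
    return addr < kShadowBegin ? Region::LowMem
         : addr < kGapBegin    ? Region::LowShadow
         : addr <= kGapEnd     ? Region::ShadowGap
         : addr <= kShadowEnd  ? Region::HighShadow
                               : Region::HighMem;
}

// The shadow of LowMem is LowShadow, and the shadow of any shadow is the gap.
static_assert(Classify(MemToShadow(0x1000)) == Region::LowShadow, "LowMem -> LowShadow");
static_assert(Classify(MemToShadow(MemToShadow(0x1000))) == Region::ShadowGap, "shadow of shadow");
```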
These derivations justify the shadow memory layout shown in the table above. One thing that bothered me a little at first is that LowMem ends just below 0x00007fff8000 ≈ 2^31, which is rather small, whereas HighMem starts at 0x10007fff8000 ≈ 2^44, which is far more than the physical memory of a contemporary conventional computer. Note, however, that these are virtual addresses rather than amounts of physical memory. The 2^31 bytes, i.e., 2 GB, of LowMem are therefore not the only memory the application can use: the operating system can map physical pages to addresses in HighMem as well, and on Linux the stack and memory-mapped regions are typically already placed near the top of the user address space. So the shadow memory offset does not need to be adjusted to the actual memory size of the computer the program runs on.
Stack Memory Instrumentation
AddressSanitizer replaces the malloc and free functions to poison and unpoison the memory allocated from the heap. Because stack memory does not use malloc and free, AddressSanitizer has to instrument stack memory accesses in a different way. The approach is to add poisoned red zones around the buffer on the stack.
For example, we might have an array on the stack like the following.
void foo() {
After AddressSanitizer instrumentation, it becomes like the following.
void foo() {
For stack memory, adding red zones is necessary for detecting accesses to not addressable memory (the replaced malloc similarly surrounds heap allocations with poisoned red zones). AddressSanitizer identifies accesses to not addressable memory rather than OOB accesses, and the memory before and after a buffer on the stack is almost always addressable, so without red zones AddressSanitizer would not be able to detect illegal stack buffer accesses.
Also note that because the instrumented red zones have a limited size, AddressSanitizer might not be able to detect every illegal stack buffer access. In the example above, an access to a[128] would not be detected because a[128] lies outside all the red zones.
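The limited reach of red zones can be modeled with a toy shadow map. The frame layout below is an assumption borrowed from the example in the AddressSanitizer wiki (a 32-byte left red zone, an 8-byte array a, then 24 + 32 bytes of right red zones); the identifiers and the premise that memory outside the frame is unpoisoned are hypothetical simplifications for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed frame layout, one shadow byte per 8 bytes:
//   [0, 32)  redzone1  -> poisoned (-1)
//   [32, 40) char a[8] -> addressable (0)
//   [40, 96) redzone2 + redzone3 -> poisoned (-1)
constexpr std::int8_t kFrameShadow[12] = {-1, -1, -1, -1, 0, -1,
                                          -1, -1, -1, -1, -1, -1};
constexpr std::ptrdiff_t kArrayOffset = 32;  // offset of a[0] in the frame
constexpr std::ptrdiff_t kFrameSize = 96;

// Would a 1-byte access to a[index] hit a poisoned shadow byte?
// Memory outside the modeled frame is assumed to be unpoisoned.
constexpr bool IsDetected(std::ptrdiff_t index) {
    return (kArrayOffset + index < 0 || kArrayOffset + index >= kFrameSize)
               ? false
               : kFrameShadow[(kArrayOffset + index) / 8] != 0;
}

static_assert(!IsDetected(0) && !IsDetected(7), "valid accesses pass");
static_assert(IsDetected(8), "a[8] lands in the right red zone");
static_assert(IsDetected(-1), "a[-1] lands in the left red zone");
static_assert(!IsDetected(128), "a[128] jumps over all red zones");
```

In this model every access from a[-32] to a[63] outside the array itself is caught, while anything further away skips the red zones entirely.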
AddressSanitizer Usages
AddressSanitizer is supported by both GCC and Clang. It can also be enabled using high-level build tools such as CMake.
GCC/Clang
We have taken the OOB example from the previous blog post “Illegal Memory Access and Segmentation Fault” and used AddressSanitizer to detect the OOB memory access.
$ g++ -fsanitize=address -g oob.cpp -o oob
CMake
The following CMake build file could be used to build the OOB example above.
cmake_minimum_required(VERSION 3.10)
To build the project with AddressSanitizer, we could run the following commands. AddressSanitizer is enabled simply by adding the compiler flag -fsanitize=address and the linker flag -fsanitize=address.
$ cmake -DCMAKE_CXX_FLAGS="-fsanitize=address -g" -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address" -B build
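When switching between builds with and without the flag, it can be handy to confirm from inside the program whether the instrumentation is actually active. A minimal sketch: GCC defines the macro __SANITIZE_ADDRESS__ when -fsanitize=address is in effect, and Clang exposes __has_feature(address_sanitizer). The names BUILT_WITH_ASAN and kBuiltWithAsan are hypothetical and only used for this illustration.

```cpp
// Detect AddressSanitizer at compile time.
#if defined(__SANITIZE_ADDRESS__)  // GCC (and newer Clang) with -fsanitize=address
#define BUILT_WITH_ASAN 1
#elif defined(__has_feature)       // Clang-style feature test
#if __has_feature(address_sanitizer)
#define BUILT_WITH_ASAN 1
#endif
#endif

#ifndef BUILT_WITH_ASAN
#define BUILT_WITH_ASAN 0
#endif

// True only when this translation unit was compiled with -fsanitize=address.
constexpr bool kBuiltWithAsan = (BUILT_WITH_ASAN == 1);
```

A program could print kBuiltWithAsan at startup, or a test suite could refuse to run memory-safety tests when it is false.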
To build the project without AddressSanitizer, we could run the following commands.
# Remove the build directory if it already exists and run CMake build without AddressSanitizer flags.
AddressSanitizer VS Valgrind
A comparison of AddressSanitizer with other memory error checking tools, such as Valgrind, can be found in the AddressSanitizer GitHub Wiki.
The major advantage of AddressSanitizer over Valgrind is its much higher performance: it instruments the code at compile time and does not rely on a virtual machine to run the program as Valgrind does. The major disadvantage of AddressSanitizer compared with Valgrind is that the program must be recompiled with AddressSanitizer enabled, which means the source code must be available, whereas Valgrind can check memory errors in any binary executable.
Conclusions
If AddressSanitizer reports an error, it is a true error and requires fixing. However, AddressSanitizer may not catch all the memory errors because of its design limitations.
References
- AddressSanitizer - Clang
- AddressSanitizer - GitHub Wiki
- AddressSanitizer Algorithm - GitHub Wiki
- AddressSanitizer Comparison of Memory Tools - GitHub Wiki
- AddressSanitizer Call Stack - GitHub Wiki
- AddressSanitizer Known Issues - Microsoft
- Finding Races and Memory Errors with LLVM Instrumentation
- Finding Races and Memory Errors with LLVM Instrumentation - YouTube