C++ Function Call Performance

Introduction

In a C++ project, to reduce code duplication and make the code more modular, we often pass functions as arguments to other functions. There are several common ways to do this in C++: function pointers, std::function, and lambda functions passed through a template parameter.

In this blog post, we will discuss the performance caveats of passing functions as arguments in C++.

Performance

To measure the performance of each approach, we call a simple and fast function, add_one, ten million times and report the average latency per call.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <iostream>
#include <vector>

int add_one(int input) { return input + 1; }

bool validate_vector_add_one(std::vector<int> const& input_vector,
                             std::vector<int> const& output_vector)
{
    bool is_valid{true};
    for (size_t i{0}; i < input_vector.size(); ++i)
    {
        if (output_vector.at(i) != input_vector.at(i) + 1)
        {
            is_valid = false;
            break;
        }
    }
    return is_valid;
}

void reset_vector(std::vector<int>& input_vector)
{
    for (size_t i{0}; i < input_vector.size(); ++i)
    {
        input_vector.at(i) = 0;
    }
}

template <typename T, typename Func>
void unitary_function_pass_by_lambda_function(T& output, T const& input,
                                              Func const func)
{
    output = func(input);
}

template <typename T>
void unitary_function_pass_by_std_function_value(T& output, T const& input,
                                                 std::function<T(T)> const func)
{
    output = func(input);
}

template <typename T>
void unitary_function_pass_by_std_function_reference(
    T& output, T const& input, std::function<T(T)> const& func)
{
    output = func(input);
}

template <typename T>
void unitary_function_pass_by_function_pointer(T& output, T const& input,
                                               T (*func)(T))
{
    output = func(input);
}

int main()
{
    // Print floating-point values with 3 significant digits.
    std::cout.precision(3);

    size_t const num_elements{10000000};
    std::vector<int> input_vector(num_elements, 0);
    std::vector<int> output_vector(num_elements, 0);

    auto const lambda_function_add_one{[](int const& input) -> int
                                       { return input + 1; }};
    std::function<int(int)> const std_function_add_one{lambda_function_add_one};

    std::cout << "The size of a function pointer: " << sizeof(&add_one)
              << std::endl;
    std::cout << "The size of a std::function pointer: "
              << sizeof(&std_function_add_one) << std::endl;
    std::cout << "The size of a std::function: " << sizeof(std_function_add_one)
              << std::endl;

    // Call the function directly in a vanilla way.
    // The compiler knows which function is called at compile time and can
    // inline it. This is the baseline for the other approaches.
    std::chrono::steady_clock::time_point const time_start_vanilla{
        std::chrono::steady_clock::now()};
    for (size_t i{0}; i < num_elements; ++i)
    {
        output_vector.at(i) = add_one(input_vector.at(i));
    }
    std::chrono::steady_clock::time_point const time_end_vanilla{
        std::chrono::steady_clock::now()};
    auto const time_elapsed_vanilla{
        std::chrono::duration_cast<std::chrono::nanoseconds>(time_end_vanilla -
                                                             time_start_vanilla)
            .count()};
    float const latency_vanilla{time_elapsed_vanilla /
                                static_cast<float>(num_elements)};
    std::cout << "Latency Pass Vanilla: " << latency_vanilla << " ns"
              << std::endl;
    assert(validate_vector_add_one(input_vector, output_vector));
    reset_vector(output_vector);

    // Sometimes we don't know which function to call until run time.
    // std::function can hold any callable with a matching signature.
    // Here the std::function is passed by value, so every call copies it:
    // the stored callable is cloned through type erasure and destroyed again
    // (larger callables may even require a heap allocation). With GCC's
    // libstdc++ on a 64-bit target, a std::function is 32 bytes.
    std::chrono::steady_clock::time_point const
        time_start_pass_by_std_function_value{std::chrono::steady_clock::now()};
    for (size_t i{0}; i < num_elements; ++i)
    {
        unitary_function_pass_by_std_function_value(
            output_vector.at(i), input_vector.at(i), std_function_add_one);
    }
    std::chrono::steady_clock::time_point const
        time_end_pass_by_std_function_value{std::chrono::steady_clock::now()};
    auto const time_elapsed_pass_by_std_function_value{
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            time_end_pass_by_std_function_value -
            time_start_pass_by_std_function_value)
            .count()};
    float const latency_pass_by_std_function_value{
        time_elapsed_pass_by_std_function_value /
        static_cast<float>(num_elements)};
    std::cout << "Latency Pass By Std Function Value: "
              << latency_pass_by_std_function_value << " ns" << std::endl;
    assert(validate_vector_add_one(input_vector, output_vector));
    reset_vector(output_vector);

    // Passing the std::function by const reference (typically implemented as
    // a pointer) avoids the copy, so it is faster than passing by value.
    // It is still slower than the vanilla call, because invoking a
    // std::function goes through an indirect call that the compiler usually
    // cannot inline.
    std::chrono::steady_clock::time_point const
        time_start_pass_by_std_function_reference{
            std::chrono::steady_clock::now()};
    for (size_t i{0}; i < num_elements; ++i)
    {
        unitary_function_pass_by_std_function_reference(
            output_vector.at(i), input_vector.at(i), std_function_add_one);
    }
    std::chrono::steady_clock::time_point const
        time_end_pass_by_std_function_reference{
            std::chrono::steady_clock::now()};
    auto const time_elapsed_pass_by_std_function_reference{
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            time_end_pass_by_std_function_reference -
            time_start_pass_by_std_function_reference)
            .count()};
    float const latency_pass_by_std_function_reference{
        time_elapsed_pass_by_std_function_reference /
        static_cast<float>(num_elements)};
    std::cout << "Latency Pass By Std Function Reference: "
              << latency_pass_by_std_function_reference << " ns" << std::endl;
    assert(validate_vector_add_one(input_vector, output_vector));
    reset_vector(output_vector);

    // std::function is a general-purpose, type-erased wrapper for function
    // pointers, lambda functions, and other callable objects, and this
    // generality costs performance. Here we pass a plain function pointer
    // instead. It is faster than passing the std::function by reference.
    std::chrono::steady_clock::time_point const
        time_start_pass_by_function_pointer{std::chrono::steady_clock::now()};
    for (size_t i{0}; i < num_elements; ++i)
    {
        unitary_function_pass_by_function_pointer(output_vector.at(i),
                                                  input_vector.at(i), &add_one);
    }
    std::chrono::steady_clock::time_point const
        time_end_pass_by_function_pointer{std::chrono::steady_clock::now()};
    auto const time_elapsed_pass_by_function_pointer{
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            time_end_pass_by_function_pointer -
            time_start_pass_by_function_pointer)
            .count()};
    float const latency_pass_by_function_pointer{
        time_elapsed_pass_by_function_pointer /
        static_cast<float>(num_elements)};
    std::cout << "Latency Pass By Function Pointer: "
              << latency_pass_by_function_pointer << " ns" << std::endl;
    assert(validate_vector_add_one(input_vector, output_vector));
    reset_vector(output_vector);

    // We can also pass the lambda function through a template parameter.
    // Each lambda has its own unique type, so the compiler knows which
    // function is called at compile time and can inline it. This is also
    // faster than passing the std::function by reference.
    std::chrono::steady_clock::time_point const
        time_start_pass_by_lambda_function{std::chrono::steady_clock::now()};
    for (size_t i{0}; i < num_elements; ++i)
    {
        unitary_function_pass_by_lambda_function(
            output_vector.at(i), input_vector.at(i), lambda_function_add_one);
    }
    std::chrono::steady_clock::time_point const
        time_end_pass_by_lambda_function{std::chrono::steady_clock::now()};
    auto const time_elapsed_pass_by_lambda_function{
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            time_end_pass_by_lambda_function -
            time_start_pass_by_lambda_function)
            .count()};
    float const latency_pass_by_lambda_function{
        time_elapsed_pass_by_lambda_function /
        static_cast<float>(num_elements)};
    std::cout << "Latency Pass By Lambda Function: "
              << latency_pass_by_lambda_function << " ns" << std::endl;
    assert(validate_vector_add_one(input_vector, output_vector));
    reset_vector(output_vector);
}
```

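When the code is compiled with GCC at -O0 (optimization disabled), the results are as follows. The printed sizes are implementation-specific; these come from GCC's libstdc++ on a 64-bit target.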

```shell
$ g++ function_call_performance.cpp -o function_call_performance -std=c++14 -O0
$ ./function_call_performance
The size of a function pointer: 8
The size of a std::function pointer: 8
The size of a std::function: 32
Latency Pass Vanilla: 14.6 ns
Latency Pass By Std Function Value: 84.4 ns
Latency Pass By Std Function Reference: 39.4 ns
Latency Pass By Function Pointer: 17.5 ns
Latency Pass By Lambda Function: 17.5 ns
```

When the code is compiled with GCC at -O3, the results are as follows.

```shell
$ g++ function_call_performance.cpp -o function_call_performance -std=c++14 -O3
$ ./function_call_performance
The size of a function pointer: 8
The size of a std::function pointer: 8
The size of a std::function: 32
Latency Pass Vanilla: 0.354 ns
Latency Pass By Std Function Value: 5.06 ns
Latency Pass By Std Function Reference: 1.91 ns
Latency Pass By Function Pointer: 0.387 ns
Latency Pass By Lambda Function: 0.413 ns
```

Conclusion

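Using std::function to pass functions as arguments is convenient, but in these measurements it was noticeably slower than function pointers or lambda functions. Passing it by reference was about 2.7 times (-O0) and 5.4 times (-O3) slower than the vanilla call, and passing it by value was slower still. If performance is critical, especially when the function is cheap to compute and called frequently, we should avoid std::function; if we do use it, we should pass it by const reference rather than by value.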

For the best performance, we should pass functions as function pointers or as lambda functions through a template parameter; in these measurements, both came close to the vanilla call.

Author

Lei Mao

Posted on

11-13-2023

Updated on

11-13-2023
