In a C++ project, we often pass functions as arguments to other functions to reduce code duplication and make the code more modular. C++ offers several ways to do this: as a function pointer, as a std::function, or as a lambda function.
In this blog post, we will discuss the performance caveats of passing functions as arguments in C++.
Performance
We will call a simple, fast function, add_one, many times in a loop and measure the average per-call latency of each way of passing it as an argument.
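The benchmark code in this section calls several helpers whose definitions are not shown here. The following is a minimal sketch of what they might look like, not the original definitions: the `int` element type, every signature, and the bodies of `validate_vector_add_one` and `reset_vector` are assumptions inferred from how the names are used at the call sites.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Assumption: the benchmarked function adds one to an int.
inline int add_one(int x) { return x + 1; }

// Assumed signature: std::function taken by value, so the wrapper object is
// copied on every call.
inline void unitary_function_pass_by_std_function_value(
    int& output, int const input, std::function<int(int)> const func)
{
    output = func(input);
}

// Assumed signature: std::function taken by const reference. No copy is made,
// but the call still goes through the type-erased wrapper.
inline void unitary_function_pass_by_std_function_reference(
    int& output, int const input, std::function<int(int)> const& func)
{
    output = func(input);
}

// Assumed signature: a plain function pointer.
inline void unitary_function_pass_by_function_pointer(int& output,
                                                      int const input,
                                                      int (*func)(int))
{
    output = func(input);
}

// Assumed signature: the callable's type is a template parameter, so the
// compiler sees the concrete lambda type at the call site.
template <typename T>
void unitary_function_pass_by_lambda_function(int& output, int const input,
                                              T const& func)
{
    output = func(input);
}

// Assumed behavior: true if every output element equals input element + 1.
inline bool validate_vector_add_one(std::vector<int> const& input_vector,
                                    std::vector<int> const& output_vector)
{
    if (input_vector.size() != output_vector.size())
    {
        return false;
    }
    for (std::size_t i{0}; i < input_vector.size(); ++i)
    {
        if (output_vector[i] != input_vector[i] + 1)
        {
            return false;
        }
    }
    return true;
}

// Assumed behavior: zero the vector so stale results cannot pass validation.
inline void reset_vector(std::vector<int>& vec)
{
    for (auto& value : vec)
    {
        value = 0;
    }
}
```

Under these assumptions, the callables used by the benchmark could be created with `std::function<int(int)> const std_function_add_one = add_one;` and `auto const lambda_function_add_one = [](int x) { return x + 1; };`.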
    std::cout << "The size of a function pointer: " << sizeof(&add_one)
              << std::endl;
    std::cout << "The size of a std::function pointer: "
              << sizeof(&std_function_add_one) << std::endl;
    std::cout << "The size of a std::function: "
              << sizeof(std_function_add_one) << std::endl;

    // Call function frequently in a vanilla way.
    // The compiler knows what function to call at compile time and can optimize
    // the code.
    // This is the best performance we could get.
    std::chrono::steady_clock::time_point const time_start_vanilla{
        std::chrono::steady_clock::now()};
    for (size_t i{0}; i < num_elements; ++i)
    {
        output_vector.at(i) = add_one(input_vector.at(i));
    }
    std::chrono::steady_clock::time_point const time_end_vanilla{
        std::chrono::steady_clock::now()};
    auto const time_elapsed_vanilla{
        std::chrono::duration_cast<std::chrono::nanoseconds>(time_end_vanilla -
                                                             time_start_vanilla)
            .count()};
    float const latency_vanilla{time_elapsed_vanilla /
                                static_cast<float>(num_elements)};
    std::cout << "Latency Pass Vanilla: " << latency_vanilla << " ns"
              << std::endl;
    assert(validate_vector_add_one(input_vector, output_vector));
    reset_vector(output_vector);

    // Sometimes, we don't know what function to call at compile time.
    // We can use std::function to pass a function as an argument.
    // In this case, we pass the std::function by value.
    // Because the size of a std::function is 32 bytes, passing by value
    // results in a lot of copying and bad performance.
    std::chrono::steady_clock::time_point const
        time_start_pass_by_std_function_value{std::chrono::steady_clock::now()};
    for (size_t i{0}; i < num_elements; ++i)
    {
        unitary_function_pass_by_std_function_value(
            output_vector.at(i), input_vector.at(i), std_function_add_one);
    }
    std::chrono::steady_clock::time_point const
        time_end_pass_by_std_function_value{std::chrono::steady_clock::now()};
    auto const time_elapsed_pass_by_std_function_value{
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            time_end_pass_by_std_function_value -
            time_start_pass_by_std_function_value)
            .count()};
    float const latency_pass_by_std_function_value{
        time_elapsed_pass_by_std_function_value /
        static_cast<float>(num_elements)};
    std::cout << "Latency Pass By Std Function Value: "
              << latency_pass_by_std_function_value << " ns" << std::endl;
    assert(validate_vector_add_one(input_vector, output_vector));
    reset_vector(output_vector);

    // Instead of passing the std::function by value, we can pass it by
    // reference (pointer). In this case, object copying is eliminated. The
    // performance is better than passing the std::function by value. However,
    // the performance is still not as good as the vanilla way.
    std::chrono::steady_clock::time_point const
        time_start_pass_by_std_function_reference{
            std::chrono::steady_clock::now()};
    for (size_t i{0}; i < num_elements; ++i)
    {
        unitary_function_pass_by_std_function_reference(
            output_vector.at(i), input_vector.at(i), std_function_add_one);
    }
    std::chrono::steady_clock::time_point const
        time_end_pass_by_std_function_reference{
            std::chrono::steady_clock::now()};
    auto const time_elapsed_pass_by_std_function_reference{
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            time_end_pass_by_std_function_reference -
            time_start_pass_by_std_function_reference)
            .count()};
    float const latency_pass_by_std_function_reference{
        time_elapsed_pass_by_std_function_reference /
        static_cast<float>(num_elements)};
    std::cout << "Latency Pass By Std Function Reference: "
              << latency_pass_by_std_function_reference << " ns" << std::endl;
    assert(validate_vector_add_one(input_vector, output_vector));
    reset_vector(output_vector);

    // std::function is a general purpose wrapper for function pointers,
    // callable objects, and lambda functions. Because it's general purpose,
    // it's not as efficient as a function pointer. In this case, we pass a
    // function pointer to a function. The performance is better than passing
    // the std::function by reference.
    std::chrono::steady_clock::time_point const
        time_start_pass_by_function_pointer{std::chrono::steady_clock::now()};
    for (size_t i{0}; i < num_elements; ++i)
    {
        unitary_function_pass_by_function_pointer(output_vector.at(i),
                                                  input_vector.at(i), &add_one);
    }
    std::chrono::steady_clock::time_point const
        time_end_pass_by_function_pointer{std::chrono::steady_clock::now()};
    auto const time_elapsed_pass_by_function_pointer{
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            time_end_pass_by_function_pointer -
            time_start_pass_by_function_pointer)
            .count()};
    float const latency_pass_by_function_pointer{
        time_elapsed_pass_by_function_pointer /
        static_cast<float>(num_elements)};
    std::cout << "Latency Pass By Function Pointer: "
              << latency_pass_by_function_pointer << " ns" << std::endl;
    assert(validate_vector_add_one(input_vector, output_vector));
    reset_vector(output_vector);

    // We can also pass a lambda function to a function.
    // The compiler knows what function to call at compile time and can optimize
    // the code. The performance is also better than passing the std::function
    // by reference.
    std::chrono::steady_clock::time_point const
        time_start_pass_by_lambda_function{std::chrono::steady_clock::now()};
    for (size_t i{0}; i < num_elements; ++i)
    {
        unitary_function_pass_by_lambda_function(
            output_vector.at(i), input_vector.at(i), lambda_function_add_one);
    }
    std::chrono::steady_clock::time_point const
        time_end_pass_by_lambda_function{std::chrono::steady_clock::now()};
    auto const time_elapsed_pass_by_lambda_function{
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            time_end_pass_by_lambda_function -
            time_start_pass_by_lambda_function)
            .count()};
    float const latency_pass_by_lambda_function{
        time_elapsed_pass_by_lambda_function /
        static_cast<float>(num_elements)};
    std::cout << "Latency Pass By Lambda Function: "
              << latency_pass_by_lambda_function << " ns" << std::endl;
    assert(validate_vector_add_one(input_vector, output_vector));
    reset_vector(output_vector);
}
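The same start/stop/divide timing pattern is repeated for each of the five measurements above. As an illustration only, it could be factored into a small helper. `measure_latency_ns` is a hypothetical name and is not part of the original benchmark. The loop body is taken as a template parameter rather than a std::function, so that the helper itself does not add the overhead being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not from the original benchmark): calls body(i) for
// i in [0, num_iterations) and returns the mean time per iteration in
// nanoseconds.
template <typename Body>
float measure_latency_ns(std::size_t const num_iterations, Body&& body)
{
    if (num_iterations == 0)
    {
        return 0.0f;  // Nothing to measure; avoid dividing by zero.
    }
    auto const time_start = std::chrono::steady_clock::now();
    for (std::size_t i{0}; i < num_iterations; ++i)
    {
        body(i);
    }
    auto const time_end = std::chrono::steady_clock::now();
    auto const time_elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(time_end -
                                                             time_start)
            .count();
    return time_elapsed / static_cast<float>(num_iterations);
}
```

With such a helper, the vanilla measurement might shrink to `measure_latency_ns(num_elements, [&](std::size_t i) { output_vector.at(i) = add_one(input_vector.at(i)); });`. Whether the compiler optimizes this wrapped loop exactly as it does the hand-written one is not guaranteed, so the reported numbers could differ slightly.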
When the code is compiled using GCC with -O0, the performance is as follows.
    $ g++ function_call_performance.cpp -o function_call_performance -std=c++14 -O0
    $ ./function_call_performance
    The size of a function pointer: 8
    The size of a std::function pointer: 8
    The size of a std::function: 32
    Latency Pass Vanilla: 14.6 ns
    Latency Pass By Std Function Value: 84.4 ns
    Latency Pass By Std Function Reference: 39.4 ns
    Latency Pass By Function Pointer: 17.5 ns
    Latency Pass By Lambda Function: 17.5 ns
When the code is compiled using GCC with -O3, the performance is as follows.
    $ g++ function_call_performance.cpp -o function_call_performance -std=c++14 -O3
    $ ./function_call_performance
    The size of a function pointer: 8
    The size of a std::function pointer: 8
    The size of a std::function: 32
    Latency Pass Vanilla: 0.354 ns
    Latency Pass By Std Function Value: 5.06 ns
    Latency Pass By Std Function Reference: 1.91 ns
    Latency Pass By Function Pointer: 0.387 ns
    Latency Pass By Lambda Function: 0.413 ns
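To put the -O3 numbers in perspective, it can help to express each latency as a multiple of the vanilla baseline. The short sketch below does only that arithmetic. The constants are copied from the -O3 output above, while the `slowdown` helper and the constant names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Ratio of a measured latency to the baseline latency.
constexpr double slowdown(double const latency_ns, double const baseline_ns)
{
    return latency_ns / baseline_ns;
}

// Latencies in ns per call, copied from the -O3 run reported above.
constexpr double vanilla_ns{0.354};
constexpr double std_function_value_ns{5.06};
constexpr double std_function_reference_ns{1.91};
constexpr double function_pointer_ns{0.387};
constexpr double lambda_function_ns{0.413};

// Slowdown relative to the direct (vanilla) call.
constexpr double slowdown_std_function_value{
    slowdown(std_function_value_ns, vanilla_ns)};  // about 14.3x
constexpr double slowdown_std_function_reference{
    slowdown(std_function_reference_ns, vanilla_ns)};  // about 5.4x
constexpr double slowdown_function_pointer{
    slowdown(function_pointer_ns, vanilla_ns)};  // about 1.09x
constexpr double slowdown_lambda_function{
    slowdown(lambda_function_ns, vanilla_ns)};  // about 1.17x
```

In this run, std::function costs roughly 14 times the direct call when passed by value and roughly 5 times when passed by reference, while the function pointer and the lambda function stay within about 20% of it. These ratios describe this one measurement and may vary with compiler version and hardware.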
Conclusion
Using std::function to pass functions as arguments is convenient. However, it’s not as efficient as using function pointers or lambda functions. If performance is critical, especially when the function is fast to compute and is called frequently, we should avoid using std::function.
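For the best performance, we should use function pointers or lambda functions to pass functions as arguments. In the measurements above, both stayed within about 20% of a direct call at -O0 and at -O3, whereas std::function passed by reference was roughly 2.7 to 5.4 times slower than a direct call.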
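One way to follow this recommendation without committing to a single callable type is to make the callable a template parameter. The sketch below is illustrative: `apply_to_all` is a hypothetical helper, and `add_one` is assumed to add one to an `int`. The same function then accepts either a function pointer or a lambda function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: the function being passed around adds one to an int.
inline int add_one(int x) { return x + 1; }

// Hypothetical helper: applies func to every element of input. The
// callable's concrete type is a template parameter, so no std::function
// wrapper is involved.
template <typename Func>
std::vector<int> apply_to_all(std::vector<int> const& input, Func func)
{
    std::vector<int> output(input.size());
    for (std::size_t i{0}; i < input.size(); ++i)
    {
        output[i] = func(input[i]);
    }
    return output;
}
```

For example, `apply_to_all(values, &add_one)` and `apply_to_all(values, [](int x) { return x + 1; })` both compile against the same template. The trade-off is that the function must be a template, so std::function remains useful where a single non-template signature or storage of arbitrary callables is required.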