C++ move semantics is extremely helpful for making programs efficient by eliminating unnecessary copies of large amounts of memory. However, in some scenarios, move semantics will not be effective.
In this blog post, I would like to discuss how to make move semantics effective.
Profile Environment
We will run some profiling for our example programs. The profiling is done in the Docker image gcc:9.4.0 on a test platform with an Intel Core i9-9900K CPU.
When move semantics is not effective, copy semantics may be used instead.
Copy Source Is lvalue
To invoke move semantics (the move constructor or the move assignment operator) on an lvalue, the implementation has to apply std::move to it. Otherwise, copy semantics (the copy constructor or the copy assignment operator) is invoked.
It should be noted that std::move returns an rvalue reference, which is what allows the result to bind to the rvalue reference parameters of the move constructor and the move assignment operator.
Type Offers No Move Support
When a type has no move support, that is, no usable move constructor or move assignment operator, move semantics cannot be invoked even if std::move has been used; copy semantics will be invoked instead.
There are many scenarios where certain types have no move support. For example, built-in primitive types, such as int, do not have move support; the default move constructor and move assignment operator may not be implicitly generated under some circumstances (for example, when the class has a user-declared copy constructor, copy assignment operator, or destructor); and an object declared const cannot be moved from, because std::move on a const lvalue yields a const rvalue reference, which binds to the copy constructor instead of the move constructor.
Notice that std::move can be applied to any lvalue, even if the lvalue's type does not have move support.
no_move_support.cpp
```cpp
#include <utility>

int main()
{
    int a = 10;
    int b = std::move(a);
}
```
In the above example, std::move(a) returns an rvalue reference of type int&&. Because int has no move support, b is simply initialized with a copy of the value of a, and a is left unchanged.
Let’s profile the copy and move of strings of different lengths.
```
$ g++ sso.cpp -o sso -std=c++14
$ ./sso 7
String length: 7
String copy assignment average time: 14.0527[ns]
String move assignment average time: 13.7724[ns]
$ ./sso 15
String length: 15
String copy assignment average time: 13.895[ns]
String move assignment average time: 13.3158[ns]
$ ./sso 50
String length: 50
String copy assignment average time: 48.3205[ns]
String move assignment average time: 10.7449[ns]
$ ./sso 100
String length: 100
String copy assignment average time: 60.3347[ns]
String move assignment average time: 10.4812[ns]
```
Probably because GCC's std::string applies small string optimization (SSO) to short strings, we could see that the move assignment performance for short strings (7 and 15 characters) is almost the same as the copy assignment performance. Moreover, move assignment of short strings (about 13 to 14 ns) is even slower than move assignment of long strings (about 10 to 11 ns).
Move Unusable
Let’s investigate the consequence of not declaring the move constructor (and the move assignment operator) noexcept.
The noexcept_move.cpp implemented CustomString with a move constructor declared noexcept. We then tried to push_back and emplace_back CustomString instances into a std::vector.
```cpp
// Excerpt from main() in noexcept_move.cpp; the declarations of custom_string,
// std_string, num_strings, len_string, and the two vectors are not shown.

// Time num_strings copies of custom_string appended with push_back.
std::chrono::steady_clock::time_point t_push_back_begin =
    std::chrono::steady_clock::now();
for (size_t i = 0; i < num_strings; ++i)
{
    vec_strings_source_1.push_back(custom_string);
}
std::chrono::steady_clock::time_point t_push_back_end =
    std::chrono::steady_clock::now();

// Time num_strings CustomString objects constructed in place from std_string.
std::chrono::steady_clock::time_point t_emplace_back_begin =
    std::chrono::steady_clock::now();
for (size_t i = 0; i < num_strings; ++i)
{
    vec_strings_source_2.emplace_back(std_string);
}
std::chrono::steady_clock::time_point t_emplace_back_end =
    std::chrono::steady_clock::now();

std::cout << "String length: " << len_string << std::endl;
std::cout << "Custom string push back average time: "
          << std::chrono::duration_cast<std::chrono::nanoseconds>(
                 t_push_back_end - t_push_back_begin)
                     .count() /
                 static_cast<float>(num_strings)
          << "[ns]" << std::endl;
std::cout << "Custom string emplace back average time: "
          << std::chrono::duration_cast<std::chrono::nanoseconds>(
                 t_emplace_back_end - t_emplace_back_begin)
                     .count() /
                 static_cast<float>(num_strings)
          << "[ns]" << std::endl;
}  // end of main()
```
We could see that on average, push_back takes 153 ns per element and emplace_back takes 168 ns per element, for a std::vector consisting of 10000 CustomString elements.
```
$ g++ noexcept_move.cpp -o noexcept_move -std=c++14
$ ./noexcept_move
String length: 100
Custom string push back average time: 153.895[ns]
Custom string emplace back average time: 168.498[ns]
```
The except_move.cpp implemented CustomString with a move constructor that is not declared noexcept. Apart from the missing noexcept, except_move.cpp is identical to noexcept_move.cpp.
This time, on average, push_back takes 187 ns per element and emplace_back takes 211 ns per element, for a std::vector consisting of 10000 CustomString elements.
```
$ g++ except_move.cpp -o except_move -std=c++14
$ ./except_move
String length: 100
Custom string push back average time: 187.617[ns]
Custom string emplace back average time: 211.423[ns]
```
Both push_back and emplace_back in except_move.cpp are noticeably slower than the ones in noexcept_move.cpp, by roughly 22% and 25% respectively.
Let’s try to see what’s happening here by printing out the constructors being called and reducing the number of test iterations.
Now we only insert two CustomString instances into std::vector in noexcept_move.cpp.
```
$ g++ noexcept_move.cpp -o noexcept_move -std=c++14
$ ./noexcept_move
Explicit constructor being called.
Copy constructor being called.
Copy constructor being called.
Move constructor being called.
Explicit constructor being called.
Explicit constructor being called.
Move constructor being called.
String length: 100
Custom string push back average time: 3255[ns]
Custom string emplace back average time: 10184.5[ns]
```
Similarly, we only insert two CustomString instances into std::vector in except_move.cpp.
```
$ g++ except_move.cpp -o except_move -std=c++14
$ ./except_move
Explicit constructor being called.
Copy constructor being called.
Copy constructor being called.
Copy constructor being called.
Explicit constructor being called.
Explicit constructor being called.
Copy constructor being called.
String length: 100
Custom string push back average time: 2088[ns]
Custom string emplace back average time: 1439[ns]
```
We could see that the difference lies in the buffer migration. During push_back or emplace_back, when the std::vector runs out of capacity, it has to allocate a larger buffer, and all the existing elements have to be migrated from the old smaller buffer to the new larger buffer. With noexcept on the move constructor, the move constructor was called for the migration, whereas without noexcept on the move constructor, the copy constructor, instead of the move constructor, was called for the migration.
The major reason behind this phenomenon is that the C++ STL provides the strong exception safety guarantee for std::vector push_back and emplace_back: if an exception is thrown, the vector is left unchanged. During the migration from the old smaller buffer to the new larger buffer, each element has to be either copied using the copy constructor or moved using the move constructor. If the move constructor is not declared noexcept and an exception is thrown partway through the migration, some elements in the old buffer have already been moved from, and there is no reliable way to restore them. Copying leaves the old buffer untouched, so nothing is lost even if the copy constructor throws. Therefore, when the move constructor is not declared noexcept, std::vector uses the copy constructor, instead of the move constructor, for the migration. Only when the move constructor is declared noexcept will it be used for the migration (the exception is a type that cannot be copied at all, for which the potentially throwing move constructor is used and the strong guarantee no longer holds).
Conclusions
To make your C++ program perform better, try to make your move semantics effective, especially by declaring the move constructor and the move assignment operator noexcept when working with C++ STL containers.