With the language being designed with value semantics in mind, you would imagine that some optimizations are in place to avoid unneeded processing. In this category you can find things like [Named] Return Value Optimization, or copy elision in the current standard, or move semantics in the upcoming C++0x. Understanding what they mean, and when the compiler can or cannot apply them will improve efficiency and readability in the code.
There are quite a few articles, blog posts and whatnot about how to make your code more efficient. Most of them are good and offer sound advice, but you find out after a while that even the best advice is often misinterpreted. And sometimes the advice is wrong, or at the very least misleading (you can try and find some in my last post now as an exercise). Just as important as the advice itself is understanding under which circumstances it applies, and when it doesn't. But let's start from the beginning: what does NRVO mean?
[Named] Return Value Optimization
We can start with a small code sample, and a drawing of the objects in the program:
The exact contents of `type` do not really matter much; the fact is that it is a user-defined type which might be expensive to copy. The drawing represents the layout of the objects that exist in the program. The blue box represents the caller, where variable `X` resides. The grey box represents the code in `function`, where variable `Y` is created. According to the standard, the `return` statement copies the value from `Y` to the returned object, `_R`, an agreed location where the data is handed from `function` to its caller. Where those objects really are depends on the calling convention; you can think of them as decreasing addresses in the stack if you wish, but what matters is that they are somewhere, and data must be copied from one to the other.
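A minimal sketch of the kind of program described above. The names `type`, `function`, `X` and `Y` follow the text; the `payload` member, the `copies` counter and the `caller` function are assumptions added only to make the copies observable.

```cpp
#include <cstdio>

// Hypothetical stand-in for the article's `type`: the payload is irrelevant,
// the class only counts how many times the copy constructor actually runs.
struct type {
    static int copies;
    int payload;
    type() : payload(0) {}
    type(const type& other) : payload(other.payload) { ++copies; }
};
int type::copies = 0;

type function() {
    type Y;              // local object inside `function`
    Y.payload = 42;
    return Y;            // formally: copy Y into the returned object _R
}

// Plays the role of the caller, where X resides.
int caller() {
    type X = function(); // formally: copy _R into X
    std::printf("payload = %d, copies = %d\n", X.payload, type::copies);
    return X.payload;
}
```

Calling `caller()` reports anything from zero to two copies, depending on the compiler and its settings.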
The standard explicitly allows the implementation to avoid the creation of temporaries, including `_R`. But how is that done? It is actually quite simple. When processing `function`, the compiler knows that `Y` has as its only purpose in life to serve as the seed from which `_R` is copied, and the lifetimes of the two objects are intimately bound: destruction of `Y` and construction of `_R` are basically simultaneous. The compiler can avoid creating two separate objects, and just use the same memory location for both.
So which object has been removed, `Y` or `_R`? Neither, or both of them, or maybe it does not matter. `_R` cannot be removed from the program: it is a contract with the callers of the function, and its location is outside the control of the compiler when processing the function. But all the code in `function` refers to `Y`, so it cannot be removed either, unless the object called `Y` in `function` is located over the agreed location of the `_R` object. The single object is both `Y` and `_R`.
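One way to picture the merged object is to write by hand what the compiler conceptually does: the caller supplies the storage for `_R`, and `Y` is constructed directly on top of it. This is only an illustrative lowering; `function_lowered`, `demo_lowered` and the `payload` member are hypothetical names, and real calling conventions differ in the details.

```cpp
#include <new>   // placement new, ::operator new, ::operator delete

struct type {
    int payload;
    type() : payload(0) {}
};

// Conceptual lowering of `type function()`: the caller passes the address
// of the returned object (_R) and Y is built directly in that location.
type* function_lowered(void* R) {
    type* Y = new (R) type();   // Y and _R are one and the same object
    Y->payload = 42;
    return Y;                   // nothing is left to copy
}

// Returns the payload if Y really lives in the caller-provided slot, -1 otherwise.
int demo_lowered() {
    void* slot = ::operator new(sizeof(type));   // storage agreed for _R
    type* result = function_lowered(slot);
    bool same_place = (static_cast<void*>(result) == slot);
    int value = result->payload;
    result->~type();            // the caller destroys the single object
    ::operator delete(slot);
    return same_place ? value : -1;
}
```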
The case of RVO is similar: there, the object being returned by the function is itself a temporary (without a name). The fact that the temporary does not have a name does not mean that it does not exist; it takes the place of `Y` in the discussion above.
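A small sketch of the unnamed case, assuming the same kind of copy-counting `type` (the `copies` counter, `make_unnamed` and `demo_rvo` are illustrative names). Compilers implementing C++17 or later are required to elide this particular copy; older ones are merely allowed to.

```cpp
struct type {
    static int copies;
    int payload;
    explicit type(int p) : payload(p) {}
    type(const type& other) : payload(other.payload) { ++copies; }
};
int type::copies = 0;

type make_unnamed() {
    return type(42);   // the unnamed temporary takes the place of Y
}

int demo_rvo() {
    type X = make_unnamed();
    return X.payload;
}
```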
When can the compiler apply this optimization?
The (unnamed) Return Value Optimization can be applied almost anywhere: the temporary inside `function` can be created directly in place of the returned object. In the more complex case of Named RVO, it all depends on how the function is implemented. Compilers are quite smart and can apply the optimization in many different cases, but not always.
To perform the optimization, the compiler needs to know that `Y` will be returned before deciding the location of the object, so that it can match it with `_R`. In the most general case, this means that if a function has a single return statement, or all return statements refer to the same named variable (or possibly a temporary), then the compiler can merge `Y` and `_R` into a single object.
Consider a function that creates two local objects and then calls `should_return_first` to choose which of them to return. The compiler must create both objects before the call, and until that function returns it does not know which one is to be returned. It cannot merge either of them with `_R`.
In the last case to consider, there are again two local objects that might be returned by the function, but the compiler does know which of the two will be returned, and can turn the code into an equivalent form built around an `if`. In each branch of that `if`, the compiler can place the corresponding object in the place of `_R` (for that matter, it might even be able to avoid creating the other variable). Still, your best chance is to keep code simple: create objects only when you need them, and keep your functions simple for the compiler to analyze.
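A rough sketch of the shapes discussed in this section, assuming a hypothetical copy-counting `type` and a stand-in body for `should_return_first`. The function names `same_variable`, `decide_late` and `decide_early` and the local names `first` and `second` are illustrative, not taken from the original listings, and whether each return is actually elided depends on the compiler.

```cpp
struct type {
    static int copies;
    int payload;
    type() : payload(0) {}
    type(const type& other) : payload(other.payload) { ++copies; }
};
int type::copies = 0;

// Hypothetical predicate; the text only names it.
bool should_return_first() { return true; }

// Every return statement names the same variable: Y can live in _R.
type same_variable(bool early) {
    type Y;
    if (early) { Y.payload = 1; return Y; }
    Y.payload = 2;
    return Y;
}

// Both candidates exist before the choice is known, so neither can be
// placed in _R up front and the return typically has to copy.
type decide_late() {
    type first, second;
    first.payload = 1;
    second.payload = 2;
    if (should_return_first()) return first;
    return second;
}

// One object per branch: each branch may build its object directly in _R.
type decide_early() {
    if (should_return_first()) {
        type first;
        first.payload = 1;
        return first;
    }
    type second;
    second.payload = 2;
    return second;
}
```

Comparing `type::copies` before and after each call, under different compilers and optimization levels, gives a feel for which shapes a given toolchain can handle.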
What about the receiving end?
Originally we started with three objects, and the compiler optimized one of them out with NRVO. But what about the other? In our program we only need one object. We added a function to factor out some of the complexity into its own reusable piece of code, but we do not want to pay for an extra `type` object that we do not need.
As with RVO, when processing the caller, and in this particular case where the temporary object `_R` is used to copy-construct a local object, the compiler can follow the same line of reasoning: since the only purpose of the temporary is to serve as the source for `X`, and the lifetimes only overlap during the copy, it can merge `_R` and `X`. Now our program has a single object.
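Whether a given compiler really ends up with a single object can be checked by comparing addresses. The sketch below assumes a C++11 compiler (for `std::uintptr_t`); the global `address_of_Y` and the function `caller_shares_storage` are hypothetical helpers introduced only for this check.

```cpp
#include <cstdint>   // std::uintptr_t (C++11)

struct type {
    int payload;
    type() : payload(0) {}
    type(const type& other) : payload(other.payload) {}
};

// Remembers where Y lived inside `function`, stored as an integer so it
// can still be compared safely after Y may have been destroyed.
std::uintptr_t address_of_Y = 0;

type function() {
    type Y;
    Y.payload = 42;
    address_of_Y = reinterpret_cast<std::uintptr_t>(&Y);
    return Y;
}

// True when Y, _R and X all ended up in the same storage.
bool caller_shares_storage() {
    type X = function();
    std::uintptr_t address_of_X = reinterpret_cast<std::uintptr_t>(&X);
    return address_of_Y == address_of_X && X.payload == 42;
}
```

A `true` result means both elisions were applied; `false` only means that this compiler, with these settings, kept at least one copy.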
Summing it up
C++ is a language with value semantics, and that means that potentially many objects in your program are copied here and there, in particular across function calls. This does not mean that you should not factor out code into functions, or that you should rework your function signatures to return references instead of copies; that might actually have a negative impact on behavior. The compiler is there to help you avoid the costs of copies, and it does a good job at it. Never depend on side effects from copy constructors, as small changes in the code inside a function might allow or inhibit NRVO.
Value semantics is a hot topic in the language, more so with the inclusion of rvalue references and move semantics in the upcoming standard. Expect to read more on the subject.
What about arguments to functions (rather than returned values)? Can copies be optimized there? Under which circumstances? Can the interface of the `number` type in last week's post be improved to allow for some optimization? Is there anything that won't help?