But copy elision is not limited to those particular use cases. The question is where else can we take advantage of this optimization, and where the compiler will not be able to optimize away the extra copies.
In the last installment we saw that the compiler can elide copies when transferring data from a function to the returned value ([Named] Return Value Optimization) and that copies can also be optimized away when initializing local variables of automatic storage duration from temporaries. In principle, copy elision can be applied whenever a variable is constructed from a temporary, that includes to function arguments. But first, lets diverge a bit into a case where the copy elision is taken from granted by most programmers.
Copy elision in function arguments
As with the returned value from a function, the calling convention determines how arguments are passed into functions. So again lets start from a simple example:
arginternally, the compiler must ensure that there are two copies of the object one for
fuse and another for
main. Now, the function above does not need a copy, as it does not modify the object, so the common rule is to pass the object by constant reference and avoid the unnecessary copy. The reference itself will incur the approximate cost of a pointer in this case, so if
typeis expensive to copy we just improved performance.
This is the first advice you get when starting with C++: pass by reference to avoid the cost of copying. But what if you do need to copy anyway? Say that
fabove had to modify the object as part of the operation, what would be more efficient?
To analyze the two options we have to take a detour and understand what it actually means to bind an rvalue to a constant reference.
Binding an rvalue to a constant reference
In C you cannot take the address of a temporary, it is a safety net around what would most probably be an error: the temporary will be destroyed immediately after the expression completes, and then you will left with a dangling pointer. In a similar way, in C++ you cannot bind an rvalue to a reference. But sometimes it can be useful. For example, in the case of
fabove we determined that it would be more efficient to take the argument by reference, and we might want to call it by directly passing the result of an rvalue expression without having to create extra variables. That is why the language adds an special rule in the language (§8.3/5) to allow such binding:
a temporary of type “cv1 T1” is created and initialized from the initializer expression using the rules for a non-reference copy initialization (8.5). The reference is then bound to the temporary. If T1 is reference-related to T2, cv1 must be the same cv-qualification as, or greater cv qualification than, cv2; otherwise, the program is ill-formed.
The temporary variable
_Tmpis created, and the rvalue-expression result
_Ris copied. Finally the reference is bound to the local temporary
_Tmp. This is just a different variety of the same copy elision in the previous post, where there returned
_Rand the temporary variable
_Tmpare merged together. Then the lifetime of the temporary is extended until the reference goes out of scope. Conceptually, the temporary is equivalent in this context to a local variable injected by the compiler, and will have that same lifetime. There are interesting details that make a difference, but that is something for another day.
Pass-by-value or reference when you do need to make a copy
Going back to the
g2example, and assuming that the caller has a local object to pass in, the cost of either solution is roughly equivalent. As in the case of
fthe compiler will create a copy (
arg) when calling
g2, but because that copy is needed, the cost is equivalent to passing by reference and then copying into
g1. There is no extra cost in passing by value, but is there any advantage?
There can be under some circumstances. If the caller does not pass a variable, but rather the result of an rvalue expression, in the first case the reference will be bound to the result of the rvalue expression, and as we just saw that implies copying to another temporary
_Tmp. The compiler will elide that copy and pass the reference into
g1which will then copy it into
On the other hand, with
g2, the compiler can merge the result of the rvalue expression
At no time during the whole discussion we have actually dealt with the definition of the
typeclass, whether it is a small type like a
std::pair<int,int>, a large type with automatic storage like
std::array<int,1000>or an object that manages dynamically allocated resources like
std::vector<int>, the copy elision will be performed, and we will get the exact same optimization.
The compiler will elide copies when returning from a function and when calling a function that takes the argument by value, but can it do both?
Returning an argument passed by value
Sadly it cannot. The situation is simple to understand. The calling convention will determine the location of the argument and the returned value from the function, the compiler cannot place the argument and the returned value on the same memory location.
In the code above, even if only one object is really alive at any given point (excluding the copying), the compiler cannot optimize those the argument to the function with the return statement. While the current standard does not treat all cases for copy elision, the latest draft (n3290) of the upcoming standard explicitly states this in §12.8/31:
[...]This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies):
-- in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value
The next open question is whether this is the best we can hope for. After all we know that we only need one object in that program, but the language cannot help in removing the extra object and the potentially expensive associated resource copying. We cannot avoid having two objects in the program, but in specific use cases, if the cost of copying the object is not the object itself, but resources managed by it (think dynamically allocated memory), we still have a escape path: move semantics, but that's a story for another day...
Looking back at the number interface, what's the deal with
operator+? Why would it be defined as it is (take first argument by value, return by value) if that copy cannot be elided? Is it not better to pass by reference the first argument, create the local variable on which to operate and then return it?