Wednesday, August 10, 2011

Value semantics: Copy elision

In the last post we discussed Named Return Value Optimization and one case of copy elision when constructing a variable from an rvalue expression.

But copy elision is not limited to those particular use cases. The question is where else can we take advantage of this optimization, and where the compiler will not be able to optimize away the extra copies.

In the last installment we saw that the compiler can elide copies when transferring data from a function to the returned value ([Named] Return Value Optimization) and that copies can also be optimized away when initializing local variables of automatic storage duration from temporaries. In principle, copy elision can be applied whenever a variable is constructed from a temporary, that includes to function arguments. But first, lets diverge a bit into a case where the copy elision is taken from granted by most programmers.

Copy elision in function arguments

As with the returned value from a function, the calling convention determines how arguments are passed into functions. So again lets start from a simple example:

Because the function may modify arg internally, the compiler must ensure that there are two copies of the object one for f use and another for main. Now, the function above does not need a copy, as it does not modify the object, so the common rule is to pass the object by constant reference and avoid the unnecessary copy. The reference itself will incur the approximate cost of a pointer in this case, so if type is expensive to copy we just improved performance.

This is the first advice you get when starting with C++: pass by reference to avoid the cost of copying. But what if you do need to copy anyway? Say that f above had to modify the object as part of the operation, what would be more efficient?

To analyze the two options we have to take a detour and understand what it actually means to bind an rvalue to a constant reference.

Binding an rvalue to a constant reference

In C you cannot take the address of a temporary, it is a safety net around what would most probably be an error: the temporary will be destroyed immediately after the expression completes, and then you will left with a dangling pointer. In a similar way, in C++ you cannot bind an rvalue to a reference. But sometimes it can be useful. For example, in the case of f above we determined that it would be more efficient to take the argument by reference, and we might want to call it by directly passing the result of an rvalue expression without having to create extra variables. That is why the language adds an special rule in the language (§8.3/5) to allow such binding:
a temporary of type “cv1 T1” is created and initialized from the initializer expression using the rules for a non-reference copy initialization (8.5). The reference is then bound to the temporary. If T1 is reference-related to T2, cv1 must be the same cv-qualification as, or greater cv qualification than, cv2; otherwise, the program is ill-formed.

The temporary variable _Tmp is created, and the rvalue-expression result _R is copied. Finally the reference is bound to the local temporary _Tmp. This is just a different variety of the same copy elision in the previous post, where there returned _R and the temporary variable _Tmp are merged together. Then the lifetime of the temporary is extended until the reference goes out of scope. Conceptually, the temporary is equivalent in this context to a local variable injected by the compiler, and will have that same lifetime. There are interesting details that make a difference, but that is something for another day.

Pass-by-value or reference when you do need to make a copy

Going back to the g1 and g2 example, and assuming that the caller has a local object to pass in, the cost of either solution is roughly equivalent. As in the case of f the compiler will create a copy (arg) when calling g2, but because that copy is needed, the cost is equivalent to passing by reference and then copying into copy inside g1. There is no extra cost in passing by value, but is there any advantage?

There can be under some circumstances. If the caller does not pass a variable, but rather the result of an rvalue expression, in the first case the reference will be bound to the result of the rvalue expression, and as we just saw that implies copying to another temporary _Tmp. The compiler will elide that copy and pass the reference into g1 which will then copy it into copy.

On the other hand, with g2, the compiler can merge the result of the rvalue expression _R with arg.

By using value semantics in the function signature we are providing the compiler with extra information about the semantics of our function. When the compiler processes our caller function, it can construct the temporary in place of the argument and avoid the cost of the copy. More information to the compiler usually means greater chances of optimization, and this is one such case.

At no time during the whole discussion we have actually dealt with the definition of the type class, whether it is a small type like a std::pair<int,int>, a large type with automatic storage like std::array<int,1000> or an object that manages dynamically allocated resources like std::vector<int>, the copy elision will be performed, and we will get the exact same optimization.

The compiler will elide copies when returning from a function and when calling a function that takes the argument by value, but can it do both?

Returning an argument passed by value

Sadly it cannot. The situation is simple to understand. The calling convention will determine the location of the argument and the returned value from the function, the compiler cannot place the argument and the returned value on the same memory location.

In the code above, even if only one object is really alive at any given point (excluding the copying), the compiler cannot optimize those the argument to the function with the return statement. While the current standard does not treat all cases for copy elision, the latest draft (n3290) of the upcoming standard explicitly states this in §12.8/31:
[...]This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies):

-- in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value

The next open question is whether this is the best we can hope for. After all we know that we only need one object in that program, but the language cannot help in removing the extra object and the potentially expensive associated resource copying. We cannot avoid having two objects in the program, but in specific use cases, if the cost of copying the object is not the object itself, but resources managed by it (think dynamically allocated memory), we still have a escape path: move semantics, but that's a story for another day...

Looking back at the number interface, what's the deal with operator+? Why would it be defined as it is (take first argument by value, return by value) if that copy cannot be elided? Is it not better to pass by reference the first argument, create the local variable on which to operate and then return it?


  1. In picture "Before copy elision", I think is:
    x -> g::_R -> arg -> f::_R.
    And I'm not clear about "We cannot avoid having two objects in the program...", Is that because the standard does restrict the circumstance to use copy elision?


  2. Maybe the picture is not clear enough, so I will try describing it. In a fully unoptimized compiler, inside main, the compiler reserves space for the variable x and also for the returned object from the call to f(). It then calls f() that will fill in f::_R in the picture. At this point there is a sequence point (all side effects of the call to f() (i.e. updating f::_R) complete. The compiler then creates space for the argument and returned objects in g(). It copies the value of f::_R into arg and calls g().

    Regarding the "We cannot avoid having two objects in the program..." it is not just limited by the circumstances under which copy elision can take place. In C++11 the language explicitly states that the copy from the argument to the returned object cannot be elided, but C++03 did not have such wording.

    The core reason why the copy from the argument to the returned value cannot be elided is that C and C++ have a separate compilation model, which means that while processing the code that performs the call to the function, the implementation of the function need not be available. If the implementation is available, the compiler is able to inline and it can do anything (including using a single object in the program), but in general that is not the case. Assuming that the implementation is not available, the call to the function is done by following the calling conventions for the platform, and that calling convention determines how each argument is passed in and how the return value comes out, and this means that the compiler is not free to relocate the two in the same memory location.

    A common calling convention for returning large objects (larger than a register) has the caller reserve space for the return object and pass a pointer to that location as a hidden argument. The caller is also responsible for making the copy of the object for the argument. At this point it can be argued that the caller could make the copy for the argument in the stack and pass a pointer to that object, but that would require knowing that the returned object is the parameter.