Wednesday, August 3, 2011

Value semantics: NRVO

C++ is a language with value semantics. It was designed so that user defined types will behave in the same ways that primitive types do. This offers advantages, but also imposes burdens on development: programmers have to implement those semantics, and it also falls in the programmers to decide how parameters are passed in and out of functions and the impact that has.

With the language being designed with value semantics in mind, you would imagine that some optimizations are in place to avoid unneeded processing. In this category you can find things like [Named] Return Value Optimization, or copy elision in the current standard, or move semantics in the upcoming C++0x. Understanding what they mean, and when the compiler can or cannot apply them will improve efficiency and readability in the code.

There are quite a few of articles, blogs and what not about how to make your code more efficient. Most of them are good and offer sound advice, but you find out after a while that even the best advice is often misinterpreted. And sometimes the advice is wrong or at the very least misleading (you can try and find some in my last post now as an exercise). As important as the advice is understanding under which circumstances it applies, and when it doesn't. But lets start from the beginning, what does NRVO mean?

[Named] Return Value Optimization

We can start with a small code sample, and a drawing of the objects in the program:


The exact contents of type do not really matter much, the fact is that it is a user defined type which might potentially be expensive to copy. The drawing on the right represents the layout of the objects that exist in the program. The blue box represents the main function, where X resides. The grey box represents the code in function, where variable Y is created. According to the standard, the return statement copies the value from Y to _R the returned object, an agreed location where the data is to be handed from function to its caller. Where those objects are really depends on the calling convention, you can think of them as decreasing addresses in the stack if you wish, but what matters is that they are somewhere, and data must be copied from one to the other.

The standard explicitly allows the implementation to avoid the creation of temporaries, including _R. But how is that done? Actually quite simple. When processing function the compiler knows that Y has as only purpose in life to serve as the seed from which _R is copied, and the lifetimes of the two objects are intimately bound: destruction of Y and construction of _R are basically simultaneous. The compiler can avoid creating two separate objects, and just use the same memory location for both.

The first question that comes to mind, is which object is not created, Y or _R. None, or both of them, or maybe it does not matter. _R cannot be removed from the program, as it is a contract with the callers of the program its location is outside of the control of the compiler when processing the function. But all the code in function uses Y so it cannot be removed either, unless the object called Y inside function is located over the agreed location of the _R object. The single object is both Y and _R.

The case of RVO is similar, in the case where the object that is being returned by the function is a temporary itself (without a name). The fact that the temporary does not have a name does not mean that it does not exist, it will take the place of Y in the discussion above.

When can the compiler apply this optimization

The (unnamed) Return Value Optimization can basically applied mostly anywhere. The creation of the temporary inside function can be done in place of the returned object. In the more complex case of Named-RVO, it all depends on how the function is implemented. Compilers are quite smart and can apply the optimization in many different cases, but not always.

To perform the optimization, the compiler needs to know that Y will be returned before deciding the location of the object, so that it can match _R. In the most general case, this means that if a function has a single return statement, or all return statements refer to the same named variable (or possibly a temporary), then the compiler can merge Y with _R into a single object.

In this case, the compiler must create both variables X and Y and it must do so before calling should_return_first. Until that function returns the compiler does not know which of the objects is to be returned. It cannot merge either with _R.

In this last case, there are two local objects might be returned by the function, but the compiler does know which of the two objects will be returned, and can turn the code into the equivalent:

In which in each branch of the if the compiler can place either X or Y in the place of _R (for what matters, the compiler might even be able to avoid the creation of the other variable). Still, your best chance is to keep code simple, create the objects only when you need them and keep your functions simple for the compiler to analyze.

What about the receiving end?

Originally we started with three objects and the compiler optimized with NRVO one of them out. But what about the other? In our program we only need one object, we added a function to factor out some of the complexity into its own reusable piece of code, but we do not want to pay for an extra type object that we do not need.

As with RVO, when processing the caller, and in this particular case where the temporary object _R is used to copy-construct a local object, the compiler can follow the same line of reasoning: since the only purpose of the temporary is to serve as the source for X and the lifetimes only overlap during the copy, it can merge _R with X. Now or program has a single object:


Summing it up

C++ is a language with value semantics, and that means that there might be potentially many objects in your program being copied here and there, in particular across function calls. This does not mean that you should not factor out code into functions, or that you should refactor your function signatures to avoid return copies in favor of references, this might actually have a negative impact in behavior. The compiler is there to help you avoid the costs of copies, and it does a good job at it. Never depend on side effects from copy constructors, as small changes in the code inside a function might allow or inhibit NRVO.

Value semantics is a hot topic in the language, more so with the inclusion of r-value-references and move semantics to the upcoming standard. Expect to read more on the subject.

What about arguments to functions (rather than returned values)? Can copies be optimized there? Under which circumstances? Can the interface of the number type in last week's post be improved to allow for some optimization? Is there anything that won't help?

3 comments:

  1. Thanks David for the interesting post. Question: in the example you gave:

    type function( bool which ) {
    type x,y;
    if ( which ) {
    return x;
    } else {
    return y;
    }
    }

    how does the compiler know about the fate of the 'if' statement, if 'which' is not a constant ?

    ReplyDelete
  2. @Amirhossein: It does not need to know which value it will use, but for simple small functions it can perform the transformation shown in the next block of code in the post: move the 'if' to be the first expression, then in each of the two branches which object 'x' or 'y' to be returned is fixed.

    Note that the compiler does not need to know beforehand which branch it will take. It only needs to know when creating the variable on which the copy elision will be taken.

    A common implementation of a function returning by value in the calling conventions is a function that returns nothing and takes a pointer to the location where the object will be created. Maybe looking at it that way helps you understand that particular case:

    // type function(bool which):
    void __function(void *ret, bool which) {
    if (which) {
    new (ret) type; // x
    type y;
    return; // return x
    } else {
    type x;
    new (ret) type; // y
    return; // return y
    }
    }

    (excuse the poor formatting as it does not seem to allow code formatting in comments)

    ReplyDelete
  3. Hi David,
    "But all the code in function uses Y so it cannot be removed either, unless the object called Y inside function is located over the agreed location of the _R object."
    I did not understand above paragraph.
    thanks for very good article.

    Regards,
    Vinay

    ReplyDelete