tag:blogger.com,1999:blog-69875140893637227502024-02-07T03:10:42.170-05:00[Un]defined behaviorDavid Rodríguezhttp://www.blogger.com/profile/02117106672577313085noreply@blogger.comBlogger8125tag:blogger.com,1999:blog-6987514089363722750.post-61421145028510332892011-09-08T21:20:00.002-04:002011-09-09T06:51:18.243-04:00The rule of the [3] 4: swapObject oriented programming focuses on bundling together the data with the operations that can be applied to them. Access specifiers allow you to control the invariants of the type, and deterministic destruction of objects provides the glue that makes management of all types of resources.<br />
<br />
But <i>homogeneous</i> does not mean <i>trivial</i>, and attention has to be paid. In a language with exceptions special care needs to be taken so that resources don't leak, but resources are not the only problem, when we model a domain, we need to maintain the <i>invariants</i> on top of which our algorithms are built.<br />
<a name='more'></a><br />
<b>Managing resources: <i>The Rule of the Three</i></b><br />
<br />
One of the widely known rules for resource management is the <i>rule of the three</i>, that according to the wikipedia <a href="http://en.wikipedia.org/wiki/Rule_of_three_(C%2B%2B_programming)">article</a> says:<br />
<br />
<blockquote>[...] if a class defines one of the following it should probably explicitly define all three: destructor, copy constructor, copy assignment operator</blockquote><br />
The rule is based on the principle that if any of the three needs to be implemented, it is most probably because the class encapsulates management of a resource for which you need to provide sensible <i>copy-semantics</i> and <i>release</i> in the destructor.<br />
<br />
Optionally if you encapsulate each resource in it's own RAII manager, as you should specially if you have to manage more than one resource, then you can avoid the addition of the destructor.<br />
<br />
Whenever I have encountered the rule of the three, my next thought was for <code>swap</code>, not the one implemented in the <code>std</code> namespace, but rather a handcrafted <code>swap</code> function that provides the <code>no throw</code> exception guarantee and is <i>cheap</i> to run. The main point of providing <code>swap</code> is implementing exception safety by means of the <a href="http://en.wikibooks.org/wiki/More_C++_Idioms/Copy-and-swap"><i>copy-and-swap</i></a> idiom.<br />
<br />
<b>The copy and swap idiom</b><br />
<br />
This is a common idiom to implement exception safety in general, but more importantly in types that manage resources. The idea is that instead of executing the operations directly on the object we can <i>copy</i> the object, perform the operation aside and then, and only after the operation has completed successfully, <i>swap</i> the contents of the two objects. We provide the operation as a single <i>transaction</i>: either it succeeds and modifies the object or it fails and leaves the object as it was before the operation started.<br />
<br />
Consider for example the <code>number</code> that we sketched the other day, and assume that it is a <a href="http://en.wikipedia.org/wiki/Bignum">big number</a>, usually implemented with a dynamic array of <i>digits</i><sup><small>1</small></sup>. Addition of a big number to our existing big number (<code>operator+=</code>) may require growing of the array to accomodate a potentially larger number, and performing the operation<sup><small>2</small></sup> on the bigger array.<br />
<br />
If while performing the actual operations anything failed and an exception is thrown (there aren't many things that can go wrong with the addition of the individual digits but bear with me), we would have to take care of ensuring that there are no memory leaks and that the object is left in a consistent state. To ensure that no memory is leaked, we should manage the memory using <a href="http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization">RAII</a>, but our <code>number</code> already implements RAII for the buffer, so why not reuse it?<br />
<br />
Instead of managing the memory through a suitable smart pointer we can create a <code>number</code> with automatic storage that is <i>big enough</i> to hold the result. Then we operate on that local temporary<sup><small>3</small></sup>, and if all goes right we exchange the contents of the object and the temporary with a <code>swap</code> function that offers the <i>no throw</i> guarantee.<br />
<br />
If an exception is thrown during the operation, the stack will be unwound, the local object will be destroyed and that will release the memory. As a nice side effect, because the operation has not modified the current object at all, we provide the <a href="http://www.boost.org/community/exception_safety.html"><i>strong exception guarantee</i></a>.<br />
<br />
<pre class="prettyprint">class number {
int * data;
std::size_t digits;
public:
// other code omitted
std::size_t size() const {
return digits;
}
number& operator+=( number const & rhs );
number& swap( number& rhs ) throw();
private:
number( std::size_t size ) : data( new int[size]), digits(size) {}
};
number& number::swap( number& rhs ) throw() {
using std::swap;
swap( data, rhs.data );
swap( digits, rhs.digits );
return *this;
}
number& number::operator+=( number const & rhs ) {
number tmp( std::max( size(), rhs.size() ) + 1 );
// actual operations here, might throw
return swap( tmp );
}
// Offer a free function for commodity
void swap( number & lhs, number & rhs ) {
lhs.swap( rhs );
}
</pre><br />
<b>When <code>swap</code> meets <i>the three</i></b><br />
<br />
The <i>rule of the three</i> focuses on providing appropriate <i>value semantics</i> to your types while ensuring that resources won't leak. On the other hand, the <i>copy-and-swap</i> idiom focuses on exception safety, on how to implement operations so that they either succeed or fail in a graceful way (i.e. leaving the state of the original objects intact). They do share a common ground: both of them are important to handle exceptions correctly, so you might want to think on implementing <i>both</i> in your user defined types.<br />
<br />
The good news are that the cost of implementing both idioms for your type is not greater than implementing the <i>rule of the three</i> alone. Instead of implementing the <i>assignment operator</i>, you can move that effort into implemeting <code>swap</code>, and the assignment operator will come for free:<br />
<br />
<pre class="prettyprint">class number {
int * data;
std::size_t digits;
public:
// Rule of the three: copy constructor
number( number const & rhs ) : data( new int[ rhs.digits ] ), digits( rhs.digits ) {
std::copy( rhs.data, rhs.data+rhs.digits, data );
}
// Rule of the three: assignment operator
number& operator=( number rhs ) {
return swap( rhs );
}
// Rule of the three: destructor
~number() {
delete [] data;
}
};
</pre><br />
If we had not implemented <code>swap</code>, then the assignment operator would have been much more complex than our <code>swap</code>, and harder to make it right. Given a choice, implement <code>swap</code>, it will not add development cost, and it will help in many ways.<br />
<br />
<b>When not to <code>swap</code></b><br />
<br />
It is always a good exercise to think before starting to code. While implementing a no-throw <code>swap</code> is in general a good idea, you should consider whether it does make sense in your particular problem.<br />
<br />
<i>Copy and swap</i> is a simple generic solution to provide the strong exception guarantee, but it adds the cost of creating a copy of the original object on which to perform the operation. If none of the steps in the algorithm can throw, then you can avoid the cost altogether. Sometimes you cannot provide a <code>swap</code> function that offers the <i>no-throw</i> guarantee, or the <code>swap</code> operation is too expensive. While you should never sacrifice correctness for performance, it might be worth thinking on ad-hoc ways of provide the safety. As always with performance, measure first, and decide whether you even need to consider optimizing later.<br />
<br />
<div style="text-align: right;"><i><small>He who sacrifices correctness for performance deserves neither</small></i></div><div style="text-align: right;"><br />
</div>But if you can implement an <i>efficient</i> <code>swap</code> that does not throw, then that should be the very first function to write.<br />
<br />
<b>Move semantics</b><br />
<br />
There is quite a bit of fuzz going on with the new standard and <i>rvalue-references</i>, and in particular with one of the two primary uses: <i>move semantics</i>. While native support for <i>moving</i> was not present in the C++98/03 standards, people have been <i>moving</i> all along: the <i>copy-and-swap</i> idiom <b>is</b> moving.<br />
<br />
The function <code>swap</code> has a stricter set of requirements than <i>moving</i>. When you <i>move</i> from an object <code>a</code> to an object <code>b</code>, the semantics only require that the state of <code>b</code> after the move operation is equivalent to the state of <code>a</code> before the operation and that <code>a</code> is at the very least destructible. On the other hand, <code>swap</code> states a stronger guarantee, the state of each object <i>after</i> the operation is equivalent to the state of the other object <i>before</i> the operation, which by definition fulfills all the guarantees of moving.<br />
<br />
In the whole discussion, we have not mentioned the state in which the <i>temporary</i> object is left, and the only operation that is applied to that object is the destructor. We don't care about that temporary at all, we only use it as the source from which to <i>move</i> the result of the operation that was performed aside.<br />
<br />
Besides exception safety, as discussed above, <code>swap</code> can also be used to improve performance in your code. For an illustrative example, we can consider performance of the implementation of <code>operator+</code> mentioned <a href="http://definedbehavior.blogspot.com/2011/07/operator-overloading.html">here</a> together with the discussions on <a href="http://definedbehavior.blogspot.com/2011/08/value-semantics-copy-elision.html">copy elision</a> and the final question of whether we could do better. To refresh the memory, the signature of that operator would be:<br />
<br />
<pre class="prettyprint">number operator+( number lhs, number const & rhs );
</pre><br />
And the question is <i>Why define it as that if the compiler cannot elide copying the argument to the returned object?</i>. The answer is that this signature allows the compiler to optimize where you cannot: avoiding temporaries passed as arguments or a returned object used for initialization cannot be controlled by the programmer. It then leaves us with the opportunity to improve on the only copy that is left by <i>moving</i> from <code>lhs</code> to the returned object:<br />
<br />
<pre class="prettyprint">number operator+( number lhs, number const & rhs ) {
lhs += rhs;
number tmp; // [1]
swap( tmp, rhs ); // [2]
return tmp;
}
</pre><br />
The code is a bit more cumbersome now, so is it worthy? Well, the first thing is whether there is something to improve and for that copying must be expensive, expensive enough to compensate the two operations with which we are replacing it. In line [2], <code>swap</code> must be <i>cheap</i> and provide the no throw guarantee (or at least the same guarantee that copy contruction, remember: never optimize at the cost of correctness!), and the construction in [1] must be relatively cheap. There are still two objects in the code, but the cost of keeping those two objects might be much smaller than letting the compiler copy for us.<br />
<br />
For our <code>number</code> implementation, we could implement the value 0 as a <code>number</code> that holds no memory (<code>data</code> is a null pointer, <code>digits</code> is 0), which will make the construction in [1] equivalent to just two assignments. Then we can use <code>swap</code> to <i>move</i> <code>lhs</code> into <code>tmp</code>, which will require another 6 assignments, that overall amount to no cost. We don't even need to modify the destructor to take into account this particular state, as the <code>delete [] data;</code> will be a no-op when <code>data</code> is null.<br />
<br />
This might seem like a forced example just to show how <i>cool</i> <code>swap</code> can be, but it has been implemented in production code, and you might have even used it unknowingly. In the Dinkumware implementation of the STL<sup><small>4</small></sup> types are tagged with a type trait <code>_Move_operation_category</code> that can be used with SFINAE to optimize operations based on this knowledge. In particular, when a <code>std::vector</code> needs to <i>grow</i>, it will allocate the new buffer in memory, and if the contained type has an efficient <code>swap</code>, it will <i>default-initialize</i> the elements in the new buffer, and then use <code>swap</code> to move the values. The effect is that with a <code>std::vector<std::vector<T> ></code> the cost of growing is proportional to the size of the outer vector, regardless of the sizes of the contained vectors, converting an <code>O( N*M )</code> operation into <code>O(N)</code> (where <code>N</code> is the size of the outer vector, and <code>M</code> the average size of the contained vectors)<br />
<br />
<b><code>swap</code> and moving in the future</b><br />
<br />
The recommendation of implementing <code>swap</code> for your types will probably fall in disgrace in the near future. As developers embrace the upcoming standard they will start using <i>move-constructor</i>s and <i>move-assignment</i> instead of resorting to using <code>swap</code> as a poor man's <i>move</i> operation, taking <code>swap</code> away from the spotlight that it deserved but did not really enjoy.<br />
<br />
Not only will <code>swap</code> not be used for <i>moving</i>, but <i>move semantics</i> will be used to trivially implement efficient swapping of resources. The two lines marked with [1] and [2] in the <code>operator+=</code> above disappear and yet we maintain the same behavior of the code. In C++0x, even <code>std::swap</code> will use <i>rvalue-references</i> when the type implements <i>move semantics</i>. Interestingly, the simplest implementation of <i>move-assignment</i> will probably be that implementation of the <code>swap</code> function we will not write.<br />
<br />
But all that is a story for another day, this post is already too long and you don't want to get bored...<br />
<br />
<b>P.S.: </b><i>It's taken me a long time to write this post, and it will take me some time to write the next as I will be doing some traveling (There I go NY!). I hope this huge post makes up for the delay and hope to see you around!</i><br />
<br />
----<br />
<small><sup>1</sup> Here <i>digit</i> does not denote a decimal or hexadecimal digit, but in most cases a much larger entity. A common approach in big number libraries uses half of the bits in the native memory word (16bit in a 32bit architecture) for a digit. That ensures that we can apply any basic operation to a <i>digit</i> and the result will fit in a word. Then we can post-process all the digits <i>normalizing</i> the data to maintain the invariant that each element in the array never uses more than half the bits. That is, we can process all digits separately without overflows, and then fix the carryovers in a single pass.</small><br />
<br />
<small><sup>2</sup> A dynamic array? Manually implemented? Really? In production code, this would be implemented with a <code>std::vector<int></code>. Using a vector will ease some of the problems of manual management, but many of the things in the post still apply when using a vector, but it seems more graphical with manual memory management.</small><br />
<br />
<small><sup>3</sup> Not in the C++ sense, only a temporary in the sense that its a short lived object (only within the scope of this operation)</small><br />
<br />
<small><sup>4</sup> But <i>who</i> exactly is Dinkumware? Dinkumware is the company that licenses the STL implementation that is shipped with Visual Studio, so if you have used Visual Studio 2003 or later, they are the implementors of the STL you used.</small><br />
David Rodríguezhttp://www.blogger.com/profile/02117106672577313085noreply@blogger.com4tag:blogger.com,1999:blog-6987514089363722750.post-84580320447036880612011-08-17T06:11:00.005-04:002011-09-07T11:19:13.608-04:00Dynamic dispatching to template functionsAs in the first post I wrote, less than a month ago, this time I am going to visit another <a href="http://stackoverflow.com/questions/7089284/dynamic-dispatching-of-template-functions">stackoverflow question</a>. The user has a template parametrized by a template argument and wants to write a non-templated function that takes an integer argument and will <br />
dispatch the call to the appropriate template.<br />
<a name='more'></a><br />
Basically the problem to solve is:<br />
<pre class="prettyprint">template <int N>
struct A {
static void f();
};
void dispatch( int i ) {
A<i>::f(); // !!!
}</pre>This will not compile, the argument <code>i</code> is a runtime value, it might come from user input, a file, the network... but the templates are proccessed way before by the compiler.<br />
<br />
A first approach to the problem would be manually creating a lookup table:<br />
<pre class="prettyprint">typedef void (*fptr)();
fptr lookup[] = { &A<0>::f, &A<1>::f, /* ... */ };
</pre>And it will work: at compile time all of the instantiations of the template are created and functions pointers stored in the array, now <code>dispatch</code> only needs to use that lookup table and call <code>lookup[i]();</code>. But it is a little cumbersome to write if the possible values is more than a couple.<br />
<br />
<b>The problem</b><br />
<br />
Automate the creation of the lookup table so that the user does not need to enter all of the possible values manually.<br />
<br />
<b>Approach</b><br />
<br />
Because we are talking about template instantiation, <i>automatic</i> instantiation of the templates and build up of the lookup table must be done at compile time, only lookups will be performed at runtime.<br />
<br />
Compile time programming implies working with the template sublanguage, which was not <i>designed</i> to be a programming language. Not being born as a true language, it is a bit awkward to work with, and the set of language primitives that can be used is small: there are no loops, or conditions.<br />
<br />
Metaprogramming might be a bit cumbersome, but it is not that hard. It basically boils down to creating a class template that solves the general step of the algorithm and recursively instantiates itself with a smaller subset of the problem, then creating an specialization that represents the stop condition. It is not that different from a regular recursive function if it were not for the fact that the conditions are not checked inside the function, but by the compiler doing pattern matching of the arguments.<br />
<br />
<b>Setup of the problem</b><br />
<br />
First we need to create the lookup table. I don't like having to type complex types often, and arrays of function pointers are not a simple type, so I will just start with a <code>typedef</code>, a constant to represent the size and the lookup array itself:<br />
<pre class="prettyprint">typedef void (*fptr)();
const int limit = 10;
fptr lookup[ limit ];
</pre>Now we just need to fill it with values.<br />
<br />
<b>General step: fill the Nth element</b><br />
<br />
In the general step we fill an array of N elements by filling in the Nth element and recursively calling the same function to solve the smaller problem of filling N-1 elements.<br />
<pre class="prettyprint">template <int N>
struct init_lookup {
static void init( fptr *lookup ) {
lookup[ N ] = &A<N>::f;
init_lookup<N-1>::init( lookup );
}
};
</pre>Not that hard, we just need to call <code>init_lookup<N>::init( lookup );</code> and that will initialize all elements from the Nth to the <i>zeroth</i>... oh, well, not really. We did not add a stop condition, so let's do it.<br />
<br />
<b>The stop condition</b><br />
<br />
The stop condition is an specialization that will be matched by the compiler. In our case, when the problem to solve has a single element (0). In that case we want to fill the element in, but we do not want to continue instantiating the template.<br />
<pre class="prettyprint">template <>
struct init_lookup<0> {
static void init( ftpr *lookup ) {
lookup[ 0 ] = &A<N>::f;
// No recursive call
}
};
</pre><b>Syntactic sugar</b><br />
<br />
With the solution as implemented the user can just call the appropriate <code>init_lookup<limit>::init</code> function from <code>main</code> and it will be set. But I find it nicer if the lookup table was automatically created without me having to retype the sizes, even if I do have a constant for it. That can be easily achieved by providing a helper function:<br />
<pre class="prettyprint">template <int N>
void lookup_initialization( fptr (&lookup)[N] ) {
init_lookup<N-1>::init( lookup );
}
</pre>If we change that function signature to return an integer, we can use it to initialize a static variable and we will not need to call it from <code>main</code>.<br />
<br />
<b>The final solution</b><br />
<br />
In the final solution I have changed the templates a bit so that the array is passed by reference. I tend to prefer passing arrays by reference as the compiler gets the extra size information. We could then add an static assert to verify that we do not try to write beyond the end of the array. I have also offset the <code>position</code> argument by one so that the specialization is just a stop condition and does not contain any logic.<br />
<br />
<pre class="prettyprint">typedef void (*fptr)();
namespace {
// L: size of the array
// N: element to initialize (offset by 1)
template <int L, int N>
struct init_lookup {
static_assert( N <= L );
static void init( fptr (&lookup)[L] ) {
lookup[ N-1 ] = &A<N-1>::f;
init_lookup<L,N-1>::init( lookup );
}
};
template <int L>
struct init_lookup<L,0> {
static_assert( L >= 0 );
static void init( fptr (&lookup)[L] ) {
}
};
template <int N>
int lookup_initialization( fptr (&lookup)[N] ) {
init_lookup<N,N>::init( lookup );
return 0;
}
}
const int limit = 10;
fptr lookup[ limit ];
static const int xxx_ignored = lookup_initialization(lookup);
</pre>David Rodríguezhttp://www.blogger.com/profile/02117106672577313085noreply@blogger.com6tag:blogger.com,1999:blog-6987514089363722750.post-85470346867375181302011-08-10T19:14:00.002-04:002011-08-10T19:22:59.105-04:00Value semantics: Copy elisionIn the last <a href="http://definedbehavior.blogspot.com/2011/08/value-semantics-nrvo.html">post</a> we discussed <i>Named Return Value Optimization</i> and one case of <i>copy elision</i> when constructing a variable from an rvalue expression.<br />
<br />
But <i>copy elision</i> is not limited to those particular use cases. The question is where else can we take advantage of this optimization, and where the compiler will not be able to optimize away the extra copies.<br />
<a name='more'></a><br />
In the last installment we saw that the compiler can elide copies when transferring data from a function to the returned value (<i>[Named] Return Value Optimization</i>) and that copies can also be optimized away when initializing local variables of automatic storage duration from temporaries. In principle, <i>copy elision</i> can be applied whenever a variable is constructed from a temporary, that includes to function arguments. But first, lets diverge a bit into a case where the <i>copy elision</i> is taken from granted by most programmers.<br />
<br />
<b>Copy elision in function arguments</b><br />
<br />
As with the returned value from a function, the calling convention determines how arguments are passed into functions. So again lets start from a simple example:<br />
<br />
<div class="separator" style="clear: both; text-align: left;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSfMS8mH0pVRKNl0SBtequsm4ds6l-9V0oCI14Sjyn-Bp2OFJG87292OFocE3MshY4zj4TWqLOQmlsahvIa6lhsHY1D6ioxhPiH4968yd7c-72zHImLhg2T1okoJz-cBNUnTlT_ijnJdml/s1600/arg_copy.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSfMS8mH0pVRKNl0SBtequsm4ds6l-9V0oCI14Sjyn-Bp2OFJG87292OFocE3MshY4zj4TWqLOQmlsahvIa6lhsHY1D6ioxhPiH4968yd7c-72zHImLhg2T1okoJz-cBNUnTlT_ijnJdml/s320/arg_copy.png" /><br />
Argument passing by value</a><script src="https://gist.github.com/1132909.js">
</script></div>Because the function may modify <code>arg</code> internally, the compiler must ensure that there are two copies of the object one for <code>f</code> use and another for <code>main</code>. Now, the function above does not <i>need</i> a copy, as it does not modify the object, so the common rule is to pass the object by constant reference and avoid the unnecessary copy. The reference itself will incur the approximate cost of a pointer in this case, so if <code>type</code> is expensive to copy we just improved performance.<br />
<br />
This is the first advice you get when starting with C++: pass by reference to avoid the cost of copying. But what if you do need to copy anyway? Say that <code>f</code> above had to modify the object as part of the operation, what would be more efficient?<br />
<br />
<script src="https://gist.github.com/1132948.js"></script><br />
To analyze the two options we have to take a detour and understand what it actually means to bind an rvalue to a constant reference.<br />
<br />
<b>Binding an rvalue to a constant reference</b><br />
<br />
In C you cannot take the address of a temporary, it is a safety net around what would most probably be an error: the temporary will be destroyed immediately after the expression completes, and then you will left with a dangling pointer. In a similar way, in C++ you cannot bind an rvalue to a reference. But sometimes it can be useful. For example, in the case of <code>f</code> above we determined that it would be more efficient to take the argument by reference, and we might want to call it by directly passing the result of an rvalue expression without having to create extra variables. That is why the language adds an special rule in the language (§8.3/5) to allow such binding:<br />
<blockquote><i>a temporary of type “cv1 T1” is created and initialized from the initializer expression using the rules for a non-reference copy initialization (8.5). The reference is then bound to the temporary. If T1 is reference-related to T2, cv1 must be the same cv-qualification as, or greater cv qualification than, cv2; otherwise, the program is ill-formed.</i></blockquote><div class="separator" style="clear: both; text-align: left;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCcbKCmjYNx_BM-az6nrrNVS_Y0rfdmFKdYyOW9pLv_mpIj4jzcSbZ8CtFATlC9GPuqamkfGxcuZ06cHWGl5892hrY5KF-GkV18lleDVV8wfGsApob7o9rqo2KXSRNHihwX485wR2l8wd6/s1600/Screen+shot+2011-08-08+at+23.13.12.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCcbKCmjYNx_BM-az6nrrNVS_Y0rfdmFKdYyOW9pLv_mpIj4jzcSbZ8CtFATlC9GPuqamkfGxcuZ06cHWGl5892hrY5KF-GkV18lleDVV8wfGsApob7o9rqo2KXSRNHihwX485wR2l8wd6/s320/Screen+shot+2011-08-08+at+23.13.12.png" /><br />
Reference bound to temporary</a><script src="https://gist.github.com/1132856.js">
</script></div><br />
The temporary variable <code>_Tmp</code> is created, and the <i>rvalue-expression</i> result <code>_R</code> is copied. Finally the reference is bound to the local temporary <code>_Tmp</code>. This is just a different variety of the same copy elision in the previous post, where there returned <code>_R</code> and the temporary variable <code>_Tmp</code> are merged together. Then the lifetime of the temporary is extended until the reference goes out of scope. Conceptually, the temporary is equivalent in this context to a local variable injected by the compiler, and will have that same lifetime. There are interesting details that make a difference, but that is something for another day.<br />
<br />
<b>Pass-by-value or reference</b> <small><i>when you do need to make a copy</i></small><br />
<br />
Going back to the <code>g1</code> and <code>g2</code> example, and assuming that the caller has a local object to pass in, the cost of either solution is roughly equivalent. As in the case of <code>f</code> the compiler will create a copy (<code>arg</code>) when calling <code>g2</code>, but because that copy is needed, the cost is equivalent to passing by reference and then copying into <code>copy</code> inside <code>g1</code>. There is no extra cost in passing by value, but is there any advantage?<br />
<br />
There can be under some circumstances. If the caller does not pass a variable, but rather the result of an rvalue expression, in the first case the reference will be bound to the result of the rvalue expression, and as we just saw that implies copying to another temporary <code>_Tmp</code>. The compiler will elide that copy and pass the reference into <code>g1</code> which will then copy it into <code>copy</code>.<br />
<br />
On the other hand, with <code>g2</code>, the compiler can merge the result of the rvalue expression <code>_R</code> with <code>arg</code>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizLZSvryYjSdfUUB62WcgDEeqv13eABZTQgZi_tsnpKebsej5giXuqLBFzlcWngUuOe6qICOeuI-zYTDrwpYxdGFKwHzuGkr4FpHSl51WQHo_n6jhuUJnBZqdCvSz1G8SIpqVY252r6Anx/s1600/cref_arg.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizLZSvryYjSdfUUB62WcgDEeqv13eABZTQgZi_tsnpKebsej5giXuqLBFzlcWngUuOe6qICOeuI-zYTDrwpYxdGFKwHzuGkr4FpHSl51WQHo_n6jhuUJnBZqdCvSz1G8SIpqVY252r6Anx/s320/cref_arg.png" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizLZSvryYjSdfUUB62WcgDEeqv13eABZTQgZi_tsnpKebsej5giXuqLBFzlcWngUuOe6qICOeuI-zYTDrwpYxdGFKwHzuGkr4FpHSl51WQHo_n6jhuUJnBZqdCvSz1G8SIpqVY252r6Anx/s1600/cref_arg.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"> <img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsG2yQoRyS6EG1BGWAfylbRK0-c0Xdn7gFEiQ74xKs9JQsE_tofLcROcD4UGTSlAY_2VvK4mOeTudrr_0l6buLL4UN3NiUQZsvAqLEKlDv7yi8_ZnmHue-webrx_Trn1He7gtZhuW3vsYW/s320/value_arg.png" /><br />
</a></div>By using value semantics in the function signature we are providing the compiler with extra information about the semantics of our function. When the compiler processes our caller function, it can construct the temporary in place of the argument and avoid the cost of the copy. More information to the compiler usually means greater chances of optimization, and this is one such case.<br />
<br />
At no time during the whole discussion we have actually dealt with the definition of the <code>type</code> class, whether it is a small type like a <code>std::pair<int,int></code>, a large type with automatic storage like <code>std::array<int,1000></code> or an object that manages dynamically allocated resources like <code>std::vector<int></code>, the copy elision will be performed, and we will get the exact same optimization.<br />
<br />
The compiler will elide copies when returning from a function and when calling a function that takes the argument by value, but can it do both?<br />
<br />
<b>Returning an argument passed by value</b><br />
<br />
Sadly it cannot. The situation is simple to understand. The calling convention will determine the location of the argument and the returned value from the function, the compiler cannot place the argument and the returned value on the same memory location.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYGVPU93-ISLWWAJWAEklprsztfqrPUBLChT7vLCRxPvb9s9i_Jp6qYJBUroVd1lrbGkR1FAJ3YmwWUZE-HwXbwidE7_D_R-eMo2iDcqjLpXOKzUT2yj4p32M-gv77qTanyqCH4sZ4PRKV/s1600/ra_unopt.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYGVPU93-ISLWWAJWAEklprsztfqrPUBLChT7vLCRxPvb9s9i_Jp6qYJBUroVd1lrbGkR1FAJ3YmwWUZE-HwXbwidE7_D_R-eMo2iDcqjLpXOKzUT2yj4p32M-gv77qTanyqCH4sZ4PRKV/s1600/ra_unopt.png" /><br />
<small>Before copy elision</small></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWCPPrJzoUTcrH0c7N4l0GoDSOKyWsAUjCifXgfx6GBBvxEEgvWiswTc1YHtY6FN0Py1k1oBBD1BDGnfsr4V0l1UC3AxfYAsobtFD_npHh8izhwwni4Wfrcom_RB6EjHuuhKAVVTJQTZTg/s320/ra_opt.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"> <img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWCPPrJzoUTcrH0c7N4l0GoDSOKyWsAUjCifXgfx6GBBvxEEgvWiswTc1YHtY6FN0Py1k1oBBD1BDGnfsr4V0l1UC3AxfYAsobtFD_npHh8izhwwni4Wfrcom_RB6EjHuuhKAVVTJQTZTg/s320/ra_opt.png" /><br />
<small>After copy elision</small></a></div><script src="https://gist.github.com/1138478.js"> </script><br />
<br />
In the code above, even if only one object is really <i>alive</i> at any given point (excluding the copying), the compiler cannot optimize those the argument to the function with the return statement. While the current standard does not treat all cases for copy elision, the latest draft (n3290) of the upcoming standard explicitly states this in §12.8/31:<br />
<blockquote><i>[...]This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies):<br />
<br />
-- in a <b>return statement</b> in a function with a class return type, when the expression is the name of a non-volatile automatic object (<b>other than a function</b> or catch-clause <b>parameter</b>) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value</i></blockquote><br />
The next open question is whether this is the best we can hope for. After all we <i>know</i> that we only need one object in that program, but the language cannot help in removing the extra object and the potentially expensive associated resource copying. We cannot avoid having two objects in the program, but in specific use cases, if the cost of copying the object is not the object itself, but resources managed by it (think dynamically allocated memory), we still have a escape path: <i>move semantics</i>, but that's a story for another day...<br />
<br />
<small>Looking back at the <a href="http://definedbehavior.blogspot.com/2011/07/operator-overloading.html">number</a> interface, what's the deal with <code>operator+</code>? Why would it be defined as it is (take first argument by value, return by value) if that copy cannot be elided? Is it not better to pass by reference the first argument, create the local variable on which to operate and then return it?</small>David Rodríguezhttp://www.blogger.com/profile/02117106672577313085noreply@blogger.com2tag:blogger.com,1999:blog-6987514089363722750.post-21626384335266390152011-08-03T04:40:00.001-04:002011-08-11T15:01:32.898-04:00Value semantics: NRVOC++ is a language with value semantics. It was designed so that user defined types will behave in the same ways that primitive types do. This offers advantages, but also imposes burdens on development: programmers have to <i>implement</i> those semantics, and it also falls in the programmers to decide how parameters are passed in and out of functions and the impact that has.<br />
<br />
With the language being designed with <i>value</i> semantics in mind, you would imagine that some optimizations are in place to avoid unneeded processing. In this category you can find things like <i>[Named] Return Value Optimization</i>, or <i>copy elision</i> in the current standard, or <i>move semantics</i> in the upcoming C++0x. Understanding what they mean, and when the compiler can or cannot apply them will improve efficiency and readability in the code.<br />
<a name='more'></a><br />
There are quite a few of articles, blogs and what not about how to make your code more efficient. Most of them are good and offer sound advice, but you find out after a while that even the best advice is often misinterpreted. And sometimes the advice is <b>wrong</b> or at the very least misleading (you can try and find some in my last <a href="http://definedbehavior.blogspot.com/2011/07/operator-overloading.html">post</a> now as an exercise). As important as the advice is understanding under which circumstances it applies, and when it doesn't. But lets start from the beginning, what does NRVO mean?<br />
<br />
<b>[Named] Return Value Optimization</b><br />
<br />
We can start with a small code sample, and a drawing of the objects in the program:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYOqnNm-eH2Uy1E64P3UuW7n0UN8v6fbD0XmHfHMKRaJ46sVx9uskVj1QKq7E05MZ0g2A1OT0XoE2WwOtd4cl2VizTbJk767Fp9IRVCbJIbw0GweOyy0A0zF8JDMXCv2cuGCVWu_3dLnF3/s1600/RVO.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYOqnNm-eH2Uy1E64P3UuW7n0UN8v6fbD0XmHfHMKRaJ46sVx9uskVj1QKq7E05MZ0g2A1OT0XoE2WwOtd4cl2VizTbJk767Fp9IRVCbJIbw0GweOyy0A0zF8JDMXCv2cuGCVWu_3dLnF3/s320/RVO.png" /><br />
<small>_R: return value</small></a></div><script src="https://gist.github.com/1121268.js"> </script><br />
The exact contents of <code>type</code> do not really matter much, the fact is that it is a user defined type which might potentially be expensive to copy. The drawing on the right represents the layout of the objects that exist in the program. The blue box represents the <code>main</code> function, where <code>X</code> resides. The grey box represents the code in <code>function</code>, where variable <code>Y</code> is created. According to the standard, the <code>return</code> statement <i>copies</i> the value from <code>Y</code> to <code>_R</code> the <code>returned</code> object, an agreed location where the data is to be handed from <code>function</code> to its caller. Where those objects are really depends on the calling convention, you can think of them as decreasing addresses in the stack if you wish, but what matters is that they <i>are</i> somewhere, and data must be copied from one to the other. <br />
<br />
The standard explicitly allows the implementation to avoid the creation of temporaries, including <code>_R</code>. But how is that done? Actually quite simple. When processing <code>function</code> the compiler knows that <code>Y</code> has as only purpose in life to serve as the seed from which <code>_R</code> is copied, and the lifetimes of the two objects are intimately bound: destruction of <code>Y</code> and construction of <code>_R</code> are basically simultaneous. The compiler can <i>avoid creating two separate objects</i>, and just use the same memory location for both.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTug2w1G6M8YHLt9eGfN1U_maubkusaVx9SR52fMWt7nYuzXBBfz8zZk_jCKs0T0gX2UxEfSkL8o62WGegKpHr5H6sQR0agP7de-eB196-uvDMLP9DqhQztfubssWPZ2eGevOnte6BAPR1/s1600/RVO2.png" imageanchor="1" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTug2w1G6M8YHLt9eGfN1U_maubkusaVx9SR52fMWt7nYuzXBBfz8zZk_jCKs0T0gX2UxEfSkL8o62WGegKpHr5H6sQR0agP7de-eB196-uvDMLP9DqhQztfubssWPZ2eGevOnte6BAPR1/s320/RVO2.png" /><br />
<small>_R and Y are the same object</small></a></div>The first question that comes to mind, is <i>which</i> object is not created, <code>Y</code> or <code>_R</code>. None, or both of them, or maybe it does not matter. <code>_R</code> cannot be removed from the program, as it is a contract with the callers of the program its location is outside of the control of the compiler when processing the function. But all the code in <code>function</code> uses <code>Y</code> so it cannot be removed either, unless the object called <code>Y</code> inside <code>function</code> is <i>located</i> over the agreed location of the <code>_R</code> object. The single object is both <code>Y</code> and <code>_R</code>.<br />
<br />
The case of RVO is similar, in the case where the object that is being returned by the function is a temporary itself (without a name). The fact that the temporary does not have a name does not mean that it does not exist, it will take the place of <code>Y</code> in the discussion above.<br />
<br />
<b>When can the compiler apply this optimization</b><br />
<br />
The (unnamed) Return Value Optimization can basically applied mostly <i>anywhere</i>. The creation of the temporary inside <code>function</code> can be done in place of the returned object. In the more complex case of Named-RVO, it all depends on how the function is implemented. Compilers are quite smart and can apply the optimization in many different cases, but not always. <br />
<br />
To perform the optimization, the compiler needs to know that <code>Y</code> will be returned <i>before</i> deciding the location of the object, so that it can match <code>_R</code>. In the most general case, this means that if a function has a single return statement, or all return statements refer to the same named variable (or possibly a temporary), then the compiler can merge <code>Y</code> with <code>_R</code> into a single object.<br />
<br />
<script src="https://gist.github.com/1121412.js"></script>In this case, the compiler must create both variables <code>X</code> and <code>Y</code> and it must do so before calling <code>should_return_first</code>. Until that function returns the compiler does not know which of the objects is to be returned. It cannot merge either with <code>_R</code>.<br />
<br />
<script src="https://gist.github.com/1121421.js"></script>In this last case, there are two local objects might be returned by the function, but the compiler does know which of the two objects will be returned, and can turn the code into the equivalent:<br />
<br />
<script src="https://gist.github.com/1121431.js"></script>In which in each branch of the <i>if</i> the compiler can place either <code>X</code> or <code>Y</code> in the place of <code>_R</code> (for what matters, the compiler might even be able to avoid the creation of the other variable). Still, your best chance is to keep code simple, create the objects only when you need them and keep your functions simple for the compiler to analyze.<br />
<br />
<b>What about the receiving end?</b><br />
<br />
Originally we started with three objects and the compiler optimized with NRVO one of them out. But what about the other? In our program we only need one object, we added a function to factor out some of the complexity into its own reusable piece of code, but we do not want to pay for an extra <code>type</code> object that we do not need.<br />
<br />
As with RVO, when processing the caller, and in this particular case where the temporary object <code>_R</code> is used to <i>copy-construct</i> a local object, the compiler can follow the same line of reasoning: since the only purpose of the temporary is to serve as the source for <code>X</code> and the lifetimes only overlap during the copy, it can merge <code>_R</code> with <code>X</code>. Now or program has a single object:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga-Ng_ng-7qQP8mtl03xjo1BQ7ARIX6tMk-AyV3He2xNCGNYbjkFkvvP3Ni2-l8CiXZKCF5cjihppn119324WovFQstF0ert5EX3WlWeOFgop30ds4Cq5Cwua8J3GnQ0RnClOWaMTJMRIg/s1600/RVO3.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="64" width="153" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga-Ng_ng-7qQP8mtl03xjo1BQ7ARIX6tMk-AyV3He2xNCGNYbjkFkvvP3Ni2-l8CiXZKCF5cjihppn119324WovFQstF0ert5EX3WlWeOFgop30ds4Cq5Cwua8J3GnQ0RnClOWaMTJMRIg/s320/RVO3.png" /></a></div><br />
<b>Summing it up</b><br />
<br />
C++ is a language with value semantics, and that means that there might be potentially many objects in your program being copied here and there, in particular across function calls. This does not mean that you should not factor out code into functions, or that you should refactor your function signatures to avoid return copies in favor of references, this might actually have a negative impact in behavior. The compiler is there to help you avoid the costs of copies, and it does a good job at it. <i>Never</i> depend on side effects from copy constructors, as small changes in the code inside a function might allow or inhibit NRVO.<br />
<br />
Value semantics is a hot topic in the language, more so with the inclusion of <i>r-value-references</i> and <i>move semantics</i> to the upcoming standard. Expect to read more on the subject. <br />
<br />
<small>What about arguments to functions (rather than returned values)? Can copies be optimized there? Under which circumstances? Can the interface of the <code>number</code> type in last week's <a href="http://definedbehavior.blogspot.com/2011/07/operator-overloading.html">post</a> be improved to allow for some optimization? Is there anything that won't help?</small>David Rodríguezhttp://www.blogger.com/profile/02117106672577313085noreply@blogger.com3tag:blogger.com,1999:blog-6987514089363722750.post-58133918062600889482011-07-28T18:33:00.004-04:002011-08-02T08:06:59.529-04:00Operator overloading, OO the C++ wayIf you question around about a mainstream object oriented language, most people will point to Java, or C#. Sure that C++ has classes, and objects, inheritance, polymorphism... but it's not <i>really</i> object oriented, there are still non-member functions, and that is just so no-OO. Or is it?<br />
<a name='more'></a><br />
There are different programming paradigms, some of them better suited for some problems than others. You can pretty much solve all problems with any paradigm, but some paradigms help modeling the domain of the problem easier than others. What is important to remember is that programming is not about the tools you use, programming is about solving a problem.<br />
<br />
In pure OO everything is an object, and operations are executed on one object and each object has a set of methods that form its interface. In C++ not everything is an object, and the <a href="http://www.gotw.ca/publications/mill02.htm">interface</a> of a type is not just the set of methods, but it also includes the set of free functions that are defined in the same namespace<br />
<br />
From a design standpoint the question is how does the operations in the domain map to the language? Does every operation belong to a particular type? Or is there space for free functions?<br />
<br />
<b>Study case</b><br />
<br />
Designing a numeric type (be it a biginteger, a decimal or just a simple integer type with a range larger than that those available in your platform) is a good exercise. The domain is well understood and we can focus on the design of the interface. We don't even need to think of names for the operations, overloadable operators fit the domain perfectly. The C++ language allows for overloading of most operators both as a member function and a free function, so it will not force a decision upon us.<br />
<br />
<b>Creating a number</b><br />
<br />
For starters, we want to be able to construct our <code>number</code> and we want to allow conversions from other arithmetic types, for simplicity just consider <code>double</code>. The compiler can convert any integral or floating point number into double for us, and this will enable creation of <code>number</code> from any other arithmetic type.<br />
<br />
<script src="https://gist.github.com/1112619.js"> </script><br />
Because the first constructor can be used to perform implicit conversions from <code>double</code> there is no need to provide an assignment operator that takes a double, it will work out of the box. First the right hand side will be converted, and then the assignment operator will be executed.<br />
<br />
Now we can add some arithmetic operations, using addition as a pattern. In the domain (math) addition takes two elements and produces a third unrelated element that is the result of the operation. In programming it is common for efficiency reasons to provide a <i>+=</i> operator that behaves like addition but stores the result in the first element. Then we can define regular addition in terms of the previous <code>+=</code>: adding two numbers together can be expressed as making a copy of one of them and then applying <code>+=</code> to that copy with the second value.<br />
<br />
<b>Interface of <code>operator+=</code> and <code>operator+</code></b><br />
<br />
The <code>+=</code> operator is a natural candidate for a member function. The operation <i>is</i> applied to a particular instance, it is a feature of that instance. Then we have <code>operator+</code>. As described above, addition takes two elements and yields a third element with the aggregate value. It is not more of an operation on the first argument than it is on the second, so there is no compelling reason to make it a member function, and we should prefer a free function as that treats both arguments similarly.<br />
<br />
<script src="https://gist.github.com/1112621.js"> </script><br />
The signature of <code>operator+</code> follows a common idiom when overloading operators, but for now just focus on the fact that it is a free function.<br />
<br />
<b>Differences between a member function and a free function when overloading</b><br />
<br />
The main difference is how overload resolution is performed. When the operator is implemented as a free function, both arguments have equal standing with respect to the compiler, any conversion that can be applied to the right hand side can also be applied to the left hand side.<br />
<br />
In the member function case, the two arguments don't have equal standing, the left hand side argument <b>must</b> be of the type that implements the operator. The compiler is able to perform implicit conversions on the right hand side, but the left hand side is sacred.<br />
<br />
The same decisions we made in the design are translated into the code: when we implemented <code>operator+</code> as a free function claiming that the operation is no more of the first argument than the second, we got symmetry from the compiler. When we decided that <code>operator+=</code> was a member function we got asymmetry, it can only be called on objects of type <code>number</code>.<br />
<br />
<script src="https://gist.github.com/1112627.js"> </script><br />
You can argue whether C++ is better or worse at object orientation than other languages, but there are certain things that are harder to model while tied to the constraints of <i>everything is an object and all operations are methods</i>.<br />
<br />
I have kind of hand waved over quite a few things of the design. One of the things left aside are the <a href="http://en.wikipedia.org/wiki/Rule_of_three_(C%2B%2B_programming)">Rule of the Three</a> (if the implicitly generated copy constructor and assignment were not good enough, we probably need a destructor, but not having provided an implementation I just ignored it). The signature of the operators are interesting by themselves (can you think of anything I should have done differently? Assume that this is a big number implementation that has to manage some expensive resource. Drop me an email at <i>definedbehavior</i> at <i>gmail</i> dot <i>com</i>), as is the effect of friendship on this problem, note that we do not <i>need</i> it, we can implement <code>operator+</code> on top of the existing member <code>operator+=</code>, but maybe it could help us somehow? But that's a story for another day.David Rodríguezhttp://www.blogger.com/profile/02117106672577313085noreply@blogger.com0tag:blogger.com,1999:blog-6987514089363722750.post-91683527742280579672011-07-23T12:28:00.003-04:002011-07-24T18:45:10.735-04:00Implementating vs. understandingSolving a problem and actually understanding what the problem is do not always go hand by hand. Many times not just the problem eludes understanding, but even the solution we just wrote.<a name='more'></a><br />
Sometimes this is the case when we are dealing with an elusive bug that only happens under very specific conditions, and by the time you realize that something is wrong it is already too late. And then in desperation you change code here and there or rewrite a piece of code... and the problem seems to go away (<i>has it really or is it just playing hide and seek?</i>)<br />
<br />
But that can also happen with simple problems. I was asked, out of the blue as a thinking exercise, to provide a solution to a simple problem: Given a single linked list, provide a function that reverses the list in place. Oh, well, that is an easy problem to solve... should have a solution in 20 seconds.<br />
<br />
Imagine a list of nodes, linked by a <code>next</code> pointer, and finished in a NULL pointer. To reverse each node you just need to keep pointers to the previous node, the node and the next node, then you make it point back to the previous node, and you advance the list pointer and you<br />
<div style="text-align: right;"><i>cannot see the forest for the trees</i></div><br />
Having to provide an answer in a few seconds has that effect, you work out the details, forget the problem. And you produce a solution: <br />
<br />
<script src="https://gist.github.com/1100763.js"></script><br />
Then the next day I was cycling to work, and I thought that lists and functional programming tend to go hand by hand, so how would a functional programming solution to the problem look like?<br />
<br />
<script src="https://gist.github.com/1100765.js"></script><br />
This solves the problem, but not efficiently. The recursion can turn to be expensive, can we turn this into <i><a href="http://en.wikipedia.org/wiki/Tail_call">tail recursion</a></i>? Tail recursion is good, it avoids the need for the call stack providing a restricted form of <i><a href="http://en.wikipedia.org/wiki/Continuation-passing_style">continuation passing style</a></i> and the compiler can avoid holding to the stack. Compilers can turn tail recursion into much cheaper loops.<br />
<br />
<script src="https://gist.github.com/1100773.js"></script><br />
To avoid holding to state until the function returns we can just pass that state in. The first argument is the input list, the second argument the reversed list. We take one element from the input list and put it at the beginning of the reversed list. When the input list is empty, just return whatever is already reversed. I added an extra function so that the signature matches that of the original problem.<br />
<br />
<b>Epiphany</b><br />
<br />
Wow, now I understand the problem and I understand the original solution. The problem is not moving pointers around. The solution is to iteratively remove one element from the head of the input list and insert it at the head of the reversed list.<br />
<br />
<script src="https://gist.github.com/1100774.js"> </script><br />
When you have a problem that can be stated in simple terms, the answer should be explainable in similar terms. High level languages have that characteristic, they let you focus on the problem, rather than the nuisances of the implementation. If you have to explain the answer as detailed technical steps, you don't really understand it.David Rodríguezhttp://www.blogger.com/profile/02117106672577313085noreply@blogger.com0tag:blogger.com,1999:blog-6987514089363722750.post-64473204859855302942011-07-21T18:58:00.005-04:002011-08-11T16:11:16.960-04:00Improving conversions of std::pair<T,U>sAs we already <a href="http://definedbehavior.blogspot.com/2011/07/i-decided-to-start-writing-few-months.html">saw</a>, even if the implementations are more permisive, the standard mandates that conversions from <code>std::pair<></code> types should use only <i>implicit</i> conversions. That solves only half of the problem: when two types are implicitly convertible, then <code>std::pair<></code> containing those types are also implicitly convertible.<br />
<br />
But what about <i>explicit</i> conversions? Do we need to fail there? It would seem appropriate if, given two types that are <i>explicitly</i> convertible, we allowed <i>explicit</i> conversions of the <code>std::pair</code>.<br />
<a name='more'></a><br />
From here on, we digress from the standard. This is just a simple example of how to use <a href="http://en.wikipedia.org/wiki/Substitution_failure_is_not_an_error">SFINAE</a><sup>(1)</sup>. As this is the first post with SFINAE, it will be longer than I expected.<br />
<br />
<b>The problem</b><br />
<br />
What we would like is to allow <i>implicit</i> conversions of <code>pair</code>s of different types when the types are <i>implicitly</i> convertible pairwise, and also allow <i>explicit</i> conversions of <code>pair</code> when the respective types are not <i>implicitly</i> convertible. If only one of the types can be <i>implicitly</i> converted, we will require an explicit conversion for the <code>pair</code><br />
<br />
<b>The solution</b><br />
<br />
The solution for the problem is providing two different constructor templates that allow for the conversion. The first one of them will be implicit and will only work for types that are implicitly convertible, while the second one will be marked explicit and will be available whenever the first one is not. If both where available for the same combination of types, the compiler would fail to process the call with an <i>ambiguity</i> error. Because SFINAE requires that the error/failure is during the substitution phase, we will need to change the signatures of the constructors, but we will aim to maintain compatibility of user code.<br />
<br />
<b>Detect implicit convertibility</b><br />
<br />
This is probably the simplest part of the problem. We need a template with two type arguments, that offers a boolean constant <code>true</code> when the types are implicitly convertible.<br />
<pre class="prettyprint">template <typename From, typename To>
class implicit_convertible {
typedef char (&yes)[1];
typedef char (&no)[2];
static yes test( To );
static no test( ... );
public:
static const bool value
= sizeof( test( *(From*)0 ) ) == sizeof( yes );
};
</pre>We create two types of different sizes <code>yes</code> and <code>no</code>, and we define two static functions, the first of which takes an argument of the destination type. The second is an <i>ellipsis</i> catch-all. We ask the compiler to perform overload resolution for an object of type <code>From</code><sup>(2)</sup> and we check whether the first overload was chosen.<br />
<br />
<b>Implementing SFINAE</b><br />
<br />
For SFINAE to work, the compiler must be able to infer the types of the template from the arguments, but when substituting the inferred types in the templates, the compiler must be unable to produce a correct signature. As a helper we can use a variant of <code>enable_if</code>:<br />
<pre class="prettyprint">template <bool enabled, typename T = void>
struct enable_if {};
template <typename T>
struct enable_if<true,T> {
typedef T type;
};
</pre>The template is specialized for the case where the first argument is <code>true</code>, in which case it defines an internal type <code>type</code>. If the first argument is <code>false</code> the type is an empty class. Now we can use this to define our complete solution:<br />
<br />
<b>Implementation</b><br />
<br />
<pre class="prettyprint">template <typename T1, typename T2>
struct pair {
T1 first;
T2 second;
pair() {}
pair( T1 f, T2 s ) : first(f), second(s) {}
template<typename U, typename V>
pair( pair<U,V> const & p,
typename enable_if< implicitly_convertible<U,T1>::value
and implicitly_convertible<V,T2>::value
>::type* p = 0 )
: first( p.first ), second( p.second )
{}
template<typename U, typename V>
explicit pair( pair<U,V> const & p,
typename enable_if< !implicitly_convertible<U,T1>::value
or !implicitly_convertible<V,T2>::value
>::type* p = 0 )
: first( p.first ), second( p.second )
{}
};
</pre>To be able to use SFINAE we have added an extra argument that uses the <code>enable_if</code> template, if the condition is not met, then <code>typename enable_if< false >::type</code> will not resolve to a type, and the compiler will discard that overload. It is important to note that the conditions must be mutually exclusive, else the compiler will generate an ambiguity error.<br />
<br />
If instead of a constructor we were applying SFINAE to a regular function, we could have used <code>enable_if</code> in the return type, and the signature of the function after substitution would have remained unmodified. In the case of a constructor that is not an option, and we were forced to add the extra argument. By making it a pointer and providing a default value we manage to maintain user code compatibility with <code>std::pair</code>, but we accept calls to the constructor with an extra <code>void*</code> argument. Probably not an issue.<br />
<br />
<sup>(1)</sup><small>Substitution Failure Is Not An Error. That weird acronym stands for the fact that, after lookup is performed and a template is determined to be a candidate for overload resolution if the <b>substitution</b> of the inferred types in the template <b>fails</b>, the compiler will discard that particular template and continue trying the rest of the overload candidates <b>without</b> triggering an <b>error</b>.</small><br />
<br />
<sup>(2)</sup><small>To avoid imposing arbitrary requirements in the type, we create a pointer of the source type initialized to <code>0</code> and we dereference it. This is undefined behavior in real code, but because we are using the whole expression inside a <code>sizeof</code> the expression is not evaluated, it is only used to extract the type <code>From&</code> so we are fine.</small>David Rodríguezhttp://www.blogger.com/profile/02117106672577313085noreply@blogger.com0tag:blogger.com,1999:blog-6987514089363722750.post-74879462178633754092011-07-20T18:41:00.084-04:002011-07-22T04:12:16.576-04:00Conversions of std::pair<T,U>sI decided to start writing a few months ago, but I never actually committed to it. Then I decided a few days ago that I was actually going to do it, and I have spent the last few days trying to decide what would be a good first post. I am still undecided. But today I read an interesting <a href="http://stackoverflow.com/questions/6760668/why-stdpair-calls-explicit-constructors-in-assignment">question</a> in StackOverflow, and decided to write about it.<br />
<br />
The problem Rafał has is with implicit conversions. In his application he has two types that are convertible, but only <i>explicitly</i> convertible. He is surprised that an <code>std::pair<></code>that contains one of those types is <i>implicitly</i> convertible to a <code>std::pair<></code> of that contains the second type.<br />
<a name='more'></a><br />
<b>The problem</b><br />
<br />
Given two types <code>A</code> and <code>B</code>, such that an <i>explicit</i> conversion from <code>A</code> to <code>B</code> is valid, but an <i>implicit</i> conversion is not, why is <code>std::pair<A,A></code> <i>implicitly</i> convertible to <code>std::pair<B,B></code>? <br />
<br />
Well, the answer to this type of question is usually simple: the properties of the types used to instantiate a template do not propagate to the template itself. That is the reason why you cannot pass a <code>std::vector< derived* ></code> to a function that requires a <code>std::vector< base* ></code>, even if you can pass a pointer to <code>derived</code> to a function that takes a pointer to base.<br />
<br />
The only operation in <code>std::pair</code> that allows for conversions is a constructor template. The implementation of that template in the STL shipped with gcc uses the <i>initialization-list</i> to initialize the members of the <code>pair</code>. Because that is explicit initialization, the compiler gladly accepts the code.<br />
<br />
<b>But, is this the case?</b><br />
<br />
No, not really. In this case the standard is by Rafał's side. The description of the behavior for that particular conversion constructor is stated in §20.2.2:<br />
<br />
<blockquote><code>template <typename U, typename V><br />
pair( const pair<U,V> & p );</code><br />
<br />
<i>Initializes members from the corresponding members of the argument, performing <b>implicit</b> conversions as needed.</i></blockquote><br />
The standard seems quite clear in stating that <i>implicit</i> conversions are to be used, so this seems like a bug in the compiler side not adhering to the standard. As always, the interesting question is the why. <br />
<br />
<b>Can we implement that constructor template with the semantics defined in the standard?</b><br />
<br />
We cannot use implicit conversions in the constructor without adding additional requirements to the instantiating types (for example, using default construction and assignment of the arguments). But if our goal is just to abide the standard and reject all calls to that template when an <i>implicit</i> conversion is not available, that is easy. <br />
<br />
Detecting implicit convertibility is simple, and we can add a static assert to the constructor. The con of the approach is that while this will inhibit implicit conversions when the <code>pair</code>'s elements are not implicitly convertible, it does not allow for <i>explicit</i> conversions of the pairs. But that is probably fine. Or <a href="http://definedbehavior.blogspot.com/2011/07/improving-conversions-of-stdpair-s.html">can we do better</a>?David Rodríguezhttp://www.blogger.com/profile/02117106672577313085noreply@blogger.com4