[Un]defined behavior

The rule of the [3] 4: swap

2011-09-08T21:20:00.002-04:00

Object oriented programming focuses on bundling data together with the operations that can be applied to it. Access specifiers allow you to control the invariants of the type, and deterministic destruction of objects provides the glue that makes management of all types of resources homogeneous.

But homogeneous does not mean trivial, and attention has to be paid. In a language with exceptions, special care needs to be taken so that resources don't leak. Resources are not the only problem, though: when we model a domain, we also need to maintain the invariants on top of which our algorithms are built.

Managing resources: The Rule of the Three

One of the widely known rules for resource management is the rule of the three, which according to the Wikipedia article says:

[...] if a class defines one of the following it should probably explicitly define all three: destructor, copy constructor, copy assignment operator

The rule is based on the principle that if any of the three needs to be implemented, it is most probably because the class encapsulates management of a resource for which you need to provide sensible copy-semantics and release in the destructor.

Optionally, if you encapsulate each resource in its own RAII manager, as you should, especially if you have to manage more than one resource, then you can avoid writing the destructor.

Whenever I have encountered the rule of the three, my next thought was for swap, not the one implemented in the std namespace, but rather a handcrafted swap function that provides the no throw exception guarantee and is cheap to run. The main point of providing swap is implementing exception safety by means of the copy-and-swap idiom.

The copy and swap idiom

This is a common idiom to implement exception safety in general, but more importantly in types that manage resources. The idea is that instead of executing the operations directly on the object we can copy the object, perform the operation aside and then, and only after the operation has completed successfully, swap the contents of the two objects. We provide the operation as a single transaction: either it succeeds and modifies the object or it fails and leaves the object as it was before the operation started.

Consider for example the number that we sketched the other day, and assume that it is a big number, usually implemented with a dynamic array of digits¹. Addition of a big number to our existing big number (operator+=) may require growing the array to accommodate a potentially larger number, and performing the operation² on the bigger array.

If while performing the actual operations anything failed and an exception is thrown (there aren't many things that can go wrong with the addition of the individual digits but bear with me), we would have to take care of ensuring that there are no memory leaks and that the object is left in a consistent state. To ensure that no memory is leaked, we should manage the memory using RAII, but our number already implements RAII for the buffer, so why not reuse it?

Instead of managing the memory through a suitable smart pointer we can create a number with automatic storage that is big enough to hold the result. Then we operate on that local temporary³, and if all goes right we exchange the contents of the object and the temporary with a swap function that offers the no throw guarantee.

If an exception is thrown during the operation, the stack will be unwound, the local object will be destroyed and that will release the memory. As a nice side effect, because the operation has not modified the current object at all, we provide the strong exception guarantee.

class number {
    int * data;
    std::size_t digits;
public:
    // other code omitted 
    std::size_t size() const {
        return digits;
    }
    number& operator+=( number const & rhs );
    number& swap( number& rhs ) throw();
private:
    number( std::size_t size ) : data( new int[size]), digits(size) {}
};
number& number::swap( number& rhs ) throw() {
    using std::swap;
    swap( data, rhs.data );
    swap( digits, rhs.digits );
    return *this;
}
number& number::operator+=( number const & rhs ) {
    number tmp( std::max( size(), rhs.size() ) + 1 );
    // actual operations here, might throw
    return swap( tmp );
}
// Offer a free function for convenience
void swap( number & lhs, number & rhs ) {
   lhs.swap( rhs );
}

When swap meets the three

The rule of the three focuses on providing appropriate value semantics to your types while ensuring that resources won't leak. The copy-and-swap idiom, on the other hand, focuses on exception safety: how to implement operations so that they either succeed or fail gracefully (i.e. leaving the state of the original objects intact). They share common ground: both are important for handling exceptions correctly, so you might want to think about implementing both in your user defined types.

The good news is that the cost of implementing both idioms for your type is not greater than implementing the rule of the three alone. Instead of implementing the assignment operator, you can move that effort into implementing swap, and the assignment operator will come for free:

class number {
    int * data;
    std::size_t digits;
public:
    // Rule of the three: copy constructor
    number( number const & rhs ) : data( new int[ rhs.digits ] ), digits( rhs.digits ) {
        std::copy( rhs.data, rhs.data+rhs.digits, data );
    }
    // Rule of the three: assignment operator
    number& operator=( number rhs ) {
        return swap( rhs );
    }
    // Rule of the three: destructor
    ~number() {
        delete [] data;
    }
};

If we had not implemented swap, the assignment operator would have been much more complex than our swap, and harder to get right. Given a choice, implement swap: it will not add development cost, and it will help in many ways.
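To see the pieces working together, here is a minimal, self-contained sketch of the class with copy-and-swap assignment. It makes a few assumptions that are not in the listings above: the size-taking constructor is public and zero-fills the buffer, operator[] is added so the digits can be inspected, and noexcept (C++11) stands in for the throw() specification. The function demo_assignment is a hypothetical helper that exists only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>

// Stripped-down number with just enough members to exercise
// copy-and-swap assignment. Not the full class from the post.
class number {
    int * data;
    std::size_t digits;
public:
    // Assumption: public and zero-filling, to keep the sketch testable.
    explicit number( std::size_t size = 0 )
        : data( size ? new int[size]() : 0 ), digits( size ) {}
    number( number const & rhs )
        : data( rhs.digits ? new int[rhs.digits] : 0 ), digits( rhs.digits ) {
        std::copy( rhs.data, rhs.data + rhs.digits, data );
    }
    ~number() { delete [] data; }

    // Only built-in types are exchanged, so this cannot throw.
    number& swap( number& rhs ) noexcept {
        std::swap( data, rhs.data );
        std::swap( digits, rhs.digits );
        return *this;
    }
    // The copy happens in the by-value parameter. If it throws,
    // *this has not been touched yet.
    number& operator=( number rhs ) { return swap( rhs ); }

    std::size_t size() const { return digits; }
    int& operator[]( std::size_t i ) { return data[i]; }
};

// Hypothetical usage: b receives a deep copy of a.
bool demo_assignment() {
    number a( 3 );
    a[0] = 7;
    number b( 1 );
    b = a;        // old buffer of b is released when the parameter dies
    a[0] = 9;     // must not affect b
    return b.size() == 3 && b[0] == 7;
}
```

Note that self-assignment needs no special case: the parameter is already an independent copy by the time swap runs.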

When not to swap

It is always a good exercise to think before starting to code. While implementing a no-throw swap is in general a good idea, you should consider whether it does make sense in your particular problem.

Copy and swap is a simple generic solution to provide the strong exception guarantee, but it adds the cost of creating a copy of the original object on which to perform the operation. If none of the steps in the algorithm can throw, then you can avoid the cost altogether. Sometimes you cannot provide a swap function that offers the no-throw guarantee, or the swap operation is too expensive. While you should never sacrifice correctness for performance, it might be worth thinking about ad-hoc ways of providing the safety. As always with performance, measure first, and decide whether you even need to consider optimizing later.

He who sacrifices correctness for performance deserves neither

But if you can implement an efficient swap that does not throw, then that should be the very first function to write.

Move semantics

There is quite a bit of fuss going on with the new standard and rvalue-references, and in particular with one of the two primary uses: move semantics. While native support for moving was not present in the C++98/03 standards, people have been moving all along: the copy-and-swap idiom is moving.

The function swap has a stricter set of requirements than moving. When you move from an object a to an object b, the semantics only require that the state of b after the move operation is equivalent to the state of a before the operation and that a is at the very least destructible. On the other hand, swap states a stronger guarantee, the state of each object after the operation is equivalent to the state of the other object before the operation, which by definition fulfills all the guarantees of moving.

In the whole discussion, we have not mentioned the state in which the temporary object is left, and the only operation that is applied to that object is the destructor. We don't care about that temporary at all, we only use it as the source from which to move the result of the operation that was performed aside.

Besides exception safety, as discussed above, swap can also be used to improve performance in your code. For an illustrative example, we can consider performance of the implementation of operator+ mentioned here together with the discussions on copy elision and the final question of whether we could do better. To refresh the memory, the signature of that operator would be:

number operator+( number lhs, number const & rhs );

And the question is: why define it like that if the compiler cannot elide copying the argument to the returned object? The answer is that this signature allows the compiler to optimize where you cannot: avoiding temporaries passed as arguments or a returned object used for initialization cannot be controlled by the programmer. It then leaves us with the opportunity to improve on the only copy that is left by moving from lhs to the returned object:

number operator+( number lhs, number const & rhs ) {
    lhs += rhs;
    number tmp;         // [1]
    swap( tmp, lhs );   // [2]
    return tmp;
}

The code is a bit more cumbersome now, so is it worth it? Well, the first question is whether there is something to improve, and for that copying must be expensive: expensive enough to compensate for the two operations with which we are replacing it. In line [2], swap must be cheap and provide the no throw guarantee (or at least the same guarantee as copy construction, remember: never optimize at the cost of correctness!), and the construction in [1] must be relatively cheap. There are still two objects in the code, but the cost of keeping those two objects might be much smaller than letting the compiler copy for us.

For our number implementation, we could implement the value 0 as a number that holds no memory (data is a null pointer, digits is 0), which will make the construction in [1] equivalent to just two assignments. Then we can use swap to move lhs into tmp, which will require another 6 assignments, which overall amounts to almost no cost. We don't even need to modify the destructor to take this particular state into account, as the delete [] data; will be a no-op when data is null.

This might seem like a forced example just to show how cool swap can be, but it has been implemented in production code, and you might have even used it unknowingly. In the Dinkumware implementation of the STL⁴ types are tagged with a type trait _Move_operation_category that can be used with SFINAE to optimize operations based on this knowledge. In particular, when a std::vector needs to grow, it will allocate the new buffer in memory, and if the contained type has an efficient swap, it will default-initialize the elements in the new buffer, and then use swap to move the values. The effect is that with a std::vector<std::vector<T> > the cost of growing is proportional to the size of the outer vector, regardless of the sizes of the contained vectors, converting an O( N*M ) operation into O(N) (where N is the size of the outer vector, and M the average size of the contained vectors)
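The growth strategy just described can be sketched in a few lines. This is not the Dinkumware code: grow_by_swapping and demo_grow are hypothetical names, and the sketch uses plain new[]/delete[] instead of an allocator, so it only illustrates the idea of default-constructing the new elements and swapping the old ones in.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: moves `count` elements from old_buffer into a
// freshly allocated buffer of new_capacity elements by swapping.
template <typename T>
T* grow_by_swapping( T* old_buffer, std::size_t count, std::size_t new_capacity ) {
    T* new_buffer = new T[ new_capacity ]();    // cheap for empty containers
    for ( std::size_t i = 0; i < count; ++i ) {
        using std::swap;
        swap( new_buffer[i], old_buffer[i] );   // constant time for std::vector
    }
    delete [] old_buffer;                       // destroys the now empty elements
    return new_buffer;
}

// Hypothetical usage: the 1000 ints are never copied, the inner vector
// still owns the very same storage after the outer buffer has grown.
bool demo_grow() {
    std::vector<int>* buffer = new std::vector<int>[2]();
    buffer[0].assign( 1000, 1 );
    int const * storage = &buffer[0][0];
    buffer = grow_by_swapping( buffer, 2, 4 );
    bool same_storage = buffer[0].size() == 1000 && &buffer[0][0] == storage;
    delete [] buffer;
    return same_storage;
}
```

The cost of the loop depends only on the number of outer elements, which is where the O(N) figure comes from.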

swap and moving in the future

The recommendation of implementing swap for your types will probably fall out of favor in the near future. As developers embrace the upcoming standard they will start using move-constructors and move-assignment instead of resorting to swap as a poor man's move operation, taking swap away from the spotlight that it deserved but never really enjoyed.

Not only will swap not be used for moving, but move semantics will be used to trivially implement efficient swapping of resources. The two lines marked with [1] and [2] in the operator+ above disappear and yet we maintain the same behavior of the code. In C++0x, even std::swap will use rvalue-references when the type implements move semantics. Interestingly, the simplest implementation of move-assignment will probably be that implementation of the swap function we will not write.
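As a rough preview, and assuming a C++11 compiler, the move operations for number could be written in terms of swap as follows. This is a sketch rather than a recommendation: copying is deleted only to keep the listing short, the public constructors are an assumption, and demo_move is a hypothetical helper.

```cpp
#include <cstddef>
#include <utility>

class number {
    int * data;
    std::size_t digits;
public:
    number() noexcept : data( nullptr ), digits( 0 ) {}   // the empty "zero"
    explicit number( std::size_t size )
        : data( new int[size]() ), digits( size ) {}
    ~number() { delete [] data; }

    number( number const & ) = delete;   // omitted for brevity in this sketch

    // Move construction: start empty, then steal the state of rhs.
    number( number && rhs ) noexcept : data( nullptr ), digits( 0 ) {
        swap( rhs );
    }
    // Move assignment is just swap: rhs leaves with our old state,
    // which its destructor will release.
    number& operator=( number && rhs ) noexcept { return swap( rhs ); }

    number& swap( number& rhs ) noexcept {
        std::swap( data, rhs.data );
        std::swap( digits, rhs.digits );
        return *this;
    }
    std::size_t size() const { return digits; }
};

// Hypothetical usage: ownership travels from a to b to c.
bool demo_move() {
    number a( 5 );
    number b( std::move( a ) );
    number c;
    c = std::move( b );
    return a.size() == 0 && b.size() == 0 && c.size() == 5;
}
```

With these in place, operator+ can simply return lhs and let the move constructor do what lines [1] and [2] did by hand.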

But all that is a story for another day, this post is already too long and you don't want to get bored...

P.S.: It's taken me a long time to write this post, and it will take me some time to write the next as I will be doing some traveling (There I go NY!). I hope this huge post makes up for the delay and hope to see you around!

----
¹ Here digit does not denote a decimal or hexadecimal digit, but in most cases a much larger entity. A common approach in big number libraries uses half of the bits in the native memory word (16bit in a 32bit architecture) for a digit. That ensures that we can apply any basic operation to a digit and the result will fit in a word. Then we can post-process all the digits normalizing the data to maintain the invariant that each element in the array never uses more than half the bits. That is, we can process all digits separately without overflows, and then fix the carryovers in a single pass.

² A dynamic array? Manually implemented? Really? In production code, this would be implemented with a std::vector<int>. Using a vector eases some of the problems of manual management, and many of the points in the post still apply, but the example seems more graphical with manual memory management.

³ Not in the C++ sense, only a temporary in the sense that it's a short lived object (only within the scope of this operation)

⁴ But who exactly is Dinkumware? Dinkumware is the company that licenses the STL implementation that is shipped with Visual Studio, so if you have used Visual Studio 2003 or later, they are the implementors of the STL you used.

Dynamic dispatching to template functions

2011-08-17T06:11:00.005-04:00

As in the first post I wrote, less than a month ago, this time I am going to visit another stackoverflow question. The user has a class template parametrized by an integer and wants to write a non-templated function that takes an integer argument and dispatches the call to the appropriate instantiation of the template.

Basically the problem to solve is:

template <int N>
struct A {
   static void f();
};
void dispatch( int i ) {
    A<i>::f();              // !!!
}

This will not compile: the argument i is a runtime value (it might come from user input, a file, the network...), but templates are processed much earlier, by the compiler.

A first approach to the problem would be manually creating a lookup table:

typedef void (*fptr)();
fptr lookup[] = { &A<0>::f, &A<1>::f, /* ... */ };

And it will work: at compile time all of the instantiations of the template are created and the function pointers stored in the array, so dispatch only needs to use that lookup table and call lookup[i]();. But it is a little cumbersome to write if there are more than a couple of possible values.
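For reference, a complete version of the manual table might look like the sketch below. The body of A<N>::f, the last_called variable, the range check and the two demo helpers are assumptions added so that the effect of the dispatch can be observed; none of them come from the original question.

```cpp
#include <stdexcept>

int last_called = -1;                      // hypothetical: records which f ran

template <int N>
struct A {
    static void f() { last_called = N; }   // hypothetical body
};

typedef void (*fptr)();
fptr lookup[] = { &A<0>::f, &A<1>::f, &A<2>::f };

void dispatch( int i ) {
    const int count = sizeof lookup / sizeof lookup[0];
    // A runtime value needs a runtime check: there is no A<i> for
    // indices outside of the table.
    if ( i < 0 || i >= count )
        throw std::out_of_range( "dispatch: no instantiation for this index" );
    lookup[i]();
}

// Hypothetical helpers for trying it out.
int dispatch_and_report( int i ) {
    dispatch( i );
    return last_called;
}
bool dispatch_rejects( int i ) {
    try { dispatch( i ); }
    catch ( std::out_of_range const & ) { return true; }
    return false;
}
```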

The problem

Automate the creation of the lookup table so that the user does not need to enter all of the possible values manually.

Approach

Because we are talking about template instantiation, automatic instantiation of the templates and build up of the lookup table must be done at compile time, only lookups will be performed at runtime.

Compile time programming implies working with the template sublanguage, which was not designed to be a programming language. Not being born as a true language, it is a bit awkward to work with, and the set of language primitives that can be used is small: there are no loops, or conditions.

Metaprogramming might be a bit cumbersome, but it is not that hard. It basically boils down to creating a class template that solves the general step of the algorithm and recursively instantiates itself with a smaller subset of the problem, then creating a specialization that represents the stop condition. It is not that different from a regular recursive function, except that the conditions are not checked inside the function, but by the compiler doing pattern matching of the arguments.
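The classic illustration of this pattern, unrelated to the lookup table itself, is a compile time factorial. The general template handles step N in terms of N-1, and the specialization for 0 is the stop condition that the compiler selects by pattern matching:

```cpp
// General step: N! = N * (N-1)!
template <unsigned N>
struct factorial {
    static const unsigned long value = N * factorial<N-1>::value;
};

// Stop condition: matched by the compiler when the argument is 0,
// so the recursion does not instantiate any further templates.
template <>
struct factorial<0> {
    static const unsigned long value = 1;
};
```

An expression such as factorial<5>::value is then a compile time constant, so it can be used as an array size or as another template argument.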

Setup of the problem

First we need to create the lookup table. I don't like having to type complex types often, and arrays of function pointers are not a simple type, so I will just start with a typedef, a constant to represent the size and the lookup array itself:

typedef void (*fptr)();
const int limit = 10;
fptr lookup[ limit ];

Now we just need to fill it with values.

General step: fill the Nth element

In the general step we fill an array of N elements by filling in the Nth element and recursively calling the same function to solve the smaller problem of filling N-1 elements.

template <int N>
struct init_lookup {
    static void init( fptr *lookup ) {
        lookup[ N ] = &A<N>::f;
        init_lookup<N-1>::init( lookup );
    }
};

Not that hard, we just need to call init_lookup<N>::init( lookup ); and that will initialize all elements from the Nth to the zeroth... oh, well, not really. We did not add a stop condition, so let's do it.

The stop condition

The stop condition is a specialization that will be matched by the compiler, in our case when the problem to solve has a single element (0). In that case we want to fill the element in, but we do not want to continue instantiating the template.

template <>
struct init_lookup<0> {
    static void init( fptr *lookup ) {
        lookup[ 0 ] = &A<0>::f;
        // No recursive call
    }
};

Syntactic sugar

With the solution as implemented the user can just call the appropriate init_lookup<limit>::init function from main and it will be set. But I find it nicer if the lookup table was automatically created without me having to retype the sizes, even if I do have a constant for it. That can be easily achieved by providing a helper function:

template <int N>
void lookup_initialization( fptr (&lookup)[N] ) {
    init_lookup<N-1>::init( lookup );
}

If we change that function signature to return an integer, we can use it to initialize a static variable and we will not need to call it from main.

The final solution

In the final solution I have changed the templates a bit so that the array is passed by reference. I tend to prefer passing arrays by reference as the compiler gets the extra size information. We can then add a static assert to verify that we do not try to write beyond the end of the array. I have also offset the position argument by one so that the specialization is just a stop condition and does not contain any logic.

typedef void (*fptr)();
namespace {
    // L: size of the array
    // N: element to initialize (offset by 1)
    template <int L, int N>
    struct init_lookup {
        static_assert( N <= L, "index beyond the end of the lookup table" );
        static void init( fptr (&lookup)[L] ) {
            lookup[ N-1 ] = &A<N-1>::f;
            init_lookup<L,N-1>::init( lookup );
        }
    };
    template <int L>
    struct init_lookup<L,0> {
        static_assert( L >= 0, "the lookup table cannot have a negative size" );
        static void init( fptr (&lookup)[L] ) {
        }
    };
    template <int N>
    int lookup_initialization( fptr (&lookup)[N] ) {
        init_lookup<N,N>::init( lookup );
        return 0;
    }
}
const int limit = 10;
fptr lookup[ limit ];
static const int xxx_ignored = lookup_initialization(lookup);
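For comparison only, here is how the same table could be generated with a pack expansion instead of recursion. This swaps in a different technique and assumes a C++14 standard library (std::index_sequence), which was not available when this was written. The body of A<N>::f, last_called, make_lookup and dispatch_and_report are hypothetical names used for the illustration.

```cpp
#include <array>
#include <cstddef>
#include <utility>

int last_called = -1;                      // hypothetical: records which f ran

template <int N>
struct A {
    static void f() { last_called = N; }   // hypothetical body
};

typedef void (*fptr)();

// The pack I... expands to 0, 1, ..., limit-1, producing one table
// entry per instantiation without any recursive template.
template <std::size_t... I>
std::array<fptr, sizeof...(I)> make_lookup( std::index_sequence<I...> ) {
    std::array<fptr, sizeof...(I)> table = {{ &A<static_cast<int>( I )>::f... }};
    return table;
}

const int limit = 10;
const std::array<fptr, limit> lookup =
    make_lookup( std::make_index_sequence<limit>() );

void dispatch( int i ) {
    // at() throws std::out_of_range for indices outside of the table;
    // a negative i converts to a huge unsigned value and is rejected too.
    lookup.at( static_cast<std::size_t>( i ) )();
}

int dispatch_and_report( int i ) {
    dispatch( i );
    return last_called;
}
```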

Value semantics: Copy elision

2011-08-10T19:14:00.002-04:00

In the last post we discussed Named Return Value Optimization and one case of copy elision when constructing a variable from an rvalue expression.

But copy elision is not limited to those particular use cases. The question is where else can we take advantage of this optimization, and where the compiler will not be able to optimize away the extra copies.

In the last installment we saw that the compiler can elide copies when transferring data from a function to the returned value ([Named] Return Value Optimization) and that copies can also be optimized away when initializing local variables of automatic storage duration from temporaries. In principle, copy elision can be applied whenever a variable is constructed from a temporary, and that includes function arguments. But first, let's digress a bit into a case where copy elision is taken for granted by most programmers.

Copy elision in function arguments

As with the returned value from a function, the calling convention determines how arguments are passed into functions. So again let's start from a simple example:

Argument passing by value

Because the function may modify arg internally, the compiler must ensure that there are two copies of the object: one for f to use and another for main. Now, the function above does not need a copy, as it does not modify the object, so the common rule is to pass the object by constant reference and avoid the unnecessary copy. The reference itself will incur the approximate cost of a pointer in this case, so if type is expensive to copy we just improved performance.

This is the first advice you get when starting with C++: pass by reference to avoid the cost of copying. But what if you do need to copy anyway? Say that f above had to modify the object as part of the operation, what would be more efficient?
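The functions under discussion can be sketched as follows. The exact signatures are an assumption reconstructed from the surrounding text: type here is a hypothetical stand-in that counts its copy constructions, f comes in a by-value and a by-reference flavour, g1 takes a constant reference and copies internally, and g2 takes its argument by value.

```cpp
// Hypothetical stand-in for `type`: counts copy constructions.
struct type {
    static int copies;
    int value;
    type() : value( 0 ) {}
    type( type const & rhs ) : value( rhs.value ) { ++copies; }
};
int type::copies = 0;

// f only reads its argument.
int f_by_value( type arg )             { return arg.value; }
int f_by_reference( type const & arg ) { return arg.value; }

// g needs a modifiable copy.
int g1( type const & arg ) { type copy( arg ); copy.value += 1; return copy.value; }
int g2( type arg )         { arg.value += 1;   return arg.value; }

// With an lvalue argument both g1 and g2 pay for exactly one copy.
bool demo_lvalue_costs() {
    type x;
    type::copies = 0; f_by_reference( x ); int by_ref = type::copies;
    type::copies = 0; f_by_value( x );     int by_val = type::copies;
    type::copies = 0; g1( x );             int in_g1  = type::copies;
    type::copies = 0; g2( x );             int in_g2  = type::copies;
    return by_ref == 0 && by_val == 1 && in_g1 == 1 && in_g2 == 1;
}
```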

To analyze the two options we have to take a detour and understand what it actually means to bind an rvalue to a constant reference.

Binding an rvalue to a constant reference

In C you cannot take the address of a temporary. It is a safety net around what would most probably be an error: the temporary will be destroyed immediately after the expression completes, and you would be left with a dangling pointer. In a similar way, in C++ you cannot, in general, bind an rvalue to a reference. But sometimes it can be useful. For example, in the case of f above we determined that it would be more efficient to take the argument by reference, and we might want to call it by directly passing the result of an rvalue expression without having to create extra variables. That is why the language adds a special rule (§8.5.3/5) to allow such binding for constant references:

a temporary of type “cv1 T1” is created and initialized from the initializer expression using the rules for a non-reference copy initialization (8.5). The reference is then bound to the temporary. If T1 is reference-related to T2, cv1 must be the same cv-qualification as, or greater cv qualification than, cv2; otherwise, the program is ill-formed.

Reference bound to temporary

The temporary variable _Tmp is created, and the rvalue-expression result _R is copied into it. Finally the reference is bound to the local temporary _Tmp. This is just a different variety of the same copy elision from the previous post, where the returned _R and the temporary variable _Tmp are merged together. Then the lifetime of the temporary is extended until the reference goes out of scope. Conceptually, the temporary is equivalent in this context to a local variable injected by the compiler, and will have that same lifetime. There are interesting details that make a difference, but that is something for another day.
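The lifetime extension can be observed with a small experiment. The tracker type and both functions below are hypothetical names made up for this sketch: tracker counts how many of its instances are alive, so we can check that the temporary returned by make_tracker is still there while the reference is in scope and gone afterwards.

```cpp
// Hypothetical helper type: keeps a count of living instances.
struct tracker {
    static int alive;
    tracker()                  { ++alive; }
    tracker( tracker const & ) { ++alive; }
    ~tracker()                 { --alive; }
};
int tracker::alive = 0;

tracker make_tracker() { return tracker(); }

// Reports how many trackers exist while the reference is bound.
int alive_while_bound() {
    tracker const & ref = make_tracker();   // temporary now lives as long as ref
    (void) ref;
    return tracker::alive;                  // the temporary is still alive here
}   // ref goes out of scope, the temporary is destroyed
```

Calling alive_while_bound() reports one living object, and tracker::alive is back to zero once the function has returned.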

Pass-by-value or reference when you do need to make a copy

Going back to the g1 and g2 example, and assuming that the caller has a local object to pass in, the cost of either solution is roughly equivalent. As in the case of f the compiler will create a copy (arg) when calling g2, but because that copy is needed, the cost is equivalent to passing by reference and then copying into copy inside g1. There is no extra cost in passing by value, but is there any advantage?

There can be under some circumstances. If the caller does not pass a variable, but rather the result of an rvalue expression, in the first case the reference will be bound to the result of the rvalue expression, and as we just saw that implies copying to another temporary _Tmp. The compiler will elide that copy and pass the reference into g1 which will then copy it into copy.

On the other hand, with g2, the compiler can merge the result of the rvalue expression _R with arg.

By using value semantics in the function signature we are providing the compiler with extra information about the semantics of our function. When the compiler processes our caller function, it can construct the temporary in place of the argument and avoid the cost of the copy. More information to the compiler usually means greater chances of optimization, and this is one such case.
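A rough way of checking this on your own compiler is to count the copies. In the sketch below type is a hypothetical copy-counting stand-in, make is an assumed factory that returns a temporary, and g1/g2 follow the signatures assumed in this discussion. The actual numbers depend on the compiler and language version, so no output is shown; with elision enabled, g2 would typically need one copy fewer than g1.

```cpp
// Hypothetical stand-in for `type`: counts copy constructions.
struct type {
    static int copies;
    int value;
    type() : value( 0 ) {}
    type( type const & rhs ) : value( rhs.value ) { ++copies; }
};
int type::copies = 0;

type make() { return type(); }   // assumed source of an rvalue

int g1( type const & arg ) { type copy( arg ); copy.value += 1; return copy.value; }
int g2( type arg )         { arg.value += 1;   return arg.value; }

// Number of copy constructions when the argument is an rvalue.
int copies_g1_rvalue() {
    type::copies = 0;
    g1( make() );     // the explicit copy inside g1 can never be elided
    return type::copies;
}
int copies_g2_rvalue() {
    type::copies = 0;
    g2( make() );     // the temporary may be constructed directly as arg
    return type::copies;
}
```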

At no time during the whole discussion have we actually dealt with the definition of the class type. Whether it is a small type like a std::pair<int,int>, a large type with automatic storage like std::array<int,1000> or an object that manages dynamically allocated resources like std::vector<int>, the copy elision will be performed, and we will get the exact same optimization.

The compiler will elide copies when returning from a function and when calling a function that takes the argument by value, but can it do both?

Returning an argument passed by value

Sadly it cannot. The situation is simple to understand: the calling convention determines the location of both the argument and the returned value of the function, and the compiler cannot place the argument and the returned value in the same memory location.

Before copy elision
After copy elision

In the code above, even if only one object is really alive at any given point (excluding the copying), the compiler cannot merge the argument to the function with the object in the return statement. While the current standard does not treat all cases of copy elision, the latest draft (n3290) of the upcoming standard explicitly states this in §12.8/31:

[...]This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies):

-- in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value
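The exclusion of function parameters can be checked by counting copies once more. In this sketch type is a hypothetical copy-counting stand-in and pass_through is an assumed example of a function that returns its by-value argument. Called with an lvalue, one copy is needed to create arg and a second one to return it, since arg cannot be constructed in the return slot.

```cpp
// Hypothetical stand-in for `type`: counts copy constructions.
struct type {
    static int copies;
    int value;
    type() : value( 0 ) {}
    type( type const & rhs ) : value( rhs.value ) { ++copies; }
};
int type::copies = 0;

// Takes by value, returns by value: the parameter is not a candidate
// for NRVO, so returning it costs a copy.
type pass_through( type arg ) {
    arg.value += 1;
    return arg;
}

// Copy constructions for `type y = pass_through( x );`
int copies_pass_through_lvalue() {
    type x;
    type::copies = 0;
    type y = pass_through( x );
    (void) y;
    return type::copies;
}
```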

The next open question is whether this is the best we can hope for. After all we know that we only need one object in that program, but the language cannot help in removing the extra object and the potentially expensive associated resource copying. We cannot avoid having two objects in the program, but in specific use cases, if the cost of copying the object is not the object itself, but resources managed by it (think dynamically allocated memory), we still have an escape path: move semantics, but that's a story for another day...

Looking back at the number interface, what's the deal with operator+? Why would it be defined as it is (take first argument by value, return by value) if that copy cannot be elided? Is it not better to pass by reference the first argument, create the local variable on which to operate and then return it?

Value semantics: NRVO

2011-08-03T04:40:00.001-04:00

C++ is a language with value semantics. It was designed so that user defined types behave in the same way that primitive types do. This offers advantages, but also imposes burdens on development: programmers have to implement those semantics, and it also falls on the programmer to decide how parameters are passed in and out of functions and the impact that has.

With the language being designed with value semantics in mind, you would imagine that some optimizations are in place to avoid unneeded processing. In this category you can find things like [Named] Return Value Optimization, or copy elision in the current standard, or move semantics in the upcoming C++0x. Understanding what they mean, and when the compiler can or cannot apply them will improve efficiency and readability in the code.

There are quite a few articles, blogs and whatnot about how to make your code more efficient. Most of them are good and offer sound advice, but you find out after a while that even the best advice is often misinterpreted. And sometimes the advice is wrong or at the very least misleading (you can try and find some in my last post now as an exercise). As important as the advice itself is understanding under which circumstances it applies, and when it doesn't. But let's start from the beginning: what does NRVO mean?

[Named] Return Value Optimization

We can start with a small code sample, and a drawing of the objects in the program:

_R: return value

The exact contents of type do not really matter much, the fact is that it is a user defined type which might potentially be expensive to copy. The drawing on the right represents the layout of the objects that exist in the program. The blue box represents the main function, where X resides. The grey box represents the code in function, where variable Y is created. According to the standard, the return statement copies the value from Y to _R the returned object, an agreed location where the data is to be handed from function to its caller. Where those objects are really depends on the calling convention, you can think of them as decreasing addresses in the stack if you wish, but what matters is that they are somewhere, and data must be copied from one to the other.

The standard explicitly allows the implementation to avoid the creation of temporaries, including _R. But how is that done? It is actually quite simple. When processing function the compiler knows that Y's only purpose in life is to serve as the seed from which _R is copied, and the lifetimes of the two objects are intimately bound: destruction of Y and construction of _R are basically simultaneous. The compiler can avoid creating two separate objects, and just use the same memory location for both.

_R and Y are the same object

The first question that comes to mind is which object is not created, Y or _R. Neither, or both of them, or maybe it does not matter. _R cannot be removed from the program: it is a contract with the callers of the function, and its location is outside of the control of the compiler when processing the function. But all the code in function uses Y so it cannot be removed either, unless the object called Y inside function is located over the agreed location of the _R object. The single object is both Y and _R.
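The program being discussed can be sketched as follows. The names X, Y and function come from the text, while the contents of type, the value 42 and the copies_made_by_call helper are assumptions added so that the number of copies can be observed. Without any elision the count would be two (Y to _R, then _R to X), and a compiler that merges the objects reports fewer.

```cpp
// Hypothetical stand-in for `type`: counts copy constructions.
struct type {
    static int copies;
    int value;
    type() : value( 0 ) {}
    type( type const & rhs ) : value( rhs.value ) { ++copies; }
};
int type::copies = 0;

// Y is the only object ever returned, so the compiler may create it
// directly in the location agreed for the return value (_R).
type function() {
    type Y;
    Y.value = 42;
    return Y;
}

// Returns the number of copy constructions the call performed,
// or -1 if the value did not arrive intact.
int copies_made_by_call() {
    type::copies = 0;
    type X = function();
    return X.value == 42 ? type::copies : -1;
}
```

Since the count is allowed to change with compiler and flags, this is also a small demonstration of why code should never depend on side effects of the copy constructor.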

The case of RVO is similar, in the case where the object that is being returned by the function is a temporary itself (without a name). The fact that the temporary does not have a name does not mean that it does not exist, it will take the place of Y in the discussion above.

When can the compiler apply this optimization

The (unnamed) Return Value Optimization can be applied almost anywhere. The creation of the temporary inside function can be done in place of the returned object. In the more complex case of Named-RVO, it all depends on how the function is implemented. Compilers are quite smart and can apply the optimization in many different cases, but not always.

To perform the optimization, the compiler needs to know that Y will be returned before deciding the location of the object, so that it can match _R. In the most general case, this means that if a function has a single return statement, or all return statements refer to the same named variable (or possibly a temporary), then the compiler can merge Y with _R into a single object.

In this case, the compiler must create both variables X and Y and it must do so before calling should_return_first. Until that function returns the compiler does not know which of the objects is to be returned. It cannot merge either with _R.
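A function of the kind just described might look like the sketch below. The name should_return_first comes from the text, while its implementation through a global flag, the contents of type and the value_returned helper are assumptions made for illustration.

```cpp
// Hypothetical stand-in for `type`: counts copy constructions.
struct type {
    static int copies;
    int value;
    type() : value( 0 ) {}
    type( type const & rhs ) : value( rhs.value ) { ++copies; }
};
int type::copies = 0;

bool return_first = true;                             // assumed runtime input
bool should_return_first() { return return_first; }

// Both X and Y exist before the decision is taken, so neither of them
// can be created in the location of the returned object.
type function() {
    type X;
    type Y;
    X.value = 1;
    Y.value = 2;
    if ( should_return_first() ) return X;
    return Y;
}

// Hypothetical helper: which value comes back for a given decision?
int value_returned( bool first ) {
    return_first = first;
    return function().value;
}
```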

In this last case, there are two local objects that might be returned by the function, but the compiler does know which of the two will be returned, and can turn the code into the equivalent:

In each branch of the if, the compiler can place either X or Y in the place of _R (for that matter, the compiler might even be able to avoid the creation of the other variable). Still, your best chance is to keep the code simple: create objects only when you need them and keep your functions easy for the compiler to analyze.
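Written by hand, the friendlier form could look like this. As before, the contents of type, the flag behind should_return_first and the value_returned helper are assumptions made for the sketch. The point is that each local object lives in its own branch, and every return statement within that branch names the same variable.

```cpp
// Hypothetical stand-in for `type`: counts copy constructions.
struct type {
    static int copies;
    int value;
    type() : value( 0 ) {}
    type( type const & rhs ) : value( rhs.value ) { ++copies; }
};
int type::copies = 0;

bool return_first = true;                             // assumed runtime input
bool should_return_first() { return return_first; }

// The decision is taken first, and only the object that will actually
// be returned is created, which makes it a candidate for NRVO.
type function() {
    if ( should_return_first() ) {
        type X;
        X.value = 1;
        return X;
    } else {
        type Y;
        Y.value = 2;
        return Y;
    }
}

// Hypothetical helper: which value comes back for a given decision?
int value_returned( bool first ) {
    return_first = first;
    return function().value;
}
```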

What about the receiving end?

Originally we started with three objects, and the compiler optimized one of them out with NRVO. But what about the other? In our program we only need one object. We added a function to factor out some of the complexity into its own reusable piece of code, but we do not want to pay for an extra type object that we do not need.

As with RVO, when processing the caller, and in this particular case where the temporary object _R is used to copy-construct a local object, the compiler can follow the same line of reasoning: since the only purpose of the temporary is to serve as the source for X and the lifetimes only overlap during the copy, it can merge _R with X. Now our program has a single object:
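A minimal sketch of the two ends together, with a hypothetical `type` that counts its copy constructions so the result can be inspected:

```cpp
// Hypothetical type for illustration; it counts copy constructions.
struct type {
    static int copies;
    int value;
    type() : value(0) {}
    type(type const& other) : value(other.value) { ++copies; }
};
int type::copies = 0;

type function() {
    type Y;                // candidate for merging with _R (NRVO)
    Y.value = 42;
    return Y;
}

int caller() {
    type X = function();   // _R only serves to initialize X, so it can be merged with X
    return X.value;
}
```

When both optimizations are applied, Y, _R and X are one and the same object and `type::copies` remains zero. Nothing guarantees NRVO though, so the counter should be treated as an observation, not as something to rely on.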

Summing it up

C++ is a language with value semantics, and that means that there might potentially be many objects in your program being copied here and there, in particular across function calls. This does not mean that you should not factor out code into functions, or that you should refactor your function signatures to avoid return copies in favor of references; doing so might actually have a negative impact on behavior. The compiler is there to help you avoid the costs of copies, and it does a good job at it. Never depend on side effects from copy constructors, as small changes in the code inside a function might allow or inhibit NRVO.

Value semantics is a hot topic in the language, more so with the inclusion of r-value-references and move semantics to the upcoming standard. Expect to read more on the subject.

What about arguments to functions (rather than returned values)? Can copies be optimized there? Under which circumstances? Can the interface of the number type in last week's post be improved to allow for some optimization? Is there anything that won't help?

Operator overloading, OO the C++ way

2011-07-28T18:33:00.004-04:00

If you ask around about a mainstream object oriented language, most people will point to Java or C#. Sure, C++ has classes, objects, inheritance, polymorphism... but it's not really object oriented: there are still non-member functions, and that is just so non-OO. Or is it?

There are different programming paradigms, some of them better suited for some problems than others. You can pretty much solve all problems with any paradigm, but some paradigms make modeling the domain of the problem easier than others. What is important to remember is that programming is not about the tools you use; programming is about solving a problem.

In pure OO everything is an object, operations are executed on one object, and each object has a set of methods that form its interface. In C++ not everything is an object, and the interface of a type is not just the set of methods: it also includes the set of free functions that are defined in the same namespace.

From a design standpoint the question is: how do the operations in the domain map to the language? Does every operation belong to a particular type? Or is there space for free functions?

Case study

Designing a numeric type (be it a biginteger, a decimal or just a simple integer type with a range larger than those available in your platform) is a good exercise. The domain is well understood and we can focus on the design of the interface. We don't even need to think of names for the operations: overloadable operators fit the domain perfectly. The C++ language allows for overloading of most operators both as a member function and as a free function, so it will not force a decision upon us.

Creating a number

For starters, we want to be able to construct our number and we want to allow conversions from other arithmetic types, for simplicity just consider double. The compiler can convert any integral or floating point number into double for us, and this will enable creation of number from any other arithmetic type.

Because the first constructor can be used to perform implicit conversions from double there is no need to provide an assignment operator that takes a double, it will work out of the box. First the right hand side will be converted, and then the assignment operator will be executed.
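A minimal sketch of such a type, assuming a plain double as a placeholder representation (a real big number would hold something more elaborate). The member `value_`, the accessor `as_double` and the helper `assigned_from_int` are names invented for the example.

```cpp
class number {
    double value_;   // placeholder representation, for illustration only
public:
    // Not marked explicit, so it doubles as an implicit conversion from double.
    number(double value = 0.0) : value_(value) {}

    double as_double() const { return value_; }   // hypothetical accessor
};

double assigned_from_int() {
    number n;
    n = 5;   // int -> double -> number, then the implicitly generated assignment
    return n.as_double();
}
```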

Now we can add some arithmetic operations, using addition as a pattern. In the domain (math) addition takes two elements and produces a third unrelated element that is the result of the operation. In programming it is common for efficiency reasons to provide a += operator that behaves like addition but stores the result in the first element. Then we can define regular addition in terms of the previous +=: adding two numbers together can be expressed as making a copy of one of them and then applying += to that copy with the second value.

Interface of operator+= and operator+

The += operator is a natural candidate for a member function. The operation is applied to a particular instance, it is a feature of that instance. Then we have operator+. As described above, addition takes two elements and yields a third element with the aggregate value. It is not more of an operation on the first argument than it is on the second, so there is no compelling reason to make it a member function, and we should prefer a free function as that treats both arguments similarly.

The signature of operator+ follows a common idiom when overloading operators, but for now just focus on the fact that it is a free function.
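One commonly seen shape for this pair of operators is sketched below. Taking the left operand of `operator+` by value is an assumption about which idiom is meant; the double-based representation and the `as_double` accessor are placeholders.

```cpp
class number {
    double value_;   // placeholder representation, for illustration only
public:
    number(double value = 0.0) : value_(value) {}

    // Member function: the operation modifies this particular instance.
    number& operator+=(number const& rhs) {
        value_ += rhs.value_;
        return *this;
    }

    double as_double() const { return value_; }   // hypothetical accessor
};

// Free function: copy one operand, then reuse += on the copy.
// The by-value parameter is the copy.
number operator+(number lhs, number const& rhs) {
    lhs += rhs;
    return lhs;
}
```

Note that `operator+` only uses the public interface of `number`, so it does not need to be a friend.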

Differences between a member function and a free function when overloading

The main difference is how overload resolution is performed. When the operator is implemented as a free function, both arguments have equal standing with respect to the compiler, any conversion that can be applied to the right hand side can also be applied to the left hand side.

In the member function case, the two arguments don't have equal standing, the left hand side argument must be of the type that implements the operator. The compiler is able to perform implicit conversions on the right hand side, but the left hand side is sacred.

The same decisions we made in the design are translated into the code: when we implemented operator+ as a free function claiming that the operation is no more of the first argument than the second, we got symmetry from the compiler. When we decided that operator+= was a member function we got asymmetry, it can only be called on objects of type number.
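The difference can be seen in a small sketch. The `number` class below is a stand-in with a double as placeholder representation, and `symmetry_demo` is a name made up for the example.

```cpp
class number {
    double value_;   // placeholder representation, for illustration only
public:
    number(double value = 0.0) : value_(value) {}
    number& operator+=(number const& rhs) { value_ += rhs.value_; return *this; }
    double as_double() const { return value_; }   // hypothetical accessor
};

number operator+(number lhs, number const& rhs) {
    lhs += rhs;
    return lhs;
}

double symmetry_demo() {
    number n(1.0);
    number a = n + 2.0;   // right hand side converted to number
    number b = 2.0 + n;   // left hand side converted as well: free function
    n += 2.0;             // fine: the left hand side already is a number
    // double d = 2.0;
    // d += n;            // would not compile: the member operator+= is never
    //                    // considered for a left hand side of type double
    return a.as_double() + b.as_double() + n.as_double();
}
```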

You can argue whether C++ is better or worse at object orientation than other languages, but there are certain things that are harder to model while tied to the constraints of everything is an object and all operations are methods.

I have kind of hand waved over quite a few things in the design. One of the things left aside is the Rule of the Three (if the implicitly generated copy constructor and assignment were not good enough, we probably need a destructor, but not having provided an implementation I just ignored it). The signatures of the operators are interesting by themselves (can you think of anything I should have done differently? Assume that this is a big number implementation that has to manage some expensive resource. Drop me an email at definedbehavior at gmail dot com), as is the effect of friendship on this problem. Note that we do not need it, since we can implement operator+ on top of the existing member operator+=, but maybe it could help us somehow? But that's a story for another day.

Implementing vs. understanding

2011-07-23T12:28:00.003-04:00

Solving a problem and actually understanding what the problem is do not always go hand in hand. Many times it is not just the problem that eludes understanding, but even the solution we just wrote.
Sometimes this is the case when we are dealing with an elusive bug that only happens under very specific conditions, and by the time you realize that something is wrong it is already too late. And then in desperation you change code here and there or rewrite a piece of code... and the problem seems to go away (has it really or is it just playing hide and seek?)

But that can also happen with simple problems. I was asked, out of the blue as a thinking exercise, to provide a solution to a simple problem: given a singly linked list, provide a function that reverses the list in place. Oh, well, that is an easy problem to solve... should have a solution in 20 seconds.

Imagine a list of nodes, linked by a next pointer, and finished in a NULL pointer. To reverse each node you just need to keep pointers to the previous node, the node and the next node, then you make it point back to the previous node, and you advance the list pointer and you

cannot see the forest for the trees

Having to provide an answer in a few seconds has that effect, you work out the details, forget the problem. And you produce a solution:
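The pointer juggling described above could be sketched like this. The `node` struct is an assumption (a value plus a next pointer, with NULL marking the end of the list), and the code is a reconstruction of that kind of answer, not necessarily the one given at the time.

```cpp
#include <cstddef>

// Hypothetical node type for a singly linked list terminated by NULL.
struct node {
    int value;
    node* next;
};

// Reverses the list in place and returns the new head.
node* reverse(node* head) {
    node* previous = NULL;
    while (head != NULL) {
        node* next = head->next;   // remember the rest of the list
        head->next = previous;     // make the current node point backwards
        previous = head;           // the current node is now the last one processed
        head = next;               // advance along the original list
    }
    return previous;
}
```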

Then the next day I was cycling to work, and I thought that lists and functional programming tend to go hand in hand, so what would a functional programming solution to the problem look like?
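A functional formulation would be: the reverse of a list is the reverse of its tail with the head appended at the end. The sketch below transliterates that idea to C++ on a hypothetical `node` type; it is an assumption about the shape of the solution, which may well have been written in a functional language originally.

```cpp
#include <cstddef>

// Hypothetical node type for a singly linked list terminated by NULL.
struct node {
    int value;
    node* next;
};

// Appends a single node at the end of a list, walking the whole list to do so.
node* append(node* list, node* last) {
    if (list == NULL) {
        return last;
    }
    list->next = append(list->next, last);
    return list;
}

// reverse(head:tail) = append(reverse(tail), head)
node* reverse(node* head) {
    if (head == NULL) {
        return NULL;
    }
    node* tail = head->next;
    head->next = NULL;               // detach the head from the rest of the list
    return append(reverse(tail), head);
}
```

Each step has to wait for the recursive call to finish before it can append, and each append walks the list again.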

This solves the problem, but not efficiently. The recursion can turn out to be expensive; can we turn this into tail recursion? Tail recursion is good: it provides a restricted form of continuation passing style in which nothing is left to do after the recursive call, so the compiler can avoid holding on to the stack. Compilers can turn tail recursion into much cheaper loops.

To avoid holding on to state until the function returns, we can just pass that state in. The first argument is the input list, the second argument the reversed list. We take one element from the input list and put it at the beginning of the reversed list. When the input list is empty, just return whatever is already reversed. I added an extra function so that the signature matches that of the original problem.
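In C++ terms, and again on a hypothetical `node` type, the accumulator version could be sketched as follows. The helper name `reverse_onto` is made up for the example.

```cpp
#include <cstddef>

// Hypothetical node type for a singly linked list terminated by NULL.
struct node {
    int value;
    node* next;
};

// Moves the nodes of input, one at a time, onto the front of reversed.
// The recursive call is the last thing the function does.
node* reverse_onto(node* input, node* reversed) {
    if (input == NULL) {
        return reversed;             // nothing left: return what is already reversed
    }
    node* rest = input->next;
    input->next = reversed;          // put the head of input at the head of reversed
    return reverse_onto(rest, input);
}

// Wrapper with the signature of the original problem.
node* reverse(node* head) {
    return reverse_onto(head, NULL);
}
```

C++ does not promise that the tail call is turned into a loop, although optimizing compilers commonly do so.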

Epiphany

Wow, now I understand the problem and I understand the original solution. The problem is not moving pointers around. The solution is to iteratively remove one element from the head of the input list and insert it at the head of the reversed list.

When you have a problem that can be stated in simple terms, the answer should be explainable in similar terms. High level languages have that characteristic, they let you focus on the problem, rather than the nuisances of the implementation. If you have to explain the answer as detailed technical steps, you don't really understand it.

Improving conversions of std::pair<T,U>s

2011-07-21T18:58:00.005-04:00

As we already saw, even if the implementations are more permissive, the standard mandates that conversions from std::pair<> types should use only implicit conversions. That solves only half of the problem: when two types are implicitly convertible, then std::pair<> containing those types are also implicitly convertible.

But what about explicit conversions? Do we need to fail there? It would seem appropriate if, given two types that are explicitly convertible, we allowed explicit conversions of the std::pair.

From here on, we digress from the standard. This is just a simple example of how to use SFINAE⁽¹⁾. As this is the first post with SFINAE, it will be longer than I expected.

The problem

What we would like is to allow implicit conversions of pairs of different types when the types are implicitly convertible pairwise, and also allow explicit conversions of pair when the respective types are not implicitly convertible. If only one of the types can be implicitly converted, we will require an explicit conversion for the pair.

The solution

The solution for the problem is providing two different constructor templates that allow for the conversion. The first one of them will be implicit and will only work for types that are implicitly convertible, while the second one will be marked explicit and will be available whenever the first one is not. If both were available for the same combination of types, the compiler would fail to process the call with an ambiguity error. Because SFINAE requires that the error/failure happens during the substitution phase, we will need to change the signatures of the constructors, but we will aim to maintain compatibility of user code.

Detect implicit convertibility

This is probably the simplest part of the problem. We need a template with two type arguments, that offers a boolean constant true when the types are implicitly convertible.

template <typename From, typename To>
class implicit_convertible {
    typedef char (&yes)[1];
    typedef char (&no)[2];
    static yes test( To );
    static no test( ... );
public:
    static const bool value 
           = sizeof( test( *(From*)0 ) ) == sizeof( yes );
};

We create two types of different sizes yes and no, and we define two static functions, the first of which takes an argument of the destination type. The second is an ellipsis catch-all. We ask the compiler to perform overload resolution for an object of type From⁽²⁾ and we check whether the first overload was chosen.
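To see the trait at work, here is a small sketch that restates it and checks three cases. The types `A`, `B` and `C` are made up for the example, and `static_assert` assumes a C++11 compiler.

```cpp
template <typename From, typename To>
class implicit_convertible {
    typedef char (&yes)[1];
    typedef char (&no)[2];
    static yes test( To );
    static no test( ... );
public:
    static const bool value
           = sizeof( test( *(From*)0 ) ) == sizeof( yes );
};

// Hypothetical types for illustration.
struct A {};
struct B { explicit B(A const&) {} };   // A -> B only explicitly
struct C { C(A const&) {} };            // A -> C implicitly

static_assert( implicit_convertible<int, double>::value, "int converts implicitly to double");
static_assert(!implicit_convertible<A, B>::value, "explicit constructors are not considered");
static_assert( implicit_convertible<A, C>::value, "a converting constructor is enough");
```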

Implementing SFINAE

For SFINAE to work, the compiler must be able to infer the types of the template from the arguments, but when substituting the inferred types in the templates, the compiler must be unable to produce a correct signature. As a helper we can use a variant of enable_if:

template <bool enabled, typename T = void>
struct enable_if {};

template <typename T>
struct enable_if<true,T> {
   typedef T type;
};

The template is specialized for the case where the first argument is true, in which case it defines an internal type type. If the first argument is false the type is an empty class. Now we can use this to define our complete solution:

Implementation

template <typename T1, typename T2>
struct pair {
   T1 first;
   T2 second;
   pair() {}
   pair( T1 f, T2 s ) : first(f), second(s) {}
   
   // Implicit conversion: enabled only when both members convert implicitly.
   template<typename U, typename V>
   pair( pair<U,V> const & p, 
         typename enable_if< implicit_convertible<U,T1>::value
                         and implicit_convertible<V,T2>::value 
                           >::type* = 0 )
      : first( p.first ), second( p.second )
   {}

   // Explicit conversion: enabled whenever the implicit one is not.
   template<typename U, typename V>
   explicit pair( pair<U,V> const & p, 
                  typename enable_if< !implicit_convertible<U,T1>::value
                                   or !implicit_convertible<V,T2>::value
                                    >::type* = 0 )
      : first( p.first ), second( p.second )
   {}
};

To be able to use SFINAE we have added an extra argument that uses the enable_if template. If the condition is not met, then typename enable_if< false >::type will not resolve to a type, and the compiler will discard that overload. It is important to note that the conditions must be mutually exclusive, else the compiler will generate an ambiguity error.
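Putting the pieces together, a self-contained sketch of how such a pair behaves could look as follows. The types `A` and `B` and the two helper functions are invented for the example, `&&` and `||` are used in place of `and` and `or`, and the final checks assume C++11 (`static_assert` and `<type_traits>`).

```cpp
#include <type_traits>

template <typename From, typename To>
class implicit_convertible {
    typedef char (&yes)[1];
    typedef char (&no)[2];
    static yes test( To );
    static no test( ... );
public:
    static const bool value = sizeof( test( *(From*)0 ) ) == sizeof( yes );
};

template <bool enabled, typename T = void>
struct enable_if {};
template <typename T>
struct enable_if<true,T> { typedef T type; };

template <typename T1, typename T2>
struct pair {
   T1 first;
   T2 second;
   pair( T1 f, T2 s ) : first(f), second(s) {}

   template<typename U, typename V>
   pair( pair<U,V> const & p,
         typename enable_if< implicit_convertible<U,T1>::value
                          && implicit_convertible<V,T2>::value >::type* = 0 )
      : first( p.first ), second( p.second ) {}

   template<typename U, typename V>
   explicit pair( pair<U,V> const & p,
                  typename enable_if< !implicit_convertible<U,T1>::value
                                   || !implicit_convertible<V,T2>::value >::type* = 0 )
      : first( p.first ), second( p.second ) {}
};

// Hypothetical types: A converts to B only explicitly.
struct A { int v; };
struct B { int v; explicit B(A const& a) : v(a.v) {} };

// Implicit conversion: int -> double is implicit for both members.
pair<double,double> widen( pair<int,int> const& p ) { return p; }

// Explicit conversion: has to be requested, here with a functional cast.
pair<B,B> convert( pair<A,A> const& p ) { return pair<B,B>( p ); }

static_assert(!std::is_convertible< pair<A,A>, pair<B,B> >::value,
              "no implicit conversion between the pairs");
static_assert( std::is_constructible< pair<B,B>, pair<A,A> >::value,
              "but an explicit conversion is available");
```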

If instead of a constructor we were applying SFINAE to a regular function, we could have used enable_if in the return type, and the signature of the function after substitution would have remained unmodified. In the case of a constructor that is not an option, and we were forced to add the extra argument. By making it a pointer and providing a default value we manage to maintain user code compatibility with std::pair, but we accept calls to the constructor with an extra void* argument. Probably not an issue.
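For comparison, a sketch of the return type flavour on a regular function. The `is_int` trait and the `describe` function are hypothetical, chosen only to keep the example small.

```cpp
#include <string>

template <bool enabled, typename T = void>
struct enable_if {};
template <typename T>
struct enable_if<true,T> { typedef T type; };

// Tiny hypothetical trait: true only for int.
template <typename T> struct is_int      { static const bool value = false; };
template <>           struct is_int<int> { static const bool value = true; };

// Selected when T is int. After substitution the signature is simply
// std::string describe(int), with no extra parameter.
template <typename T>
typename enable_if< is_int<T>::value, std::string >::type
describe( T ) { return "int"; }

// Selected for every other type. The conditions are mutually exclusive.
template <typename T>
typename enable_if< !is_int<T>::value, std::string >::type
describe( T ) { return "something else"; }
```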

⁽¹⁾Substitution Failure Is Not An Error. That weird acronym stands for the fact that, after lookup is performed and a template is determined to be a candidate for overload resolution, if the substitution of the inferred types in the template fails, the compiler will discard that particular template and continue trying the rest of the overload candidates without triggering an error.

⁽²⁾To avoid imposing arbitrary requirements in the type, we create a pointer of the source type initialized to 0 and we dereference it. This is undefined behavior in real code, but because we are using the whole expression inside a sizeof the expression is not evaluated, it is only used to extract the type From& so we are fine.

Conversions of std::pair<T,U>s

2011-07-20T18:41:00.084-04:00

I decided to start writing a few months ago, but I never actually committed to it. Then I decided a few days ago that I was actually going to do it, and I have spent the last few days trying to decide what would be a good first post. I am still undecided. But today I read an interesting question in StackOverflow, and decided to write about it.

The problem Rafał has is with implicit conversions. In his application he has two types that are convertible, but only explicitly convertible. He is surprised that a std::pair<> that contains one of those types is implicitly convertible to a std::pair<> that contains the second type.

The problem

Given two types A and B, such that an explicit conversion from A to B is valid, but an implicit conversion is not, why is std::pair<A,A> implicitly convertible to std::pair<B,B>?

Well, the answer to this type of question is usually simple: the properties of the types used to instantiate a template do not propagate to the template itself. That is the reason why you cannot pass a std::vector< derived* > to a function that requires a std::vector< base* >, even if you can pass a pointer to derived to a function that takes a pointer to base.
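The vector example can be checked by the compiler itself. This is a small sketch assuming C++11 (`static_assert` and `std::is_convertible`); `base` and `derived` are placeholder types.

```cpp
#include <type_traits>
#include <vector>

struct base {};
struct derived : base {};

// The element types are convertible...
static_assert( std::is_convertible<derived*, base*>::value,
               "a pointer to derived converts to a pointer to base");

// ...but that property does not propagate to the containers.
static_assert(!std::is_convertible< std::vector<derived*>, std::vector<base*> >::value,
               "a vector of derived* does not convert to a vector of base*");
```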

The only operation in std::pair that allows for conversions is a constructor template. The implementation of that template in the STL shipped with gcc uses the initialization list to initialize the members of the pair. Because that is explicit (direct) initialization, explicit conversions are allowed there and the compiler gladly accepts the code.
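The effect can be reproduced with a stripped down pair. `naive_pair`, `A`, `B` and `convert` are names made up for this sketch, which does not reproduce any particular library implementation; the checks assume C++11.

```cpp
#include <type_traits>

template <typename T1, typename T2>
struct naive_pair {
    T1 first;
    T2 second;
    naive_pair(T1 const& f, T2 const& s) : first(f), second(s) {}

    // Unconstrained converting constructor. The member initializers are
    // direct initializations, so explicit constructors of T1 and T2 are usable.
    template <typename U, typename V>
    naive_pair(naive_pair<U, V> const& p) : first(p.first), second(p.second) {}
};

// Hypothetical types: A converts to B only explicitly.
struct A { int v; };
struct B { int v; explicit B(A const& a) : v(a.v) {} };

static_assert(!std::is_convertible<A, B>::value,
              "A -> B requires an explicit conversion");
static_assert( std::is_convertible< naive_pair<A, A>, naive_pair<B, B> >::value,
              "yet the pairs convert implicitly");

// Compiles: the implicit pair conversion performs the explicit member conversions.
naive_pair<B, B> convert(naive_pair<A, A> const& p) { return p; }
```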

But, is this the case?

No, not really. In this case the standard is on Rafał's side. The description of the behavior for that particular conversion constructor is stated in §20.2.2:

template <typename U, typename V> pair( const pair<U,V> & p );

Initializes members from the corresponding members of the argument, performing implicit conversions as needed.

The standard seems quite clear in stating that implicit conversions are to be used, so this seems like a bug on the implementation's side, which does not adhere to the standard. As always, the interesting question is the why.

Can we implement that constructor template with the semantics defined in the standard?

We cannot use implicit conversions in the constructor without adding additional requirements to the instantiating types (for example, using default construction and assignment of the arguments). But if our goal is just to abide by the standard and reject all calls to that template when an implicit conversion is not available, that is easy.

Detecting implicit convertibility is simple, and we can add a static assert to the constructor. The con of the approach is that while this will inhibit implicit conversions when the pair's elements are not implicitly convertible, it does not allow for explicit conversions of the pairs. But that is probably fine. Or can we do better?
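A sketch of the static assert approach, assuming C++11 so that `static_assert` and `std::is_convertible` can stand in for a hand written check. The names `checked_pair` and `widen` are made up for the example.

```cpp
#include <type_traits>

template <typename T1, typename T2>
struct checked_pair {
    T1 first;
    T2 second;
    checked_pair(T1 const& f, T2 const& s) : first(f), second(s) {}

    template <typename U, typename V>
    checked_pair(checked_pair<U, V> const& p) : first(p.first), second(p.second) {
        // Reject the conversion at compile time unless both members
        // are implicitly convertible.
        static_assert(std::is_convertible<U, T1>::value
                      && std::is_convertible<V, T2>::value,
                      "pair members must be implicitly convertible");
    }
};

// Accepted: int converts implicitly to double.
checked_pair<double, double> widen(checked_pair<int, int> const& p) { return p; }
```

With member types that are only explicitly convertible, any use of this constructor trips the assertion, whether the pair conversion itself is spelled implicitly or explicitly.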