Saturday 28 February 2009

Managing object ownership in C++ with auto_ptr

I really like garbage collected languages, they make you feel safe, confident for not having to explicitly manage memory...but at some price: expressiveness on object ownership and lack of control of objects lifecycle; when an instance of a class is created, no one but the garbage collector is responsible for taking care of that object.

We are fine as long as we are only dealing with memory, but lacking such ability to communicate responsibilities is a pain when resources (e.g. files, locks, connections to external systems) come to scene.

C++ has mechanisms to express responsibilities a class has over other classes and to control when objects are destroyed. This first of a series of post in which I want to write about such mechanisms; today it's the turn for std::auto_ptr.

Smart pointers are classes that mimic "classic" pointers by overloading operator* and operator->, but provide "extra features" (e.g. memory management, locking) to their pointees.

std::auto_ptr is a smart pointer included in the Standard Template Library; its main features are:
  • it owns its pointee, which means that when an auto_ptr is destroyed, so is its pointee.

  • it can transfer the ownership of its pointee to another auto_ptr.


This is a simplified version of auto_ptr:

namespace std {
template <class X>
class auto_ptr {
public:
explicit auto_ptr(X* p =0) : pointee_(p) { };
auto_ptr(auto_ptr& rhs) {
pointee_ = rhs.pointee_;
rhs.pointee_ = 0; // relinquish ownership
}

~auto_ptr() { delete pointee_; }

X& operator*() const { return *pointee_; }
X* operator->() const { return pointee_; }
private:
X* pointee_;
};
};

Notice how pointee_ is destroyed in the destructor of auto_ptr and how ownership of pointee_ is transfered in the copy constructor.

auto_ptr's power arises when it is used with value semantics, that is:
  • as a local variable (stack allocated): when the auto_ptr goes out of scope, it is destroyed and so its pointee is deleted, avoiding any explicit memory management on our side. This still holds true in presence of exceptions:

    void doStuff1 () {
    MyClass *myClass (new MyClass);

    // some dangerous stuff here that likely
    // throws an exception, leaking *myClass

    delete myClass;
    }

    void doStuff2 () {
    auto_ptr<MyClass> myClass (new MyClass);

    // some dangeous stuff here that, even
    // upon exceptions, does not cause leaks

    // no explicit delete needed
    }

    This is a form of the RAII (Resource Acquisition Is Initialization) C++ idiom, which I plan to post about soon...


  • as a function parameter passed by value: when an invocation to that function occurs, and because of the copy of the auto_ptr performed to build the parameter, the caller relinquishes ownership on the auto_ptr in favor of the callee. The auto_ptr's pointee will be destroyed when the callee returns, unless it also relinquishes ownership within its body. This idiom is called the "auto_ptr sink".


  • as return value of a function: when a function returns an auto_ptr by value, it is relinquishing ownership on it in favor of the caller. This idiom is called the "auto_ptr source".


  • as a class member: this is often used to tie the lifetime of the auto_ptr's pointee with that of the class instance the auto_ptr is member of; when that object is destroyed, so is the auto_ptr and so the pointee.



These uses of auto_ptr enable the programmer to communicate how variables are meant to be used. Let's see some examples:

  • Object creation by factories: factories are meant to encapsulate object creation. They are often implemented this way:


    class MyFactory {
    public:
    MyObject* createMyObject();
    };

    Seeing that code, you cannot be sure whether the caller is responsible or not for the destruction of the just created object. A possible solution would be to drop a comment depicting who the responsible one is, but instead let's use a smart pointer to replace the dumb one:

    class MyFactory {
    public:
    auto_ptr<MyObject> createMyObject();
    };

    This way, using the auto_ptr source idiom, the factory relinquishes ownership of the just created object in favour of the caller, so we are sure about who the one responsible for the object is just looking at the code.


  • Decorator pattern: with such design pattern, an object wraps another one to provide extra features; both them implement the same interface, so the wrapped version is used seamlessly as if it were not wrapped. In the usual implementation of this pattern, the decorator class receives a pointer to the wrapped one in its constructor, stores it as a member variable, and destroys it in its own destructor. As in the previous example, let's use a smart pointer instead of a dumb one to better communicate responsibilities on the wrapped object's lifetime.


    class IStuff {
    public:
    virtual std::string getStuff () = 0;
    };

    class Foo : public IStuff {
    public:
    Foo () { }
    std::string getStuff () {
    return "foo";
    }
    };

    class Decorator : public IStuff {
    public:
    Decorator (std::auto_ptr<IStuff> decorated)
    : decorated_ (decorated) { }
    std::string getStuff () {
    return "decorated " + decorated_->getStuff();
    }
    private:
    std::auto_ptr<IStuff> decorated_;
    };

    int main (void) {
    std::auto_ptr<IStuff> foo (new Foo);
    std::auto_ptr<IStuff> decoratedFoo (new Decorator(foo));

    std::cout << decoratedFoo->getStuff() << std::endl;

    return 0;
    }

    Note how the decorated object is passed inside an auto_ptr to Decorator's constructor, thus we are expressing Decorator is the one responsible for managing it. Also note how the auto_ptr is stored in a member variable, thus tying the lifetime of the decorated object with that of the decorator.


Some final thoughts...

  • A variation in how auto_ptr behaves, arises when it is declared const: a const auto_ptr cannot transfer the ownership of its pointee. In this case, the pointee is tied to that very auto_ptr, no matter what happens. This is a somehow degenerated use of auto_ptr; for this, it is more appropriate to use boost's scoped_ptr.

  • Beware of auto_ptr when dealing with STL containers, they are not auto_ptr friendly because they make internal copies of the data, sometimes maintaining more than one copy at the same time, which goes agains the auto_ptr allowed usage. So just avoid using lists (or vectors, or whatever STL container) of auto_ptr.

  • I intentionally avoided mentioning a member function of auto_ptr called "reset" that enables reusing the auto_ptr for other pointee. I tend not to use it because I think it's quite misleading, as auto_ptr is often used to communicate durable ownership semantics.



Preparing for the future: unique_ptr
Since its inclusion in the C++ standard, auto_ptr has hold much controversy due to both its semantics and its implementation. In the new C++ standard, known as C++0x, auto_ptr is DEPRECATED. It is to be replaced with class unique_ptr, taken from boost, which basically work like auto_ptr but without its deficiencies. To do so, it makes use of rvalue references, one of the goodies C++0x will bring, which enable to express "move semantics".

Saturday 14 February 2009

Koenig lookup...wha?

This is the first piece of C++ code many programmers wrote:


std::operator<<(std::cout, "hello world!").operator<<(std::endl);



Don't believe me? Let's reorganize it a little bit:


std::cout << "Hello world!" << std::endl;


More familiar now, isn't it?

The dark feature that makes it possible for the latter to be equivalent to the former is a feature of C++ language called "Koenig lookup" a.k.a. "Argument Dependent name lookup".

Let's unveil why without it you could not write that piece of code the easy way....

This is the signature for operator<<:


// Extracted from libstdc++-v3, with minor modifications
namespace std {
template<typename _CharT, typename _Traits&
basic_ostream<_CharT, _Traits> &
operator<<(basic_ostream<_CharT, _Traits>& __out, const char* __s);
};


Better without the templates clutter...


namespace std {
ostream& operator<<(ostream&, const char*);
};


You can see it's defined in namespace std. like the other stuff from the STL.

When something is defined inside a namespace, we have to either use "using" directive or prefix it with that namespace and "::", like with cout and endl in this piece of code...


std::cout << "Hello" << std::endl;


But then why don't we need to use any prefix for operator<< ??? Because of Koenig lookup, which states that unqualified calls (i.e. not namespace-prefixed) to functions are also looked up in the namespaces associated to the types of the invocation arguments.

This makes it possible for operators to be called using the classic infix syntax and to avoid annoying namespace prefixes all over the place, without the need of a "using namespace".

Koenig lookup relies on the interface principle, which states that nonmember functions which mention a class and are supplied along with that very class (e.g. are declared in the same header file), are also part -logically speaking- of its interface.

But it can also lead to the opposite, forcing us to prefix a function call even when we already placed a "using" directive or directly being inside in the same namespace the function is declared in, due to the ambiguity it introduces (this is often referred to as Myers' example):


namespace A {
class AClass { };
ostream& operator<<(ostream&, const AClass&);
}

namespace B {
ostream& operator<<(ostream&, const A::AClass&);
void doStuff(A::AClass& a) {
cout << a; // ambiguous, operator<< from A or from B?
}
}


One last curious thing...do you know the type of std::end? it actually is a function template that receives an ostream as argument; it just appends an endline character to its input ostream and then flushes it:


// From libstdc++ v3.
template <typename _CharT, typename _Traits>
inline basic_ostream <_CharT, _Traits>&
endl(basic_ostream <_CharT, _Traits>& __os)
{
return flush(__os.put(__os.widen('\n')));
}


It is possible to append a function template to a series of invocations to << because of class basic_ostream's member operator<<, defined as follows:


// From libstdc++ v3.
__ostream_type&
operator<<(__ostream_type& (*__pf)(__ostream_type&))
{
return __pf(*this);
}


To test it, just use std:endl as the function it actually is:


std::endl( std::operator<<( std::cout, "hello world!" ) );


NOTE: for those historicians out there, just recall Koenig lookup was not there since the beginning; it replaced "friend name injection" which stated that when a class template is instantiated, the names of friend classes and functions are “injected” into the enclosing namespace. g++ still supports this feature with the -ffriend-injection flag.

NOTE: cited pieces of the standard template library have been taken from GNU Standard C++ Library v3.