C++ Custom Iterators

Iterators are such a great concept. They completely decouple the container holding the data from the algorithms that operate on it. They are a great example of modularity, where two distinct systems operate together via shared auxilary objects.

An iterator is an object of some distinct type, and have similar semantics to that of a pointer depending on which category they belong to. Since they are objects they can, and often do, hold custom data specific for the container for which it belongs to.

Iterator Categories

Iterators belong to one of the five iterator categories:

input iterators are read-only and can only be read once; e.g. when reading from a file or similar stream of items.

output iterators are write-only and can only be written once; e.g. when writing to a file or similar stream of items.

forward iterators support the same operations as input and output iterators but they can be read and written to multiple times. They can only be iterated on in a forward direction, i.e. they only support operator++ and not operator-- for “navigation”.

bidirectional iterators support the same operations as forward iterators with the additional support of operator--, i.e. they can “navigate” backwards through the container.

random access iterators support the same operations as bidirectional iterators as well as the ability to jump to any location in the container, i.e. they support operator+ and operator-.

These are not classes in a class hierarchy, rather semantic definitions in terms of each other. Diagrammatically it looks like this:

input  <- forward <- bidirectional <- random access
output

Iterator Semantics

As mentioned an iterator is an abstraction of a pointer and thus support the same set of operations to a varying degree depending on which category the iterator belongs to.

The minimum set of operations an iterator must support is:

Iterator

Copy constructible - X(const X&)
Copy assignable - X operator=(const X&)
Destructible - ~X()
Dereferenceable - operator*
Pre-incrementable - operator++

Each iterator category then incrementally extends this basic operational set¹:

Input / Output Iterator

Everything from Iterator
Equality - operator==
Inequality - operator!=
Post-incrementable - operator++(int)

Forward Iterator

Everthing from Input/Output Iterator
Default constructible - X()
Immutable (const iterator)

Bidirectional Iterator

Everything from Forward Iterator
Pre-decrementable - operator--
Post-decrementable - operator--(int)

Random Access Iterator

Everything from Bidirectional Iterator
Compound addition assignment - operator+=
Compound subtraction assignment - operator-=
Addition - operator+
Subtraction - operator-
Relational less-than - operator<
Relational greather-than - operator>
Relational less-than-or-equal - operator<=
Relational greater-than-or-equal - operator>=
Subscripting - operator[]

Iterator Use

All the Standard Library algorithms operate on iterators, as do range-based for loops. This is an incredible toolkit to provide support for as clients are already familiar with its usage, and don’t have to learn new potentially esoteric APIs for similar functionality.

Consider the base case example of clients wanting to iterate over the elements in your container. Using a container specific API you might have to write code like this:

for (int i = 0; i < dataStructure.GetCount(); ++i)
{
    auto p = dataStructure.GetObject(i);
    
    // Do something with p
}

Which is very noisy and unintuitive; you’d probably have to look at the class definition or documentation just to figure out how to iterate over the elements. It’s much more intuitive and easy to use when supporting the standard semantics of iteration:

for (auto p : dataStructure)
{
    // Do something with p
}

Not only is this less to type it’s also more clearly expresses the intent of the code, and more importantly: it support the expected behaviour from clients. There’s no confusion, no needing to look at header files or documentation, it just works as expected.

Iterator Implementation

As its name implies range-based for iterates over a range. This range is determined by the open-ended [begin, end), meaning starting at and including begin up to but not including end.

In order for a type to support this basic operation is has to implement two functions, begin() and end() and they need to return an iterator. end() generally returns one-beyond-the-end and signifies the end of sequence. It’s undefined behaviour to dereference the end iterator (or any invalid iterator).

class Foo {
public:
    iterator begin() const;
    iterator end() const;
};

What type is iterator? That’s user defined and depends on the need. For instance if Foo operates on a simple array, the iterator type might just be an ordinary pointer:

template <typename T, size_t N>
class Foo {
public:
    using iterator = T*;

    iterator begin() const { return &data[0]; }
    // ...

private:
    int data[N];
};

For any container that’s more involved we need to implement a specific iterator type for that container. Let’s consider the example of having a binary searh tree for which we’d like to give clients an easy way of traversing it in sorted order.

Binary Search Tree Example

We don’t want to add any esoteric in-order traversal API because it would most likely be cumbersome to use. Instead we want clients to be able to write clear and intuitive code like this:

BinarySearchTree bst { 6, 9, 3, 12, 5, 69, 2, 55, 4 };

// Manually iterate tree in sorted order.
for (const auto& n : bst)
{
    std::cout << n << "\n";
}

// Use Standard algorithm.
std::for_each(bst.begin(), bst.end(),
    [](const auto& n) { std::cout << n << "\n"; });

Let’s have a look at how one might got about implementing such an iterator.

Note: Since the focus of this article is on iterators and not binary search trees, but I still want to provide a working example, the binary search tree we’ll use here will be very limited - it will only support the insert operation! And some implementation details will be left out to avoid distraction, most notably the destructor. For ease of reading it will also only support ints.

Let’s first define our simple binary search tree. Questions to ask when implementing any container type is wether they should support copying, assigning, and move² operations. These might make sense for a BST, but in order to not distract from the iterator implementation this implementation will not support any of those.

BinarySearchTree Implementation

// binary_search_tree.h

#include <initializer_list>

/** Implementation namespace. Not for client use. */
namespace impl {

struct BstNode {
    BstNode* left { nullptr };
    BstNode* right { nullptr };
    int value { 0 };
};

} // namespace impl

class BinarySearchTree {
public:
    BinarySearchTree() = default;
    explicit BinarySearchTree(std::initializer_list<int> il);

    ~BinarySearchTree() { // ... }

    // Copy and move operations are not supported.
    BinarySearchTree(const BinarySearchTree&) = delete;
    BinarySearchTree(BinarySearchTree&&) = delete;
    BinarySearchTree& operator=(const BinarySearchTree&) = delete;
    BinarySearchTree& operator=(BinarySearchTree&&) = delete;

    // Client API.
    void insert(const int value);

private:
    void insert(impl::BstNode** node, const int value);

    impl::BstNode* root { nullptr };
};

Pretty straightforward, no odd API functions to remember. The functions of interest can be implemented as:

// binary_search_tree.cc

#include "binary_search_tree.h"

BinarySearchTree::BinarySearchTree(std::initializer_list<int> il)
{
    for (const auto n : il)
        insert(n);
}

void BinarySearchTree::insert(const int value)
{
    insert(&root, value);
}

/** Private helper. */
void BinarySearchTree(impl::BstNode** node, const int value)
{
    if (*node == nullptr) {
        impl::BstNode* p = new impl::BstNode;
        p->value = value;
        *node = p;
        return;
    }

    if (value < (*node)->value)
        insert(&((*node)->left), value);
    else
        insert(&((*node)->right), value);
}

So now we have a functioning, albeit very limited, BST which we can initialize like the previous example:

BinarySearchTree bst { 6, 9, 3, 12, 5, 69, 2, 55, 4 };

Next we need to think about what kind of iterator semantics we want to support. Since our initial goal was to be able to iterate over the tree in sorted order a forward iterator would do. However, it might also make sense to be able to iterate backwards. That is, given the iterator it, *(--it) should give us the smallest number less than or equal to *it.

For that we need a bidirectional iterator. By choosing iterator type we also implicitly chose which feature set we need to implement. The main question we need to address is that of implementation efficiency, as in space/time performance. We’re at the initial implementation stage where correctness and reasonable performance is more important than optimal performance. Because of that, and to keep this example a bit simpler, we’re going with a functionally correct implementation, but not an optimal one. Any improvements made later can be shipped to clients without the need for them to change their code.

Normally I follow the latest published C++ standard, however in this instance I’m making an exception. C++17 will deprecate the std::iterator³, from which you normally derive your own iterator from, and since we can implement the recommended alternative with the current standard that’s what we’ll do.

BstIterator Implementation

Let’s first stub out the iterator support in the BinarySearchTree definition:

class BinarySearchTree {
public:
    using iterator = impl::BstIterator;

    // As before.

    iterator begin() const;
    iterator end() const;

private:
    // As before.
};

Next up, let’s look at the BstIterator definition, again only supporting ints. Also, any explicit error handling, such asserts etc, has been left out. Instead it’s undefined behaviour operating on an invalid iterator. This might not be good enough for production code, but it will suffice for this example.

#include <iterator>
#include <vector>

/** Implementation namespace. Not for client use. */
namespace impl {

class BstIterator {
public:
    // Iterator traits, previously from std::iterator.
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = int*;
    using reference = int&;
    using iterator_category = std::bidirectional_iterator_tag;

    // Default constructible.
    BstIterator() = default;
    explicit BstIterator(BstNode* node);

    // Dereferencable.
    reference operator*() const;

    // Pre- and post-incrementable.
    BstIterator& operator++();
    BstIterator operator++(int);

    // Pre- and post-decrementable.
    BstIterator& operator--();
    BstIterator operator--(int);

    // Equality / inequality.
    bool operator==(const BstIterator& rhs);
    bool operator!=(const BstIterator& rhs);

private:
    // Since we're not concerned with optimal performace, we'll simply
    // store the sorted sequence in a vector.
    using Nodes = std::vector<BstNode*>;

    // Helper function, will in-order traverse the tree and store the
    // nodes in sorted order.
    void store_sorted_nodes(BstNode* n);

    // Our sorted nodes.
    Nodes nodes;

    // The node this iterator references.
    Nodes::size_type current { 0 };
};

} // namespace impl

Nothing out of the ordinary there. And as the implemention will show, it’s all pretty straightforward. Let’s take a look.

namespace impl {

BstIterator::BstIterator(BstNode* node)
{
    // Use internal helper to traverse in-order and store the nodes.
    store_sorted_nodes(node);

    // End-of-tree delimiter. This is also what end() ends up being.
    nodes.push_back(nullptr);
}

// Dereferencable.
BstIterator::reference BstIterator::operator*() const
{
    return nodes[current]->value;
}

// Pre-incrementable: ++it.
BstIterator& BstIterator::operator++()
{
    ++current;
    return *this;
}

// Post-incrementable: it++.
BstIterator BstIterator::operator++(int)
{
    BstIterator tmp = *this;
    ++current;
    return tmp;
}

// Pre-decrementable: --it.
BstIterator& BstIterator::operator--()
{
    --current;
    return *this;
}

// Post-decrementable: it--.
BstIterator BstIterator::operator--(int)
{
    BstIterator tmp = *this;
    --current;
    return tmp;
}

// Equality: it == end().
bool BstIterator::operator==(const BstIterator& rhs)
{
    return nodes[current] == rhs.nodes[rhs.current];
}

// Inequality: it != end().
bool BstIterator::operator!=(const BstIterator& rhs)
{
    return !(*this == rhs);
}

// Private helper functions for storing the tree nodes in sorted
// order.
void BstIterator::sort_sorted_nodes(BstNode* node)
{
    if (node)
    {
        store_sorted_nodes(node->left);
        nodes.push_back(node);
        store_sorted_nodes(node->right);
    }
}

} // namespace impl

All we have to do now is implement the begin() and end() functions in BinarySearchTree:

BinarySearchTree::iterator BinarySearchTree::begin() const
{
    return impl::BstIterator(root);
}

BinarySearchTree::iterator BinarySearchTree::end() const
{
    return impl::BstIterator(nullptr);
}

We’re set! We can now to the following:

#include "binary_search_tree.h"

#include <iostream>
#include <algorithm>

int main()
{
    BinarySearchTree bst { 6, 9, 3, 12, 5, 69, 2, 55, 4 };

    using BstIter = BinarySearchTree::iterator;

    BstIter it = bst.begin();
    BstIter it_end = bst.end();

    // Test the equality / inequality.
    std::cout << "it == it: " << (it == it) << "\n";
    std::cout << "it != it: " << (it != it) << "\n";

    // Test dereferencing and iteration.
    std::cout << "First value: " << *it << "\n";
    std::cout << "Second value: " << *(++it) << "\n";
    std::cout << "Previous value: " << *(--it) << "\n";

    // Test range for loop.
    for (const auto& n : bst)
        std::cout << n << "\n";

    // Test standard library algorithm.
    std::for_each(bst.begin(), bst.end(),
        [](const auto& n) { std::cout << n << "\n"; });

    return 0;
}

Which will output:

it == it: 1
it != it: 0
First value: 2
Second value: 3
Previous value: 2
2
3
4
5
6
9
12
55
69
(same again for std::for_each)

const_iterator

The astute reader will have noticed that I didn’t include a const_iterator, which is required by the forward iterator semantics. I did so mainly for space reason (as article is already quite long) and to be able to show a simple example. Briefly, however, this can be solved by using template type deduction:

template <typename T>
class BstIterator {
public:
    using value_type = T;
    using pointer = T*;
    // ...
};

// A BST supporting only ints.
class BinarySearchTree {
public:
    using iterator = BstIerator<int>;
    using const_iterator = BstIterator<const int>;
    // ...
};

reverse_iterator

reverse_iterator is an iterator adapter that allows you to, not suprisingly, iterate over containers in reverse order, i.e. back to front. The container needs to support rbegin() and rend() and the required functionality for the iterator they return. Implementing a reverse iterator for the simple example presented in this article is trivial, and is left as an exercise for the reader.

Summary

Iterators decouple your container type from the algorithms operating on it. They let you interface with the C++ standard library in a intuitive and easy to understand way.

An iterator is an abstraction of a pointer
Use iterators for your container rather than custom esoteric APIs
Think about which iterator type that makes sense for your container
Get the implementation correct before optimal

This is a simplfied view, please consult the Standard for the full details ↩︎
See C++ Move Semantics for more information about move semantics ↩︎
More details at The C++ Standards Committee ↩︎