The Return of the Erased Iterator

In yesterday’s post, I wrote about a minimal (and minimally good) type-erased iterator implementation in C++. As I had mentioned, I had some ideas on how to improve the absolute state of that initial draft. I managed to do away with every single instance of actual polymorphism and simplified the machinery a whole bunch.

What?

void print(iterator<const int> begin, iterator<const int> end) {
    for (auto it = begin; it != end; ++it)
        std::cout << *it << std::endl;
}
int main () {
    std::set<int> si {/*...*/};
    print(si.begin(),   si.end());

    std::unordered_set<int> usi {/*...*/};
    print(usi.begin(), usi.end());

    std::vector<int> vi {/*...*/};
    print(vi.begin(),   vi.end());

    return 0;
}

Why??

It is true, this could be achieved by simply templating the print function over the type of iterator it accepts. This has the added benefit of ensuring the iterators are actually of the same type and won’t do something funky when compared, etc. This does has some problems as well, since you’d have to account for sentinel iterators for end, for instance. My implementation doesn’t account for any sentinels, actually. You can always wrap your iterator and its sentinel on a std::variant, for instance. But let’s not go there.

The bigger reason isn’t that templating the method doesn’t work or takes more effort, but rather that it is impossible to accomplish certain things by storing the underlying type of the given iterator. Namely, “templates propagate”: if you want to hold the templated print method, it becomes necessary to template the class that has it as a member function, etc. Moreover, certain things just cannot be done through templating.

By which I mean, if you’d like to take any sort of iterator but you further want to involve some polymorphism in the function, you run into trouble: polymorphic types cannot be templated (think about why: what is the size of a vtable of a class with a virtual templated method?) and thus you are either forced to erase the type of the iterator, or take a templated iterator and instead use some kind of callback on the reference type obtained by dereferencing the generic iterator. Since this method is not templated, it can become virtual and be overloaded.

The storage problem is significant, because if you want to combine several containers into a single iterator range (begin, end) and, say, return that. Well, then you either erase the iterator types or you need to template the class that returns it. As you can imagine, you easily get to combinatorial explosion levels of upwards template tainting.

How???

struct iter_capability_t {
    template <typename Iter> constexpr static iter_capability_t create() { /*...*/ }

    bool destroy = false;
    bool copy    = false;
    bool eq      = false;
    bool lt      = false;
    bool gt      = false;
    bool deref   = false;
    bool incr    = false;
    bool decr    = false;
    bool dist    = false;
    bool adv     = false;
};

template <iter_capability_t Strong, iter_capability_t Weak>
concept weaker_than = /*...*/

The create static method provides the iterator capabilities for a given iterator Iter (shocking, right?) A sufficiently general example of how that is done is

if constexpr (std::equality_comparable<Iter>)
    capability.eq = true;

Meaning we start with an empty capability and constexprly set its fields to true depending on traits, concepts etc.

This snipped contains both the mask for the iterator “virtual” interface (hand-rolled, like the Phoenicians used to do back in the good old days). Said iterator virtual interface is templated over the iterator reference type, as well as the different type—but it also uses the capability to turn off certain capabilities:

template <typename Ref, typename Diff, iter_capability_t capability>
class iter_vtable_t {
    bool (*_eq)     (void const*, void const*) = nullptr;
    bool (*_lt)     (void const*, void const*) = nullptr;
    bool (*_gt)     (void const*, void const*) = nullptr;
    Diff (*_dist)   (void const*, void const*) = nullptr;
    Ref& (*_deref)  (void const*)              = nullptr;
    void*(*_copy)   (void const*)              = nullptr;
    void (*_adv)    (void*, Diff)              = nullptr;
    void (*_incr)   (void*)                    = nullptr;
    void (*_decr)   (void*)                    = nullptr;
    void (*_destroy)(void*)                    = nullptr;
    
    public:
    void destroy(void) const requires(capability.destroy) {
        _destroy(p);
    }
    /*...*/
};

So we are holding the function pointers for the most general type of iterators (such as std::vector<int>::iterators) but, as you can see: those pointers are being hidden behind member functions that get enabled depending on capabilities. Importantly, in the elided /*...*/ block, we have the following:

template <...>
class iter_vtable_t {
    /*...*/
    template <typename>
    friend constexpr auto vtable_for();
	
    template <typename, typename, iter_capability_t>
    friend class iter_vtable_t;

    constexpr iter_vtable_t() = default;

    template <iter_capability_t Stronger> 
    requires (weaker_than<Stronger, capability>)
    iter_vtable_t(iter_vtable_t<Ref, Diff, Stronger> const& stronger) {
        /*...*/
    }
};

Let us unpack this: first, we will use the vtable_for to obtain the associated iterator for a given concrete Iterator and, since our members are private, we need to befriend the function for it to populate the pointer-to-functions values. Then, we friend ourselves… well, not ourselves really, rather: “us but different.” Meaning other vtables will be able to read our pointers.

Finally, we have a constructor requiring another instantiation of iter_vtable_t with another capability that is required to be at least as strong as our capability. This means we can project/truncate stronger iterators to weaker iterators. Finally, we get to the fun/user facing part: the iterator class is templated over the value type, the reference type, the difference type and the capability structure. Simplifying bit we get the following:

template <...> class iterator {
    vtable_t const* vt  = nullptr;
    void          * obj = nullptr;

    public:
    template <typename Iter>
    requires (
        not std::same_as<iterator, std::remove_cvref_t<Iter>>
        and weaker_than<iter_capability<Iter>, capability>
    )
    iterator (Iter&& iter) {
        // forces conversion
        static vtable_t vtable{vtable_for<Iter>()};
        vt  = &vtable;
        // because iterators constructed from different types of 
        // concrete iterators are never "the same", it's okay if 
        // the pointers actually differ. 
        obj =  vt->copy(new Iter(std::forward<Iter>(iter)));	
    }

    /*...*/

    iterator& operator++() 
    requires (capability.incr) {
        vt->incr(obj);
        return *this;
    }

    iterator operator++(int)
    requires (capability.incr && capability.copy) {
        iterator copy;
        vt->incr(obj);
        return copy;
    } 
    
    /*...*/
};

As you can observe, we use our vt virtual table to dispatch the call to each bit of the iterator interface to a some combination of pointer-to-function calls stored in it. Our obj is an opaque pointer such that the class invariant is that our vt’s functions know how to operate on that object.

This is achieved in the vtable_for<Iter>() function we touched on earlier. There, we essentially query the capacity = iter_capacity_t::create<Iter>() to determine if we set that particular pointer, and we store it by the means of a non-capturing lambda (convertible to function pointer) that encapsulates the pointer casting, trusting that the invariant is observed:

if constexpr (capability.incr)
    table._incr = +[](void* ptr) {
        ++*static_cast<Iter*>(ptr);
    };

Aside from some friend operators the machinery is basically just this. Since the iterator constructor is templated over concrete iterators types, we get to automatically convert, if possible, the natural vtable_t of a given iterator to the weaker vtable_t we require.