Iterators++, Part 2

Disclaimer: This is a long, boring post about minutiae. For serious library wonks only.

This is the third in a series about proxy iterators, the limitations of the existing STL iterator concept hierarchy, and what could be done about it. In the first post I explained what proxy iterators are (an iterator like vector<bool>’s that, when dereferenced, returns a proxy object rather than a real reference) and three specific difficulties they cause in today’s STL:

  1. What, if anything, can we say in general about the relationship between an iterator’s value type and its reference type?
  2. How do we constrain higher-order algorithms like for_each and find_if that take functions that operate on a sequence’s elements?
  3. How do we implement algorithms that must swap and move elements around, like sort and reverse?

In the second post, I zoomed in on problem (3) and showed how the existing std::iter_swap API could be pressed into service, along with a new API that I propose: std::iter_move. Together, these APIs give an iterator a channel through which to communicate to the algorithms how its elements should be swapped and moved. With the addition of the iter_move API, iterators pick up a new associated type: rvalue_reference, which can live in std::iterator_traits alongside the existing value_type and reference associated types.

In this post, I’ll dig into the first problem: how we define in code what an iterator is.

Values and References

As in the first two articles, I’ll use the zip view to motivate the discussion, because it’s easy to grok and yet totally bedeviling for the STL algorithms. Recall that zip lazily adapts two sequences by making them look like one sequence of pairs, as demonstrated below:

std::vector<int> x{1,2,3,4};
std::vector<int> y{9,8,7,6};

using namespace ranges;
auto zipped = view::zip(x, y);

assert(*zipped.begin() == std::make_pair(1,9));
assert(&(*zipped.begin()).first == &x[0]);

As the two assertions above show, dereferencing a zip iterator yields a pair, and that pair is actually a pair of references pointing into the underlying sequences. The zip range above has the following associated types:

Associated type       … for the zip view
value_type            pair<int, int>
reference             pair<int &, int &>
rvalue_reference      pair<int &&, int &&>

With Concepts coming to C++, we’re going to need to say in code what an iterator is. The Palo Alto TR, published in 2012, takes a stab at it: an InputIterator is Readable and Incrementable, where Readable is defined as follows:

template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        { *i } -> const ValueType<I> &;
    };

This says that a Readable type has an associated ValueType. It also says that *i is a valid expression, and that the result of *i must be convertible to const ValueType<I> &. This is fine when *i returns something simple like a real reference. But when it returns a proxy reference, like the zip view does, it causes problems.

Substituting a zip iterator into the requires clause above results in something like this:

const pair<int,int>& x = *i;

This tries to initialize x with a pair<int&, int&>. This actually works in a sense; the temporary pair<int &, int &> object is implicitly converted into a temporary pair<int, int> by copying the underlying integers, and that new pair is bound to the const & because temporaries can bind to const references.

But copying values is not what we want or expect. If instead of ints, we had pairs of some move-only type like unique_ptr, this wouldn’t have worked at all.
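
To make the problem concrete, here is a small sketch of my own (using std::pair of references as a stand-in for the zip view’s proxy reference) showing that the conversion the concept relies on works only by copying, and stops working entirely for move-only element types:

#include <memory>
#include <utility>

int main() {
    int a = 1, b = 9;
    std::pair<int&, int&> proxy{a, b};

    // Compiles, but only by copying the ints into a temporary pair<int,int>
    // that then binds to the const reference:
    const std::pair<int, int>& ok = proxy;
    (void)ok;

    std::unique_ptr<int> p, q;
    std::pair<std::unique_ptr<int>&, std::unique_ptr<int>&> mo_proxy{p, q};
    (void)mo_proxy;

    // Does not compile: converting to pair<unique_ptr<int>, unique_ptr<int>>
    // would have to copy the unique_ptrs.
    // const std::pair<std::unique_ptr<int>, std::unique_ptr<int>>& bad = mo_proxy;
}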

So the Readable concept needs to be tweaked to handle proxy references. What can we do?

One simple way to make the zip iterator model the Readable concept is to simply remove the requirement that *i be convertible to const ValueType<I>&. This is unsatisfying. Surely there is something we can say about the relationship between an iterator’s reference type and its value type. I think there is, and there’s a hint in the way the Palo Alto TR defines the EqualityComparable constraint.

Common Type Constraints

What do you think about code like this?

vector<string> strs{"three", "blind", "mice"};
auto it = find(strs.begin(), strs.end(), "mice");

Seems reasonable, right? This searches a range of strings for a char const*. And this should work, even though it’s looking for an orange in a bucket of apples. The orange is sufficiently apple-like, because we know how to compare apples and oranges; i.e., there is an operator== that compares strings with char const*. But what does “sufficiently apple-like” mean? If we are ever to constrain the find algorithm with Concepts, we need to be able to say in code what “apple-like” means for any apple and any orange.

The Palo Alto TR doesn’t think that the mere existence of an operator== is enough. Instead, it defines the cross-type EqualityComparable concept as follows:

template< typename T1, typename T2 >
concept bool EqualityComparable =
    EqualityComparable<T1> &&
    EqualityComparable<T2> &&
    Common<T1, T2> &&
    EqualityComparable< std::common_type_t<T1, T2> > &&
    requires(T1 a, T2 b) {
        { a == b } -> bool;
        { b == a } -> bool;
        { a != b } -> bool;
        { b != a } -> bool;
        /* axioms:
            using C = std::common_type_t<T1, T2>;
            a == b <=> C{a} == C{b};
            a != b <=> C{a} != C{b};
            b == a <=> C{b} == C{a};
            b != a <=> C{b} != C{a};
        */
    };

In words, what this says is that for two different types to be EqualityComparable, they each individually must be EqualityComparable (i.e., with themselves), they must be comparable with each other, and (the key bit) they must share a common type which is also EqualityComparable, with identical semantics.

The question then becomes: do std::string and char const * share a common type, to which they can both be converted, and which compares with the same semantics? In this case, the answer is trivial: std::string is the common type.
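
As a quick sanity check of that claim (my own aside, not from the TR; it needs C++17 for is_same_v), the conditional operator converts the char const* to a std::string, so that is exactly what std::common_type_t computes:

#include <string>
#include <type_traits>

// char const* converts to std::string, but not vice versa, so the
// ?:-based common type of the two is std::string.
static_assert(
    std::is_same_v<std::common_type_t<std::string, char const*>, std::string>,
    "std::string is the common type of std::string and char const*");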

Aside: why does the Palo Alto TR place this extra CommonType requirement on the argument to find when surely that will break some code that works and is “correct” today? It’s an interesting question. The justification is mathematical and somewhat philosophical: when you compare things for equality, you are asking if they have the same value. Just because someone provides an operator== to compare, say, an Employee with a SocialSecurityNumber doesn’t make an employee a social security number, or vice versa. If we want to be able to reason mathematically about our code (and we do), we have to be able to substitute like for like. Being able to apply equational reasoning to our programs is a boon, but we have to play by its rules.

Readable and Common

You may be wondering what any of this has to do with the Readable concept. Let’s look again at the concept as the Palo Alto TR defines it:

template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        { *i } -> const ValueType<I> &;
    };

To my mind, what this is trying to say is that there is some substitutability, some mathematical equivalence, between an iterator’s reference type and its value type. EqualityComparable uses Common to enforce that substitutability. What if we tried to fix Readable in a similar way?

template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        requires Common< ValueType<I>, decltype(*i) >;
    };

Here we’re saying that for Readable types, the reference type and the value type must share a common type. The common type is computed using something like std::common_type_t, which basically uses the ternary conditional operator (?:). (I say “something like” since std::common_type_t isn’t actually up to the task. See lwg2408 and lwg2465.)

Sadly, this doesn’t quite solve the problem. If you try to do common_type_t<unique_ptr<int>, unique_ptr<int>&> you’ll see why. It doesn’t work, despite the fact that the answer seems obvious. The trouble is that common_type always strips top-level const and reference qualifiers before testing for the common type with the conditional operator. For move-only types, that causes the conditional operator to barf.

I’ve always found it a bit odd that common_type decays its arguments before testing them. Sometimes that’s what you want, but sometimes (like here) it’s not. Instead, what we need is a different type trait that tests for the common type but preserves reference and cv qualifications. I call it common_reference. It’s a bit of a misnomer, though, since it doesn’t always return a reference type, although it might.
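
Here is a rough model of the C++14-era common_type computation, a sketch of mine rather than the real library implementation, which shows both the decay and why move-only types trip up the conditional operator (modern standard libraries have since tweaked the real trait, so the exact failure may not reproduce with today’s std::common_type_t):

#include <memory>
#include <type_traits>
#include <utility>

// Simplified model: the type of a ?: expression between the two operands,
// decayed. The real trait has more special cases.
template <class T, class U>
using naive_common_type_t =
    std::decay_t<decltype(true ? std::declval<T>() : std::declval<U>())>;

// The decay is visible even when both operands refer to the same type:
static_assert(std::is_same_v<naive_common_type_t<int&, int&>, int>, "");
static_assert(std::is_same_v<naive_common_type_t<int&, int const&>, int>, "");

// naive_common_type_t<std::unique_ptr<int>, std::unique_ptr<int>&> does not
// instantiate: the ?: would have to copy-construct a unique_ptr from the
// lvalue operand, and unique_ptr is not copyable.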

The common reference of two types is the minimally qualified type to which objects of both types can bind. common_reference will try to return a reference type if it can, but fall back to a value type if it must. Here are some examples to give you a flavor:

Common reference…                            … result
common_reference_t<int &, int const &>       int const &
common_reference_t<int &&, int &&>           int &&
common_reference_t<int &&, int &>            int const &
common_reference_t<int &, int>               int
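
A trait with this behavior was eventually standardized as std::common_reference in C++20, so on a recent compiler the table above can be checked directly; this is a forward-looking aside of mine, not something that existed when this was written:

#include <type_traits>

// std::common_reference_t (C++20) computes the same results as the table:
static_assert(std::is_same_v<std::common_reference_t<int&, int const&>, int const&>, "");
static_assert(std::is_same_v<std::common_reference_t<int&&, int&&>, int&&>, "");
static_assert(std::is_same_v<std::common_reference_t<int&&, int&>, int const&>, "");
static_assert(std::is_same_v<std::common_reference_t<int&, int>, int>, "");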

With a common_reference type trait, we could define a CommonReference concept and specify Readable in terms of it, as follows:

template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        requires CommonReference<
            ValueType<I> &,
            decltype(*i) && >;
    };

The above concept requires that there is some common reference type to which both *i and a mutable object of the iterator’s value type can bind.

This, I think, is sufficiently general to type check all the iterators that are valid today, as well as iterators that return proxy references (though it takes some work to see that). We can further generalize this to accommodate the iter_move API I described in my previous post:

template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        requires CommonReference<
            ValueType<I> &,
            decltype(*i) && >;          // (1)
        requires CommonReference<
            decltype(iter_move(i)) &&,
            decltype(*i) && >;          // (2)
        requires CommonReference<
            ValueType<I> const &,
            decltype(iter_move(i)) &&>; // (3)
    };

OK, let’s see how this works in practice.

Iterators and CommonReference

First, let’s take the easy case of an iterator that returns a real reference like int&. The requirements are that its value type, reference type, and rvalue reference type satisfy the three CommonReference constraints above. (1) requires a common reference between int& and int&. (2), between int&& and int&, and (3) between int const& and int&&. These are all demonstrably true, so this iterator is Readable.
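
Spelling that out with the later-standardized std::common_reference_t, and assuming the usual associated types for int* (value type int, *i yielding int&, iter_move(i) yielding int&&), the three checks compute as follows. This is my own illustration; the concept only requires that the common references exist, not these particular results:

#include <type_traits>

using V  = int;     // value type of int*
using R  = int&;    // decltype(*i)
using RR = int&&;   // what iter_move(i) would yield for int*

static_assert(std::is_same_v<std::common_reference_t<V&, R&&>, int&>, "");               // (1)
static_assert(std::is_same_v<std::common_reference_t<RR&&, R&&>, int const&>, "");       // (2)
static_assert(std::is_same_v<std::common_reference_t<V const&, RR&&>, int const&>, "");  // (3)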

But what about the zip iterator? Things here are much trickier.

The three common reference constraints for the zip iterator amount to this:

Common reference…                                                    … result
common_reference_t<pair<int,int> &, pair<int&,int&> &&>              ???
common_reference_t<pair<int&&,int&&> &&, pair<int&,int&> &&>         ???
common_reference_t<pair<int,int> const &, pair<int&&,int&&> &&>      ???

Yikes. How is the common_reference trait supposed to evaluate this? The ternary conditional operator is just not up to the task.

OK, let’s first imagine what we would like the answers to be. Taking the last one first, consider the following code:

void foo( pair< X, Y > p );

pair<int,int> const & a = /*...*/;
pair<int &&,int &&> b {/*...*/};

foo( a );
foo( move(b) );

If there are types that we can pick for X and Y that make this compile, then we can make pair<X,Y> the “common reference” for pair<int&&,int&&>&& and pair<int,int> const &. Indeed there are: X and Y should both be int const &.
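
Filling in the blanks of the sketch above (the initializers here are made up purely for illustration), X = Y = int const & does indeed compile:

#include <utility>

void foo(std::pair<int const&, int const&> p) { (void)p; }

int main() {
    const std::pair<int, int> a{1, 9};
    int i = 2, j = 8;
    std::pair<int&&, int&&> b{std::move(i), std::move(j)};

    foo(a);             // pair<int,int> const&  -> pair<int const&, int const&>
    foo(std::move(b));  // pair<int&&,int&&> &&  -> pair<int const&, int const&>
}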

In fact, for each of the CommonReference constraints, we could make the answer pair<int const&,int const&> and be safe. So in principle, our zip iterator can model the Readable concept. W00t.

But look again at this one:

common_reference_t<pair<int,int> &, pair<int&,int&> &&>

If this coughs up pair<int const&,int const&> then we’ve lost something in the translation: the ability to mutate the elements of the pair. In an ideal world, the answer would be pair<int&,int&>, because a conversion from both pair<int,int>& and pair<int&,int&>&& would be safe and would meet the “minimally qualified” spirit of the common_reference trait. But this code doesn’t compile:

void foo( pair< int&,int& > p );

pair<int,int> a;
pair<int&,int&> b {/*...*/};

foo( a );       // ERROR here
foo( move(b) );

Unfortunately, pair doesn’t provide this conversion, even though it would be safe in theory. Is that a defect? Perhaps. But it’s something we need to work with.

Long story short, the solution I went with for range-v3 is to define my own pair-like type with the needed conversions. I call it common_pair and it inherits from std::pair so that things behave as you would expect. With common_pair and a few crafty specializations of common_reference, the Readable constraints are satisfied for the zip iterator as follows:

Common reference…                                                              … result
common_reference_t<pair<int,int> &, common_pair<int&,int&> &&>                 common_pair<int&,int&>
common_reference_t<common_pair<int&&,int&&> &&, common_pair<int&,int&> &&>     common_pair<int const&,int const&>
common_reference_t<pair<int,int> const &, common_pair<int&&,int&&> &&>         common_pair<int const&,int const&>

Computing these types is not as tricky as it may appear at first. For types like pair<int,int>& and common_pair<int&,int&>&&, it goes like this:

  1. Distribute any top-level ref and cv qualifiers to the members of the pair. pair<int,int>& becomes pair<int&,int&>, and common_pair<int&,int&>&& becomes common_pair<int&,int&>.
  2. Compute the element-wise common reference, and bundle the result into a new common_pair, resulting in common_pair<int&,int&>.

Generalizing

Our zip iterator, with enough ugly hackery, can model our re-specified Readable concept. That’s good, but what about other proxy reference types, like vector<bool>’s? If vector<bool>’s reference type is bool_ref, then we would need to specialize common_reference such that the Readable constraints are satisfied. This will necessarily involve defining a type such that it can be initialized with either a bool_ref or with a bool&. That would be a decidedly weird type, but it’s not impossible. (Imagine a variant<bool&,bool_ref> if you’re having trouble visualizing it.)
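
To make that less abstract, here is a rough sketch of what such a type might look like; it is my own illustration, not the range-v3 implementation, and the bit layout is invented for the example:

#include <cstdint>

// A hand-rolled "variant<bool&, bool_ref>": it can bind either to a real
// bool or to a bit packed inside a word, and supports reading and writing,
// at the cost of a branch on every access.
class bool_common_ref {
    bool*          plain_ = nullptr;  // set when bound to a real bool&
    std::uint64_t* word_  = nullptr;  // set when bound to a packed bit
    std::uint64_t  mask_  = 0;
public:
    bool_common_ref(bool& b) : plain_(&b) {}
    bool_common_ref(std::uint64_t& word, std::uint64_t mask)  // from a bool_ref-like proxy
        : word_(&word), mask_(mask) {}

    operator bool() const {
        return plain_ ? *plain_ : (*word_ & mask_) != 0;
    }
    bool_common_ref& operator=(bool v) {
        if (plain_) *plain_ = v;
        else if (v) *word_ |= mask_;
        else        *word_ &= ~mask_;
        return *this;
    }
};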

Getting vector<bool>’s iterators to fit the mold is an ugly exercise in hackery, and actually using its common reference (the variant type) would incur a performance hit for every read and write. But the STL doesn’t actually need to use it. It just needs to exist.

What is the point of jumping through these hoops to implement an inefficient type that in all likelihood will never actually be used? This is going to be unsatisfying for many, but the answer is for the sake of mathematical rigour. There must be some substitutability relationship between an iterator’s reference type and its value type that is enforceable. Requiring that they share a common reference is the best I’ve come up with so far. And as it turns out, this “useless” type actually does have some uses, as we’ll see in the next installment.

Summary

So here we are. There is a way to define the Readable concept — and hence the InputIterator concept — in a way that is general enough to permit proxy iterators while also saying something meaningful and useful about an iterator’s associated types. Actually defining a proxy iterator such that it models this concept is no small feat and requires extensive amounts of hack work. BUT IT’S POSSIBLE.

One could even imagine defining a Universal Proxy Reference type that takes a getter and setter function and does all the hoop jumping to satisfy the Iterator concepts — one proxy reference to rule them all, if you will. That’s left as an exercise for the reader.

If you made it this far, congratulations. You could be forgiven for feeling a little let down; this solution is far from ideal. Perhaps it’s just awful enough to spur a real discussion about how we could change the language to improve the situation.

In the next installment, I’ll describe the final piece of the puzzle: how do we write the algorithm constraints such that they permit proxy iterators? Stay tuned.

As always, you can find all code described here in my range-v3 repo on github.


62 Replies to “Iterators++, Part 2”

  1. I know that you are working with what you have, and are doing an amazing job at it, BUT…

    But the STL doesn’t actually need to use it. It just needs to exist.

    I don’t think we should be even joking about this. Having to invent a type that will never be used to get your program to type check is just wrong at every level.

    I’m not saying that such a type doesn’t need to exist for the sake of mathematical rigor. What I’m saying is that there should be a way to efficiently implement such a type and actually use it. A variant<bool&,bool_ref> is just not good enough. I don’t think one can get much better than a variant with a library-only solution, though.

    • Don’t be so quick to dismiss this idea. I had to define a similarly “useless” type to get iterators and sentinels to model the EqualityComparable concept. It turned out that useless type was actually quite useful: view::bounded uses it to turn an iterator/sentinel pair into an iterator/iterator pair for use with “legacy” APIs (i.e. today’s STL and the range-based for loop).

      And I have found uses for an iterator’s common reference type. If you want to pass a monomorphic function object (like a C++11 lambda) to higher order STL algorithms, the lambda should take its argument by the iterator’s common reference type. I’ll get into that in the next post.

      I agree that variant<bool&,bool_ref> is a bad hack. This is what the library solution looks like, IMO. If there is a language solution that is better, I’m all for it.

      • the lambda should take its argument by the iterator’s common reference type

        From a point of view of concept checking I agree.

        From a user perspective, when writing a lambda I see three general use cases.

        In the first one, the user wants to avoid copies and doesn’t want to mutate the arguments from within the lambda. He wants a const reference; here auto&& works just fine.

        Then, we have a user who might want to modify the arguments from within the lambda. He wants an auto&. This can be made to work as you have shown.

        In these two cases the important thing is, that if the code compiles, it satisfies the user expectations, independently of the user getting a real reference or a proxy reference.

        However… if the user wants to make a value copy of the reference type in the lambda arguments, and then mutate the copy from within the lambda to compute some result, “auto” won’t fulfill the user expectations, since for proxy references the type of the argument will be a reference type, and mutating the “copy” is actually mutating the original.

        That is, the code will compile, and it won’t do what the user expects it to do. This is unfortunate, and I don’t think this can be repaired to do the right thing.

        But maybe a “sufficiently smart” concept checking machinery might be able to probe the function to see if you are taking arguments by “auto” , and force you to take them either by value or by reference, instead of by “auto”.

        I thought about disabling most of the functionality of proxy reference types for the non-const lvalue reference case (&) by just deleting those functions using ref qualifiers. Now I think it is just a dumb thing to do since reference_proxy& won’t work either, but there might be another way to make the reference proxies useless when taken by auto.

        • In the first one, the user wants to avoid copies and doesn’t want to mutate the arguments from within the lambda. He wants a const reference, here auto&& works just fine.

          Notice above when saying where the common reference type has uses, I say specifically a C++11 lambda. Not everybody has a C++14 compiler (or works in a group where the C++14 switch has been thrown). Otherwise, I agree with you … a polymorphic function object doesn’t have this problem.

          However… if the user wants to make a value copy of the reference type in the lambda arguments, and then mutate the copy from within the lambda to compute some result, “auto” won’t fulfill the user expectations, since for proxy references the type of the argument will be a reference type, and mutating the “copy” is actually mutating the original.

          The solution in this case is to assign to a local object of the iterator’s value_type. Here, a language solution would probably do better.

  2. I have a question – is the common reference type really necessary? I mean, can’t we require that:
    1) ValueType&& to RvalueReferenceType, ReferenceType and ViewType;
    2) ValueType& is implicitly convertible to ReferenceType and ViewType;
    3) ValueType const& is implicitly convertible to ViewType.
    This is true for your design of common_pair with pair as the value type for the zip. Also, a bool& may be converted to a bit reference (an address of the bool and a mask that takes the last bit).

    As both types are under the control of the writer of the iterator, there is no problem with supporting such conversions. They are even required in your case. I do not think that the introduction of a third type is necessary.

    • Copy paste typo.
      1) ValueType&& to RvalueReferenceType, ReferenceType and ViewType;
      Should be:
      1) ValueType&& to RvalueReferenceType and ViewType;

      • I don’t know what ViewType is. But no, we can’t require that the ValueType&& is convertible to the rvalue reference type, or that ValueType& is convertible to the reference type. Consider the vector<bool> proxy ref, which “references” a bit in a packed integer. It isn’t referring to a real bool. Requiring that you can initialize it with a real bool means that the proxy ref should be capable of referring to a bit in a packed integer or a real bool. That means it needs to be a variant. vector<bool> proxy ref gets used a lot. Variants are slow. Congrats, you’ve just made vector<bool> slow. 🙁

        When designing the concepts, care must be taken to make sure you aren’t over-constraining or pessimizing useful models.

        • ViewType is another name for ConstReferenceType; I used it to avoid putting the reference there – the type is intended to be able to see a value, not to take its address.

          Also, for vector<bool> you are sticking to the variant design, while other compatible designs exist. Let’s assume that vector<bool> uses int as the packed integer. The bit_reference is effectively an address of an integer plus the mask that is used to extract a specific bit. If we declared the ValueType for an iterator of bitvector (I hope that vector<bool> will be a separate class in a Concept STL) as int, then we can initialize a bit_reference from an int&: just take an address and a mask that takes the last bit. This may even work with bool, as it is an integral type. No variants are necessary.

          My concern is that your design is not only putting a constraint on the library implementor (which is fine) but also on the library user. Let’s assume that I want to declare a comparator for sort and provide argument types. Now I must use something like: common_reference_type_t<ValueType const &, ConstReferenceType&&>
          Which is far from being intuitive.

          I think that to cover your example and most proxy iterators, imposing the requirements that I have posted would be viable. Then the user would need to use ReferenceType for mutable access or ViewType (ConstReferenceType) for immutable access.

          • If we declared the ValueType for an iterator of bitvector (I hope that vector<bool> will be a separate class in a Concept STL) as int, then we can initialize a bit_reference from an int&

            We’re not talking about bitvector. We’re talking about vector<bool>. The ValueType of vector<bool> is bool, regardless of how inconvenient that is for your preferred solution.

            You would like to see vector<bool> go away. So do I, and it probably will. But when vector<bool> goes away, it won’t take the proxy reference problem with it. Are we to pick a half solution that isn’t general enough to permit a true proxied container?

          • I am trying to point out that vector<bool> (or a bitvector that has the same semantics but is not a specialization) can be modeled in a way that would be compatible with my simpler proxy model, which requires four associated types for every iterator:
            – ValueType – to keep a copy of an element
            – RvalueReferenceType – to move elements
            – LvalueReferenceType – to perform mutable access
            – ViewType – to access the value of an element; does not allow taking its address
            This makes life a lot easier from the perspective of the end user who needs to write a predicate or comparator – he would just accept a parameter of type ViewType. Or if he would like to mutate elements, then choose RvalueReferenceType.

            And as the algorithms do not support proxy iterators at all today, we can impose some requirements on how such iterators must be written to cooperate. I mean that the committee is aiming to create a new “compatible modulo bugs” STL, and we can tweak a compressed bit vector structure a bit to match the created concepts. I do not think that my proposed design limits the set of supported proxy iterators; at least it does not prevent us from creating compressed bit vector containers (like vector<bool>).

            I would like to keep the design as simple as possible, because otherwise we may end up with something that is more similar to C++11 concepts than to simplified Concepts Lite.

        • Also, the representatives of ReferenceType, LValueReferenceType and ViewType exist in your design; they just need to be spelled differently using common_reference_t.

          And if we reason a bit about why the possibility to declare a third distinct type was necessary in your situations: for the zip_iterator you needed a workaround for the missing construction of the pair, but if you had used a common_pair as the ValueType/Reference from the beginning, no trait would be needed.

          For vector<bool> you needed it to create a reference type. But I think that this design failed – it exposes a variant that the user has to handle, and it makes the predicates that use it slow. But I still think that the problem lies in the design of the packed bit container (which is effectively a vector<bool>).

          But we can design such a bit container another way, so that no variants, and no separate class that can be initialized both from ValueType& and ReferenceType, are needed. Let’s assume that UT is the type of integer used to store a bit.
          – ValueType is defined as UT, or as a wrapper that contains UT but allows only the operations permitted for bool (the UT can be zero or one).
          – ReferenceType is a wrapper that contains an address of the UT and a mask that is used to extract a specific bit. This type can be constructed from our ValueType by taking its address and a mask that extracts the lowest bit.

          • You’ve already said all of this. And I’ve already responded. You’re essentially saying that vector<bool> and any container like it cannot efficiently satisfy the concepts. Rather, we should give up on vector<bool> and design a different container. And likewise for any other proxy container for which ValueType& is not efficiently convertible to the ReferenceType. And I’m saying that’s a compromise we don’t have to make. I know that your hypothetical bitvector doesn’t suffer from vector<bool>‘s problems. I don’t care.

          • But you are not resolving the burden, you are pushing it onto the end user, the one who will use your library. You have made a compromise too; you have just compromised usability to support designs that can be avoided. I am striving to restrict how a proxy can be implemented in order to preserve usability.

            From the user’s perspective, the situation is that they must decide whether to provide heterogeneous overloads for ValueType& and ReferenceType, or to use a slow variant or some other inefficient class.

          • My concern is that there are other types out there like vector<bool> that have no ReferenceType that is efficient while meeting the concept requirements. In an effort to make things convenient for the user, you are pessimizing real code with no recourse. With my approach, users have a way to get native performance (provide heterogeneous operations) or convenience (use the common reference), whichever meets their needs — and while fully supporting a type like vector<bool> as-is.

          • I think that the correlated nature of ValueType and ReferenceType implies that there is always a design that has an efficient conversion from ValueType& to ReferenceType. The ValueType must effectively be a copy of the thing pointed to by the Reference, so having the ReferenceType be able to point to a ValueType& should be doable.

            Also, I think that vector<bool> may be a bad example – the alternative designs that provide equivalent functionality are obvious. We should get rid of it and provide an alternative, because your proposed solution is far from being usable. Although I accept the argument that there may be other types that do not have such an alternative design.

            Although we should consider the impact of supporting such a proxy iterator that uses a variant as its common reference. The user must provide an overload that accepts the variant – such an overload is required by your concept check, which checks whether the Function is callable with the variant. And to have maximal efficiency, they must also provide a heterogeneous overload. I mean that if you had a requirement that the Function is callable with every combination of types, then a lambda accepting the variant would fulfill it; but if you are checking whether the function is callable with the variant, then providing heterogeneous overloads is not enough – you must have a function accepting the variant.

            Maybe we should keep it as simple as possible until an example of a proxy container that cannot be redesigned to support the simple approach (ReferenceType initialized efficiently from ValueType&) emerges. The cost put on users who want to write their own algorithms or lambdas is too high for me.

            Also, support for such containers may cause more types designed like vector<bool> to emerge in situations where they could be designed to be easier to use.

          • I think that the correlated nature of ValueType and ReferenceType implies that there is always a design that has an efficient conversion from ValueType& to ReferenceType.

            What makes you so confident of this, considering that we already have a counter-example: vector<bool>? Yes, we should get rid of it. What other types must we also get rid of because of a poorly (IMO) chosen concept? I’ve explained myself repeatedly. I understand your point, and you understand mine. We’re not convincing each other, and nothing new is being said. It’s time to drop this.

          • I do not treat vector<bool> as a counterexample, as I showed (and you agreed) that a container providing the same functionality can be designed in a way that is compatible with a simpler design that is less intrusive for the user.

            When designing proxy support it is important to support every possible proxy container “concept” (like a compressed container of bits in our example), not every possible implementation (the vector specialization for bool in our case). It’s as if you wanted to support a container that does not expose iterators to its elements.

            In light of the above I do not see any convincing example of a container “concept” that requires a third type for references.

          • And the vector specialization for bools is also an example of a thing that I would like to avoid. It is an implementation of a compressed bit container that drops on the end user the burden of dealing with three possible reference types (bool_ref, bool&, variant<bool_ref, bool&>), while a more user-friendly implementation of such a container exists.

            We should make things as simple as possible for the end user (who just invokes algorithms) and put the cost on library writers (as they are more experienced).

  3. “Just because someone provides an operator== to compare, say, an Employee with a SocialSecurityNumber doesn’t make an employee a social security number, or vice versa. If we want to be able to reason mathematically about our code (and we do), we have to be able to substitute like for like.”

    For a program to be mathematically sound, we do not need a common type – we need to have a bijection to a common domain; under this assumption there is no loss of mathematical soundness. The common type is too strict a representation.

    Let’s imagine the following situation – I have a real-life entity, the number 123. I can represent it with artificial representations like a signed integer, an unsigned integer, or a floating-point number. These types happen to have a common type, but even if they did not, there would be nothing unsound in comparing them.

    Returning to the example of Employee and SocialSecurityNumber: let’s assume that my company has a system to manage employees. They represent me, the person (a real-life entity), using an artificial Employee class. But searching for a given employee in the list using a SocialSecurityNumber is sound. Both of these objects represent me, the person, just as integers and floating-point numbers represent the number 123. The fact that there is no common type for these two types is only a matter of design and of avoiding errors.

    • You argued for bijection on an earlier post and I addressed your concern there. There must be a deterministic procedure that can determine whether an operation like equality makes sense for arguments of different types. Positing the hypothetical existence of a bijection that makes the operation mathematically sound doesn’t cut the mustard. The common type requirement is the closest we can get to a deterministic procedure.

      • Neither can the fact that operator< imposes a relation that is a total order actually be checked by the compiler. And that is more important for mathematically checking your programs than common_type.

        Do you really think that adding a conversion from Employee to SocialSecurityNumber – which may be viable, just a design decision, so that if I have an Employee I can pass it where only a SocialSecurityNumber is expected – will magically make my program more sound?

          • I will regret asking, but: why is this design broken for an application that just manages employees? Or just ignore it.

            I will give you another example. Let’s assume that I have developed something like BigInteger, as it is necessary in my program. Should it be comparable with double?

            I think most people will agree, because they cannot imagine the Number concept/class/entity that would be represented by these two implementations. But do I really need to implement such a BigFloat just to be able to do the above comparison?

          • Let’s assume that I have developed something like BigInteger, as it is necessary in my program. Should it be comparable with double?

            IMO, yes. I assume that you can initialize a BigInteger with an int, or a long long or something. And IMO this conversion should be implicit, because they are all Numbers conceptually, and widening to an infinite-precision representation is not lossy. And in C++, the float-to-integer conversion is understood to be truncation, so you should be able to implicitly convert from a double to a BigInteger. In other words, the common type between a double and a BigInteger is a BigInteger. The comparison should succeed.

            Why is the double->BigInteger implicit conversion OK in my opinion, but not for Employee->SocialSecurityNumber? Because in the former case, they are both models of the same underlying thing: Number. In the latter case, they are not. That conversion feels more like a hack to fool the type system.

            Your point about compilers not being able to check for total ordering is 100% correct. That is a semantic constraint. Concepts are both syntactic and semantic in nature, but the compiler only ever has enough information to test the syntactic constraints. The existence of uncheckable semantic constraints doesn’t justify throwing up our hands when trying to enforce syntactic constraints, or in the few cases where we can infer something about semantics from syntax.

          • “Why is the double->BigInteger implicit conversion OK in my opinion, but not for Employee->SocialSecurityNumber? Because in the former case, they are both models of the same underlying thing: Number. In the latter case, they are not. That conversion feels more like a hack to fool the type system.”

            My point is that in the same application they are both models of the same underlying thing: the real person who works in the company. If we have a program that only does payroll, then both the SSN and the more complex Employee class will point to me. And they both will fail to represent me completely (my after-work hobbies, for example). I mean that both of these classes are only sets of bits, not real-life entities; we (as programmers) create artificial models of reality, not reality itself.

          • The SocialSecurityNumber isn’t a model of the real entity. It’s merely a reference.

            Besides, EqualityComparable is specified to be transitive (a == b and b == c implies a == c).

            If we accept your argument that SSN and Employee are EqualityComparable, then this code shouldn’t fail, and yet it does:

            auto a = Employee(SSN(1), "SomeLady", "MaidenName");
            auto b = SSN(1);
            auto c = Employee(SSN(1), "SomeLady", "LastName");
            assert(a == b);
            assert(b == c);
            assert(a == c); // assertion failed
            
    • For a program to be mathematically sound, we do not need a common type – we need to have a bijection to a common domain; under this assumption there is no loss of mathematical soundness. The common type is too strict a representation.

      I would argue that we actually need much more than a simple bijection. A bijection is only a set-theoretic concept that counts the number of elements in a set. My understanding of your position is that you seem to ignore the shape of information in favor of its quantity. With this kind of reasoning, you end up comparing Employees with rational numbers because there are bijections

      Employees <-> SocialSecurityNumber <-> Integers <-> Rationals

      Of course, this does not make sense because the shape of what’s an Employee was lost in the process. Here’s my take on cross-type operations:

      First, if you want to talk about “mathematical soundness”, you would have to rigorously define what is an Employee, using for example laws that should be satisfied for all Employees. What I mean is that you would have to define an Employee concept with the same mathematical precision used to define mathematical structures (Monoids, Groups, Topological spaces, etc..).

      Of course, you won’t be able to do that all the time, because that would require rethinking the whole world. Still, let’s say that you approach this level of rigor by stating a couple of properties that make sense for models of this Employee concept. Perhaps the laws make it so that there is only a single model of this concept (e.g. an employee type), but there could be many as well depending on the generality of your concept. The next step would then be to define what is a structure-preserving transformation between Employees, that is a transformation that preserves the laws that you stated.

      Thirdly, you would define an embedding to be an injective structure-preserving transformation between two models A and B of the Employee concept. Since the mapping is injective, there is no loss of quantity of information. Since the mapping is structure-preserving, it preserves the relationships that A has with its environment when you turn it into a B; hence there is no loss of the information’s shape.

      Finally, you could compare two types that both model the Employee concept whenever they share a common embedding, i.e. whenever they can both be embedded into a common type C which is also a model of the Employee concept. I was personally seduced by the way category theory resolves these issues. If you are inclined to and have not done so already, I would recommend this excellent article by Barry Mazur.

  4. One of the things you argue for in your ranges proposal is that <algorithm> functions should allow the first and last iterators to have different types so that one could do something like…

    auto p = std::find("str", sentinel('\0'), 't');

    Thus, the sentinel must be comparable with str, or char*, but does that mean that the common type between any iterator and a sentinel iterator is the iterator? If so, how can we prove the soundness of p == sentinel('\0')?

    Perhaps with ranges, this would be a moot point because a c_string_range would not test for equality with the sentinel, but rather if the range is empty, but I’m still curious to hear your thoughts on this.

    • does that mean that the common type between any iterator and a sentinel iterator is the iterator?

      Perceptive question. No. I address this issue in my proposal, but if you didn’t read the appendices you missed it. It’s in Appendix 2: Sentinels, Iterators, and the Cross-Type EqualityComparable Concept. Basically, the common type between an iterator and a sentinel is a different iterator that is effectively a variant<iterator,sentinel>. Such an iterator is at best a Forward iterator.

      This turns out to be a very useful iterator. It is used by the view::bounded adaptor to turn an Iterable into a BoundedIterable so that you can iterate over it using C++11’s range-based for loop.

  5. Also, please do not take my comments the wrong way – I really like your range proposal and appreciate the amount of work that you have put into it to improve the lives of the whole C++ community.

    My “nitpick” posts on this blog are the result of the time I spend thinking about your ideas, and this is my (maybe convoluted) way of saying that I see great value in your work. It takes several minutes to say “great work”, but hours are needed to try to really understand all its consequences. And I am finding range-v3 worth putting the time in upfront.

    • My “nitpick” posts on this blog are the result of the time I spend thinking about your ideas, and this is my (maybe convoluted) way of saying that I see great value in your work.

      I greatly appreciate both your feedback and your thanks. I should point out that very, very few of the ideas in my proposal are truly mine. If you want to get to the bottom of things like the EqualityComparable concept, you should read the Palo Alto TR (of which I am not an author) and Stepanov’s “Elements of Programming.”

      • I am acquainted with both, and from that I know that the Palo Alto TR does not set in stone the requirement that a CommonType exist in code for heterogeneous comparisons – just check Appendix D: the common_type requirement is moved into an axiom. I agree that conceptually such a type must exist, and for my BigInteger and Float this conceptual type is the (mathematical) real numbers. I do not need to have an implementation of it, nor do I want to allow a conversion that loses information (Float to BigInteger).

        As for “Elements of Programming” – this book discusses most of the algorithms in terms of a weak total order, and comparing Person class instances by their first name lexicographically constitutes a weak total order (the name is a key function). I think that we both agree that comparing only the first name is a bad idea for operator==.

  6. I think that you have accidentally made the Readable concept too strict. I am referring to these two parts:

    requires CommonReference<
        ValueType<I> &,
        decltype(*i) && >          // (1)
    requires CommonReference<
        decltype(iter_move(i)) &&,
        decltype(*i) && >          // (2)
    

    They effectively impose that every iterator provides mutable access to the elements (ValueType is not const-qualified for a const_iterator – we do not need to keep the copy const). That excludes every const iterator in the library from being Readable.

    • For an “ordinary” iterator like int*, the first requirement is checking whether there is a common reference between int & and int const &. There obviously is: int const &. The second tests whether there is a common reference between int const && and int const &. Again, the answer is int const &.

      The same is true of proxy iterators. The common reference logic takes account of cv qualifiers to make it work.

      • Oh, I understand. The && after decltype(*it) is necessary for iterators that return something by value from *it.

        Also, I think that your example considers int const* rather than int*, as the “ordinary” const iterator.

  7. It seems to me that the whole point of using a common type instead of a comparability would be to focus on (a degree of) interchangeability rather than whether two things can be matched to each other. You can argue till the cows come home as to whether a social security number and an employment history are more like an employee’s height and weight (which are fundamentally uncomparable, belonging to the same employee but being truly unrelated things about that employee) or are more like two different IDs for an employee (ignoring potential issues of social security numbers uniquely identifying not employees but rather current members of the social security system, i.e. they could theoretically be reused and probably won’t apply to foreigners with work permits), but the fact is that there are situations where you can’t use a social security number in place of an employment history or vice versa. The big question is not, “Is a social security number and some kind of employee record an example of something that can be compared/matched or something that cannot be compared/matched?” the question is, “Is this algorithm or concept one that requires mere comparison/matching or instead one that requires a degree of interchangeability?” …Right? Something like that?

  8. A slight off-topic, Mr. Niebler.
    I really appreciate your work and find most of the algorithms really the future of the standard.
    Unfortunately I can’t use them in the code at work because I’m using the Microsoft compiler. Besides asking them to adhere to the standard (which I’ve already done), I was wondering if there was a remote chance of having a version of range-v3 working with VS2013NovCTP or VS2015. Maybe with some functionality stripped down.
    I know it could be impossible, just asking 🙂

    • There is no way that I know of to support the Microsoft compiler. range-v3 heavily relies on emulated concept-based overloading. I don’t know how to implement that emulation without generalized SFINAE on expressions. Microsoft’s compiler doesn’t support that language feature, and there’s no ETA for when it will. So there’s no ETA for when a Microsoft compiler will be capable of compiling range-v3.

      Microsoft has had 4 years to produce a conforming C++11 compiler. It hasn’t. I would encourage your employer to switch to clang-cl as soon as it reaches an acceptable level of stability.

      • I actually forked range-v3 in an attempt to hack in MSVC2015 compatibility.

        https://github.com/CornedBee/range-v3

        But I’m not really getting far, because not only is eSFINAE not implemented properly, the heavy use of it in range-v3 actually crashes the compiler, making my attempts to work around the lack of support extremely frustrating.

        I would love to switch to clang-cl, but our programs rely on some dark corners of the MS exception model, and I’m not sure when, if ever, Clang’s emulation will be good enough to switch.

        • You would have to rewrite the entire concept checking emulation layer, and then rewrite all the concept checks throughout the library. In addition to all the other places extended SFINAE is used. Good luck. You would have an easier time rewriting range-v3 from scratch. And you can imagine how I would feel about accepting a pull request that made such a hack job of my code. 😛

          • A slightly late reply: clang-cl has almost reached stability. Unfortunately range-v3 still doesn’t compile with it. I’m wondering what it is still missing. Could it be useful to add it to the library’s tested compilers? But maybe it’s too early 🙂

  9. Eric Niebler wrote:

    Your point about compilers not being able to check for total ordering is 100% correct. That is a semantic constraint. Concepts are both syntactic and semantic in nature, but the compiler only ever has enough information to test the syntactic constraints.

    The problem with concepts is that if a type models a concept syntactically but not semantically your program still type checks. That is just wrong, since we only care about semantics, syntax is isomorphic and can be mapped.

    What we need is explicit opt-in for concepts and a way to automatically derive them for a type that satisfies the syntactic requirements. For types that don’t satisfy the syntactic requirements we need concept maps.

    That way if you explicitly implement or derive a concept for a type, you are telling the compiler “this type models the semantics requirements (and the syntactic requirements map as follows)”.

    This is what Rust does: you only have concept maps (Traits), and they are opt-in, that is, you have to explicitly implement them for every type (or “derive” them for a type that satisfies the syntactic requirements).

    That way you end up with very useful “syntactically empty but full of meaning” Traits like, e.g., Send and Sync, that the compiler can type check.

      • If they had:
        – type level integers (I need this, it is coming)
        – variadics, and
        – an easier way to reuse code (they don’t have any form of inheritance, I workaround this with concepts maps but…),

        I would drop C++ without blinking and never come back because it is a very small very expressive language, that just does a lot of things right:
        – Traits (concepts/concept maps),
        – type erasure based on Traits (this is just Sean’s concept-based runtime-polymorphism as the only way to do runtime polymorphism, built into the language),

        And then: immutability by default, move semantics by default, built in tuples and type-safe variants with destructuring, data-race free concurrency, hygienic macros, a very good FFI with C, a very very very good package manager, and well… the most advertised thing is full memory safety (compiler errors on: using uninitialized values, null pointers, references to dead stack variables, …), but for me it is not its major selling point.

        They also have a “range” library, they call them iterators, that has solutions similar to yours (e.g. a size hint on a range). However, they are constrained by proving memory safety to the compiler, so their solutions are different. As an example, if in C++ a SizedRange lies about its size, you’ll probably get a segfault. In Rust, however, the size hint is (right now) safe, that is, they have to assume that the range might be lying, making the hint useless in practice… And that is why they, for example, discuss here about making it unsafe (which is just a way of telling the compiler: this can’t be proven, but i’m willing to segfault here for performance):

        http://internals.rust-lang.org/t/the-trusted-iterator-length-problem/1302

    • “The problem with concepts is that if a type models a concept syntactically but not semantically your program still type checks. That is just wrong, since we only care about semantics, syntax is isomorphic and can be mapped.”

      In C++ you already have a solution for semantic requirements – use a type trait like is_something that will be specialized for the types that match the concept semantically. That is an opt-in approach. Furthermore, you may create a Something concept that will check whether is_something is true and also check the syntactic requirements.

      The problem with the opt-in approach is that it does not scale well if the program is using multiple libraries that use the same concept. For example, suppose I have a library L1 that defines a Vector concept and a library L2 that also defines an equivalent semantic concept Vector. With the opt-in approach I must somehow say that every type that implements the L1::Vector concept also fulfills L2::Vector (to use types from L2 in L1) and vice versa. If I combine N libraries, then I need to provide at least N^2 such declarations. This problem does not occur with implicitly deduced concepts – if L1::Vector and L2::Vector are equivalent then this will work out of the box. And if I need opt-in, I will just add a trait and run into the same problem.

  10. In C++ you already have a solution for semantic requirements – use a type trait like is_something that will be specialized for the types that match the concept semantically. That is an opt-in approach.

    Of course, this works pretty well, but my point is that you cannot define an empty concept like “concept TotalOrder;” and then say “int models TotalOrder;”. You have to work around this using “empty templated structs” (a.k.a. type traits). It is a common idiom, everyone understands it, and it is better than defining for every type a macro that is set to 0 or 1 depending on whether the type models the concept or not, but it is still a workaround, a convention, a trick, a shadow world, a language within the language.

    The problem with the opt-in approach is that it does not scale well if the program is using multiple libraries that use the same concept.

    It scales perfectly if multiple libraries use the same concept.

    For example, suppose I have a library L1 that defines a Vector concept and a library L2 that also defines an equivalent semantic concept Vector.

    This is the case when two libraries don’t define the same concept; they define two different concepts, L1::Vector and L2::Vector, and their being semantically equal is irrelevant. This situation happens in practice (it is not common, though). The problem here is that the compiler doesn’t know that these concepts are semantically equal since “they are different concepts”. What if they were semantically “slightly” different? Risking correctness to save a bit of typing is not worth it at all since:

    concepts are not defined often,
    most important concepts are in the standard library,
    the library that defines a new concept, implements it for the standard types,
    some libraries provide extension modules with implementations for types present in other libraries.

    With the opt-in approach I must somehow say that every type that implements the L1::Vector concept also fulfills L2::Vector (to use types from L2 in L1) and vice versa.

    Yes, and this is a good thing. It shouldn’t be you who implements those concepts; it is library L1’s job to decide if its types model the L2::Vector concept, and vice versa. What if L2 changes the semantics of its concept “just a little bit” in the next version? You could work around this in your own library by conditionally defining a concept: e.g. if library L1 is included, do not define the concept L2::Vector, use the L1::Vector concept. But this is wrong, don’t do it.

    If you define a new concept for L1 and L2, then it is your job, maybe together with L1 and L2 library authors, to decide if your concept should be implemented for their types and to actually do it.

    If for whatever reason you are forced to implement a concept for the types of an external library, just do it in its own module, put it on github, and done, everybody can use it.

    If I combine N libraries, then I need to provide at least N^2 such declarations.

    If you combine N libraries, with M types each, and each library defines semantically and syntactically (otherwise you would need a map from one concept to another anyways) equivalent concepts, and this applies to the M types within each library, and these libraries are not aware of each other and do not provide extension headers, and you want to use those types with all those libraries, then yes, you would have to do this N * M times, once. By doing this what you win is correctness, and no, this doesn’t happen in practice for large values of N and M. And for small values of N and M only one person (typically the author of each library) has to do it, so that the types of the other libraries work with its library. But hey he decided to reimplement an already existing concept, so it is his own fault.

  11. Sorry, I said

    it is library L1’s job to decide if its types model the L2::Vector concept,

    It should have been

    it is library L2’s job to decide if L1 types model the L2::Vector concept,

    • “it is library L2’s job to decide if L1 types model the L2::Vector concept”

      I do not agree. I can have two unrelated, independently developed libraries that I want to use in my program. For example, let’s assume that I want to combine a 2D drawing library that introduces a Point concept with a geometry library (like CGAL) that provides a way to triangulate a mesh. These libraries will probably be developed independently, and the authors will see no need to declare in the drawing library that the CGAL point meets their Point requirement.

      Usually a program uses a bunch of independent libraries and needs to glue them together. Also, you are assuming the “Centralized Standard Library with Everything” model, which does not apply to C++. The Standard will never define the concepts needed for every possible application of the C++ language in a specific domain, so these libraries will need to provide their own concepts. And two independent libraries may overlap in some of them (like a point for drawing, mesh creation, or differential equation solving).

      A language that aims to be the one used to write libraries must provide a way to glue independent libraries together in programs. And the opt-in approach moves the language in the opposite direction.

      And we do not need language support for things that can be easily implemented in a library. Traits (empty template structs) can be used to represent opt-in empty concepts without any drawback, so there is no need for special language support.

      With the current design of Concepts Lite we can implement the Rust model if needed (a trait for semantics/opt-in plus syntax checks in the concept). But the Rust model does not support the implicit concept checks that Concepts Lite supports. So if you can achieve what you need, why are you inclined to limit the possibilities of people who need a different model in their application?

      • The Application should have an App::Vector concept, and then say whether L1::Vector and/or L2::Vector match the concept.

        I haven’t followed it closely enough, but in Concepts Lite, since App::Vector is really a constexpr function, can’t you overload that function on L1::Vector and L2::Vector to explicitly opt in/out?

        • “The Application should have an App::Vector concept, and then say whether L1::Vector and/or L2::Vector match the concept.”

          What if library L1 provides an algorithm that works on the Vector concept, but no class that fulfills it, and L2 provides a Vector implementation? Introducing an App::Vector concept would not allow you to pass it to the functions from those libraries that accept L1::Vector or L2::Vector.

          “I haven’t followed it closely enough, but in Concepts Lite, since App::Vector is really a constexpr function, can’t you overload that function on L1::Vector and L2::Vector to explicitly opt in/out?”

          You cannot overload/specialize a constexpr function.

          • Basically we are asking for concept maps, or something like them. Then I can tell L1 that App::Vector is an L1::Vector, etc.

          • Yes, but my point is that the use of concept maps is not scalable if you use many libraries in a single application.

            Implicitly deduced concepts do not have such problems. If one set of requirements is a subset of another, you get the relation automatically.
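
            As a rough sketch of what that automatic relation looks like (in the Concepts Lite syntax used earlier; all names below are invented): a concept whose requirements are a superset of another’s is treated as more specialized, and overload resolution prefers it with no explicit mapping:

            template< typename I >
            concept bool Iter =
                requires(I i) {
                    { *i };
                    { ++i } -> I&;
                };

            template< typename I >
            concept bool RandomAccessIter =
                Iter<I> &&
                requires(I i, int n) {
                    { i + n } -> I;
                };

            // The more constrained overload is chosen automatically when both match.
            template< Iter I >             void advance_impl(I& i, int n);
            template< RandomAccessIter I > void advance_impl(I& i, int n);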

    • I mean that there is a difference between saying “my way of doing things is the only right way” (pure OO or functional languages) versus “my way of doing things is right and I want it to be supported in the language” (multi-paradigm languages).

      There is great value in having an opt-in approach to concepts in some situations, and for sure C++ should have the ability to express that; we can do it with a combination of a trait and a concept. But we should not go in the direction of supporting only that approach and disallowing concepts that are implicitly fulfilled by any type that meets the syntax requirements.

      • But we should not go in the direction of supporting only that approach and disallowing concepts that are implicitly fulfilled by any type that meets the syntax requirements.

        Agreed that we should allow both, but I am worried about the default:

        A Concept is “just” semantics; we merely have to live with the syntax while writing code. The implicit approach defaults to “syntax equals semantics”. The explicit approach requires you, by default, to write a statement from a programmer to the compiler and to other programmers saying that “type A models the semantics of concept B”. This is something that the compiler and other programmers cannot infer from syntax.

        Being pragmatic, you should be able to opt into “syntax equals semantics for concept B”.

        But I think that “syntax equals semantics” is the wrong default: unless you opt out for a given concept or type, a type that models the syntax but doesn’t model the semantics will type-check and result in an incorrect program. Incorrect programs by default is just the wrong default.
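
        As a concrete (invented) example of what goes wrong with that default: the type below has the syntax of an ordering but not the semantics of a strict weak ordering, so under a syntax-only concept it would type-check with sort-like algorithms and silently produce a wrongly ordered result:

        struct Fuzzy {
            double v;
            // Syntactically an operator<, but "less by more than 0.5" is not a
            // strict weak ordering (incomparability is not transitive), so the
            // semantics of ordering are broken even though the syntax checks out.
            friend bool operator<(Fuzzy a, Fuzzy b) { return a.v + 0.5 < b.v; }
        };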

        • I think that the default was chosen that way because it requires fewer language changes.

          If the default were opt-in, then you would need some way to declare that a concept should be deduced implicitly. For that we would need something like implicit/explicit declarations of concepts.

          With a language the size of C++, I am all for solutions that require fewer language rules. So I prefer the current trait + concept solution from Concepts Lite. And the opt-in approach requires just 4 additional lines (a trait, plus one line in the concept declaration):
          template< typename T >
          struct is_serializable : std::false_type
          {};

          template< typename T >
          concept bool Serializable =
              is_serializable<T>::value &&
              requires(T t) {
                  /* syntactic requirements that would be needed anyway */
              };
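
          A type would then opt in with a one-line specialization of the trait (Widget here is just an invented example type):

          struct Widget { /* ... */ };

          template<>
          struct is_serializable<Widget> : std::true_type {};  // Widget now satisfies Serializable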

          Also, making the opt-in requirement the default would prevent the notion of a more specialized concept and the whole idea of implicit overloading of concepts. If we instead separate the semantics into a separate trait, then we can still have more and less specialized concepts as long as they use the same trait to define their semantics. If two concepts meant different semantics by default (i.e., separate traits), then the whole relation between them would also need to be stated explicitly.
