N4128: Ranges for the Standard Library

Eleven months ago, I began work on an updated range library for modern C++. Yesterday, I submitted a proposal to the C++ standardization committee to add ranges to the Standard Library. The proposal presents a vision for a future Standard Library that is familiar and yet more powerful, more usable, and more efficient than today’s.

My goal is nothing less than to change how C++ programmers write code. Seriously.

I want more people to use the standard algorithms. I want it to be simple to do so. I want operations on data to compose in logical, simple, succinct and powerful ways. I want it to be easy for people to adapt their data so that those operations can be applied to them. I want it to be easy to create new operations that compose: lazy algorithms that snap together like Lego blocks and that give your programs a literate, declarative feel. I want it to be efficient. And it has to be safe.

D4128 lays the foundation. I’m honored to list Sean Parent and Andrew Sutton as coauthors. Although most of the text of the proposal is mine, many of the ideas in it are not.

Next month I fly to Urbana to present our work to the committee. Come to Urbana and watch the fun. And wish me luck.

28 Replies to “N4128: Ranges for the Standard Library”

  1. A bit unrelated strictly to your proposal, but here I go: If uniform call syntax was in for the next standard, I think it would be good to integrate lazy range pipelines with dot notation, without having to write extra code for the pipe support:

    auto result =
        iota(1).filtered([](auto v) { return v % 2 == 0; })
               .transformed([](auto v) { return v * v; })
               .take(10)
               .accumulate(0);
    

    Stroustrup & Dos Reis paper: https://isocpp.org/files/papers/N4174.pdf

    Herb Sutter paper: https://isocpp.org/files/papers/N4165.pdf

  2. Pingback: Module Range | C++ | Boost — Développement

  3. This is very nice. I have a question about the additional range versions of standard algorithms you want to add. How will you define the range overloads of algorithms that take three iterators into the same range, such as rotate or nth_element? I assume these will take a range and an iterator, but which will come first and will the iterator type be constrained by the range’s iterator_type?

  4. Thanks for your work on this. Very exciting.

    Do you have any plans for remove_erase? It’s one of the hidden gems in Boost Range, and think it would fit well into the standard, but I doubt it belongs in a general proposal such as this.

    • Microsoft’s Stephan T. Lavavej has written a proposal, n4009, about uniform container erase, in which he talks about the erase/remove idiom. He has given deeper thought to the problem than I have. I expect Stephan’s proposal will be discussed at next month’s committee meeting.

  5. Minor typo:

    It is, in fact, a far more interesting and useful result that the Function that for_each currently returns.

    should be:

    It is, in fact, a far more interesting and useful result than the Function that for_each currently returns.

  6. Great work.
    Many people are looking forward to this.

    two questions:

    Will you propose your range library(algorithm, numeric, view) not only the range concepts to JTC1/SC22/WG21?

    Do you keep developing your range library?

  7. 13.2.1 Projections versus Range Transform View:

    auto it = std::find( a, 42, &employee::age );

    auto a2 = a | view::transform( &employee::age );
    auto it2 = std::find( a2, 42 );

    In you library is assert( it == it2.base() ) no longer valid (is the base() method no longer existant)?

      • I’ve used this approach for a while now with boost::range and previously with boost::iterators. It works well for shallow range adaptation, but starts to look ungainly with …base().base().base() when reversed/filtered/transformed. What I usually need is the “original_base” iterator. Any ideas on extracting that in a generic fashion?

        • I agree that chains of .base().base().base() get unsightly and brittle. And original_base() function might be useful — it’s nothing that a little meta-programming can’t solve — but would be of limited use in generic context. If an algorithm is passed an arbitrary range and adds two layers of adaptation (say, a filter and a transform), it would only want to peel off two layers. And original_base() might peel off more if the range passed in was itself adapted.

        • How about defining a bookmark origin_tag, insert that and then be able to say .base_at<bookmark_tag>().

          Something like:
          struct origin_tag{};
          auto new_range = source_range | bookmark<origin_tag> | filter(…) | take(5);
          auto end_iterator = end(new_range).base_at<origin_tag>();
          end_iterator is now an iterator pointing to an item of the same type as the source range.

          Or it might even be possible to to make .base_at(source_range) work, which would be identical to .base_at<decltype(source_range)>(). That way, it would be unnecessary to introduce the notion of bookmark tags, etc.

          • Or it might even be possible to to make .base_at(source_range) work, which would be identical to .base_at<decltype(source_range)>(). That way, it would be unnecessary to introduce the notion of bookmark tags, etc.

            That’s not a bad idea. I prefer to keep the interface small for now, but if this becomes a pain point, your suggestion is the best I’ve heard so far.

          • @Eric: Yes, a small interface is of course good. Also, as you mentioned, if .base() is supported by all included iterators (and/or ranges?), then it might be possible to retro-fit it by a standalone implementation:
            auto new_range = source_range | filter(…) | take(5);
            org_equivalent_range = new_range | base_at(source_range);
            or at least
            auto org_end_equivalent = base_at(source_range, end(new_range))

  8. Excellent work, and really needed!

    Coming back to C++ from C# and working with data processing, the poor support from lazy enumerations has really been a stumblestone for me.

    Please excuse me if the following comments/questions are obvious, naïve or plain wrong.

    One aspect nags me: Iterators, iterables and ranges still seem to contain one flavor of the original non-lazy iteration: it seems to assume that data exist in some way and can cheaply be passed back, while the alternative "bool try_get_next(T& item)" and "while(try_get_next(item)){}"-based idiom does not require such an item existing in the iterator. Any comments on that?

    Take for example reading all lines of a file, and storing them in a vector.
    The current line is really not fundamentally needed to be stored in the iterator, as it it’s only ever accessed once.

    The straight forward code could look something like:
    auto r = read_lines(file_name); //returns a range
    vector<string> v(begin(r), end(r));

    This code seems to be semantically equivalent to the following (looking at the current clang implementation):

    auto r = read_lines(file_name); //returns a range
    vector<string> v;
    auto current_iter = begin(r);
    auto end_iter = end(r);
    for (; current_iter != end_iter; ++current_iter) {
    push_back(*current_iter); //Results in a copy operation, since *current_iter is int&
    }

    However, this would result in N copy operations from the temporary string (stored in an iterator or in the range), even though none are needed. What is the proposed idiom for making this more optimal?

    I guess to circumvent this problem, one could std::move the current dereferenced item into the vector.
    auto r = read_lines(file_name); //returns a range
    vector<string> v;
    foreach(r, [&v](string& line){
    v.push_back(std::move(line)); //Results in a move operation
    });
    It this the proposed answer to this problem? How would this be expressed in high-level code such as the above?
    Is there some move-adapter, that moves the items when dereferenced, so that the move constructor will be applied?

    Thanks again for the excellent work.

  9. In section 3.3.10 of the proposal (“Allow Mutable-Only Iterables”), it is proposed to keep the current item in the range, rather than in the iterator.

    Coming from the C# world, my mental concept of a range is something like an IEnumerable/ICollection (something that can be iterated through) and the iterators are what is currently doing the iteration (the IEnumerable:s). Thus, the range should not be changed by an enumeration, and it should be safe to iterate through it concurrently with different iterators (i.e. by different threads). In the istream example, it is read-once, so then it might be ok, but in other examples where the range is read-many (say my previous exemple of reading all items of a file, where a new stream can be opened for each enumeration), it’s not the case. I guess this question ties into my previous comment, seemingly indicating that some shoe-horning is still needed to fit read-once enumerations of data not present in memory into iterators.

    Does this make sense? Any thoughts on how this could be made even smoother?

    • I think that any range that mutates while it is being iterated is an Input iterable. I need to carefully think through the implications of that, and possibly bake that into the concepts.

      • Have you identified any implications of a mutating range?

        I’ve a sequence of road segments(which have lengths) and a total distance d from the beginning of the sequence. I need to determine which road segment I’m on in order to access the road segment’s additional properties. So what I need is a partial_sum adapter. As far as i can tell this requires storing mutable state (the previous partial sum).

        • Oh, interesting. Yes, you would need a partial_sum view, and I can’t see a way to do this efficiently without storing mutable state in the range itself. Which is fine, but the implication is that this is an input range — it can only be traversed once — and only on one thread.

  10. The ranges library looks very similar to LINQ in C# and Streams in Java8. I haven’t used boost ranges before, so my comments may be completely wrong, but I think from an end-programmer’s point of view it would be nice to have ranges semantics be uniform/consistent. Using the sum-of-first-10-squares example in https://ericniebler.github.io/std/wg21/D4128.html:

    int total = accumulate(view::iota(1) |
    view::transform([](int x){return x*x;}) |
    view::take(10), 0);

    I think my biggest gripe in the example is mixing pipe semantics (the pipe operator) with the function-calling semantics (STL functions like std::accumulate). To me at least, the semantics would look more consistent if written this way:

    int total = integer_range() // <– a lazy generator of numbers

    first(10)
    select( [](int x){return x*x;} )
    sum() // <– can take a lambda if we are “summing” user-defined types; this is effective std::accumulate()

    In fact, this semantic style is used in the cpplinq library (cpplinq.codeplex.com). Likewise, if we wanted to take the first 10 odd integers from a random list of integers, square the values, sort them, and save them to a new vector, the code would look like this:

    vec randVec { /* long vector of random integers unsorted */ };
    vector vec = from( randVec ) // <– from() returns a generator

    where( [](int x){ return x%2 != 0; } )
    first( 10 )
    select( [](int x){return x*x;} )
    to_vec();

    which looks more consistent than calling std::sort() and std::filter() in the middle.

    Is there consideration for this, or is it beause we want ranges to be implemented on top of iterators (or section 3.3.1.1 of D4128), that this is out of the question?

    • Sorry, I formatted the code in my post incorrectly:


      int total = integer_range()
      >> first(10)
      >> select( [](int x){return x*x;} )
      >> sum();

      // sum() can take a lambda if we are “summing” user-defined types; this is effective std::accumulate()

      and


      vec randVec { /* long vector of random integers unsorted / };
      vector vec = from( randVec )
      >> where( [](int x){ return x%2 != 0; } )
      >> first( 10 )
      >> select( [](int x){return x
      x;} )
      >> to_vec();

Leave a Reply to Joe Gottman Cancel reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.