N4128: Ranges for the Standard Library

Posted on October 11, 2014 by Eric Niebler — 28 Comments ↓

Eleven months ago, I began work on an updated range library for modern C++. Yesterday, I submitted a proposal to the C++ standardization committee to add ranges to the Standard Library. The proposal presents a vision for a future Standard Library that is familiar and yet more powerful, more usable, and more efficient than today’s.

My goal is nothing less than to change how C++ programmers write code. Seriously.

I want more people to use the standard algorithms. I want it to be simple to do so. I want operations on data to compose in logical, simple, succinct and powerful ways. I want it to be easy for people to adapt their data so that those operations can be applied to them. I want it to be easy to create new operations that compose: lazy algorithms that snap together like Lego blocks and that give your programs a literate, declarative feel. I want it to be efficient. And it has to be safe.

D4128 lays the foundation. I’m honored to list Sean Parent and Andrew Sutton as coauthors. Although most of the text of the proposal is mine, many of the ideas in it are not.

Next month I fly to Urbana to present our work to the committee. Come to Urbana and watch the fun. And wish me luck.

28 Replies to “N4128: Ranges for the Standard Library”

Nicola Gigante on October 12, 2014 at 5:16 am said:

I’ve read the proposal and I find it wonderful! Good luck for Urbana 😉

Reply ↓
German Diago on October 12, 2014 at 11:57 am said:

Will read it when I have some time. It’s long! Good luck. Many eyes will be watching your paper 😀

Reply ↓
German Diago on October 13, 2014 at 4:10 am said:
A bit unrelated strictly to your proposal, but here I go: If uniform call syntax was in for the next standard, I think it would be good to integrate lazy range pipelines with dot notation, without having to write extra code for the pipe support:
```
auto result =
    iota(1).filtered([](auto v) { return v % 2 == 0; })
           .transformed([](auto v) { return v * v; })
           .take(10)
           .accumulate(0);
```
Stroustrup & Dos Reis paper: https://isocpp.org/files/papers/N4174.pdf

Herb Sutter paper: https://isocpp.org/files/papers/N4165.pdf
Reply ↓
- Eric Niebler on November 4, 2014 at 6:38 am said:
  
  That would be pretty cool. And it would end the debate over pipe notation, which is the best feature. 🙂
  
  Reply ↓
Pingback: Module Range | C++ | Boost — Développement
Joe Gottman on October 14, 2014 at 7:39 pm said:

This is very nice. I have a question about the additional range versions of standard algorithms you want to add. How will you define the range overloads of algorithms that take three iterators into the same range, such as rotate or nth_element? I assume these will take a range and an iterator, but which will come first and will the iterator type be constrained by the range’s iterator_type?

Reply ↓
- Eric Niebler on October 14, 2014 at 7:46 pm said:
  
  The range parameter comes first, then the iterator, the type of which is determined by the type of the range. I should have listed that as one of the guidelines.
  
  Reply ↓
edflanders on October 15, 2014 at 9:31 am said:

Thanks for your work on this. Very exciting.

Do you have any plans for remove_erase? It’s one of the hidden gems in Boost Range, and think it would fit well into the standard, but I doubt it belongs in a general proposal such as this.

Reply ↓
- Eric Niebler on October 15, 2014 at 9:38 am said:
  
  Microsoft’s Stephan T. Lavavej has written a proposal, n4009, about uniform container erase, in which he talks about the erase/remove idiom. He has given deeper thought to the problem than I have. I expect Stephan’s proposal will be discussed at next month’s committee meeting.
  
  Reply ↓
edflanders on October 15, 2014 at 12:19 pm said:

Beautiful; thanks for the link. Good luck to you both.

Reply ↓
Jeff on October 17, 2014 at 9:04 am said:

Minor typo:

It is, in fact, a far more interesting and useful result that the Function that for_each currently returns.

should be:

It is, in fact, a far more interesting and useful result than the Function that for_each currently returns.

Reply ↓
tktk on October 18, 2014 at 5:25 am said:

Great work.
Many people are looking forward to this.

two questions:

Will you propose your range library(algorithm, numeric, view) not only the range concepts to JTC1/SC22/WG21?

Do you keep developing your range library?

Reply ↓
- Eric Niebler on October 18, 2014 at 10:28 am said:
  
  I will submit proposals for all of it, time permitting. And yes, I’m continuing to develop the range library.
  
  Reply ↓
Jeff on October 20, 2014 at 7:15 am said:

13.2.1 Projections versus Range Transform View:

auto it = std::find( a, 42, &employee::age );

auto a2 = a | view::transform( &employee::age );
auto it2 = std::find( a2, 42 );

In you library is assert( it == it2.base() ) no longer valid (is the base() method no longer existant)?

Reply ↓
- Eric Niebler on October 20, 2014 at 9:54 am said:
  
  Ah, good point. I just added a base() member to the iterators of adapted ranges. You’re right that using the base() member is one solution for this problem.
  
  Reply ↓
  - Jeff on October 20, 2014 at 10:30 am said:
    
    I’ve used this approach for a while now with boost::range and previously with boost::iterators. It works well for shallow range adaptation, but starts to look ungainly with …base().base().base() when reversed/filtered/transformed. What I usually need is the “original_base” iterator. Any ideas on extracting that in a generic fashion?
    
    Reply ↓
    - Eric Niebler on October 20, 2014 at 12:36 pm said:
      
      I agree that chains of .base().base().base() get unsightly and brittle. And original_base() function might be useful — it’s nothing that a little meta-programming can’t solve — but would be of limited use in generic context. If an algorithm is passed an arbitrary range and adds two layers of adaptation (say, a filter and a transform), it would only want to peel off two layers. And original_base() might peel off more if the range passed in was itself adapted.
      
      Reply ↓
    - Anders Sjögren on November 4, 2014 at 5:48 am said:
      
      How about defining a bookmark origin_tag, insert that and then be able to say .base_at<bookmark_tag>().
      
      Something like:
      struct origin_tag{};
      auto new_range = source_range | bookmark<origin_tag> | filter(…) | take(5);
      auto end_iterator = end(new_range).base_at<origin_tag>();
      end_iterator is now an iterator pointing to an item of the same type as the source range.
      
      Or it might even be possible to to make .base_at(source_range) work, which would be identical to .base_at<decltype(source_range)>(). That way, it would be unnecessary to introduce the notion of bookmark tags, etc.
      
      Reply ↓
      - Eric Niebler on November 4, 2014 at 6:43 am said:
        
        Or it might even be possible to to make .base_at(source_range) work, which would be identical to .base_at<decltype(source_range)>(). That way, it would be unnecessary to introduce the notion of bookmark tags, etc.
        
        That’s not a bad idea. I prefer to keep the interface small for now, but if this becomes a pain point, your suggestion is the best I’ve heard so far.
      - Anders Sjögren on November 5, 2014 at 2:10 am said:
        
        @Eric: Yes, a small interface is of course good. Also, as you mentioned, if .base() is supported by all included iterators (and/or ranges?), then it might be possible to retro-fit it by a standalone implementation:
        auto new_range = source_range | filter(…) | take(5);
        org_equivalent_range = new_range | base_at(source_range);
        or at least
        auto org_end_equivalent = base_at(source_range, end(new_range))
Anders Sjögren on October 31, 2014 at 2:56 am said:

Excellent work, and really needed!

Coming back to C++ from C# and working with data processing, the poor support from lazy enumerations has really been a stumblestone for me.

Please excuse me if the following comments/questions are obvious, naïve or plain wrong.

One aspect nags me: Iterators, iterables and ranges still seem to contain one flavor of the original non-lazy iteration: it seems to assume that data exist in some way and can cheaply be passed back, while the alternative "bool try_get_next(T& item)" and "while(try_get_next(item)){}"-based idiom does not require such an item existing in the iterator. Any comments on that?

Take for example reading all lines of a file, and storing them in a vector.
The current line is really not fundamentally needed to be stored in the iterator, as it it’s only ever accessed once.

The straight forward code could look something like:
auto r = read_lines(file_name); //returns a range
vector<string> v(begin(r), end(r));

This code seems to be semantically equivalent to the following (looking at the current clang implementation):

auto r = read_lines(file_name); //returns a range
vector<string> v;
auto current_iter = begin(r);
auto end_iter = end(r);
for (; current_iter != end_iter; ++current_iter) {
push_back(*current_iter); //Results in a copy operation, since *current_iter is int&
}

However, this would result in N copy operations from the temporary string (stored in an iterator or in the range), even though none are needed. What is the proposed idiom for making this more optimal?

I guess to circumvent this problem, one could std::move the current dereferenced item into the vector.
auto r = read_lines(file_name); //returns a range
vector<string> v;
foreach(r, [&v](string& line){
v.push_back(std::move(line)); //Results in a move operation
});
It this the proposed answer to this problem? How would this be expressed in high-level code such as the above?
Is there some move-adapter, that moves the items when dereferenced, so that the move constructor will be applied?

Thanks again for the excellent work.

Reply ↓
Anders Sjögren on October 31, 2014 at 3:09 am said:

In section 3.3.10 of the proposal (“Allow Mutable-Only Iterables”), it is proposed to keep the current item in the range, rather than in the iterator.

Coming from the C# world, my mental concept of a range is something like an IEnumerable/ICollection (something that can be iterated through) and the iterators are what is currently doing the iteration (the IEnumerable:s). Thus, the range should not be changed by an enumeration, and it should be safe to iterate through it concurrently with different iterators (i.e. by different threads). In the istream example, it is read-once, so then it might be ok, but in other examples where the range is read-many (say my previous exemple of reading all items of a file, where a new stream can be opened for each enumeration), it’s not the case. I guess this question ties into my previous comment, seemingly indicating that some shoe-horning is still needed to fit read-once enumerations of data not present in memory into iterators.

Does this make sense? Any thoughts on how this could be made even smoother?

Reply ↓
- Eric Niebler on November 6, 2014 at 11:42 am said:
  
  I think that any range that mutates while it is being iterated is an Input iterable. I need to carefully think through the implications of that, and possibly bake that into the concepts.
  
  Reply ↓
  - Jeff on November 19, 2014 at 11:29 am said:
    
    Have you identified any implications of a mutating range?
    
    I’ve a sequence of road segments(which have lengths) and a total distance d from the beginning of the sequence. I need to determine which road segment I’m on in order to access the road segment’s additional properties. So what I need is a partial_sum adapter. As far as i can tell this requires storing mutable state (the previous partial sum).
    
    Reply ↓
    - Eric Niebler on November 19, 2014 at 5:02 pm said:
      
      Oh, interesting. Yes, you would need a partial_sum view, and I can’t see a way to do this efficiently without storing mutable state in the range itself. Which is fine, but the implication is that this is an input range — it can only be traversed once — and only on one thread.
      
      Reply ↓
pedroiqvp.wordpress.com on December 22, 2014 at 2:39 am said:

Pretty! This has been a really wonderful post.

Many thanks for providing this information.

Reply ↓
qw on January 24, 2015 at 12:51 am said:

The ranges library looks very similar to LINQ in C# and Streams in Java8. I haven’t used boost ranges before, so my comments may be completely wrong, but I think from an end-programmer’s point of view it would be nice to have ranges semantics be uniform/consistent. Using the sum-of-first-10-squares example in https://ericniebler.github.io/std/wg21/D4128.html:

int total = accumulate(view::iota(1) |
view::transform([](int x){return x*x;}) |
view::take(10), 0);

I think my biggest gripe in the example is mixing pipe semantics (the pipe operator) with the function-calling semantics (STL functions like std::accumulate). To me at least, the semantics would look more consistent if written this way:

int total = integer_range() // <– a lazy generator of numbers

first(10)
select( [](int x){return x*x;} )
sum() // <– can take a lambda if we are “summing” user-defined types; this is effective std::accumulate()

In fact, this semantic style is used in the cpplinq library (cpplinq.codeplex.com). Likewise, if we wanted to take the first 10 odd integers from a random list of integers, square the values, sort them, and save them to a new vector, the code would look like this:

vec randVec { /* long vector of random integers unsorted */ };
vector vec = from( randVec ) // <– from() returns a generator

where( [](int x){ return x%2 != 0; } )
first( 10 )
select( [](int x){return x*x;} )
to_vec();

which looks more consistent than calling std::sort() and std::filter() in the middle.

Is there consideration for this, or is it beause we want ranges to be implemented on top of iterators (or section 3.3.1.1 of D4128), that this is out of the question?

Reply ↓
- qw on January 24, 2015 at 12:59 am said:
  
  Sorry, I formatted the code in my post incorrectly:
  
  int total = integer_range() >> first(10) >> select( [](int x){return x*x;} ) >> sum();
  
  // sum() can take a lambda if we are “summing” user-defined types; this is effective std::accumulate()
  
  and
  
  vec randVec { /* long vector of random integers unsorted / }; vector vec = from( randVec ) >> where( [](int x){ return x%2 != 0; } ) >> first( 10 ) >> select( [](int x){return xx;} ) >> to_vec();
  
  Reply ↓

28 Replies to “N4128: Ranges for the Standard Library”

Leave a Reply to Joe GottmanCancel reply