Container Algorithms

The recent meeting of the C++ Standardization Committee in Urbana-Champaign was a watershed moment for my work on ranges. Ultimately, my presentation was well received (Herb Sutter used the phrase “palpable excitement” to describe the feeling in the room), but it wasn’t at all certain that things would go that way, and in fact an eleventh-hour addition pushed the proposal over the top: container algorithms.

Ranges, as of N4128

The existing algorithms in the C++ standard library operate eagerly. After std::transform returns, for instance, you can be sure that all the transform-y stuff is done. Some algorithms are also mutating. When you call std::sort, the data has been sorted — in place.

Not so with the range views that N4128 proposes. These are like lazily-evaluated, non-mutating algorithms that present custom views of data stored elsewhere. For instance, when you say:

std::vector<int> ints{1,2,3,4};
auto squared = ints
    | view::transform([](int i){return i*i;});

… not one iota of transforming has happened. You have just created a view that, when it’s iterated, does transformation on-the-fly, without mutating the underlying sequence.

The algorithms and the views differ in another important way: the views easily compose — filter a transformed slice? No problem! — but the algorithms don’t. Doing that sort of thing with the algorithms requires fiddling about with iterators and named temporaries, and takes several lines of chatty code.
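
For a rough illustration (assuming the usual headers, and reusing the ints vector from the example above), compare composing views with chaining the eager std:: algorithms:

// With views, the whole thing is one expression:
auto evens_squared = ints
    | view::transform([](int i){return i*i;})
    | view::filter([](int i){return i % 2 == 0;});

// With the eager std:: algorithms, the same idea needs named temporaries
// and iterator plumbing:
std::vector<int> squares(ints.size());
std::transform(ints.begin(), ints.end(), squares.begin(),
               [](int i){return i*i;});
std::vector<int> evens;
std::copy_if(squares.begin(), squares.end(), std::back_inserter(evens),
             [](int i){return i % 2 == 0;});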

The Missing Piece

So to sum up, in the world of N4128, we have this:

  1. Eager algorithms that can mutate but that don’t compose.
  2. Lazy algorithms that can’t mutate but do compose.
  3. ??!!!!

Whoops! Something is missing. If I want to read a bunch of ints, sort them, and make them unique, here’s what that would look like in N4128:

extern std::vector<int> read_ints();
std::vector<int> ints = read_ints();
std::sort(ints);
auto i = std::unique(ints);
ints.erase(i, ints.end());

Blech! A few people noticed this shortcoming of my proposal. A week before the meeting, I was seriously worried that this issue would derail the whole effort. I needed a solution, and quick.

Container Algorithms

The solution I presented in Urbana is container algorithms. These are composable algorithms that operate eagerly on container-like things, mutating them in-place, then forwarding them on for further processing. For instance, the read+sort+unique example looks like this with container algorithms:

std::vector<int> ints =
    read_ints() | cont::sort | cont::unique;

Much nicer. Since the container algorithm executes eagerly, it can take a vector and return a vector. The range views can’t do that.

A Moving Example

Move semantics makes all of this work smoothly. A temporary container is moved into a chain of mutating container algorithms, where it is munged and moved out, ready to be slurped up by the next container algorithm. (Naturally, performance would suffer if container algorithms were used with a container that wasn’t efficiently movable, like a big std::array. Don’t do that.)
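
Mechanically, the idea is simple enough that a toy version fits in a few lines. This is only a sketch of the shape, not the actual range-v3 machinery (which is far more careful about constraints and lvalues, as discussed below), but it shows the essential by-value-in, by-value-out flow:

namespace cont
{
    // Sketch only: an eager sort that takes a container by value,
    // mutates it, and hands it on to the next algorithm in the chain.
    struct sort_fn
    {
        template<typename Container>
        Container operator()(Container c) const
        {
            std::sort(c.begin(), c.end());
            return c; // moved into the next stage
        }
    };
    constexpr sort_fn sort{};

    // Pipe support. The real design constrains the left-hand side further
    // (see the discussion of lvalues below); this is just the shape.
    template<typename Container>
    Container operator|(Container c, sort_fn const & f)
    {
        return f(std::move(c));
    }
}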

Since container algorithms accept and return containers by value, I worried that people might do this and be surprised by the result:

std::vector<int> v{/*...*/};
// Oops, this doesn't sort v:
v | cont::sort;

The author of this code might expect this to sort v. Instead, v would be copied, the copy would be sorted, and then the result would be ignored.

Also, there’s a potential performance bug in code like below if we allow people to pass lvalues to container algorithms:

// Oops, this isn't very efficient:
std::vector<BigObject> bigvec{/*...*/};
bigvec = bigvec | cont::sort | cont::unique;

bigvec is copied when it’s passed to cont::sort by value. That’s bad! The alternative would be to have container algorithms do perfect forwarding — in which case what gets returned is a reference to bigvec. That then gets assigned back to bigvec! Assigning a container to itself is … weird. It’s guaranteed to work, but it’s not guaranteed to be efficient. An interface that makes it easy to make this mistake is a bad interface.

Instead, in my current thinking, the above code should fail to compile. The container algorithms require rvalue containers; you should move or copy a container into the chain. With range-v3, that looks like this:

using namespace ranges;
bigvec = std::move(bigvec) | cont::sort | cont::unique;

That fixes the performance problem, and also makes it pretty obvious that you ignore the return value of move(v) | cont::sort at your own peril.

I also offer this short form to apply a chain of mutating operations to a container:

bigvec |= cont::sort | cont::unique;

If you’re not a fan of the pipe syntax, this works too:

cont::unique(cont::sort(bigvec));

Both of these syntaxes will refuse to operate on temporary containers.

What is a Container?

Consider this line of code from above, which applies a chain of mutating operations to a container:

bigvec |= cont::sort | cont::unique;

How is this implemented? One simple answer is to make it a synonym for the following:

bigvec = std::move(bigvec) | cont::sort | cont::unique;

But not all containers are efficiently movable, so this would be needlessly inefficient. Instead, what gets passed around is a reference-wrapped container. Essentially, it’s implemented like this:

std::ref(bigvec) | cont::sort | cont::unique;

But cont::sort and cont::unique are container algorithms. Is a reference-wrapped container also a container, then? Un-possible!

Containers own their elements and copy them when the container is copied. A reference-wrapped container doesn’t have those semantics. It’s a range: an Iterable object that refers to elements stored elsewhere. But ref(v) | cont::sort sure seems like a reasonable thing to do.

In other words, container algorithms are misnamed! They work just fine when they are passed ranges, so long as the range provides the right operations. cont::sort needs an Iterable with elements it can permute, and that’s it. It doesn’t care at all who owns the elements.

cont::unique is also indifferent to element ownership, so long as it has a way to remove the non-unique elements. Rather than relying on an erase member function to do the erasing, we can define erase as a customization point — a free function — that any Iterable type can overload. With the appropriate overload of erase for reference-wrapped containers, std::ref(v) | cont::unique will Just Work.
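
For illustration only (this is not the exact interface in range-v3, just a sketch of the idea), such a customization point might look like:

// Sketch: erase as a free function. One overload forwards to the erase
// member of an ordinary container; the other unwraps a reference-wrapped
// container and recurses.
template<typename Container, typename I>
void erase(Container & c, I first, I last)
{
    c.erase(first, last);
}

template<typename Container, typename I>
void erase(std::reference_wrapper<Container> c, I first, I last)
{
    erase(c.get(), first, last);
}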

The interesting (to me, at least) result of this is that containers are not interesting. Instead, we get much farther with refinements of the Iterable concept that add specific behaviors, like EraseableIterable. The container algorithms accept any Iterable that offers the right set of behaviors. They don’t care one whit who owns the elements.

Summary

Over the past month, I’ve been adding a full suite of container algorithms to my range-v3 library for things like sorting, removing elements, slicing, inserting, and more. These are eager algorithms that compose. I call them “container algorithms” since “eager, composable algorithms” doesn’t roll off the tongue — but they are perfectly happy working on ranges. If you want to send a non-owning slice view to cont::sort, knock yourself out.

Container algorithms fill a gaping hole in N4128. They went a long, long way to appeasing many of the committee members who dearly want ranges to solve the usability problems with the current standard algorithms. I can only assume that, had I left container algorithms out of my presentation, the reception in Urbana would have been a few degrees colder.

Acknowledgements

The design of container algorithms presented here benefited tremendously from Sean Parent’s feedback.

UPDATE:

I’ve heard you! “Container algorithm” is a confusing name. They aren’t restricted to containers, and that isn’t the interesting bit anyway. The interesting bit is that they are eager, mutating, composable algorithms. There is no one pithy word that conveys all of that (AFAICT), but so far “action” has come closest. So we now have view::transform (lazy, non-mutating) and action::transform (eager, mutating). Not perfect, but better, certainly.
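
For a rough sense of how the two read side by side (just a sketch; the exact spellings may still shift):

std::vector<int> v{3,1,2};

// view::transform is lazy and non-mutating: no work happens, and v is
// untouched, until (and unless) squares is iterated.
auto squares = v | view::transform([](int i){return i*i;});

// action::transform is eager and mutating: v is transformed in place, now.
v |= action::transform([](int i){return i*i;});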

"\e"

121 Replies to “Container Algorithms”

    • You know, I guess that should teach me to reply after only having read half the post.

      However, I read your performance argument as terribly unconvincing.

      Firstly, performance is a QoI issue, and I think it would be trivial for implementations to simply compare the address of the LHS and the source container address to determine if it’s in-place or not.

      Secondly, this “performance problem” is a hypothetical. Somebody, somewhere, might pay a cost. And then they might just re-write their code to not operate that way- for example, move the container into another, and then view the temporary.

      “But it might not be as efficient as array access” is a terrible reason to gimp the interface. If people don’t want to use inefficient functions, they can just not call them. But by removing them from the interface, you break it for everybody, everywhere, even if they absolutely do not give a crap about maybe accidentally copying this vector one extra time in a cold path.

        • No. No, it is not enough. You should know that.

          Assembly and C are also both short, unambiguous, and efficient, but we don’t program in them. Hell, shortness can be a major downside: ask everyone who wrote using namespace std or tried to program in Perl or APL or named their identifiers “a”, “b”, “c”.

          The primary goal of code is not to be short, nor to be efficient. Efficiency is a thing you only go for when you know you really need it in that hot path: a special case. Readability and maintainability are the primary concerns. Shortness of source code is a thing nobody needs. Hell, shortness of binary code is a bit of a special case.

          You duplicated the algorithms and prevented perfectly logical and sane code from compiling for no benefit at all. That is not a win. That’s a fail.

      • There is one consideration of having things in place, although it’s not necessarily “performance.”

        As a real-world example, your website might accept CSV files and do some form of aggregation on them – let’s say add up the cost of order lines.

        The very naive approach would be to parse that CSV in memory and add all the lines to a vector, then use that vector as required. Next thing you know the client uploads a 100GB document…

        “Container algorithms” mean that you can lex the data as it streams in, filter out to the column you want and aggregate – the memory complexity would be O(1) (the total stack frame sizes of your selectors/predicates/lambdas).

        This is a big deal for C++. Some of us need to use the iterative approach up-front, because our customers tend to do things like 100GB documents and we simply can’t store that shit in memory. If streaming operations/container algorithms/linq are made trivial then there is a drastically smaller chance of a junior writing new OrderLine[size];
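
        Roughly the shape I have in mind (csv_rows, is_order_line and cost_of here are invented helpers standing in for the real lexing/parsing; only accumulate and the views come from the library):

        // Sketch: aggregate the cost column while the data streams in,
        // never materialising the whole upload in memory.
        double total = ranges::accumulate(
            csv_rows(upload)                   // lazily lexed rows
              | view::filter(is_order_line)    // keep only order lines
              | view::transform(cost_of),      // pull out the cost field
            0.0);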

  1. Congratulations so far Eric, this is possibly the biggest feature I’ll use in the upcoming C++ standard. As Sean Parent says, all programs are just transformations of data.

    By the looks of it you are focusing on the core semantics of ranges and not the details of the algorithms the library will provide, right? Please let me vote to include an enumerating range wrapper that sticks an integral index onto each element (not the boost::range one either, which doesn’t work with the new for syntax).

    • If I understand you correctly, you want something that takes a sequence like [a,b,c,d] and produces a sequence like [(a,0),(b,1),(c,2), (d,3)]. Is that right? If so, you can do this today, and it’s pretty trivial:

      auto rng = view::zip( v, view::ints );
      

  2. Let’s suppose we have a constructor for vector that takes an Iterable:

    std::vector vec_from_view(read_ints() | view::transform(…) | view::unique);

    std::vector vec_from_container(read_ints() | cont::transform(…) | cont::unique);

    These two examples, if I am not mistaken, yield the same value, but they operate in different ways: the first one will traverse the vector only once, and will make a copy of the view, leaving the first vector alone. The second one will traverse it twice and will move the result.

    I see you cannot do that with sort, because you cannot have a view for sort, as in your example. So my question is:

    Is the point of “container composable algorithms” to allow composing in cases where you wouldn’t be able to do it with views alone? But some operations can still be done with both ‘view’ and with ‘cont’ in some situations, does this hold true (I guess yes)?
    In the face of choice, lazy compositions would have better performance? I am not sure about this. My intuition tells me that a single pass will be faster than several passes, when allowed.

    • Let’s suppose we have a constructor for vector that takes an Iterable:

      Actually, we do! (Sort of.) All the views are implicitly convertible to containers. So you can do this today, for instance:

      // Initialize v with {0,1,2,3,...99}
      vector<int> v = view::ints | view::take(100);
      

      Looking at your example:

      std::vector vec_from_view(read_ints() | view::transform(…) | view::unique);
      
      std::vector vec_from_container(read_ints() | cont::transform(…) | cont::unique);
      

      These two examples, if I am not mistaken, yield the same value, but they operate in different ways

      Not quite. The first example won’t compile. If read_ints is returning a vector of ints, then you can’t send that directly to a view. Views don’t work on temporary containers.

      the first one will traverse the vector only once, and will make a copy of the view, leaving the first vector alone. The second one, will traverse it twice and will move the result.

      This part is right.

      Is the point of “container composable algorithms” to allow composing in cases where you wouldn’t be able to do it with views alone? But some operations can still be done with both ‘view’ and with ‘cont’ in some situations, does this hold true (I guess yes)?

      Yes.

      In the face of choice, lazy compositions would have better performance? I am not sure about this. My intuition tells me that a single pass will be faster than several passes, when allowed.

      I haven’t done benchmarking, but I expect you’re right: a single pass would be better.

      • Not quite. The first example won’t compile. If read_ints is returning a vector of ints, then you can’t send that directly to a view. Views don’t work on temporary containers.

        Why don’t views work on temporary containers? I thought the whole point of “fat” ranges was to make this work.

          • What do you mean by “fat”?

            Sorry I meant that the algorithms were range based rather than using “fat” iterators.

            Ranges are lightweight, non-owning views of data.

            You mean views right? Because containers own their elements.

            They never own their elements, so they can’t safely refer to a temporary container.

            But why can a view never own its elements? Is there something error prone or problematic about following the model:

            rvalue => own it
            lvalue => reference it

            To me it seems better, because we can detect if something is an rvalue or lvalue. We cannot detect whether a range owns its elements or not without help from a user.

          • Ranges are lightweight, non-owning views of data.

            You mean views right? Because containers own their elements.

            No, I mean ranges. Containers own their elements, ranges don’t. A container is not a range. Both containers and ranges are iterables, though, and most of the algorithms only care about iterables. That’s because the algorithms don’t care much about ownership semantics. They are eager. The views, however, very much care about ownership semantics, since they are lazy.

            Is there something error prone or problematic by following the model:

            * rvalue => own it
            * lvalue => reference it

            There is something wrong with it, but it’s subtle. Currently in the standard library, value category (whether an expression is an lvalue or an rvalue) is used to perform optimizations, but generally only in places where it doesn’t change observable semantics; e.g. to steal the guts of an object that is going away anyway. What you’re suggesting is to use value category to fundamentally change the semantics of an expression in an observable and potentially surprising way.

            If we allow a view to capture rvalue containers, then this works and is safe:

            auto read_ints() {
                return read_strings() | view::transform(to_int);
            }
            

            Someone might want to see the return value of read_strings() in a debugger, so they change the code to this:

            auto read_ints() {
                auto strings = read_strings();
                return strings | view::transform(to_int);
            }
            

            This code also compiles, but now it’s a memory error. Oops. A simple refactoring has fundamentally changed the semantics and performance characteristics of the code. That makes me extremely uncomfortable.

            Range-v3 worked the way you suggested for the first 6 months of its life, and it continually rubbed me the wrong way. Once I switched my thinking to “views are ranges, ranges are lightweight and non-owning”, it cleared up a host of fuzzy semantics issues and made things crystal clear.

            I get that this answer isn’t satisfying for some people, and I understand why. Can you also see my point?

          • Yes that makes a lot of sense, thanks for explaining that. However, the terminology is really confusing for me, since a range has traditionally meant begin and end iterators.

            To me, it seems like it would be better to call:

            non-owning => a view or a weak range
            owning => a container

            Perhaps you already thought through this.

          • However, the terminology is really confusing for me, since a range has traditionally meant begin and end iterators.

            A begin and end iterator is exactly what I’m calling “Range”. It’s lightweight and non-owning, just like a pair of iterators. Your terminology also works, though. It might be worth mentioning in my next std paper.

    • I see you cannot do that with sort, because you cannot have a view for sort, as in your example.

      Actually you can – it’s quite amazing what you can do with views. The creation of a non-mutating lazy sorted view based on mergesort is a little bit tricky and mind-twisting, but absolutely possible. Gary Powell and I implemented this several years ago, and to our astonishment it worked.

      I’m not sure I’d really recommend to use it in practice, though.

      Btw, great job, Eric!

        • We implemented a set of views about 14 years ago. You can find the code and some documentation at http://www.zib.de/weiser/vtl/index.html. It’s been more an exploration of possibilities than a serious attempt at standards-compliant code, and wouldn’t live up to current standards.

          Apart from that, it’s been quite interesting work, at least for us.

          Just now that I looked it up I didn’t find the actual sorted view code I remember, but essentially all components are there (most notably the merge_view). Maybe somewhere I can dig up the test code for sorted views we wrote at that time.

          The core idea is the merge view. It takes two iterables (in your notation) and iterates over them by always presenting the smaller of the two current elements. What you need in addition is something that recursively splits a random access iterable into two halves and creates a merge view of the sorted views of the two halves.

          Why wouldn’t I recommend it for practical use?

          It’s a quite complex tangled mess of code for such a simple task as sorting. Close to mental masturbation – fascinating what you can do, sorting a vector on the fly without moving any elements 🙂

          I don’t remember we did tests on the sorted view. The “naive” implementation I remember has probably n*n complexity, but it should be possible to achieve n log n with some result caching. The tree of views takes n log n memory, of course. Memory access patterns are probably suboptimal as well.

  3. P.S.: After reading your post, I think that cont:: is not an appealing namespace for these algos, after all, the fact that they operate on containers doesn’t matter, as you say. I am squeezing my mind to come up with a name, but I cannot find one 🙁

      • Let me know when you come up with something. I’ve been thinking, too.

        Replace when by if. 😀 I will try, I was doing some kind of brainstorming about the properties of ranged vs container algorithms.

        Container algos are eager, range-based are lazy… etc, but I think a too technical name would come out of that. I am not sure how to go on. Those are not container algorithms, they are ______ <—- Something I don’t know here.

      • How about “inplace”? Though that may convey the wrong thing for 2-container algorithms, like “transform from here to there”.

      • Great work Eric – can’t wait to write code like this as standard C++.

        About the name – I had the thought that something evoking the movement and processing might work – like chomp, or digest. It’s not easy to come up with short words that get all the meaning …

  4. Last comment, sorry for spamming:
    I think this won’t work, but does it make sense?

    auto my_new_seq =
    some_generated_view | view::transform(…) | generate_a_vector | cont::sort | view::unique | view::transform(…);

    Because cont::sort is an rvalue, so applying a view will not work, I guess. I can save passes using view:: after cont::sort, because views operate lazily.

    Is this what I should do every time I want that?

    auto my_temp_seq =
    some_generated_view | view::transform(…) | generate_a_vector | cont::sort;

    auto my_new_seq = vector(my_temp_seq | view::unique | view::transform(…));

    Great work, BTW, hope to see some of this into the next standard!

    • I think this won’t work, but does it make sense?

      auto my_new_seq =
          some_generated_view |
          view::transform(…) |
          generate_a_vector |
          cont::sort |
          view::unique |    // ERROR HERE
          view::transform(…);
      

      As I’ve annotated above, everything works until you try to pipe the results of cont::sort to view::unique, which can’t work on a temporary container. You need to save the result of cont::sort into a local.

      I can save passes using view:: after cont::sort, because views operate lazily.

      Is this what I should do every time I want that?

      You got it.

      Great work, BTW, hope to see some of this into the next standard!

      Thanks!

      • Maybe it’d be interesting to consider an analog, but opposite, evil twin of std::move. One which would allow the programmer to explicitly cast an rvalue into an lvalue. It would require some amount of care in its use, somewhat like std::move, but I suspect even more so.

    • By the way, your post gave me an “I should have had a V-8” moment. Of course there should be a generate_a_vector view. As of today, there is a view::to_vector adaptor, as well as the more general forms: view::to_<std::vector>() (element type is deduced), and view::to_<std::vector<long>>() (elements are converted as necessary). Seems pretty handy.

      • Glad it helps. Yes, I always wondered why you cannot generate a container from a view directly, since this has two good properties:

        - It continues the pipe flow with left to right reading.
        - Can use auto on the left hand side to deduce.

        BTW, where can I find how to markup posts?

    • Ha. Depends. When using the container algorithms presented here, you’ll get an error like “unable to find overloaded operator| with arguments <container-type-here> and ranges::cont::sort“, or some such. Then you’ll probably get a list of operator| overloads, and why they failed to match. Not great, but not horrible, either.

      With the range views, you’ll get the same error, but instead of “<container-type-here>”, you’ll get the type of the range at that point in the pipeline. Since range views nest, the further into the pipe you are, the more complicated the type. This can look pretty fugly, especially if the pipe expression involves lambdas. There’s not much I can do about that.

      On the up-side, there shouldn’t be any of the awful template instantiation backtraces we’ve come to expect from heavily template-metaprogrammed libraries. The reason is that I am making extensive use of a library to emulate Concepts Lite. All the type parameters are verified at the API boundary, so you shouldn’t (often) see errors from deep within the library’s guts.

  5. First of all, thank you a lot, Eric, for the great work you’re doing. Secondly, it’s probably a silly idea, but anyway… If, as far as I understand, from the user’s perspective the biggest difference between view:: and cont:: is that the first one is lazy and the second one is eager, maybe they could be named so, that is, lazy:: and eager:: ? At least, it should make the difference between similarly named algorithms like lazy::transform and eager::transform immediately apparent.

    • It’s something I’ve seriously considered. It definitely has its merits to name them lazy and eager. I hate parting with view though, because that effectively conveys “non-owning”. Worth considering, though.

      • I think lazy is the best name I’ve seen suggested. It’s technically correct (unlike cont), and actually means something specific (unlike touch/direct).

        In my mind lazy also means non-owning. I mean, owning a bunch of stuff takes work (to paraphrase Biggie Smalls).

        I would only hope that the names also make it explicit whether the algorithms work in-place or return a copy or a view of the input. As an end-user of libraries, having to look up what’s in-place and what makes a copy is really annoying.

        Oh and stuff like std::unique needing the user to explicitly erase the now-garbage end of the container is ridiculous. Thanks for moving in a direction where there is less of that.

        • Oh man. That just goes to show how important it is to get the names right. cont:: means “eager”. view:: means “lazy”. You got it completely backwards. Clearly, cont:: is confusing the issue. 🙂

          • lazy was the first word that sprang to my mind and now having seen others use it it’ll get my vote too. With some Scala knowledge the term lazy also implies the infinite stream possibility where ownership isn’t really viable. Then again, said background is making me read this with functional, i.e. immutable, usage in mind.

      • What about

        lazy::view::transform (or just view::transform for short).

        and

        eager::mutable::transform (or just mut::transform for short).

        Maybe having a top level namespace lazy and eager, would come in handy if this will ever be extended to such things as eager::view::transform or lazy::mutable::transform. Not sure those make any sense though.

        Cheers

  6. It’s something I’ve seriously considered. It definitely has its merits to name them lazy and eager. I hate parting with view though, because that effectively conveys “non-owning”. Worth considering, though.

    Sorry Eric, if you can, delete the comment before, no context there.

    view denotes non-owning, but cont are not container algorithms. Why should you then stick to something that means nonowning, when we already know that none of them mean ownership? lazy vs eager is a more important property in this case. There is also something strange about l vs r-valueness I think -> views work on lvalues. cont works on rvalues.

    • view denotes non-owning, but cont are not container algorithms. Why should you then stick to something that means nonowning, when we already know that none of them mean ownership?

      How can you, in the space of two sentences, say “view denotes non-owning,” and then, “none of them mean ownership”? view denotes non-owning. cont doesn’t.

      There is also something strange about l vs r-valueness I think -> views work on lvalues. cont works on rvalues.

      No. Views do not require lvalues. They require lvalue containers, but they accept rvalue ranges. They have to; otherwise, you couldn’t pipeline them. In “v | view::foo | view::bar”, the result of “v | view::foo” is an rvalue, but view::bar accepts it because it is non-owning.

      Eager vs. lazy is interesting. Owning vs. non-owning is interesting. Value category isn’t interesting, except insofar as it’s needed to avoid memory and performance errors.

      • How can you, in the space of two sentences, say “view denotes non-owning,” and then, “none of them mean ownership”? view denotes non-owning. cont doesn’t.

        Sorry I didn’t explain that clearly. I mean: view denotes non-owning. But eager vs lazy don’t denote ownership. You are naming things related to ownership, not laziness/eagerness. That was my point.

  7. I think these algorithms can be called “immediate”, since they are immediate both in time and space: they work exactly upon call and operate directly on the provided data.
    But the word itself is too long to type as a namespace.
    Or maybe “direct”?

  8. For someone who does not have deep knowledge of ranges, this is quite confusing. Couldn’t the difference be made simply by looking at the lvalue/rvalue-nes of the container? As I understand it, view algorithms only support lvalue containers and container algorithms only support rvalue container. Couldn’t the interface for view and container algorithms be unified and make the view/container distinction deduced?

    ints = ints | sort | unique; // view algorithms are used
    

    vs

    ints = move(ints) | sort | unique; // container algorithms are used
    

    • Lvalue/rvalue isn’t the interesting distinction. Eager vs. lazy is the axis of interest. You should have a way to say: I want to do a chain of operations, but I only want to make one pass over the data, and I don’t want to mutate the underlying elements. Or: I want to mutate these elements, and I want the operations to take effect immediately. They are both useful. They have different requirements on their input.

      Anything in the view namespace is lazy, lightweight, and non-owning. Since views are non-owning, they refuse to operate when nobody has claimed ownership of the elements.

      Anything in the cont namespace is eager. They move whole containers in and out, so you have to move a container in.

      Reference semantics in C++ can be tricky because of the lifetime issues. A range abstraction has to walk a fine line: avoid memory errors by preventing dangling references, while also preventing performance errors by copying too liberally, while also giving people both the eager and lazy semantics they want, in a world where some sequences own their elements and others don’t. It’s harder than it looks. So hard, in fact, that it’s taken this long for a serious range proposal to make it this far.

      I hope this has clarified the issue a bit.

      • I understand that the distinction between eager and lazy is not the same as the distinction between rvalue and lvalue.

        My point is that since eager requires an rvalue and lazy requires an lvalue (if I understood correctly), can you demonstrate a concrete example where it would be wrong to implicitly deduce an lvalue as ‘lazy’ and an rvalue as ‘eager’?

        From what I understand, what you propose results in this :

        std::vector<int> ints = read_ints() | view::sort; // Error : Cannot use view on rvalue to avoid dangling references
        std::vector<int> ints = read_ints() | cont::sort; // OK

        std::vector<int> ints2 = ints | cont::sort; // Error : Cannot use cont on lvalue
        std::vector<int> ints2 = ints | view::sort; // OK

        From a user perspective, having to explicitly specify the view:: and cont:: namespaces (and choosing between the two) seems redundant. Maybe just defining those two namespaces as inline could do the job?

        • My point is that since eager requires an rvalue and lazy requires an lvalue (if I understood correctly)

          No, I should have been clearer. Lazy does not require lvalue. In fact, it can’t. Consider a lazy pipeline:

          auto rng = v | view::transform(...)
                       | view::filter(...);
          

          The result of v | view::transform(...) is necessarily an rvalue. The filter view accepts it because it is a (non-owning) range. But, assuming v is a vector, had you done “move(v) | view::transform(...)“, that’s a potential memory error, so it must be rejected.

          can you demonstrate a concrete example where it would be wrong to implicitly deduce an lvalue as ‘lazy’ and an rvalue as ‘eager’?

          From a pure library-design perspective, deciding something as fundamental as eager vs. lazy, value vs. reference based on something rather esoteric like an expression’s value category is likely to be more confusing rather than less. Performance characteristics and memory safety shouldn’t be affected by trivial source code transformations, like assigning the result of an intermediate computation to a named variable. That is, making both of these compile but do radically different things, and with different safety/performance implications, is dangerous:

          auto rng0 = foo() | filter(...);

          auto tmp = foo();
          auto rng1 = tmp | filter(...);
          

          Large expressions get broken up into smaller ones frequently during code refactorings. If a simple code transformation like that fundamentally changes the meaning of your program — in a way that can’t be diagnosed at compile-time — that’s going to be a ripe source of bugs.


          • No, I should have been clearer. Lazy does not require lvalue. In fact, it can’t. Consider a lazy pipeline:

            The result of v | view::transform(…) is necessarily an rvalue.

            Uh, yes; I was talking about container lvalues (I know that view objects can perfectly well be rvalues).


            Large expressions get broken up into smaller ones frequently during code refactorizations. If a simple code transformation like that fundamentally changes the meaning of your program — in a way that can’t be diagnosed at compile-time — that’s going to be a ripe source of bugs.

            Good point. It is the same problem that all libraries making use of expression templates have with auto. Ah well, I guess that until we can override operator auto (I think there was a proposal for it), it is probably wiser to do as you propose and require it to be explicit.

      • Very excited about this, great work.

        On naming, you say “they are both useful” referring to eager vs lazy, but in the description you name what seems to be two axes of interest — perhaps this is making naming harder?

        Axes of interest:
        1. “View”, lazy, has bonus of single pass vs. “Eager”, immediate, applies operation one pass per pipe.
        2. “View”, doesn’t modify underlying data vs. “Mutating”, inplace, modifies original

        The name “view” matches well with both the lazy/eager axis and the mutability axis, but while “eager” says immediate, eager doesn’t really say inplace/mutating, and the opposite of “inplace/mutating” doesn’t really say immediate/multi-pass.

        Also, are there really four kinds we are looking at?
        1. Lazy, unmutating
        2. Lazy, mutating
        3. Eager, unmutating
        4. Eager, mutating

  9. Here are a few more ideas for an alternate name to “container” algorithms (in order of my preference):

    prompt
    live
    instant
    greedy

    I don’t like “greedy”, but it is often used for such purposes.

    And for a few laughs, I offer ‘voracious’! 😉

    • And for a few laughs, I offer ‘voracious’! 😉

      LOL. So far, ‘eager::‘ is winning in the battle for my heart and mind. It’s up for grabs whether I’ll change ‘view::‘ to ‘lazy::‘ though.

      Naming Is Hard™.

      • Some random musings:

        If the current use of view/views is as a noun, then sticking with a noun, perhaps something along the lines of mutation (which appears in the quickstart doc description), mutant (shorter :-)), product, etc. If view is used as a verb, then terms like copy, move, apply, transcribe, etc., seem more appropriate, IMHO of course.

        Mixing view (noun or verb) with an adjective like eager doesn’t quite sit well with me.

        Backing up a step or two: maybe the ‘things’ named filter, transform, sort, take, etc just embody filteredness, transformedness… and what needs to be different is the binary operator that feeds the result of the left hand side to the input of the right hand side. So the ‘>>’ below causes the realization of the container:

        using namespace view_or_cont;
        auto my_new_seq
        = some_generated_view
        | transform(…)
        >> sort
        >> unique
        | transform(…);

        Not sure about precedence issues here.

        Another thought I’ve had is that views & ranges (in the boost::range sense) have always focused on the data-source side. That is, we pass views/ranges eventually to algorithms which typically have an output_iterator to write into. Maybe that ‘source-ness’ needs to be explicitly acknowledged, and there needs to be some symmetry on the data-sink side.

        Anyway you’ve done outstanding work here Eric! Just wanted to throw out some ideas to aid in brainstorming.

        Jeff

        • Mixing view (noun or verb) with an adjective like eager doesn’t quite sit well with me.

          Yeah, me neither. Need to think more about your comment, but an idea popped into my head while reading it. The container algorithms are immediate, and the noun “action” seems to convey that. So, views and actions?

          • Well, if one is a “view”, then maybe the other can be a “site” (or “spot”, but that sounds like a dog’s name 🙂 ).

            So, some algorithms are performed “on view”, others “on site”.

          • “action” still seems a little vague to me.

            The noun “yield” comes closest to the concept of realizing what is being viewed.

            Grasping for other analogies, in computer graphics a view of a scene is realized by being rendered.

            Maybe view/rendition using nouns?

      • So far I think that is the best naming, but I am not sure it is the absolute best we could find. Indeed, the fundamental property is lazy vs eager. Ownership, as I see it, is not that important.

  10. Delighted to hear that ranges got a good reception in Urbana.

    I’m not totally sold on these container algorithms yet. Can I attempt to defend the current way of doing things?

    Much of the ugliness in the original example is due to the std::unique algorithm, which requires a separate erase step. We could easily write a single function which did both things – indeed N4161 proposes just such a function for the similar case of std::remove / remove_if. The example would then reduce to something like:

    auto ints = read_ints();
    std::sort(ints);
    std::erase_duplicates(ints);

    Not so bad now.

    OK I can see benefits in being able to do this all on one line. It’s more functional; it could avoid the need to name a temporary object sometimes; it would allow the ints variable to be made const. Still the old style has advantages: chiefly, the semantics are immediately apparent. You don’t need to ask whether the operations are lazy or eager here: they have to be eager.

    I suppose my worry would be about the teachability of having two different “modes” for operator |. The rules for the lazily-evaluated range adapters, whether they can be applied to lvalues or rvalues etc., are pretty complicated already. Now we’re going to add a new type of operation with different rules, and which gives you subtly different semantics when it is applied. Maybe it’s worth it but I’m not sure yet. I think there’s a lot of benefit in having operator | always mean lazy evaluation, and if you’ve got an rvalue container you’ll just make a temporary.

    • OK I can see benefits in being able to do this all on one line. It’s more functional; it could avoid the need to name a temporary object sometimes; it would allow the ints variable to be made const. Still the old style has advantages: chiefly, the semantics are immediately apparent. You don’t need to ask whether the operations are lazy or eager here: they have to be eager.

      I don’t think it is very difficult to teach that as long as things are namespaced. You have 3 sets of things, you can teach them one by one.

  11. Hi,

    I’ve not commented on your blog before, but I’ve been following it for a while, as well as your ranges work, with a lot of excitement.

    On the naming issue — What about distinguishing between the lazy and eager functions / algorithms by using the past participle form for the former? That is, view::transformed versus cont::transform. I think this reads a little better, esp. as it lets one use using directives for both forms. (Well, except for unique where there isn’t a convincing past tense — uniquified? Nah.)

    Then maybe use “as” instead of “view” and something like “do” or “eval” or “run” instead of “cont”. So, as::sorted versus eval::sort.

    Or alternately, use past participle for both forms but use “make” for “cont” — as::transformed versus make::transformed. Though it loses the ability to use using directives for both forms in one place. Just to throw out some ideas; feel free to use any or none.

    Best regards,
    Kevin

    • Boost.Range uses the past participle. But as you’ve noted, it doesn’t generalize well. Nobody wants “uniqued“. 😉 Thanks for the other suggestions. I’ll throw them in the pot with the rest and stew on it.

  12. You should have a way to say: I want to mutate these elements, and I want the operations to take effect immediately.

    Well, that sounds exactly like the behaviour of the already existing std:: algorithms. Would it be too surprising to overload these algorithms, so that you could just write:

    std::unique(std::sort(bigvec));

    In my opinion, the intent is very clear this way, and you can keep “view” for the lazy algorithms.

    • I have to say I have to agree that the pipe syntax should be kept for views only.

      For eager algorithms, if we get unified call syntax (N4165, N4174), it becomes:

      bigvec.sort().unique();

    • Many of the standard algorithms already have interesting and useful return values. To say, when passed an Iterable, std::foo returns the Iterable for composability, but when passed two iterators it returns other useful information and doesn’t compose — it’s just weird. There’s no symmetry or consistency in that interface.

      In my proposal, the std:: algorithms all get overloads that accept Iterables, but they have exactly the same semantics — and return types — as the existing algorithms. I think that’s the right thing to do.

      These composable algorithms sometimes throw away useful information by returning the modified Iterable. The up-side is that they compose well. It’s a trade-off. They are provided in addition to the existing std:: algorithms so that users have a choice.

  13. Hello Eric,

    I am happy that you propose to offer container algorithms which mirror the views. I don’t see your zoo of algorithm classes (container/view) as complete, though. Here is my list:

    1) holds reference to lvalue, lazy, output may have different type than input (view::XXX)
    2) aggregates rvalue, lazy, output may have different type than input (not supported)
    3) passes through rvalue, eager, output type same as input (cont::XXX)
    4) operates in-place on lvalue, eager, output type same as input (cont::XXX)
    5) operates eagerly on rvalue or lvalue, writes result into different container type

    Not all algorithms support all modes of operation. For example,

    transform supports (1), (2), and (5) and with restricted functionality (output type=input type) (3) and (4)
    filter supports all five
    sort supports (3), (4) and (5)

    5 classes is many. It would be nice to cut them down a bit. Let’s see:

    (2) can be derived generically from (1) by incorporating a generic rvalue keeper into the view implementations.
    (4) can be derived from (3) also generically.
    (5) can probably and with reasonable performance be obtained from (1)+a generic container maker (make_vector, etc.)

    This leaves us with two core algorithms to implement, namely the in-place eager ones (3) and the lazy ones (1).

    The caller, though, should be able to choose between (1) through (4), with (5) IMO done with a helper function.

    I would leave the naming up for discussion. If you feel like (1) and (2) should be distinguishable at the call site, I am fine with that. In practice, I don’t see it being a use case that you generically handle rvalues and lvalues with the same code. So you can decide whether you want to call (1) or (2). But I think this is no reason not to support (2) at all.

    Arno

    • Arno, it already is precisely as you describe, with the exception that (2) is intentionally not supported. That’s because in my model, the views are always non-owning. (I feel you and I have had this conversation before.)

      (3) and (4) are both automatically generated from the same implementation. (5) is supported as of a few days ago with view::to_, which can be used as follows:

      auto v = view::ints
       | view::transform(...)
       | view::to_vector // HERE
       | cont::sort;
      

      More generally, view::to_ can be used like: view::to_<std::vector>() or view::to_<std::vector<long>>(). In the former case, the element type is deduced. In the latter, it’s coerced.

      • I know that you feel strongly that views are non-owning. I know that you want people to have certain expectations about how views work, namely, that they hold references.

        I, on the other hand, just see that the “owning view” is a practical use case, because transforms, for example, cannot always be executed in-place at all due to different output types.

        I don’t think our two views are incompatible. You want the caller to get a lightweight “view” if she wants it, and not get an unexpectedly heavy aggregate. I want the aggregate to be available, because I think it is a useful thing to have, but don’t insist on calling it a view.

        Arno

          • It’s true to_vector isn’t very view-like, but not for the reason you suggest. All views operate on Iterables.

            EDIT: I feel like I should clarify, since this is an important point. The “view” in “view::” doesn’t mean that the input should be a view. It means that the output — the result — is a view. In that sense, you’re absolutely right. to_vector emphatically does not belong in the view:: namespace. Thanks for bringing my attention to that.

  14. The approach you have taken with container algorithms reminds me of linear types used in some functional programming languages. Pure functional languages may allow mutable state as long as the mutation is observed only locally. So, there is a concept of a linear value: a value that can be used only once. That restriction allows destructive updates under the hood, because no one sees the value being updated.

    To me that concept seems to be very close to container algorithms. When you first call move on a vector (or std::ref), you create a linear value; a container algorithm takes that value and produces a new linear value that can be used in the next container algorithm.

    From such a perspective it is quite reasonable to have a sharp distinction between linear and non-linear values. Maybe it is better to have a special function (std::local_value?) for creating a linear value.

    • This is an interesting observation. Of course, there is already a special function for creating a linear value in this case: std::ref. That’s how you tell the compiler, “No really, I’m telling you it’s safe. Go ahead and mutate this object.”

  15. Hello,

    I’ve followed your blog about ranges and it’s very exciting how far this has come.

    I’m not a native English speaker, but as I read through the comments, one thing from Qt and other frameworks came to my mind.

    Model/View(/Controller).

    The model is the underlying data structure and the view is how you look at it (lazy, non-mutating as long as you don’t edit, but that’s not the point here).

    so “model::transform” and “view::transform” fit my thinking of the topic, with a noun as the namespace. And I think it is also clear that a “model::algorithm” mutates the data structure (container).

    It may be an option 😉

    Greetings

    • That’s not quite the right mental model (no pun intended). Calling these algorithms “model” algorithms is wrong for the same reason calling them “container” algorithms is wrong. It’s not about who owns the data. It’s about whether the work is done eagerly or lazily.

      • I got your point on the owning thing, but then I have a problem with “view” also. Then I think we could not use a noun at all.

        But my point wasn’t about the owning of the data anyway.
        If you do your work eagerly you have to work on the owning data anyway, even if you have some ranges or lazy work in between. Or am I getting something wrong?

        auto v = view::ints
         | view::transform(...)
         | view::to_vector // HERE
         | cont::sort;
        

        “view::to_vector” here cannot work lazily, can it?!

      • struct A {
          int n;
          std::string str;
        };
        
        std::vector<A> veca;
        

        Sort veca by order of member A::n:

        sort( view::transform( veca, std::mem_fn(&A::n) ) );
        

        The same thing applies to other algorithms swapping elements, like erase.

        At think-cell, we currently use

        sort( veca, compare_less( std::mem_fn(&A::n) ) );
        

        to express the same thing, but the inconsistency of using view::transform for algorithms like find, but not for sort, is unsatisfying.

        • sort( view::transform( veca,
                    std::mem_fn(&A::n) ) );
          

          This doesn’t sort the vector. It sorts all the A::n members, leaving the A::str members alone. It’s pretty meaningless.

          If you’re suggesting to change the meaning of iter_swap for transform view iterators, then the answer is no. iter_swap is an important primitive. Changing its semantics is sure to have nasty side-effects. I’m not even sure how one would overload it to consistently give the behavior you want. If you transform with a lambda instead of std::mem_fn, then iter_swap would have no way of knowing that a struct was being sliced. To have your code do something different after replacing a call to std::mem_fn with the equivalent lambda is surprising in the extreme.

          Also, maybe the transform is using the values of one sequence as keys into another sequence. And maybe the user is trying to sort that sequence. And why shouldn’t they? No, making transform view iterators magic with respect to iter_swap is not workable.

          Instead, range-v3 gives all the algorithms “projection” arguments to handle this very common case. You use it like this:

          sort( veca, std::less<>(), &A::n );
          
          • I do not quite get the “if you transform with lambda…” part. Why would iter_swap need to know that something is sliced?
            iter_swap for iterators of views only needs to swap the iterators of the underlying container; it does not need to know anything.

            And IF we allow views to be sorted, that’s the only approach that we can take.

            Iter_swap overloading was done in p-stade.oven (if I remember correctly). That does not help in the current standard, however, as std::sort is not required to use iter_swap.

            A similar approach is used in the range library developed by Andrei Alexandrescu for the D language. At least it allows sorting of concatenated containers.

          • I do not quite get “if you transform with lambda…” part. Why would iter_swap need to know that something is sliced? Iter_swap for iterators of views only needs to swap iterators of underlying container, it does not need to know anything.

            OK, but why do we want to treat iterators of view specially? iter_swap in the standard is defined to be: swap(*a,*b). If we are to do something else, how are we to define it?

            And IF we allow views to be sorted, that’s the only approach that we can take.

            What makes you think that?

            Similar appoach is used in range library developed by Andrei Alexandrescu for D language. At least it allows sorting of concatenated containers.

            You can sort concatenated containers just fine, with no iter_swap tricks. I just tried it. Works just fine.

          • All you would need to do is to require std::swap to use std::iter_swap (very reasonable, I think) and define transform_range::iterator’s std::iter_swap to swap the elements of the base sequence. Then it does not matter whether you use a lambda or std::mem_fn. Not that crazy, I think.

            If you want to stick to the projection approach, I am not sure you need an extra projection parameter. A simple wrapper a la

            sort( vec, compare_less( &A::n ) );

            does not require more typing than

            sort( vec, std::less<>(), &A::n );

            but gives you more orthogonality: you don’t need an extra overload for each algorithm, just one wrapper.

          • All you would need to do is to require std::swap to use std::iter_swap (very reasonable, I think)

            How would that work? There are lots of things that are swappable that are not iter_swappable.

            define transform_range::iterator’s std::iter_swap to swap the elements of the base sequence

            Unconditionally? What about the scenario I presented above:

            Also, maybe the transform is using the values of one sequence as keys into another sequence. And maybe the user is trying to sort that sequence. And why shouldn’t they?

            A simple wrapper a la

            sort( vec, compare_less( &A::n ) );
            

            does not require more typing than

            sort( vec, std::less<>(), &A::n );
            

            By my count, it does. 😉 But that’s not the point. If you need compare_less, you also need compare_greater, compare_less_equal, etc. etc. And each one is a combination of one of the function objects in <functional> and std::mem_fn (or std::bind). That isn’t an increase in orthogonality. It’s a decrease.

            [With a wrapper instead of a projection] you don’t need an extra overload for each algorithm, just one wrapper.

            You don’t need any extra overloads when using projections:

            template<typename I,
                     typename S,
                     typename C = ordered_less,
                     typename P = ident>
            I sort(
                I begin,
                S end,
                C pred = C{},
                P proj = P{});
            
  16. OK, but why do we want to treat iterators of view specially? iter_swap in the standard is defined to be: swap(*a,*b)

    I do not think that in the current standard iter_swap is a really important primitive. It’s only mentioned a few times and most of the STL algorithms are not required to use it. So, it should be possible to redefine it.

    Anyway, when the STL was standardized there were no iterator adaptors or ranges, so no one thought of the need to customize the behavior of iter_swap for them. And in my opinion it should be customizable. Iterators of ranges do not point directly to elements, they just adapt other iterators. Ranges do not have the elements, they may synthesize them on the fly (or get them from some cache), so there’s no point in swapping range iterators the way iter_swap does now.

    And if changing the meaning of iter_swap is too much of a breaking change, it is still possible to have another function (indirect_swap?) that iterators may customize.

    How are we to define it?

    In order to define iter_swap, we may introduce the notion of the underlying value of an iterator. For a container’s iterator, its underlying value is just the value it points to. For an iterator that adapts some other iterator, its underlying value is the underlying value of the inner iterator. (That is not part of the definition, however; an iterator is free to specify something else as its underlying value.) The underlying value of an iterator is returned by some function (let’s call it uvalue) found via ADL. The return type of uvalue is not required to be a reference type; it might be some reference-like type. iter_swap should swap the underlying values of its parameters.

    // it’s just a reminder that ADL is involved
    using std::swap; using std::uvalue;

    auto&& f = uvalue(a);
    auto&& s = uvalue(b);
    swap(f,s);
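    To make this concrete, here is a sketch (purely illustrative; the names are invented) of how a transform view’s iterator could customize uvalue so that comparisons see the transformed values while swaps move the underlying elements:

    // sketch only: a transform iterator
    template<class BaseIter, class Fn>
    struct transform_iterator
    {
        BaseIter base;
        Fn fn;

        // dereference yields the transformed value (what comparisons see)
        decltype(auto) operator*() const { return fn(*base); }

        // ...increment, equality, etc. omitted...

        // found via ADL; yields the underlying element (what gets swapped)
        friend decltype(auto) uvalue(transform_iterator const &it) { return *it.base; }
    };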

    It is true that the elements of concatenated containers can be sorted now. But the approach proposed above also allows sorting zipped containers and transformed containers (with sane semantics).

    About the example you mentioned: as far as I understood, there are two containers, one containing keys and another that maps keys to values, and the user wants to sort the second container. But usually the container that plays the role of a map is either already sorted (boost::flat_map, std::map) or not sortable at all (unordered_map). And if not, then the user has constructed a range that has to perform a linear search every time an element is accessed. A rather unlikely situation.

  17. Please disregard the part about your example; I was not thinking. If the user wants to sort the values of a (multi_)map for some keys, it can still be done. We just need some iterator adaptor that has uvalue(it) defined as *it.

    • This feels like adding a hack to undo your earlier hack. Why not go the other way? When you want to sort the underlying sequence, define an adaptor that returns the underlying values, by making *it return the result of *it.base()? (All the adapted views have iterators with a base() member.) That way, you always know what is getting sorted.
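      For illustration only, a rough sketch of such an adaptor (hypothetical names; only the dereference part is shown, a real one would forward the whole iterator interface):

      #include <iterator>
      #include <utility>

      // Wrap an adapted view so that *it yields *it.base(), i.e. the element
      // of the underlying sequence.
      template<class View>
      struct base_view
      {
          View v;

          struct iterator
          {
              decltype(std::begin(std::declval<View&>())) it;

              decltype(auto) operator*() const { return *it.base(); }
              iterator &operator++() { ++it; return *this; }
              bool operator!=(iterator const &o) const { return it != o.it; }
          };

          iterator begin() { return {std::begin(v)}; }
          iterator end()   { return {std::end(v)}; }
      };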

      The algorithms should not treat the iterators of adapted views specially. When you sort the range [first,last), you sort the elements in the range denoted by first and last. This is how the STL works. I still see no reason to change it.

      Sorting a transform view makes no sense. You can sort the underlying sequence, but that is no guarantee that the elements of the transform view are sorted. What does sort mean if this isn’t guaranteed to be true:

      std::sort(rng);
      assert(std::is_sorted(rng));
      

      ???

      I can’t think of a use for sorting a zip view. But there’s nothing wrong in theory with doing it. A zip of a vector<int> and a vector<short> has tuple<int &, short &> as its reference type. These actually can be sorted, once I change swap to handle rvalues. (That’s on my TODO list since the Palo Alto proposal allows proxy random-access iterators.)

      How do you propose to implement iter_swap for a zip view? What is the underlying sequence that gets sorted?

  18. The idea was to make sorting a transformed view produce the same result as sorting the underlying sequence with a projection. That is, during the process of sorting we swap underlying values, but comparison is performed on transformed values.

    And in that case is_sorted(rng) should return true after sort (under the assumption that the sequence was transformed with a pure function; then I do not see how is_sorted could return anything other than true).

    For a zip_view of containers of ints and shorts, the underlying value would be tuple<int&, short&>; if swap works correctly on those, the result would be the same.

    • Let’s say we have this:

      std::vector<int> v{5,4,3,2,1,0};
      auto rng = v | view::transform([](int i) { return i%2==0 ? i : -i; });
      

      We both agree that when we traverse rng, we see the elements [-5,4,-3,2,-1,0], right? Now, you want to be able to sort rng, and have it sort the underlying sequence:

      sort(rng);
      

      If we allowed that, v would contain [0,1,2,3,4,5], and when we iterate rng, we see the elements [0,-1,2,-3,4,-5]. Right?

      So, is the sequence [0,-1,2,-3,4,-5] sorted? Of course not.

      You say you would like is_sorted to check if the underlying sequence is sorted. But that’s not useful. If I am writing generic code that accepts a range of elements and does a bulk insert into a database (let’s say), I need to know that the elements in the sequence are sorted. I shouldn’t have to worry about whether the range I’m passed is actually a transform view, and is_sorted is going to lie to me. I want to know that I’m not going to corrupt my database by bulk inserting non-sorted data.

      Please, please desist from this iter_swap madness. It’s a terrible idea.

  19. Basically, we do not sort the underlying sequence; we sort the range, changing the order of the elements of the underlying sequence. That might be a bad idea, but it’s not as bad as you describe.

  20. All you would need to do is to require std::swap to use std::iter_swap (very reasonable, I think)

    Sorry, I meant std::sort of course.

    Re. projection, you may not need overloads, but you have to implement the projection in each algorithm. compare_less/compare_greater you are going to implement twice, that’s it.

    When designing a library, I think that for every functionality which can be decomposed into smaller functionality, which is useful by itself, the smaller functionality should be offered as well. This is clearly the case for compare_less/compare_greater, which can be used in all places where comparators are used and where projections are currently not offered. std::set comes to mind, but also countless user-defined functions, where the original implementer did not bother to offer projection.

    In fact, compare_less/greater can in turn be implemented in terms of transform_args, which, unlike bind, transforms every argument of a function the same way.
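    A rough sketch of what I mean by transform_args (hypothetical, just to illustrate; a member pointer would still need wrapping, e.g. with std::mem_fn):

    #include <functional>
    #include <utility>

    // adapt an n-ary function f so that every argument is first passed
    // through the same unary transform t
    template<class F, class T>
    struct transform_args_fn
    {
        F f;
        T t;

        template<class... Args>
        decltype(auto) operator()(Args &&... args) const
        {
            return f(t(std::forward<Args>(args))...);
        }
    };

    template<class F, class T>
    transform_args_fn<F, T> transform_args(F f, T t)
    {
        return {std::move(f), std::move(t)};
    }

    // e.g. compare_less(proj) is then just transform_args(std::less<>(), proj)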

    • Re. projection, you may not need overloads, but you have to implement the projection in each algorithm.

      Support for projections falls out nicely if you are supporting invokables.

      When designing a library, I think that for every functionality which can be decomposed into smaller functionality, which is useful by itself, the smaller functionality should be offered as well.

      I agree, which is why compare_less should be decomposed into less and bind. In fact, it’s already done. 🙂

      One of the reasons it’s desirable to keep the projection and the comparison separate is for the sake of (potentially) asymmetric comparisons, like the one done by lower_bound. Consider this example, which I got from Sean Parent, that sorts a sequence by a member and then uses lower_bound to find an element:

      std::sort(
        a,
        [](const employee& x,
           const employee& y)
        { return x.last < y.last; });
      auto p = std::lower_bound(
        a,
        "Parent",
        [](const employee& x,
           const string& y)
        { return x.last < y; });
      

      Notice how, without projections, you had to write the comparison twice in two different ways, since the comparison for sort needs to be symmetric, but the one for lower_bound needs to be asymmetric. With projections, it looks like this:

      std::sort(
        a,
        std::less<>(),
        &employee::last);
      auto p = std::lower_bound(
        a,
        "Parent",
        std::less<>(),
        &employee::last);
      

      With projections, we gain symmetry. What does this example look like with compare_less?

      • compare_less:
        actually, it decomposes into std::less and transform_args, which transforms both arguments the same way, which bind doesn’t do.

        Re. lower_bound:
        We write these cases as

        lower_bound( transform(rng,func), x).base()

        The asymmetry with

        sort( rng, compare_less(func) )

        irks me, but not enough to require changing all functions taking a comparator to take a comparator and a projection. A (user-defined) algorithm taking only a comparator is a fine algorithm. I’d like to have the tools to use it with projections, even if the implementor did not build the projection capability into his implementation.

  21. I’m still on the fence about this new strong divide between view/action. On the plus side, it’s much clearer now what is lazy and what is eager.

    But on the other hand:
    1) It introduces a whole new class of errors, if the user tries to pipe a view and an action together, or tries to pipe a container into a view, etc. The compiler errors generated by those kinds of mistakes are baffling (see later).
    2) It seems less powerful overall (?): I remember that 6 months ago nearly all algorithms were pipeable. Now only a few selected views and actions are pipeable.

    To see how far we can go with range-v3, I tried to port this D article over to C++:
    http://wiki.dlang.org/Component_programming_with_ranges
    My implementation is here:
    https://github.com/Arzar/RangeFunV3
    I love this article because it shows the whole zoo of rangy things, like complicated views such as the groupBy view (so a view that returns views), lots of range piping, range chaining, etc. But it also solves a practical case: how to display a calendar. And a calendar is a human-made concept with lots of little quirks, so it feels closer to day-to-day programming, where you often have to fiddle with imperfect things, than all those pure mathematical examples that functional circles seem so fond of…

    Anyway, I found it very hard to port to C++ with range-v3. Here are some of the difficulties I encountered:

    1) One of the biggest pain points: how to register a new algo/view/action into the pipe system.
    In D, it’s almost magical: you just have to add empty(), front() and popFront() to any class, then your class is considered a range and gets automatically plugged into the whole range ecosystem (all algorithms + range adaptors). I do think most C++ folks underestimate how wonderfully composable D ranges are.

    It’s much more difficult in range-v3. For example, I tried to code a pythonic “join” algorithm and make it pipeable. But join returns a value (an accumulation over the range), and I didn’t figure out how to have a pipeable entity that returns a value. In the current range-v3 there is only “view”, which takes a range by rvalue reference and returns a view, or “action”, which takes a range by reference and returns void. So I ended up coding a half-assed operator| for my join algo.

    2) Compile errors when using views/actions.
    6 months ago, I remember using ranges::for_each all the time as some kind of printf for ranges. So with the current range-v3, the first thing I did was to try something similar:

    std::vector<int> v = {1, 2, 3, 4, 5};
    v | view::for_each([](int i){std::cout << i << std::endl;});

    Well, it doesn’t compile, and the compiler errors are utterly baffling. There are lots of mentions of “transform_view”, “flatten_view”, and “view::all” in the compiler backtrace that don’t even seem related to the for_each view. Also, there are a lot of concept checks in the backtrace, but they are quite useless and even a bit misleading. For example this one:

    CONCEPT_ASSERT(Iterable<concepts::Invokable::result_t<F, range_value_t<Rng>>>());

    It confused me a lot at first, because I thought that Rng = std::vector and F is the lambda, so all seemed good. I realized later that at this point we are already quite deep into the library and Rng is actually a really complicated type, something like transform_view<flatten_view<std::vector&, lambda(int)>>. So the concept check fails, but the type is too complicated for me to understand why.

    Now that range-v3 clearly distinguishes views from actions, I wonder if you could enhance the library to detect and catch those typical mistakes early and produce some user-friendly error messages?
    “You can’t pipe a view after a container”
    “You can’t pipe a view after an action”
    “You can’t pipe an action after a view; suggestion: pipe a to_vector between them”

    3) Mysterious compile errors. I wrote a forward iterator with Boost.Iterator (in day.hpp, day_iterator). When using it with range-v3 views/algos, I got a lot of weird compiler errors in the internals of the library.

    v3/size.hpp return size(detail::forward(rng));
    /utility/basic_iterator.hpp return range_access::current(pos());

    After a lot of effort, I found that the compiler errors went away when I added a to_distance() function inside my iterator. I don’t know why.

    • Thanks for taking the time to give all this feedback. It’s very valuable for me to know the sorts of problems people are running into. I’ll investigate and get back to you. If you send me an email, I’ll follow up with you personally once I have answers for you.

      EDIT 1: I’m not ready to address your whole comment yet (it probably deserves its own blog post), but I can address parts. First, this:

      6 months ago, I remember using ranges::for_each all the time as some kind of printf for ranges. So with the current range-v3, the first thing I did was to try something similar:

      std::vector<int> v = {1, 2, 3, 4, 5};
      v | view::for_each([](int i)
      {
          std::cout << i << std::endl;
      });
      

      Well, it seems you’ve mixed up the algorithms (eager) and the views (lazy). It’s true, 6 months ago, the algorithms were pipeable this way. Now the pipes are for views and actions, and the ordinary algorithms are not pipeable. (It was an experiment that was doomed to failure: the algorithms don’t have return types that lend themselves to chaining.) If you want to use ranges::for_each for printf, you still can:

      ranges::for_each(v, [](int i)
      {
           std::cout << i << std::endl;
      });
      

      Even better, all ranges in range-v3 now have ostream inserters, so you can write it like this:

      std::cout << view::all(v) << std::endl;
      // prints: [1,2,3,4,5]
      

      If you’re wondering what view::for_each is for, check out my blog post on Range Comprehensions.

      Your point about bad compiler errors is well-taken, though. I am trying to trade off a few things, and I may not have the balance quite right. See, I’m emulating Concepts Lite using SFINAE in function signatures. That means that when a concept check fails, you get a rather unhelpful error like “no overload found that matches [complex-type-1] and [complex-type-2]. Here are the 50 overloads that failed to match and why. [compiler vomit]” That’s not friendly.

      Instead, I could move more concept checks inside the functions. That way, the call will resolve but then generate an error that is more likely to be helpful, like a static assertion that a type is copyable, or that two types are convertible, or whatever. This is hard and takes a lot of thought, though.
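      To illustrate the trade-off (this is not range-v3’s actual code; is_iterable below is just a placeholder for whatever concept check applies):

      #include <type_traits>

      template<class T>
      using is_iterable = std::true_type; // placeholder trait, for illustration only

      // 1) Constrain in the signature (emulated Concepts Lite via SFINAE): a
      //    failed check shows up as "no matching overload" plus a dump of
      //    every candidate that failed to match.
      template<class Rng,
               class = std::enable_if_t<is_iterable<Rng>::value>>
      void algo(Rng &&rng);

      // 2) Check inside the body: the call resolves, then fails with one
      //    targeted message.
      template<class Rng>
      void algo2(Rng &&rng)
      {
          static_assert(is_iterable<Rng>::value,
                        "algo2 requires an Iterable argument");
          // ...
      }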

      I do take bad compiler errors seriously, though. If you file a bug, I’ll address it.

      EDIT 2:

      1) One of the biggest pain points: how to register a new algo/view/action into the pipe system. In D, it’s almost magical: you just have to add empty(), front() and popFront() to any class

      You’ve seen the c_string_range example from the Quick Start, right? With range_facade, you can implement ranges with done(), current(), and next() members. It’s not hard.
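      From memory, the shape of it is roughly this (details such as the range_access friendship may differ from the actual Quick Start code):

      #include <range/v3/all.hpp>

      // a null-terminated string as a range; range_facade synthesizes the
      // iterators from current()/done()/next()
      class c_string_range
        : public ranges::range_facade<c_string_range>
      {
          friend ranges::range_access;
          char const *sz_ = "";

          char const &current() const { return *sz_; }
          bool done() const { return *sz_ == '\0'; }
          void next() { ++sz_; }
      public:
          c_string_range() = default;
          explicit c_string_range(char const *sz) : sz_(sz) {}
      };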

      For example, I tried to code a pythonic “join” algorithm and make it pipeable. […] So I ended up coding a half-assed operator| for my join algo.

      You’re having difficulty because you’re fighting the idioms of the library. You’re trying to implement an algorithm, not a view or an action. (It looks like an intersperse followed by a fold, in Haskell-ese.) Your algorithm computes a value instead of producing another iterable. Both views and actions produce iterables, which is why chaining makes sense for them. It doesn’t make sense for algorithms that are simply computing a value.

      I would implement an intersperse view and use it to intersperse whitespace into a range of strings, and then pass the resulting view to ranges::accumulate, which would concatenate all the strings together.
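      Something along these lines (view::intersperse is hypothetical here; it isn’t in range-v3 yet):

      std::vector<std::string> words{"10", "11", "12"};
      // lazily put ", " between the elements, then fold everything into one string
      std::string joined = ranges::accumulate(
          words | view::intersperse(std::string{", "}),
          std::string{});
      // joined == "10, 11, 12"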

      This calendar example is a treasure trove, by the way. Thanks for pointing me to it. One of the other problems you’re having with range-v3 is that it’s still evolving. Your job would have been much easier if range-v3 already had intersperse and split (aka D’s chunkBy). This is pretty basic functionality that I still haven’t gotten around to implementing. So much to do.

    • Do you aggregate rvalue ranges in range adaptors?

      Certainly. Otherwise, view chaining would be broken.

      What about lvalue ranges?

      They are aggregated by value, which is how it should be IMO. They are proxies; there’s no reason to hold them by reference.

          • I distinguish between ranges and containers. Ranges are always aggregated by value, containers only if they are rvalues (and I wouldn’t mind if this behavior would be available only explicitly, but I would like to have it). And such adaptors aggregating rvalue containers by value are themselves containers in terms of deep constness etc.


          • I distinguish between ranges and containers. Ranges are always aggregated by value, containers only if they are rvalues (and I wouldn’t mind if this behavior would be available only explicitly, but I would like to have it). And such adaptors aggregating rvalue containers by value are themselves containers in terms of deep constness etc.

            This description sounds exactly like what I expected. And I wouldn’t mind either if this were only available explicitly.

          • For reasons I’ve given here and elsewhere, using the rvalue-ness of the container to decide whether to aggregate it by value or reference is a dangerous and confusing interface. Giving users a way to opt in would be far safer. I could imagine an interface like this:

            extern std::vector<int> read_ints();
            auto rng = by_value(read_ints())
                     | view::remove_if(...)
                     | view::transform(...);
            

            Of course, then view is the wrong name. And I personally like the fact that I can count on views always being lightweight objects with reference semantics. So I don’t love the above, but I guess I don’t hate it either. I’ll try not to make any decisions that preclude it, so that if we end up wanting to add it later, we can.

  22. I have one concern about the fact that ranges never “own” values and it is about return values.

    It is a common pattern in our production code to have a function return a “type-erased iterable” (typically for big queries that take a lot of time to execute). Notice that I wrote “iterable” and not “range”. When calling the function, the user doesn’t have to care about the following:

    1. Where the data comes from (database, memory, file on disk, network, etc)
    2. Is there any lazy evaluation done when iterating over what is returned (it ‘could’ be the case)
    3. Does the returned value own any underlying data (it ‘could’ be the case)

    Because of (3), returning any_range is not an option, since a range can never own anything. With range-v3, what would be the alternative? Did you ever consider something like any_iterable?

    any_iterable<int> get_prime_numbers(int up_to)
    {
        vector<int> v;
        ...
        return {move(v), view::foo | view::bar};
    }
    
    auto x = get_prime_numbers(100);
    // x contains:
    // - the moved vector v (it is now owning it)
    // - the result of view::foo | view::bar (could be seen as any_range)
    
    • In your application, are these type-erased iterables mutable or immutable? It sounds like they’re immutable. In this case, you can have your cake and eat it by defining a range type that stores a shared_ptr<vector<int> const>. Then any_range will just work for you, and you’ll still get O(1) copy/assign.
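      A minimal sketch of what I mean (the name is made up; the point is just the shared_ptr member):

      #include <memory>
      #include <vector>

      // reference semantics with shared ownership of immutable data:
      // copies are O(1) and alias the same elements
      class shared_ints
      {
          std::shared_ptr<std::vector<int> const> data_;
      public:
          explicit shared_ints(std::vector<int> v)
            : data_(std::make_shared<std::vector<int> const>(std::move(v)))
          {}
          auto begin() const { return data_->begin(); }
          auto end() const { return data_->end(); }
      };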

      • Yes they are immutable.

        Uh, shared_ptr? I guess it could work, but at the cost of an extra allocation, and an extra indirection when accessing the range… I’m not sure it sounds like a satisfying solution, at least from a performance perspective.

        Also, isn’t holding the vector in the range with a shared_ptr kind of like saying that the range is an owning one? Or at least, we can say that it is kind of ‘emulating’ an owning one. If this indeed works, we can expect implementations of ‘owning versions’ of ranges to spread in a lot of code bases. Don’t you think it would be worth trying to support them in a clean way from the beginning? (I’m not suggesting that it is easy to do…)

        • Think for a minute about how you would implement your any_iterable type if it could hold either a range or a container. If it’s holding a container, then when you copy the any_iterable, you need to copy the container. Copying containers is expensive. And you’re worried about the cost of a shared_ptr?

          Perhaps I’ve confused things by talking about element ownership. The real issue is value semantics vs reference semantics. Ranges have reference semantics — copies create aliases. Containers have value semantics — copies are independent. A range that holds a shared reference to a container has reference semantics even though the ownership is shared among the ranges.

          When your data is immutable, you have referential transparency. There’s no difference between value semantics and reference semantics when your data is immutable. Sure, you could create a bunch of copies of the same immutable value, but why? It’s unnecessary and terribly inefficient, especially if your value is something expensive to copy like a container. Functional languages share immutable data extensively. Mostly they use garbage collection. In C++, we use reference counting.

          So yes, shared_ptr is your friend here. Don’t look down your nose at it. Sometimes it’s the right tool — especially when it’s spelled shared_ptr< T const >.

          EDIT: If the extra indirection bugs you, you might look into boost::shared_array.

          • I understand your point that reference semantics allow fast copy/assign. I am just not convinced that it is always essential to have fast copy/assign. For return values, we just need move to be efficient. I don’t see expensive copies as a reason to discard owning ranges.

            Yes, I’m worried about the cost of a shared_ptr because, if I knew that a range could be an owning one, I would simply not copy it around, the same way that I don’t copy vectors or any other container (unless it is really needed) because I know that it is expensive. So if I manage to avoid copies, the biggest overhead I’ll be left with will be the one from the shared_ptr.

            About boost::shared_array, it makes the solution a bit better, as you say. At first sight it seems to be similar to shared_ptr<T[]> (N4077). (Note that some extra work might be needed to make it an iterable; it doesn’t seem to be one.)
            Downsides:
            – It still requires an extra allocation
            – A vector (or any sequence container) could not be moved into an owning range implemented with a shared_array

          • If we’re talking about extra allocations: do begin() and end() of any_range require an allocation or not?

          • Yes, as far as I can tell. It needs to return something that type-erases an iterator, and type erasure means dynamic allocation (unless you implement the Small Object Optimization, I suppose).

  23. So, I’m later to this one, but let me see if I’ve got this right…

    An “iterable” is anything that can have iterator ranges referencing it. An iterable may itself be an iterator range, or it may be the thing that actually owns the values — be it a container, an array, or something less obvious.

    “View algorithms” take iterables and return iterables that also happen to be iterator ranges: their input may be the actual, owning set of values or it may be a chain of other iterator ranges going back to an actual, owning set of values eventually. The major reason for unifying owning and non-owning iterables (non-owning iterables themselves being iterator ranges) under one “iterable” interface is so that iterator ranges can be made in this way: either directly from their end thing or from any of their brethren in between them and it. “View algorithms” are named with the noun, “view”, because a “view” is this intermediate thing with reference semantics that they spit out; they certainly don’t do anything themselves other than generate that view to spit out and that will be used later, which later use is where the real action will finally be done (during the access/dereferencing through that reference). There isn’t a good verb name for “view algorithms” because their whole point is to give you an intermediate object that will do the doing later, rather than do the doing themselves.

    In contrast, “container algorithms” perform transformation on the various non-view types of iterables and spit out the transformed value-set. “Container” doesn’t really belong in their name since the iterable they work with might be an array. Heck, they might even take input as something really weird like an istream iterator reading a finite stream length, as long as they can take ownership of the value and process the whole thing before spitting it back out. They don’t have a good noun name based on the thing they spit out (like a “view” is what’s obtained from a “view algorithm”) because we don’t have a good unified noun name for “output of iteration that, while it’s almost certainly also iterable, is an actual thing — value semantics — and not an iterator range with reference semantics”. But, if my understanding up to this point is correct, they shouldn’t even have a noun name if a good one were available — what they spit out isn’t the interesting part, what they do is, so they should have a verb name. They should be named for the fact that they do the doing (the transformations), in contrast to “view algorithms” that don’t do the doing, merely giving you an intermediate “view” that will do the doing when you need it to.

    So, if I haven’t made any mistakes, the alternative to “view” really should be a verb rather than a noun because the contrast between verb and noun reflects the difference of action implied by the difference between the reference semantics and the value semantics. If I’m right about that, it would further explain why “container algorithms” has been such a confusing name (I’m pretty sure the sort of confusion we’ve witnessed isn’t limited to thinking that they act on containers and not other types). And it would largely explain why most of the suggestions for new names aren’t nouns, and some of the best are verbs.

    I’m for “mutate” personally, because it’s less vague than “action” and fits better with the move semantics you’ve also argued for rather than copy semantics — for copy semantics, “make” would be better than “mutate”.

    All that’s my two cents’ worth, and hoping I got the premises to my analyses correct. Gotta say, I like the range-based-on-iterators idea and the views idea in the first place and am looking forward to seeing those get into the standard library even if I stick to the traditional algorithms (in their range rather than iterator pair versions) for most of my needs.

  24. Hi,
    I understand if you don’t want to be my personal SO, but this may even be beneficial to you (w.r.t. the efficiency of your ranges proposal).
    Recently I was looking at some theoretical append_to_vector(input_vector*, append_me_container&) algorithm, and I noticed one lame thing about the vector insert overloads. If you want to append a container to a vector, the only way it can preallocate memory is if the container’s iterators are random access (since std::distance is then O(1)). The problem is that if, for example, you want to append the keys of a map or set, or your vector holds pair<A,B> and you append a map<A,B>, there is no way to give vector insert the number of elements to be inserted so that it can reallocate up front when the elements will not fit in the vector.capacity()-vector.size() space.
    You can obviously call reserve in your append_to_vector, but that is reimplementing the vector’s growth-strategy algorithm :/ . So do you think we would need an overload added to http://en.cppreference.com/w/cpp/container/vector/insert ?
    One like iterator insert( const_iterator pos, InputIt first, size_t n ); that copies n elements starting from first, with obvious UB if that would run off the edge of the world (past the .end() of the container that first comes from).
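    To illustrate, the workaround I mean looks roughly like this, and the reserve call is exactly the part that second-guesses the vector’s growth policy:

    #include <map>
    #include <vector>

    // append the keys of a map to a vector; the map's iterators are not random
    // access, so vector::insert can't preallocate and we have to reserve ourselves
    template<class K, class V>
    void append_keys(std::vector<K> &out, std::map<K, V> const &m)
    {
        out.reserve(out.size() + m.size()); // reimplements the growth decision
        for (auto const &kv : m)
            out.push_back(kv.first);
    }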

    • It’s a legitimate point. That’s why in my proposal, I separate the concept of “SizedRange” from ordinary Range, and even “SizedIteratorRange” for iterator/sentinel pairs that know their size even if the iterator is not random access. (Consider a range made from an iterator and a count.)

      In the future, this problem will be fixed by making the containers aware of these concepts. In the shorter term, we’ll need to do something more hack-ish, like what you suggest.
