The recent meeting of the C++ Standardization Committee in Urbana-Champaign was a watershed moment for my work on ranges. Ultimately, my presentation was well received (Herb Sutter used the phrase “palpable excitement” to describe the feeling in the room), but it wasn’t at all certain that things would go that way, and in fact an eleventh-hour addition pushed the proposal over the top: container algorithms.
Ranges, as of N4128
The existing algorithms in the C++ standard library operate eagerly. After std::transform returns, for instance, you can be sure that all the transform-y stuff is done. Some algorithms are also mutating. When you call std::sort, the data has been sorted — in place.
Not so with the range views that N4128 proposes. These are like lazily-evaluated, non-mutating algorithms that present custom views of data stored elsewhere. For instance, when you say:
std::vector<int> ints{1,2,3,4};
auto squared = ints | view::transform([](int i){return i*i;});
… not one iota of transforming has happened. You have just created a view that, when it’s iterated, does transformation on-the-fly, without mutating the underlying sequence.
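For instance (a minimal sketch), iterating the view computes each square on demand, and ints is left untouched:

for (int x : squared)
    std::cout << x << ' ';   // prints 1 4 9 16; ints is still {1,2,3,4}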
The algorithms and the views differ in another important way: the views easily compose — filter a transformed slice? No problem! — but the algorithms don’t. Doing that sort of thing with the algorithms requires fiddling about with iterators and named temporaries, and takes several lines of chatty code.
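Something along these lines works (a sketch, assuming slice, transform, and filter views with these spellings):

auto result = ints
            | view::slice(0, 3)
            | view::transform([](int i){ return i * i; })
            | view::filter([](int i){ return i % 2 == 0; });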
The Missing Piece
So to sum up, in the world of N4128, we have this:
- Eager algorithms that can mutate but that don’t compose.
- Lazy algorithms that can’t mutate but do compose.
- ??!!!!
Whoops! Something is missing. If I want to read a bunch of ints, sort them, and make them unique, here’s what that would look like in N4128:
extern std::vector<int> read_ints();

std::vector<int> ints = read_ints();
std::sort(ints);
auto i = std::unique(ints);
ints.erase(i, ints.end());
Blech! A few people noticed this shortcoming of my proposal. A week before the meeting, I was seriously worried that this issue would derail the whole effort. I needed a solution, and quick.
Container Algorithms
The solution I presented in Urbana is container algorithms. These are composable algorithms that operate eagerly on container-like things, mutating them in-place, then forwarding them on for further processing. For instance, the read+sort+unique example looks like this with container algorithms:
std::vector<int> ints = read_ints() | cont::sort | cont::unique;
Much nicer. Since the container algorithm executes eagerly, it can take a vector and return a vector. The range views can’t do that.
A Moving Example
Move semantics makes all of this work smoothly. A temporary container is moved into a chain of mutating container algorithms, where it is munged and moved out, ready to be slurped up by the next container algorithm. (Naturally, performance would suffer if container algorithms were used with a container that wasn't efficiently movable, like a big std::array. Don't do that.)
Since container algorithms accept and return containers by value, I worried that people might do this and be surprised by the result:
std::vector<int> v{/*...*/};
// Oops, this doesn't sort v:
v | cont::sort;
The author of this code might expect this to sort v. Instead, v would be copied, the copy would be sorted, and then the result would be ignored.
Also, there's a potential performance bug in code like the following if we allow people to pass lvalues to container algorithms:
// Oops, this isn't very efficient:
std::vector<BigObject> bigvec{/*...*/};
bigvec = bigvec | cont::sort | cont::unique;
bigvec is copied when it's passed to cont::sort by value. That's bad! The alternative would be to have container algorithms do perfect forwarding — in which case what gets returned is a reference to bigvec. That then gets assigned back to bigvec! Assigning a container to itself is … weird. It's guaranteed to work, but it's not guaranteed to be efficient. An interface that makes it easy to make this mistake is a bad interface.
Instead, in my current thinking, the above code should fail to compile. The container algorithms require rvalue containers; you should move or copy a container into the chain. With range-v3, that looks like this:
using namespace ranges;
bigvec = std::move(bigvec) | cont::sort | cont::unique;
That fixes the performance problem, and also makes it pretty obvious that you ignore the return value of move(v) | cont::sort at your own peril.
I also offer this short form to apply a chain of mutating operations to a container:
bigvec |= cont::sort | cont::unique;
If you’re not a fan of the pipe syntax, this works too:
cont::unique(cont::sort(bigvec));
Both of these syntaxes will refuse to operate on temporary containers.
What is a Container?
Consider this line of code from above, which applies a chain of mutating operations to a container:
bigvec |= cont::sort | cont::unique;
How is this implemented? One simple answer is to make it a synonym for the following:
bigvec = std::move(bigvec) | cont::sort | cont::unique;
But not all containers are efficiently movable, so this would be needlessly inefficient. Instead, what gets passed around is a reference-wrapped container. Essentially, it’s implemented like this:
std::ref(bigvec) | cont::sort | cont::unique;
But cont::sort and cont::unique are container algorithms. Is a reference-wrapped container also a container, then? Un-possible!
Containers own their elements and copy them when the container is copied. A reference-wrapped container doesn't have those semantics. It's a range: an Iterable object that refers to elements stored elsewhere. But ref(v) | cont::sort sure seems like a reasonable thing to do.
In other words, container algorithms are misnamed! They work just fine when they are passed ranges, so long as the range provides the right operations. cont::sort needs an Iterable with elements it can permute, and that's it. It doesn't care at all who owns the elements.
cont::unique is also indifferent to element ownership, so long as it has a way to remove the non-unique elements. Rather than relying on an erase member function to do the erasing, we can define erase as a customization point — a free function — that any Iterable type can overload. With the appropriate overload of erase for reference-wrapped containers, std::ref(v) | cont::unique will Just Work.
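For instance, a minimal sketch (not the library's actual code) of what such an overload could look like for reference-wrapped containers; the signature erase(rng, first, last), found by argument-dependent lookup, is assumed here:

template<class Cont, class I>
void erase(std::reference_wrapper<Cont> rng, I first, I last)
{
    // Forward the erasure to the underlying container.
    rng.get().erase(first, last);
}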
The interesting (to me, at least) result of this is that containers are not interesting. Instead, we get much farther with refinements of the Iterable concept that add specific behaviors, like EraseableIterable. The container algorithms accept any Iterable that offers the right set of behaviors. They don’t care one whit who owns the elements.
Summary
Over the past month, I've been adding a full suite of container algorithms to my range-v3 library for things like sorting, removing elements, slicing, inserting, and more. These are eager algorithms that compose. I call them "container algorithms" since "eager, composable algorithms" doesn't roll off the tongue — but they are perfectly happy working on ranges. If you want to send a non-owning slice view to cont::sort, knock yourself out.
Container algorithms fill a gaping hole in N4128. They went a long, long way to appeasing many of the committee members who dearly want ranges to solve the usability problems with the current standard algorithms. I can only assume that, had I left container algorithms out of my presentation, the reception in Urbana would have been a few degrees colder.
Acknowledgements
The design of container algorithms presented here benefited tremendously from Sean Parent's feedback.
UPDATE:
I've heard you! "Container algorithm" is a confusing name. They aren't restricted to containers, and that isn't the interesting bit anyway. The interesting bit is that they are eager, mutating, composable algorithms. There is no one pithy word that conveys all of that (AFAICT), but so far "action" has come closest. So we now have view::transform (lazy, non-mutating) and action::transform (eager, mutating). Not perfect, but better, certainly.
Why not simply assign the view back to the container? Like ints = unique(sort(ints)).
Take your time and read the whole thing! 🙂
You know, I guess that should teach me to reply after only having read half the post.
However, I find your performance argument terribly unconvincing.
Firstly, performance is a QoI issue, and I think it would be trivial for implementations to simply compare the address of the LHS and the source container address to determine if it’s in-place or not.
Secondly, this “performance problem” is a hypothetical. Somebody, somewhere, might pay a cost. And then they might just re-write their code to not operate that way- for example, move the container into another, and then view the temporary.
“But it might not be as efficient as array access” is a terrible reason to gimp the interface. If people don’t want to use inefficient functions, they can just not call them. But by removing them from the interface, you break it for everybody, everywhere, even if they absolutely do not give a crap about maybe accidentally copying this vector one extra time in a cold path.
How is the interface “gimped” and “broken”? There is a very short syntax that is unambiguous and efficient. Isn’t that acceptable?
No. No, it is not enough. You should know that.
Assembly and C are also both short, unambiguous, and efficient, but we don't program in them. Hell, shortness can be a major downside - ask everyone who uses "using namespace std", or who tried to program in Perl or APL, or who named their identifiers "a", "b", "c".
The primary goal of code is not to be short, nor to be efficient. Efficiency is a thing you only go for when you know you really need it in that hot path - a special case. Readability and maintainability are the primary concerns. Shortness of source code is a thing nobody needs. Hell, even shortness of binary code is a bit of a special case.
You duplicated the algorithms and prevented perfectly logical and sane code from compiling for no benefit at all. That is not a win. That’s a fail.
There is one consideration of having things in place, although it’s not necessarily “performance.”
As a real-world example, your website might accept CSV files and do some form of aggregation on them – let’s say add up the cost of order lines.
The very naive approach would be to parse that CSV in memory and add all the lines to a vector, then use that vector as required. Next thing you know the client uploads a 100GB document…
“Container algorithms” mean that you can lex the data as it streams in, filter out to the column you want and aggregate – the memory complexity would be O(1) (the total stack frame sizes of your selectors/predicates/lambdas).
This is a big deal for C++. Some of us need to use the iterative approach up-front, because our customers tend to do things like 100GB documents and we simply can't store that shit in memory. If streaming operations/container algorithms/LINQ are made trivial, then there is a drastically smaller chance of a junior writing new OrderLine[size];
Regular view-based algorithms can also achieve this, and in a much more direct and effective way.
Right, the pull model you describe seems to be a better fit for a (lazy) range view, not an eager approach.
Congratulations so far, Eric; this is possibly the biggest feature I'll use in the upcoming C++ standard. As Sean Parent says, all programs are just transformations of data.
By the looks of it, you are focusing on the core semantics of ranges and not the details of the algorithms the library will provide, right? Please let me vote to include an enumerating range wrapper that sticks an integral index onto each element (not the boost::range one either, which doesn't work with the new for syntax).
If I understand you correctly, you want something that takes a sequence like [a,b,c,d] and produces a sequence like [(a,0),(b,1),(c,2), (d,3)]. Is that right? If so, you can do this today, and it’s pretty trivial:
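A sketch with range-v3 building blocks (assuming view::zip and an unbounded view::iota(0)):

std::vector<char> v{'a','b','c','d'};
auto enumerated = view::zip(v, view::iota(0));
// a view of pairs: ('a',0), ('b',1), ('c',2), ('d',3)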
Let’s suppose we have a constructor for vector that takes an Iterable:
std::vector vec_from_view(read_ints() | view::transform(…) | view::unique);
std::vector vec_from_container(read_ints() | cont::transform(…) | cont::unique);
These two examples, if I am not mistaken, yield the same value, but they operate in different ways: the first one will traverse the vector only once and will make a copy of the view, leaving the first vector alone. The second one will traverse it twice and will move the result.
I see you cannot do that with sort, because you cannot have a view for sort, as in your example. So my question is:
Is the point of "container composable algorithms" to allow composition in cases where you wouldn't be able to do it with views alone? But some operations can still be done with both 'view' and with 'cont' in some situations; does this hold true (I guess yes)?
In the face of choice, lazy compositions would have better performance? I am not sure about this. My intuition tells me that a single pass will be faster than several passes, when allowed.
Actually, we do! (Sort of.) All the views are implicitly convertible to containers. So you can do this today, for instance:
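For instance, a sketch of that conversion, reusing the ints vector from the post:

std::vector<int> squares = ints | view::transform([](int i){ return i * i; });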
Looking at your example:
Not quite. The first example won't compile. If read_ints is returning a vector of ints, then you can't send that directly to a view. Views don't work on temporary containers.
This part is right.
Yes.
I haven’t done benchmarking, but I expect you’re right: a single pass would be better.
Why don’t views work on temporary containers? I thought the whole point of “fat” ranges was to make this work.
What do you mean by “fat”? Ranges are lightweight, non-owning views of data. They never own their elements, so they can’t safely refer to a temporary container.
Sorry I meant that the algorithms were range based rather than using “fat” iterators.
You mean views right? Because containers own their elements.
But why can a view never own its elements? Is there something error-prone or problematic about following this model:
rvalue => own it
lvalue => reference it
To me it seems better, because we can detect if something is an rvalue or lvalue. We cannot detect whether a range owns its elements or not without help from a user.
No, I mean ranges. Containers own their elements, ranges don’t. A container is not a range. Both containers and ranges are iterables, though, and most of the algorithms only care about iterables. That’s because the algorithms don’t care much about ownership semantics. They are eager. The views, however, very much care about ownership semantics, since they are lazy.
There is something wrong with it, but it’s subtle. Currently in the standard library, value category (whether an expression is an lvalue or an rvalue) is used to perform optimizations, but generally only in places where it doesn’t change observable semantics; e.g. to steal the guts of an object that is going away anyway. What you’re suggesting is to use value category to fundamentally change the semantics of an expression in an observable and potentially surprising way.
If we allow a view to capture rvalue containers, then this works and is safe:
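A sketch of what I mean (read_strings and to_upper are placeholders):

auto all_caps()
{
    // The view would take ownership of the temporary vector, so returning it would be fine.
    return read_strings() | view::transform(to_upper);
}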
Someone might want to see the return value of read_strings() in a debugger, so they change the code to name the temporary. That code also compiles, but now it's a memory error. Oops. A simple refactoring has fundamentally changed the semantics and performance characteristics of the code. That makes me extremely uncomfortable.
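Here is a sketch of that refactoring (same placeholder names as above):

auto all_caps()
{
    auto strs = read_strings();               // named, so it can be inspected in a debugger
    return strs | view::transform(to_upper);  // the view now refers to a local that is about to die
}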
Range-v3 worked the way you suggested for the first 6 months of its life, and it continually rubbed me the wrong way. Once I switched my thinking to "views are ranges, ranges are lightweight and non-owning", it cleared up a host of fuzzy semantics issues and made things crystal clear.
I get that this answer isn’t satisfying for some people, and I understand why. Can you also see my point?
Yes, that makes a lot of sense, thanks for explaining that. However, the terminology is really confusing for me, since a range has traditionally meant begin and end iterators. To me, it seems like it would be better to call:
non-owning => a view or a weak range
owning => a container
Perhaps you already thought through this.
A begin and end iterator is exactly what I'm calling "Range". It's lightweight and non-owning, just like a pair of iterators. Your terminology also works, though. It might be worth mentioning in my next std paper.
Actually you can – it's quite amazing what you can do with views. The creation of a non-mutating lazy sorted view based on mergesort is a little bit tricky and mind-twisting, but absolutely possible. Gary Powell and I implemented this several years ago, and to our astonishment it worked.
I’m not sure I’d really recommend to use it in practice, though.
Btw, great job, Eric!
I’m curious to see the code. Why don’t you recommend it? Does it have bad algorithmic complexity?
We implemented a set of views about 14 years ago. You can find the code and some documentation at http://www.zib.de/weiser/vtl/index.html. It's been more an exploration of possibilities than a serious attempt at standards-compliant code, and wouldn't live up to current standards.
Apart from that, it’s been quite interesting work, at least for us.
Just now that I looked it up I didn’t find the actual sorted view code I remember, but essentially all components are there (most notably the merge_view). Maybe somewhere I can dig up the test code for sorted views we wrote at that time.
The core idea is the merge view. It takes two iterables (in your notation) and iterates over them by always presenting the smaller of the two current elements. What you need in addition is something that recursively splits a random access iterable in two halves and creates a merge view of the sorted views of the two halves.
Why wouldn’t I recommend it for practical use?
It’s a quite complex tangled mess of code for such a simple task as sorting. Close to mental masturbation – fascinating what you can do, sorting a vector on the fly without moving any elements 🙂
I don't remember whether we did tests on the sorted view. The "naive" implementation I remember probably has n*n complexity, but it should be possible to achieve n log n with some result caching. The tree of views takes n log n memory, of course. Memory access patterns are probably suboptimal as well.
P.S.: After reading your post, I think that cont:: is not an appealing namespace for these algos; after all, the fact that they operate on containers doesn't matter, as you say. I am squeezing my mind to come up with a name, but I cannot find one 🙁
Let me know when you come up with something. I’ve been thinking, too.
Replace when with if. 😀 I will try; I was doing some kind of brainstorming about the properties of range vs. container algorithms.
Container algos are eager, range-based are lazy… etc., but I think a too-technical name would come out of that. I am not sure how to go on. Those are not container algorithms, they are ______ <—- Something I don't know here.
How about “inplace”? Though that may convey the wrong thing for 2-container algorithms, like “transform from here to there”.
Great work Eric – can’t wait to write code like this as standard C++.
About the name – I had the thought that something evoking the movement and processing might work – like chomp, or digest. It’s not easy to come up with short words that get all the meaning …
Last comment, sorry for spamming:
I think this won’t work, but does it make sense?
auto my_new_seq =
some_generated_view | view::transform(…) | generate_a_vector | cont::sort | view::unique | view::transform(…);
Because the result of cont::sort is an rvalue, applying a view will not work, I guess. I can save passes by using view:: after cont::sort, because views operate lazily.
Is this what I should do every time I want that?
auto my_temp_seq =
some_generated_view | view::transform(…) | generate_a_vector | cont::sort;
auto my_new_seq = vector(my_temp_seq | view::unique | view::transform(…));
Great work, BTW, hope to see some of this into the next standard!
As I've annotated above, everything works until you try to pipe the result of cont::sort to view::unique, which can't work on a temporary container. You need to save the result of cont::sort into a local.
You got it.
Thanks!
Maybe it'd be interesting to consider an analog, but opposite, evil twin of std::move. One which would allow the programmer to explicitly cast an rvalue into an lvalue. It would require some amount of care in its use, somewhat like std::move, but I suspect even more so.
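A sketch of such a function (a well-known idiom, sometimes spelled as_lvalue; the caller must keep the temporary alive for as long as the result is used):

template<class T>
T& as_lvalue(T&& t)
{
    return t;   // the rvalue now has a name, so it can be used where an lvalue is required
}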
By the way, your post gave me an "I should have had a V-8" moment. Of course there should be a generate_a_vector view. As of today, there is a view::to_vector adaptor, as well as the more general forms: view::to_<std::vector>() (element type is deduced) and view::to_<std::vector<long>>() (elements are converted as necessary). Seems pretty handy.
Glad it helps. Yes, I always wondered why you cannot generate a container from a view directly, since this has two good properties:
- It continues the pipe flow with left to right reading.
- Can use auto on the left hand side to deduce.
BTW, where can I find how to markup posts?
If you wrap your code in [code lang=”cpp”]code here[/code], it will get syntax highlighted. It seems to screw up angle brackets, though. Sorry, blame WordPress.
So what do the error messages look like if you do something wrong in a long pipe? 😉
Ha. Depends. When using the container algorithms presented here, you'll get an error like "unable to find overloaded operator| with arguments <container-type-here> and ranges::cont::sort", or some such. Then you'll probably get a list of operator| overloads, and why they failed to match. Not great, but not horrible, either.
With the range views, you'll get the same error, but instead of "<container-type-here>", you'll get the type of the range at that point in the pipeline. Since range views nest, the further into the pipe you are, the more complicated the type. This can look pretty fugly, especially if the pipe expression involves lambdas. There's not much I can do about that.
On the up-side, there shouldn't be any of the awful template instantiation backtraces we've come to expect from heavily template-metaprogrammed libraries. The reason is that I am making extensive use of a library to emulate Concepts Lite. All the type parameters are verified at the API boundary, so you shouldn't (often) see errors from deep within the library's guts.
First of all, thank you a lot, Eric, for the great work you’re doing. Secondly, it’s probably a silly idea, but anyway… If, as far as I understand, from the user’s perspective the biggest difference between view:: and cont:: is that the first one is lazy and the second one is eager, maybe they could be named so, that is, lazy:: and eager:: ? At least, it should make the difference between similarly named algorithms like lazy::transform and eager::transform immediately apparent.
It's something I've seriously considered. It definitely has its merits to name them lazy and eager. I hate parting with view though, because that effectively conveys "non-owning". Worth considering, though.
touch:: and view:: seem to convey their respective semantics.
I'm afraid I don't get touch::. v | touch::sort doesn't speak to me the way v | eager::sort does. Admittedly, naming is subjective.
I think lazy is the best name I've seen suggested. It's technically correct (unlike cont), and actually means something specific (unlike touch/direct).
In my mind lazy also means non-owning. I mean, owning a bunch of stuff takes work (to paraphrase Biggie Smalls).
I would only hope that the names also make it explicit whether the algorithms work in-place or return a copy or a view of the input. As an end-user of libraries, having to look up what’s in-place and what makes a copy is really annoying.
Oh and stuff like std::unique needing the user to explicitly erase the now-garbage end of the container is ridiculous. Thanks for moving in a direction where there is less of that.
Oh man. That just goes to show how important it is to get the names right. cont:: means "eager". view:: means "lazy". You got it completely backwards. Clearly, cont:: is confusing the issue. 🙂
lazy was the first word that sprang to my mind, and now, having seen others use it, it'll get my vote too. With some Scala knowledge the term lazy also implies the infinite stream possibility, where ownership isn't really viable. Then again, said background is making me read this with functional, i.e. immutable, usage in mind.
What about
lazy::view::transform (or just view::transform for short).
and
eager::mutable::transform (or just mut::transform for short).
Maybe having a top level namespace lazy and eager, would come in handy if this will ever be extended to such things as eager::view::transform or lazy::mutable::transform. Not sure those make any sense though.
Cheers
Sorry Eric, if you can, delete the comment before, no context there.
view denotes non-owning, but cont are not container algorithms. Why should you then stick to something that means nonowning, when we already know that none of them mean ownership? lazy vs eager is a more important property in this case. There is also something strange about l vs r-valueness I think -> views work on lvalues. cont works on rvalues.
How can you, in the space of two sentences, say "view denotes non-owning," and then, "none of them mean ownership"? view denotes non-owning. cont doesn't.
No, views do not require lvalues. They require lvalue containers, but they accept rvalue ranges. They have to; otherwise, you couldn't pipeline them. In "v | view::foo | view::bar", the result of "v | view::foo" is an rvalue, but view::bar accepts it because it is non-owning.
Eager vs. lazy is interesting. Owning vs. non-owning is interesting. Value category isn't interesting, except insofar as it's needed to avoid memory and performance errors.
Sorry, I didn't explain clearly there. I mean: view denotes non-owning. But eager vs lazy don't denote ownership. You are naming things related to ownership, not laziness/eagerness. That was my point.
I think these algorithms can be called “immediate”, since they are immediate both in time and space: they work exactly upon call and operate directly on the provided data.
But the word itself is too long to type as a namespace.
Or maybe “direct”?
I really like “immediate”, but yeah. It’s a bit too long. <sigh>
If immediate is too long, then immed is only one char longer than view or cont.
True. But as another commenter has pointed out, view is used here as a noun, so another noun is desirable. What do you think of action as a noun that carries the "eager" connotation?
I'm not a big fan of 'action'. I was wondering about 'now' for the immediate transforms, but it doesn't pair any better with 'view', and its obvious counterpart 'later' isn't quite right either. Naming is hard. 🙂
For someone who does not have deep knowledge of ranges, this is quite confusing. Couldn’t the difference be made simply by looking at the lvalue/rvalue-nes of the container? As I understand it, view algorithms only support lvalue containers and container algorithms only support rvalue container. Couldn’t the interface for view and container algorithms be unified and make the view/container distinction deduced?
vs
Lvalue/rvalue isn’t the interesting distinction. Eager vs. lazy is the axis of interest. You should have a way to say: I want to do a chain of operations, but I only want to make one pass over the data, and I don’t want to mutate the underlying elements. Or: I want to mutate these elements, and I want the operations to take effect immediately. They are both useful. They have different requirements on their input.
Anything in the view namespace is lazy, lightweight, and non-owning. Since the views are non-owning, they refuse to operate when nobody has claimed ownership of the elements.
Anything in the cont namespace is eager. They move whole containers in and out, so you have to move a container in.
Reference semantics in C++ can be tricky because of the lifetime issues. A range abstraction has to walk a fine line: avoid memory errors by preventing dangling references, while also preventing the performance errors that come from copying too liberally, while also giving people both the eager and lazy semantics they want, in a world where some sequences own their elements and others don't. It's harder than it looks. So hard, in fact, that it's taken this long for a serious range proposal to make it this far.
I hope this has clarified the issue a bit.
I understand that the distinction between eager and lazy is not the same as the distinction between rvalue and lvalue.
My point is that since eager requires an rvalue and lazy requires an lvalue (if I understood correctly), can you demonstrate a concrete example where it would be wrong to implicitly deduce an lvalue as ‘lazy’ and an rvalue as ‘eager’?
From what I understand, what you propose results in this :
std::vector<int> ints = read_ints() | view::sort; // Error : Cannot use view on rvalue to avoid dangling references
std::vector<int> ints = read_ints() | cont::sort; // OK
std::vector<int> ints2 = ints | cont::sort; // Error : Cannot use cont on lvalue
std::vector<int> ints2 = ints | view::sort; // OK
From a user perspective, having to explicitly specify the view:: and cont:: namespaces (and choosing between the two) seems redundant. Maybe just defining those two namespaces as inline could do the job?
No, I should have been clearer. Lazy does not require lvalue. In fact, it can’t. Consider a lazy pipeline:
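For instance, a sketch along these lines (f and g are placeholder callables):

auto rng = v | view::transform(f) | view::filter(g);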
The result of v | view::transform(...) is necessarily an rvalue. The filter view accepts it because it is a (non-owning) range. But, assuming v is a vector, had you done "move(v) | view::transform(...)", that's a potential memory error, so it must be rejected.
From a pure library-design perspective, deciding something as fundamental as eager vs. lazy, value vs. reference, based on something rather esoteric like an expression's value category is likely to be more confusing rather than less. Performance characteristics and memory safety shouldn't be affected by trivial source code transformations, like assigning the result of an intermediate computation to a named variable. That is, making both of these compile but do radically different things, and with different safety/performance implications, is dangerous:
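A sketch of the two snippets I have in mind (read_ints and f are placeholders). Under a value-category-based rule, rng1 would own its data while rng2 would merely refer to tmp, even though the second is just the first with the intermediate result named:

// version 1: one expression
auto rng1 = read_ints() | view::transform(f);

// version 2: the intermediate result given a name
auto tmp  = read_ints();
auto rng2 = tmp | view::transform(f);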
Large expressions get broken up into smaller ones frequently during code refactorizations. If a simple code transformation like that fundamentally changes the meaning of your program — in a way that can’t be diagnosed at compile-time — that’s going to be a ripe source of bugs.
No, I should have been clearer. Lazy does not require lvalue. In fact, it can’t. Consider a lazy pipeline:
…
The result of v | view::transform(…) is necessarily an rvalue.
Uh, yes; I was talking about container lvalues (I know that view objects can perfectly well be rvalues).
Large expressions get broken up into smaller ones frequently during code refactorizations. If a simple code transformation like that fundamentally changes the meaning of your program — in a way that can’t be diagnosed at compile-time — that’s going to be a ripe source of bugs.
Good point. It is the same problem that all libraries making use of expression templates have with auto. Ah well, I guess that until we can override operator auto (I think there was a proposal for it), it is probably wiser to do as you propose and require it to be explicit.
Very excited about this, great work.
On naming, you say “they are both useful” referring to eager vs lazy, but in the description you name what seems to be two axes of interest — perhaps this is making naming harder?
Axes of interest:
1. "View": lazy, with the bonus of a single pass, vs. "Eager": immediate, applying each operation in one pass per pipe.
2. "View": doesn't modify the underlying data, vs. "Mutating": in-place, modifies the original.
The name "view" matches well with both the lazy/eager axis and the mutability axis, but while "eager" says immediate, eager doesn't really say in-place/mutating, and the opposite of "in-place/mutating" doesn't really say immediate/multi-pass.
Also, are there really four kinds we are looking at?
1. Lazy, unmutating
2. Lazy, mutating
3. Eager, unmutating
4. Eager, mutating
Here are a few more ideas for an alternate name to “container” algorithms (in order of my preference):
prompt
live
instant
greedy
I don’t like “greedy”, but it is often used for such purposes.
And for a few laughs, I offer ‘voracious’! 😉
LOL. So far, 'eager::' is winning in the battle for my heart and mind. It's up for grabs whether I'll change 'view::' to 'lazy::' though.
Naming Is Hard.
Some random musings:
If the current use of view/views is as a noun, then sticking with a noun perhaps something along the line of mutation(which appears in the quickstart doc description), mutant(shorter:-)), product, etc. If view is used as a verb, then terms like copy, move, apply, transcribe, etc., seem more appropriate, IMHO of course.
Mixing view (noun or verb) with an adjective like eager doesn’t quite sit well with me.
Backing up a step or two: maybe the ‘things’ named filter, transform, sort, take, etc just embody filteredness, transformedness… and what needs to be different is the binary operator that feeds the result of the left hand side to the input of the right hand side. So the ‘>>’ below causes the realization of the container:
using namespace view_or_cont;
auto my_new_seq
= some_generated_view
| transform(…)
>> sort
>> unique
| transform(…);
Not sure about precedence issues here.
Another thought I’ve had is that views & ranges (in the boost::range sense) have always focused on the data-source side. That is, we pass views/ranges eventually to algorithms which typically have an output_iterator to write into. Maybe that ‘source-ness’ needs to be explicitly acknowledged, and there needs to be some symmetry on the data-sink side.
Anyway you’ve done outstanding work here Eric! Just wanted to throw out some ideas to aid in brainstorming.
Jeff
Yeah, me neither. Need to think more about your comment, but an idea popped into my head while reading it. The container algorithms are immediate, and the noun “action” seems to convey that. So, views and actions?
Well, if one is a “view”, then maybe the other can be a “site” (or “spot”, but that sounds like a dog’s name 🙂 ).
So, some algorithms are performed “on view”, others “on site”.
“action” still seems a little vague to me.
The noun “yield” comes closest to the concept of realizing what is being viewed.
Grasping for other analogies, in computer graphics a view of a scene is realized by being rendered.
Maybe view/rendition using nouns?
So far I think that is the best naming, but I am not sure it is the absolute best we could find. Indeed, the fundamental property is lazy vs eager. Ownership, as I see it, is not that important.
Delighted to hear that ranges got a good reception in Urbana.
I’m not totally sold on these container algorithms yet. Can I attempt to defend the current way of doing things?
Much of the ugliness in the original example is due to the std::unique algorithm, which requires a separate erase step. We could easily write a single function which did both things – indeed N4161 proposes just such a function for the similar case of std::remove / remove_if. The example would then reduce to something like:
auto ints = read_ints();
std::sort(ints);
std::erase_duplicates(ints);
Not so bad now.
OK I can see benefits in being able to do this all on one line. It’s more functional; it could avoid the need to name a temporary object sometimes; it would allow the ints variable to be made const. Still the old style has advantages: chiefly, the semantics are immediately apparent. You don’t need to ask whether the operations are lazy or eager here: they have to be eager.
I suppose my worry would be about the teachability of having two different "modes" for operator |. The rules for the lazily-evaluated range adapters, whether they can be applied to lvalues or rvalues etc., are pretty complicated already. Now we're going to add a new type of operation with different rules, and which gives you subtly different semantics when it is applied. Maybe it's worth it, but I'm not sure yet. I think there's a lot of benefit in having operator | always mean lazy evaluation, and if you've got an rvalue container you'll just make a temporary.
I don’t think it is very difficult to teach that as long as things are namespaced. You have 3 sets of things, you can teach them one by one.
Hi,
I’ve not commented on your blog before, but I’ve been following it for a while, as well as your ranges work, with a lot of excitement.
On the naming issue — What about distinguishing between the lazy and eager functions / algorithms by using the past participle form for the former? That is, view::transformed versus cont::transform. I think this reads a little better, esp. as it lets one use using directives for both forms. (Well, except for unique where there isn’t a convincing past tense — uniquified? Nah.)
Then maybe use “as” instead of “view” and something like “do” or “eval” or “run” instead of “cont”. So, as::sorted versus eval::sort.
Or alternately, use past participle for both forms but use “make” for “cont” — as::transformed versus make::transformed. Though it loses the ability to use using directives for both forms in one place. Just to throw out some ideas; feel free to use any or none.
Best regards,
Kevin
Boost.Range uses the past participle. But as you've noted, it doesn't generalize well. Nobody wants "uniqued". 😉 Thanks for the other suggestions. I'll throw them in the pot with the rest and stew on it.
Well, that sounds exactly like the behaviour of the already existing std:: algorithms. Would it be too surprising to overload these algorithms, so that you could just write:
std::unique(std::sort(bigvec));
In my opinion, the intent is very clear this way, and you can keep “view” for the lazy algorithms.
I have to say I agree that the pipe syntax should be kept for views only.
For eager algorithms, if we get unified call syntax (N4165, N4174), it becomes:
bigvec.sort().unique();
Many of the standard algorithms already have interesting and useful return values. To say that, when passed an Iterable, std::foo returns the Iterable for composability, but when passed two iterators it returns other useful information and doesn't compose — it's just weird. There's no symmetry or consistency in that interface.
In my proposal, the std:: algorithms all get overloads that accept Iterables, but they have exactly the same semantics — and return types — as the existing algorithms. I think that's the right thing to do.
These composable algorithms sometimes throw away useful information by returning the modified Iterable. The up-side is that they compose well. It's a trade-off. They are provided in addition to the existing std:: algorithms so that users have a choice.
Hello Eric,
I am happy that you propose to offer container algorithms which mirror the views. I don't see your zoo of algorithm classes (container/view) as complete, though. Here is my list:
1) holds reference to lvalue, lazy, output may have different type than input (view::XXX)
2) aggregates rvalue, lazy, output may have different type than input (not supported)
3) passes through rvalue, eager, output type same as input (cont::XXX)
4) operates in-place on lvalue, eager, output type same as input (cont::XXX)
5) operates eagerly on rvalue or lvalue, writes result into different container type
Not all algorithms support all modes of operation. For example,
transform supports (1), (2), and (5) and with restricted functionality (output type=input type) (3) and (4)
filter supports all five
sort supports (3), (4) and (5)
5 classes is many. It would be nice to cut them down a bit. Let’s see:
(2) can be derived generically from (1) by incorporating a generic rvalue keeper into the view implementations.
(4) can be derived from (3) also generically.
(5) can probably and with reasonable performance be obtained from (1)+a generic container maker (make_vector, etc.)
This leaves us with two core algorithms to implement, namely the in-place eager ones (3) and the lazy ones (1).
The caller, though, should be able to choose between (1) through (4), with (5) IMO done with a helper function.
I would leave the naming up for discussion. If you feel like (1) and (2) should be distinguishable at the call site, I am fine with that. In practice, I don’t see it being a use case that you generically handle rvalues and lvalues with the same code. So you can decide whether you want to call (1) or (2). But I think this is no reason not to support (2) at all.
Arno
Arno, it already is precisely as you describe, with the exception that (2) is intentionally not supported. That's because in my model, the views are always non-owning. (I feel you and I have had this conversation before.)
(3) and (4) are both automatically generated from the same implementation. (5) is supported as of a few days ago with view::to_. More generally, view::to_ can be used like view::to_<std::vector>() or view::to_<std::vector<long>>(). In the former case, the element type is deduced. In the latter, it's coerced.
I know that you feel strongly that views are non-owning. I know that you want people to have certain expectations about how views work, namely, that they hold references.
I, on the other hand, just see that the “owning view” is a practical use case, because transforms, for example, cannot always be executed in-place at all due to different output types.
I don’t think our two views are incompatible. You want the caller to get a lightweight “view” if she wants it, and not get an unexpectedly heavy aggregate. I want the aggregate to be available, because I think it is a useful thing to have, but don’t insist on calling it a view.
Arno
BTW, I wouldn’t lump to_vector into the view namespace. You can do it on any Iterable. Converting one container type to another is not different, really.
It’s true to_vector isn’t very view-like, but not for the reason you suggest. All views operate on Iterables.
EDIT: I feel like I should clarify, since this is an important point. The "view" in "view::" doesn't mean that the input should be a view. It means that the output — the result — is a view. In that sense, you're absolutely right. to_vector emphatically does not belong in the view:: namespace. Thanks for bringing my attention to that.
Sure. As long as we don't make the views aggregating, I have no problem with it. As always, naming and syntax are the hard part.
My two bikeshed cents: “alter”. It’s not a noun, but it’s at least consistent with treating “view” as a verb.
The approach you have taken with container algorithms reminds me of linear types used in some functional programming languages. A pure functional language may allow mutable state as long as the mutation is observed only locally. So, there is a concept of a linear value – a value that can be used only once; that restriction allows destructive update under the hood, because no one sees the value being updated.
To me that concept seems to be very close to container algorithms. When you call move on a vector the first time (or std::ref), you create a linear value; a container algorithm takes that value and produces a new linear value that can be used in the next container algorithm.
From such a perspective it is quite reasonable to have a sharp distinction between linear and non-linear values. Maybe it is better to have a special function (std::local_value?) for creating a linear value.
This is an interesting observation. Of course, there is already a special function for creating a linear value in this case: std::ref. That's how you tell the compiler, "No really, I'm telling you it's safe. Go ahead and mutate this object."
Hello,
I've followed your blog about ranges and it's very exciting how far this got.
I'm not a native English speaker, but as I read through the comments, one thing from Qt and other frameworks came to my mind:
Model/View(/Controller).
The model is the underlying data structure and the view is how you look at it (lazy and non-mutating, as long as you don't edit, but that's not the point here).
So "model::transform" and "view::transform" are, to my way of thinking about the topic, good nouns for the namespaces. And I think it is also clear that a "model::algorithm" mutates the data structure (container).
It may be an option 😉
Greetings
That's not quite the right mental model (no pun intended). Calling these algorithms "model" algorithms is wrong for the same reason calling them "container" algorithms is wrong. It's not about who owns the data. It's about whether the work is done eagerly or lazily.
I got your point on the owning thing, but then I have a problem with "view" also. Then I think we could not use a noun at all.
But my point wasn't about the owning of the data anyway.
If you do your work eagerly, you have to work on the owning data anyway, even if you have some ranges or lazy work in between. Or do I get something wrong?
"view::to_vector" here cannot work lazily, can it?!
You're right; I was wrong to put as_vector in the view namespace. It's definitely not a view. That's already fixed in the latest code.
Can you sort a view::transformed vector because the view::transform range iterators overload std::iter_swap?
No. I don’t even know what that would mean.
Sort veca by order of member A::n:
The same thing applies to other algorithms swapping elements, like erase.
At think-cell, we currently use
to express the same thing, but the inconsistency of using
view::transform for algorithms like find, but not for sort, is unsatisfying.
This doesn't sort the vector. It sorts all the A::n members, leaving the A::str members alone. It's pretty meaningless.
If you're suggesting to change the meaning of iter_swap for transform view iterators, then the answer is no. iter_swap is an important primitive. Changing its semantics is sure to have nasty side-effects. I'm not even sure how one would overload it to consistently give the behavior you want. If you transform with a lambda instead of std::mem_fn, then iter_swap would have no way of knowing that a struct was being sliced. To have your code do something different after replacing a call to std::mem_fn with the equivalent lambda is surprising in the extreme.
Also, maybe the transform is using the values of one sequence as keys into another sequence. And maybe the user is trying to sort that sequence. And why shouldn't they? No, making transform view iterators magic with respect to iter_swap is not workable.
Instead, range-v3 gives all the algorithms "projection" arguments to handle this very common case. You use it like this:
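A minimal sketch of the projection form, reusing veca and A::n from above (the choice of comparator is incidental):

ranges::sort(veca, std::less<>{}, &A::n);   // compare elements by their A::n member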
I do not quite get “if you transform with lambda…” part. Why would iter_swap need to know that something is sliced?
Iter_swap for iterators of views only needs to swap iterators of underlying container, it does not need to know anything.
And IF we allow views to be sorted, that’s the only approach that we can take.
Iter_swap overloading was done in p-stade.oven (if I remember correctly). That does not help in the current standard, however, as std::sort is not required to use iter_swap.
A similar approach is used in the range library developed by Andrei Alexandrescu for the D language. At least it allows sorting of concatenated containers.
OK, but why do we want to treat iterators of a view specially? iter_swap in the standard is defined to be swap(*a,*b). If we are to do something else, how are we to define it?
What makes you think that?
You can sort concatenated containers just fine, with no iter_swap tricks. I just tried it. Works just fine.
All you would need to do is to require std::swap to use std::iter_swap (very reasonable, I think) and define transform_range::iterator's std::iter_swap to swap the elements of the base sequence. Then it does not matter whether you use a lambda or std::mem_fn. Not that crazy, I think.
If you want to stick to the projection approach, I am not sure you need an extra projection parameter. A simple wrapper a la
sort( vec, compare_less( &A::n ) );
does not require more typing than
sort( vec, std::less(), &A::n );
but gives you more orthogonality: you don’t need an extra overload for each algorithm, just one wrapper.
How would that work? There are lots of things that are swappable that are not iter_swappable.
Unconditionally? What about the scenario I presented above:
By my count, it does. 😉 But that's not the point. If you need compare_less, you also need compare_greater, compare_less_equal, etc. etc. And each one is a combination of one of the function objects in <functional> and std::mem_fn (or std::bind). That isn't an increase in orthogonality. It's a decrease.
You don't need any extra overloads when using projections:
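A sketch of what I mean, reusing vec and A::n from your example; the direction of the comparison is just the second argument:

ranges::sort(vec, std::less<>{},    &A::n);   // what compare_less would do
ranges::sort(vec, std::greater<>{}, &A::n);   // "compare_greater", with no extra wrapper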
I do not think that in the current standard iter_swap is a really important primitive. It's only mentioned a few times, and most of the STL algorithms are not required to use it. So, it should be possible to redefine it.
Anyway, when the STL was standardized there were no iterator adaptors or ranges, so no one thought of the need to customize the behavior of iter_swap for them. And in my opinion it should be customizable. Iterators of ranges do not point directly to elements; they just adapt other iterators. Ranges do not have the elements, they may synthesize them on the fly (or get them from some cache), so there's no point in swapping range iterators the way iter_swap does now.
And if changing the meaning of iter_swap it too much of a breaking change, it is still possible to have another function (indirect_swap?), that iterators may customize.
In order to define iter_swap we may introduce the notion of an underlying value of an iterator. For an iterator of a container, its underlying value is just the value that it points to. For an iterator that adapts some other iterator, its underlying value is the underlying value of the inner iterator. (That's not part of the definition, however; an iterator is free to specify something else as its underlying value.) The underlying value of an iterator is returned by some function (let's call it uvalue) found via ADL. The return type of uvalue is not required to be a reference type; it might be some reference-like type. iter_swap should swap the underlying values of its parameters.
// it's just a reminder that ADL is involved
using std::swap; using std::uvalue;
auto&& f = uvalue(a);
auto&& s = uvalue(b);
swap(f,s);
It is true that elements of concatenated containers might be sorted now. But the approach proposed above allows sorting zipped containers, transformed containers (with sane semantics).
About the example you mentioned. As far as I understood there are two containers, one containing keys and the other that maps keys to values. And user wants to sort the second container. But usually the container that plays the role of map is either already sorted (boost::flat_map, std::map) or not sortable at all (unordered_map). And if not then user has constructed range that has to perform linear search every time element is accessed. Rather unlikely situation.
Do not read the part about your example. I was not thinking. If user wants to sort values of (multi_)map for some keys it still can be done. We just need some iterator adaptor that has uvalue(it) defined as *it.
This feels like adding a hack to undo your earlier hack. Why not go the other way? When you want to sort the underlying sequence, define an adaptor that returns the underlying values, by making *it return the result of *it.base()? (All the adapted views have iterators with a base() member.) That way, you always know what is getting sorted.
The algorithms should not treat the iterators of adapted views specially. When you sort the range [first,last), you sort the elements in the range denoted by first and last. This is how the STL works. I still see no reason to change it.
Sorting a transform view makes no sense. You can sort the underlying sequence, but that is no guarantee that the elements of the transform view are sorted. What does sort mean if, after sorting the view, is_sorted on that view isn't guaranteed to be true?
I can't think of a use for sorting a zip view. But there's nothing wrong in theory with doing it. A zip of a vector<int> and a vector<short> has tuple<int &, short &> as its reference type. These actually can be sorted, once I change swap to handle rvalues. (That's on my TODO list since the Palo Alto proposal allows proxy random-access iterators.)
How do you propose to implement iter_swap for a zip view? What is the underlying sequence that gets sorted?
The idea was to make sorting a transformed view produce the same result as sorting the underlying sequence with a projection. That is, during the process of sorting we swap underlying values, but comparison is performed on the transformed values.
And in that case is_sorted(rng) should return true after sort. (Under the assumption that the sequence was transformed with a pure function; then I do not see how is_sorted could return anything other than true.)
For zip_view of containers of ints and shorts underlying value should be tuple<int&, short&> if swap works correctly on those, so the result would be the same.
Let’s say we have this:
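For concreteness, assume something along these lines: a vector and a transform view that negates the odd elements.

std::vector<int> v{5,4,3,2,1,0};
auto rng = v | view::transform([](int i){ return i % 2 ? -i : i; });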
We both agree that when we traverse rng, we see the elements [-5,4,-3,2,-1,0], right? Now, you want to be able to sort rng and have it sort the underlying sequence. If we allowed that, v would contain [0,1,2,3,4,5], and when we iterate rng, we would see the elements [0,-1,2,-3,4,-5]. Right?
So, is the sequence [0,-1,2,-3,4,-5] sorted? Of course not.
You say you would like is_sorted to check if the underlying sequence is sorted. But that's not useful. If I am writing generic code that accepts a range of elements and does a bulk insert into a database (let's say), I need to know that the elements in the sequence are sorted. I shouldn't have to worry about whether the range I'm passed is actually a transform view, and is_sorted is going to lie to me. I want to know that I'm not going to corrupt my database by bulk inserting non-sorted data.
Please, please desist from this iter_swap madness. It's a terrible idea.
I do not understand why after sort(rng) v contains [0,1,2,3,4,5]. Of course it should contain [5,3,1,0,2,4].
Basically we do not sort the underlying sequence; we sort the range, changing the order of the elements of the underlying sequence. That might be a bad idea, but it's not as bad as you describe.
OK, glad we sorted out (har har) the misunderstanding. At a gut level, I still feel like changing iter_swap is the wrong thing to do, but I need to think about why.
Sorry, I meant std::sort of course.
Re. projection, you may not need overloads, but you have to implement the projection in each algorithm. compare_less/compare_greater you are going to implement twice, that’s it.
When designing a library, I think that for every functionality which can be decomposed into smaller functionality, which is useful by itself, the smaller functionality should be offered as well. This is clearly the case for compare_less/compare_greater, which can be used in all places where comparators are used and where projections are currently not offered. std::set comes to mind, but also countless user-defined functions, where the original implementer did not bother to offer projection.
In fact, compare_less/greater can in turn be implemented in terms of transform_args, which, unlike bind, transforms every argument of a function the same way.
Support for projections falls out nicely if you are supporting invokables.
I agree, which is why compare_less should be decomposed into less and bind. In fact, it's already done. 🙂
lower_bound
. Consider this example, that I got from Sean Parent that sorts a sequence by a member, and then useslower_bound
to find an element:Notice how, without projections, you had to write the comparison twice in two different ways, since the comparison for
sort
needs to be symmetric, but the one forlower_bound
needs to be asymmetric. With projections, it looks like this:With projections, we gain symmetry. What does this example look like with
compare_less
?compare_less:
actually, it decomposes into std::less and transform_args, which transforms both arguments the same way, which bind doesn’t do.
Re. lower_bound:
We write these cases as
lower_bound( transform(rng,func), x).base()
The asymmetry with
sort( rng, compare_less(func) )
irks me, but not enough to require changing all functions taking a comparator to taking a comparator and a projection. A (user-defined) algorithm taking only a comparator is a fine algorithm. I’d like to have the tools to use it with projections, even if the implementor did not build the projection capability into his implementation.
I'm still on the fence about this new strong divide between view/action. On the plus side, it's much clearer now what is lazy and what is eager.

But on the other hand:

1) It introduces a whole new class of errors: if a user tries to pipe a view and an action together, or tries to pipe a container into a view, etc., the compiler errors generated by those kinds of mistakes are baffling (see later).

2) It seems less powerful overall (?): I remember that six months ago nearly all algorithms were pipeable. Now only a few selected views and actions are pipeable.
To see how far we can go with range-v3, I tried to port this D article over to C++:
http://wiki.dlang.org/Component_programming_with_ranges
My implementation is here:
https://github.com/Arzar/RangeFunV3
I love this article because it shows the whole zoo of rangy things, like complicated views such as the groupBy view (a view that returns views), lots of range piping, range chaining, etc. But it also solves a practical case: how to display a calendar. A calendar is a human-made concept with lots of little quirks, so it feels closer to day-to-day programming, where you often have to fiddle with imperfect things, than all those pure mathematical examples that functional circles seem so fond of…

Anyway, I found it very hard to port to C++ with range-v3. Here are some difficulties I encountered:

1) One of the biggest pain points: how to register a new algo/view/action into the pipe system.

In D, it's almost magical: you just add empty(), front() and popFront() to any class, and then your class is considered a range and gets automatically plugged into the whole range ecosystem (all algorithms + range adaptors). I do think most C++ folks underestimate how wonderfully composable D ranges are.

It's much more difficult in range-v3. For example, I tried to code a Pythonic "join" algorithm and make it pipeable. But join returns a value (an accumulation over the range), and I didn't figure out how to have a pipeable entity that returns a value. In the current range-v3 there is only "view", which takes a range by rvalue reference and returns a view, or "action", which takes a range by reference and returns void. So I ended up coding a half-assed operator| for my join algo.

2) Compile errors when using views/actions.

Six months ago, I remember using ranges::for_each all the time as a kind of printf for ranges. So with the current range-v3, the first thing I did was to try something similar:
std::vector<int> v = {1, 2, 3, 4, 5};
v | view::for_each([](int i){ std::cout << i << std::endl; });
Well, it doesn't compile, and the compiler errors are utterly baffling. There are lots of mentions of "transform_view", "flatten_view", and "view::all" in the compiler backtrace that don't even seem related to the for_each view. Also, there are a lot of concept checks in the backtrace, but they are quite useless and even a bit misleading. For example, this one:
CONCEPT_ASSERT(Iterable<concepts::Invokable::result_t<F, range_value_t<Rng>>>());
It confused me a lot at first, because I thought that Rng = std::vector and F is the lambda, so all seemed good. I realized later that at this point we are already quite deep into the library, and Rng is actually a really complicated type, something like transform_view<flatten_view<std::vector&, lambda(int)>>. So the concept check fails, but the type is too complicated for me to understand why.

Now that range-v3 clearly distinguishes views from actions, I wonder if you could enhance the library to detect and catch those typical mistakes early and produce some user-friendly error messages?
“You can’t pipe a view after a container”
“You can’t pipe a view after an action”
“You can’t pipe an action after a view; suggestion: pipe a to_vector between them”
3) Mysterious compile errors. I wrote a forward iterator with Boost.Iterator (in day.hpp, day_iterator). When using it with range-v3 views/algorithms I got a lot of weird compiler errors in the internals of the library:
v3/size.hpp return size(detail::forward(rng));
/utility/basic_iterator.hpp return range_access::current(pos());
After a lot of effort, I found that the compiler errors went away when I added a to_distance() function inside my iterator. I don't know why.
Thanks for taking the time to give all this feedback. It's very valuable for me to know the sorts of problems people are running into. I'll investigate and get back to you. If you send me an email, I'll follow up with you personally once I have answers for you.
EDIT 1: I’m not ready to address your whole comment yet (it probably deserves its own blog post), but I can address parts. First, this:
Well, it seems you've mixed up the algorithms (eager) and the views (lazy). It's true, 6 months ago the algorithms were pipeable this way. Now the pipes are for views and actions, and the ordinary algorithms are not pipeable. (It was an experiment that was doomed to failure; the algorithms don't have return types that lend themselves to chaining.) If you want to use ranges::for_each for printf, you still can. Even better, all ranges in range-v3 now have ostream inserters, so you can write it like this:
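For instance (a sketch; printing through view::all is an assumption on top of the ostream inserters just mentioned):

#include <iostream>
#include <vector>
#include <range/v3/all.hpp>

int main()
{
    std::vector<int> v = {1, 2, 3, 4, 5};

    // The eager algorithm still works as a "printf for ranges":
    ranges::for_each(v, [](int i) { std::cout << i << std::endl; });

    // Or print a view of the container directly through its ostream inserter:
    std::cout << ranges::view::all(v) << '\n';   // prints something like [1,2,3,4,5]
}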
If you're wondering what view::for_each is for, check out my blog post on Range Comprehensions.

Your point about bad compiler errors is well-taken, though. I am trying to trade off a few things, and I may not have the balance quite right. See, I'm emulating Concepts Lite using SFINAE in function signatures. That means that when a concept check fails, you get a rather unhelpful error like "no overload found that matches [complex-type-1] and [complex-type-2]. Here are the 50 overloads that failed to match and why. [compiler vomit]" That's not friendly.
Instead, I could move more concept checks inside the functions. That way, the call will resolve but then generate an error that is more likely to be helpful, like a static assertion that a type is copyable, or that two types are convertible, or whatever. This is hard and takes a lot of thought, though.
I do take bad compiler errors seriously, though. If you file a bug, I’ll address it.
EDIT 2:

You've seen the c_string_range example from the Quick Start, right? With range_facade, you can implement ranges with done(), current(), and next() members. It's not hard.
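Roughly like this (reproduced from memory of the Quick Start example, so the details may differ slightly):

#include <cassert>
#include <range/v3/all.hpp>

// A range over a null-terminated string: the three members below,
// current(), done(), and next(), are all that range_facade needs.
class c_string_range : public ranges::range_facade<c_string_range>
{
    friend ranges::range_access;
    char const *sz_ = "";
    char const &current() const { return *sz_; }
    bool done() const { return *sz_ == '\0'; }
    void next() { ++sz_; }
public:
    c_string_range() = default;
    explicit c_string_range(char const *sz) : sz_(sz) { assert(sz != nullptr); }
};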
You're having difficulty because you're fighting the idioms of the library. You're trying to implement an algorithm, not a view or an action. (It looks like an intersperse followed by a fold, in Haskell-ese.) Your algorithm computes a value instead of producing another iterable. Both views and actions produce iterables, which is why chaining makes sense for them. It doesn't make sense for algorithms that are simply computing a value.

I would implement an intersperse view and use it to intersperse whitespace into a range of strings, and then pass the resulting view to ranges::accumulate, which would concatenate all the strings together.
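In sketch form (note: as mentioned below, view::intersperse had not actually been written at the time, so this only shows the intended shape):

#include <string>
#include <vector>
#include <range/v3/all.hpp>

std::string join_with_spaces(std::vector<std::string> const &words)
{
    using namespace ranges;
    // Lazily intersperse a separator between the strings, then eagerly fold
    // the result into a single string with accumulate.
    return accumulate(words | view::intersperse(std::string(" ")), std::string{});
}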
This calendar example is a treasure trove, by the way. Thanks for pointing me to it. One of the other problems you're having with range-v3 is that it's still evolving. Your job would have been much easier if range-v3 already had intersperse and split (aka D's chunkBy). This is pretty basic functionality that I still haven't gotten around to implementing. So much to do.

Something else:
I know you do not aggregate rvalue containers in range adaptors. Do you aggregate rvalue ranges in range adaptors? What about lvalue ranges?
Certainly. Otherwise, view chaining would be broken.
They are aggregated by value, which is how it should be IMO. They are proxies; there’s no reason to hold them by reference.
We agree 🙂
Do you distinguish between containers and ranges in your library, or do you also aggregate lvalue containers by value?
I distinguish between ranges and containers. Ranges are always aggregated by value, containers only if they are rvalues (and I wouldn’t mind if this behavior would be available only explicitly, but I would like to have it). And such adaptors aggregating rvalue containers by value are themselves containers in terms of deep constness etc.
This description sounds exactly like what I expected. And I, too, wouldn't mind if this were only available explicitly.
For reasons I’ve given here and elsewhere, using the rvalue-ness of the container to decide whether to aggregate it by value or reference is a dangerous and confusing interface. Giving users a way to opt in would be far safer. I could imagine an interface like this:
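Purely as an illustration (view::owned is a made-up name; no such adaptor exists in range-v3, and this is not part of any proposal):

std::vector<int> make_ints();   // some function returning a temporary container

// The transfer of ownership is explicit and visible at the call site:
auto rng = view::owned(make_ints())                         // hypothetical opt-in
         | view::transform([](int i) { return i * i; });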
Of course, then view is the wrong name. And I personally like the fact that I can count on views always being lightweight objects with reference semantics. So I don't love the above, but I guess I don't hate it either. I'll try not to make any decisions that preclude it, so that if we end up wanting to add it later, we can.

I have one concern about the fact that ranges never "own" values, and it is about return values.
It is a common pattern in our production code to have a function return a “type-erased iterable” (typically for big queries that take a lot of time to execute). Notice that I wrote “iterable” and not “range”. When calling the function, the user doesn’t have to care about the following:
Because of (3), returning any_range is not an option, since a range can never own anything. With range-v3, what would be the alternative? Did you ever consider something like any_iterable?
In your application, are these type-erased iterables mutable or immutable? It sounds like they're immutable. In this case, you can have your cake and eat it by defining a range type that stores a shared_ptr<vector<int> const>. Then any_range will just work for you, and you'll still get O(1) copy/assign.
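For illustration, a minimal sketch of such a range (shared_ints is a made-up name, not part of range-v3):

#include <memory>
#include <vector>

// Copies of shared_ints are O(1): they alias the same immutable vector,
// whose lifetime is managed by the shared_ptr.
class shared_ints
{
    std::shared_ptr<std::vector<int> const> data_;
public:
    explicit shared_ints(std::vector<int> v)
      : data_(std::make_shared<std::vector<int> const>(std::move(v)))
    {}
    auto begin() const { return data_->begin(); }
    auto end() const { return data_->end(); }
};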
Yes, they are immutable.

Uh, shared_ptr? I guess it could work, but at the cost of an extra allocation, and an extra indirection when accessing the range… I'm not sure it sounds like a satisfying solution, at least from a performance perspective.
Also, isn't holding the vector in the range with a shared_ptr kind of like saying that the range is an owning one? Or at least, we can say that it is kind of 'emulating' an owning one. If this indeed works, we can expect implementations of 'owning versions' of ranges to spread through a lot of code bases. Don't you think it would be worth trying to support them in a clean way from the beginning? (I'm not suggesting that it is easy to do…)
Think for a minute about how you would implement your any_iterable type if it could hold either a range or a container. If it's holding a container, then when you copy the any_iterable, you need to copy the container. Copying containers is expensive. And you're worried about the cost of a shared_ptr?

Perhaps I've confused things by talking about element ownership. The real issue is value semantics vs. reference semantics. Ranges have reference semantics — copies create aliases. Containers have value semantics — copies are independent. A range that holds a shared reference to a container has reference semantics even though the ownership is shared among the ranges.
When your data is immutable, you have referential transparency. There's no difference between value semantics and reference semantics when your data is immutable. Sure, you could create a bunch of copies of the same immutable value, but why? It's unnecessary and terribly inefficient, especially if your value is something expensive to copy, like a container. Functional languages share immutable data extensively. Mostly they use garbage collection. In C++, we use reference counting.
So yes, shared_ptr is your friend here. Don't look down your nose at it. Sometimes it's the right tool — especially when it's spelled shared_ptr<T const>.

EDIT: If the extra indirection bugs you, you might look into boost::shared_array.
.I understand your point that reference semantics allow fast copy/assign. I am just not convinced that it is always essential to have fast copy/assign. For return values, we just need move to be efficient. I don’t see expensive copies as a reason to discard owning ranges.
Yes, I'm worried about the cost of a shared_ptr because, if I knew that a range could be an owning one, I would simply not copy it around, the same way that I don't copy vectors or any other container (unless it is really needed), because I know that it is expensive. So if I manage to avoid copies, the biggest overhead I'll be left with will be the one from the shared_ptr.
About boost::shared_array: it makes the solution a bit better, as you say. At first sight it seems to be similar to shared_ptr<T[]> (N4077). (Note that some extra work might be needed to make it an iterable; it doesn't seem to be one.)
Downsides:
– It still requires an extra allocation.
– A vector (or any sequence container) could not be moved into an owning range implemented with a shared_array.
If we're talking about extra allocations: do begin() and end() of any_range require an allocation or not?
Yes, as far as I can tell. It needs to return something that type-erases an iterator, and type erasure means dynamic allocation (unless you implement the Small Object Optimization, I suppose).
So, I'm late to this one, but let me see if I've got this right…
An “iterable” is anything that can have iterator ranges referencing it. An iterable may itself be an iterator range, or it may be the thing that actually owns the values — be it a container, an array, or something less obvious.
“View algorithms” take iterables and return iterables that also happen to be iterator ranges: their input may be the actual, owning set of values, or it may be a chain of other iterator ranges going back, eventually, to an actual, owning set of values. The major reason for unifying owning and non-owning iterables (non-owning iterables themselves being iterator ranges) under one “iterable” interface is so that iterator ranges can be made this way: either directly from the owning thing at the end, or from any of their brethren in between them and it. “View algorithms” are named with the noun “view” because a “view” is the intermediate thing with reference semantics that they spit out; they don’t do anything themselves other than generate that view, and the real work happens later, when the view is used (during access/dereferencing through that reference). There isn’t a good verb name for “view algorithms” because their whole point is to give you an intermediate object that will do the doing later, rather than do the doing themselves.
In contrast, “container algorithms” perform transformation on the various non-view types of iterables and spit out the transformed value-set. “Container” doesn’t really belong in their name since the iterable they work with might be an array. Heck, they might even take input as something really weird like an istream iterator reading a finite stream length, as long as they can take ownership of the value and process the whole thing before spitting it back out. They don’t have a good noun name based on the thing they spit out (like a “view” is what’s obtained from a “view algorithm”) because we don’t have a good unified noun name for “output of iteration that, while it’s almost certainly also iterable, is an actual thing — value semantics — and not an iterator range with reference semantics”. But, if my understanding up to this point is correct, they shouldn’t even have a noun name if a good one were available — what they spit out isn’t the interesting part, what they do is, so they should have a verb name. They should be named for the fact that they do the doing (the transformations), in contrast to “view algorithms” that don’t do the doing, merely giving you an intermediate “view” that will do the doing when you need it to.
So, if I haven’t made any mistakes, the alternative to “view” really should be a verb rather than a noun because the contrast between verb and noun reflects the difference of action implied by the difference between the reference semantics and the value semantics. If I’m right about that, it would further explain why “container algorithms” has been such a confusing name (I’m pretty sure the sort of confusion we’ve witnessed isn’t limited to thinking that they act on containers and not other types). And it would largely explain why most of the suggestions for new names aren’t nouns, and some of the best are verbs.
I’m for “mutate” personally, because it’s less vague than “action” and fits better with the move semantics you’ve also argued for rather than copy semantics — for copy semantics, “make” would be better than “mutate”.
All that’s my two cents’ worth, and hoping I got the premises to my analyses correct. Gotta say, I like the range-based-on-iterators idea and the views idea in the first place and am looking forward to seeing those get into the standard library even if I stick to the traditional algorithms (in their range rather than iterator pair versions) for most of my needs.
Hi,
I understand if you don't want to be my personal Stack Overflow, but this may even be beneficial to you (w.r.t. the efficiency of your ranges proposal).

Recently I was looking at a hypothetical append_to_vector(input_vector*, append_me_container&) algorithm, and I noticed one lame thing about vector's insert overloads. If you want to append a container to a vector, the only way insert can preallocate memory is if the container's iterators are random access (since std::distance is then O(1)). The problem is that if, for example, you want to append the keys of a map or set, or your vector is of pair<A,B> and you append a map<A,B>, there is no way to tell vector's insert the number of elements to be inserted, so that it can reallocate if the elements won't fit in the vector.capacity() - vector.size() space.

You can obviously call reserve in your append_to_vector (a sketch follows below), but that is reimplementing the vector's growth-strategy algorithm :/ So do you think we would need an overload added to http://en.cppreference.com/w/cpp/container/vector/insert ?

One like iterator insert( const_iterator pos, InputIt first, size_t n ); that copies n elements starting from first, with obvious UB if that would go over the edge of the world (past the .end() of the container that first came from).
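A sketch of the reserve workaround described above (append_keys is a made-up helper, used only for illustration):

#include <map>
#include <vector>

template <class K, class V>
void append_keys(std::vector<K> &out, std::map<K, V> const &in)
{
    // We know exactly how many elements are coming, but vector::insert has no
    // way to be told, so we reserve by hand, duplicating part of the vector's
    // growth policy.
    out.reserve(out.size() + in.size());
    for (auto const &kv : in)
        out.push_back(kv.first);   // map iterators are bidirectional, so
                                   // insert(first, last) can't get the count in O(1)
}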
It’s a legitimate point. That’s why in my proposal, I separate the concept of “SizedRange” from ordinary Range, and even “SizedIteratorRange” for iterator/sentinel pairs that know their size even if the iterator is not random access. (Consider a range made from an iterator and a count.)
In the future, this problem will be fixed by making the containers aware of these concepts. In the shorter term, we’ll need to do something more hack-ish, like what you suggest.
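For example, a range made from an iterator and a count knows its size even when the iterator is only a forward iterator (a sketch using range-v3's counted view; such a range is what the proposal's SizedIteratorRange concept describes):

#include <forward_list>
#include <range/v3/all.hpp>

int main()
{
    std::forward_list<int> fl{1, 2, 3, 4, 5};
    auto rng = ranges::view::counted(fl.begin(), 3);
    // size() is O(1) here even though std::distance over the underlying
    // iterators would be O(n); a size-aware vector::insert could use this
    // to reserve exactly once before copying.
    auto n = ranges::size(rng);
    (void)n;
}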