This post describes a fun piece of hackery that went into my Range-v3 library recently: a Python-like range slicing facility with cute, short syntax. It's nothing earth-shattering from a functionality point of view, but it's a fun little case study, and it nicely illustrates my philosophy of library design.
Python Slicing
In Python, you can slice a container — that is, create a view of a contiguous subrange — using a very concise syntax. For instance:
>>> letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> letters
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> # access a subrange with a slice operation
>>> letters[2:5]
['c', 'd', 'e']
>>> # replace some values
>>> letters[2:5] = ['C', 'D', 'E']
>>> letters
['a', 'b', 'C', 'D', 'E', 'f', 'g']
On line 5, we access elements of the list letters in the half-open range [2,5) using the syntax letters[2:5]. Short and sweet. On line 8, we assign through the slice, which mutates the underlying letters list. That makes it look as though Python slices have reference semantics (although, as a commenter points out below, list slices actually copy; slice assignment is a separate operation).
That’s not all that the Python slice operator can do. You can leave off slice offsets, in which case Python takes a smart default:
>>> # A missing first offset means "from the beginning"
>>> letters[:5]
['a', 'b', 'C', 'D', 'E']
>>> # A missing end offset means "to the end"
>>> letters[5:]
['f', 'g']
You can even slice from the end with negative offsets:
>>> # Take the last two elements:
>>> letters[-2:]
This is all pretty handy and really cool.
Old-Style Slicing in C++ with Range-v3
My range-v3 library has had a slice operation for a long time now, but it wasn’t as powerful and the syntax wasn’t as cool:
using namespace ranges;
auto letters = view::iota('a','g');
std::cout << letters << '\n';
// prints: {a,b,c,d,e,f,g}
std::cout << (letters | view::slice(2,5)) << '\n';
// prints: {c,d,e}
In the above code, view::iota is a view that generates all the characters from 'a' to 'g' (inclusive), and view::slice is a view of the elements from offset 2 up to (but not including) offset 5. As with Python's slice, this slice is lightweight and non-owning.
This syntax is not terrible per se, but it's certainly not as fun as Python's. And view::slice didn't accept negative offsets to slice from the end, so it wasn't as powerful, either.
New-Style Slicing in C++ with Range-v3
First, I wanted to find a nice short form for creating slices, so I took a page from the array_view proposal, which has a really, really clever syntax for indexing into a multi-dimensional array. Here's an example lifted straight from the proposal:
char a[3][1][4] {{{'H', 'i'}}};
auto av = array_view<char, 3>{a};
// the following assertions hold:
assert((av.bounds() == bounds<3>{3, 1, 4}));
assert((av[{0, 0, 0}] == 'H'));
Lines 1-2 declare a 3-D array of characters and then create a 3-D view of it. Line 5 is where the magic happens. It accesses the element at the (0,0,0) position with the slightly alien-looking av[{0,0,0}] syntax. What the heck is this?!
It’s really very simple: a novel use of uniform initialization syntax. Consider this type:
struct indices
{
    std::size_t i, j, k;
};

struct my_array_view
{
    double & operator[](indices x);
};
Now I can index into a my_array_view object with the av[{0,0,0}] syntax. Neat-o!
I realized I could use this trick to give people a super-short and cute syntax for slicing ranges. Check it out:
using namespace ranges;
auto letters = view::iota('a','g');
std::cout << letters << '\n';
// prints: {a,b,c,d,e,f,g}
std::cout << letters[{2,5}] << '\n';
// prints: {c,d,e}
Hey, that’s not half bad!
Slicing From the End, A Dilemma
But that's not enough. I want the handy slice-from-the-end functionality. But here's where things get a bit … interesting … from a library design perspective. Not all range types support slicing from the end. To see what I mean, consider a range of ints read from an istream. This is an input range. You don't know the end until you reach it, which means that you don't know the last-minus-N element until you're N elements past it!
In other words, the following code makes no sense:
using namespace ranges;
// An input range of ints read from cin
auto ints = istream<int>(std::cin);
// I'm sorry, I can't do that, Dave:
std::cout << ints[{0,-2}] << '\n';
The range returned by istream totally knows at compile time that it can't be sliced from the end. But whether the offsets are negative or positive is a runtime property, so it can't be checked at compile time. That would make this a runtime failure. Ugh.
To make matters worse, the rules about what categories of ranges accept negative offsets are surprisingly subtle. Consider this variation of the code above:
using namespace ranges;
// Take the first 10 ints read from cin:
auto ints = istream<int>(std::cin) | view::take(10);
// This should work! It should take the first 8 ints:
std::cout << ints[{0,-2}] << '\n';
In this case, we've taken the first 10 integers from an istream. The ints range is still an input range, but it's a sized input range. Now we can slice from the end, because we know where the end is.
And if we have a forward range, we can always slice from the end, even if we don’t know where that is (e.g. a null-terminated string), by computing the length of the sequence and then advancing distance-minus-N from the front (although that’s not always the most efficient way to do it).
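Here is a minimal sketch of that compute-the-length-then-advance idea, written against the plain standard library on a forward-only container rather than range-v3 (the helper name last_n_begin is made up):

#include <forward_list>
#include <iterator>

// Find where the "last n elements" slice of a forward-only sequence begins:
// one pass to count, then advance length - n from the front. O(N) overall.
template <class T>
typename std::forward_list<T>::const_iterator
last_n_begin(std::forward_list<T> const & fl, long n)
{
    long len = std::distance(fl.begin(), fl.end()); // first pass: count the elements
    auto first = fl.begin();
    std::advance(first, len - n);                   // second pass: advance (requires n <= len)
    return first;                                   // [first, fl.end()) is the slice
}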
And you should never specify a negative offset if the range is infinite. Never, ever, ever.
It gets even more subtle still: if both offsets are negative, or if both offsets are non-negative, then the resulting slice knows its size in O(1); otherwise, it only knows its size if the underlying range knows its size. When the O(1)-sized-ness of a range is part of the type system, it enables all sorts of optimizations. If we don’t know the sign of the offsets until runtime, we can’t ever return a type that advertises itself as sized.
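To make that arithmetic concrete, here is a small sketch with hypothetical helpers (not range-v3 code):

#include <cassert>

// Both offsets measured from the same end: the slice's size is a simple
// subtraction, independent of the size of the underlying range.
constexpr long slice_size_same_end(long lo, long hi)
{
    return hi - lo; // e.g. [2,5) and [end-5,end-2) both have size 3
}

// Offsets measured from opposite ends: the size depends on the size of the
// underlying range, which the range may or may not know.
constexpr long slice_size_mixed(long from_front, long from_back, long underlying_size)
{
    return underlying_size - from_front - from_back;
}

int main()
{
    assert(slice_size_same_end(2, 5) == 3);  // rng[{2,5}]
    assert(slice_size_mixed(2, 2, 10) == 6); // rng[{2,end-2}] of a 10-element range
}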
My point is that the rules for when it’s OK to slice from the end are subtle — far too subtle to leave the error reporting until runtime. And doing so leaves valuable optimizations on the floor.
Slicing From the End, A Solution
The solution I came up with was to disallow negative offsets with an unconditional assert. But wait before you flame me! I added an alternate syntax for denoting an offset from the end. Check it out:
using namespace ranges;
auto letters = view::iota('a','g');
std::cout << letters << '\n';
// prints: {a,b,c,d,e,f,g}
std::cout << letters[{2,end-2}] << '\n';
// prints: {c,d,e}
Instead of using a negative offset, we say end-2 to mean the 2nd from the end. What is end here? It's the same end function that you call to get the end of an Iterable (think std::end), only in my library it's not a function; it's a function object. (For more about why I chose to make begin and end global function objects instead of free functions, see my blog post about Customization Point Design.) Since end is an object, I can define an overloaded operator- that takes end on the left-hand side and an int on the right. That operator can return an object of some type that makes the from-the-end-ness of the offset part of the type system.
struct from_end
{
    int i;
};

from_end operator-( decltype(ranges::end), int i )
{
    assert(i >= 0); // No funny business, please
    return {i};
}
Now I can define an overloaded operator[] on my range type that accepts a std::pair<int, from_end>:
struct my_range
{
    // callable as rng[{2,end-2}]
    slice_view<my_range> operator[](std::pair<int, from_end> p)
    {
        // ... slicing happens here
    }
};
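For completeness, here is a hedged sketch (not range-v3's actual declarations) of the overload family such a range type ends up wanting once offsets from the front and from the end are both in the type system. end_tag stands for decltype(ranges::end), and slice_view is the same placeholder as above:

struct my_range
{
    slice_view<my_range> operator[](std::pair<int, int> p);           // rng[{2,5}]
    slice_view<my_range> operator[](std::pair<int, from_end> p);      // rng[{2,end-2}]
    slice_view<my_range> operator[](std::pair<from_end, from_end> p); // rng[{end-5,end-2}]
    slice_view<my_range> operator[](std::pair<int, end_tag> p);       // rng[{2,end}]
};

This is the overload growth that comes up again in the comments below once striding enters the picture.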
Voilà! Now I get slicing from the end with a short, readable syntax and compile-time type checking without leaving any optimization opportunities on the floor.
Yes, But…
That's great and all, but code like rng[{2,-2}] still compiles and fails at runtime. How is the situation any better? The difference now is that passing a negative offset to slice is always a runtime error. There is no situation in which it will succeed and do what you want, even if the range type could conceivably support it. Users will quickly learn that that isn't the way to do it.
Had we allowed negative offsets in a way that sometimes worked and sometimes didn't, the interface would have been far more dangerous. Users would try it, meet with some success, and conclude incorrectly that it always works. They'd discover their error the hard way, after their application has been deployed.
Which brings me to my Philosophy of Library Design:
I can’t keep people from writing bad code. But I’m guilty of collusion if I make it easy.
And a corollary that relates to this problem:
If you can’t make something succeed consistently, it’s better to make it fail consistently.
Hope you enjoyed this little case study in library design.
Acknowledgements
I’d like to thank Chandler Carruth for drawing my attention to the pithy coolness of Python’s slice operator.
Footnote:
In the C++ containers, the indexing operation is only allowed for random-access containers, where the element can be accessed in O(1). Here, I’m allowing users to slice ranges with an indexing-like notation, even though it could be an O(N) operation. I’m currently undecided if slicing is sufficiently different from indexing to justify this decision. Thoughts welcome.
Would it not be possible to constexpr-overload the indices constructor to catch negative values at compile time?
Unfortunately, constexpr doesn't work that way. The arguments to a constexpr function are never constexpr themselves within the function.

Why not just take an unsigned int to avoid making an assertion at runtime?
See my reply to Joe just below.
Why not use unsigned integer types? Then no assert would be needed in [] or operator-
Because signed negative numbers wrap to really huge positive unsigned numbers. If someone tries to pass -1 to a function that takes a std::size_t, then that function might not be able to easily detect the mistake. The compiler may or may not warn, the user may or may not notice the warning in a mountain of other warnings, and people are so used to seeing warnings about signed/unsigned integer mismatch that they just add a cast and don't think.

Far better IMO to provide an interface where all integers are signed. And an assert is much harder to ignore than a compiler warning.
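A tiny illustration of that failure mode (take_index is a made-up function, not part of any library):

#include <cstddef>
#include <cstdio>

// A function that, like the standard containers, takes its index as std::size_t.
void take_index(std::size_t i) { std::printf("%zu\n", i); }

int main()
{
    int off = -1;    // perhaps the result of some arithmetic gone wrong
    take_index(off); // silently converts to a huge unsigned value
                     // (18446744073709551615 where size_t is 64 bits);
                     // the callee cannot tell this was meant to be -1
}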
I thought that might be the reason. I always use warnings as errors and a fairly high warning level in new code so signed/unsigned errors have become pretty rare.
Using size_t for slicing is kind of consistent with the indexing of the standard containers. Most probably, wrapping to a huge unsigned value will result in a segfault anyway when trying to access that location, which is roughly the same effect as an assert. And you can only access half the range of size_t with signed types. And it is very easy to ignore an assert if you never build in debug mode (for example, if the application becomes just too slow there to actually debug anything). And with this choice you shift the burden from the people who ignore warnings and use ints as indexes onto those who behave, use size_t for container indexes, and are aware of how signed/unsigned wrapping works.
Otherwise I really love what you are doing in Range-v3. Hope to see it standardized soon (hopefully before/instead of string/array_view) .
The standard library's use of unsigned integers for indexing is somewhat inconsistent with its use of signed integers for iterator distance. You can even "index" a random access iterator with it[n], and n is required to be signed. So I'm consistent with the signed indexing of standard random-access iterators, but not with the containers. <shrug>

You can't be serious. You'd rather allow buffer overruns and hope for the best than catch the bug with an assert?
IMO, if you don’t run in debug mode ever, you get what you deserve.
Fair point, but the burden of signed/unsigned conflict is not just carried by those who care enough to get it right. It's carried by everybody, including the teeming hordes of people who ignore the warnings and add casts without testing for potential over- or underflow. That's the vast majority of programmers, IMO.
The standard library’s use of unsigned integers for indexing is somewhat inconsistent with its use of signed integers for iterator distance
But that is different! There an `index' can in fact be negative. It is really an index difference, so dereferencing with it by itself is a bit evil.
You can’t be serious. You’d rather allow buffer overruns and hope for the best than catch the bug with an assert?
Now that you put it in slightly different words, it does sound wild 🙂 – In general I would not, but it is all about the tradeoffs. Actually, if you know the end position (and you do seem to need to know it for this to work), you can still guard element access with an assert, even with unsigned indexes. And that will catch an even wider range of bugs.
IMO, if you don’t run in debug mode ever, you get what you deserve.
Speaking of the teeming hordes, I happen to be stuck in one that does not do debug builds/runs. Although I admit that is not as common as ignoring warnings and skipping overrun checks.
It’s certainly an advantage that an unsigned integer can represent bigger numbers when the usage requires that the number be positive. But that has to be weighed against the downsides. Will this index ever be the result of arithmetic, potentially on signed numbers? How important is it for the called function to be able to check its preconditions; e.g. that the index isn’t negative? Does the interface have signed integers in other places that will force users to cast to unsigned?
Here's a for instance: Have you ever had an iterator that you needed to convert to an offset; e.g. to return to a user so they can index into some array? You call distance with two iterators and get back a signed int. Now you cast it to unsigned. If you accidentally called distance(end,begin) instead of distance(begin,end), you got a negative number which you just cast to a huge positive number. Now you buffer overrun. Oops. See?

Would it work equally well to take a size_t for slicing, and assert(index < size_t{-1})? It's the same actual range either way, but on the off chance someone has a really huge range, it would assert in debug builds (and work fine in release builds), rather than invoke undefined signed overflow. The point, though, is that it would clearly document the non-negativity requirement.
(In exchange, it makes the upper bound less obvious, but that’s much less likely an actual issue.)
That assert would trigger only when index was exactly equal to size_t{-1}. That doesn't seem very useful. I must be misunderstanding your suggestion.

For all those suggesting unsigned instead of signed, please watch http://channel9.msdn.com/Events/GoingNative/2013/Interactive-Panel-Ask-Us-Anything with Bjarne Stroustrup, Herb Sutter, Andrei Alexandrescu, Scott Meyers, Sean Parent, Chandler Carruth, Michael Wong, and Stephan T. Lavavej.
Signed vs unsigned comes up 3 times (12min, 42min, 63min) and 3 times they tell you to always use signed unless you are doing bit manipulation or really really want modulo arithmetic. The fact that STL uses size_t for indexing is basically a “mistake” (and Chandler says “sorry” at the 63min mark) 🙂
The trick with the end function object is neat. However, I have some reservations about overloading the subscript operator for slicing ranges.

First of all, it cannot be done for containers, as they use the subscript operator for a different purpose. Even if we decide to implement slicing for containers, for some containers the proposed syntax already has a meaning (flat_map<tuple<int,int>,int>, for example), even though they could be sliced effectively and would benefit from a lightweight syntax for slicing.

So there's a teaching issue. It has to be explained to users that they can use [] for slicing some iterables, but not all of them, and that if they want such syntax for a container they have to create a range from it first.

Finally, operator[] cannot be overloaded non-intrusively. Inside the library it is possible to add a subscript operator to every range, but what about user-defined ranges? They should be as good as the ones provided with the library. Should the user derive (sic!) his range from some base class to add support for slicing?
Ouch. Actually, you can't use {0,0} to construct a tuple because its constructor is explicit, but I get your point.

I'm not sure it's such a huge deal. I'm not saying that the cute slicing operation is a required part of the Iterable or Range concepts; it's just a convenience that some types provide.
There are lots of convenience accessors that a range type may want to provide in addition to begin and end. Straight indexing for random-access ranges, front, back, etc., etc. All the range types in my library provide them if they can, by inheriting from a CRTP base called range_interface. An end user can derive from this type, too, or they could provide slicing themselves, or not provide it if they don't want to.

I wish the committee would adopt the unified call syntax proposal by Herb Sutter. Then most of the convenience methods could be just free template functions (free but constrained :)).
Well, you set out to have a "slice of Python" in C++ and you have ended up with a "slice of Tcl". Reasonable people always end up with reasonable solutions :D.

(See "man lrange" on a POSIX system, as long as you have the tcl-doc package installed.)

I'm curious whether you were inspired by that, or arrived at it independently. 😀
I have never been exposed to Tcl. But from this I see that they also went for the end-2 syntax. Interesting! 🙂

I think C++ can take some useful lessons from Tcl, especially when it comes to modules.

But even though the "slicing" syntax the lrange command uses can be useful in C++, I'm generally not a fan of how ranges are implemented in Tcl, because they are not "gap-oriented" as in the STL-based C++ standard library. That makes it impossible to give a universal definition of a range (especially an empty range; in Tcl the only way to get an "empty range" is to use "end+1 end+1" or "-1 -1", which are simply invalid ranges, and the lrange or string range commands just return an empty list in those cases). I mean the case when you have to return from a function a reference to a container and a pair of iterators that comprise the range (which may happen to be empty in some special cases).
Why use the type int instead of some unsigned integral type, such as unsigned int or std::size_t?

See my response to Joe above. It's so that I can catch more errors.
Principle of least surprise: don't let anyone slice in O(n); it is confusing. My suggestion is to have a common name, as with std::advance, and to give the nice syntax only to what is cheap.
<sigh> You're probably right. I may have gotten carried away. FWIW, creating the slice is O(1). Moving to the first element might be O(N). That doesn't fundamentally change the equation, though.
I’m very sorry I couldn’t help but
Python slices also allow an optional 3rd argument: ‘step’.
E.g.: [1,2,3,4,5,6][::2] -> [1,3,5].
Please consider providing it.
And as the above example shows, Python slices can omit arguments, falling back on defaults [0,size,1]. Maybe you could use a similar trick as your ‘end’, to allow things like: a[{start,end,2}]. More verbose than “[::2]”, but we probably can’t do better.(?)
I've considered it. Python lets you stride with a negative offset, which in effect reverses the traversal. The problem for striding is the same as for slicing: not all range types support it. There would need to be another clever syntax, and then the number of overloads of operator[] explodes. I've left it out for now, but may add it later.

By the way, you don't need to use begin for the start offset of a slice. 0 always means the same thing, and it's shorter.

Thanks for thinking about it, and even thinking about negative offsets! I hope you'll get to that one day…
Yep, 0 instead of ‘start’ or ‘begin’ would work, but [{0, end}] just looks inconsistent to me! 😛
I like having slices and I think they should be very easy to do (they deserve special treatment), but I think using operator[] for slicing is more confusing than just providing a "slice" function.
letters[{2, 5}];
looks more confusing to me than
letters.slice(2, 5);
If something can have a short name, I almost always prefer naming it over using an operator overload.
I’ve also found that when people use operator overloading, they try to cram everything into additional overloads. Where if you start using names, it’s all of a sudden easy to just create additional named functions. For example if you’re already using names, it doesn’t seem too bad to create
letters.sliceFrom(4) // would be letters[4:] in python
letters.sliceFrom(letters.end() - 4) // would be letters[-4:] in python
letters.sliceTo(4) // would be letters[:4] in python
letters.sliceTo(letters.end() - 4) // would be letters[:-4] in python
Where with operator overloads the first design would be to somehow get all of this functionality into overloads of one function.
So for example creating a slice with a stride can also get its own function:
letters.stride(2) // would be letters[::2]
letters.strideBackward(2) // would be letters[::-2]
letters.slice(2, letters.end() - 2).stride(4) // would be letters[2:-2:4]
Yes, it's more verbose, but not very. If it was just
letters.slice(2, -2).stride(4)
then that is actually more readable than the Python version.
So if slices should get special treatment, I would add several named member functions instead of overloading operator[].
Looks like you are a Java guy trapped in a C++ programmer's body 😉
Take that back.
I agree with this. I really like the ‘end’ solution, and this use of operator[] is clever, but I think it is much less clear. Overloading operators is nice when the use is intuitive. In this case I think the word ‘slice’ is more clear.
OK, that seems to be a popular sentiment. Perhaps I was being too clever. But I hesitate to make slice a member. Why slice and not the others? Where does it stop?

So one really big benefit of operator overloading is that you can do assignment like this:
letters[2:5] = ['C', 'D', 'E']
Which is from your Python example. I think the fact that you didn’t bring this example up in your C++ code was what made me think that I would prefer a normal member function. And then once it’s a normal member function it might as well be a nonmember function.
If you do not support assignment like in the above example, I’d say just make it a nonmember function. If you do support assignment, I would actually almost prefer it to be an operator overload again.
That being said assignment using slice syntax is not used that often in Python. I have seen code that genuinely had to do this:
range[1::2] = -range[1::2]
but this is the exception rather than the rule. In fact it happens so rarely that I would bet that most Python programmers have to think really hard to tell you what the above statement does. (If you want to run this: this was using a numpy array which has operator- overloaded) Here is a C++ raw loop that does the same thing:
for (size_t i = 1; i < range.size(); i += 2)
range[i] = -range[i];
Python lets you do that, but not my range library. The semantics get weird. For all kinds of reasons, I want ranges to have reference semantics. Allowing assignment to a range to mutate its elements would be like saying that pointer assignment should change the value of the thing pointed to. Obviously, that's not true.
<nod>
I don’t see a ‘reply’ as the threads get longer. This response actually goes with Eric’s reply:
I might need to think a bit more about your reference semantics comment, but the first two thoughts that come to mind are:
I would expect a view to be a mutable reference to the objects that it is ‘viewing’. Just like std::tie.
It seems to me that assignment of a range would be like assignment of ptr (as you describe):
ptr1 = ptr2;
Doesn’t affect the things pointed at by ptr1.
… and while letters[2:5] might return a range view object… it looks a lot more like:
ptr1[2] = 42;
I think I would be a little surprised if:
range[2] = 42;
behaved differently.
I’m sure my mental view will be adjusted (o;
You have it right. A range has reference semantics like a pointer, and if the range’s reference type permits mutation, then you can mutate through the range by assigning to an element of the range. The question that was being discussed is whether assigning to the range itself should mutate the elements to which the range refers. The answer is no, it does not. So for example:
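(A hedged reconstruction of the kind of snippet being referred to here; the follow-up mentions a vector v and an assignment to rng on line 7, and the rest, including the use of view::all, is guesswork:)

using namespace ranges;
std::vector<int> v{1, 2, 3};
std::vector<int> v2{4, 5, 6};
auto rng  = view::all(v);   // a view that refers to v's elements
auto rng2 = view::all(v2);  // a view that refers to v2's elements
*ranges::begin(rng) = 42;   // mutating an element *through* the view changes v: v == {42, 2, 3}
rng = rng2;                 // line 7: assigning to rng itself merely rebinds it; v is unchanged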
I think I got that right. 🙂 In particular, note the assignment to rng on line 7. It does not mutate the vector v. Hope that helps.
I am curious what problems of views::slice you are trying to resolve with the subscript syntax. The main disadvantage of this syntax is that it is ambiguous for several containers:

1. The already-mentioned map<std::pair<int, int>, int> m; the syntax m[{1,2}] is already well defined.

2. Even your article mentions that the trick you are using is also used for indexing an array_view. And as you know, array_view is also a range, so for array_view<int, 2> ar; ar[{1,2}] would also be ambiguous. It now gives the element at index [1][2], but with your meaning it could return a range of rows.

Although I really like the end-n approach for indexing a range from the end, and I think it should be extended to slice, so we can for example write slice(end-5, end-2). I would also extend it, for consistency, with an overload for begin, so it would be possible to write slice(begin, end-2). In addition, this would give a nice way to create a strided version of slice: slice(begin, end, 2) would skip every other element.
Ugh. I’m beginning to agree with you. Crud.
Yep, you already can.
You’re the second person that’s asked for that. I initially had it but took it out because it seemed superfluous. Maybe I should put it back.
Yeah, but see my response to Gerald here.
For adding a stride to slice: the slice with a stride and the slice without one could produce different objects (overload on 2 or 3 arguments), so it would be possible to generate a compiler error if views::slice(begin, end, 2) is applied to a range that does not support strides.
That much is true, and it's not the problem. The problem is that not all range types support negative strides, because not all ranges can be iterated in reverse. So we need a way to distinguish between view::slice(from, to, 2) and view::slice(from, to, -2). We could use end-2 for the step, but it doesn't read as well. And if we permit 3 different types in the first position, 3 in the second, and an optional third with 2 different types, the number of overloads explodes.
for the step, but it doesn’t read as well. And if we permit 3 different types in the first position, 3 in the second, and an optional third with 2 different types, the number of overloads explodes.I see. I think that the problems lies in the fact that I want to cluther to much functionality in the single view. With great compasability of the views, I can just use more than one view to achieve desired functionality:
slice(from, to, 2) -> slice(from, to) | stride(2)
slice(from, to, -2) -> slice(from, to) | reverse | stride(2)
And each of this components has clear requirements on the input range. Just another example how important is to stick to SRP.
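(For concreteness, a sketch of how that composition might look with range-v3's slice, reverse, and stride views; the vector and the values are made up.)

using namespace ranges;
std::vector<int> v{0,1,2,3,4,5,6,7,8,9};
// every other element of the slice [2,8), traversed back to front
auto rng = v | view::slice(2, 8) | view::reverse | view::stride(2);
// rng is {7,5,3}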
Right, and that's how it works today. There are separate stride and reverse views that are single-responsibility types. It might be nice for stride to conditionally support reverse iteration, but it's not critical.

Are you planning to include a special overload for a plain end? If I write range | slice(2, end), there is no need to iterate over the whole range to find the actual end iterator, even if the range is using a sentinel. Such a slice would also be compatible with input ranges. Although I think we can just stick to range | slice(2) for such a situation. Otherwise there will be a difference in behaviour between slice(2, end) and slice(2, end-0). Another option would be to disallow using 0 in the end-n syntax, in addition to negative numbers.
There already is such an overload. view::slice(2,end) is equivalent to view::drop(2). BTW, all the view operations are lazy, slice and drop included. Creating the views is O(1) always. For non-random-access ranges, you only pay when you move to the front of the range.

Wouldn't allowing stride with negative numbers strengthen the requirements on this view to accept only BidirectionalRanges? With positive-only indexes it should accept InputRanges. How would you resolve such a problem? By throwing an exception if a non-reversible range was passed to stride with a negative number?
I like the reverse | stride combination because all concept checking can be done at compile time.
That's why we would need a special syntax for specifying negative strides, like end-2. You just read this article, right? 😉

I have read the article, but writing stride(end-2) does not actually make sense (in contrast to using end as a position for slice). Alternatively, something like stride(reverse(2)) or stride(reverse, 2) /* which would require stride(forward, 2) */ does not provide any benefit in comparison to reverse | stride(2), so for me the only option that actually makes sense is to use stride(-2).
“There already is such an overload. view::slice(2,end) is equivalent to view::drop(2). BTW, all the view operations are lazy, slice and drop included. Creating the views is O(1) always. For non-random-access ranges, you only pay when you move to the front of the range.”
I would like you to clarify things a bit for me. Suppose I have a bidirectional, non-sized, finite range with a sentinel used as its end (a linked list, with only the head stored).

If I write the following code:

range | view::slice(n, end)

the complexity of creating a new range from the view application would be O(n): we just need to move the front iterator.

If I write the following one instead:

range | view::slice(n, end-k)

the complexity of creating the range would be O(size(range) + k), plus the O(n) from before, because for the slice I must actually iterate the front iterator until it is equal to the sentinel value, and then move it backwards k times.

That would mean that the cost of using view::slice(2, end) and view::slice(2, end-0) is different. The fact that ranges are lazily created does not change this, because eventually the view will be created.
It's true. Fortunately, I don't see many people saying end-0, but still it might be nice to point this out in the docs.

One other thing: creating the view is always O(1). You don't pay the O(N) until you call begin or end. There's no need to compute the end if you do this, for instance:
front(view::slice(2, end-2))
. Views are very lazy.Interesting thing is that forcing the begin+n, end-k syntax would allow you to avoid most of ambiguity problemes: the possibility that container will have an subsrib operator that accepts type of std::begin or your custom from_begin is minimal.
Although I am still not convinced about using operator[]. The other argument, aside from the possible ambiguity, is that the declaration of operator[] is intrusive. That means that the pipe and named-function syntax can be applied to any object that models an appropriate concept, while the subscript will work only for classes where it was already implemented or whose source you are able to modify.
If we have a slice, we are expected to be able to index it. So if indexing takes O(n), we have a problem. To me it would look as if we had += instead of std::advance: it does not warn me about the cost of the operation, even if creating the slice is O(1). Maybe you can return a slice with [{a,b}], but the kind of slice you return for lists/maps cannot be indexed, while the one for arrays can. That would be a solution.
This doesn't follow. Do you mean, "if we have slice with operator[] we are expected to be able to index it"? That is something I might be able to agree with. I think I'm reaching a point where operator[] to slice might have been a mistake unconditionally, not just for non-random-access ranges. I should probably just take it out.

Sorry, I didn't say it correctly there. What I actually mean is: an operator[] on a slice is expected to be random-access. Otherwise, return a slice that does not support that. Because operator[] is expected to be cheap, I would restrict it to random-access.
This question spans the context of several posts, so I will place it here. Let's consider the following range construct that uses a sentinel as its end: a doubly linked list that marks both of its ends by making the value of next/prev null. The iterator for this container is just a raw pointer to a node, and has the following operations:
dereference: it -> it->value;
next: it -> it->next;
prior: it -> it->prev;
And the sentinel for this range is just the null pointer. If we have a set of linked nodes [first, last] we can iterate forward and backward from every node in that range to the sentinel and to the preceding/succeeding nodes. At first glance this range looks like a BidirectionalRange.

But it actually does not work with the algorithms that require BidirectionalRanges. The problem lies in the fact that the current assumption about these ranges is that we can iterate backward from the end iterator to the last node. In our case, the end iterator value is also a null pointer, so we cannot reach the last node from it. Passing such a range to an algorithm like reverse will cause a segmentation fault: the iterator returned by next_to(first, null-sentinel) will return a null pointer. One solution would be to check the emptiness of the range first and then use a function like prior_to(first, null-sentinel) that returns an iterator which, when incremented, yields an iterator equal to the sentinel. However, that would break the generic reverse iterator from the STL, which has the following semantics:
dereference: it -> dereference(prior(it))
next: it -> prior(it)
prior: it -> next(it)
So rbegin points to the actual end and moves backward from it.

Is the requirement to be able to traverse from the end (determined from the sentinel) essential for the STL algorithms, or would it be possible to support such ranges? If we have a list without a specially allocated sentinel end node to enable that traversal, then it would be possible to have a noexcept move/default constructor: no allocation would be needed for an empty list.
This design runs afoul of the requirements for BidirectionalIterator. It goes like this: begin and end return iterators. If begin is not equal to end, it is incrementable and dereferenceable. After you increment a bidirectional iterator, it is decrementable, and the result of decrementing the iterator is dereferenceable. If those axioms don't hold, you don't have a bidirectional iterator. Sorry.
What you have instead is two forward sequences, one that is always the reverse of the other. That’s still an interesting data structure, and you can design its interface around that. For example:
forward() returns a Range that traverses the elements forward; backward() is a Range that goes the other way. HTH.

That's fine. But this leads me to another question. How do you determine the category of a delimited range? I mean, if you get a range where begin() is an Iterator of category C and there is a sentinel, do the algorithms and views then assume that the category of the range is C? Can the sentinel type affect the category of the range?
In the structure I have presented, both forward_iter and reverse_iter should support increment and decrement and be convertible to each other. And if I take a subrange of the list that does not include the end element, I will always get a bidirectional range. So is it possible with your current design to create ranges in such a way that:

forward_iterator, forward_sentinel models a ForwardRange, while forward_iterator, forward_iterator models a BidirectionalRange?
It need not. A delimited random access range still has random access iterators. It’s the user’s responsibility not to advance past the delimiter. Think: null-terminated string.
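(To make the null-terminated-string example concrete, here is a minimal sketch of a delimited range's pieces; null_sentinel is a hypothetical type, not something from range-v3.)

// The iterator is an ordinary char pointer (random access); the "end" is a
// sentinel type that compares equal to the pointer exactly when it reaches '\0'.
struct null_sentinel {};

inline bool operator==(char const * p, null_sentinel) { return *p == '\0'; }
inline bool operator==(null_sentinel, char const * p) { return *p == '\0'; }
inline bool operator!=(char const * p, null_sentinel s) { return !(p == s); }
inline bool operator!=(null_sentinel s, char const * p) { return !(s == p); }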
For the reasons I mentioned, they can't be. You could provide functions to_forward_iter and to_reverse_iter that take both an iterator and the bidi_list and do the conversion for you. That's the best you can do.

“For the reasons I mentioned, they can't be. You could provide functions to_forward_iter and to_reverse_iter that take both an iterator and the bidi_list and do the conversion for you. That's the best you can do.”
If I only pass iterators that point to actual nodes (not nulls), I can freely convert between them without passing the bidirectional list.

But from a broader perspective, it looks like I have found that the half-open range [begin, end) cannot represent every range that could be represented as a closed range [begin, end], and vice versa (the second one cannot represent an empty range). And the STL (even with your extension) supports only half-open ranges.
Yes, but then the function incompletely covers its domain.
If you’re saying that a half-open range of bidirectional iterators cannot represent this range, then I agree. Half-open forward iterators cover the range perfectly well.
Yes.
“If you’re saying that a half-open range of bidirectional iterators cannot represent this range, then I agree. Half-open forward iterators cover the range perfectly well.”
Yes, I felt a bit dumb after I finally realized that I had created a structure that can be represented as a bidirectional closed range but only as a forward half-open range, and I was still trying to fit it into the STL design.
Not sure why you’d feel dumb. This is a finer point of the STL’s design. Once you grok this, you have a deep understanding of how the iterator categories work. EDIT: In fact, this is worthy of a blog post. 🙂
I was aware from the beginning that the STL supports only half-open ranges, but the bidirectionality of this structure seemed so obvious that I kept trying to bend and tweak the rules to get it into an iterator representation. And after it got through to me that the bidirectionality only applies to the closed range, I felt like a child trying to put a cube into a triangular hole.
As we discussed above, in the structure I have presented the range [first, last) cannot be represented as a bidirectional range, while every subrange [x, y) of it actually can be.

That led me to the following question: does range-v3 allow you to provide your own overloads for the slice view, so that:

tk_bidi_list | view::slice(begin+k, end-n)

would return a bidirectional range, even if tk_bidi_list is a forward one? That is actually true for the described structure if k > 0 and n > 0.

Also, have you considered changing the requirements so that, for the begin+n, end-n arguments of the range, n would be required to be greater than 0 (instead of greater than or equal)? That would allow the implementation to detect at compile time whether end is included in the created subrange:

1. end-n – end never included
2. end – end included

Also, this would resolve the problem of the surprising difference in complexity between slice(0, end-0) and slice(0, end) for ranges with sentinels.
Using an int for an index always makes me sad. What if on a 64-bit machine we use an index greater than INT_MAX? It will just silently wrap around. And the fact that it will rarely happen only makes the error worse. If it is signed, it should definitely be ptrdiff_t or an equivalent typedef/using.
As an old APL aficionado I would like to see something like:
vector v = {0,1,2,3,4,5};
Note iota as in APL.
v[iota(3)] giving 0, 1, 2
Slicing would be done like this
v[iota(3) + 2] giving 2, 3, 4
Striding would be accomplished by
v[2 * iota(3) + 1] giving 1, 3, 5
Reverse iterating could be done by
v[4 - iota(5)] giving 4, 3, 2, 1, 0
Pretty easy to read and not a lot of different classes and arguments to cope with.
I could imagine a range library that works that way, but that’s not my range library. The APL syntax is pretty foreign in a C++ context.
Your statement about Python slices having reference semantics is incorrect. Most containers (including lists and bytearrays) copy on slice. The reason slice assignment works for them is because it is a special syntax that calls a different method.
If you want slices that are truly a view rather than a copy, you have to use memoryview (or a custom data structure).
Thanks!
New here.
ranges::view::slice() works as expected when supplied with compile-time constants.
Is there a way to slice when the parameters are int variables computed at runtime?
It should work just fine when the parameters are known only at runtime.
Strange. I’ll see what’s up with my compiler, then. Given the error message I was getting (something like no function found whenever I used variables as input params), I’d assumed the params needed to be constexpr.
Thanks, Eric. I think it is fantastic work. I’ll be sad if the committee doesn’t go with your proposal for C++17. 🙂
The error message I was getting was “error: no matching function for call to object of type ‘const ranges::v3::view::view'”, without more specifics.
Turns out the end parameter was vector::size(), which is size_t (aka unsigned long on my compiler), and the ranges::view::slice() overloads require signed parameters.
I know there have been discussions about this and your library, and the lack of consistency in the Standard Library, but I wanted to let others know, in case they run into similar issues. (By the way, I’m much happier with “Consistent Failure” than arbitrary behavior, so thank you for picking a position and sticking with it throughout the design of your library.)
As for me, I’ll cast the returned value to signed, along with an assert that the returned value’s unsigned size does not exceed 2^31-1.
The link to “Customization Point Design” is broken. Thank me later o7
I find it sad you're using "auto" everywhere. Right now I'm trying to figure out how to store a range and a slice inside a struct, i.e. I want the type. I have no idea where to find this information; in none of the examples is the type ever spelled out.
I know this is a very old article, but slices aren’t exactly new; array slices were available in Ada in the early 1980s, if not the late 1970s. However, in Ada, if you ask for a range from 2 to 5, you get the range from 2 to 5, not from 2 to 4.