A Slice of Python in C++

Posted on December 7, 2014 by Eric Niebler — 79 Comments ↓

This post describes a fun piece of hackery that went into my Range-v3 library recently: a Python-like range slicing facility with cute, short syntax. It’s nothing earth-shattering from a functionality point of view, but it’s a fun little case study in library design, and it nicely illustrates my philosophy of library design.

Python Slicing

In Python, you can slice a container — that is, create a view of a contiguous subrange — using a very concise syntax. For instance:

>>> letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> letters
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> # access a subrange with a slice operation
>>> letters[2:5]
['c', 'd', 'e']
>>> # replace some values
>>> letters[2:5] = ['C', 'D', 'E']
>>> letters
['a', 'b', 'C', 'D', 'E', 'f', 'g']

On line 5, we access elements of the list letters in the half-open sequence [2,5) using the syntax letters[2:5]. Short and sweet. On line 8, we assign through the slice, which mutates the underlying letters list. That proves that Python slices have reference semantics.

That’s not all that the Python slice operator can do. You can leave off slice offsets, in which case Python takes a smart default:

>>> # A missing first offset means "from the beginning"
>>> letters[:5]
['a','b','C', 'D', 'E']
>>> # A missing end offset means "to the end"
>>> letters[5:]
['f','g']

You can even slice from the end with negative offsets:

>>> # Take the last two elements:
>>> letters[-2:]

This is all pretty handy and really cool.

Old-Style Slicing in C++ with Range-v3

My range-v3 library has had a slice operation for a long time now, but it wasn’t as powerful and the syntax wasn’t as cool:

using namespace ranges;
auto letters = view::iota('a','g');
std::cout << letters << '\n';
// prints: {a,b,c,d,e,f,g}
std::cout << (letters | view::slice(2,5)) << '\n';
// prints: {c,d,e}

In the above code, view::iota is a view that generates all the characters from 'a' to 'g' (inclusive), and view::slice is a view of the elements from offset 2 through 5 (exclusive). As with Python’s slice, this slice is lightweight and non-owning.

This syntax is not terrible per se, but it’s certainly not as fun as Python’s. And view::slice didn’t accept negative offsets to slice from the end, so it wasn’t as powerful, either.

New-Style Slicing in C++ with Range-v3

First, I wanted to find a nice short-form for creating slices, so I took a page from the array_view proposal, which has a really, really clever syntax for indexing into a multi-dimensional array. Here’s an example lifted straight from the proposal:

char a[3][1][4] {{{'H', 'i'}}};
auto av = array_view<char, 3>{a};
// the following assertions hold:
assert((av.bounds() == bounds<3>{3, 1, 4}));
assert((av[{0, 0, 0}] == 'H'));

Lines 1-2 declare a 3-D array of characters and then creates a 3-D view of it. Line 5 is where the magic happens. It accesses the element at the (0,0,0) position with the slightly alien-looking av[{0,0,0}] syntax. What the heck is this?!

It’s really very simple: a novel use of uniform initialization syntax. Consider this type:

struct indices
{
    std::size_t i, j, k;
};
struct my_array_view
{
    double & operator[](indices x);
};

Now I can index into a my_array_view object with the av[{0,0,0}] syntax. Neat-o!

I realized I could use this trick to give people a super-short and cute syntax for slicing ranges. Check it out:

using namespace ranges;
auto letters = view::iota('a','g');
std::cout << letters << '\n';
// prints: {a,b,c,d,e,f,g}
std::cout << letters[{2,5}] << '\n';
// prints: {c,d,e}

Hey, that’s not half bad!

Slicing From the End, A Dilemma

But that’s not enough. I want the handy slice-from-the-end functionality. But here’s where things get a bit … interesting … from a library design perspective. Not all range types support slicing from the end. To see what I mean, consider a range of ints read from an istream. This is an input range. You don’t know the end until you reach it, which means that you don’t know the last-minus-N element until you’re N elements past it!

In other words, the following code makes no sense:

using namespace ranges;
// An input range of ints read from cin
auto ints = istream<int>(std::cin);
// I'm sorry, I can't do that, Dave:
std::cout << ints[{0,-2}] << '\n';

The istream range returned by istream totally knows at compile time that it can’t be sliced from the end. But whether the offsets are negative or positive is a runtime property, so it can’t be checked at compile time. That would make this a runtime failure. Ugh.

To make matters worse, the rules about what categories of ranges accept negative offsets are surprisingly subtle. Consider this variation of the code above:

using namespace ranges;
// Take the first 10 ints read from cin:
auto ints = istream<int>(std::cin) | view::take(10);
// This should work! It should take the first 8 ints:
std::cout << ints[{0,-2}] << '\n';

In this case, we’ve taken the first 10 integers from an istream. The ints range is still an input range, but it’s a sized input range. Now we can slice from the end because we know where the end is.

And if we have a forward range, we can always slice from the end, even if we don’t know where that is (e.g. a null-terminated string), by computing the length of the sequence and then advancing distance-minus-N from the front (although that’s not always the most efficient way to do it).

And you should never specify a negative offset if the range is infinite. Never, ever, ever.

It gets even more subtle still: if both offsets are negative, or if both offsets are non-negative, then the resulting slice knows its size in O(1); otherwise, it only knows its size if the underlying range knows its size. When the O(1)-sized-ness of a range is part of the type system, it enables all sorts of optimizations. If we don’t know the sign of the offsets until runtime, we can’t ever return a type that advertises itself as sized.

My point is that the rules for when it’s OK to slice from the end are subtle — far too subtle to leave the error reporting until runtime. And doing so leaves valuable optimizations on the floor.

Slicing From the End, A Solution

The solution I came up with was to disallow negative offsets with an unconditional assert. But wait before you flame me! I added an alternate syntax for denoting an offset from the end. Check it out:

using namespace ranges;
auto letters = view::iota('a','g');
std::cout << letters << '\n';
// prints: {a,b,c,d,e,f,g}
std::cout << letters[{2,end-2}] << '\n';
// prints: {c,d,e}

Instead of using a negative offset, we say end-2 to mean the 2nd from the end. What is end here? It’s the same end function that you call to get the end of an Iterable (think std::end), only in my library it’s not a function; it’s a function object. (For more about why I chose to make begin and end global function objects instead of free functions, see my blog post about Customization Point Design.) Since end is an object, I can define an overloaded operator- that takes end on the left-hand-side and an int on the right. That can return an object of some type that makes the from-the-end-ness of the offset a part of the type system.

struct from_end { int i; };

from_end operator-( decltype(ranges::end), int i )
{
    assert(i >= 0); // No funny business, please
    return {i};
}

Now I can define an overloaded operator[] on my range type that accepts a std::pair<int,from_end>:

struct my_range
{
    // callable as rng[{2,end-2}]
    slice_view<my_range>
    operator[](std::pair<int, from_end> p)
    {
        // ... slicing happens here
    }
};

Voilà! Now I get slicing from the end with a short, readable syntax and compile-time type checking without leaving any optimization opportunities on the floor.

Yes, But…

That’s great and all, but code like “rng[{2,-2}]” still compiles and fails at runtime. How is the situation any better? The difference now is that passing a negative offset to slice is always a runtime error. There is no situation in which it will succeed and do what you want, even if the range type could conceivably support it. Users will quickly learn that that isn’t the way to do it.

Had we allowed negative offsets in a way that sometimes worked and sometimes didn’t, it makes the interface far more dangerous. Users will try it, meet with some success, and conclude incorrectly that it will always work. They’ll discover their error the hard way after their application has been deployed.

Which brings me to my Philosophy of Library Design:

I can’t keep people from writing bad code. But I’m guilty of collusion if I make it easy.

And a corollary that relates to this problem:

If you can’t make something succeed consistently, it’s better to make it fail consistently.

Hope you enjoyed this little case study in library design.

Acknowledgements

I’d like to thank Chandler Carruth for drawing my attention to the pithy coolness of Python’s slice operator.

Footnote:

In the C++ containers, the indexing operation is only allowed for random-access containers, where the element can be accessed in O(1). Here, I’m allowing users to slice ranges with an indexing-like notation, even though it could be an O(N) operation. I’m currently undecided if slicing is sufficiently different from indexing to justify this decision. Thoughts welcome.

"\e"

"\e"

79 Replies to “A Slice of Python in C++”

Pingback: 1p – A Slice of Python in C++ – Offeryour.com Blog
krzaq on December 7, 2014 at 11:37 pm said:

Would it not be possible to constexpr-overload the indice constructor to catch negative values that can be caught at the compile time?

Reply ↓
- Eric Niebler on December 8, 2014 at 9:13 am said:
  
  Unfortunately, constexpr doesn’t work that way. The arguments to a constexpr function are never constexpr themselves within the function.
  
  Reply ↓
peroket on December 8, 2014 at 12:17 am said:

Why not just take an unsigned int to avoid making an assertion at runtime?

Reply ↓
- Eric Niebler on December 8, 2014 at 9:33 am said:
  
  See my reply to Joe just below.
  
  Reply ↓
Joe on December 8, 2014 at 1:53 am said:

Why not use unsigned integer types? Then no assert would be needed in [] or operator-

Reply ↓
- Eric Niebler on December 8, 2014 at 9:19 am said:
  
  Because signed negative numbers wrap to really huge positive unsigned numbers. If someone tries to pass -1 to a function that takes a std::size_t, then that function might not be able to easily detect the mistake. The compiler may or may not warn, the user may or may not notice the warning in a mountain of other warning, and people are so used to seeing warnings about signed/unsigned integer mismatch that they just add a cast and don’t think.
  
  Far better IMO to provide an interface where all integers are signed. And an assert is much harder to ignore than a compiler warning.
  
  Reply ↓
  - Joe on December 8, 2014 at 11:53 am said:
    
    I thought that might be the reason. I always use warnings as errors and a fairly high warning level in new code so signed/unsigned errors have become pretty rare.
    
    Reply ↓
  - Slava on December 9, 2014 at 5:41 am said:
    
    Using size_t for slicing is kind of consistent with indexing of standard containers. Most probably wrapping to huge unsigned will result in segfault anyway when trying to access that location which is roughly the same effect as assert. And you can only access half the range of size_t with signed types. And it is very easy to ignore assert if you never build in debug mode (for example if application becomes just too slow there to actually debug anything). And with this choice you take burden from people who ignore warnings and use ints as indexes, to those who behave and use size_t container indexes and are aware of how signed/unsigned wrapping works.
    Otherwise I really love what you are doing in Range-v3. Hope to see it standardized soon (hopefully before/instead of string/array_view) .
    
    Reply ↓
    - Eric Niebler on December 9, 2014 at 9:27 am said:
      
      Using size_t for slicing is kind of consistent with indexing of standard containers.
      
      The standard library’s use of unsigned integers for indexing is somewhat inconsistent with its use of signed integers for iterator distance. You can even “index” a random access iterator with it[n], and n is required to be signed. So I’m consistent with the signed indexing of standard random-access iterators, but not with the containers. <shurg>
      
      Most probably wrapping to huge unsigned will result in segfault anyway when trying to access that location which is roughly the same effect as assert.
      
      You can’t be serious. You’d rather allow buffer overruns and hope for the best than catch the bug with an assert?
      
      And it is very easy to ignore assert if you never build in debug mode
      
      IMO, if you don’t run in debug mode ever, you get what you deserve.
      
      And with this choice you take burden from people who ignore warnings and use ints as indexes, to those who behave and use size_t container indexes and are aware of how signed/unsigned wrapping works.
      
      Fair point, but the burden of signed/unsigned conflict is not just carried by those who care enough to get it right. It’s carried by everybody, including the teaming hoards of people who ignore the warnings and add casts without testing for potential over- or underflow. That’s the vast majority of programmers, IMO.
      
      Reply ↓
      - Slava on December 9, 2014 at 10:23 am said:
        
        The standard library’s use of unsigned integers for indexing is somewhat inconsistent with its use of signed integers for iterator distance
        
        But that is different! There an `index’ can in fact be negative. It is rather an index difference, so that deferefencing by itself is a bit evil.
        
        You can’t be serious. You’d rather allow buffer overruns and hope for the best than catch the bug with an assert?
        
        Now that you put it in a bit different words that sounds wild indeed 🙂 – In general I would not, but it is all about the tradeoffs. Actually, if you know the end position (and you do seem to need knowing it for this to work), you can still harness the element access with assert, also with unsigned indexes. And that will catch even wider range of bugs.
        
        IMO, if you don’t run in debug mode ever, you get what you deserve.
        
        Talking about the teaming hoards I happen to stuck in one which does not do debug builds/runs. Although I admit it is not so common as ignoring warnings and skipping overrun checks.
      - Eric Niebler on December 10, 2014 at 10:01 am said:
        
        The standard library’s use of unsigned integers for indexing is somewhat inconsistent with its use of signed integers for iterator distance
        
        But that is different! There an `index’ can in fact be negative. It is rather an index difference, so that deferefencing by itself is a bit evil.
        
        It’s certainly an advantage that an unsigned integer can represent bigger numbers when the usage requires that the number be positive. But that has to be weighed against the downsides. Will this index ever be the result of arithmetic, potentially on signed numbers? How important is it for the called function to be able to check its preconditions; e.g. that the index isn’t negative? Does the interface have signed integers in other places that will force users to cast to unsigned?
        
        Here’s a for instance: Have you ever had an iterator that you needed to convert to an offset; e.g. to return to a user so they can index into some array? You call distance with two iterators and get back a signed int. Now you cast it to unsigned. If you accidentally called distance(end,begin) instead of distance(begin,end) you got a negative number which you just cast to a huge positive number. Now you buffer overrun. Oops. See?
      - Todd Greer on December 12, 2014 at 8:18 am said:
        
        Would it work equally well to take a size_t for slicing, and assert(index < size_t{-1})? It’s the same actual range either way, but in the off chance someone has a really huge range, it would assert in debug builds (and work fine in release builds), rather than invoke undefined signed overflow. The point, though, is that it would clearly document the non-negativity requirement.
        
        (In exchange, it makes the upper bound less obvious, but that’s much less likely an actual issue.)
      - Eric Niebler on December 12, 2014 at 9:35 am said:
        
        That assert would trigger only when index was exactly equal to size_t{-1}. That doesn’t seem very useful. I must be misunderstanding your suggestion.
      - Tony Van Eerd on February 23, 2015 at 4:53 pm said:
        
        For all those suggesting unsigned instead of signed, please watch http://channel9.msdn.com/Events/GoingNative/2013/Interactive-Panel-Ask-Us-Anything with Bjarne Stroustrup, Herb Sutter, Andrei Alexandrescu, Scott Meyers, Sean Parent, Chandler Carruth, Michael Wong, and Stephan T. Lavavej.
        Signed vs unsigned comes up 3 times (12min, 42min, 63min) and 3 times they tell you to always use signed unless you are doing bit manipulation or really really want modulo arithmetic. The fact that STL uses size_t for indexing is basically a “mistake” (and Chandler says “sorry” at the 63min mark) 🙂
Gregory on December 8, 2014 at 2:10 am said:

The trick with end function object is neat. However, I have some reservations about overloading subscript operator for slicing ranges.

First of all it cannot be done for containers, as they use subscript operator for different purpose. Even if we decide to implement slicing for containers, for some containers the proposed syntax already has meaning (flat_map<tuple<int,int>,int>, for example), while they may be effectively sliced and may benefit from lightweight syntax for slicing.

So, there’s teaching issue. It should be explained to user that it can use [] for slicing some iterables, but not all of them, and if user wants to use such syntax for container he has to create range from it first.

Finally, operator [] cannot be overloaded non-intrusively. Inside library it is possible to add subsript operator to every range, but what about user-defined ranges? They should be as good as the ones provided with the library. Should user derive (sic!) his range from some base class to add support for slicing?

Reply ↓
- Eric Niebler on December 8, 2014 at 9:26 am said:
  
  Even if we decide to implement slicing for containers, for some containers the proposed syntax already has meaning (flat_map<tuple<int,int>,int>, for example)
  
  Ouch. Actually, you can’t use {0,0} to construct a tuple because its constructor is explicit, but I get your point.
  
  I’m not sure it’s such a huge deal. I’m not saying that the cute slicing operation is a required part of the Iterable or Range concepts, it’s just a convenience that some types provide.
  
  Finally, operator [] cannot be overloaded non-intrusively. Inside library it is possible to add subsript operator to every range, but what about user-defined ranges? They should be as good as the ones provided with the library. Should user derive (sic!) his range from some base class to add support for slicing?
  
  There are lots of convenience accessors that a range type may want to provide in addition to begin and end. Straight indexing for random-access ranges, front, back, etc., etc. All the range types in my library provide them if they can by inheriting from a CRTP base called range_iterface. An end user can derive from this type, too — or they could provide slicing themselves or not provide it if they don’t want to.
  
  Reply ↓
  - Gregory on December 8, 2014 at 12:21 pm said:
    
    I wish the commitee would adopt the unified call syntax proposal by Herb Sutter. Then the most of convenience methods could be just free template functions (free but constrained :)).
    
    Reply ↓
Mikołaj Małecki on December 8, 2014 at 2:55 am said:

Well, you have started to have a “slice of Python” in C++ and you have ended up with having a “slice of Tcl”. Reasonable people always end up with reasonable solutions :D.

(see “man lrange” in POSIX system, as long as you installed tcl-doc package).

I’m curious whether you have been inspired by that, or just found out that independently. 😀

Reply ↓
- Eric Niebler on December 8, 2014 at 9:28 am said:
  
  I have never been exposed to Tcl. But from this I see that they also went for the end-2 syntax. Interesting! 🙂
  
  Reply ↓
  - Mikołaj Małecki on December 10, 2014 at 2:17 am said:
    
    I think C++ can take some useful lessons from Tcl, especially when it comes to modules.
    
    But even though this “slicing” syntax lrange command uses can be useful in C++, I’m generally not a fan of how ranges are implemented in Tcl – because they are not “gap-oriented” as in STL-based C++ standard library, which makes it not possible to define a universal definition of a range (especially an empty range – in Tcl actually the only way to get an “empty range” is to use “end+1 end+1” or “-1 -1”, which are simply invalid ranges and lrange or string range commands result in returning an empty list in these cases). I mean, the case when you have to return from a function a reference to a container and a pair of iterators that comprise the range (which may happen to be empty in some special cases).
    
    Reply ↓
cpplearner on December 8, 2014 at 3:25 am said:

Why use the type int instead of some unsigned integral type, such as unsigned int or std::size_t?

Reply ↓
- Eric Niebler on December 8, 2014 at 9:28 am said:
  
  See my response to Joe above. It’s so that I can catch more errors.
  
  Reply ↓
German Diago on December 8, 2014 at 5:08 am said:

Principle of least surprise: don’t let anyone slice with O(n), it is confusing. My suggestion is to have a common named as in std::advance and just give the nice syntax to what it is cheap.

Reply ↓
- Eric Niebler on December 8, 2014 at 9:32 am said:
  
  <sigh> You’re probably right. I may have gotten carried away. FWIW, creating the slice is O(1). Moving to the first element might be O(N). That doesn’t fundamentally change the equasion, though.
  
  Reply ↓
  - Kelly on December 22, 2018 at 9:47 pm said:
    
    I’m very sorry I couldn’t help but
    
    I can’t keep people from writing slow code. But I’m guilty of collusion if I make it easy.
    
    Reply ↓
Gerald Squelart on December 8, 2014 at 1:09 pm said:

Python slices also allow an optional 3rd argument: ‘step’.
E.g.: [1,2,3,4,5,6][::2] -> [1,3,5].

Please consider providing it.

And as the above example shows, Python slices can omit arguments, falling back on defaults [0,size,1]. Maybe you could use a similar trick as your ‘end’, to allow things like: a[{start,end,2}]. More verbose than “[::2]”, but we probably can’t do better.(?)

Reply ↓
- Eric Niebler on December 8, 2014 at 1:59 pm said:
  
  I’ve considered it. Python lets you stride with a negative offset, which in effect reverses the traversal. The problem for striding is the same as for slicing: not all range types support it. There would need to be another clever syntax, and then the number of overloads of operator [] explodes. I’ve left it out for now, but may as it later.
  
  By the way, you don’t need to use begin for the start offset of a slice. 0 always means the same thing, and it’s shorter.
  
  Reply ↓
  - Gerald Squelart on December 8, 2014 at 2:49 pm said:
    
    Thanks for thinking about it, and even thinking about negative offsets! I hope you’ll get to that one day…
    
    Yep, 0 instead of ‘start’ or ‘begin’ would work, but [{0, end}] just looks inconsistent to me! 😛
    
    Reply ↓
Malte Skarupke on December 8, 2014 at 7:48 pm said:

I like having slices and I think they should be very easy to do, (they deserve special treatment) but I think using operator[] for slicing is more confusing than just providing a “slice” function.

letters[{2, 5}];
looks more confusing to me than
letters.slice(2, 5);

If something can have a short name, I almost always prefer naming it over using an operator overload.

I’ve also found that when people use operator overloading, they try to cram everything into additional overloads. Where if you start using names, it’s all of a sudden easy to just create additional named functions. For example if you’re already using names, it doesn’t seem too bad to create

letters.sliceFrom(4) // would be letters[4:] in python
letters.sliceFrom(letters.end() – 4) //would be letters[-4:] in python
letters.sliceTo(4) // would be letters[:4] in python
letters.sliceTo(letters.end() – 4) // would be letters[:-4] in python

Where with operator overloads the first design would be to somehow get all of this functionality into overloads of one function.

So for example creating a slice with a stride can also get its own function:

letters.stride(2) // would be letters[::2]
letters.strideBackward(2) // would be letters[::-2]
letters.slice(2, letters.end() – 2).stride(4) // would be letters[2:-2:4]

Yes its more verbose, but not very. If it was just
letters.slice(2, -2).stride(4)
then that is actually more readable than the Python version.

So if slices should get special treatment, I would add several named member functions instead of overloading operator[].

Reply ↓
- Slava on December 9, 2014 at 5:50 am said:
  
  Looks like you are a Java guy trapped in C++ programmer body 😉
  
  Reply ↓
  - Eric Niebler on December 9, 2014 at 9:18 am said:
    
    Take that back.
    
    Reply ↓
- Jay Miller on December 10, 2014 at 6:55 am said:
  
  I agree with this. I really like the ‘end’ solution, and this use of operator[] is clever, but I think it is much less clear. Overloading operators is nice when the use is intuitive. In this case I think the word ‘slice’ is more clear.
  
  Reply ↓
  - Eric Niebler on December 10, 2014 at 10:14 am said:
    
    OK, that seems to be a popular sentiment. Perhaps I was being too clever. But I hesitate to make slice a member. Why slice and not the others? Where does it stop?
    
    Reply ↓
    - Malte Skarupke on December 10, 2014 at 4:39 pm said:
      
      So one really big benefit of operator overloading is that you can do assignment like this:
      
      letters[2:5] = [‘C’, ‘D’, ‘E’]
      
      Which is from your Python example. I think the fact that you didn’t bring this example up in your C++ code was what made me think that I would prefer a normal member function. And then once it’s a normal member function it might as well be a nonmember function.
      
      If you do not support assignment like in the above example, I’d say just make it a nonmember function. If you do support assignment, I would actually almost prefer it to be an operator overload again.
      
      That being said assignment using slice syntax is not used that often in Python. I have seen code that genuinely had to do this:
      
      range[1::2] = -range[1::2]
      
      but this is the exception rather than the rule. In fact it happens so rarely that I would bet that most Python programmers have to think really hard to tell you what the above statement does. (If you want to run this: this was using a numpy array which has operator- overloaded) Here is a C++ raw loop that does the same thing:
      
      for (size_t i = 1; i < range.size(); i += 2)
      range[i] = -range[i];
      
      Reply ↓
      - Eric Niebler on December 11, 2014 at 9:00 am said:
        
        So one really big benefit of operator overloading is that you can do assignment like this:
        
        letters[2:5] = ['C', 'D', 'E']
        
        Python lets you do that, but not my range library. The semantics get weird. For all kinds of reasons, I want ranges to have reference semantics. Allowing assignment through a range would be like saying that pointer assignment should change the value of the thing pointed to. Obviously, that’s not true.
        
        If you do not support assignment like in the above example, I’d say just make it a nonmember function.
        
        <nod>
      - Michael Caisse on December 18, 2014 at 12:57 pm said:
        
        I don’t see a ‘reply’ as the threads get longer. This response actually goes with Eric’s reply:
        
        I might need to think a bit more about your reference semantics comment, but the first two thoughts that come to mind are:
        
        I would expect a view to be a mutable reference to the objects that it is ‘viewing’. Just like std::tie.
        It seems to me that assignment of a range would be like assignment of ptr (as you describe):
        
        ptr1 = ptr2;
        
        Doesn’t affect the things pointed at by ptr1.
        
        … and while letters[2:5] might return a range view object… it looks a lot more like:
        
        ptr1[2] = 42;
        
        I think I would be a little surprised if:
        
        range[2] = 42;
        
        behaved differently.
        
        I’m sure my mental view will be adjusted (o;
      - Eric Niebler on December 18, 2014 at 1:13 pm said:
        
        You have it right. A range has reference semantics like a pointer, and if the range’s reference type permits mutation, then you can mutate through the range by assigning to an element of the range. The question that was being discussed is whether assigning to the range itself should mutate the elements to which the range refers. The answer is no, it does not. So for example:
        
        std::vector<int> v = view::ints(0,9); // v == {0,1,2,...9} auto rng = view::all(v) | view::slice(1,8); // rng == {1,2,...7} rng[0] = 42; // v == {0,42,2,...9} rng = view::all(v) | view::slice(2,9); // v == {0,42,2,...9} // rng == {2,3,..8}
        
        I think I got that right. 🙂 In particular, note the assignment to rng on line 7. It does not mutate the vector v. Hope that helps.
Pingback: Непросмотренные ссылки – 6 | Откомпилируй Это
tkamin on December 9, 2014 at 5:53 am said:

I am curious what problems of the views::slice you are trying to resolve with subscrib syntax. The main disadvantage of this syntax is that it is ambiguus in case of several containers:
1. Already mentonied map<std::pair<int, int>, int> m; The syntax m[{1,2}] is aready well definied.
2. Even your article mentoins that the trick you are using is also used for idexing of array_view. And as you know the array_view is also a range, so
array_view<int, 2> ar; ar[{1,2}] – would be aslo amibugus. It now gives elements at index [1][2] but will you meaning could return a range of rows.

Altought I really like the end-n apporach for indexing a range for the end, and I think it should be extended to slice, so we can for example wirite slice(end-5,end-2). I would also extend it for consitency with overload of begin, so it would be possible to write slice(begin, end-2). In addition this will give an nice way to create a strided version of slice: slice(begin, end, 2) would skip one elements.

Reply ↓
- Eric Niebler on December 9, 2014 at 9:32 am said:
  
  Ugh. I’m beginning to agree with you. Crud.
  
  I really like the end-n apporach for indexing a range for the end, and I think it should be extended to slice, so we can for example wirite slice(end-5,end-2)
  
  Yep, you already can.
  
  I would also extend it for consitency with overload of begin, so it would be possible to write slice(begin, end-2).
  
  You’re the second person that’s asked for that. I initially had it but took it out because it seemed superfluous. Maybe I should put it back.
  
  In addition this will give an nice way to create a strided version of slice: slice(begin, end, 2) would skip one elements
  
  Yeah, but see my response to Gerald here.
  
  Reply ↓
  - tkamin on December 10, 2014 at 12:54 am said:
    
    For adding stride for the slice: The slice with a stride and slice without a slice could produce a different objects (overload on 2 or 3 arguments), so then it would be possible to generate a compiler error if the views::slice(begin, end, 2) is applied on range that does not support strides.
    
    Reply ↓
    - Eric Niebler on December 10, 2014 at 10:13 am said:
      
      That much is true, and it’s not the problem. The problem is that not all range types support negative strides, because not all ranges can be iterated in reverse. So we need a way to distinguish between view::slice(from, to, 2) and view::slice(from, to, -2). We could use end-2 for the step, but it doesn’t read as well. And if we permit 3 different types in the first position, 3 in the second, and an optional third with 2 different types, the number of overloads explodes.
      
      Reply ↓
      - tkamin on December 10, 2014 at 11:02 am said:
        
        I see. I think that the problems lies in the fact that I want to cluther to much functionality in the single view. With great compasability of the views, I can just use more than one view to achieve desired functionality:
        slice(from, to, 2) -> slice(from, to) | stride(2)
        slice(from, to, -2) -> slice(from, to) | reverse | stride(2)
        And each of this components has clear requirements on the input range. Just another example how important is to stick to SRP.
      - Eric Niebler on December 10, 2014 at 11:05 am said:
        
        Right, and that’s how it works today. There are separate stride and reverse views that are single-responsibility types. It might be nice for stride to conditionally support reverse iteration, but it’s not critical.
      - tkamin on December 10, 2014 at 11:11 am said:
        
        Are you planing to include special overload for the plain end? If I write range | slice(2, end), there is no need to iterate over whole range and value of actual end iterator, even if range is using sentinel. Such slice would be also compatible with input ranges. Altought I think we can just stick to range | slice(2) for such situation. Otherwise there will difference in behaviour of slice(2, end) and slice(2, end-0). Other option would be to disallow using 0 in end-n syntax in addition to negative numbers.
      - Eric Niebler on December 10, 2014 at 11:25 am said:
        
        There already is such an overload. view::slice(2,end) is equivalent to view::drop(2). BTW, all the view operations are lazy, slice and drop included. Creating the views is O(1) always. For non-random-access ranges, you only pay when you move to the front of the range.
      - tkamin on December 10, 2014 at 11:17 am said:
        
        Wouldn’t allowing stride with negative numbers strigthen the requirments on this view to accept BidirectionalRanges only? With the positive only indexes it should accepts InputRanges. How would you resolve such problem? By throwing exception if non revesable range was passed to stride with negative number?
        
        I like the reverse | stride combination because all concept checking can be done at compile time.
      - Eric Niebler on December 10, 2014 at 11:26 am said:
        
        That’s why we would need a special syntax for specifying negative strides, like end-2. You just read this article, right? 😉
    - tkamin on December 10, 2014 at 11:43 am said:
      
      I have red the article, but writting stride(end-2) does not actualy make sense (in contrast to using end as a position for slice). Alternatively something like stride(reverse(2)) or stride(reverse, 2) /* would require stride(forward, 2) */ does not provide any benefit in comparision to reverse | stride(2), so for me only option that actualy make sense is to use stride(-2).
      
      Reply ↓
    - tkamin on December 10, 2014 at 12:26 pm said:
      
      “There already is such an overload. view::slice(2,end) is equivalent to view::drop(2). BTW, all the view operations are lazy, slice and drop included. Creating the views is O(1) always. For non-random-access ranges, you only pay when you move to the front of the range.”
      
      I would like you to clarify things a bit for me. If I have a biderection non-sized finite range with a sentinel used as end (linked list, with only head stored).
      
      If I write following code:
      range | view::slice(n, end)
      The complexity of the creating a new range from the view application would be O(n) we just need to move front iterator.
      
      If I write following one instead:
      range | view::slice(n, end + k)
      The complexity of creating range would be O(size(range) + k) plus the value determined before O(n), because for the slice I must actually iterate front until the iterator is equal to the sentinel value, and then move it backwards K times.
      
      That would mean that the cost of using view::sliced(2, end) and view::sliced(2, end + 0) is different. The fact that ranges are lazy created does not change this fact, becaue finally the view will be created.
      
      Reply ↓
      - Eric Niebler on December 10, 2014 at 2:40 pm said:
        
        That would mean that the cost of using view::sliced(2, end) and view::sliced(2, end + 0) is different.
        
        It’s true. Fortunately, I don’t see many people saying end-0, but still it might be nice to point this out in the docs.
        
        One other thing. Creating the view is always O(1). You don’t pay the O(N) until you call begin or end. There’s no need to compute the end if you do this, for instance: front(view::slice(2, end-2)). Views are very lazy.
- tkamin on December 11, 2014 at 10:07 am said:
  
  Interesting thing is that forcing the begin+n, end-k syntax would allow you to avoid most of ambiguity problemes: the possibility that container will have an subsrib operator that accepts type of std::begin or your custom from_begin is minimal.
  
  Altought I am still not convinced to use operator[], the other argument aside of possible ambiguity is that the declartion of operator[] is intrusive. That means that pipe and named function syntax can be applied to any object that models appropariate container while the subsribt will work only for claesses were it was already implemented or you are able to modify it source.
  
  Reply ↓
German Diago on December 10, 2014 at 6:59 pm said:

You’re probably right. I may have gotten carried away. FWIW, creating the slice is O(1). Moving to the first element might be O(N). That doesn’t fundamentally change the equasion, though.

If we have a slice, we are expected to be able to index it. So if indexing takes O(n), we have a problem. It would look to me as if we had += instead of std::advance -> it does not warn me about the cost of the operation, even if the slice is O(1). Maybe you can return a slice with [{a,b}] but the kind of slice you return for lists/maps cannot be indexed, but the ones for arrays can. That would be a solution.

Reply ↓
- Eric Niebler on December 11, 2014 at 8:57 am said:
  
  If we have a slice, we are expected to be able to index it.
  
  This doesn’t follow. Do you mean, “if we have slice with operator[] we are expected to be able to index it”? That is something I might be able to agree with. I think I’m reaching a point where operator[] to slice might have been a mistake unconditionally, not just for non-random access ranges. I should probably just take it out.
  
  Reply ↓
  - German Diago on December 30, 2014 at 11:18 am said:
    
    Sorry, I didn’t say correctly there. What I mean actually is: an operator[] for a slice is expected to be random-access. Otherwise, return a slice that does not support that. Because operator[] is expected to be cheap, I would restrict it to random-access.
    
    Reply ↓
tkamin on December 10, 2014 at 11:55 pm said:

This question spans over the context of several post, so I will place it them. Lets consider the following range construct that uses sentinal as an end: double linked list, that marks its both ends via making the value of next/prev as null. The iterator for this conteiner is just raw pointer to the node, and has following operations:
dereference: it -> it->value;
next: it -> it->next;
prior: it -> it->prev;
And sentinel for this range is just the null pointer. If we have set of linked nodes [first, last] we can forwrad and backward iterate from every node in that range to the sentinel and preceding/asceding nodes. From the first glance this range looks like BidirectionalRange.

But, it actually does not work with the algorithms that requires BidirectionalRanges. The problems lays in the fact that current assumption about the types ranges is that we can backward iterate from the end iterator to the last node. In our case, the end iterator value is also null pointer, so we cannot reach the last node from it. Passing such range to algorithms like reserve will cause segmentation fault: the iterator returned by next_to(first, null-sentinel) will return null pointer. The one solution would be to chech the emptiness of the ragne first and then use function like prior_to(first, null-sentinel) that return value of iterator that when is incremented will return iterator equal to setinel. However that would broke generic reverse iterator from stl that has following semantics:
dereference: it -> dereference(prior(it))
next: it -> prior(it)
prior: it -> next(it)
So the rbegin is pointing to actual end and moving backward from it.

Do the requirments to be able to traverse from end (determined from sentinel) are essential for STL algorithm or it would be possible to support such langes? If we have a list without specialy alocated sentinel end node to provide traverse, then it would be possible to have noexcept move/default constructor – no allocation would be needed for empty list.

Reply ↓
- Eric Niebler on December 11, 2014 at 9:08 am said:
  This design runs foul of the requirements for BidirectionalIterator. It goes like this: begin and end return iterators. If begin is not equal to end, it is incrementable and dereferencable. After you increment a bidirectional iterator, it is decrementable, and the result of decrementing the iterator is dereferencable. If those axioms don’t hold, you don’t have a bidirectional iterator. Sorry.
  
  What you have instead is two forward sequences, one that is always the reverse of the other. That’s still an interesting data structure, and you can design its interface around that. For example:
```
struct bidi_list
{
    range<forward_iter, forward_sent> forward();
    range<reverse_iter, reverse_sent> backward();
};
```
  forward() returns a Range that traverses the elements forward; backward() is a Range that goes the other way. HTH.
  Reply ↓
  - tkamin on December 11, 2014 at 9:39 am said:
    
    That fine. But this lead me to another equestion. How do you determine the category of delimter range. I mean if you get a range wher begin() is Iterator of category C and has a sentinel, then for the algorithms and views, you assume that the category of the range is C? Can sentinel type impact the category of the range.
    
    In the structure I have presented, both forward_iter and reverse_iter should support increment and decrement and convertible to each other. And if I take a subrange of the list that does not include the end element I will always get a bidirectional range. So it is with your current design to create ranges in the way that:
    forward_iterator, forward_sentinel will model a ForwardRange, while forward_iterator and forward_iterator will mode BidirectionalRange.
    
    Reply ↓
    - Eric Niebler on December 11, 2014 at 9:45 am said:
      
      Can sentinel type impact the category of the range.
      
      It need not. A delimited random access range still has random access iterators. It’s the user’s responsibility not to advance past the delimiter. Think: null-terminated string.
      
      In the structure I have presented, both forward_iter and reverse_iter should support increment and decrement and convertible to each other.
      
      For the reasons I mentioned, they can’t be. You could provide functions to_forward_iter and to_reverse_iter that takes both an iterator and the bidi_list and does the conversion for you. That’s the best you can do.
      
      Reply ↓
      - tkamin on December 11, 2014 at 10:16 am said:
        
        “For the reasons I mentioned, they can’t be. You could provide functions to_forward_iter and to_reverse_iter that takes both an iterator and the bidi_list and does the conversion for you. That’s the best you can do.”
        
        If only pass the iterator that points to actual node (not nulls) I can freely convert between without passing the bidrectional model.
        
        But from broader prespective, if looks like I have found that the half-open range [begin, end) cannot reprent every range that could be represented as closed range [begin, end], and vice versa (the second one cannot represent empty range). And STL (even with your extension) support only half open ranges.
      - Eric Niebler on December 11, 2014 at 10:34 am said:
        
        If only pass the iterator that points to actual node (not nulls) I can freely convert between without passing the bidrectional model.
        
        Yes, but then the function incompletely covers its domain.
        
        if looks like I have found that the half-open range [begin, end) cannot reprent every range that could be represented as closed range [begin, end],
        
        If you’re saying that a half-open range of bidirectional iterators cannot represent this range, then I agree. Half-open forward iterators cover the range perfectly well.
        
        And STL (even with your extension) support only half open ranges.
        
        Yes.
    - tkamin on December 11, 2014 at 10:40 am said:
      
      “If you’re saying that a half-open range of bidirectional iterators cannot represent this range, then I agree. Half-open forward iterators cover the range perfectly well.”
      
      Yes, I felt a bit dumb, after I finally found that I created a structure that can be represented as bidirectional closed range but only forward half open range and still was trying to fit into STL design.
      
      Reply ↓
      - Eric Niebler on December 11, 2014 at 10:57 am said:
        
        Not sure why you’d feel dumb. This is a finer point of the STL’s design. Once you grok this, you have a deep understanding of how the iterator categories work. EDIT: In fact, this is worthy of a blog post. 🙂
      - tkamin on December 11, 2014 at 11:20 am said:
        
        I was from begining aware of the fact that STL supports only half-open ranges, but the bidirectionalism of this structure seem to obvious, and I was trying to bend and tweak the rules to get it in iterator representation. And after it got to me that it only applies to the closed range, I felt like a child trying to put cube into triangle hole.
    - tkamin on January 30, 2015 at 1:28 pm said:
      
      As we discussed above in the structure that I have presented the range [first, last) cannot be represented as bidirectional range, while its every subrange [x, y) actually can be.
      
      That led me to following question, does ranges v3 allow to provide your own overloads for slice view so the:
      tk_bidi_list | view::slice(begin+k,end-n)
      Would retturn a bidirectional range, even if the tk_bidi_list is forward one? That is actually true for described structure if k > 0 and n > 0.
      
      Also have you considered changing the requirments in the way that for the begin+n,end-n argument of the range, n would be required to be greater than 0 (instead of greater equal). That would allow implementation to detect at compile time is end is included in created subrange:
      1. end-n – end never included
      2. end – end included
      Also this would resolve problem with suprising difference in complexity in slice(0, end-0) and slice(0,end) for ranges with sentinels.
      
      Reply ↓
Santhi P on December 23, 2014 at 9:53 pm said:

Object slicing has been explain well here. Thanks for the information! You can find more on object slicing with examples here below.

Object slicing in Cpp with examples

Reply ↓
Vladimir on January 28, 2015 at 4:51 am said:

Using an int for an index always makes me sad. What if on a 64-bit machine we will use an index greater than INT_MAX? It will just silently wrap around. And the fact that it will rarely happen makes the error only much worse. If it is signed it should definitely be ptrdiff_t or an equivalent typedef/using.

Reply ↓
rwp on May 25, 2015 at 3:12 am said:

As an old APL aficionado I would like to see somthing like:

vector v = {0,1,2,3,4,5};

Note iota as in APL.
v[iota(3)] giving 0, 1, 2

Slicing would be done like this

v[iota(3) + 2] giving 2, 3, 4

Striding would be accomplished by
v[2 * iota[3] + 1] giving 1, 3, 5

Reverse iterating could be done by
v[4 – iota(5)] giving 4, 3, 2, 1, 0

Pretty easy to read and not a lot of different classes and arguments to cope with.

Reply ↓
- Eric Niebler on May 25, 2015 at 6:50 pm said:
  
  I could imagine a range library that works that way, but that’s not my range library. The APL syntax is pretty foreign in a C++ context.
  
  Reply ↓
x on February 23, 2016 at 10:37 am said:

Your statement about Python slices having reference semantics is incorrect. Most containers (including lists and bytearrays) copy on slice. The reason slice assignment works for them is because it is a special syntax that calls a different method.

If you want slices that are truly a view rather than a copy, you have to use memoryview (or a custom data structure).

Reply ↓
- Eric Niebler on February 23, 2016 at 10:45 am said:
  
  Thanks!
  
  Reply ↓
Bradley Gibson on July 9, 2016 at 4:53 pm said:

New here.

ranges::view::slice() works as expected when supplied with compile-time constants.

Is there a way to slice when the parameters are int variables computed at runtime?

Reply ↓
- Eric Niebler on July 9, 2016 at 5:52 pm said:
  
  It should work just fine when the parameters are known only at runtime.
  
  Reply ↓
  - Bradley Gibson on July 9, 2016 at 11:23 pm said:
    
    Strange. I’ll see what’s up with my compiler, then. Given the error message I was getting (something like no function found whenever I used variables as input params), I’d assumed the params needed to be constexpr.
    
    Thanks, Eric. I think it is fantastic work. I’ll be sad if the committee doesn’t go with your proposal for C++17. 🙂
    
    Reply ↓
  - Bradley Gibson on July 10, 2016 at 12:15 am said:
    
    The error message I was getting was “error: no matching function for call to object of type ‘const ranges::v3::view::view'”, without more specifics.
    
    Turns out the end parameter was vector::size(), which is is size_t (aka unsigned long on my compiler), and ranges::view::slice() overloads require signed parameters.
    
    I know there have been discussions about this and your library, and the lack of consistency in the Standard Library, but I wanted to let others know, in case they run into similar issues. (By the way, I’m much happier with “Consistent Failure” than arbitrary behavior, so thank you for picking a position and sticking with it throughout the design of your library.)
    
    As for me, I’ll cast the returned value to signed, along with an assert that the returned value’s unsigned size does not exceed 2^31-1.
    
    Reply ↓
Dedmen on April 25, 2018 at 4:54 am said:

The link to “Customization Point Design” is broken. Thank me later o7

Reply ↓
Hi-Angel on August 16, 2019 at 2:05 am said:

I find it sad you’re using “auto” everywhere. Right now I’m trying to figure out how to store a range and a slice inside a struct, i.e. I want the type. I have no idea where to find this information, in all examples the type is explicit.

Reply ↓
John M on April 28, 2021 at 12:32 am said:

I know this is a very old article, but slices aren’t exactly new; array slices were available in Ada in the early 1980s, if not the late 1970s. However, in Ada, if you ask for a range from 2 to 5, you get the range from 2 to 5, not from 2 to 4.

Reply ↓