Range Concepts, Part 2 of 4: Infinite Ranges

In the last post, I tried to make delimited ranges fit into the STL and found the result unsatisfying. This time around I’ll be trying the same thing with infinite ranges and will sadly be reaching the same conclusion. But the exercise will point the way toward an uber-Range concept that will subsume delimited ranges, infinite ranges, and STL-ish pair-o’-iterator ranges.

Infinite Ranges

Building motivation for delimited ranges was fairly simple; we’re all familiar with the idea from null-terminated strings. The case for infinite ranges is bit harder to make. As C++ programmers, we don’t regularly bump into infinity. In other languages infinity is all in a day’s work. Haskell programmers can create an infinite list of integers as simply as typing [1..]. Does that break your brain? It shouldn’t. It’s a lazy list — the elements are generated on demand. All infinite ranges are necessarily lazy.

What’s the use of that? Consider the take algorithm which constructs a new list from the first N elements of another list. It handles infinite lists with aplomb. Or consider what should happen when you zip an infinite list with a finite one. You end up with a finite list of element pairs. That’s a perfectly sensible thing to do.

Supporting infinite ranges in a generic range library would be a boon, so it’s worth looking at what it does to the concepts.

Infinite Ranges in the STL

We might think of infinite ranges as a kind of degenerate delimited range where the delimiting predicate always returns false. When we’re trying to reach infinity, our work is never done. With that in mind, let’s implement an infinite range of integers starting at some value and ending never. It’s described below.

struct iota_range
{
private:
    int i_;
public:
    using const_iterator = struct iterator
      : boost::iterator_facade<
            iterator, int const,
            std::forward_iterator_tag
        >
    {
    private:
        bool sentinel_;
        int i_;
        friend class boost::iterator_core_access;
        friend struct iota_range;
        iterator(int i) : sentinel_(false), i_(i) {}
        bool equal(iterator that) const
        {
            return sentinel_ == that.sentinel_
                && i_ == that.i_;
        }
        void increment() 
        {
            ++i_;
        }
        int const & dereference() const
        {
            return i_;
        }
    public:
        iterator() : sentinel_(true), i_(0) {}
    };
    constexpr explicit iota_range(int i = 0)
      : i_(i)
    {}
    iterator begin() const
    {
       return iterator{i_};
    }
    iterator end() const
    {
       return iterator{};
    }
    constexpr explicit operator bool() const
    {
       return true;
    }
};

With this range, we can do this:

// Spew all the ints. WARNING: THIS NEVER ENDS!
for( int i : iota_range() )
    std::cout << i << 'n';

iota_range is a forward range; that is, its iterators model the ForwardIterator concept 1. They store both an integer and a Boolean signifying whether the iterator is a sentinel or not. The range’s begin iterator is not a sentinel, the end iterator is. Therefore, they will never compare equal, and we’ll count integers … forever!

A Funny Thing Happened on the Way to Infinity

What you’ll find when you use this range in your code is that some things will work as you expect and other things will spin off into hyperspace and never come back. Take a very simple example: std::distance. Presumably, you won’t be foolish enough to do this:

iota_range iota;
// Oops!
auto dist = std::distance(iota.begin(), iota.end());

What’s less clear is that you should never, ever, under any circumstance, pass this range directly or indirectly to any algorithm that does binary searching, including binary_search, lower_bound, upper_bound, and equal_range — despite the fact that iota_range is, in fact, a sorted forward range. Think about it: binary searching is a divide-and-conquer algorithm. Dividing an infinite range yields — surprise! — an infinite range. If you pass an iota_range to any of these algorithms, go get yourself a cup of coffee. You could be waiting a while.

Performance Problems

If you read the last blog post about delimited ranges, maybe you cringed a bit when you saw the implementation of iota_range::iterator::equal. It is our intention that an iota_range‘s iterator will never, ever finish iterating, so the termination condition should be a constant expression. Instead, we have this:

bool equal(iterator that) const
{
    return sentinel_ == that.sentinel_
        && i_ == that.i_;
}

That’s two runtime checks when it should be zero! As I showed last time, this can have a disastrous effect on the quality of the generated code.

Possibly Infinite Ranges

Infinite loops are one problem with infinite ranges, but there’s another more subtle problem, and unfortunately it already exists in the Standard Library. Take our old friend (and my favorite punching bag) std::istream_iterator. It is an input iterator, so it’s required to have an associated difference_type. In “Elements of Programming,” Alexander Stepanov (the father of the STL and of Generic Programming) says this about an Iterator’s difference type:

DistanceType returns an integer type large enough to measure any sequence of applications of successor allowable for the type. 2

For istream_iterator‘s, the difference_type is std::ptrdiff_t. Now, consider the following code:

std::istream& sin = ...;
std::istream_iterator<char> it{sin}, end;
std::ptrdiff_t dis = std::distance(it, end);    

This is perfectly reasonable and valid code. It pulls characters out of the istream, counts them, and discards them. Now, imaging sin is pulling characters from the network, and that this code runs for days, pulling billions and billions of characters off the net. What happens when a ptrdiff_t isn’t big enough to hold the result? Answer: undefined behavior. In practice, you’ll get garbage, but in principle, anything could happen.

To me, that’s a little disconcerting. An iterator’s difference_type should be big enough to hold the distance between any two iterators. Since input streams are unbounded in principle, there is no scalar signed integer type that’s big enough. Huh. We’re forced to conclude that the validity of istream_iterator‘s increment operation is limited by the size of its difference_type, or that istream_iterator‘s difference_type is wrong. Again: Huh.

Summary, For Now…

Infinite ranges are useful, but they have real problems given the current definition of the STL. You might think that disallowing infinite ranges avoids the problem, but it’s more fundamental than that. In fact, some problems exist today. It’s hard to fix the difference_type overflow issue in the STL today (apart from telling people to be careful), but it’s worth considering whether a new range-based interface can help. (So as not to raise expectations, I’ll say now that this is a vexing problem that I don’t yet have a great solution to.)

Summing up, here are the issues I’ve identified so far with STL-ish pair-o’-iterators-style ranges:

  • Delimited and infinite ranges generate poor code
  • They are forced to model weaker concepts than they might otherwise
  • Also, they’re awkward to implement
  • It’s too easy to pass an infinite range to an algorithm that can’t handle it
  • Possibly-infinite ranges can overflow their difference_type

In the next installment, I’ll describe the conceptual foundations of my new range library that strikes at the root of these problems. Stay tuned.


1. Actually, this is a bit of a lie. Forward iterators aren’t supposed to return references to objects inside them. Please ignore this for the sake of discussion.↩

2. Stepanov, A; McJones, P. Elements of Programming. Addison-Wesley. 2009.↩

x
x

12 Replies to “Range Concepts, Part 2 of 4: Infinite Ranges”

  1. Hi Eric — I think there’s some way out of the infinite input iterator. I’ve implemented this in Boost.Iterator, called the boost::function_input_iterator<> with the accompanying state class called boost::infinite.

    That being said, I do agree that a range concept would be better at representing infinity — say, modeling /dev/zero or /dev/urandom — and mentioned as much at the 2012 Kona meeting where std::range was being proposed. In beer-washed discussions then with Andrew Sutton, I remember bringing up the idea of a range representing all Integers, then having a predicate-limited range synthesized from that representing only prime numbers, and yet even just the first N of these primes. The concept should really be able to support these notions of “range” which may or may not really be lazy or expanded.

    I’m intrigued with what you’re coming up with and running into and look forward to the next one!

    • Hi Dean! I finally got a chance to look into function_input_iterator, and I don’t think it solves the difference_type problem. Unless I’m mistaken, function_input_iterator has ptrdiff_t as its difference_type, just like std::istream_iterator. It’ll eventually overflow too.

      I cornered Andrew Sutton a few weeks ago at the Issaquah meeting and talked his ear off about ranges, too. He knows what I’m up to, and he didn’t run away screaming. We may even get a paper out of it.

  2. Binary search requires a pair of random access iterators. If you make your iota random access (which you can), there will be no problem computing distance. You can’t make the distance infinite though, it will have to be the maximum int. You can also make your iterator more efficient by using this property.

    • Binary search requires a pair of random access iterators.

      You sure about that? You should look at the requirements of the algorithms I listed. What you say is true about the iota_range I showed, but it missed the point I was trying to make. Integers are unbounded in theory, and they would be in practice too if we had an infinite precision integer class, like some other languages do. The problem with infinity is real there.

      • I think you mean “… unspecified precision integer class, like some other languages do.” They all run out of memory eventually. Practically speaking, this isn’t (usually) much of an issue — but it isn’t unbounded like istream iteration could theoretically be.

      • well afaik binary search does idxc=(idxa+idxb) /2
        that is RA requirement for me.
        and tbh complaining about infinity not being in the language is silly. it is like complaining that NaNs ruin your FP calculations…
        Also all your use cases dont need infinity(they require that you generate numbers up to the size of your other range, or 2x that if you want to unpack pair, or whatever multiple, including floating point ones, but all of them are real integer numbers, not infinity)
        So it is just convenience( you dont need to type zip(other, [1..other.size ), you type zip(other, [1..])

        my advice:
        IRL when counting (u)int64_t == infinity πŸ˜€

  3. If you’re going to lie, you might as well call it a RandomAccessIterator πŸ˜‰ Joking aside, I think footnote 1 deserves more attention. Maybe a sidebar. The greatest shortcoming of the standard library’s iterator design is the bundling of access and traversal. The ForwardIterator requirements that (a) dereferencing produces an actual reference (24.2.5/1), and (b) dereferenceable iterators are equal iff they reference the same object (24.2.5/6), are crippling. Most programmers who don’t study the standard (most programmers) refuse to believe that any iterator better than input/output must designate a range of elements physically stored in memory.

    I once spent an hour on StackOverflow trying to convince someone that std::vector<bool>::iterator must be an InputIterator because it fails to meet those two requirements. I almost won him over, until he tried something like this program. You know there’s a problem when major library implementations (not only blog authors!) knowingly violate the standard.

    I know that the issue is not central for this particular post, but you did touch on it (and apparently it’s a hot-button for me). I think it’s important to shine a nice bright spotlight on it as part of any development of ranges if only to ensure that readers see the tar pit and understand the importance of avoiding it in a potential range design.

    • Well, I really have no defense. Guilty as charged. Iterators do beg to be abused this way. But you’re right, too many useful iterator types cannot satisfy the letter of the standard. I wonder what would break if the requirements were loosened. E.g. a transform iterator needs to return an rvalue. “Squirrely” iterators (that squirrel away values inside them) like the one I showed are useful but problematic. (For a good time, read this: http://cplusplus.github.io/LWG/lwg-active.html#2360)

      • Interestingly enough, you can avoid the problem outlined in http://cplusplus.github.io/LWG/lwg-defects.html#2360 if you are are given a pair of iterators to reverse, not just one of them.

        template <class I>
        pair<reverse_iterator<I>, reverse_iterator<I>>
        reverse(I begin, I end);

        The same goes for several other lazy range transformations (e.g., filtering): they can be implemented for a pair of iterators but not for single iterators independently.

  4. In “Performance Problems” you claim that it is necessary to make both tests. Why could you not just cheat and return false, given that you know the sequence is infinite?

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.