Range Comprehensions

I’ve been busy since I last wrote about ranges. I have a lot of news to share, but in this post, I’m going to narrowly focus on a recent development that has me very excited. It’s a new feature that I’m calling range comprehensions, and they promise to greatly simplify the business of creating custom ranges.

List Comprehensions

If you’re familiar with list comprehensions from Haskell or Python, your ears might have pricked up when I said “range comprehensions.” List comprehensions give you a pithy way to generate new lists from existing ones, either by transforming thenm, filtering them, or combining them, or whatever. Here, for instance, is a Haskell program for generating the first 10 Pythagorean triples:

main = print (take 10 triples)

triples = [(x, y, z) | z <- [1..]
                     , x <- [1..z]
                     , y <- [x..z]
                     , x^2 + y^2 == z^2]

The way to read the triples line is this: generate a list of tuples (x, y, z) where z goes from 1 to infinity, x goes from 1 to z (inclusive), and y goes from x to z, but only yield those triples for which x^2 + y^2 == z^2 is true. The code then generates every combination of x, y, and z in the specified ranges in some order and filters it, yielding a list of the Pythagorean triples. Beautiful. Of particular interest is the fact that, since Haskell is lazy, there is no problem with a comprehension that generates an infinite list.

Back-Story

Back in October, I published a blog post about API design and std::getline in which I showed how a range-based interface is better than the existing one. My friend Bartosz Milewski commented that ranges are difficult to work with and challenged me to show the range-based equivalent of the above pithy Haskell program. I admit that at the time, I had no answer for Bartosz.

Recently, Bartosz published a blog post about just this problem. In his post, Bartosz describes some pretty simple results from category theory (if any category theory can be described as “simple”), and applies it to the problem of generating the Pythagorean triples lazily in C++. It’s a great post, and you should read it. Here, finally, was my answer. Although Bartosz’s code was terribly inefficient, somewhat difficult to reason about, and not formulated in terms of STL-ish concepts, I knew the direction I had to take.

Introducing Range Comprehensions

Without further ado, here is my solution to the Pythagorean triples problem:

using namespace ranges;

// Lazy ranges for generating integer sequences
auto const intsFrom = view::iota;
auto const ints = [=](int i, int j)
    {
        return view::take(intsFrom(i), j-i+1);
    };

// Define an infinite range of all the Pythagorean
// triples:
auto triples =
    view::for_each(intsFrom(1), [](int z)
    {
        return view::for_each(ints(1, z), [=](int x)
        {
            return view::for_each(ints(x, z), [=](int y)
            {
                return yield_if(x*x + y*y == z*z,
                    std::make_tuple(x, y, z));
            });
        });
    });

// Display the first 10 triples
for(auto triple : triples | view::take(10))
{
    std::cout << '('
        << std::get<0>(triple) << ','
        << std::get<1>(triple) << ','
        << std::get<2>(triple) << ')' << '\n';
}

Lines 4 and 5 define intsFrom and ints, which are lazy ranges for generating sequences of integers. Things don’t get interesting until line 12 with the definition of triples. That’s the range comprehension. It uses view::for_each and yield_if to define a lazy range of all the Pythagorean triples.

view::for_each

What is view::for_each? Like std::for_each, it takes a range and function that operates on each element in that range. But view::for_each is lazy. It returns another range. The function that you pass to view::for_each must also return a range. Confused yet?

So many ranges, but what’s going on? Conceptually, it’s not that hard. Let’s say that you call view::for_each with the range {1,2,3} and a function f(x) that returns the range {x,x*x}. Then the resulting range will consist of the elements: {1,1,2,4,3,9}. See the pattern? The ranges returned by f all got flattened. Really, range flattening is all that’s going on.

Now look again at line 12 above:

auto triples =
    view::for_each(intsFrom(1), [](int z)
    {
        return view::for_each(ints(1, z), [=](int x)
        {
            return view::for_each(ints(x, z), [=](int y)
            {
                return yield_if(x*x + y*y == z*z,
                    std::make_tuple(x, y, z));
            });
        });
    });

For every integer z in the range 1 to infinity, we call view::for_each (which, recall, returns a flattened range). The inner view::for_each operates on all the integers x from 1 to z and invokes a lambda that captures z by value. That function returns the result of a third invocation of view::for_each. That innermost lambda, which finally has x, y, z, makes a call to a mysterious-looking function that is provocatively named yield_if. What’s that?

yield_if

The semantics of yield_if is to “inject” the tuple (x,y,z) into the resulting sequence, but only if is a Pythagorean triple. Sounds tricky, but it’s really very simple. Recall that the function passed to view::for_each must return a range. Therefore, yield_if must return a range. If the condition x*x + y*y == z*z is false, it returns an empty range. If it’s true, it returns a range with one element: (x,y,z). Like I said, simple. There’s also a function called yield that unconditionally returns a single-element range.

Now that you know how it works, you can forget it. You can just use view::for_each and yield_if as if you were writing a stateful function that suspends itself when you call yield or yield_if, kind of like a coroutine. After all, I picked the name “yield” to evoke the yield keyword from C#. That keyword gives the function it appears in precisely those coroutine-ish semantics. What’s more, C# functions that have yield statements automatically implement C#’s IEnumerable interface. IEnumerable fills the same niche as the Iterable concept I’ve described in previous posts. That is, you can loop over the elements.

For instance, in C# you can do this (taken from Wikipedia):

// Method that takes an iterable input (possibly an
//  array) and returns all even numbers.
public static IEnumerable<int>
GetEven(IEnumerable<int> numbers) {
    foreach(int i in numbers) {
        if((i % 2) == 0) {
            yield return i;
        }
    }
}

With range comprehensions, equivalent code looks like this:

auto GetEvens =
    view::for_each(numbers, [](int i)
    {
        return yield_if((i % 2) == 0, i);
    });

That’s darn near the same thing, and we don’t need any special keyword or compiler magic.

Performance

Ranges that return ranges that return ranges, oy vey. How horribly does it perform at runtime? As it turns out, not horribly at all, but a lot depends on how good your optimizer is.

I wrote a simple benchmark program that iterates over the first 3000 triples and does some trivial computation with them. I do this in two ways. One is with the range comprehension above, and the other is with this triply-nested for loop:

for(int z = 1;; ++z)
{
    for(int x = 1; x <= z; ++x)
    {
        for(int y = x; y <= z; ++y)
        {
            if(x*x + y*y == z*z)
            {
                result += (x + y + z);
                ++found;
                if(found == 3000)
                    goto done;
            }
        }
    }
}
done:    

You would expect this solution to fly and the range-based one to crawl. But here are the numbers using a hot-off-the-presses gcc-4.9 with -O3:

Raw loop:             2.2s
Range comprehension:  2.3s

That’s it?! Yes, all that extra work being done by the range comprehension is totally transparent to the optimizer, which generates near-optimal code. Kind of amazing, isn’t it?

If, however, your compiler of choice is clang, I have some bad news for you. The range comprehension is (wait for it) 15 times slower. Dear god, that’s awful. I guess that goes to show that despite the astounding awesome-ness of clang in most respects, its optimizer still has some ways to go.

Summary

Haskell and Python have list comprehensions. C# has LINQ and yield. And now C++ has range comprehensions. It’s now pretty trivial to generate non-trivial sequences on the fly, lazily and efficiently, in a way that plays well with all the STL algorithms. Like I said, I’m pretty excited.

Acknowledgements

My deep thanks to Bartosz Milewski for getting me 90% of the way there. I couldn’t have done this without his insights, and the insights of all the functional programmers and category theoreticians that came before. Math FTW!

35 thoughts on “Range Comprehensions

  1. “Bartosz Milewski commented that ranges are difficult to work worth”, I assume worth is supposed to be with?

    Great post by the way :)

  2. Nice post!

    yield_if is so simple and so nice, good work!

    Any idea about what clang is doing wrong there?
    Could you post the compiler flags and clang-version you used?

    If it is an inlining problem one can try to help the compiler with the attributes [[gcc::flatten / gcc::always_inline]] and [[gcc::const / gcc::pure]] and see if it is still failing(note that const is stronger than pure).

    • I use clang built from svn trunk (updated within the past week). The compiler options are unsurprising: “-std=c++11 -O3 -DNDEBUG“. That’s it.

      I’m disinclined to hack the source to accommodate clang’s lousy optimizer. It’s a bug, and the LLVM guys should fix it. Filing the bug is on my to-do list.

      • Forget it. What I meant is that testing if the performance returns to normal when inlining is forced via an attribute would indicate that the problem is related to clang’s inliner, and that this information could enrich the bug report.

        But I just tried those, and there wasn’t any difference since it seems that clang doesn’t support flatten, and although it supports always_inline/hot they seem to have no effect.

          • I just tried it and it works! Now I only get a 2x slowdown!

            After checking out your code I’m pretty convinced that you haven’t built enough layers of abstraction for the inliner to perform this badly by default. You should definitely submit a bug report.

            Now I just wonder where the remaining 2x slowdown comes from…

  3. This was a great post. I’m a big range fan and a D enthusiast, and it is always welcoming to seeing ranges getting expressive implementations in C++ also.

    • Thanks. And to put this work in its proper historical perspective, ranges in one form or another have been in Boost since roughly forever, so C++ ranges aren’t some Jonny-come-lately, me-too technology.

  4. does anybody knows the complexity of that haskell code for triplets,
    i assume it is terrible, unlike imperative solution that would do sqrt on y.

  5. that would do sqrt on z^2-y^2 and then check if the diff is a square of an int by testing floor^2 and ceil^2 of the root

    • Good question. Maybe some Haskell apologist can tell me what I’m doing wrong, but I couldn’t get any Haskell program to do the equivalent to my benchmark and complete in any reasonable amount of time. I’m using this program (compiled with ghc -O2):

      import Data.List
      main = print (foldl' (+)
                           0
                           (map (\(x,y,z)->x+y+z)
                                (take 1000 triples)))
      triples = [(x, y, z) | z <- [1..]
                           , x <- [1..z]
                           , y <- [x..z]
                           , x^2 + y^2 == z^2]
      

      At 1000 triples, it already takes 7.7s. At 3000, it didn’t come back and I killed it.

      In comparison, the raw nested for loops above finished in 2.2s and the range comprehension in 2.3s.

      • I got an answer on Reddit. The following Haskell finishes in 8.5s:

        import Data.List
        default(Int, Double)
        
        main = print (foldl' (+)
                             0
                             (map (\(x,y,z)->x+y+z)
                                  (take 3000 triples)))
        
        triples :: [(Int, Int, Int)]
        triples = [(x, y, z) | z <- [1..]
                             , x <- [1..z]
                             , y <- [x..z]
                             , x^(2 :: Int) + y^(2 :: Int) == z^(2 :: Int)]
        

        That’s still quite a bit slower than the C++, both the raw nested loops and the fancy range comprehension. Go figure.

          • Nothing to do with the range being infinite. But yeah, Integer is the default in Haskell, and it’s unlimited precision. The operations on it are apparently very slow.

          • bad wording, i mean that Integer can represent -inf to +inf (with inf RAM) aka infinite range, not in a sense [1..]

        • I’m using GHC 7.8.2 with LLVM 3.4svn. I’m also using GCC 4.8.2. Here are two toy programs that compute the same thing in almost the same way:


          -- HUSKELL.hs
          import Data.List (foldl')

          main = print vals
          where
          vals = foldl' (+) 0 $ map add $ take 3000 triples
          add (x, y, z) = x + y + z
          triples =
          [ (x, y, z) :: (Int, Int, Int)
          | z <- [1..], x <- [1..z], y <- [x..z], sq x + sq y == sq z
          ]
          sq = (^ (2 :: Int))


          // SEPPLES.cpp

          include

          constexpr int max_triples = 3000;

          int
          main(int argc, char **argv) {
          int result = 0;
          int found = 0;
          for (int z = 1;; ++z) {
          for (int x = 1; x <= z; ++x) {
          for (int y = x; y <= z; ++y) {
          if (xx + yy == z*z) {
          result += (x + y + z);
          ++found;
          if (found == max_triples) {
          goto done;
          }
          }
          }
          }
          }
          done:
          printf("%d\n", result);
          }

          I built and executed these programs as follows:


          $ ghc -O2 -fllvm HUSKELL.hs
          [1 of 1] Compiling Main ( HUSKELL.hs, HUSKELL.o )
          Linking HUSKELL ...
          $ g++ -O3 -o SEPPLES -std=c++11 SEPPLES.cpp
          $ time ./HUSKELL
          10650478
          ./HUSKELL 3.47s user 0.00s system 99% cpu 3.484 total
          $ time ./SEPPLES
          10650478
          ./SEPPLES 3.42s user 0.00s system 99% cpu 3.429 total

          Toy benchmarks! \o/

  6. I see people are ignoring me, so Ill just be a di*k and post performance measurements:
    http://ideone.com/ikmxwI
    2390ms
    82ms
    ( n had to be reduced from 3000 cuz of timeouts :) )

    So basically you can freely delete performance chapter of a post because it is like a bad joke… it is like writing your sort is within 1% of the bubble sort, aka nobody who cares about perf is not gonna use bubble sort so your measurements are useless.

    So you should probably fix your yield_if code to calculate it properly but then honestly imperative solution looks 10x more readable imho(lambda stuck inside 2 levels of for == :/ ).

      • Um,
        making a elegant solution to a problem ignoring its shi**y alg complexity is a Haskell way not a cpp one. :P
        aka dont drink the Haskell kool-aid :P
        Yes Haskell way can be made efficient, but then you loose(in practice mostly useless anyway) nice concise way of declaring pt range.
        My point was not that u are ignorant of crappy alg complexity of alg but
        a) nobody discussed this
        b) it is kind of irritating for me personally to have those kind of problems- where you need to worry about complexity of your declarative syntax sugar cuz it is not as obvious as imperative, but then again it should not happen ofter IRL

        also though your solution is nice proof of concept syntax is horrible, for eg if it was “languaged in”
        c++ we could:
        -get rid of ugly input params to inner functions([=](int y))
        -get rid of ugly ints(1,n) and have [1..n]

        though again in most cases your ranges will be simple and you wont suffer as much in pt.

        but tbh i have real problems with cpp, for example if serialization of “nonC” structures could be made in ISO portable way that would benefit me IRL, while in most cases I dont really care about comprehensions…
        also often I wish map had a .keys.begin()/end() and .values.begin()/end() than that I can define list comprehensions

        If somebody wants to enlighten me when range comprehensions is so important please do, all I can think of is some nice interview questions for ggl, fb… :)
        where it reduces line count from 10 to 5 :)

  7. Great post and I hope to see some more discussion in the ranges SG about how to handle lazy ranges. Personally, I think that core language support like yield return would be preferable, but this library-base solution is a great start (as you undoubtedly remember range-based for was a reaction to your FOREACH macro becoming popular). So, ideally I would like to be able to write the above triples as a function something similar to the following:

    auto triples()
    {
    for (auto z : intsFrom(1))
    for (auto x : ints(1, z))
    for (auto y : ints(x, z))
    if (xx + yy == z*z)
    yield return std::make_tuple(x, y, z);
    }

        • maybe zip could be

          for (auto z , auto x , auto y: intsFrom(1), ints(1, z), ints(x, z))

          or maybe brackets could mean pair/tuple (ofc only in range based for):

          for ( (z,&x,y) : intsFrom(1), ints(1, z), ints(x, z)) // x is auto&

          but for some reason I feel ppl from ISO love their auto so that wont fly…

          and ofc I must rage a bit now:
          what is more often used:
          nested loops or zip?
          but ofc zip is so functional, functional is so cool => lets add zip to core lang, who cares about nested loops :P

  8. This is one of the things that I love about python. Lets me generate ranges on the fly. I think this would be one of the killer features of ranges: being able to generate them on the fly. Very interesting post, indeed. Can’t wait for a finished proposal.

    • Yeah, I was at C++Now with Chandler, poking him in the back while he worked on this. I’m not sure what the current state is, but last I heard from him, the difficulty clang has optimizing this code is a somewhat intractable problem, and he had given up and turned his attention elsewhere. Which makes me sad.

  9. Pingback: C++ Ranges are Pure Monadic Goodness |   Bartosz Milewski's Programming Cafe

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>