I’ve been busy since I last wrote about ranges. I have a lot of news to share, but in this post, I’m going to narrowly focus on a recent development that has me very excited. It’s a new feature that I’m calling range comprehensions, and they promise to greatly simplify the business of creating custom ranges.
List Comprehensions
If you’re familiar with list comprehensions from Haskell or Python, your ears might have pricked up when I said “range comprehensions.” List comprehensions give you a pithy way to generate new lists from existing ones, either by transforming thenm, filtering them, or combining them, or whatever. Here, for instance, is a Haskell program for generating the first 10 Pythagorean triples:
main = print (take 10 triples) triples = [(x, y, z) | z <- [1..] , x <- [1..z] , y <- [x..z] , x^2 + y^2 == z^2]
The way to read the triples
line is this: generate a list of tuples (x, y, z)
where z
goes from 1 to infinity, x
goes from 1 to z
(inclusive), and y
goes from x
to z
, but only yield those triples for which x^2 + y^2 == z^2
is true. The code then generates every combination of x
, y
, and z
in the specified ranges in some order and filters it, yielding a list of the Pythagorean triples. Beautiful. Of particular interest is the fact that, since Haskell is lazy, there is no problem with a comprehension that generates an infinite list.
Back-Story
Back in October, I published a blog post about API design and std::getline
in which I showed how a range-based interface is better than the existing one. My friend Bartosz Milewski commented that ranges are difficult to work with and challenged me to show the range-based equivalent of the above pithy Haskell program. I admit that at the time, I had no answer for Bartosz.
Recently, Bartosz published a blog post about just this problem. In his post, Bartosz describes some pretty simple results from category theory (if any category theory can be described as “simple”), and applies it to the problem of generating the Pythagorean triples lazily in C++. It’s a great post, and you should read it. Here, finally, was my answer. Although Bartosz’s code was terribly inefficient, somewhat difficult to reason about, and not formulated in terms of STL-ish concepts, I knew the direction I had to take.
Introducing Range Comprehensions
Without further ado, here is my solution to the Pythagorean triples problem:
using namespace ranges; // Lazy ranges for generating integer sequences auto const intsFrom = view::iota; auto const ints = [=](int i, int j) { return view::take(intsFrom(i), j-i+1); }; // Define an infinite range of all the Pythagorean // triples: auto triples = view::for_each(intsFrom(1), [](int z) { return view::for_each(ints(1, z), [=](int x) { return view::for_each(ints(x, z), [=](int y) { return yield_if(x*x + y*y == z*z, std::make_tuple(x, y, z)); }); }); }); // Display the first 10 triples for(auto triple : triples | view::take(10)) { std::cout << '(' << std::get<0>(triple) << ',' << std::get<1>(triple) << ',' << std::get<2>(triple) << ')' << '\n'; }
Lines 4 and 5 define intsFrom
and ints
, which are lazy ranges for generating sequences of integers. Things don’t get interesting until line 12 with the definition of triples
. That’s the range comprehension. It uses view::for_each
and yield_if
to define a lazy range of all the Pythagorean triples.
view::for_each
What is view::for_each
? Like std::for_each
, it takes a range and function that operates on each element in that range. But view::for_each
is lazy. It returns another range. The function that you pass to view::for_each
must also return a range. Confused yet?
So many ranges, but what’s going on? Conceptually, it’s not that hard. Let’s say that you call view::for_each
with the range {1,2,3}
and a function f(x)
that returns the range {x,x*x}
. Then the resulting range will consist of the elements: {1,1,2,4,3,9}
. See the pattern? The ranges returned by f
all got flattened. Really, range flattening is all that’s going on.
Now look again at line 12 above:
auto triples = view::for_each(intsFrom(1), [](int z) { return view::for_each(ints(1, z), [=](int x) { return view::for_each(ints(x, z), [=](int y) { return yield_if(x*x + y*y == z*z, std::make_tuple(x, y, z)); }); }); });
For every integer z
in the range 1 to infinity, we call view::for_each
(which, recall, returns a flattened range). The inner view::for_each
operates on all the integers x
from 1 to z
and invokes a lambda that captures z
by value. That function returns the result of a third invocation of view::for_each
. That innermost lambda, which finally has x
, y
, z
, makes a call to a mysterious-looking function that is provocatively named yield_if
. What’s that?
yield_if
The semantics of yield_if
is to “inject” the tuple (x,y,z)
into the resulting sequence, but only if is a Pythagorean triple. Sounds tricky, but it’s really very simple. Recall that the function passed to view::for_each
must return a range. Therefore, yield_if
must return a range. If the condition x*x + y*y == z*z
is false, it returns an empty range. If it’s true, it returns a range with one element: (x,y,z)
. Like I said, simple. There’s also a function called yield
that unconditionally returns a single-element range.
Now that you know how it works, you can forget it. You can just use view::for_each
and yield_if
as if you were writing a stateful function that suspends itself when you call yield
or yield_if
, kind of like a coroutine. After all, I picked the name “yield” to evoke the yield
keyword from C#. That keyword gives the function it appears in precisely those coroutine-ish semantics. What’s more, C# functions that have yield
statements automatically implement C#’s IEnumerable
interface. IEnumerable
fills the same niche as the Iterable concept I’ve described in previous posts. That is, you can loop over the elements.
For instance, in C# you can do this (taken from Wikipedia):
// Method that takes an iterable input (possibly an // array) and returns all even numbers. public static IEnumerable<int> GetEven(IEnumerable<int> numbers) { foreach(int i in numbers) { if((i % 2) == 0) { yield return i; } } }
With range comprehensions, equivalent code looks like this:
auto GetEvens = view::for_each(numbers, [](int i) { return yield_if((i % 2) == 0, i); });
That’s darn near the same thing, and we don’t need any special keyword or compiler magic.
Performance
Ranges that return ranges that return ranges, oy vey. How horribly does it perform at runtime? As it turns out, not horribly at all, but a lot depends on how good your optimizer is.
I wrote a simple benchmark program that iterates over the first 3000 triples and does some trivial computation with them. I do this in two ways. One is with the range comprehension above, and the other is with this triply-nested for
loop:
for(int z = 1;; ++z) { for(int x = 1; x <= z; ++x) { for(int y = x; y <= z; ++y) { if(x*x + y*y == z*z) { result += (x + y + z); ++found; if(found == 3000) goto done; } } } } done:
You would expect this solution to fly and the range-based one to crawl. But here are the numbers using a hot-off-the-presses gcc-4.9 with -O3
:
Raw loop: 2.2s
Range comprehension: 2.3s
That’s it?! Yes, all that extra work being done by the range comprehension is totally transparent to the optimizer, which generates near-optimal code. Kind of amazing, isn’t it?
If, however, your compiler of choice is clang, I have some bad news for you. The range comprehension is (wait for it) 15 times slower. Dear god, that’s awful. I guess that goes to show that despite the astounding awesome-ness of clang in most respects, its optimizer still has some ways to go.
Summary
Haskell and Python have list comprehensions. C# has LINQ and yield
. And now C++ has range comprehensions. It’s now pretty trivial to generate non-trivial sequences on the fly, lazily and efficiently, in a way that plays well with all the STL algorithms. Like I said, I’m pretty excited.
Acknowledgements
My deep thanks to Bartosz Milewski for getting me 90% of the way there. I couldn’t have done this without his insights, and the insights of all the functional programmers and category theoreticians that came before. Math FTW!
“Bartosz Milewski commented that ranges are difficult to work worth”, I assume worth is supposed to be with?
Great post by the way 🙂
Thanks! I’ll fix that as soon as I get a chance.
Nice post!
yield_if is so simple and so nice, good work!
Any idea about what clang is doing wrong there?
Could you post the compiler flags and clang-version you used?
If it is an inlining problem one can try to help the compiler with the attributes [[gcc::flatten / gcc::always_inline]] and [[gcc::const / gcc::pure]] and see if it is still failing(note that const is stronger than pure).
I use clang built from svn trunk (updated within the past week). The compiler options are unsurprising: “
-std=c++11 -O3 -DNDEBUG
“. That’s it.I’m disinclined to hack the source to accommodate clang’s lousy optimizer. It’s a bug, and the LLVM guys should fix it. Filing the bug is on my to-do list.
Forget it. What I meant is that testing if the performance returns to normal when inlining is forced via an attribute would indicate that the problem is related to clang’s inliner, and that this information could enrich the bug report.
But I just tried those, and there wasn’t any difference since it seems that clang doesn’t support flatten, and although it supports always_inline/hot they seem to have no effect.
But you were right in a sense. Check out this comment on Reddit to see a partial work-around for the clang performance problem. You just have to bump up the inlining threshold (using an unsupported switch, boo!)
I just tried it and it works! Now I only get a 2x slowdown!
After checking out your code I’m pretty convinced that you haven’t built enough layers of abstraction for the inliner to perform this badly by default. You should definitely submit a bug report.
Now I just wonder where the remaining 2x slowdown comes from…
How are these different from Boost’s transform and filter adaptors?
Can you implement the Pythagorean triples example using them?
Boost.Range is missing the bind higher-order function to achieve this.
Right. And for the uninitiated, by
bind
Paul means the function in Haskell that makes lists a Monad. In Boost.Range, the equivalent would be an adaptor that flattens a range of ranges into a range.This was a great post. I’m a big range fan and a D enthusiast, and it is always welcoming to seeing ranges getting expressive implementations in C++ also.
Thanks. And to put this work in its proper historical perspective, ranges in one form or another have been in Boost since roughly forever, so C++ ranges aren’t some Jonny-come-lately, me-too technology.
does anybody knows the complexity of that haskell code for triplets,
i assume it is terrible, unlike imperative solution that would do sqrt on y.
that would do sqrt on z^2-y^2 and then check if the diff is a square of an int by testing floor^2 and ceil^2 of the root
How do the performance of your C++ example compate to the performance of the Haskell and C++++ versions?
Good question. Maybe some Haskell apologist can tell me what I’m doing wrong, but I couldn’t get any Haskell program to do the equivalent to my benchmark and complete in any reasonable amount of time. I’m using this program (compiled with
ghc -O2
):At 1000 triples, it already takes 7.7s. At 3000, it didn’t come back and I killed it.
In comparison, the raw nested for loops above finished in 2.2s and the range comprehension in 2.3s.
I got an answer on Reddit. The following Haskell finishes in 8.5s:
That’s still quite a bit slower than the C++, both the raw nested loops and the fancy range comprehension. Go figure.
So it was
Integer of inf range
vs
Int
problem ?
Nothing to do with the range being infinite. But yeah,
Integer
is the default in Haskell, and it’s unlimited precision. The operations on it are apparently very slow.bad wording, i mean that Integer can represent -inf to +inf (with inf RAM) aka infinite range, not in a sense [1..]
I’m using GHC 7.8.2 with LLVM 3.4svn. I’m also using GCC 4.8.2. Here are two toy programs that compute the same thing in almost the same way:
-- HUSKELL.hs
import Data.List (foldl')
main = print vals
where
vals = foldl' (+) 0 $ map add $ take 3000 triples
add (x, y, z) = x + y + z
triples =
[ (x, y, z) :: (Int, Int, Int)
| z <- [1..], x <- [1..z], y <- [x..z], sq x + sq y == sq z
]
sq = (^ (2 :: Int))
// SEPPLES.cpp
include
constexpr int max_triples = 3000;
int
main(int argc, char **argv) {
int result = 0;
int found = 0;
for (int z = 1;; ++z) {
for (int x = 1; x <= z; ++x) {
for (int y = x; y <= z; ++y) {
if (xx + yy == z*z) {
result += (x + y + z);
++found;
if (found == max_triples) {
goto done;
}
}
}
}
}
done:
printf("%d\n", result);
}
I built and executed these programs as follows:
$ ghc -O2 -fllvm HUSKELL.hs
[1 of 1] Compiling Main ( HUSKELL.hs, HUSKELL.o )
Linking HUSKELL ...
$ g++ -O3 -o SEPPLES -std=c++11 SEPPLES.cpp
$ time ./HUSKELL
10650478
./HUSKELL 3.47s user 0.00s system 99% cpu 3.484 total
$ time ./SEPPLES
10650478
./SEPPLES 3.42s user 0.00s system 99% cpu 3.429 total
Toy benchmarks! \o/
I see people are ignoring me, so Ill just be a di*k and post performance measurements:
http://ideone.com/ikmxwI
2390ms
82ms
( n had to be reduced from 3000 cuz of timeouts 🙂 )
So basically you can freely delete performance chapter of a post because it is like a bad joke… it is like writing your sort is within 1% of the bubble sort, aka nobody who cares about perf is not gonna use bubble sort so your measurements are useless.
So you should probably fix your yield_if code to calculate it properly but then honestly imperative solution looks 10x more readable imho(lambda stuck inside 2 levels of for == :/ ).
Um, the post is about range comprehensions, not about the optimal way to find Pythagorean triples. It’s an example. Jeez.
Um,
making a elegant solution to a problem ignoring its shi**y alg complexity is a Haskell way not a cpp one. 😛
aka dont drink the Haskell kool-aid 😛
Yes Haskell way can be made efficient, but then you loose(in practice mostly useless anyway) nice concise way of declaring pt range.
My point was not that u are ignorant of crappy alg complexity of alg but
a) nobody discussed this
b) it is kind of irritating for me personally to have those kind of problems- where you need to worry about complexity of your declarative syntax sugar cuz it is not as obvious as imperative, but then again it should not happen ofter IRL
also though your solution is nice proof of concept syntax is horrible, for eg if it was “languaged in”
c++ we could:
-get rid of ugly input params to inner functions([=](int y))
-get rid of ugly ints(1,n) and have [1..n]
though again in most cases your ranges will be simple and you wont suffer as much in pt.
but tbh i have real problems with cpp, for example if serialization of “nonC” structures could be made in ISO portable way that would benefit me IRL, while in most cases I dont really care about comprehensions…
also often I wish map had a .keys.begin()/end() and .values.begin()/end() than that I can define list comprehensions
If somebody wants to enlighten me when range comprehensions is so important please do, all I can think of is some nice interview questions for ggl, fb… 🙂
where it reduces line count from 10 to 5 🙂
Great post and I hope to see some more discussion in the ranges SG about how to handle lazy ranges. Personally, I think that core language support like yield return would be preferable, but this library-base solution is a great start (as you undoubtedly remember range-based for was a reaction to your FOREACH macro becoming popular). So, ideally I would like to be able to write the above triples as a function something similar to the following:
auto triples()
{
for (auto z : intsFrom(1))
for (auto x : ints(1, z))
for (auto y : ints(x, z))
if (xx + yy == z*z)
yield return std::make_tuple(x, y, z);
}
actually i prefer
for (auto z : intsFrom(1), auto x : ints(1, z),auto y : ints(x, z))
That’s not a bad suggestion for a language extension. Although I think a similar syntax has been suggested to mean
zip
.maybe zip could be
for (auto z , auto x , auto y: intsFrom(1), ints(1, z), ints(x, z))
or maybe brackets could mean pair/tuple (ofc only in range based for):
for ( (z,&x,y) : intsFrom(1), ints(1, z), ints(x, z)) // x is auto&
but for some reason I feel ppl from ISO love their auto so that wont fly…
and ofc I must rage a bit now:
what is more often used:
nested loops or zip?
but ofc zip is so functional, functional is so cool => lets add zip to core lang, who cares about nested loops 😛
This is one of the things that I love about python. Lets me generate ranges on the fly. I think this would be one of the killer features of ranges: being able to generate them on the fly. Very interesting post, indeed. Can’t wait for a finished proposal.
This comment is to let you know that the Entries RSS feed is broken.
Sigh. Thanks.
The first fix for clang’s performance issue here has been introduced: http://reviews.llvm.org/rL208856
Yeah, I was at C++Now with Chandler, poking him in the back while he worked on this. I’m not sure what the current state is, but last I heard from him, the difficulty clang has optimizing this code is a somewhat intractable problem, and he had given up and turned his attention elsewhere. Which makes me sad.
Pingback: C++ Ranges are Pure Monadic Goodness | Bartosz Milewski's Programming Cafe
Pingback: Study Notes: Functional Programming with C++ | Yongwei's Blog
Eric this maybe a bit offtopic, but will your range library support what Sean Parrent calls projections or something like that…
I mean if I have persons vector
can i do:
persons | filter(Person::get_name())
or
sort(persons, Person::get_name()) // uses < on results of get_name
P.S. Please please put some paypal or btc way to donate…. you may not need the money, but I think that it is polite and moral to give back to people like you that put a lot of hard work to improve the humanity by giving millions of C++ developers a better tools 😀
You probably mean “transform” instead of “filter”, but yes, you can do this:
This isn’t a projection though. It’s what I call an “invokable” — something that can be passed to
std::bind
. And yes, I got the idea from Sean.Yes, I support projections, except that to specify a projection to
sort
, you have to specify the comparison function, so it would look like:This idea also comes from Sean.
When I presented my work to the committee in Urbana-Champaign, we took straw polls regarding whether or not to support invokables and projections in the algorithms. Support was unanimously in favor of invokables, and strongly in favor of projections. I think we’re likely to see them in a future version of the standard library.
Wow. You know, this comment is worth far more to me than the $XY dollars you might donate. THANK YOU. I’ll think of some way to give people a way to express thanks. I’m not sure whether it’ll be money or kudos or what.
Hi Eric, I gave a bit of $ to Boost since they(including you) also do amazing work. If you ever setup a donation account please make a blog post about it so I notice.
I know it is unlikely that you will earn more than 5%-25% of what you could as staff eng at ggl or msft, but I think principle is important. People that do amazing work should be appreciated in more than a “good job Jimmy” way 😀
For personal reasons, I decided against giving users a way to donate money. Money is not how I measure my success. Besides, on the off-chance that it brought in real money (unlikely) I would have to declare it on my taxes, which is more work than it’s worth. 😉
Believe me, I consider myself amply compensated for my work. And I really appreciate the sentiment.
I’m not sure the compiler remark was a fair comment against C#, and as you pointed out, C# has Linq:
public static IEnumerable GetEvens(IEnumerable numbers) => numbers.Where(i => (i % 2) == 0);
or my preference:
public static IEnumerable Evens(this IEnumerable numbers) => numbers.Where(i => (i % 2) == 0);
The topmost example appears to be missing a capture.
And with some c++17 goodness here’s a simplification: https://wandbox.org/permlink/dgD3nSCdgFUeDRYN
Sorry, what example is missing a capture?
The topmost one. It will not compile with
ints
not captured. [I suppose I never got notifications working here]