Out Parameters, Move Semantics, and Stateful Algorithms

At GoingNative back in September, Andrei Alexandrescu posed an interesting question about API design and C++11 that has had me scratching my head for a month. It was about the design of std::getline:

// Read a line from sin and fill in buf. Return sin.
std::istream& getline(std::istream& sin, std::string& buf)
    // ... fill in buf
    return sin;

Seasoned programmers recognize this pattern: The function takes the buffer by non-const reference and fills it in. They also know why the interface is designed this way: Because containers like std::string are too expensive to copy to consider returning one by value. APIs designed like this have traditionally had the benefit of being efficient, at the expense of some awkwardness at the call site:

std::string buf;
std::getline(std::cin, buf);

In C++11, standard containers like std::string are moveable, so returning one by value is darn near free. So, perhaps a better API design would look like this:

// Should getline look like this instead?
std::string getline(std::istream& sin)
    std::string buf;
    // ... fill in buf
    return buf; // This gets moved out efficiently

That allows a more concise, natural usage, and doesn’t force the user to create a named variable:


That’s nice, right? I mean, aside from the obvious shortcoming that now you can’t tell whether getline succeeded or not. Oops. But even overlooking that, there’s an issue here.

Performance, Performance, Performance

You might think that because of move semantics, we don’t have to worry about the lousy performance of returning expensive collections by value, and you’d be right. Sort of. But consider this use of getline:

std::string buf;
while(std::getline(std::cin, buf))

Now consider what this code would be doing if, instead of taking buf as an out parameter, getline created a new string each time and returned it by value. Well, it’s creating a new string each time, duh. But the code above doesn’t do that. After a few times through the loop, buf will probably be big enough to hold whatever lines will be read next, and that space can be reused with no further allocations. Much, much faster.

Back To The Drawing Board

During GoingNative, Andrei left getline there. (It turns out he prefers a different design, and we’ll be arriving at a similar conclusion.) I wanted to continue the discussion. Out parameters are ugly and awkward to use, they hurt API composability, they force you to declare objects and initialize them in separate steps, they cause acne, etc. Surely something could be done!

I studied the problematic code some more:

std::string buf;
while(std::getline(std::cin, buf))

What is this code doing? It’s reading a bunch of lines and processing them one at a time, right? You might even say, it’s returning a range of lines. Then it hit me: std::getline is the wrong API! It should be called getlines (plural), and it should return a range of strings. Take a look:

for(std::string& buf : getlines(std::cin))

This API feels right-er to me. Not only is it easier to use (look ma! one fewer line!), it doesn’t force a two-step initialization of any objects, and ranges and range operations compose. (More on that later.) It also doesn’t suffer from the performance problems of my first attempt, although it takes some work to see why.

Lazy Ranges

What does my getlines function return? Surely it doesn’t fill in a std::vector of string‘s and return that. That would be (a) dumb, (b) expensive, and (c) impossible in practice since a potentially infinite number of lines could be read from an istream. Instead, getlines does something smarter: it returns a lazy range.

A lazy range is something that generates elements on demand. The STL already has such a thing: std::istream_iterator. You can create a range out of istream_iterators that pulls characters — or ints or whatever — from an istream on demand. We need something like that, but for lines.

Unfortunately, we can’t press istream_interator into service for us. Instead, we need to write our own iterator type, and build a valid range out of that. This is a painful and verbose programming exercise, but Boost.Iterator can help. It has some helpers that let you build iterators from a fairly minimal interface. Without further ado, here is the lines_iterator:

struct lines_iterator
  : boost::iterator_facade<
        std::string,            // value type
        std::input_iterator_tag // category
    lines_iterator() : psin_{}, pstr_{}, delim_{} {}
    lines_iterator(std::istream *psin,
                   std::string *pstr,
                   char delim)
        : psin_(psin), pstr_(pstr), delim_(delim)
    friend class boost::iterator_core_access;

    void increment()
        if(!std::getline(*psin_, *pstr_, delim_))
            *this = lines_iterator{};

    bool equal(lines_iterator const & that) const
        return pstr_ == that.pstr_;

    std::string & dereference() const
        return *pstr_;

    std::istream *psin_;
    std::string *pstr_;
    char delim_;

The magic happens when you increment a lines_iterator, which happens in lines_iterator::increment. std::getline is called, and it fills in a buffer referred to by pstr_. Note that it uses the same buffer every time. And when you dereference a lines_iterator, it returns a reference to that buffer. No copying, no unnecessary allocation.

Where does the buffer referred to by pstr_ live? In the lines_range object, which is returned by getlines.

using lines_range_base =

struct lines_range_data {std::string str_;};

struct lines_range
    : private lines_range_data, lines_range_base
    explicit lines_range(std::istream & sin,
                         char delim = 'n')
        : lines_range_base{
              lines_iterator{&sin, &str_, delim},

lines_range getlines(std::istream& sin, char delim = 'n')
    return lines_range{sin, delim};

lines_range is really just a boost::iterator_range of lines_iterators. Some contortion was needed to initialize the str_ member before the iterator_range constructor was called (hence the need for lines_range_data), but that’s just an implementation artifact.

The long and short of it is this: when you call getlines, you get back a lines_range object, which is basically a free operation. Now you can call .begin() and .end() on it, or directly iterate over it using a range-based for loop, like I showed. No more memory allocations are done using this interface than with the original std::getline API. Nice, eh?

Composability of Ranges and Range Algorithms

There’s lots of reasons to prefer the range-based getlines API — and range-based interfaces in general. The most immediate benefit is that people can use range-based for loops, as I showed above. But the real power comes once you start using range algorithms and range adaptors. Both Boost and Adobe’s ASL provide powerful utilities for working with ranges, and the C++ Standardization Committee has a working group dedicated to ranges for some future version of the standard. And for good reason! Range operations compose, so for instance you could do something like this:

// Read some lines, select the ones that satisfy
// some predicate, transform them in some way and
// echo them back out
        | boost::adaptors::filtered(some_pred)
        | boost::adaptors::transformed(some_func),
    std::ostream_iterator<std::string>(std::cout, "n"));

That’s strong stuff. I shudder to think what the equivalent code would look like with straight iterators and STL algorithms.

But what if you just want to read a single line? Doesn’t the new getlines hurt you for this simple usage scenario? Nope! All we need is one perfectly general function that returns the first element of a range. Let’s call it front:

using std::begin;

// return the front of any range    
template<typename Range>
auto front(Range && rng)
    -> decltype(boost::make_optional(*begin(rng)))
    for(auto x : rng)
        return x;
    return boost::none;

Since a range might be empty, we need to return an optional. Now you can read a single line from an istream like this:

if(auto s = front(getlines(std::cin)))

Compare this to the original and I think you’ll see it’s no worse:

std::string str;
if(std::getline(std::cin, str))

Stateful Algorithms

So have we completely addressed all of Andrei’s concerns with getline? Yes and no. Certainly we’ve fixed getline, but Andrei’s point was bigger. He was showing that you can’t just blindly pass and return by value, hoping that move semantics will magically make your programs faster. And that’s a valid point. I can’t say anything that changes that fact.

I think getline is a curious example because what looks at first blush like a pure out parameter is, in fact, an in/out parameter; on the way in, getline uses the passed-in buffer’s capacity to make it more efficient. This puts getline into a large class of algorithms that work better when they have a chance to cache or precompute something. And I can say something about that.

If your algorithm needs a cache or a precomputed data structure, then your algorithms are inherently stateful. One option is to pass the state in every time, as getline does. A better option is to encapsulate the state in some object that implements the algorithm. In our case, the state was the buffer and the object was the range. To take another case, Boyer-Moore search is faster than strstr because it precomputes stuff. In the Boost implementation, boyer_moore is a stateful function object that keeps its precomputed part private.


Here are the key take-aways:

  • If your algorithm runs faster with a cache or a precomputed data structure, encapsulate the state in an object that implements the algorithm, rather than forcing your users to pass the state in.
  • API design must be guided by the expected usage scenarios of the API, and also the common idioms of modern C++11.
  • Ranges are a powerful abstraction because operations on them compose.
  • Boost.Iterator and Boost.Range greatly simplify the job of implementing custom ranges.

Thanks for reading!


33 Replies to “Out Parameters, Move Semantics, and Stateful Algorithms”

  1. Having toyed with ranges (due to being unsatisfied with Boost.Range), I have to say that so far I have favoured non-reference element types for those ranges that ‘produce’ values. Conversely I use ranges over references when any reference ‘element’ can really outlive the current iteration: if something grabs a hold of an element, and the range is exhausted, the expectation is that the reference is still valid and meaningful. (Although I have to say I use that expectation for the user code, and I don’t need it in the range code proper.)

    But then to implement, say, filtered ranges I need to re-implement caching: between having fetched a value from the parent range, and before handing it off to the predicate to test it, I have to store it somewhere. If more composite ranges do that caching, and are used with one another, it stacks up. (Unless it doesn’t, if the compiler can see through. Oh well.)

    So it’s not just a matter of a trade-off between caching for efficiency, and passing values for ease of correctness (in user code). There’s also a concern of implementation.

    (And now my mind is full of ideas about a ‘peek’ operation for ranges, and a peek_range that transforms a non-peekable range into a peekable range. Thanks!)

    • I’m curious about the trouble you’ve had with Boost.Range. Please, please share, either here, or on the Boost developers list, or even on the mailing list for the Ranges standardization working group. Your experience is valuable.

      Regarding returning references from an input iterators’s operator*, I don’t see the problem. Can you clarify? The lines_iterator should probably return a std::string const&, but either way, the string lives in the lines_range so lifetime isn’t an issue. Returning by-value will reintroduce all the performance problems Andrei was complaining about.

      The caching of filter iterators, and fat iterators in general, are a problem. Have you seen N3752? It suggests something along the lines of cursors and property maps to implement ranges. That might be a way forward.

      • I didn’t intend to condemn Boost.Range with my comment, my apologies. My dissatisfaction stems from one of its core design, which is to interplay nicely with iterators. In my opinion that makes it really, really hard to write new ranges (and perhaps relatedly, Boost.Range comes with few ranges of its own — even something like zipping is still sitting in trac I think?).

        OTOH I understand completely the need to play nicely with all the preexisting iterator-based code (e.g. all of comes for free).

        My problem with returning a reference to a cached element is that consider writing a function that returns a tuple of the three elements of a range (we’ll assume that there are at least three elements as a precondition). You could decay everything and copy it into the tuple, but that kind of defensive copying is starting to grate me. I’d rather hold whichever element the range provides — then a client can choose to transform with a decaying, copying functor if they really want to get a copy.

        Or perhaps it’s a composite range that gives a sliding window of three consecutive elements, in a tuple (so you’d get an element, and both neighbours — useful for some algorithms I would say). But then if you start using that range amongst other composite ranges, it might not be obvious if or where an ‘untimely’ reference is passed.

        In the end I was interested to see what Andrei-style ranges would look like in C++11, and so I’m playing with that.

        • it might not be obvious if or where an ‘untimely’ reference is passed

          Ah, but it’s very easy to tell. If the range is an input range, you have to cache the values as they are produced. The sliding window adaptor you describe would be responsible for caching the values if it is adapting an input range. For forward ranges or better, it doesn’t have to. This is one of the reasons why the STL has a distinction between input iterators and forward iterators.

          • Yes, but also no. Because that discounts input ranges that may very well have a reference element type, which elements are in fact timely.

            In addition, I separated the traversal concepts (forward – bidirectional – random-access) from that of ‘repeatability’, whether a range is saveable (in D-speak) or not. Admittedly this is more out of curiosity than out of a genuine need for functionality, but if there’s a generic low-hanging fruit I’d like to reap it. More special cases is not what I want.

          • You’re not crazy for wanting to do this. The Boost.Iterators library is refined enough so that you can specify traversal and element access separately. Check out the docs for its new-style iterators. Maybe you’re looking for something like that?

  2. “At this point, Andrei gave up. The design of std::getline is satisfactory, don’t mess with it.”

    That’s quite unfair, if you read one of his comment in the reddit thread about the talk, he actually said that he prefer a range-based API and it’s what they use at facebook for getline !

    “That’s a range, so you’ll have an easy time convincing me that it’s better than both alternatives :o). My example, however, was focused on ref vs. value, as opposed to larger API issues.
    Incidentally we have a class at Facebook called exactly LineReader, and with some ancillary stuff it allows stuff like:
    for (auto& line : byLine(stream)) { … }
    where stream is an iostream, a FILE* etc. That idiom has the best efficiency*convenience measure in our codebase.”

    (see http://www.reddit.com/r/programming/comments/1m1izv/goingnative_2013_writing_quick_code_in_c_quickly/cc4zrb1)

    • Ah! I didn’t see the reddit thread, thanks. During GoingNative, Andrei didn’t mention ranges as a possible fix, but he’s a stout range proponent so I’m not surprised. I’ll add this discussion to the blog. Thanks.

  3. Nice article. I’ve similar API in my experimental project.

    However, I’ve this doubt:

    lines_iterator() = default;

    Does it ensure the members are value-initialized or default-initialized? If they’re default-initialized, then the pointer members (and other built-in types) will remain uninitialized and the equal() function will invoke undefined behavior when reading their value.

    • This is a very good question. A quick review of the standard wasn’t enlightening. I’ll get back to you when I know the answer.


      OK, you’re right. The relevant part of the latest draft (N3291) is 12.1/6:

      The implicitly-defined default constructor performs the set of initializations of the class that would be performed by a user-written default constructor for that class with no ctor-initializer (12.6.2) and an empty compound-statement.

      A ctor-initializer is the “: mem1(), mem2()” stuff that initializes members in a constructor. That is, by declaring lines_iterator() = default, I’m defining lines_iterator() {}, which leaves the pointers uninitialized! This is most definitely a bug, and I’ll fix it. Thanks for reporting it.

      • I use this technique, so I was surprised that you abandoned it in your code. Unfortunately, many things in C++ need you to reference several disjoint parts of the Standard. This is one of them. (Look at Section 8.5, Initializers.)

        If a class has a default constructor that is deleted, otherwise canceled, ambiguous, or inaccessible; both default- and value-initialization return an error.
        If a class uses a code-ful default constructor you wrote, both default- and value-initialization will use it.
        If a class uses a (mostly) code-less default constructor, default-initialization will use the random garbage bits already in the object’s region of storage. However, value-initialization will do zero-initialization first!

        Basically the answer to “Does ‘MyClass() = default‘ do default- or value-initialization?” is: both! Whichever initialization you do to the top-level object is what happens to its sub-objects. I use this in my complex-number class. This is an advantage over std::complex since I’m not forced to run the code to zero-fy my complex numbers before running them over with an external initialization function.

        Look at:

        MyClass a, b{}, c();

        Object a is a default-initialized object, b is value-initialized, and c is a function declaration. (You can only use the () for value-initialization when it wouldn’t trigger the “most vexing parse.”) The only way I see that you would gave up on “= default” for the default constructor is if you created your default-value objects like “a.” Well, don’t, you may get random garbage bits. Always use “b” when you need a zero-fied default-value object. (And you can use “b” in an r-value way.) This option is a big deal after C++98/03 not having it.

        So, go back to “lines_iterator() = default,” but make sure any use of that constructor is “line_iterator{}” or “line_iterator x{}.” Never use “line_iterator y” where y will be read without any initialization in between. It may work with code-ful class types, but for any other type.

    • Very cool. I was going to say a few words about how complicated it is to define iterators in C++, even with the help of Boost.Iterators, and how C#’s yield keyword makes it so trivial. Your solution with coroutines looks a lot like how the C# code would look. Nice.

      Sorry about the code comment trouble. Lemme try:

      #include <iostream>
      int main() {
          std::cout << "hello world" << std::endl;

      If the preview is to be believed, that works. The trick is to indent your code 4 spaces.

  4. Small suggestion: I would implement front as a range of zero or one element and call it in a for loop rather than inside an if.

    I’ve been just writing about infinite lazy lists in Haskell and comparing them to C++. As you mentioned, writing input iterators or ranges in C++ is hard. Your example was simple enough, but try to convert this Haskell program that generates Pythagorean triples to a lazy iterator.

    triples =
    [(x, y, z) | z <- [1..]
    , x <- [1..z]
    , y <- [x..z]
    , x * x + y * y == z * z]

    main = print $ take 4 triples

    You’ll have to use a state machine with accompanying inversion of control. I hope C++ adopts yield and simplifies it. I suspect that the general solutions would involve coalgebras and anamorphisms 😉

    • Re: front, that’s an interesting suggestion, but I’m not sure how I feel about it. It feels a little too cute to use a range for this when optional expresses the intent perfectly.

      Pythagorean triples sounds like an interesting little programming challenge. Maybe some ambitious soul reading this will show an elegant range-based solution for C++. That said, the yield keyword would be a nice addition. In lieu of that, there’s a coroutines library in Boost that would greatly simplify this problem.

      • Some languages or libraries regard optional values as a 0-to-1 range, or introduce a 0-to-1 range that’s as good as an optional value.

        That Pythagorean triples are regarded as a challenge is what I find embarrassing about the state of the art regarding ranges in C++. I wish it wasn’t one.

      • Think of front as take 1, where take n returns the list of the first n elements. The problem with optional is that you might accidentally dereference it without checking (I presume, it throws an exception if it’s empty). If front returned a range then you’d be pretty much forced to use a loop, which is safe. Maybe is safe in Haskell because it (pretty much) forces you to pattern match — it’s harder to miss Nothing.

        • Rather than using optional on front I would rather use an empty() function. Right off I’d prefer that be part of the range concept itself, but it could be done in this same style with a generic free function:

          template &lt;typename Range&gt;
          bool empty(Range&amp;&amp; rng)
             return begin(rng) == end(rng);

          Now we can test explicitly:

          auto lines = getlines(std::cin);
          if (!empty(lines))

          Where front() no longer returns an optional but maps to the same function as std::vector or std::list.

          This doesn’t address your safety concern but I think being able to directly test for an empty set should be part of the range abstraction in general.

          • Your suggestion feels very much in the spirit of C++. And I have no doubt this is what the standardization committee would want. It’s really a very interesting question. I like optional because I don’t have to save the result of getlines anywhere before I use it. But in some sense, optional is messy, and there will certainly be some applications where you know the range is non-empty and you want to be able to call front without the check. Then again, Bartosz’ solution holds some appeal for me. But then again, even Haskell’s head function has undefined behavior if called on an empty list. What to do?

          • @Eric’s reply, regarding “what to do”:

            Usually I think you would be using an algorithm that acts on a Range (for_each, map, reduce, etc) and not need to handle the empty case directly..

            When you do need to, perhaps storing in local scope isn’t the worst price to pay. You could also trivially write an optional_front() helper on top of empty() and front().

            Or, perhaps use a helper like “if_not_empty”:

                if_not_empty(getlines(std::cin), [](auto lines) {
        • Regarding front() returning an optional.. std::vector front() is undefined in the vector is empty. Why return an optional which you then have to check? Why not just check the original range if it is empty in the first place? So, I suggest front() be undefined in the Range is empty, which I think is also likely an important criteria if on-par performance is considered importnat.

  5. Pingback: Eric Niebler: out parameters, move semantics, and stateful algorithms | The other branch

  6. I always knew® that this was the right way to do it. Even in C++98. But after implementing the “iterator_facade” part I always got stuck with the ‘.begin()/.end()’ part. This closes the circle with the iterator_range class. Thank you Eric. I have two small questions, 1) How is it that .begin()/.end() is automatically handled by iterator_range? 2) What would you do for lazy sequence that even in principle has no end? or for “converging” lazy sequences whose end can be defined in practice by a convergence criterion (I know this is a mathematical matter, but how (or even if) would you do it if the convergence criterion is well defined.

    • 1) How is it that .begin()/.end() is automatically handled by `iterator_range`?

      Just look at the docs for iterator_range. There you can see public .begin() and .end() member functions, that lines_range acquires through inheritance. Stylistically, it’s a bit sloppy to publicly inherit implementation like that, but it’s quick and easy.

      2) What would you do for lazy sequence that even in principle has no end?

      lines_range is a lazy sequence that even in principle has no end. You can keep sending lines to lines_range till the cows come home, and the for loop will keep looping blissfully forever.

      … or for “converging” lazy sequences whose end can be defined in practice by a convergence criterion

      I think lines_range is also a sequence like this, the convergence criterion being: when std::getline returns an istream in a bad state. You can define an InputIterator like lines_iterator that checks any ol’ predicate, mathematical or not, and moves itself into the end iterator’s state, signifying the end of the sequence.

      Hope that helps.

  7. I was thinking about this further, and came up with this simple solution:


    I suspect its relying on undefined behaviour, but all tested compilers are producing such odd (and correct) output, that I’m almost tempted to believe there may be a rule I’m unaware of that makes it correct.

    Can anyone take a look and let me know if returning a reference to a temporary created using function default arguments yields dangling reference or if the lifetime is extended? The output suggests it is extended (take a look at when destructors are called, especially in the “piped” test case)

    • Interesting thought, but it’s a dangerous interface. Here’s why. Given a function like:

      template<typename X = std::string>
      X&& foo(X&& x = X{}) { return std::forward<X>(x); }

      you are allowed to do this:

      std::cout << foo() << std::endl;

      It’s exactly equivalent to:

      std::cout << foo(std::string{}) << std::endl;

      The temporary string is constructed at the call site and has a lifetime of the full expression, so it survives long enough to be streamed out.

      But here’s the problem:

      std::string const & s = foo();
      std::cout << s << std::endl;

      The trouble is, the hidden temporary object constructed on the first line is not directly bound to the const reference, so it doesn’t have its lifetime extended. As a result, the object s refers to dies after the first statement, and so the second statement is undefined behavior.


  8. Like almost every other C++ developer I encountered the problem line mode iterator missing from language. I really like your solution, someone (Stepanov?) should have made it standard before.

    There are some adjustments I consider for version I will use.

    Some of your members are externally owned dumb pointers which is not something you normally want. I’d rather wrap iostream in container.

    struct line_stream_container {
    std::istream& stream;
    std::string current;
    //TODO: constructors and swap

    And inside lines_iterator

    class lines_iterator {
    //Your code ommited
    line_stream_container& container;
    linea_iterator(line_stream_container& c) container(c) {
    const std::string& dereference () const noexcept {
    return container.current;
    //More code ommited

    This clarifies that iterator is valid as long as underlying container is valid, there is still a problem of two iterators using the same string, referenced string so we must somehow make it clear that this iterator is input iterator.

    P.S: I don’t like the term “input iterator”. It is not a real iterator. I beleive better technical term for “input iterator” is “sequencer”.

    • Some of your members are externally owned dumb pointers which is not something you normally want.

      I disagree that there is anything wrong with using a raw pointer as I did. I read std::string * as: this is a pointer to a string defined and owned elsewhere. Using a reference as you do changes the semantics. First of all, your iterator no longer gets a compiler-generated assignment operator, and defining it yourself becomes problematic because you can’t rebind a reference. When using a pointer like I do, the compiler-generated copy and assign operators Just Work and do the right thing.

      And check out my next post which shows a better way to define this range and its iterator type.

  9. P.S: for my previous post. “Sequencer” is a bad name, “Generator” is much better. Such input iterator is semantically closer to generators in than real iterators over containers.

  10. Another method to create this sort of functionality relies upon imbuing the associated istream with a modified locale (and just using the associate std::istream_iterator<std::string>). Although modifying the locale changes the associated behaviour of the input stream (this can be reset easily).

    So, the object below sets up the std::locale::cctype:

    struct line_reader: std::ctype<char> {
    line_reader(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
    static std::vector<std::ctype_base::mask>
    rc(table_size, std::ctype_base::mask());

    rc['n'] = std::ctype_base::space;
    return &rc[0];

    If you are using std::cin, then to imbue it with the modified locale:
    std::cin.imbue(std::locale(std::locale(), new line_reader()));

    Finally, the line reader iterator is:
    and the terminating iterator is:

  11. Pingback: Meeting C++ 2013 « Домик Миа

Leave a Reply

Your email address will not be published. Required fields are marked *


This site uses Akismet to reduce spam. Learn how your comment data is processed.