Structured Concurrency

Posted on November 8, 2020 by Eric Niebler — 28 Comments ↓

TL;DR: “Structured concurrency” refers to a way to structure async computations so that child operations are guaranteed to complete before their parents, just the way a function is guaranteed to complete before its caller. This sounds simple and boring, but in C++ it’s anything but. Structured concurrency — most notably, C++20 coroutines — has profound implications for the correctness and the simplicity of async architecture. It brings the Modern C++ style to our async programs by making async lifetimes correspond to ordinary C++ lexical scopes, eliminating the need for reference counting to manage object lifetime.

Structured Programming and C++

Back in the 1950’s, the nascent computing industry discovered structured programming: that high-level programming languages with lexical scopes, control structures, and subroutines resulted in programs that were far easier to read, write, and maintain than programming at the assembly level with test-and-jump instructions and goto. The advance was such a quantum leap that nobody talks about structured programming anymore; it’s just “programming”.

C++, more so than any other language, leverages structured programming to the hilt. The semantics of object lifetime mirror — and are tied to — the strict nesting of scopes; i.e., the structure of your code. Function activations nest, scopes nest, and object lifetimes nest. Objects’ lifetimes end with a scope’s closing curly brace, and objects are destroyed in the reverse order of their construction to preserve the strict nesting.

The Modern C++ programming style is built on this structured foundation. Objects have value semantics — they behave like the ints — and resources are cleaned up in destructors deterministically, which guarantees structurally that resources aren’t used after their lifetimes have ended. This is very important.

When we abandon this strict nesting of scopes and lifetimes — say, when we reference count an object on the heap, or when we use the singleton pattern — we are fighting against the strengths of the language rather than working with them.

The Trouble With Threads

Writing correct programs in the presence of concurrency is far more difficult than in single-threaded code. There are lots of reasons for this. One reason is that threads, like singletons and dynamically allocated objects, scoff at your puny nested scopes. Although you can use the Modern C++ style within a thread, when logic and lifetimes are scattered across threads, the hierarchical structure of your program is lost. The tools we use to manage complexity in single-threaded code — in particular, nested lifetimes tied to nested scopes — simply don’t translate to async code.

To see what I mean, let’s look at what happens when we take a simple synchronous function and make it asynchronous.

void computeResult(State & s);

int doThing() {
  State s;
  computeResult(s);
  return s.result;
}

doThing() is simple enough. It declares some local state, calls a helper, then returns some result. Now imagine that we want to make both functions async, maybe because they take too long. No problem, let’s use Boost futures, which support continuation chaining:

boost::future<void> computeResult(State & s);

boost::future<int> doThing() {
  State s;
  auto fut = computeResult(s);
  return fut.then(
    [&](auto&&) { return s.result; }); // OOPS
}

If you’ve programmed with futures before, you’re probably screaming, “Nooooo!” The .then() on the last line queues up some work to run after computeResult() completes. doThing() then returns the resulting future. The trouble is, when doThing() returns, the lifetime of the State object ends, and the continuation is still referencing it. That is now a dangling reference, and will likely cause a crash.

What has gone wrong? Futures let us compute with results that aren’t available yet, and the Boost flavor lets us chain continuations. But the continuation is a separate function with a separate scope. We often need to share data across those separate scopes. No more tidy nested scopes, no more nested lifetimes. We have to manage the lifetime of the state manually, something like this:

boost::future<void>
computeResult(shared_ptr<State> s); // addref
                                    // the state

boost::future<int> doThing() {
  auto s = std::make_shared<State>();
  auto fut = computeResult(s);
  return fut.then(
    [s](auto&&) { return s.result; }); // addref
                                       // the state
}

Since both async operations refer to the state, they both need to share responsibility to keep it alive.

Another way to think about this is: what is the lifetime of this asynchronous computation? It starts when doThing() is called, but it doesn’t end until the continuation — the lambda passed to future.then() — returns. There is no lexical scope that corresponds to that lifetime. And that is the source of our woes.

Unstructured Concurrency

The story gets more complicated yet when we consider executors. Executors are handles to executions contexts that let you schedule work onto, say, a thread or thread pool. Many codebases have some notion of an executor, and some let you schedule things with a delay or with some other policy. This lets us do cool things, like move a computation from an IO thread pool to a CPU thread pool, or retry an async operation with a delay. Handy, but like goto it is a very low-level control structure that tends to obfuscate rather than clarify.

For instance, I recently came across an algorithm that uses executors and callbacks (called Listeners here) that retries the async allocation of some resource. Below is a greatly abridged version. It is described after the break.

// This is a continuation that gets invoked when
// the async operation completes:
struct Manager::Listener : ListenerInterface {
  shared_ptr<Manager> manager_;
  executor executor_;
  size_t retriesCount_;

  void onSucceeded() override {
    /* ...yay, allocation succeeded... */
  }
  void onFailed() override {
    // When the allocation fails, post a retry
    // to the executor with a delay
    auto alloc = [manager = manager_]() {
      manager->allocate();
    };
    // Run "alloc" at some point in the future:
    executor_.execute_after(
      alloc, 10ms * (1 << retriesCount_));
  }
};

// Try asynchronously allocating some resource
// with the above class as a continuation
void Manager::allocate() {
  // Have we already tried too many times?
  if (retriesCount_ > kMaxRetries) {
    /* ...notify any observers that we failed */
    return;
  }

  // Try once more:
  ++retriesCount_;
  allocator_.doAllocate(
    make_shared<Listener>(
      shared_from_this(),
      executor_,
      retriesCount_));
}

The allocate() member function first checks to see if the operation has already been retried too many times. If not it calls a helper doAllocate() function, passing in a callback to be notified on either success or failure. On failure, the handler posts deferred work to the executor, which will call allocate() back, thus retrying the allocation with a delay.

This is a heavily stateful and rather circuitous async algorithm. The logic spans many functions and several objects, and the control and data flow is not obvious. Note the intricate ref-counting dance necessary to keep the objects alive. Posting the work to an executor makes it even harder. Executors in this code have no notion of continuations, so errors that happen during task execution have nowhere to go. The allocate() function can’t signal an error by throwing an exception if it wants any part of the program to be able to recover from the error. Error handling must be done manually and out-of-band. Ditto if we wanted to support cancellation.

This is unstructured concurrency: we queue up async operations in an ad hoc fashion; we chain dependent work, use continuations or “strand” executors to enforce sequential consistency; and we use strong and weak reference counts to keep data alive until we are certain it’s no longer needed. There is no formal notion of task A being a child of task B, no way to enforce that child tasks complete before their parents, and no one place in the code that we can point to and say, “Here is the algorithm.”

If you don’t mind the analogy, the hops through the executor are a bit like goto statements that are non-local in both time and space: “Jump to this point in the program, X milliseconds from now, on this particular thread.”

That non-local discontinuity makes it hard to reason about correctness and efficiency. Scale unstructured concurrency up to whole programs handling lots of concurrent real-time events, and the incidental complexity of manually handling out-of-band asynchronous control and data flow, controlling concurrent access to shared state, and managing object lifetime becomes overwhelming.

Structured Concurrency

Recall that in the early days of computing, unstructured programming styles rapidly gave way to structured styles. With the addition of coroutines to C++, we are seeing a similar phase shift happening today to our asynchronous code. If we were to rewrite the above retry algorithm in terms of coroutines (using Lewis Baker’s popular cppcoro library), it might look something like this:

// Try asynchronously allocating some resource
// with retry:
cppcoro::task<> Manager::allocate() {
  // Retry the allocation up to kMaxRetries
  // times:
  for (int retriesCount = 1;
       retriesCount <= kMaxRetries;
       ++retriesCount) {
    try {
      co_await allocator_.doAllocate();
      co_return; // success!
    } catch (...) {}

    // Oops, it failed. Yield the thread for a
    // bit and then retry:
    co_await scheduler_.schedule_after(
      10ms * (1 << retriesCount));
  }

  // Error, too many retries
  throw std::runtime_error(
    "Resource allocation retry count exceeded.");
}

Aside: This replaces the executor_ with a scheduler_ that implements cppcoro’s DelayedScheduler concept.

Let’s list the ways in which this is an improvement:

It’s all in one function! Good locality.
The state (like retriesCount) can be maintained in local variables instead of as members of objects that need to be ref-counted.
We can use ordinary C++ error handling techniques.
We are guaranteed structurally that the async call to allocator_.doAllocate() completes before this function continues executing.

Point (4) has profound implications. Consider the trivial example from the beginning of the article. The following re-implementation in terms of coroutines is perfectly safe:

cppcoro::task<> computeResult(State & s);

cppcoro::task<int> doThing() {
  State s;
  co_await computeResult(s);
  co_return s.result;
}

The above code is safe because we know that computeResult completes before doThing is resumed and thus before s is destructed.

With structured concurrency, it is perfectly safe to pass local variables by reference to child tasks that are immediately awaited.

Cancellation

Taking a structured approach to concurrency, where the lifetime of concurrent operations is strictly nested within the lifetime of resources that it uses and is tied to program scopes, allows us to avoid needing to use garbage collection techniques like shared_ptr to manage lifetime. This can lead to code that is more efficient, requiring fewer heap-allocations and fewer atomic reference-counting operations, as well as code that is easier to reason about and is less bug-prone. However, one implication of this approach is that it means that we must always join and wait for child operations before the parent operation can complete. We can no longer just detach from those child operations and let the resources get cleaned up automatically when their ref-counts drop to zero. To avoid having to wait unnecessarily long times for child operations whose results are no longer needed, we need a mechanism to be able to cancel those child operations so that they complete quickly. Thus the structured concurrency model requires deep support for cancellation to avoid introducing unnecessary latency.

Note that we rely on structured lifetime and structured concurrency every time we pass a local variable to a child coroutine by reference. We must ensure that the child coroutine has completed and is no longer using that object before the parent coroutine exits the scope of that local variable and destroys it.

Structured Concurrency > Coroutines

When I talk about “structured concurrency,” I am not just talking about coroutines — although that is its most obvious manifestation. To see what I mean, let’s talk briefly about what coroutines are and what they are not. In particular, there is nothing inherently concurrent about C++ coroutines at all! They are really just a way to get the compiler to carve your function up into callbacks for you.

Consider the simple coroutine above:

cppcoro::task<> computeResult(State & s);

cppcoro::task<int> doThing() {
  State s;
  co_await computeResult(s);
  co_return s.result;
}

What does co_await here mean? The trite answer is: it means whatever the author of cppcoro::task<> wants it to mean (within certain bounds). The fuller answer is that co_await suspends the current coroutine, bundles up the rest of the coroutine (here, the statement co_return s.result;) as a continuation, and passes it to the awaitable object (here, the task<> returned by computeResult(s)). That awaitable will typically store it somewhere so it can be invoked later, when the child task completes. That’s what cppcoro::task<> does, for instance.

In other words, the task<> type and the coroutines language feature conspire together to layer “structured concurrency” on top of boring ol’ callbacks. That’s it. That’s the magic. It’s all just callbacks, but callbacks in a very particular pattern, and it is that pattern that makes this “structured.” The pattern ensures that child operations complete before parents, and that property is what brings the benefits.

Once we recognize that structured concurrency is really just callbacks in a particular pattern, we realize that we can achieve structured concurrency without coroutines. Programming with callbacks is nothing new, of course, and the patterns can be codified into a library and made reusable. That’s what libunifex does. If you follow C++ standardization, it is also what the sender/receiver abstraction from the Executors proposal does.

Using libunifex as a basis for structured concurrency, we can write the example above as follows:

unifex::any_sender_of<> computeResult(State & s);

auto doThing() {
  return unifex::let_with(
    // Declare a "local variable" of type State:
    [] { return State{}; },
    // Use the local to construct an async task:
    [](State & s) {
      return unifex::transform(
        computeResult(s),
        [&] { return s.result; });
    });
}

Why would anybody write that when we have coroutines? You would certainly need a good reason, but I can think of a few. With coroutines, you have an allocation when a coroutine is first called, and an indirect function call each time it is resumed. The compiler can sometimes eliminate that overhead, but sometimes not. By using callbacks directly — but in a structured concurrency pattern — we can get many of the benefits of coroutines without the tradeoffs.

That style of programming makes a different tradeoff, however: it is far harder to write and read than the equivalent coroutine. I think that >90% of all async code in the future should be coroutines simply for maintainability. For hot code, selectively replace coroutines with the lower-level equivalent, and let the benchmarks be your guide.

Concurrency

I mention above that coroutines aren’t inherently concurrent; they’re just a way of writing callbacks. Coroutines are inherently sequential in nature and the laziness of task<> types — where a coroutine starts suspended and doesn’t start executing until it is awaited — means that we can’t use it to introduce concurrency in the program. Existing future-based code often assumes that the operation has already started eagerly, introducing ad hoc concurrency that you need to be careful to prune back. That forces you to re-implement concurrency patterns over and over in an ad hoc fashion.

With structured concurrency, we codify concurrency patterns into reusable algorithms to introduce concurrency in a structured way. For instance, if we have a bunch of tasks and would like to wait until they have all completed and return their results in a tuple, we pass them all to the cppcoro::when_all and co_await the result. (Libunifex also has a when_all algorithm.)

At present, neither cppcoro nor libunifex has a when_any algorithm, so you can’t launch a bunch of concurrent operations and return when the first one completes. It’s a very important and interesting foundational algorithm, though. To maintain the guarantees of structured concurrency, when the first child task completes, when_any should request cancellation on all the other tasks and then wait for them all to finish. The utility of this algorithm depends upon all async operations in your program responding promptly to cancellation requests, which demonstrates just how important deep support for cancellation is in modern async programs.

Migration

So far, I’ve discussed what structured concurrency is and why it matters. I haven’t discussed how we get there. If you are already using coroutines to write async C++, then congrats. You may keep enjoying the benefits of structured concurrency, perhaps with a deeper understanding and appreciation for why coroutines are so transformative.

For codebases that lack structured concurrency, deep support for cancellation, or maybe even an abstraction for asynchrony, the job is hard. It may even start with introducing complexity in order to carve out an island in which the surrounding code provides the guarantees that structured concurrency patterns require. This includes, for instance, creating the impression of prompt cancellation of scheduled work, even when the underlying execution contexts don’t offer that directly. That added complexity can be isolated in a layer, and the islands of structured concurrency can be built on top. Then the simplifying work can begin, taking future- or callback-style code and converting them to coroutines, teasing out parent/child relationships, ownership, and lifetime.

Summary

Adding co_await makes a synchronous function asynchronous, without disturbing the structure of the computation. The async operation being awaited necessarily completes before the calling function does, just like ordinary function calls. The revolution is: nothing changes. Scopes and lifetimes still nest as they always have, except now the scopes are discontinuous in time. With raw callbacks and futures, that structure is lost.

Coroutines, and structured concurrency more generally, bring the advantages of the Modern C++ style — value semantics, algorithm-driven design, clear ownership semantics with deterministic finalization — into our async programming. It does that because it ties async lifetimes back to ordinary C++ lexical scopes. Coroutines carve our async functions up into callbacks at suspension points, callbacks that get called in a very specific pattern to maintain that strict nesting of scopes, lifetimes, and function activations.

We sprinkle co_await in our code and we get to continue using all our familiar idioms: exceptions for error handling, state in local variables, destructors for releasing resources, arguments passed by value or by reference, and all the other hallmarks of good, safe, and idiomatic Modern C++.

Thanks for reading.

If you want to hear more about structured concurrency in C++, be sure to check out Lewis Baker’s CppCon talk from 2019 about it.

"\e"

28 Replies to “Structured Concurrency”

encyclopedist on November 9, 2020 at 2:49 am said:

Hi, Eric!
Currently, your website is super slow to load. Loading this page currently takes ~20 s for me.
One of the reasons may be that the banner image is 5,759px × 2,390px in size and weighs 558 kB.
Also it looks like it loads some stuff in blocking way (ironic for an article for concurrency, isn’t it?)
You can get more info using Google PageSpeed

Reply ↓
- Eric Niebler on November 9, 2020 at 7:09 am said:
  
  Thanks for letting me know. I’ll see what I can do.
  
  Reply ↓
Guy Davidson on November 9, 2020 at 11:32 am said:

This is an EXCELLENT post Eric. Many thanks for clarifying some of the knottier concepts still bewildering me about this material.

Reply ↓
Austin McCartney on November 9, 2020 at 8:03 pm said:

I recently took some time to dig into the unifex implementation and learned a lot.

My mental model is that there is a distinction between base senders (e.g. just) and sender “decorators” (e.g. transform_sender). On connect, the decoration layers are pealed off the sender in LIFO order, applying a corresponding “decoration” to the incoming receiver (e.g. transform_receiver).

Unless I overlooked something, the same sorts of decorator factory functions (transform) weren’t available for receivers; I wasn’t sure why. I can imagine cases where it might be convenient to call

return transform(fn) | receiver;

or even compose decorations ad-hoc

return transform(f) | let(g);

The implementation of the let algorithm is brilliant! I went in expecting the operation state to manage the lifetime of the arguments, but the dance between the predecessor and successor nested operation states was really something to see! This is where the use of the manual lifetime classes finally clicked for me.

Forking flows are still a challenge. Lewis was totally right that let could meet the need (so long as the control flow merged before the end of the let callable’s context anyway), but it has the unwanted side effect keeping data alive longer than it might otherwise need to be. I’m not sure there’s a way around that so long a connect is defined between a single sender and a single receiver only.

Are there plans for a (recursive?) sum type over senders? Currently, it doesn’t seem like certain types of recursive divide an conquer algorithms (e.g. parallel quicksort) are inexpressible. Or maybe I’ve overlooked something.

Is there a possibility to allow the set_value CPO to return a sender-specified type to indicate failure (like an std::error_condition or something)? It looks like there are places where a nested receiver needs to communicate failure to its enclosing decorator.

https://github.com/facebookexperimental/libunifex/blob/master/include/unifex/let.hpp#L64

Currently, this is handled by throwing an exception. When errors are expected and/or common, that seems quite expensive. When exceptions are disabled, it terminates (which isn’t very nice). Using the return value as to signal error seems like a reasonable alternative for some use cases.

Reply ↓
- Austin McCartney on November 9, 2020 at 8:05 pm said:
  
  Whoops
  
  Currently, it doesn’t seem like certain types of recursive divide an conquer algorithms (e.g. parallel quicksort) are inexpressible.
  
  should read
  
  Currently, it certain types of recursive divide an conquer algorithms (e.g. parallel quicksort) seem inexpressible.
  
  Reply ↓
- Eric Niebler on November 10, 2020 at 10:26 am said:
  
  Unless I overlooked something, the same sorts of decorator factory functions (transform) weren’t available for receivers; I wasn’t sure why.
  
  By and large, receivers are implementation details of generic algorithms. The end-user consuming async operations really only needs to deal with senders and algorithms that transform senders. There is really no need to provide utilities for working with receivers.
  
  The implementation of the let algorithm is brilliant!
  
  So much of the implementation of libunifex tickles my brain in a fun way. I should say, most of it has been written by my team, not by me: Lewis Baker, Kirk Shoop, and Lee Howes.
  
  Are there plans for a (recursive?) sum type over senders?
  
  That’s an interesting suggestion. I know we’ve discussed something like a variant sender, but I hadn’t considered permitting it to be recursive. How would a parallel quicksort make use of it?
  
  It looks like there are places where a nested receiver needs to communicate failure to its enclosing decorator.
  
  Although in the current model, it’s possible for set_value to exit with an exception, that’s not really necessary for a receiver to signal a failure — it could simply decide to call set_error on what ever receiver it happens to wrap — and in the sender/receiver model, errors logically don’t propagate from receiver to sender, anyway. They logically propagate the other direction. We are considering requiring set_value to be noexcept also, just like set_error and set_done.
  
  Reply ↓
Jeff Petkau on November 10, 2020 at 11:08 am said:

A similar and very nice explanation of why structured concurrency is good is over at https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/, talking about Python’s trio library.

(You’ve probably read it already, but pointing to it for anyone else interested in the topic. I found it really helpful.)

Reply ↓
- Eric Niebler on November 10, 2020 at 12:23 pm said:
  
  Yes, I really enjoyed that post!
  
  Reply ↓
Jiayuehua on November 10, 2020 at 9:11 pm said:

“The allocate() function can’t signal an error by throwing an exception if it wants any part of the program to be able to recover from the error. Error handling must be done manually and out-of-band. Ditto if we wanted to support cancellation. ”
I don’t understand that. Does do_allocate like asio async_read? And listener like completed function. Does the quote mean allocate already return when do asynchronous operation so that it doesn’t know whether the operation success or fail? So it can’t throw exception.

Reply ↓
Austin McCartney on November 10, 2020 at 11:27 pm said:

That’s an interesting suggestion. I know we’ve discussed something like a variant sender, but I hadn’t considered permitting it to be recursive. How would a parallel quicksort make use of it?

Non-recursive is definitely useful in and of itself of course, e.g.

variant<sender_of<tuple<int>>, sender_of<tuple<double>>> -> sender_of<tuple<int>, tuple<double>>>

Starting from an implementation of using a type-erased sender type

https://godbolt.org/z/Y97dn9

I would anticipate replacing the any_sender_of here with some sort of recursive variant. Because the sender factory passed to let is recursive, I suspect that variant would need to be self-referential (at least in the absence of type erasure). That said, I have no earthly idea how to go about spelling the name of that type.

This example also demonstrates a hang-up with using let to accommodate forking: we unnecessarily extend the lifetimes of a bunch of intermediate subrange objects until the end of the parallel region.

Or have I applied the tools in library naively?

Reply ↓
- Austin McCartney on November 11, 2020 at 5:12 am said:
  
  Oh geez, so many typos…
  
  Starting from an implementation of using a type-erased sender type
  
  should read
  
  Starting from an implementation of three-way quicksort using a type-erased sender type
  
  and
  
  Or have I applied the tools in library naively?
  
  should read
  
  Or have I applied the tools in unifex naively?
  
  Reply ↓
Dave Abrahams on December 1, 2020 at 10:31 pm said:

Hi Eric, great article!

A few things are opaque to me though. You talk in several places about

deep support for cancellation

but it’s not really clear what you mean by that. Care to elaborate? Thanks!

More questions to come

Reply ↓
- Eric Niebler on December 2, 2020 at 2:18 pm said:
  
  deep support for cancellation
  
  What I mean by that is that async tasks should respond promptly to stop requests. “Respond promptly” here means: stop what they are doing and signal that they have stopped by calling an I-have-stopped continuation (set_done on the receiver). Only in systems where tasks respond promptly to cooperative cancellation requests can a structured async algorithm like when_any make sense.
  
  Reply ↓
  - Dave Abrahams on December 2, 2020 at 6:01 pm said:
    
    Oh sure. I thought because of “support” that you were talking about a language or library feature, but this really just means all async code doing significant computation needs to be written with periodic cancellation checks—not too often, but often enough. Seems like a pain to get right, and there oughta be some way of reducing that burden. Also I hope there’s a way to genericize over async so we don’t need a nearly-identical async copy of every synchronous algorithm
    
    Reply ↓
    - Eric Niebler on December 2, 2020 at 7:50 pm said:
      
      Oh sure. I thought because of “support” that you were talking about a language or library feature,
      
      There’s no magic bullet here, sadly. There are fundamental library types, like C++20’s std::stop_token, and certain patterns that help. We also have a few levers to pull in coroutines and in the design of the customization points to transparently convey context like stop tokens and schedulers from parent to child, but at the end of the day, there will be some polling somewhere to detect that stop has been requested and quit early.
      
      but this really just means all async code doing significant computation needs to be written with periodic cancellation checks
      
      Pretty much. It is also possible for a task to register a callback with a stop token so that when stop is requested, some action is taken to proactively interrupt the task.
      
      Seems like a pain to get right, and there oughta be some way of reducing that burden.
      
      It’s fiddly if you are mucking about with the primitives a lot. We have async algorithms like stop_when that capture a lot of the fiddly details for you so you can program at a higher level.
      
      Also I hope there’s a way to genericize over async so we don’t need a nearly-identical async copy of every synchronous algorithm
      
      You should join us! I would love to know how to unify sync and async algorithms. 🙂
      
      Reply ↓
      - Dave Abrahams on December 9, 2020 at 10:44 am said:
        
        You should join us!
        
        Who is us? Join you where?
      - Eric Niebler on December 9, 2020 at 12:33 pm said:
        
        Me, Lewis Baker, Kirk Shoop and Lee Howes. I mean join us thinking about these problems, designing a universal async abstraction for native languages … or as universal as possible. I’m not suggestion you change employers.
      - Dave Abrahams on December 9, 2020 at 6:52 pm said:
        
        I mean join us thinking about these problems, designing a universal async abstraction for native languages … or as universal as possible.
        
        Sure, you mean you don’t have it all figured out already? What are the open problems?
        
        I think it would be really interesting to implement some of these abstractions in Swift, especially now that async/await support (https://forums.swift.org/t/se-0296-async-await/42605) is coming into the language. Amusingly, one of the constituent proposals has the same title as this blog post (https://github.com/DougGregor/swift-evolution/blob/structured-concurrency/proposals/nnnn-structured-concurrency.md).
        
        I’m not suggestion you change employers.
        
        i.e., “no, I’m not offering you a job, Dave” 😉
      - Eric Niebler on December 15, 2020 at 10:31 am said:
        
        What are the open problems?
        
        There are a few really interesting remaining design questions. The first is cancellation, which is a bidirectional communication channel: I request than an async operation stop, and the async operation tells me when it has stopped. The sender/receiver design specifies the second half of that handshake, but leaves the first half unspecified. That’s a problem for composability. Some algorithms want to be transparent to cancellation, and others want to introduce a cancellation scope. That’s not possible generically if each generic task is using its own home-grown, ad hoc mechanism for requesting cancellation. We do know the shape of the solution here. Lewis Baker has a proposal in the upcoming mailing.
        
        The second and thornier open design question is some form of async cleanup, or async RAII. Imagine that you have some async task that you would like to run at the end of some scope. How do you do it? Destructors are required to be synchronous because of the interaction with unwind. In the case we’re interested in, the “scope” is itself async. So imagine you are in a coroutine and you want to run an async op at the end of a scope, whether the coroutine exits normally, with an exception, or is cancelled. Ideally, we take this to its logical conclusion, and we want a different async action to be taken for the three cases: success, cancel, or error. And now imagine we’re talking about senders instead of coroutines. That’s the problem in a nutshell.
        
        There are other problems too, but they are more minor or else have workarounds. C++ coroutines have a nice feature called “symmetric transfer,” where a coroutine being suspended can return a handle to a coroutine that should be resumed, and that coroutine is resumed without consuming any additional stack space. Basically, it’s a guaranteed tail call. That makes it possible to implement some interesting algorithms where two coroutines resume each other “recursively” but without blowing the stack. There is no equivalent in sender/receiver. It’s possible that a refactorization of the concepts would permit something similar. Without tail call, sender/receiver algorithms must use a “trampoline” scheduler, which periodically unwinds the stack and reschedules work back to an event loop to keep stack growth bounded. Not ideal, but also not the end of the world.
        
        Amusingly, one of the constituent proposals has the same title as this blog post
        
        That’s not surprising. “Structured concurrency” is a term of art. I didn’t come up with it. The term “structured multiprogramming” is actually older than I am. https://dl.acm.org/doi/10.1145/361454.361473
Dave Abrahams on December 1, 2020 at 10:40 pm said:

I think that >90% of all async code in the future should be coroutines simply for maintainability. For hot code, selectively replace coroutines with the lower-level equivalent, and let the benchmarks be your guide.

This seems to be suggesting that libunifex should be considered lower-level and less of a default than open-coded async function calls, but that seems inconsistent with other things you’ve told me. I thought open-coded async calls were like the ”raw for loops“ to libunifex’s “algorithms.” Care to clarify?

Reply ↓
- Eric Niebler on December 2, 2020 at 2:27 pm said:
  What do you mean by “open coded async calls”? Do you mean coroutines?
  
  I may have conflated a few things in my discussion of libunifex. Libunifex is a few different things:
  - A collection of concepts like sender, receiver, and scheduler that make async algorithms possible.
  - A collection of concrete types that satisfy those concepts, like timed_single_thread_context.
  - A collection of async algorithms implemented in terms of the concepts.
  I think the concepts of sender/receiver are lower-level than C++20 coroutines. They make more designs expressible, and more efficiently, than coroutines do. The algorithms, on the other hand, are higher-level than coroutines in just the same way that std::transform is higher-level than a for loop.
  Reply ↓
  - Dave Abrahams on December 2, 2020 at 6:28 pm said:
    
    What do you mean by “open coded async calls”? Do you mean coroutines?
    
    Yes
    
    Reply ↓
Dave Abrahams on December 1, 2020 at 10:44 pm said:

Existing future-based code often assumes that the operation has already started eagerly, introducing ad hoc concurrency that you need to be careful to prune back. That forces you to re-implement concurrency patterns over and over in an ad hoc fashion.

I think I understand what this ad-hoc concurrency is, but I don’t understand the rest. What would “pruning it back” look like? Can you show how this forces re-implementing concurrency patterns?

Reply ↓
- Eric Niebler on December 2, 2020 at 3:08 pm said:
  Sure. Let’s stick with the example of the when_all algorithm. It is possible to implement when_all over futures, for instance, but not in a way that makes any structural guarantees. Consider code that does the following:
```
auto futureOfTuple = when_all( makeFuture1(), makeFuture2() );
```
  Now, imagine the call to makeFuture1() succeeds. Since futures are handles to work that starts eagerly, the async work of makeFuture1() has already started running. Now we call makeFuture2() and that throws an exception. when_all is not given a chance to execute. The temporary future returned by makeFuture1() will be destroyed, and that async task has “leaked”; that is, it will continue running detached in the background with no way to join it ever. What if it refers to resources that are now going out of scope? What if that task completes exceptionally? Where does the exception go? These are thorny issues for which there are no good answers. These are all problems introduced by the eagerness of the future abstraction.
  
  EDIT: Handling these situations correctly with futures would involve a lot of logic around the call to when_all to catch the exception, detect which tasks were running, request cancellation, and block until they finish. We’ll have to reimplement similar logic wherever we call when_all. What we would rather do is capture that logic in the when_all algorithm itself.
  
  In contrast to futures, coroutines and senders are lazy. You create them, but the work has not been scheduled for execution yet. That makes it possible to make a bunch and pass them to an algorithm like when_all, which can ensure that when they are all launched, they are done so in a way that guarantees that they are all joined on all execution paths.
  Reply ↓
kai liao on March 8, 2023 at 2:24 am said:

I’m a little confused. Can the allocator example be written with executor? How to handle the for loop?

Reply ↓
- Eric Niebler on May 29, 2023 at 10:32 am said:
  
  It’s not hard at all to write a sender-based looping algorithm, and libunifex has several.
  
  Reply ↓
kai liao on March 8, 2023 at 5:21 pm said:

hi, eric.
Can the Manager::allocate() example in your page be written with libunifex? I don’t know how to handle the for-loop in the example. May you give an example of that? or it just cannot be done.

Reply ↓
- Eric Niebler on May 29, 2023 at 9:37 am said:
  
  It can! There are sender-based looping algorithms like repeat_effect and repeat_effect_until.
  
  Reply ↓

Structured Programming and C++

The Trouble With Threads

Unstructured Concurrency

Structured Concurrency

Cancellation

Structured Concurrency > Coroutines

Concurrency

Migration

Summary

28 Replies to “Structured Concurrency”

Leave a Reply Cancel reply