Customization Point Design in C++11 and Beyond

(Disclaimer: here be esoteric language wonkery. Abandon all hope.)

If you read or write Generic-code-with-a-capitol-‘G’, you may have written or seen code like this:

using std::swap;
swap( a, b );

The first line brings std::swap into consideration, and the second makes an unqualified call to the swap function. I call this the “Std Swap Two-Step”.

Why do we do the Two-Step? It has to do with the decidedly wonky way C++ resolves function calls in templates. I won’t fully elaborate on two-phase name lookup (you’re welcome), but suffice it to say that we do it for the sake of genericity: We make an unqualified call to swap in order to find an overload that might be defined in a and b‘s associated namespaces (itself a rich topic), and we do using std::swap so that, on the off-chance that there is no such overload, we find the default version defined in the std namespace.

We call swap and functions like it customization points — hooks used by generic code that end-users can specialize to customize the behavior for their types.

Are there other standard customization points? You bet. When you use a range-based for loop, the compiler inserts calls to unqualified begin and end to get the bounds of the range. So begin and end are customization points. Depending on how you read the specification of std::reverse, iter_swap may also be a customization point. (I think it is, or that it should be.) And other customization points may be in the offing. Proposal N4155, proposes non-member size for fetching the size of a range, and my own N4128 will propose size as a customization point as well.

Trouble With The Two-Step

Have you seen code that makes qualified calls to swap in a template, like std::swap( a, b ); ? Congratulation, you have probably found a bug. If the type of a and b overloads swap in its own namespace, a qualified call to std::swap won’t find it. It’s an all-too-easy mistake to make.

The problem with the Two-Step is that is forces users to type more to do the right thing. FAIL. Most damning, it requires users to either blindly memorize and regurgitate the Two-Step pattern, or worse: understand two-phase name lookup in templates.

Through With The Two-Step

We need a better solution for the design of customization points in C++. In my own range library, I gave a great deal of thought to the problem, and I think I have an answer. Below is how I would like to see a future version of the Standard Library define std::begin, to pick an example at random. I explain it after the break:

namespace std
{
  namespace __detail
  {
    // define begin for arrays
    template<class T, size_t N>
    constexpr T* begin(T (&a)[N]) noexcept
    {
      return a;
    }

    // Define begin for containers
    // (trailing return type needed for SFINAE)
    template<class _RangeLike>
    constexpr auto begin(_RangeLike && rng) ->
      decltype(forward<_RangeLike>(rng).begin())
    {
      return forward<_RangeLike>(rng).begin();
    }

    struct __begin_fn
    {
      template<class R>
      constexpr auto operator()(R && rng) const ->
        decltype(begin(forward<R>(rng)))
      {
        return begin(forward<R>(rng));
      }
    };
  }

  // To avoid ODR violations:
  template<class T>
  struct __static_const
  {
    static constexpr T value{};
  };

  template<class T>
  constexpr T __static_const<T>::value;

  // std::begin is a global function object!
  namespace
  {
    constexpr auto const & begin =
        __static_const<__detail::__begin_fn>::value;
  }
}

Let’s break this down. First, we define a couple of begin free functions in a std::__detail namespace. These overloads handle array types and range-like things with .begin() member functions. (Think of the standard containers.)

Next, we define a __begin_fn class with an overloaded function call operator in the std::__detail namespace that returns the result of making an unqualified call to begin. At this point in the source code, the name begin refers to a function overload set.

Finally, we define a std::begin object of type std::__detail::__begin_fn in a round-about sort of way, the details of which are not too relevant. The important bit is that std::begin is a function object.

Implementers of range-like types can hook this customization point the same way they always have: by defining a begin free function in their type’s associated namespace. See below:

namespace NS {
  struct S {};
  int * begin( S & s );
}

int main() {
  NS::S s;
  int *p = std::begin(s); // calls NS::begin(s)
}

Function Objects and Customization Points

Argument-dependent lookup and customization points are a match made in heaven. But argument-dependent lookup is only done for free functions, and my std::begin is a function object. Argument-dependent lookup isn’t done for function objects. What’s going on?

The short answer is that the std::begin function object is doing the Two-Step so that you don’t have to. If std::begin were defined this way, you could just make qualified calls to std::begin and the right thing would happen. You could also do the Two-Step, bringing std::begin into scope with a using declaration, and calling it unqualified, and get the same behavior. Either way, if there is a begin free function defined in the argument’s associated namespace, it will get used.

A subtle but important point is that, if you do the Two-Step, the call is still routed through the std::begin function object. I mean that in the code below:

using std::begin;
begin( v );

…if std::begin were an object instead of a function, then what looks like an unqualified function call is not; it’s a call to std::__detail::__begin_fn‘s overloaded function call operator. Think of this as the Generic equivalent of the Gang of Four‘s Template method pattern:

In software engineering, the template method pattern is a behavioral design pattern that defines the program skeleton of an algorithm in a method, called template method, which defers some steps to subclasses. It lets one redefine certain steps of an algorithm without changing the algorithm’s structure. This use of “template” is unrelated to C++ templates.
— Wikipedia

In this case, the “algorithm” is std::begin, and the certain steps that users can redefine is begin. What’s the point, you ask? We can do extra parameter checking in std::begin. Read on.

Customization Points and Concepts Lite

Customization points are scary in a way. In today’s language, if you define a free function called swap, it better do what the Standard Library expects swap to do. Otherwise, all hell breaks loose in the standard algorithms. Likewise, you can shoot yourself if you define a begin or end free function that doesn’t return iterators. So the Standard Library has laid claim to those names globally. That’s why customization points are such a concern for the standardization committee; the more we add, the more names we reserve globally, and the bigger the potential problem gets for the users.

Enter Concepts Lite. With Concepts Lite, we can constrain our customization points to only work with the types that model certain concepts. For instance, it should be an error to call std::begin on something that doesn’t look like a range, don’tcha think? With Concepts Lite and global function objects, we can have that. We can define std::__detail::__begin_fn like this:

// A _RangeLike is something we can call begin(r)
// and end(r) on:
concept _RangeLike<class T> =
  requires(T t) {
    typename IteratorType<T>;
    { begin(t) } -> IteratorType<T>;
    { end(t) } -> IteratorType<T>;
    requires Iterator<IteratorType<T>>;
  };

  struct __begin_fn
  {
    // LOOK! R must be _RangeLike!
    template< _RangeLike R >
    constexpr auto operator()(R && rng) const ->
      decltype(begin(forward<R>(rng)))
    {
      return begin(forward<R>(rng));
    }
  };

First we define the _RangeLike concept as something on which we can call begin and end, such that they both return iterators of the same type. (Or, if you agree with N4128, different types that are comparable.) Then we use the _RangeLike concept to constrain __begin_fn::operator() and by extension std::begin. Now std::begin won’t compile for things that aren’t sufficiently range-like, which makes it safer to lay claim to a common identifier like begin.

If std::begin is a function object as opposed to a free function, it’s not easy to route around this concept checking. Code that does the Two-Step won’t accidentally hijack some unrelated begin function in some random namespace. It will always resolve to std::begin, which will politely reject invalid code.

You don’t have to wait for Concepts Lite to reap the benefits, either. See my post on emulating Concepts Lite in C++11.

Summary

What’s all this mean? Simply:

  • Users could just call std::begin and it would do ADL for them.
  • std::begin(rng) wouldn’t compile unless:
    • it returns an iterator, and
    • std::end(rng) also compiles and returns an iterator of the same type.
  • Code that does using std::begin; begin(rng); isn’t going to dispatch to some random begin function unless the argument satisfies the constraints of std::begin.

More generally, there is a design pattern that we can use to make safe and convenient customization points. If you’re writing a generic library with customization points, I recommend using this pattern.

Addendum: An Ode to Global Function Objects

We get an additional benefit from making std::begin a global function object:

  • You can pass std::begin as an argument to higher-order functions.

That’s a benefit of function objects over free functions in general, and it’s why I generally prefer global function objects over free functions these days (except when I’m defining customization points). Defining global function objects is more work, but it has the nice effect of turning off argument-dependent lookup, which really only makes sense for operator overloads and customization points. First-order functions rule. ADL sucks (except in the few places where it’s awesome).

Update

A quick note about generic lambdas, since I’ve gotten questions. In C++14, we can define polymorphic function objects very concisely using generic lambdas. So can we use lambdas to define global function objects and save some typing, as below:

// Better?
constexpr auto begin = [](auto && rng) {
  using __detail::begin;
  return begin(forward<decltype(rng)>(rng));
};

The answer, sadly, is no for a host of reasons:

  1. Lambdas do not have constexpr constructors.
  2. I don’t know how to solve the ODR problem for lambdas. If std::begin were defined this way, then each translation unit would see different std::begin objects at different addresses. In theory, that could cause problems.
  3. I don’t know how to constrain a generic lambda.
  4. With automatic return type deduction, invalid calls to begin cause a hard error rather than being SFINAE’ed away. That might not be a huge problem for std::begin, but it most certainly is a huge problem for std::__detail::begin. The begin overloads found by ADL must use SFINAE (or concept checks); otherwise, you would end up trying to call .begin() on some object that doesn’t have a .begin() member function.

In short, even in C++14, I think we need the ugly hackery I show. Maybe C++17 will bring relief.

"\e"

"\e"

44 Replies to “Customization Point Design in C++11 and Beyond”

  1. I’m surprised you’ve gone through those __static_const hoops. I understand your suggestion to use functors has raised concerns regarding ODR, but I can’t say I’ve found them compelling. Since those functors are meant to be empty types, their value representation is made up of 0 bytes. That makes quite hard it for, say, an inline function in a header to do anything untoward.

    Sure, it’s possible to notice different addresses if you go out of your way to do it. The thing is, use of empty functors is not terribly novel in itself (to pass to higher-order algorithms and so on), and we’ve never needed all references to e.g. boost::range_detail::reverse_forwarder to refer to the same object (which, I feel the need to reiterate, is empty and as good as any other object of the same type). Nor, putting functors aside, have we had that need for e.g. std::placeholders::_1. That’s not to mention all the upcoming variable templates giving way to e.g. std::is_lvalue_reference_v (as per STL’s n3932). None of those had concerns raised about ODR, so what gives? Why the backlash against internal linkage const *now*? Perhaps I’ve missed something, in which case I would be thankful for an explanation.

    I don’t want my (somewhat fixated) tangent to be too disheartening though: if I have nothing to say on the actual matter at hand, it’s because I find your arguments ironclad. I feel that if you had mentioned the other solutions we’ve been using so far (e.g. boost::range::begin as an ADL-aware forwarder to range_begin), the advantages of yours would have stood out more, too.

    • Variable templates won’t suffer from the ODR problem since they’re templates. The compiler guarantees that all the symbols resolve to the same instantiation. With non-template global objects, though, the story is different. Every translation unit would get its own copy, and addresses would be different.

      Is it a problem worth solving? I really don’t know. A few extra keystrokes solves the problem, so why not? That’s my reasoning.

      • I see. Come C++14 then a variable template version of __static_const (or just __const or constant then!) of the same trick would probably reach my own threshold of what makes for a few extra keystrokes.

  2. As somebody pointed out on reddit: People are allowed to add template specializations of these functions. So you couldn’t do this for std::swap because somebody has probably specialized that function, and replacing it with an object would break backwards compatibility.

    That being said the solution is easy: Add another layer. So std::swap is a function which then calls a function object which then calls ADL swap. (see? easy)

    I actually use a system somewhat similar to that in my format_it library: http://probablydance.com/2014/08/30/format_it-iterator-based-string-formatting/
    I came up with that though because I needed both partial template specializations and ADL, so I had to have two layers.

    • Howard Hinnant also answered that question on stackoverflow here, where (in an update) he explains why specializing function templates in the std namespace is not the “right way” to hook these customization points. It’s true that code that specializes swap in the std namespace would not survive this transition. I’m not suggesting we change the current version. I’m floating this as a possible solution for future versions of the standard should we ever decide to break backward compatibility, and also for the design of any new customization points.

      As for the add-another-layer-of-indirection solution, I would strongly discourage that. The reason why is because code that does the Two-Step must find the function object first. Otherwise, ADL is not disabled and you’ve reintroduced the name hijacking problem.

  3. Pingback: Points de variation et ADL | C++ | Boost — Développement

  4. Quick note: I think that for the concepts lite example you don´t need the trailing return type anymore, because the _RangeLike concept will refuse non-candidates, so there is no need for SFINAE.

    Interesting post btw. But I can´t still figure out why you used an anonymous namespace for the function object.

    • Quick note: I think that for the concepts lite example you don´t need the trailing return type anymore, because the _RangeLike concept will refuse non-candidates, so there is no need for SFINAE.

      Good point.

      Interesting post btw. But I can´t still figure out why you used an anonymous namespace for the function object.

      See my reply above. It’s to prevent a linker error.

  5. I know you say “judge me on my C++, not my wordpress” but going from reading against a black background to white background like that hurts my eyes.

    As such, I had to give up reading due to visual fatigue. If you could change the colour scheme to something more eye-friendly I’d be happy to give this a thorough read.

    • Funny. Reading against a light background makes my eyes tired. I like the dark theme.

      EDIT: I tried switching to a dark theme for the source code. I don’t love it, but maybe it’s better considering my blog’s overall theme is dark.

      • Keep it like this, please, my eyes rest with your blog. I even have a plugin installed in chrome to reverse the bright white color that many pages show.

  6. Looks cool. I definitely want to check if I’m doing that swap/two step correctly. OTOH, seeing that decltype( FnType(…)){ return } offends my DRY nerves. IIRC you need the same jazz when defining a custom swap, it really needs fixing!

  7. Pingback: 在C++中实现Python的切片 - memleak.in | memleak.in

  8. Consider say, a near-future case – std::size.
    How do you think your technique compares to just having std::size(x) call std_size(x). ie have std_size be the customization point, and basically reserving all std_ names?

    I think std_size is easier and maybe even more clear/explicit – makes it explicit that I am customizing a std customization point. (I do think it is a bit “ugly” however.)

    I like making std::size a function object, but that is orthogonal to what free function it calls, I think.

    Either way, I really want to solve this problem somehow, before we add yet more customization points.

    • Sure, I considered that. That risks some confusion. Folks will really want to define size for their types, both out of ignorance about how to hook the customization point, and because they will want to have a pithy way to get the size in their namespace. Then some people will call std::size directly and not get the overloaded user-defined size, or else they’ll do the Two Step and they will get the user-defined size. Worse, when they do the Two Step, they risk picking up size overloads they shouldn’t.

      • I don’t think I misunderstood, though I skipped a bunch of reasoning.

        If you have a function N::func and an enum N::E, then how do you customise N::func(N::E)?

        Your technique works for things like std::begin operating on std::vector, because your unqualified call invokes your specialisation for ranges which ends up calling the std::vector::begin member function.

        But the unqualified call is intended to allow ADL to find any customisation with the right signature. I could define a container like this:

        namespace X {
        class MyContainer {

        // no begin() member here

        };

        MyContainer::MyIterator begin(MyContainer& c) { … }
        }

        … and a call to std::begin(instanceOfMyContainer) with your function object would work fine.

        But while this works for X::MyContainer, I couldn’t define std::vector in this way, because writing:

        namespace std {
        template <typename… Args>
        typename vector<Args…>::iterator begin(vector<Args…>& c) { … }
        }

        … would clash with the std::begin object, as that function can’t overload the object.

        You could use the Barton-Nackman trick to allow ADL to work while still having your std::begin object… in theory anyway; GCC 4.9 didn’t like it when I tried.

        Now, all this is irrelevant for std::begin and std::vector because they are simply not defined that way anyway.

        But in the general case, if you wanted to have a central N::func as a function object which provides some default behaviour but is implemented as an unqualified func(T&) call to allow for customisation, then how would this work for types inside N which need non-default behaviour?

        I mentioned enums because they don’t have member functions nor support Barton-Nackman and so are only customisable via an unqualified ADL lookup.

        Imagine std::hash was to be specified as a function object. How would I customise it for std::memory_order?

        (hope it’s now clearer what I’m getting at)

        • I understand your question now, and it’s a good one. There are two ways to deal with this. One would be to put the enum and its overload of the ADL customization point in a nested namespace, and then bring the enum into the outer namespace with a using declaration.

          The second, cuter way is to achieve the same effect with anonymous namespaces. See the code below:

          #include <cstdio>
          
          template<typename T>
          struct static_const
          {
            static constexpr T value{};
          };
          template<typename T>
          constexpr T static_const<T>::value;
          
          namespace My
          {
            namespace detail
            {
              template<typename T>
              int fun(T t)
              {
                std::printf("In default fun\n");
                return 42;
              }
              struct fun_fn
              {
                template<typename T>
                auto operator()(T t) const ->
                  decltype(fun(t))
                {
                  std::printf("In customization point\n");
                  return fun(t);
                }
              };
            }
          
            namespace
            {
              constexpr auto && fun =
                static_const<detail::fun_fn>::value;
            }
          
            namespace {
              namespace
              {
                enum Enum { One = 1, Two, Three };
                int fun(Enum e)
                {
                  std::printf("In specialized fun\n");
                  return (int) e;
                }
              }
            }
          }
          
          int main()
          {
            My::Enum e = My::One;
            My::fun(e);
          }
          

          For me, this prints:

          In customization point
          In specialized fun
          

          Hope this helps.

          • Would placing the enum/name in the nested unamed namespace, make this declaration declare different entity in every translation unit where the file is inluded? As far as I know unamed namespace is equivalent with namespace with unique identifier. So if have example enum declaration in header file and two source files A and B then the compiler will create different version (not compatible) of this enums. And this will cause a lot of problems if this type is used as function parameter.

          • Right. If this type needs external linkage, then an anonymous namespace is the wrong answer. You would need to use the named namespace with the using declaration.

  9. “If the type of a and b overloads swap in its own namespace, a qualified call to std::swap won’t find it. It’s an all-too-easy mistake to make.”

    And that’s why I always inserted my customizations into std namespace. I couldn’t even think of putting it into my namespace, why would I? If anything, just create a new namespace “customizations” and put it like this:

    namespace std {
    using namespace customizations_of_something;
    }

    • That doesn’t work reliably, but you need to understand 2-phase look-up to know why. When parsing the body of a function template like std::sort, when the compiler sees an unqualified call to swap it does normal lookup to find candidates. This is in phase 1, before the types of template parameters are known. Only those functions that have been seen so far (reading the file top to bottom) and that are in scope are added to the candidate set. In phase 2, once the types of the function arguments are known, the function is looked up again in the arguments’ associated namespaces (i.e., your namespaces).

      By putting your overloads in namespace std, you are hoping that phase 1 lookup will find them. But that is order dependent, since the compiler parses code in a linear fashion. If phase 1 happens not to see your overload before it sees the call to swap, your overload will not be considered. (Unless by chance namespace std is an associated namespace of your type, in which case phase 2 will find it.)

      TL;DR: Dont’ do that.

      • I always put my std::swap (or actually std::to_string specialization) in the same header as my class, so when it needs to be called it should be always visible. Is there any way this scenario could fail? (BTW, gcc gives errors like “specialization of after instantiation”, so I assume that such errors don’t happen to me).

        • (side note: your commenting system ate the word from my comment, is that by design? I wonder if it shows up here…)

        • If you still think this is OK, you haven’t understood me. Let me try again. Let’s say you have a header Foo.h that defines class Foo and an overload of swap in namespace std. Now someone does:

          #include <algorithm>
          #include "Foo.h"
          

          … after which they try to std::sort an array of Foo. The definition is sort is found in <algorithm>. The body of sort was parsed before the compiler saw your swap overload, so that overload isn’t considered by 1st phase lookup. And since there is no swap in Foo‘s associated namespace, 2nd phase lookup won’t find it either. Again, DON’T DO THIS.

          You will only get “specialization after instantiation” errors if you are actually specializing the primary template. Chances are, you’re overloading the function, not specializing the template. You’ll get no warning until your code stops compiling or silently does the wrong thing.

  10. Consider this:

    namespace NS {
    struct S {};
    int * begin( S & s )
    {
    int* it = std::begin(); // Use the “default” version
    return ++it;
    }
    }

    int main() {
    NS::S s;
    int *p = std::begin(s); // calls NS::begin(s)
    }

    That particular example is a bit useless, but it’s a common technique to relying on the default version of an algorithm, and then, personalized it with the returned results. For example, an iter_swap overload which increments a counter or log something as side effect, to see how many times your iter_swap method has been called.

    With your approach, overload for calling the “built-in” implementation and do something else would be impossible to do (unless you make public and with a different name the default version).

Leave a Reply

Your email address will not be published. Required fields are marked *

*