Aussie AI

Chapter 20. Move Semantics

Book Excerpt from "C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations"

by David Spuler, Ph.D.

Chapter 20. Move Semantics

What are Move Semantics?

Whoever invented move semantics deserves the Nobel prize. Move semantics refers to a beautiful and elegant addition to C++ class definitions added in C++11. The syntax is concise and the internal definition is semantically consistent in many ways. But the most beautiful part of move semantics: it’s all about making C++ even faster!

Move semantics were about making C++ more efficient at a very high level. The issues were unnecessary calls to class constructors and copy assignment operators in a number of situations, such as:

Temporary object creation
Returning a class type from a function
Overloaded operator return types

Most of the changes in C++11 that brought in move semantics were done in a way that maintained backward compatibility. The new features available in classes included:

Move constructors
Move assignment operators

Whereas the new special members needed to be added to existing classes, there were also a number of automatic compiler optimizations that were enhanced to take advantage of move semantics:

Copy elision
Return Value Optimization (RVO)
Named Return Value Optimization (NRVO)

Some parts of copy elision rely on move operations, whereas other cases of copy elision and RVO are actually independent of move semantics, and can be used without move special functions. But the optimal choice is to use all of them together.

Copy Elision

Copy elision is an automatic C++ compiler optimization that “elides” (removes) various “copy” operations on objects. I guess “copy removal” just didn’t have the same ring to it?

Copy elision works in particular situations in the C++ language. These situations include:

Class-type return statements — the main situation.
throw expressions (and handlers)
Coroutines

The effect of copy elision is to avoid a full object copy. Instead, the place where the new object is used simply refers to the old object, which would have been copied without this optimization.

Technically, there are other unusual situations, and there are two variants of copy elision:

Removal of copying, or
Downgrading copying to a move operation.

You don’t need to modify your code to get the benefits of copy elision. In fact, you also don’t need to turn the optimizer up to eleven. Copy elision is a normal part of the C++ standard.

Return Value Optimization

Returning an object type is a special case where the old code used to be inefficient. The good news:

Return Value Optimization (RVO) is an automatic compiler optimization.
Nothing you need to do!

Well, actually you do need to declare a move constructor and a move assignment operator to get the full benefits, but you were doing that already, right?

Why was RVO needed? Because return statements used to cause lots of copying of objects. This could be worked-around by declaring a reference object parameter, which was returned back, instead of having an object return type. But that’s inconvenient, and there are also cases where it’s not possible:

Binary operator overloads (non-assignment) — e.g., binary “+” operator.
Unary operator overloads (non-increment/decrement) — e.g., unary “-” operators.
Postfix increment/decrement operators — must return the old object (not the current one).

Operator overloading was one of the most beautiful parts of C++ signatures. Shame that it used to be inefficient, but now it’s not.

Any function can return a class object, rather than a pointer or reference, but the effect is that the function itself needs to declare a local object to be returned. Consider this code:

    MyClass func(int x)
    {
        MyClass ret(x);  // Create object
        return ret;   // Copy object
    }

And then it gets copy constructed again when we call the function:

    MyClass m = func(3);

Move semantics solve this problem, in combination with copy elision. This special case is called Return Value Optimization (RVO), and allows the compiler to do “one-two-skip-a-few” for object copying.

To get even more technical, this situation is called Named Return Value Optimization (NRVO), when a function returns a named local variable (i.e., “ret” here). The non-named version of RVO occurs when the function returns an unnamed object, such as a temporary object created as the result of a construction or operator.

Some types of RVO are implementation-specific and optional for the compiler to do. However, NRVO is “mandated” by the C++17 standard when returning a named local object variable. I guess unnamed RVO will be mandated at some time in the future, too.

RVO is very efficient in that it doesn’t just convert copying to moving, but can in fact avoid the complete creation of temporary variables. The compiler can optimize the above code so that the return statement constructs or moves the object directly into the place where it was called from. This means not only we avoid various copies/moves, but also the avoidance of that temporary object’s constructor and destructor, too.

Moving Multiple Objects

Moving multiple objects arises as an inefficiency in C++ because there’s no multi-move semantics. Some examples where you want to move multiple contiguous objects to a different memory location include:

Move capabilities for a custom multi-object container.
Shuffling objects along in a sorted array on insertion or deletion.
Auto-resizing a std::vector container (bigger or smaller).

There’s no multi-move constructors or assignment operators in the standard C++ language, so there’s only single object moving methods. In practice, you can move multiple objects in various ways, such as:

Moving them one-by-one
std::move(begin, end, dest) overload

Note that this is the std::move overload that does real runtime work, not the simpler version that’s just a type-cast to an R-value reference.

Unfortunately, all of these ideas are calling the move constructors for every single object. This is fine for scalar types or classes with simple inlined versions, but it’s still not optimal.

The workaround for your own class is simply to define a non-special member function to do fast moving, which you can call explicitly. But this doesn’t solve the general problem of using your new class in a container that may need to bulk-move your objects at some point.

Generic Move Operator

Some types of objects are “relocatable” and can used an optimized move method. The basic ideas of move semantics refer to the difference between a “shallow copy” (also called a “bitwise copy” or a “byte copy”) versus a “deep copy”. The basic idea is this:

Copy assignment or constructor — deep copy
Move assignment or constructor — shallow copy

The copy constructor has to make a full copy of every data member of the other object to create a new object. The old object is unchanged.

The move constructor needs to transfer all of the data members from the old object to the new object. And then the old object needs to be “cleared” in some way, which leaves it in a “valid” state (so that its destructor doesn’t crash or deallocate memory it no longer owns). Hence, why not do these steps in general as an optimization:

Shallow move old data members to new object — bitwise copy of all bytes.
Clear old object’s data members — zero the old bytes.

This idea of a relocatable object is similar to the type trait “std::is_trivially_move_constructible” (C++11). However, this isn’t quite what we want, which is a way to specify that our object is relocatable. The type trait instead only detects some cases where this is true. Perhaps we could set this type trait to “true” for our own class, and the standard container classes will honor this type trait setting, but I have my doubts.

Instead, let’s think about generalizing the idea to all relocatable class types. We can even code up the idea:

    template<typename T>
    T& generic_move_assignment_buggy(T& newobj, T& oldobj)
    {
        memcpy(&newobj, &oldobj, sizeof(T));  // Move (bitwise copy)
        memset(&oldobj, 0, sizeof(T));
        return newobj;
    }

Well, that has an aliasing bug if the new and old object are the same. So, let’s fix that first:

    template<typename T>
    T& generic_move_assignment_safer(T& newobj, T& oldobj)
    {
        if (&newobj != &oldobj) {  // Avoid aliasing
                memcpy(&newobj, &oldobj, sizeof(T));  // Move (bitwise copy)
                memset(&oldobj, 0, sizeof(T));
        }
        return newobj;
    }

Does this idea work?

The short answer is: yes and no. Yes, this idea can be used very often, and is efficient.

Let’s look at the good news first. This approach works for all these situations:

Scalar types — moving an integer is a bitwise copy anyway.
Simple object data members — if this move approach also works for the sub-object.
Virtual functions — yes, the hidden “vptr” pointer in the old object is also moved by the bitwise copy.

However, technically the full answer is “no,” because there are some problem areas when using this approach:

1. Self-referring pointer data members.

2. Virtual function problems — vptr is nulled in the old object.

3. Virtual destructor problems — a problematic special case.

4. External pointers into the old object (invalidated).

5. Obscure portability problems with zero byte representations.

Self-referencing data member problems. This is a problem when the object is relying internally on its own address. Self-referring internal pointers (or references) are data members inside the object that point to another part of the object. These are uncommon, and seem like bad programming style anyway.

Note that pointers pointing outside of the object are just fine. In fact, that’s why this copy-and-zero approach is efficient, because we don’t need to copy and reallocate any pointer data members. A bitwise copy of a pointer or reference is still pointing to the right place.

Virtual function problems. The memset() function has cleared every byte to zero, including any of the hidden “vptr” pointers to the virtual function table. If there’s any virtual function in a class, then it has a hidden pointer inside the object. There are also other places that may have another vptr, including:

Base class — but it usually shares a single vptr with the derived class.
Multiple inheritance — requires multiple vptr’s in the object.
Subobject data members — if they are of a class that has its own virtual functions.

If your code calls any of these virtual functions after it’s been nulled, I’m betting against you. Nevertheless, we might be able to work around this by simply not calling any virtual functions after this move sequence.

Virtual function problems. Destructors make it a little more difficult, because it’s hard to stop the C++ compiler from calling them. And every class with any other virtual function is supposed to make its destructor also virtual. Just ask Scott Meyers in the very first edition of his Effective C++ book, which was good advice in the 1990s, and still remains so.

Hence, if our object has a virtual destructor, it may try to access the null vptr at some point. There’s no simple workaround to “just avoid calling the destructor,” since it’s called implicitly.

External pointers into the object. I feel like we can live with this idea. If there are any pointers or references to refer to the old object’s internal data, they are now invalidated. But that’s true anyway, because the whole idea of a moved object is that it’s going away.

All bytes zero portability. There’s a theoretical portability problem when using memset to clear an object to have all its bytes equal to zero. I’m not sure it even applies anymore, as I don’t know of any platform where this is a real problem. The concern is whether clearing all the individual bytes to zero will actually clear multi-byte data to its equivalent zero or null value. In practice, these are all true:

Characters — byte zero is always character zero.
Integers (signed and unsigned) — all bytes zero is integer zero.
Floating-point — all bits zero is floating-point positive zero in the IEEE 754 standard.
Pointers — all bytes zero is the nullptr in any platform I know.

Hence, I’m not sure it’s a real problem, but every book on C++ portability I’ve read has mentioned it, so now I have, too.

Workaround for fast move problems. I hate to give up on a really efficient idea, so we can point to the limitations where we need to ensure:

“Relocatable objects” with no internal pointers or references.
No virtual functions
No virtual destructor

Maybe we can work around the virtual function problems by not clearing the vptr. Here’s the idea:

    memset((char*)&oldobj + sizeof(void*), 0, sizeof(T) - sizeof(void*));

This assumes that there’s only one vptr, and it’s shared by the base class and derived class. Unfortunately, this idea still fails for subobjects with their own virtual functions and multiple inheritance where objects can have more than one hidden vptr. Anyway, it’s a worthy try, and we could always ban virtual functions, which aren’t that efficient anyway!

Multi-move generic function. This idea can be generalized to moving a contiguous array of multiple objects at once. The need for such a “multi-move” capability is less often required, but can arise when containers resize, and we also need it to implement sorted array insertions and deletions.

The above “generic” version only works for one object. Let’s think about generalizing the idea of bytewise moves and then clearing to zero. Here are some thoughts:

1. The idea still generally works on a mult-object block, because it’s similar to moving one object at a time.

2. Overlapping ranges of objects are a problem, because the memset will wrongly clear some of the newly moved objects.

Amusingly, note that we did deal with the “overlapping blocks” problem in the single-object generic move. It’s the same as the “aliasing” check!

Detecting overlapping ranges more generally is a bit more intricate to code. Here’s my attempt at updating the generic move method to support multiple objects:

    template<typename T>
    T& generic_multimove_assignment(T * destarr, T* srcarr, int n)
    {
        if (destarr == srcarr) {  // Same exact block
            // Nothing to do
        }
        else {
            T* enddest = destarr + n;
            T* endsrc = srcarr + n;
            if (enddest > src && enddest < endsrc) {
                // Overlapping (moving left)
                memmove(destarr, srcarr, n * sizeof(T));  // Move (overlapping safely)
                int num_overlap = enddest - src;  // Pointer arithmetic
                memset(enddest, 0, (n - num_overlap) * sizeof(T));  // Clear non-overlapping part
            }
            else if (endsrc > dest && endsrc < enddest) {
                // Overlapping (moving right)
                memmove(destarr, srcarr, n * sizeof(T));  // Move (overlapping safely)
                int num_overlap = endsrc - dest;  // Pointer arithmetic
                memset(src, 0, (n - num_overlap) * sizeof(T));  // Clear non-overlapping part
            }
            else {
                // Non-overlapping blocks
                memcpy(destarr, srcarr, n * sizeof(T));  // Move all (bitwise copy)
                memset(srcarr, 0, n * sizeof(T));  // Clear old block
            }
        }
        return newobj;
    }

Compiler support? Even with the restrictions to scalar and relocatable objects, and other problems listed above, this idea of just moving memory blocks around is so efficient that maybe the compiler should provide this as an option automatically? Is this the default assignment operator? No, not quite, because the default move constructor or assignment operator is a “member-wise move” of all of the data members. This is the same as a bitwise move if all data members are trivial, but any complex classes as subobjects will need their own move constructors called.

I like this whole idea a lot more than the normal move member functions, where you have to fiddle endlessly with every single data member. Come on, the single object version is only two statements! Hence, I’m hereby recommending to the standards committee that, like the “=default” specifier, there needs to be a new “=fast” specifier added to the C++26 language.

• Online: Table of Contents

• PDF: Free PDF book download

• Buy: C++ Ultra-Low Latency

C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations:

Low-level C++ efficiency techniques
C++ multithreading optimizations
AI LLM inference backend speedups
Low latency data structures
Multithreading optimizations
General C++ optimizations

Get your copy from Amazon: C++ Ultra-Low Latency

Aussie AI

Chapter 20. Move Semantics