Aussie AI
Chapter 19. Modern C++ Containers
-
Book Excerpt from "C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations"
-
by David Spuler, Ph.D.
Standard C++ Containers
Contiguous data containers. The general-purpose containers with contiguous data are called “sequence containers” and include several that are well-known and often used:
- std::string — dynamic character arrays.
- std::vector — dynamic everything arrays.
- std::array — static fixed-size arrays.
- std::bitset — fast bit vectors.
Associative containers and sets. The associative key-value data structures are more commonly called a “map,” “dictionary,” or “symbol table” design pattern. Note that the “set membership” idiom is usually very similar to the associative containers, because the search is the same, but the sets don’t have a payload at the end. The main types of modern C++ containers for searching include the choice between two main types of underlying data structures:
- Red-black balanced binary trees — logarithmic complexity for search, insert and delete.
- Hash tables (with chaining) — constant-time average complexity (fast!), but linear worst-case (slow!).
The containers include these red-black tree versions:
- std::map — key-value lookup (dictionary idiom).
- std::set — key-only set membership lookup.
And these are the hash tables (my favorite data structure!):
- std::unordered_map — dictionary hash table for key-value pairs.
- std::unordered_set — hash table for set membership.
There are also variants that allow duplicates, which means multiple copies of the same key stored separately in the container. Examples include:
- std::multiset
- std::multimap
- std::unordered_multiset
- std::unordered_multimap
Linked list containers. Some of the containers to manage data in dynamically-allocated linked lists include:
- std::list — doubly-linked list.
- std::forward_list — singly-linked list.
Note that the hash table containers (e.g., std::unordered_map) also belong on this list
because they use “chaining” for collision resolution.
This approach effectively hangs linked lists off every bucket of the hash table.
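The chaining is visible through the standard bucket interface. Here's a small sketch (the helper name longest_chain is my own; the bucket_count() and bucket_size() member functions are standard) that walks every bucket and reports the longest collision chain:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>

// Walk every bucket and report the longest collision chain.
// A well-distributed table reports 1; longer chains mean collisions.
std::size_t longest_chain(const std::unordered_map<int, std::string>& h)
{
    std::size_t longest = 0;
    for (std::size_t b = 0; b < h.bucket_count(); ++b)
        longest = std::max(longest, h.bucket_size(b));
    return longest;
}
```
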
Sorted “flat” containers. There are some newer containers in C++23 that are “flat” in the sense that they store their elements contiguously, maintained in sorted order. These classes include:
- std::flat_set
- std::flat_map
- std::flat_multiset
- std::flat_multimap
Special semantics containers. Some of the general-purpose containers with different semantics to searching include:
- std::stack — dynamic LIFO structure.
- std::queue — FIFO queue data structure (single-ended).
- std::deque — double-ended queue.
- std::priority_queue — implements the “heap” data structure.
View containers. The various types of “view” containers include:
- std::string_view
- std::span
- std::mdspan — multidimensional view class.
Bit-level data structures. Modern C++ supports both class libraries and utility functions for a variety of low-level bit manipulation tasks. Some examples include:
- std::bitset
- Bit manipulation utilities in <bit>
Small utility data structures. Some of the more generic types of “mini-data structures” include:
- std::pair
- std::tuple
- Ranges
- std::optional
- Permutations
Multithreading data structures. Parallel coding with synchronization and locking is supported in modern C++ with libraries such as:
- std::thread
- std::mutex
- std::lock
- std::condition_variable
- std::atomic
- std::latch
- std::barrier
And that’s not the full list of primitives available in the concurrency library. Many of these multithreading capabilities have been available since C++11.
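As a minimal sketch of the first two primitives in combination (the shared vector and the function name add_value are my own illustration, not standard names), a mutex can guard a shared container so that concurrent inserts never corrupt it:

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex vec_mutex;            // guards shared_values
std::vector<int> shared_values;  // shared between threads

// Append under the lock; std::lock_guard unlocks automatically (RAII).
void add_value(int x)
{
    std::lock_guard<std::mutex> lock(vec_mutex);
    shared_values.push_back(x);
}
```

Two threads calling add_value concurrently will serialize on the mutex, so every push_back completes safely.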
Upcoming C++26 containers. Some of the upcoming containers include:
- std::hive (C++26)
- std::inplace_vector (C++26)
What’s missing? I feel ungrateful to even be writing this list, given the amazing amount of work that’s gone into coding up all the above data structures in the standard C++ library. Nevertheless, some of my favorites aren’t on the list yet! Data structures that are missing from the standard C++ containers library include:
- Sorted array — indirectly supported only (e.g., std::sort).
- Tries — 26-way tree for storing text keys based on letters.
- B-tree — multi-way tree data structure good for disk storage.
- Graphs — depth-first search, breadth-first search, topological sort.
- Tri-state Boolean — indirectly supported via std::optional.
General Container Optimizations
Containers have a lot of commonalities in their performance patterns, so some general advice about making them run faster applies across multiple container classes. Consider the following when implementing the usage patterns of your containers:
- Choose an initial size — avoid container auto-resizing slowdowns.
- Minimize insertions and deletions — yeah, right, those actions are why we use containers!
- Auto-resizing of containers — watch out for silent slugs!
- Remove all elements with clear() rather than a loop.
- Container destruction can be slow.
Choose your containers wisely:
- Prefer hash tables when you need fast searching (e.g., std::unordered_map).
- Don’t use a key-value associative container if you only need a set.
- Consider whether you need sorted or unsorted scanning of all elements.
- Prefer contiguous memory containers — std::array and std::vector have good cache locality.
Optimizations in relation to the types of data to use in containers:
- Choose scalar types — objects have more risks of slowdowns from calls to constructors, destructors, move operators, etc.
- Prefer integer keys — faster than std::string or char* in key-value pairs.
- Reduce the sizes of keys and values — minimizes overall container memory size and improves cache locality.
Choosing Containers
This should be a short section because it’s very easy: use std::vector or std::unordered_map,
and forget the rest.
Oh, maybe std::queue and std::stack if you must.
I’m only half joking, because there are two things that you often want to do quickly:
- Scanning — std::vector is an array with contiguous data (cache locality).
- Searching — std::unordered_map is a hash table with O(1) average complexity for search, insert, and delete.
So, that covers most of the basic data processing requirements: either you're scanning through a set of data to work on it repeatedly, or you're looking up a key in a dictionary, where you need the search to be fast.
What about the other dynamic classes? Somebody’s spent a whole lot of time on them, so surely they’re useful for something?
There are situations where you might want to consider alternatives to arrays and hash tables.
For example, there’s std::map, which uses red-black trees and has logarithmic complexity for searching, insertion, and deletion. But this is not as good as the O(1) average of a hash table.
The situations where a hash table might not be the best include:
- Scanning sorted data — neither std::vector nor std::unordered_map is good at this.
- Real-time latency-critical situations — where the worst-case linear performance of searching a hash table is too risky.
But if you ask me, you can still use only arrays and hash tables in combination. Hash tables aren’t great at scanning because it’s a non-contiguous linked list scan. Here’s a funny thought:
1. Insert repeatedly into the hash table, and then
2. Linearize the hash table in an array.
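Here's a sketch of that second step (the helper name linearize_keys is mine): pull the keys of the hash table out into a contiguous vector, sorting them along the way so that sorted scanning becomes cheap too.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Linearize the keys of a hash table into a contiguous, sorted vector
// for fast cache-friendly scanning.
std::vector<int> linearize_keys(const std::unordered_map<int, std::string>& h)
{
    std::vector<int> keys;
    keys.reserve(h.size());          // preallocate: one pass, no resizing
    for (const auto& kv : h)
        keys.push_back(kv.first);    // unsorted hash-table order
    std::sort(keys.begin(), keys.end());
    return keys;
}
```
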
Those are your main trade-offs. Beyond that, if you’re only searching a set of keys, but don’t need to map
the key to any other data, then use a set rather than a dictionary (officially called an “associative container”).
There’s a (slow) red-black binary tree in std::set,
but fortunately there’s a (faster) hash table for that called std::unordered_set.
Linearizing Containers
One common optimization is to perform some “preprocessing” before doing a lot of sequential processing of the data. This applies when the startup does a lot of insertions, but the main processing is mostly about scanning the data. In this case, we can switch to a linearized version of a dynamic container for faster scanning. Here’s example code for linearizing a linked list:
// Linearize linked list to vector
std::list<int> mylist;
std::vector<int> vec;
// ....
const auto n = mylist.size();
vec.reserve(n);  // preallocate to avoid auto-resizing
for (const auto& val : mylist) {
    vec.push_back(val);
}
This code to linearize is not particularly efficient, because it’s forced to linearly scan the linked list, and then insert into the vector one element at a time. However, there’s no true bulk-insert out of a linked list, because its nodes aren’t contiguous in memory.
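For what it's worth, the vector range constructor wraps the same linear copy into a single call; it's still an element-by-element copy underneath, but more concise (the helper name to_vector is my own):

```cpp
#include <list>
#include <vector>

// Same linearization via the range constructor: one call, but still
// an element-by-element copy internally.
std::vector<int> to_vector(const std::list<int>& lst)
{
    return std::vector<int>(lst.begin(), lst.end());
}
```
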
As an alternative, if we no longer needed the linked list version, std::list does have the splice() and merge() member functions, which transfer nodes from one list to another by changing the internal container pointers, without calling any copy or move methods. However, these only work between two std::list containers; they cannot transfer into a std::vector, which requires contiguous storage.
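As a concrete sketch of node relinking between two std::list containers (the helper name transfer_list is mine; splice() is the standard member function):

```cpp
#include <list>

// Relink every node of src onto the end of dst in O(1);
// no element is copied or moved, and src is left empty.
void transfer_list(std::list<int>& dst, std::list<int>& src)
{
    dst.splice(dst.end(), src);
}
```
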
Changing Containers
Another idea is to convert our insertion-friendly container to one that’s best for fast searches. One idea that goes from binary trees to hash tables is this:
- Handle the insertion phase with std::map — logarithmic insertion complexity with red-black trees.
- Convert to std::unordered_map (hash table) for faster searches.
Note that C++17 added the merge() member function for splicing the nodes of one associative container into another, and extract() for removing a single node as a “node handle.” These routines don’t move or copy any user data, but only update container internal pointers, which avoids the need for erasing data from one container and re-inserting all the data into the other container. One catch: node handles are only compatible within a container family (e.g., std::map with std::multimap, or std::unordered_map with std::unordered_multimap), so converting a tree to a hash table still means inserting the elements one at a time.
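A sketch of node splicing between two compatible containers, both std::map here (merge_maps is my own helper name; merge() is the standard C++17 member function):

```cpp
#include <map>
#include <string>

// Splice all nodes of src into dst by pointer surgery (C++17).
// Nodes whose keys already exist in dst are left behind in src.
void merge_maps(std::map<int, std::string>& dst, std::map<int, std::string>& src)
{
    dst.merge(src);
}
```
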
On the other hand, hash tables also have fast insertion with constant time on average, which is better than logarithmic (on average), so why do we need the red-black trees at all? One reason is that hash tables can degrade to linear performance in the worst case. Another reason is that the trees are good at fast processing of the data in sorted order, whereas hash tables have unsorted data.
Maybe we should do the reverse, handling insertions with our hash table,
and then converting to a red-black tree for scanning in sorted order.
No, not really.
If we want sorted scanning of data, we’d probably do better to export the hash table
to a std::array or std::vector, and then use std::sort() on the array or vector.
So many choices, so little time!
Useful Member Functions
Optimizing containers is about choosing the best one for your requirements, and then making the best usage of the interfaces that are provided. You don’t need to write your own if you can do better with the standard containers.
Memory management of the various containers can be optimized in a number of ways. Firstly, you can consider things like whether the container is “full” and what “capacity” it has. The main member functions include:
- size() — number of elements in the data structure.
- capacity() — maximum allowed with current memory.
- reserve() — request an amount of memory.
- resize() — reorganize to a bigger or smaller size.
- clear() — quickly remove all elements.
- shrink_to_fit() — request a smaller memory size.
The hash table containers, such as std::unordered_map, also have member functions to control the number of buckets
and the resizing policies:
- bucket_count() — size of the hash table array.
- bucket_size() — length of a chain at an index.
- load_factor() — number of keys divided by hash table size.
- max_load_factor() — read or set the load factor that will trigger a rehash.
- rehash() — manually trigger a hash table size change and rehash (at your discretion).
You can use these member functions to track how effectively the hash table is performing. This also allows taking control of the policy of when it will auto-resize and rehash into a bigger hash table with more buckets.
These C++17 member functions are useful sometimes for removing or moving multiple elements in a container:
- extract() — pulls a node out of the container data structure.
- merge() — efficiently combines two containers.
Hidden Auto-Resize Slugs
The auto-resizing capabilities of many C++ containers make them dynamic and easy to use. However, they also hide a common inefficiency that has existed since the earliest days of the STL: hidden calls to special functions. In fact, there are multiple reasons that you might want to avoid container auto-resizing:
- Slow performance — every object might get moved.
- Iterator invalidation — all objects could be at new addresses.
Auto-resizing of a container is probably something you want to avoid for performance reasons. In the worst case, it can trigger a significant delay when you’re inserting into a container. The cost of an auto-resize may include:
- Memory allocation — e.g., allocating a new memory block or a hash table array.
- Move assignment calls — not for all container classes.
- Re-hashing — re-computing this for all the objects.
Note that some containers will call the move assignment operators, whereas others will resize the container without actually putting the stored objects in new locations. Here’s how it works for some:
- std::vector — calls move assignments if the allocated block changes.
- std::unordered_map — zero move operator calls.
The situation with the hash table is complex, but basically it moves internal pointers around, not your objects. The hash container doesn’t need to move the objects inside the nodes of the chained linked lists, so it doesn’t call move operators for the user’s objects on those nodes. However, it does have to do other container-internal computations:
- Re-compute the hash function for every node’s key, and
- Re-attach the node to a different chained linked list.
There’s no overall mechanism to control the resizing properties of all containers, but we can use various different methods. The main solutions are:
- Reserve maximum memory, or
- Manually manage the resizing process.
Initialization with maximum size.
The first idea for avoiding auto-resizing
is to guess the maximum number of elements we could possibly
need to store in the containers,
and call the reserve() function
at the definition of the container object.
For example, the code could be:
std::vector<int> v;
v.reserve(1000);
But not this, which will run 1,000 default constructors in a vector of non-scalar type:
v.resize(1000); // Slow!
And this also would create 1,000 new objects and run their constructors:
std::vector<int> v(1000); // Slug!
This reservation of memory is a type of “preallocation” optimization. We ensure that all memory that could be required is allocated during the initialization phase, which ensures that no memory allocations are performed later in the hotpath.
Detecting auto-resizing. Alternatively, we can detect when an insertion is likely to trigger an auto-resize. The standard container interfaces allow us to know this, before we do an insertion:
if (v.size() + 1 > v.capacity()) {
// Resizing likely on insertion!
}
Unfortunately, there’s not a lot that we can do in this situation. I mean, we could just “not insert” as a strategy, but that doesn’t sound great.
Deferring container auto-resizing. Alternatively, we could detect the situation 10 insertions ahead of time, still insert the single item, and then do something later to manage the resizing, perhaps in a lower-priority thread.
const int n_lookahead = 10;
if (v.size() + n_lookahead > v.capacity()) {
// Resizing will be soon!
}
In classes that are more dynamic than std::vector,
such as std::unordered_map, we can defer the auto-resizing to
a more convenient time.
This is only possible for the dynamic classes based on linked lists
or binary trees.
Note that the hash table classes actually use linked lists,
because of separate chaining as the collision resolution mechanism.
We can initialize the hash table to a particular size in the constructor. The bucket count is an optional integer parameter to the constructor.
std::unordered_map<std::string, int> hmap(1000);
This only works well if we know the maximum size that we need.
For more dynamic handling,
we can also use the bucket management functions in the std::unordered_map interface to
detect when the hash table is getting full,
and take appropriate action.
The “load factor” is the number of elements stored in the container, divided by the hash table array size (i.e., the number of “buckets”). There’s no target load factor in the standard definition, but an implementation will typically aim for a load factor around 0.5 to 1.0. The container implementation also has a “maximum load factor” that will trigger a rehash into a bigger hash table when it’s exceeded.
When the load factor is near the maximum value, this means the class will soon be increasing the hash table size, and possibly re-hashing every single element. Here’s the idea coded up:
// Detect rehash risk
std::unordered_map<int, std::string> h;
int n_lookahead = 10;
float load_estimate = (h.size() + n_lookahead) / (float) h.bucket_count();
if (load_estimate >= h.max_load_factor()) {
// Rehash is likely!
}
In the case of a hash table, we can actually ensure that it won’t rehash
by manipulating the maximum load factor setting.
The max_load_factor method has overloads
allowing us to both get and set the value.
Hence, a solution that defers rehashing: increase the maximum load factor setting,
insert our new object, and then reset the maximum load factor:
float old_load_factor = h.max_load_factor();
h.max_load_factor(old_load_factor * 2.0f); // Avoid rehash
h.insert({ x, s }); // Insert the object without fear!
h.max_load_factor(old_load_factor); // reset
Note that we have to be careful, lest we introduce another hidden slug: never-resizing our hash tables. Don’t defer it forever!
If you forget to ever rehash your hash table, it won’t crash, but becomes a hidden slowdown. The use of chaining means that the standard hash table containers won’t fail if they never get auto-resized, but they will degrade to the linear performance of a linked list for all operations.
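One way to make sure the deferred rehash eventually happens is to check at a quiet, non-latency-critical moment and pay the cost then (the function name maybe_rehash, the 0.9 threshold, and the doubling factor are all my own choices, not anything standard):

```cpp
#include <string>
#include <unordered_map>

// At a non-latency-critical moment, grow the table if it is close to
// its rehash trigger, so the hot path never pays for the rehash.
void maybe_rehash(std::unordered_map<int, std::string>& h)
{
    if (h.load_factor() >= 0.9f * h.max_load_factor())
        h.rehash(h.bucket_count() * 2);   // rehash now, at our discretion
}
```
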
Hand-Coding Containers
The standard containers are elegant and beautiful, but they are designed to be very general. Hence, they can sometimes be slower than you could achieve on your own. Some of the problems with standard container performance include:
- Too many allocations and deallocations with new and delete.
- Non-contiguous storage in dynamic containers (e.g., linked lists, binary trees).
- No way to change the algorithm — e.g., you can’t change std::unordered_map to not use linked list chaining for collision resolution.
- General containers may not meet the requirements of your specific application.
In short: sometimes you can do better!