Aussie AI

Chapter 21. Arrays

Book Excerpt from "C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations"

by David Spuler, Ph.D.

Chapter 21. Arrays

Arrays are wonderfully efficient! They’re the most basic data structure known to humanity. The main features to note about an array include:

Contiguous memory storage — great for cache locality.
Single type of data — no need to be worried about the type.

In modern C++, there are several ways to create an array data structure:

std::array
std::vector
std::inplace_vector (C++26)

There are also some older methods of using arrays that still work in modern C++ code:

Fixed-size array variable: int arr[10];
Allocated fixed-size array: new int[10];
Old-style allocated array: malloc(sizeof(int)*10);

Note that the size of arrays in these examples don’t need to be a compile-time constant in C++. They can be a variable, where the size of the declared array is sorted out at run-time.

Array Operation Complexity

There are two main types of arrays to store objects: sorted and unsorted. Well, actually, there’s other types of arrays with different semantics (e.g., stacks, queues, heaps, ring buffers), but let’s just look at searching and sorting for now.

Are they fast? Here’s the 10,000 foot view:

Unsorted arrays — very fast insertions/deletions, but slow searches (linear) and even slower to sort the data.
Sorted arrays — faster search (logarithmic), slower insertions/deletions, and great if you need sorted data.

In more detail, here’s the overall complexity analysis of the basic searching methods:

Searching — unsorted is O(n) (linear search) and O(log n) for sorted (binary search).
Inserting — unsorted is O(1) (add to the end), but O(n) if sorted (shuffle required).
Deleting — this is O(1) if unsorted (tricky swap method!), but O(n) if sorted (also shuffles).
Print unsorted — both are O(n) with a linear scan of the array.
Print sorted — unsorted is O(n log n) because it requires a sort, but only O(n) if already sorted.

And some other algebraic operations:

Maximum/minimum — unsorted is O(n) because it requires a scan, but only O(1) if already sorted (choose first or last element).
Top-k elements — unsorted requires an O(n log n) sort or at least a “partial sort”; only O(k) for a sorted array.
Sum or average — both are O(n) because the whole array must be scanned.

Modern C++ Arrays

We’re going to implement our own sorted and unsorted arrays to examine the algorithms. Standard C++ already has two types of unsorted arrays in std::array and std::vector. We could just wrap around those types, but I’m going to use low-level raw arrays to show the algorithms in more detail.

Sorted arrays are trickier. Note that there’s no “sorted array” class in the standard C++ library. However, there are some primitives we can use to achieve sorted arrays:

std::sort() — modern C++ version with a hybrid quicksort/heapsort algorithm.
qsort() — old-style quicksort with function pointers (not recommended).

There is also some builtins for “binary search” on a sorted array:

std::binary_search() — modern C++ implementation for a sorted array.
std::equal_range() — binary search that handles duplicate elements in the array.
bsearch() — old-style binary search with function pointers (not recommended).

If we are inserting into a sorted array, we don’t need binary search exactly, because we’re assuming the element isn’t already in the array. Instead, we need a “binary-like search” method of finding the index location to insert a new item. In other words, we need to find the spot where the item fits in the array, but do it logarithmically, rather than using a slow linear scan.

Writing a binary-like search algorithm to find the insertion point is very fiddly coding! Fortunately, the standard C++ library has two methods that code it for us:

std::lower_bound() — generalizes binary search for use with insertions.
std::upper_bound() — similar version that finds the location above.

Strictly speaking, std::binary_search() in the C++ standard only requires a “partitioned” array rather than a “sorted” array. But for a scalar type with well-defined comparisons, this is the same thing.

Custom Array Implementation

Anyway, let’s look at some of the basic operations in our custom versions of array algorithms. We’ll examine the unsorted array version, but the sorted version is almost identical. Here’s the overall class members:

    template<typename T, int N>
    class UnsortedArray {
    private:
        T arr_[N];
        int capacity_ = N;
        int count_ = 0;
        //...
    };

Note that “capacity_” is somewhat redundant if we’re templating based on a compile-time array size. However, it would be useful if we were dynamically constructing our arrays at runtime.

Here are some of the basic “getter” functions:

    int size() { return count_; }
    int count() { return count_; }
    int capacity() { return N; }

And here are some of the basic utility functions:

    bool empty() { return count_ == 0; }
    bool full() { return count_ == N; }

Container Deletion Pitfalls

While we’re on the topic of deletions, let’s look at some common mistakes with deletions from C++ containers. There are at least two major pitfalls in using the erase() method to remove an object from a C++ container. Here’s the basic first attempt:

    for (auto iter : container) {
        if (want_to_delete(*iter)) {
            container.erase(iter);  // Kaboom!
        }
    }

This will crash with a big mushroom cloud. The problem is that we’ve assumed the iterator stays valid, whereas the erase() method actually returns an updated iterator that we need to use. We can’t use a range for loop to do this, so we have to use begin() and end() manually:

    for (auto iter = container.begin(); iter != container.end(); ++iter) {
        if (want_to_delete(*iter)) {
            iter = container.erase(iter);  // Use return value
        }
    }

This is not a crash, but still a major bug. The iterator loop skips over the next item after the erased object. There are two increments in the deletion sequence:

1. erase() returns the next valid iterator (after the removed object), and

2. ++iter skips to the next element (again!).

To get it correct, we need to change the idiom to avoid ++iter if we erase anything.

    for (auto iter = container.begin(); iter != container.end(); /*Not here!*/ ) {
        if (want_to_delete(*iter)) {
            iter = container.erase(iter);  // Use return value
        }
        else {
            ++iter;  // Only if not erasing!
        }
    }

And now the code finally works!

Bypassing Interfaces

The std::array and std::vector classes are designed to allow you to get access to the stored data via the data() member function. It’s also guaranteed that the data is stored in contiguous memory locations. Note that this is also true of std::string, which has a data() member and also c_str(), which returns the same address.

The data() method allows direct access via pointers or low-level array types to the data in the standard array or vector containers. Whether doing this is any faster is unclear, and needs benchmarking, since many of the member functions are simple pass-through inlined functions that work on the internal data anyway.

But there’s certainly a few pitfalls! The address returned by the data() member is not guaranteed forever. There are at least two major types of bugs:

Object is destroyed, or
Object is moved or modified.

Since you have a pointer to an object’s data, you want that object to stick around. But the object can disappear in a few ways:

Stack object goes out of scope (triggering the destructor and unwinding the stack).
Allocated object is deallocated by the delete operator.
Object is moved by a container (e.g., an auto-resize or other “iterator invalidation” situation).

Even if the object stays around to watch your skills, there’s another problem. If the underlying object is modified, then the internal address of the data that you have may become invalid. The issues are very similar to the well-known “invalidated iterator” problems with containers. Changes to the container that probably invalidate the data() pointer include:

Insertions and deletions
reserve()
resize()
shrink_to_fit()

Any of these members that modify the object are allowed to move the data. For example, they might allocate a different memory block, and move the whole array away from your pointer. But there are a huge number of other situations under which an iterator into a container may become invalidated, which presumably also invalidates an old address returned from the data() member function. Watch out!

• Online: Table of Contents

• PDF: Free PDF book download

• Buy: C++ Ultra-Low Latency

C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations:

Low-level C++ efficiency techniques
C++ multithreading optimizations
AI LLM inference backend speedups
Low latency data structures
Multithreading optimizations
General C++ optimizations

Get your copy from Amazon: C++ Ultra-Low Latency

Aussie AI

Chapter 21. Arrays