  • Book Excerpt from "Advanced C++ Memory Techniques: Efficiency and Safety"
  • by David Spuler, Ph.D.

Chapter 10. Arrays

Arrays are wonderfully efficient! They’re the most basic data structure known to humanity. The main features to note about an array include:

  • Contiguous memory storage — great for cache locality.
  • Single type of data — homogeneous elements, so no per-element type handling is needed.

In modern C++, there are several ways to create an array data structure:

  • std::array
  • std::vector
  • std::inplace_vector (C++26)

There are also some older methods of using arrays that still work in modern C++ code:

  • Fixed-size array variable: int arr[10];
  • Allocated fixed-size array: new int[10];
  • Old-style allocated array: malloc(sizeof(int)*10);

Note that for the allocated arrays (new and malloc), the size doesn’t need to be a compile-time constant in C++: it can be a variable, with the size of the array determined at run-time. The fixed-size array variable, however, does require a compile-time constant size in standard C++ (variable-length arrays are a non-standard compiler extension).

Array Operation Complexity

There are two main types of arrays to store objects: sorted and unsorted. Well, actually, there are other kinds of arrays with different semantics (e.g., stacks, queues, heaps, ring buffers), but let’s just look at searching and sorting for now.

Are they fast? Here’s the 10,000-foot view:

  • Unsorted arrays — very fast insertions/deletions, but slow searches (linear) and even slower to sort the data.
  • Sorted arrays — faster search (logarithmic), slower insertions/deletions, and great if you need sorted data.

In more detail, here’s the overall complexity analysis of the basic searching methods:

  • Searching — unsorted is O(n) (linear search) and O(log n) for sorted (binary search).
  • Inserting — unsorted is O(1) (add to the end), but O(n) if sorted (shuffle required).
  • Deleting — this is O(1) if unsorted (tricky swap method!), but O(n) if sorted (also shuffles).
  • Print unsorted — both are O(n) with a linear scan of the array.
  • Print sorted — unsorted is O(n log n) because it requires a sort, but only O(n) if already sorted.

And some other algebraic operations:

  • Maximum/minimum — unsorted is O(n) because it requires a scan, but only O(1) if already sorted (choose first or last element).
  • Top-k elements — unsorted requires an O(n log n) sort or at least a “partial sort”; only O(k) for a sorted array.
  • Sum or average — both are O(n) because the whole array must be scanned.

Modern C++ Arrays

We’re going to implement our own sorted and unsorted arrays to examine the algorithms. Standard C++ already has two types of unsorted arrays in std::array and std::vector. We could just wrap around those types, but I’m going to use low-level raw arrays to show the algorithms in more detail.

Sorted arrays are trickier. Note that there’s no “sorted array” class in the standard C++ library. However, there are some primitives we can use to achieve sorted arrays:

  • std::sort() — the modern C++ sort, typically introsort (a quicksort/heapsort hybrid).
  • qsort() — old-style quicksort with function pointers (not recommended).

There are also some built-ins for “binary search” on a sorted array:

  • std::binary_search() — modern C++ implementation for a sorted array.
  • std::equal_range() — binary search that handles duplicate elements in the array.
  • bsearch() — old-style binary search with function pointers (not recommended).

If we are inserting into a sorted array, we don’t need binary search exactly, because we’re assuming the element isn’t already in the array. Instead, we need a “binary-like search” method of finding the index location to insert a new item. In other words, we need to find the spot where the item fits in the array, but do it logarithmically, rather than using a slow linear scan.

Writing a binary-like search algorithm to find the insertion point is very fiddly coding! Fortunately, the standard C++ library has two methods that code it for us:

  • std::lower_bound() — generalizes binary search for use with insertions.
  • std::upper_bound() — similar version that finds the location above.

Strictly speaking, std::binary_search() in the C++ standard only requires a “partitioned” array rather than a fully “sorted” one. But every sorted range is also partitioned, so a sorted array always qualifies.
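For example, here’s a minimal sketch (with a hypothetical helper name) of keeping a std::vector sorted on insertion, with std::lower_bound finding the insertion point logarithmically:

    #include <algorithm>  // std::lower_bound
    #include <vector>

    // Hypothetical helper: insert into an already-sorted vector, keeping it sorted.
    template<typename T>
    void sorted_insert(std::vector<T>& vec, const T& item)
    {
        auto pos = std::lower_bound(vec.begin(), vec.end(), item);  // O(log n)
        vec.insert(pos, item);  // O(n) shuffle to open the gap
    }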

Custom Array Implementation

Anyway, let’s look at some of the basic operations in our custom versions of array algorithms. We’ll examine the unsorted array version, but the sorted version is almost identical. Here’s the overall class members:

    template<typename T, int N>
    class UnsortedArray {
    private:
        T arr_[N];
        int capacity_ = N;
        int count_ = 0;
        //...
    };

Note that “capacity_” is somewhat redundant if we’re templating based on a compile-time array size. However, it would be useful if we were dynamically constructing our arrays at runtime.

Here are some of the basic “getter” functions:

    int size() const { return count_; }
    int count() const { return count_; }
    int capacity() const { return N; }

And here are some of the basic utility functions:

    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == N; }

Sorted Arrays

There is no standard C++ sorted array class, so we’ve got to implement our own. A sorted array has a good search lookup cost, being logarithmic in the number of elements, by using the “binary search” lookup algorithm. However, that’s not as good as a hash table (e.g., std::unordered_map), which has O(1) average search cost.

Insertions and deletions have a poor O(n) theoretical complexity, although the first phase of finding where to insert or delete is also logarithmic, using an algorithm very similar to binary search. The linear cost arises because once they find the location, they then need to shuffle elements:

  • Make a gap (insertion), or
  • Close a gap (deletion).

If we’re using a container class for our array, such as std::vector, we can use the insert() method, which is doing a shuffle behind the scenes. (Note that std::array has no insert() method, since its size is fixed.)

The main advantage of a sorted array is that it’s, well, sorted, so if we want to process the array elements in sorted order, then it’s already done for us. That’s desirable because sorting an unsorted array is expensive with an O(n log n) complexity (e.g., std::sort typically uses a quicksort-heapsort hybrid).

If we need sorted data, there are other options in C++ containers. The std::map container is implemented as a balanced binary tree, called a “red-black tree,” and this has logarithmic complexity for all major operations: search, insertion, and deletion. However, a sorted array has a good memory cost because it uses contiguous storage, so it should not be underestimated!

Shuffling Array Elements

Shuffling array elements along by one position is required for both insertion and deletion in sorted arrays. Shuffle right to create a gap for a new insertion, and shuffle left to close a gap after deletion. We can also use this idea for unsorted arrays, but there are faster tricks, as examined later in this section.

In practice, shuffling of sorted arrays is quite efficient for scalar types via a memory block copy, using the memmove() standard function. Note that memmove() is an older function that does a bytewise copy of the memory that ignores object constructors and move operators. Presumably, the standard insert() method is using fast byte copies for scalar types.

Here’s an obscure pitfall: we cannot use various other copying methods because the shuffle involves overlapping source and destination memory blocks. There does not seem to be a version of C++ copying that permits overlaps. These functions would be incorrect and lead to undefined behavior on overlapping memory blocks, which is definitely true of any array shuffle:

  • std::memcpy (old C-style)
  • std::copy_n

However, we can use the overloads of the std::move function that work on ranges of multiple objects. These range versions of std::move have a real runtime cost, unlike the basic single-argument version, which is a compile-time type-cast that converts to a movable rvalue reference (with no runtime code generated). We also need to pay attention to whether we are shuffling to the left or right, because these functions don’t work for all overlapping arguments.

  • std::move or std::copy — moving or copying left (i.e., close a gap for deletion).
  • std::move_backward or std::copy_backward — moving or copying right (i.e., create a gap for insertion).
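Here’s a minimal sketch of both directions over a raw array (hypothetical helper names; it assumes elements [offset, count) are valid, and that spare capacity exists for the right shuffle):

    #include <algorithm>  // range versions of std::move and std::move_backward

    // Shuffle right by one: open a gap at 'offset' for an insertion.
    template<typename T>
    void open_gap(T* arr, int count, int offset)
    {
        // Moves [offset, count) to [offset + 1, count + 1); the last
        // argument is the *end* of the destination range.
        std::move_backward(arr + offset, arr + count, arr + count + 1);
    }

    // Shuffle left by one: close the gap at 'offset' after a deletion.
    template<typename T>
    void close_gap(T* arr, int count, int offset)
    {
        // Moves [offset + 1, count) down onto [offset, count - 1).
        std::move(arr + offset + 1, arr + count, arr + offset);
    }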

Note that the std::copy and std::copy_backward functions also work here, but copying is slower than moving for non-scalar types. Hence, the std::move versions are more general, but still have some downsides:

  • Expensive for non-scalar objects.
  • Iterators over the array are invalidated.
  • Any pointers or references to specific elements are invalidated.

Unfortunately, the shuffle cost is terrible for complex objects, which require their move operators to be called for every single element. I can’t say that I recommend sorted arrays for those types. Note that there are also various types of objects where we could still use a memory block move to do a “shallow move” of the objects (i.e., “relocatable objects”), rather than individually moving each element. However, using this idea requires tricks to prevent the C++ container from doing its own element-wise moves, such as using a low-level raw array rather than std::vector.

Binary-Like Sorted Array Insertion

Sorted arrays are logarithmic for searches, but not quite as good for insertions and deletions. Inserting a new element into a sorted array is a three-phase algorithm:

    1. Find the location to insert,

    2. Shuffle elements to the right (create a gap), and

    3. Insert the new element at the location.

There are three ways to find the location in a sorted array:

    1. Linear search from the front.

    2. Linear search from the back.

    3. Binary-like search (faster!)

Linear search over a sorted array doesn’t use equality, but finds the first element that’s bigger than the new element. Or, to go in reverse, start at the end and look for the first element that’s smaller than the new one.

The advantage of starting at the end is that we can shuffle as we go, but it’ll have terrible cache locality problems in accessing memory addresses in reverse. CPU memory prefetch algorithms usually assume a forward access order.

Anyway, neither of the linear algorithms is fast, and they aren’t typically used. Instead, a binary-like search for the insertion point is much faster, with logarithmic complexity.

Binary-like search for insertion involves splitting up the array into two intervals, and choosing between the two based on the midpoint value. This is not exactly the same as binary search, because we’re assuming that the element is not already in the array. Hence, it’s like binary search, but we’re looking for smaller versus bigger elements in comparison to the new element, rather than seeking equality.
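Here’s a minimal sketch of this binary-like insertion-point search, written as a member function over the arr_ and count_ data members used earlier (it computes the same index that std::lower_bound would find):

    int find_insert_location(const T& item) const
    {
        int lo = 0;
        int hi = count_;  // search the half-open interval [lo, hi)
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;  // midpoint, avoiding overflow
            if (arr_[mid] < item)
                lo = mid + 1;  // insertion point is in the upper interval
            else
                hi = mid;      // insertion point is at mid or below
        }
        return lo;  // index where the new item belongs
    }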

Sorted Array Deletion

Deletion of an element in a sorted array is easier than insertion. There are two major phases:

    1. Find the element using binary search.

    2. Shuffle the elements left to close the gap.

Note that we’re using real binary search, not the binary-like search for insertion, because we assume the element is present; we can’t delete an element that’s not in the array. One wrinkle: std::binary_search only returns a bool, not the element’s position, so it’s std::lower_bound (or std::equal_range) that actually gives us the index to delete.

The deletion phase is a left shuffle of the later array elements. As discussed above, we can use a byte copy such as memmove(), or the range version of std::move, both of which are well-defined for this overlapping left shuffle.

These methods can be efficient for scalar and other trivial types where bitwise shallow copying is allowed, but may trigger a cascade of move constructors or move assignments on complex classes. Thus, sorted arrays can be potentially inefficient for non-scalars because of the hidden costs of shuffling objects.
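Putting the two phases together, here’s a minimal sketch of sorted-array deletion over the same arr_ and count_ members (hypothetical function name; needs <algorithm> and <stdexcept>):

    void delete_sorted(const T& item)
    {
        T* end = arr_ + count_;
        T* pos = std::lower_bound(arr_, end, item);  // O(log n) search
        if (pos == end || item < *pos) {
            throw std::invalid_argument("Delete object not found");
        }
        std::move(pos + 1, end, pos);  // left shuffle closes the gap
        count_--;  // the last slot is left in a moved-from state
    }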

Unsorted Arrays

Unsorted arrays are not an all-star data structure, and don’t get a lot of use for basic search requirements. The main features include:

  • Slow search lookups in cases like associative arrays or sets (linear scan cost).
  • Fast insertions and deletions (constant cost, without any “shuffle”).
  • Sorting an unsorted array is costly with O(n log n) complexity.

Unsorted arrays are very useful if we want fast insertions and deletions, but rarely need to search or sort the array. Insertion is very fast with constant time, just by adding the new element at the end of the array. Deletions can also be implemented in constant time, but only via a trick of swapping the to-be-deleted element with the last element.

Interestingly, we can always fix our unsorted array by sorting it, and that turns out to be a decent idea. Let’s examine the two ways to get a sorted array:

  • Build an unsorted array, then sort it, or
  • Incrementally maintain a sorted array.

The first plan costs O(n) in total to do all the n insertions (unsorted), and then costs O(n log n) to sort it with std::sort. The second plan costs O(n) for every one of the n insertions into a sorted array, and so we get to O(n^2) quadratic complexity for the incremental sorted array approach. In summary, our analysis suggests:

  • Unsorted array (sort it later) — complexity of O(n log n).
  • Sorted array (incremental) — quadratic O(n^2) complexity.
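A minimal sketch of the first plan, assuming a std::vector of integers:

    #include <algorithm>
    #include <vector>

    // Plan 1: n fast unsorted insertions, then a single O(n log n) sort.
    std::vector<int> build_then_sort(const std::vector<int>& input)
    {
        std::vector<int> vec;
        vec.reserve(input.size());  // avoid reallocations during the build
        for (int x : input) {
            vec.push_back(x);       // O(1) amortized per insertion
        }
        std::sort(vec.begin(), vec.end());  // O(n log n), done once
        return vec;
    }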

So an unsorted array might be the way to go? However, as discussed above, the incremental sorted approach is not as bad as it sounds if we have scalar types, because each “shuffle” is a single memory block copy.

Note that an unsorted array is actually sorted in a weird way: by the order of insertions. Hence, if you have an ordered sequence of data, they are mapped into the array sequence according to the order in which they are processed. If these objects have an associated timestamp, your supposedly unsorted array may well be sorted implicitly according to the timestamp field.

Unsorted arrays are underestimated, and can be efficient in practice. An array that is unsorted functions as a list of items, but is stored in contiguous memory, which can make scanning the array efficient in terms of cache locality (e.g., faster than linked lists in std::list or red-black binary trees in std::map).

Unsorted arrays can be useful for semantics other than basic search lookups. An array can efficiently implement a fixed-size stack, but a fixed-size queue is better implemented using a ring buffer that progresses around the array in a circular fashion. You can also put a balanced binary tree or a heap data structure into an array, but we’re getting far away from a basic unsorted array in doing that.

Linear Search of Unsorted Arrays

Linear search is the worst part of unsorted arrays. There’s not really a better way to search an unsorted array. Here’s a simple hand-coded linear search of the array to demonstrate the algorithm that’s happening:

    int find_linear_search(const T &item)
    {
        for (int i = 0; i < count_; i++) {
            if (item == arr_[i]) 
                return i;  // found
        }
        return -1; // not found
    }

The above assumes we’ve stored our data in a raw array type as the data member. If we chose to store the data in a std::array or std::vector instead, we could search it with standard algorithms such as std::find().

Note that if we were doing a lot of searches of an array without many insertions or deletions, here’s an idea: pre-sort the array! This gives us a two-step approach:

    1. Pre-sort the array with std::sort

    2. Use binary search on our newly sorted array.

The use of binary search reduces our searches to logarithmic complexity, which is much faster than linear search.
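Here’s a minimal sketch of this plan as member functions over arr_ and count_ (hypothetical names):

    void presort()  // call once, after the bulk insertions
    {
        std::sort(arr_, arr_ + count_);  // O(n log n)
    }

    bool find_binary(const T& item) const  // then every search is O(log n)
    {
        return std::binary_search(arr_, arr_ + count_, item);
    }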

Template Value vs Reference Parameters

Templating on a type raises a common conundrum about whether to pass function parameters by reference or by value. The efficient choice is usually:

  • Small integer types — pass-by-value.
  • Large class types — pass-by-reference.

Which signature should we use?

    int find_linear_search(const T &item)  // Const reference
    int find_linear_search(T item)  // Pass-by-value

Which one we desire for larger non-class types, such as long or double, is somewhat implementation-dependent and you need to benchmark it!

Actually, there is a way to alter the parameter type of a templated function according to a compile-time property of the type, using type traits.
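Specifically, a minimal sketch with std::conditional (the same idea as the old boost::call_traits; the param_type alias is a hypothetical name):

    #include <type_traits>

    // Pass scalars by value, everything else by const reference.
    template<typename T>
    using param_type = std::conditional_t<std::is_scalar_v<T>, T, const T&>;

    // Inside our array class template, T is already fixed, so:
    int find_linear_search(param_type<T> item);  // adapts to T

One caveat: in a standalone function template, such a parameter is a non-deduced context, so callers must spell out T explicitly; inside a class template like our arrays, T is already known.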

However, the most common modern C++ style is to use const reference parameters. The reasons are:

  • Large class types — const& references are much faster.
  • Small integer types — it’s not much worse.

In one sense, I’m not sure about the last point, because:

    1. It’s a micro-optimization, and

    2. The compiler may auto-optimize it anyway.

But there is a simple solution whereby you can use const& reference parameters for generic types, but use pass-by-value for small integers. Template specialization to the rescue! Just define specialized versions of templated functions for the handful of small integer types:

    int find_linear_search(int item)  // Pass-by-value
    {
        // etc...
    }

Now you only have to define about 27 more versions for every single integral and floating-point type.

Fast Linear Search

You’re thinking that this doesn’t exist, and the heading is an oxymoron. But there are situations where linear search on an unsorted array can be faster than the alternatives:

  • Small number of elements
  • Sentinel search optimization
  • Low-level support for searching
  • Parallel linear search

Let’s examine all of these techniques in turn.

Sentinel linear search optimization. This is an optimization attributable to Knuth (1973), who presented it in MIX assembly language. The idea is to remove one conditional test from the loop (i.e., removing “i < count”) by guaranteeing that the search always succeeds. The trick is to add an extra element at the end of the array that equals whatever we’re searching for.

Note that this requires that we declare our array data member with one more item than the capacity. We always need a spare element at the end, even if the array is full to capacity.

        T arr_[N + 1];  // Extra dummy element

Sentinel-based searching is only good for arrays of scalar types, because it requires making a copy of the search element, which is created at the end. The sentinel search of an unsorted array still has linear complexity, but has a lower complexity constant because each loop iteration is faster in practice.
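Here’s a minimal sketch of the sentinel version, assuming the arr_[N + 1] declaration above (hypothetical function name):

    int find_sentinel_search(const T& item)
    {
        arr_[count_] = item;  // plant the sentinel in the spare end slot
        int i = 0;
        while (!(arr_[i] == item)) {  // no "i < count_" test needed
            i++;
        }
        return (i < count_) ? i : -1;  // hitting the sentinel means not found
    }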

Low-Level Search Support

Some CPUs have explicit instructions that support scanning a memory block for a value, and the standard library functions can take advantage of them. If we’re using an array of characters or bytes, there are these candidates:

  • std::find — on an array, vector, or string type.
  • strchr — old-style character strings (null-terminated)
  • memchr — low-level memory blocks of bytes.

The modern C++ code using std::find looks something like this:

    bool find_standard(const T& item)
    {
        auto iter = std::find(arr_, arr_ + count_, item);
        return iter != arr_ + count_;
    }

The version that returns the integer index of the element in the array is:

    int find_standard_index(const T &item)
    {
        auto iter = std::find(arr_, arr_ + count_, item);
        if (iter == arr_ + count_) return -1;  // Fail
        return iter - arr_;  // Pointer arithmetic
    }

Note that computing an index with pointer arithmetic like this works because the array is contiguous memory. More generally, iterator subtraction requires random-access iterators; it doesn’t work on node-based containers such as std::list or std::map.

Parallel Linear Search

There are multiple ways that we could parallelize our linear search algorithm. It just depends on our budget! Here are some options:

  • CPU SIMD instructions (e.g., AVX or ARM Neon)
  • Multithreading (on CPU)
  • GPU hardware

SIMD instructions allow us to test multiple values in parallel on a CPU. For example, the x86 CPUs from Intel and AMD offer several generations of SIMD instruction sets:

  • SSE — 128 bits (4 x 32-bit integers).
  • AVX/AVX2 — 256 bits (8 x 32-bit integers); AVX2 added most of the 256-bit integer operations.
  • AVX-512 — 512 bits (16 x 32-bit integers).
  • AVX10 — a newer revision that unifies the AVX-512 feature set (vector widths still up to 512 bits).
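As an illustration, here’s a minimal AVX2 sketch of linear search over 32-bit integers (a hypothetical free function; assumes AVX2 is enabled at compile time, e.g. -mavx2, and a GCC/Clang-style __builtin_ctz):

    #include <immintrin.h>

    int find_avx2(const int* arr, int n, int key)
    {
        __m256i vkey = _mm256_set1_epi32(key);  // broadcast key into 8 lanes
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256i block = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(arr + i));
            __m256i eq = _mm256_cmpeq_epi32(block, vkey);  // 8 compares at once
            int mask = _mm256_movemask_ps(_mm256_castsi256_ps(eq));  // 1 bit per lane
            if (mask != 0) {
                return i + __builtin_ctz(mask);  // index of first matching lane
            }
        }
        for (; i < n; i++) {  // scalar loop for the leftover tail
            if (arr[i] == key) return i;
        }
        return -1;
    }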

CUDA C++ GPU linear search. If we have an NVIDIA GPU, the available parallelism is much more extensive. For example, a single thread block can run 1024 threads, and each thread need only compare a few elements with our search key. This sounds like an almost constant-time algorithm on the GPU, but it’s not quite that good. In practice, there are two phases:

    1. Compare each loop element in parallel, and

    2. Collate the results.

The GPU can compare all the array elements 1024 at a time. Hence, it’s not constant time, but it’s still linear time divided by 1024.

Also, at the end we have a synchronization problem with detecting which of the threads had a successful result of the comparison. It’s not quite as bad as a “horizontal reduction” of the array (e.g., max or sum), but we have to synchronize the results in shared memory or global memory. We could use “warp shuffle” instructions that coordinate via faster GPU registers, but these only work within each warp of 32 threads, so it ends up being like a horizontal reduction over each warp.

Unsorted Array Insertions

Inserting into an unsorted array is very fast, because we can just add the new element at the end. This is very efficient, with constant time complexity. Here’s the code for insertion at the end:

    void insert_end(const T & obj)
    {
        if (full()) {
            throw std::overflow_error("Insert on full array");
        }
        else {
            arr_[count_++] = obj;
        }
    }

There’s nothing much to this code: only one statement! It’s very efficient to insert at the end of an array.

Insertion at an Index

Inserting in the middle of an unsorted array seems to be an O(n) operation. If we needed to insert into the middle, it would seem slower because of the need to shuffle the other elements out of the way. And that would certainly be true of a sorted array, where a shuffle is needed to maintain the sorted array.

But, no, we’re talking about an unsorted array here. Let’s ban the shuffle.

There’s a move trick to insert into the middle of an unsorted array at a given index in O(1) time. The trick is to note that in an unsorted array we only need to move a single element out of the way. The idea is two short phases:

    1. Move the existing element “out of the way” and to the end.

    2. Insert the element at that location.

Here’s a coded version of the “move away to the end” optimization. One fast way is to use std::move, which is like a type cast with no runtime code, and this causes move assignment on a complex object (or simple byte copying on a scalar type). Here’s the code:

    void insert_at_offset(const T & obj, int offset)
    {
        if (full()) {
            throw std::overflow_error("Insert on full array");
        }
        else {
            arr_[count_] = std::move(arr_[offset]);  // Move to end
            arr_[offset] = obj;  // Insert at location
            count_++;
        }
    }

Note that this only works for an unsorted array, not a sorted array. If we wanted a sorted order, or we need the implicit order-of-insertion in an unsorted array, then this “move to end” idea cannot be used as it will ruin the ordering.

Fast Unsorted Array Deletion

There’s a trick for deleting an arbitrary element from an unsorted array that is often missed. Unsorted array deletion need not be O(n) complexity, but can be done in O(1) time.

Deletion of an item from an unsorted array is a two-phase operation: find and destroy. Here’s the code to find the element, which uses linear search to find its offset, and is thus O(n) unavoidably:

    void delete_key(const T& item)
    {
        int offset = find_linear_search(item);
        if (offset == -1) {
            throw std::invalid_argument("Delete object not found");
        }
        else {
            delete_offset_swap(offset);
        }
    }

The naive way to delete the element we’ve just found is to remove it and “shuffle” the rest of the elements downwards (to the left) so that there’s no “gap” in the array. Doing a shuffle isn’t so bad for scalar types, where it’s probably just one call to memmove behind the scenes. But for non-scalar objects, we’re moving a lot of objects. Either way, unsorted array deletion with a shuffle has a cost complexity of O(n) time.

There is a faster way!

First, let’s get rid of the special cases: if there’s only one element in the array, just erase it, and set the count to zero. And if the erase location is the end-most object, just erase it there, and decrement the count. Otherwise, if the object we want to remove is at the front or middle of the array, we do a tricky swap with the end element:

  • Swap arr[i] with arr[n-1]
  • Erase at arr[n-1]
  • Decrement n

This swap idea has changed our unsorted array deletion from O(n) time to the optimal O(1) complexity. There are no loops anywhere!

Note that we can use std::swap here, and we may need to explicitly run the destructor of objects being destroyed (optional for scalar types). Here’s what the code looks like:

    void delete_offset_swap(int offset)
    {
        if (empty()) {
            throw std::underflow_error("Delete on empty array");
        }
        else if (count_ == 1) { // ***
            if (!std::is_trivially_destructible<T>::value) {
                arr_[0].~T(); // Explicit destructor (if needed)
            }
            count_ = 0;
        }
        else {
            if (offset != count_ - 1) {
                // Swap with the end element
                std::swap(arr_[offset], arr_[count_ - 1]);
            }
            if (!std::is_trivially_destructible<T>::value) {
                arr_[count_ - 1].~T(); // Explicit destructor (at end)
            }
            count_--;
        }
    }

The above code uses “type traits” from modern C++ to detect whether or not we need to explicitly run the destructor when destroying an object in the array. This is very efficient because type traits are evaluated to compile-time constants, so the compiler should optimize out the path if not needed (i.e., using “dead code elimination”). There are several options available in the type traits library, depending on exactly what types we want to support in our array:

  • std::is_trivially_destructible<T>::value
  • std::is_destructible<T>::value
  • std::is_scalar<T>::value

Actually, the above code has a minor inefficiency. The giveaway is that two code sequences with is_trivially_destructible are similar. Can you see it? We don’t need to expressly test for count==1 (marked with stars), because the general code in the else clause also works for that special case as well.

And also, what was I thinking? There’s no need to swap the element to the end, only to destroy it there. That’s three hidden move operations inside std::swap, when we only need to move one element. The better idea than swapping is to destroy the object where it is, and then move the end element down:

    if (!std::is_trivially_destructible<T>::value) {
        arr_[offset].~T(); // Destroy in place
    }
    if (offset != count_ - 1) {
        // Move down the end element
        arr_[offset] = std::move(arr_[count_ - 1]);
    }
    count_--;

Note that std::move() here is only a compile-time type cast operation. It will ensure that the move assignment operator is used on complex class types, and is also efficient for scalar and other trivial types.

Yes, moving the end element to the middle of the unsorted array changes some addresses. It will certainly invalidate iterators over the container. But so would the shuffle of elements, so we’re okay there.

Note that this only works for an unsorted array data structure. If we did this on a sorted array, we’d ruin the sorting order in the array by moving the biggest element into the middle of the sequence. Sorted arrays need to do the shuffle.

One final point is that this fast deletion trick with swapping will break the unofficial ordering of the array by its insertion order. If we have timestamps associated with our array elements, swapping the end element into the middle will ruin that implicit ordering.

Container Deletion Pitfalls

While we’re on the topic of deletions, let’s look at some common mistakes with deletions from C++ containers. There are at least two major pitfalls in using the erase() method to remove an object from a C++ container. Here’s the basic first attempt:

    for (auto iter = container.begin(); iter != container.end(); ++iter) {
        if (want_to_delete(*iter)) {
            container.erase(iter);  // Kaboom!
        }
    }

This will crash with a big mushroom cloud. The problem is that we’ve assumed the iterator stays valid, whereas erase() invalidates it; the erase() method actually returns an updated iterator that we need to use. (A range-based for loop can’t express this at all, because it hides the iterator, which is why we’re using begin() and end() manually.) Here’s the second attempt:

    for (auto iter = container.begin(); iter != container.end(); ++iter) {
        if (want_to_delete(*iter)) {
            iter = container.erase(iter);  // Use return value
        }
    }

This is not a crash, but still a major bug. The iterator loop skips over the element immediately after each erased object (and if erase() ever returns end(), the extra ++iter is undefined behavior). There are two increments in the deletion sequence:

    1. erase() returns the next valid iterator (after the removed object), and

    2. ++iter skips to the next element (again!).

To get it correct, we need to change the idiom to avoid ++iter if we erase anything.

    for (auto iter = container.begin(); iter != container.end(); /*Not here!*/ ) {
        if (want_to_delete(*iter)) {
            iter = container.erase(iter);  // Use return value
        }
        else {
            ++iter;  // Only if not erasing!
        }
    }

And now the code finally works!

Bypassing Interfaces

The std::array and std::vector classes are designed to allow you to get access to the stored data via the data() member function. It’s also guaranteed that the data is stored in contiguous memory locations. Note that this is also true of std::string, which has a data() member and also c_str(), which returns the same address.

The data() method allows direct access via pointers or low-level array types to the data in the standard array or vector containers. Whether doing this is any faster is unclear, and needs benchmarking, since many of the member functions are simple pass-through inlined functions that work on the internal data anyway.

But there’s certainly a few pitfalls! The address returned by the data() member is not guaranteed forever. There are at least two major types of bugs:

  • Object is destroyed, or
  • Object is moved or modified.

Since you have a pointer to an object’s data, you want that object to stick around. But the object can disappear in a few ways:

  • Stack object goes out of scope (triggering the destructor and unwinding the stack).
  • Allocated object is deallocated by the delete operator.
  • Object is moved by a container (e.g., an auto-resize or other “iterator invalidation” situation).

Even if the object stays around to watch your skills, there’s another problem. If the underlying object is modified, then the internal address of the data that you have may become invalid. The issues are very similar to the well-known “invalidated iterator” problems with containers. Changes to the container that probably invalidate the data() pointer include:

  • Insertions and deletions
  • reserve()
  • resize()
  • shrink_to_fit()

Any of these members that modify the object are allowed to move the data. For example, they might allocate a different memory block, and move the whole array away from your pointer. But there are a huge number of other situations under which an iterator into a container may become invalidated, which presumably also invalidates an old address returned from the data() member function. Watch out!
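Here’s a minimal sketch of the reallocation pitfall with a dangling data() pointer:

    #include <cstdio>
    #include <vector>

    int main()
    {
        std::vector<int> vec = {1, 2, 3};
        int* p = vec.data();   // raw pointer into the vector's buffer
        vec.push_back(4);      // insertion may reallocate and move the data...
        printf("%d\n", p[0]);  // ...leaving p dangling: undefined behavior
    }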

Extensions

  1. Benchmark the unsorted array implementation above using a raw array type versus an alternative approach of using a std::vector member object to store the data.
  2. Benchmark the sorted array implementation with a raw array versus using std::vector as the internal data array, especially to see if our hand-coded binary search is fast or not.
  3. Explore the use of “shallow copying” on sorted arrays containing “relocatable objects” in the shuffle needed for insertions and deletions in a sorted array data structure.
  4. Explore the efficiency of calls to move constructors in a “shuffle” for a sorted array implemented using std::vector or std::array.
  5. Implement the binary-like search algorithm to find the insertion location in a sorted array. (Note that deletion is just the normal binary search to find the element.)
  6. Benchmark inserting into an unsorted array and then sorting using std::sort, versus incrementally maintaining a sorted array. Do the results differ for a scalar integer type versus arrays of an object like std::string (which has move operators)?
  7. Implement a hybrid binary-linear search where the binary search reverts to linear search once the interval is small enough.
  8. Implement an AVX SIMD version of linear search over integers that tests a number of integers in the array at once.
  9. Implement a “cache-aware” binary search that chooses the middle index at the start of a cache line (where possible), and tests all values in that cache line immediately using an unrolled linear search.
  10. Implement a binary search that is both cache-aware and uses AVX SIMD instructions to test all elements in the same cache line more efficiently.

 
