Aussie AI

Chapter 45. String Optimizations

Book Excerpt from "C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations"

by David Spuler, Ph.D.

Chapter 45. String Optimizations

Efficient Strings

The C++ std::string class is a beautiful and elegant class that has been well-designed and near-optimally implemented. Its main advantages include:

High-level abstraction of string coding
Automates management of memory buffer allocation
Safety (e.g., no buffer overflows when appending or concatenating)
Moderately efficient

Note that I only said efficiency was “moderate”! As classes go, it’s one of the most efficient, with lots of inline member functions and implementations super-optimized by compiler engineers. Some of the fast parts of the standard string class include:

Small String Optimization (SSO)
Fast to copy
Fast move semantics

But it’s still not as efficient as bypassing the string interfaces and doing low-level string processing directly with char* pointers and arrays.

So, here we have a perfect example of the maxim: don’t optimize prematurely! I’m not advocating to replace all strings with C-style string operations, but if your profiler finds a hot-spot in a C++ string operation, you can do better. Furthermore, if you’re doing a very string-intensive application, such as text processing, the lowest level kernels that spin through the document probably shouldn’t use the string class.

Common String Operations

If you have a string, and you want to do some work on that string, the std::string class is often very fast. In the situations where it’s not, you can also revert to old-style coding on char* pointers by using the data() or c_str() methods to get to the raw character array.

String length. The length() method is extremely fast, and always so. The comparison goes like this:

length() — always blazingly fast.
strlen() — slow on very long strings.

Since the string class maintains the string length incrementally as a data member, it’s already been precalculated. Hence, it’s an inlined access to an already-computed integer.

In comparison, C-style null-terminated strings must scan for the null byte. Hence, strlen() is slow on very long strings, whereas length() is still fast.

String Equality Comparisons. Which method is faster is unclear, depending on the implementation of operator==, but my money’s on the string class. In particular, it can compare the lengths quickly, since it has that precomputed for both strings. The full list of ways to compare strings:

operator==() — fast version.
compare() — explicit method version.
strcmp() — old-style string comparisons.

Case-Ignoring String Equality Comparisons. There’s not a standard case-ignoring version of the compare() method. However, there are non-standard implementations:

stricmp() — Windows (MSVS)
strcasecmp() — Linux (GCC)

String Search. This is a very simple and long-standing requirement. Your options are pretty obvious:

find() — simple and fast!
strstr() — the old C function.

Case-Ignoring String Search. There’s not a standard method function named “ifind” or “stristr”, but there are ways to get there:

strcasestr() — Linux
StrStrIA() on Windows in shlwapi.h

Reverse String Search. There the string class method rfind() for reverse string searching. There’s not really a good alternative in the older C-style libraries.

Character Search. Searching a string for the first occurrence of a string characters. The options include:

find(char) — string class overload.
strchr() — old-style C function.

Reverse Character Search. The options here are:

rfind(char) — another class overload.
strrchr() — reverse long-standing C function.

Note that the rfind() version is likely faster than the older function on very long strings, because it has the string length precalculated in the string object and can jump straight to the end, whereas strrchr() has to scan from the beginning of the string.

Multi-Character Search. If you want to search for a prefix or suffix of a set of characters, rather than just one, then the C++ string class has what you need:

find_first_of() — first character from a set.
find_first_not_of() — first character not in the set.

The suffix versions are:

find_last_of()
find_last_not_of()

Prefix and Suffix Tests. The standard C++ methods on the string class are:

starts_with() (C++20)
ends_with() (C++20)

Other options include:

string::find() — search forwards
string::rfind() — reverse search
LastIndexOf — Win32 version

There’s also some other options:

remove_prefix() in string_view (C++17)
remove_suffix() in string_view (C++17)

You can always code your own versions:

    inline bool STRPREFIX(const char *s, const char *prefix) {
        return strncmp(s, prefix, strlen(prefix)) == 0;
    }

Here’s a modern C++ style version:

    inline bool string_prefix(const std::string& str, const std::string& prefix)
    {
        return str.find(prefix) == 0;
    }

And here’s the same idea for suffix, using the “reverse find” method:

    inline bool string_suffix(const std::string& str, const std::string& suffix)
    {
        return str.rfind(suffix) + suffix.length() == str.length(); // Buggy!
    }

Actually, that’s a bit careless of the failure return -1 from rfind(). Here’s a fixed version:

    inline bool string_suffix(const std::string& str, const std::string& suffix)
    {
        int offset = str.rfind(suffix);
        if (offset == -1) return false;  // not found
        return offset + suffix.length() == str.length();
    }

Note that rfind is needlessly inefficient here if the string is very long and the suffix is not present. It keeps on scanning all the way to the start of the string, rather than quitting early. There’s certainly a faster way to do it, such as comparing the two lengths, using them to compute the address of where the suffix would be, and then use basic string equality testing.

Case-Ignoring Prefix and Suffix Tests. There’s not much help with this in the standard libraries, so you’ll have to roll your own with strnicmp (Windows) or strncasecmp (Linux):

    inline bool STRIPREFIX(const char *s, const char *prefix) {
        return strncasecmp(s, prefix, strlen(prefix)) == 0;
    }

Here’s my attempt at a fast suffix version, which mixes C++ and C coding, but won’t be slow on a long string:

    inline bool string_strisuffix(const std::string& str, const std::string& suffix)
    {
        int strlen = str.length();
        int suffixlen = suffix.length();
        if (suffixlen > strlen) return false;
        int offset = strlen - suffixlen;
        const char* raw = str.c_str();
        raw += offset;
        const char* suffixraw = suffix.c_str();
        return stricmp(raw, suffixraw) == 0;
    }

I’m sure that you could do better!

String Class Inefficiencies

What’s so bad about the standard string class? Nothing, unless you want to do a lot of processing of strings. Here’s a list of some of its problems:

1. It’s a large object (e.g., 40 bytes).

2. Sequences of binary + operators.

3. Too many calls to new and delete.

4. No way to use a larger non-allocated buffer.

5. Cannot use reference counting and copy-on-write.

A lot of these concerns can be summarized: it’s too easy to use!

Programmers tend to get comfortable with the very convenient ways that std::string can be used in C++ programs. In comparison, doing C-style string processing with low-level character buffers is painful! Hence, there’s a tendency to forget that C++ strings are significant objects that invoke memory allocation on all but the smallest of text strings.

String Memory Layout

The std::string class creates objects of a reasonable size, unlike C-style char* The string class is quite complicated, although great compiler engineers have made it look easy. Some of the main points about string efficiency are:

Small String Optimization (SSO) is standard (with a small internal buffer).
Reference counting is not enabled (and nor is Copy-On-Write).

The use of SSO makes sense because otherwise even just declaring an empty string object would cause a memory allocation call to the new operator:

    std::string s1;   // No memory allocation!

We can interrogate the string objects about their features using standard member functions such as data(). If the pointer to the data is inside the object itself, then we’re using SSO. And if two objects created from each other (via copy constructor and/or assignment operator) have the same data buffer address, then reference counting is enabled.

Here is some code that uses standard string member calls to determine some details about the layout of a string object.

    void print_string_details()
    {
        std::string str;
        cout << "Sizeof std::string = " << sizeof(std::string) << " bytes" << endl;
        int bytes = str.capacity() + 1;
        int header = (sizeof(str) - bytes);
        cout << "Capacity std::string = " << str.capacity() << " characters (" 
                                << bytes << " bytes)" << endl;
        const char* datastr = str.data();
        char* saddr = reinterpret_cast<char*>(& str);
        bool is_sso = datastr >= saddr && datastr < saddr + sizeof(std::string);
        cout << "Short String Optimization (SSO): " << (is_sso ? "yes" : "no") << endl;
        cout << "Reference counting: " << (string_is_reference_counted(bytes*100) ? "yes" : "no") << endl;
        int offset = (int)(datastr - saddr);
        if (offset == 0) {
            cout << "Character buffer at start of string object (offset=0)" << endl;
        }
        else if (offset + bytes == sizeof(std::string)) {
            cout << "Character buffer at end of string (offset = " << offset << ")" << endl;
        }
        else {
            cout << "Character buffer in middle of string (offset = " << offset << ")" << endl;
        }
        cout << "Header block bytes = " << header << " (" 
             << offset << " before buffer, " << (header - offset) << " after buffer)" << endl;     
    }

And here are the results in MSVS on my Windows laptop:

    Sizeof std::string = 40 bytes
    Capacity std::string = 15 characters (16 bytes)
    Short String Optimization (SSO): yes
    Reference counting: no
    Character buffer in middle of string (offset = 8)
    Header block bytes = 24 (8 before buffer, 16 after buffer)

As to the 24 header bytes here, that could be 3 pointers (8 bytes or 64-bits each), or maybe it’s 1 pointer to the buffer and 2 different 64-bit integers for length and capacity. We can go exploring in the memory layout of the header block inside a string object to try to answer that question. It’s non-standard coding that is implementation-specific, but plenty of people have done it!

• Online: Table of Contents

• PDF: Free PDF book download

• Buy: C++ Ultra-Low Latency

C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations:

Low-level C++ efficiency techniques
C++ multithreading optimizations
AI LLM inference backend speedups
Low latency data structures
Multithreading optimizations
General C++ optimizations

Get your copy from Amazon: C++ Ultra-Low Latency

Aussie AI

Chapter 45. String Optimizations