Chapter 4. Pragmatic Safe C++

  • Book Excerpt from "Safe C++: Fixing Memory Safety Issues"
  • by David Spuler

Don't Wait!

There are many actions you can take right now to improve C++ safety and resilience. Many techniques can improve the quality of the code and harden it against bugs and security glitches. I am certainly not an advocate of switching to Rust, and these techniques will improve C++ safety while we await the full compile-time guarantees of the Safe C++ standard.

Many of these approaches are internal to the software development processes, and do not impact the execution speed of the product at the customer's site at all. Examples of these zero-impact approaches include:

  • Automated testing (e.g., unit testing, regression testing)
  • Source code analysis tools
  • Using runtime memory checkers in the lab

However, there are a few approaches that may impact performance for customers, whether those are paying external customers or the internal "customers" using your code in production. Approaches that can reduce speed include:

  • Leaving self-testing code in the production build (e.g., assertions, self-tests, parameter validation).
  • Running dynamic "debug libraries" that detect and mitigate various memory issues.

The performance impact can range from minimal (e.g., testing return codes) to quite expensive (e.g., memory address validation to the same level as valgrind). I don't think your customers want that last one.

Hardening C++

Here are some ways to "harden" C++ code against both bugs and security vulnerabilities:

  • Unit testing
  • Assertions
  • Self-testing code blocks
  • Debug tracing
  • Function parameter validation
  • Module-level self-tests
  • Error checking of function return codes

Tools and environments are another area to optimize settings for safety:

  • Compiler safety options
  • Linters and static analyzers
  • Debug wrapper libraries
  • Memory error detection tools (sanitizers)

Automation of the "nightly builds" and other build-related automatic testing can be improved:

  • Warning-free builds
  • Run unit tests via CI/CD approval automation (run-time hardening)
  • Run longer regression tests on nightly builds (if too slow for CI/CD).
  • Build on multiple compilers and platforms (compile-time and run-time hardening)
  • Build with different optimization levels
  • Have multiple build paths with more or fewer warnings from compilers and/or linters.
  • Make sure someone's fixing all these warnings!

General policies around the development of C++ for greater reliability include:

  • Coding policies
  • Code review on pull requests
  • CI/CD automation
  • Automated testing harnesses

Software development management issues include overall reliability of the whole workflow:

  • Source code control systems (e.g., git, svn, cvs)
  • Bug tracking systems
  • Support case tracking systems
  • Third-party library management and updates policy
  • Release management
  • Executables and debug versions management
  • Backups policy

Safer Production C++

There are some safe C++ techniques that are fast enough to be considered for production release. As already mentioned, a lot of assertions, function return checks, parameter validation checks, and simple self-tests can be optionally left in production code. The assumption here is that these are all only single value comparisons, and are thus not costly.
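As a sketch of what such a cheap production check might look like, here is a minimal assertion-style macro that compiles down to a single comparison and a rarely-taken branch. The names PROD_ASSERT, PROD_CHECKS_OFF, and report_check_failure are hypothetical, and the reporting hook is a placeholder for whatever logging your product already has:

    #include <cstdio>

    // Hypothetical reporting hook -- a real product might log to a file,
    // increment a counter, or raise a support event rather than just print.
    inline void report_check_failure(const char* expr, const char* file, int line)
    {
        std::fprintf(stderr, "Check failed: %s (%s:%d)\n", expr, file, line);
    }

    // A production-safe assertion: one comparison plus a rarely-taken branch.
    // Unlike the standard assert(), it stays enabled in release builds unless
    // the (hypothetical) PROD_CHECKS_OFF flag is defined at compile time.
    #ifndef PROD_CHECKS_OFF
    #define PROD_ASSERT(cond) \
        ((cond) ? (void)0 : report_check_failure(#cond, __FILE__, __LINE__))
    #else
    #define PROD_ASSERT(cond) ((void)0)
    #endif

    int average(const int* arr, int n)
    {
        PROD_ASSERT(arr != nullptr);   // parameter validation left in production
        PROD_ASSERT(n > 0);
        if (arr == nullptr || n <= 0) return 0;   // still fail safely after reporting
        long sum = 0;
        for (int i = 0; i < n; ++i) sum += arr[i];
        return static_cast<int>(sum / n);
    }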

Let us examine a few more speedy self-tests. Some of my targets in this section include:

  • Uninitialized memory usage.
  • Already-deallocated memory usage.
  • Double-deallocation errors.

These are whole categories of C++ memory safety errors that could be prevented. Let us examine these ideas in more detail.

Uninitialized memory usage. The simplest idea is to initialize all memory. The main primitives that create uninitialized memory are malloc and new, and there is also realloc whenever it expands the block. There are also uninitialized stack memory blocks from alloca. There are two ways to fix this:

  • Auto-intercepts via macros or link-time wrappers that zero the memory.
  • Coding policies requiring an immediate call to memset after these routines.

A call to memset is very efficient, and this policy will prevent a whole swathe of common memory errors. Note that this does not address all types of uninitialized memory, as it does not intercept stack-local variables, such as simple variables and uninitialized non-static arrays. These can be addressed by a coding policy of never declaring any variable without an initializer. This is not necessary for global variables or static variables, as these are already zero-initialized as part of standard C++.
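As a sketch of the macro-intercept approach, a zeroing wrapper might look like the following. The wrapper name safe_malloc_zeroed and the header are hypothetical; note that redefining a standard name like malloc is formally outside the standard, although it is a common debug-library trick, and a link-time wrapper avoids even that caveat:

    // safe_alloc.h -- hypothetical wrapper header, included after <cstdlib>
    #include <cstdlib>
    #include <cstring>

    inline void* safe_malloc_zeroed(std::size_t n)
    {
        void* p = std::malloc(n);
        if (p != nullptr)
            std::memset(p, 0, n);   // never hand out uninitialized bytes
        return p;
    }

    // Macro intercept: any later call written as malloc(n) in code that
    // includes this header is redirected to the zeroing wrapper.
    #define malloc(n) safe_malloc_zeroed(n)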

De-allocated memory detection. A simple trick to prevent most de-allocated memory usage errors is to write a four-byte magic value into the first bytes of the deallocated block. Since this memory is being de-allocated, you can write whatever you want into it. This requires the macro or link-time interception of free and delete.

This method can especially prevent (and detect) any double-deallocation memory errors. This check is easy to do because both free and delete should always be given an address at the start of an allocated memory block. Hence, these deallocation primitives should first check for the magic value (and skip the deallocation if it is found), and then write the magic value themselves before deallocating. This method can trigger a few false positives, which result only in memory leaks, and it also requires intercepting the memory allocation primitives to ensure that no blocks are smaller than four bytes.
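A minimal sketch of such an intercepted deallocation, assuming the allocation intercept guarantees blocks of at least four bytes (the names safe_free and FREED_MAGIC are hypothetical):

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    static const uint32_t FREED_MAGIC = 0xDEADBEEFu;   // hypothetical marker value

    // Wrapper for free(): detects likely double-deallocations and marks freed blocks.
    // Accepts the rare false positive when live data happens to equal the magic value
    // (the block is then leaked rather than corrupted).
    inline void safe_free(void* p)
    {
        if (p == nullptr)
            return;                         // free(NULL) is already harmless
        uint32_t marker = 0;
        std::memcpy(&marker, p, sizeof(marker));
        if (marker == FREED_MAGIC) {
            std::fprintf(stderr, "safe_free: possible double free of %p\n", p);
            return;                         // skip the deallocation; leak instead of crash
        }
        std::memcpy(p, &FREED_MAGIC, sizeof(FREED_MAGIC));  // mark block as freed
        std::free(p);
    }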

Safer coding policies. If you prefer coding policy guidelines for safer C++ (rather than macro interception of primitives), some of the ways to address memory safety include the following (see the sketch after this list):

  • Prefer calloc to malloc
  • Follow calls to malloc or non-object new operations with memset to zero.
  • Precede calls to free with memset, if the size is known.
  • Add memset at the end of destructors (assuming the size is known).
  • Alternatively, write magic values before free or non-object delete, and at the end of destructors.
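Here is a brief sketch of those policies applied by hand, assuming the block sizes are known at the call sites; the Buffer class and make_counters function are purely illustrative:

    #include <cstdlib>
    #include <cstring>

    // Illustrative buffer class following the policies above.
    struct Buffer {
        char*       data;
        std::size_t size;

        explicit Buffer(std::size_t n)
            : data(static_cast<char*>(std::calloc(n, 1))),  // policy: prefer calloc to malloc
              size(n)
        {}

        ~Buffer()
        {
            if (data != nullptr) {
                std::memset(data, 0, size);   // policy: scrub the block at the end of the destructor
                std::free(data);
                data = nullptr;
            }
            size = 0;
        }
    };

    int* make_counters(std::size_t n)
    {
        int* counters = new int[n];
        std::memset(counters, 0, n * sizeof(int));  // policy: memset to zero right after non-object new
        return counters;
    }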

There are various other coding policies available on the internet for safer C++ coding. Many of these are focused on secure C++ coding, which mostly achieves the same thing.

Building QA-Enabled Products

Developers love to get assigned work by the QA department. Hence, it's beneficial to build testing enablement capabilities directly into the product itself, to make the life of the QA staff easier.

Some ideas for building testing enablement into your product:

  • Build separate "debug" versions of your executable with more enabled self-testing code (this is not just the debug symbols, but enabling memory checking, stack canaries, or other internal safety features).
  • Command-line interface for easier automated testing with scripts and test harnesses.
  • Test-containing version: a debug version that is linked with the unit tests and has a "-test" command-line option that runs them.
  • Add a "-safe" command-line option that enables additional internal memory safe-checking.

Many of these ideas can also be combined with "supportability" initiatives. After all, product support is like on-site QA. Some of the opportunities to increase supportability for customers include:

  • A simple way to report full context details (e.g., build dates and numbers, versions, etc.)
  • Unique error codes in all error messages that customers might see.
  • Print error context details or a full stack backtrace for serious failures, and also log them for less serious problems such as "soft assertions" (see the sketch after this list).
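For instance, a "soft assertion" that logs a unique error code plus context, but does not abort, might look something like this sketch (SOFT_ASSERT and log_soft_failure are hypothetical names):

    #include <cstdio>

    // Hypothetical logging hook: each call site passes a unique error code so that
    // a line in a customer log can be matched to exactly one place in the code.
    inline void log_soft_failure(int error_code, const char* expr,
                                 const char* file, int line)
    {
        std::fprintf(stderr, "[E%04d] soft assertion failed: %s (%s:%d)\n",
                     error_code, expr, file, line);
        // A fuller version might also print a stack backtrace here
        // (platform-specific, e.g. backtrace() on Linux).
    }

    // Logs and continues, rather than aborting like assert().
    #define SOFT_ASSERT(code, cond) \
        ((cond) ? (void)0 : log_soft_failure((code), #cond, __FILE__, __LINE__))

    void process_order(int quantity)
    {
        SOFT_ASSERT(1042, quantity > 0);   // unique code 1042 appears in the log
        // ... continue with a safe fallback rather than crashing ...
    }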

Note that these product features are not just for the QA process, since these capabilities can be used during development in the automated test runs or the nightly builds.

Triggering Bugs Earlier

A lot of bugs can be found using the techniques already mentioned. The above approaches are very powerful, but they can be limited in some less common situations:

  • Intermittent bugs — hard to reproduce bugs.
  • Silent bugs — how would you even know?

You can’t really find a bug with gdb or the valgrind memory checker if you can’t reproduce the failure. Even so, an intermittent bug is quite likely a memory error; alternatively, it might be a race condition or other synchronization error.

Silent bugs are even worse, because you don’t know they exist. I mean, they’re not really a problem, because nobody’s logged a ticket to fix them, but you just know they’ll happen in production at the biggest customer site in the middle of Shark Week.

How do you shake out more bugs? Here are some thoughts:

  • Add self-testing code with more complex sanity checks.
  • Consider debug wrapper functions with extra self-testing.
  • Add more function parameter validation.
  • Auto-wrap function calls to ensure error return checking for all calls (see the sketch after this list).
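As a sketch of the wrapper idea, here is a hypothetical CHECKED_CALL macro that wraps any call returning a zero-on-success error code and reports failures that would otherwise be silently ignored:

    #include <cstdio>

    // Hypothetical wrapper macro: call a function that returns 0 on success and
    // a non-zero error code on failure, and log any failure instead of silently
    // ignoring the return value.
    #define CHECKED_CALL(call)                                            \
        do {                                                              \
            int rc_ = (call);                                             \
            if (rc_ != 0)                                                 \
                std::fprintf(stderr, "Call failed (rc=%d): %s (%s:%d)\n", \
                             rc_, #call, __FILE__, __LINE__);             \
        } while (0)

    // Illustrative API that returns 0 on success, non-zero on failure.
    int save_settings(const char* path)
    {
        return (path != nullptr) ? 0 : 22;   // stub failure code for the example
    }

    void shutdown_app()
    {
        CHECKED_CALL(save_settings("settings.cfg"));  // success: nothing logged
        CHECKED_CALL(save_settings(nullptr));         // failure: logged with file and line
    }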

Consider non-memory bugs with changes such as:

  • Arithmetic overflow or underflow is a very silent bug for both integers and floating-point (e.g., check unsigned integers aren’t so high they’d be negative if converted to int).
  • Add assertions on arithmetic operations (e.g., tests for floating-point NaN or negative zero); see the sketch after this list.
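A sketch of such arithmetic self-checks using only standard facilities (the helper names are illustrative):

    #include <cassert>
    #include <climits>
    #include <cmath>

    // Check that an unsigned counter has not grown so large that it would be
    // negative if converted to a signed int.
    inline void check_unsigned_range(unsigned int u)
    {
        assert(u <= static_cast<unsigned int>(INT_MAX) && "unsigned value would overflow int");
    }

    // Check a floating-point result for NaN and (if it matters to you) negative zero.
    inline void check_float(double x)
    {
        assert(!std::isnan(x) && "floating-point NaN detected");
        assert(!(x == 0.0 && std::signbit(x)) && "negative zero detected");
    }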

With all of these things, any extra runtime testing code requires a shipping policy choice: remove it for production, leave it in for production, only leave it in for beta customers, leave in only the fast checks, and so on.

If you’re still struggling with an unsolvable bug, here are a few “hail Mary” passes into the endzone:

  • Review the latest code changes; it’s often just a basic mistake hidden by “code blindness.”
  • Add a lot of calls to synchronization primitives or run single-threaded to rule out a concurrency issue.
  • Try memset after malloc or new, or change to calloc.

Some other practical housekeeping tips can help detect new bugs as soon as they enter the source code, and plan ahead for future failures:

  • Examine compiler warnings and have a “warning-free build” policy.
  • Have a separate “make lint” build path with lots more warnings enabled.
  • Keep track of random number generator seeds for reproducibility.
  • Add some portability sanity checks, such as: static_assert(sizeof(float)==4);

I guarantee that last one will save your bacon one day!
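A few more portability sanity checks in the same spirit, all evaluated at compile time and therefore free at run time (the 64-bit pointer assumption is just an example and should match your supported platforms):

    #include <climits>

    static_assert(sizeof(float) == 4,  "float is assumed to be 32-bit IEEE");
    static_assert(sizeof(double) == 8, "double is assumed to be 64-bit IEEE");
    static_assert(CHAR_BIT == 8,       "bytes are assumed to be 8 bits wide");
    static_assert(sizeof(void*) == 8,  "build is assumed to be 64-bit");  // example assumption; drop if 32-bit builds are supported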

Compiler Vendor Safety

The various compiler vendors could assist in increasing the level of safety in C++. Let us examine the use of additional safety in existing practices, as an interim step before moving to full memory safety in a future Safe C++ standard. The ideas below assume the compiler vendors could make changes to:

    (a) The code generation features of some operators, and

    (b) The Standard C++ library routines (e.g., malloc and new).

This is also just a first commentary. I am sure that the compiler designers who do this kind of stuff all day long could come up with a much more extensive proposal, perhaps with additional levels of trade-offs and individual settings for various sub-types of techniques.

The focus here is to go beyond what is possible via macro or link-time intercepts in your own safety wrapper library. There are additional techniques that can only be applied by the compiler, which are difficult or impossible to do via intercepts. The idea of this section is not necessarily for the safety modes to be used in production, although perhaps the lower levels could be, but to allow multiple levels of safety runs to be used in development practices. For example, there could be a separate build path that runs the unit tests at each safety level.

The basic idea is a simple option:

    -safe

This would turn on safety levels for all of the code by default. This could be overridden by unsafe and safe blocks, as in the Safe C++ proposal. Situations where memory blocks are allocated or initialized in an unsafe block need to be considered carefully.

The extra capabilities that this first-level safety option could enable would include ways to increase the overall safety of the program, focused on tolerance rather than detection. Such ideas include:

  • malloc and new should initialize memory bytes to zero
  • realloc also, when it extends the memory
  • Stack memory for automatic variables should be initialized to zero.
  • The alloca stack allocation primitive should also zero the bytes.

These methods do not so much tolerate errors as reduce their occurrence. Other possibilities, involving both detection and toleration, include:

  • The standard library functions should all tolerate a null pointer argument without crashing.
  • Invalid parameters to the standard library should also be blocked (e.g., zero size to memset).
  • The * and [] operators should prevent null pointer dereferences with a basic test.
  • free and delete should use bit flags in the memory header block to detect and avoid many invalid address de-allocations.
  • Similarly, double deallocation errors could be detected and made harmless.

There are many other possibilities on the theme of "only requires an integer or pointer test" for higher safety with detection and/or toleration. Obviously, the compiler could offer each of these capabilities as a separate option, too, rather than grouped into a particular safety level.
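To make the "only an integer or pointer test" theme concrete, here is a sketch of what a tolerant library wrapper and a checked dereference could look like if emulated in user code rather than generated by the compiler (safe_memset and checked_deref are hypothetical names):

    #include <cstdio>
    #include <cstring>

    // Tolerant memset: a null pointer or zero size becomes a harmless no-op
    // instead of undefined behavior -- just a pointer test and an integer test.
    inline void* safe_memset(void* p, int value, std::size_t n)
    {
        if (p == nullptr || n == 0)
            return p;
        return std::memset(p, value, n);
    }

    // Checked dereference: approximates what compiler-generated code for the
    // * operator could do under a hypothetical -safe option. Assumes T is
    // default-constructible so a dummy object can be returned as a fallback.
    template <typename T>
    inline T& checked_deref(T* p)
    {
        if (p == nullptr) {
            std::fprintf(stderr, "null pointer dereference blocked\n");
            static T fallback{};   // tolerate rather than crash
            return fallback;
        }
        return *p;
    }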

Level 2 Safety

The next level of safety would be possible via a compiler option:

    -safe=2

The idea of the second-level safety is to use magic values and/or canaries as part of the safety net. The initial check is an integer test of a simple four-byte magic value (or canary), which indicates a high likelihood of an error, in which case a more expensive analysis of the address can be performed. Overall, this is very efficient, but occasionally gives false positives. This second level of safety checking goes beyond simple integer or pointer tests, and mostly adds a superset of additional safety checks. Examples of its capabilities include:

  • malloc and new add magic bytes in their header control block, before their allocated bytes as usual, to detect overwrites from array underruns. Alternatively, the header control block itself can be checked for consistency, without using extra bytes.
  • Automatic simple variables are initialized to zero.
  • Array memory blocks on the stack have a canary region at both ends, and a magic value in their first bytes if not initialized.
  • The alloca stack allocation function similarly uses two canary regions and a magic value at the start.
  • Global arrays and memory blocks also have a small canary region at both ends (but they are initialized to zero in standard C++, so there is no magic value in the first bytes).
  • Similarly, they add extra bytes (e.g., four) at the end with canary magic bytes to detect array overruns.
  • malloc and new put a single magic value, possibly a four-byte integer, at the start of their block indicating "not yet initialized." The rest of the bytes inside the block are zeroed for safety.
  • Deallocation by free or delete would set a marker in the header control block, and also a magic value at the first address of the memory block to indicate "already freed" status. (They would first validate the magic value to detect freeing uninitialized memory, and check for canary overruns at both ends of the block.)
  • Library routines that use or write to an address can check for this magic byte at the start of the memory block, and only if the value appears to be a faulty magic byte (indicating never-initialized or already-freed blocks), initiate more complex tests to check if it's an invalid block. This only detects cases involving addresses at the start of a block, rather than in the middle.

The goal of level two safety is to check magic values, typically with a single integer comparison, which is relatively efficient. This finds many more errors, but does not detect all cases:

  • Addresses in the middle of a block are not easily validated.
  • Array overruns or underruns via the * and [] operators are not detected immediately, but may be detected later by the canary checks.

It would be too expensive to modify every * and [] operator to detect these memory errors. However, compilers are already able to auto-detect certain types of loops (e.g., auto-vectorization), in which case the very first access could be checked, and possibly the full extent of the loop could be analyzed to determine array overruns by the end of the loop.
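As a rough illustration of the data structures such a level-two scheme might use, here is a user-level sketch of an allocator with a header control block, magic values, zeroed contents, and a trailing canary. All names and magic numbers are hypothetical; a real compiler-supported version would live inside the allocator and also preserve maximum alignment:

    #include <cstdint>
    #include <cstdlib>
    #include <cstring>

    static const uint32_t HEADER_MAGIC   = 0xA110CA7Eu;  // live allocated block
    static const uint32_t FREED_MAGIC    = 0xDEADBEEFu;  // block already freed
    static const uint32_t CANARY_PATTERN = 0xCAFEF00Du;  // trailing overrun canary

    struct BlockHeader {       // sits immediately before the user bytes
        uint32_t magic;        // HEADER_MAGIC or FREED_MAGIC: a single integer test
        uint32_t user_size;    // size requested by the caller
    };

    // Layout: [BlockHeader][user bytes, zeroed][trailing canary]
    void* safe_alloc(std::size_t n)
    {
        char* raw = static_cast<char*>(std::malloc(sizeof(BlockHeader) + n + sizeof(uint32_t)));
        if (raw == nullptr) return nullptr;
        BlockHeader* h = reinterpret_cast<BlockHeader*>(raw);
        h->magic = HEADER_MAGIC;
        h->user_size = static_cast<uint32_t>(n);
        char* user = raw + sizeof(BlockHeader);
        std::memset(user, 0, n);                                    // zero the user bytes
        std::memcpy(user + n, &CANARY_PATTERN, sizeof(uint32_t));   // write the trailing canary
        return user;
    }

    // Returns false if a problem was detected (double free, bad block, or overrun).
    // Note: re-reading the header of an already-freed block is formally undefined
    // behavior; this is the pragmatic trick described in the text, not a guarantee.
    bool safe_dealloc(void* p)
    {
        if (p == nullptr) return true;
        char* user = static_cast<char*>(p);
        BlockHeader* h = reinterpret_cast<BlockHeader*>(user - sizeof(BlockHeader));
        if (h->magic == FREED_MAGIC) return false;   // double free: skip, leak instead of crash
        if (h->magic != HEADER_MAGIC) return false;  // not one of our blocks, or corrupted header
        uint32_t canary = 0;
        std::memcpy(&canary, user + h->user_size, sizeof(canary));
        if (canary != CANARY_PATTERN) return false;  // array overrun wrote past the end
        h->magic = FREED_MAGIC;                      // mark as freed before releasing
        std::free(h);
        return true;
    }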

Level 3 Safety

The idea of level 3 safety is similar to what is available from sanitizers such as valgrind. As such, its performance may be sluggish and inappropriate for production usage. Since this type of checking is already available via sanitizers, it may be unnecessary for compiler vendors to add this functionality directly. However, it should be noted that some sanitizers have limitations (e.g., valgrind cannot detect overruns on non-allocated global memory or stack memory blocks), whereas compiler designers could offer greater capabilities.

The basic design of this capability involves:

  • All addresses are validated and checked, even those in the middle of a block, or completely outside all blocks.
  • All types of memory blocks are validated, including stack memory, read-only memory, and global memory, whereas the previous levels focused on heap blocks.
  • All library functions have their address parameters validated.
  • All address-related operations, such as * and [], will have their addresses validated.

I remain unconvinced as to the necessity of this high-level safety capability in compilers. It would be extremely slow to run, and it serves as a reminder that what is really needed is the compile-time memory safety guarantees of the Safe C++ proposal, as these have no runtime performance impact at all!

 
