Aussie AI

Chapter 14. Safe Standard C++ Library

  • Book Excerpt from "Safe C++: Fixing Memory Safety Issues"
  • by David Spuler

Debug Standard Library Versions

Various vendor capabilities exist in terms of debugging features and debug versions of the standard library. GCC appears to be leading the way, and Clang has a lot of features too. The Microsoft Visual Studio capabilities are less fully formed in this area, although it does have static analysis capabilities, runtime checks and some other debugging features.

GCC Debug Library. GCC does have a debug build of its libstdc++ library, which you can link against via the /usr/lib/debug path. You can add linker options like "-L/usr/lib/debug" for static libraries, or set "LD_LIBRARY_PATH=/usr/lib/debug" for dynamic libraries.

But that's not really what I'm talking about. That is a version which retains the symbolic information for better debugging, rather than a version which offers additional safety and debugging features. But GCC also has one of those!

GCC has a debug-enabled version of libstdc++ with additional error checking enabled. Interestingly, this is implemented via a "wrapper model" on top of the production versions of the standard library. This error-checking debug mode is enabled by defining the macro _GLIBCXX_DEBUG, and there is also a lighter-weight macro, _GLIBCXX_ASSERTIONS.
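As a rough illustration of how these macros are used, here is a minimal sketch. The function names libstdcxx_check_mode and element_at are my own inventions for this example; only the two macros come from libstdc++. The macros are normally defined on the command line (e.g., g++ -g -D_GLIBCXX_DEBUG demo.cpp) so that every translation unit sees the same setting.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Report which libstdc++ checking mode this translation unit was built with.
// (Hypothetical helper, for illustration only.)
inline std::string libstdcxx_check_mode() {
#if defined(_GLIBCXX_DEBUG)
    return "debug mode (_GLIBCXX_DEBUG)";
#elif defined(_GLIBCXX_ASSERTIONS)
    return "assertions only (_GLIBCXX_ASSERTIONS)";
#else
    return "no extra checking";
#endif
}

// With either macro defined, an out-of-range index here is reported by
// libstdc++ and the program aborts, rather than silently reading past
// the end. Without them, an out-of-range index is undefined behavior.
inline int element_at(const std::vector<int>& v, std::size_t i) {
    return v[i];
}
```

The source code does not change between builds; the checking is switched on purely by the macro definition.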

Clang debug capabilities. Clang has been following GCC with a lot of common features. For example, Clang has options to enable a variety of runtime sanitizers (e.g., ASan, UBSan, TSan, MSan), and has the Clang Static Analyzer and clang-tidy for source code analysis.

Safe Standard Libraries

This section is a list of bugs and undefined behaviors that could probably be detected by a linkable debug version of the standard libraries. This could mean:

  • Macro-intercepted debug wrapper library, or
  • Vendor offered debug versions of the standard library.

The standard library could perform a lot more internal self-tests to detect and prevent serious internal errors. This could be done better by compiler vendors, but a lot could also be done by users with a debug wrapper library based on macro or link-time intercepts.

For error detection and prevention, I'm thinking in terms of what could be done quickly, just by testing a few flags or magic values, rather than a full-scale memory debugging library (i.e., not to the extent of valgrind or other sanitizers). A lot of internal errors could be changed from crashes to harmless logged warnings.

Error preventions. The standard library is in a position to prevent several categories of memory errors. For example, why don't malloc and new clear their memory to zero? Similarly, the alloca function could clear stack memory. This would prevent a whole range of "uninitialized memory use" errors. Surely, this wouldn't be very slow, or it could be offered as an option to users of the library.
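A user-level version of this idea is easy to sketch. The name zeroing_malloc is hypothetical; a macro-intercepted debug library might map malloc onto something like it.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical wrapper: a malloc that always hands back zero-filled memory,
// so that reads of "uninitialized" heap memory see a predictable value.
inline void* zeroing_malloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) {
        std::memset(p, 0, size);   // the extra cost is one pass over the block
    }
    return p;
}

// A macro intercept (an assumption about how you might wire it in):
//   #define malloc(n) zeroing_malloc(n)
```

The cost is a single memset per allocation, which is why it seems reasonable to offer as an opt-in library mode.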

Memory errors. The heap management libraries need to have a hidden block of data for each allocated memory block, anyway. So, add some more flags and magic values in there, which are then checked by all of the primitives trying to access a memory block. Many of the usual suspects could be caught relatively simply:

  • Double de-allocation
  • Non-heap address for de-allocation
  • Mismatched allocation/de-allocations
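Here is one way the header-plus-magic-value idea could look in a user-level wrapper. This is a sketch under several assumptions: all names (BlockHeader, debug_alloc, debug_free, the magic constants) are invented for illustration, and freed blocks are held in a quarantine list rather than released immediately, so that the header of a freed block can still be read safely when a double free arrives.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

enum class AllocKind : std::uint32_t { Malloc = 1, New = 2 };
enum class FreeStatus { Ok, NotHeapBlock, DoubleFree, Mismatched };

constexpr std::uint32_t kLiveMagic  = 0xA110CA7Eu;  // arbitrary values
constexpr std::uint32_t kFreedMagic = 0xDEADBEEFu;

// Hidden header stored just before each user block.
struct alignas(std::max_align_t) BlockHeader {
    std::uint32_t magic;
    AllocKind kind;
    std::size_t size;
};

// Freed blocks are parked here so their headers stay readable.
inline std::vector<void*>& quarantine() {
    static std::vector<void*> q;
    return q;
}

inline void* debug_alloc(std::size_t size, AllocKind kind) {
    void* raw = std::malloc(sizeof(BlockHeader) + size);
    if (raw == nullptr) return nullptr;
    const BlockHeader h{kLiveMagic, kind, size};
    std::memcpy(raw, &h, sizeof h);
    return static_cast<unsigned char*>(raw) + sizeof(BlockHeader);
}

inline FreeStatus debug_free(void* user, AllocKind kind) {
    if (user == nullptr) return FreeStatus::Ok;   // freeing null is legal
    unsigned char* raw = static_cast<unsigned char*>(user) - sizeof(BlockHeader);
    BlockHeader h;
    std::memcpy(&h, raw, sizeof h);               // peek at the hidden header
    if (h.magic == kFreedMagic) return FreeStatus::DoubleFree;
    if (h.magic != kLiveMagic)  return FreeStatus::NotHeapBlock;
    if (h.kind != kind)         return FreeStatus::Mismatched;
    h.magic = kFreedMagic;                        // mark as de-allocated
    std::memcpy(raw, &h, sizeof h);
    quarantine().push_back(raw);
    return FreeStatus::Ok;
}

// Really release everything that was quarantined.
inline void debug_flush_quarantine() {
    for (void* raw : quarantine()) std::free(raw);
    quarantine().clear();
}
```

Note that peeking at the bytes before an arbitrary address is only a heuristic: it assumes those bytes are readable, which a real vendor library could verify properly because it knows the heap's address ranges.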

Detecting usage of uninitialized memory from malloc or new is less simple, because although you can set a flag to say "newly allocated block," it's harder to know if a simple array access has written to the memory block, thereby initializing it. However, this is also doable with reasonable efficiency via a single magic value in the first few bytes of a block.

Memory overruns are harder to detect from inside a library alone. However, there are two ways to detect them:

  • Add canary memory ranges at the end of the allocated blocks (i.e., a magic value occupying a few bytes just past the end).
  • Intercept the standard library's byte manipulation calls (e.g., memset, strcpy, etc.)
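Both techniques can be sketched in a few lines. Everything named here (GuardedBlock, guarded_alloc, canary_intact, checked_memset) is hypothetical, and the block size is carried in a visible struct for simplicity; a real library would keep it in the hidden block header instead.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

constexpr unsigned char kCanaryByte = 0xCB;   // arbitrary magic byte
constexpr std::size_t   kCanaryLen  = 8;      // canary bytes after each block

struct GuardedBlock {
    unsigned char* data;
    std::size_t size;     // usable size, excluding the canary
};

// Allocate the requested size plus a canary region just past the end.
inline GuardedBlock guarded_alloc(std::size_t size) {
    auto* p = static_cast<unsigned char*>(std::malloc(size + kCanaryLen));
    if (p == nullptr) return {nullptr, 0};
    std::memset(p + size, kCanaryByte, kCanaryLen);
    return {p, size};
}

// Technique 1: detect an overrun after the fact by checking the canary.
inline bool canary_intact(const GuardedBlock& b) {
    for (std::size_t i = 0; i < kCanaryLen; ++i) {
        if (b.data[b.size + i] != kCanaryByte) return false;
    }
    return true;
}

// Technique 2: an intercepted memset that refuses to overrun the block.
inline bool checked_memset(GuardedBlock& b, int value, std::size_t count) {
    if (count > b.size) return false;   // would overrun: made harmless
    std::memset(b.data, value, count);
    return true;
}

inline void guarded_free(GuardedBlock& b) {
    std::free(b.data);
    b.data = nullptr;
    b.size = 0;
}
```

The canary only catches an overrun after it has happened (typically when the block is freed or checked), whereas the intercepted call can prevent it outright.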

Many of the techniques that can be used by vendor standard libraries are discussed in the chapter on debug wrapper libraries. It's the same problem.

File pointers and file descriptors. File descriptors (integers), file pointer structures, and fstream classes each refer to a block of memory for an open file, which could contain bit flags for its status (e.g., open, just written, just read, closed). Furthermore, these file data structures are only used or modified via standard library primitives, so a debug version of a standard library could surely detect most errors. Also, the file pointer structures are usually within a fixed-size array, so it takes only two pointer comparisons to validate that a file pointer is inside this block. And finally, the most recent type of file operation could be tracked in a bit flag (e.g., recently read, recently written, recently seeked).

It seems like these file blocks could be self-tested on any file access. Hence, my list of file-related errors and undefined behaviors that could be (a) detected, and (b) made harmless, includes:

  • Null file pointer value.
  • Invalid file pointer (or fstream) address that is outside the range of valid addresses, or an invalid integer file descriptor.
  • File descriptor is a "never-opened" file (i.e., does not have an "opened" or "closed" bit flag set).
  • Double-close (i.e., the "closed" bit flag is already set).
  • Read/write/seek operation on already-closed file.
  • Read followed by write (or vice versa) without an intervening seek on a file pointer (a weird "undefined behavior").
  • fflush on an input file (e.g., fflush(stdin) is a common mistake).

All of these file-related errors would become suddenly harmless. I've certainly had crashes myself from "double fclose" errors. These are fixable at the performance cost of a few bit flag and pointer comparisons. Worth doing!
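A user-level wrapper cannot add bit flags inside the FILE structure itself, because it is opaque, but a side table keyed by the pointer gives a similar effect. The following sketch covers only the close-related errors; debug_fopen, debug_fclose and the status enums are hypothetical names, and it assumes every open goes through the wrapper.

```cpp
#include <cstdio>
#include <unordered_map>

enum class FileState { Open, Closed };
enum class CloseStatus { Ok, NullPointer, NeverOpened, DoubleClose };

// Side table standing in for status flags inside the file structure.
inline std::unordered_map<std::FILE*, FileState>& file_table() {
    static std::unordered_map<std::FILE*, FileState> table;
    return table;
}

inline std::FILE* debug_fopen(const char* path, const char* mode) {
    std::FILE* fp = std::fopen(path, mode);
    if (fp != nullptr) file_table()[fp] = FileState::Open;  // also resets a reused address
    return fp;
}

// Returns a status instead of crashing; only a valid close reaches fclose.
inline CloseStatus debug_fclose(std::FILE* fp) {
    if (fp == nullptr) return CloseStatus::NullPointer;
    auto it = file_table().find(fp);
    if (it == file_table().end()) return CloseStatus::NeverOpened;
    if (it->second == FileState::Closed) return CloseStatus::DoubleClose;
    it->second = FileState::Closed;
    std::fclose(fp);
    return CloseStatus::Ok;
}
```

The same table lookup could guard reads, writes and seeks against an already-closed file. A vendor library could do all of this more cheaply with flags stored in the file structure itself.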

Going further, this debug file management library could have trace capabilities, so you can view not only the above serious internal errors, but also application-level happenings such as which files are opened and closed. It could also record "warnings" for things like "file not found" or short reads and writes that transfer fewer bytes than requested. As such, odd file occurrences could get telegraphed to the developer earlier.

Character functions. Various functions could be more carefully implemented to check errors and tolerate unusual usage. Examples include:

  • Character category function arguments (e.g., islower) should warn and tolerate out-of-range values (e.g., more than 255) and consistently handle negative values (i.e., from signed character types).
  • Character category function return values limited to Boolean status (e.g., islower should return only true or false, rather than zero versus non-zero).
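Both points fit into a tiny wrapper. The name safe_islower is hypothetical, and the choice to silently return false for out-of-range values (rather than also logging a warning) is just to keep the sketch short.

```cpp
#include <cctype>
#include <climits>

// Hypothetical tolerant wrapper around islower:
//  - negative values that came from a signed char are mapped to unsigned char,
//  - anything still outside 0..UCHAR_MAX is tolerated and reported as "no",
//  - the result is a strict bool, not "zero versus non-zero".
inline bool safe_islower(int c) {
    if (c < 0 && c >= SCHAR_MIN) {
        c = static_cast<unsigned char>(c);
    }
    if (c < 0 || c > UCHAR_MAX) {
        return false;   // out of range: made harmless
    }
    return std::islower(c) != 0;
}
```

One side effect of this mapping is that EOF (usually -1) is treated like the character value 255, which is harmless in the default "C" locale but is a design decision a real library would need to make explicitly.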

String functions. There are various failures that could be detected in libraries.

  • Memory addresses — the standard string functions, such as strcpy, should be incorporated into the memory safety checking.
  • Standard return values — e.g., strcmp should return explicitly -1, 0, or 1, rather than only requiring less-than, equal-to, or greater-than zero (this makes harmless a common misuse).
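A wrapper for the second point is nearly a one-liner. The name normalized_strcmp is hypothetical, and treating a null pointer as an empty string is my own assumption about what "harmless" might mean here.

```cpp
#include <cstring>

// Hypothetical wrapper: always returns exactly -1, 0, or 1, so that code
// which wrongly tests "== 1" or "== -1" still behaves as its author intended.
inline int normalized_strcmp(const char* a, const char* b) {
    if (a == nullptr) a = "";   // assumption: tolerate null as empty string
    if (b == nullptr) b = "";
    const int r = std::strcmp(a, b);
    return (r > 0) - (r < 0);   // collapse to the sign of the result
}
```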

Math functions. At first thought, it seems that there aren't many errors that can be auto-detected by changes to the standard mathematical libraries, without needing changes to the C++ operators. However, I came up with a few that can be done just inside the functions themselves:

  • Degrees versus radians — cos(60.0) is probably confusing radians and degrees. The library could warn if any trigonometric function is called with large values that are unusual for radians, or with degree-like exact integer values (30, 45, 60, 90, etc.).
  • Any errno-setting happening in the math library could issue or log a warning, rather than relying on the caller to check errno.
  • Any arithmetic function that returns a result NaN could log a warning.
  • Invalid ranges of arguments could also log a warning and be made harmless (e.g., division by zero, or infinite results such as log(0.0) or tan near π/2).
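Several of these checks can live entirely inside wrapper functions. In this sketch, debug_cos, debug_log and the math_warnings log are hypothetical names, and the list of "degree-like" values is an arbitrary heuristic of my own choosing.

```cpp
#include <cerrno>
#include <cmath>
#include <string>
#include <vector>

// In-memory warning log (a real library might write to stderr or a file).
inline std::vector<std::string>& math_warnings() {
    static std::vector<std::string> log;
    return log;
}

// Warn when the argument is an exact "degree-like" value.
inline double debug_cos(double x) {
    static const double degree_like[] = {30.0, 45.0, 60.0, 90.0, 180.0, 270.0, 360.0};
    for (double d : degree_like) {
        if (std::fabs(x) == d) {
            math_warnings().push_back("cos: argument looks like degrees, not radians");
            break;
        }
    }
    return std::cos(x);
}

// Warn on errno-setting, NaN results, and infinite results.
inline double debug_log(double x) {
    errno = 0;
    const double r = std::log(x);
    if (errno != 0)    math_warnings().push_back("log: errno was set");
    if (std::isnan(r)) math_warnings().push_back("log: result is NaN");
    if (std::isinf(r)) math_warnings().push_back("log: result is infinite");
    return r;
}
```

Whether errno is actually set depends on the platform's math error handling, which is one more reason a logged warning is more reliable than expecting every caller to check it.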

Container classes. The various standard C++ container classes could improve their quality, such as:

  • Out-of-bounds array accesses on std::vector should be expressly handled with warning and make-harmless processing.
  • Tracing and logging of uncommon issues, such as std::vector automatically resizing itself (a common slug).
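Both ideas can be approximated from outside the container. The helpers checked_at and push_back_traced are hypothetical, and returning a shared dummy element on a bad index is just one possible "make-harmless" policy.

```cpp
#include <cstddef>
#include <vector>

// Out-of-bounds access: count a warning and hand back a harmless dummy
// element instead of touching memory outside the vector.
// (Not suitable for std::vector<bool>, whose operator[] returns a proxy.)
template <typename T>
T& checked_at(std::vector<T>& v, std::size_t i, std::size_t& warning_count) {
    static T dummy{};
    if (i >= v.size()) {
        ++warning_count;
        dummy = T{};        // reset so earlier stray writes do not leak through
        return dummy;
    }
    return v[i];
}

// Resize tracing: returns true if this push_back caused a reallocation.
template <typename T>
bool push_back_traced(std::vector<T>& v, const T& value) {
    const std::size_t before = v.capacity();
    v.push_back(value);
    return v.capacity() != before;
}
```

A caller could log every true result from push_back_traced to find loops that would benefit from a reserve call up front.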

Extra Builtin Functions for Debugging

It's somewhat difficult to build your own debug library, whether macro-intercepted or linked, because some things are hidden in the compiler implementation layer. Some API primitives that would be useful if provided by compiler vendors include:

  • Address categorization — is this address on the stack? In the heap? Global? Read-only data? String literal?
  • Address is an allocated block? — valid heap block start address? Inside a valid block? Which block?
  • Address of the start of an allocated block, if given an address inside it.
  • Size of the allocated memory block, if given the start address of a heap block.
  • Address of start of a stack block given an address inside.
  • Size of a stack block if given the starting stack block address.
  • Heap statistics — size of the heap, remaining free memory, block counts, etc.
  • Callbacks that are callable on various internal events (e.g., memory allocation, de-allocation, various primitives called, etc.).

Note that some of these could be offered in two or more versions, with a slower and faster option. Fast memory address checking could be based on a magic value or other trick. Slower memory address checking would, for example, scan the entire heap to confirm it's a valid address.

These would be valuable for users to create their own debug library infrastructure. It would also be valuable for coding policies aimed at memory safety, whereby the user could decide whether to incur the runtime cost of checking a memory address before using it.

Some of these are flat-out impossible to do in user code on any C++ platform. A few of them are possible in a very non-portable way. It seems to me that a compiler vendor could offer a lot of these functions with a relatively low amount of work, because they have a great deal more information available behind the scenes.

 
