Aussie AI

Chapter 15. Safety Wrapper Functions

Book Excerpt from "Safe C++: Fixing Memory Safety Issues"

by David Spuler

Why Use Wrapper Functions?

The idea of debug wrapper functions is to fill a small gap in the self-checking available in the C++ ecosystem. There are two types of self-testing that happen when you run C++ programs:

Self-tests such as error return checks, assertions, and wrappers in the main C++ code.
valgrind or sanitizer detection of numerous run-time errors.

Both of these methods are highly capable and will catch a lot of bugs. To optimize your use of these capabilities in debugging, you should:

Test all error return codes (e.g., a fancy macro method), and
Run valgrind and/or other sanitizers on lots of unit tests and regression tests in your CI/CD approval process, or, when that gets too slow, at least in the nightly builds.
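As a rough illustration of the "fancy macro method" for testing return codes, here is a minimal sketch. The names CHECK_RETURN, check_failed, and g_check_failures are hypothetical and invented for this example; they are not taken from any library.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical counter of failed checks (an assumption for this sketch).
int g_check_failures = 0;

// Report a failed check with its source location, then carry on.
bool check_failed(const char* expr, const char* file, int line)
{
    g_check_failures++;
    std::fprintf(stderr, "CHECK FAILED: %s (%s:%d)\n", expr, file, line);
    return false;
}

// Evaluates to true if the condition holds; otherwise logs it and yields false.
#define CHECK_RETURN(cond) \
    ((cond) ? true : check_failed(#cond, __FILE__, __LINE__))

// Example use: every error return is tested, so failures are never silent.
bool write_greeting(std::FILE* fp)
{
    if (!CHECK_RETURN(fp != nullptr)) return false;
    if (!CHECK_RETURN(std::fputs("hello\n", fp) >= 0)) return false;
    return true;
}
```

The macro stringizes the failing condition, so the log line shows exactly which test failed and where, without the caller writing any message text.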

But this is not perfection! There are two main reasons that some bugs will be missed:

Self-testing doesn’t detect all the bugs.
You have to remember to run sanitizers on your code.

Okay, so I’m joking about “remembering” to run the debug tests, because you’ve probably got them running automatically in your build. But there are some real cases where the application won’t ever be run in debug mode:

Many internal failures trigger no visible symptoms for users (silent failures).
Customers cannot run valgrind on their premises (unless you ask nicely).
Your website “customers” also cannot run it on the website backends.
Some applications are too costly to re-run just to debug an obscure error (I’m looking at you, AI training).

Hence, in the first case, there are bugs missed in total silence, never to be fixed. And in the latter cases, there’s a complex level of indirection between the failure occurring and the C++ programmer trying to reproduce it in the test lab. It’s much easier if your application self-diagnoses the error!

Fast Debug Wrapper Code

But it’s too slow, I hear you say. Running the code with valgrind or other runtime memory checkers is much slower than running without them. We can’t ship an executable with so much debug instrumentation that customers are running that much slower.

You’re not wrong, and it’s the age-old quandary about whether to ship testing code. Fortunately, there are a few solutions:

Use fast self-testing tricks like magic numbers in memory.
Have a command-line flag or config option that turns debug tests on and off at runtime.
Have “fast” and “debug” versions of your executable (e.g., ship both to beta customers).

At the very least, you could have a lot of your internal C++ code development and QA testing done on the debug wrapper version that self-detects and reports internal errors.
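The runtime on/off option could look something like the following sketch. The flag name "--debug-checks" and the identifiers g_debug_checks, parse_debug_flag, and strlen_wrapper are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical global switch; off by default so the fast path stays fast.
bool g_debug_checks = false;
int g_debug_reports = 0;

// Turn checks on if "--debug-checks" appears on the command line.
void parse_debug_flag(int argc, const char* const* argv)
{
    for (int i = 1; i < argc; i++) {
        if (std::strcmp(argv[i], "--debug-checks") == 0) g_debug_checks = true;
    }
}

// A wrapper whose validation costs one extra branch when checks are disabled.
std::size_t strlen_wrapper(const char* s)
{
    if (g_debug_checks && s == nullptr) {
        g_debug_reports++;
        std::fprintf(stderr, "strlen_wrapper: null pointer\n");
        return 0;  // Report and keep going
    }
    return std::strlen(s);  // Unchecked when the flag is off
}
```

With the flag off, the wrapper behaves exactly like the raw call, including crashing on a null pointer, so this only helps when the customer can be asked to re-run with the flag enabled.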

As the first point states, there are “layers” of debugging wrappers (also ogres, like Shrek). You can build very fast or very slow types of self-checking into debug wrapper code. These self-tests can be as simple as parameter null tests or as complex as detecting memory stomp overwrites with your own custom code. In approximate order of time cost, here are some ideas:

Parameter basic validation (e.g., null pointer tests).
Magic values added to the initial bytes of uninitialized and freed memory blocks.
Magic values stored in every byte of these blocks.
Tracking 1 or 2 (or 3) of the most recently allocated/freed addresses.
Hash tables to track addresses of every allocated or freed memory block.

I’ve actually done all of the above for a debug library in standard C++. Make sure you check the Aussie AI website to see when it gets released.
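To make the magic-value idea concrete, here is a small sketch, not the author's library. The byte patterns 0xCD and 0xDD and the names magic_malloc, magic_free, and looks_uninitialized are assumptions chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical magic bytes: patterns for fresh and freed memory.
const unsigned char MAGIC_UNINIT = 0xCD;
const unsigned char MAGIC_FREED  = 0xDD;

// Header stored before each block so the free wrapper knows the size.
// The alignas keeps the user pointer as aligned as malloc's own result.
struct alignas(std::max_align_t) BlockHeader {
    std::size_t size;
};

void* magic_malloc(std::size_t n)
{
    void* raw = std::malloc(sizeof(BlockHeader) + n);
    if (raw == nullptr) return nullptr;
    BlockHeader* h = static_cast<BlockHeader*>(raw);
    h->size = n;
    void* user = h + 1;
    std::memset(user, MAGIC_UNINIT, n);  // Mark every byte as uninitialized
    return user;
}

void magic_free(void* p)
{
    if (p == nullptr) return;
    BlockHeader* h = static_cast<BlockHeader*>(p) - 1;
    std::memset(p, MAGIC_FREED, h->size);  // Mark every byte as freed
    std::free(h);
}

// Cheap test of only the initial bytes: does the block still look untouched?
bool looks_uninitialized(const void* p, std::size_t n)
{
    const unsigned char* b = static_cast<const unsigned char*>(p);
    std::size_t k = (n < 4) ? n : 4;
    for (std::size_t i = 0; i < k; i++) {
        if (b[i] != MAGIC_UNINIT) return false;
    }
    return k > 0;
}
```

Filling every byte is the slower layer; a faster variant would write only the first few bytes. Note also that an optimizing compiler may treat the memset just before free as a dead store and remove it, so a real library would need a fill that cannot be elided.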

Wrapping Memory Functions

You can use macros to intercept various standard C++ functions. For example, here’s a simple interception of malloc:

    // intercept malloc
    #undef malloc
    #define malloc aussie_malloc
    void *aussie_malloc(int sz);

Once intercepted, the wrapper code can perform simple validation tests of the various parameters. Here’s a simple wrapper for the malloc function in a debug library for C++ that I’m working on:

    void *aussie_malloc(int sz)
    {
        // Debug wrapper version: malloc() 
        AUSSIE_DEBUGLIB_TRACE("malloc called");
        AUSSIE_DEBUG_PRINTF("%s: == ENTRY malloc === sz=%d\n", 
             __func__, sz);

        g_aussie_malloc_count++;
        AUSSIE_CHECK(sz != 0, "AUS007", "malloc size is zero");
        AUSSIE_CHECK(sz >= 0, "AUS008", "malloc size is negative");

        // Call the real malloc. This source file must not see the
        // "#define malloc aussie_malloc" intercept, or this call recurses.
        void *new_v = malloc(sz);
        if (new_v == NULL) {
                AUSSIE_ERROR("AUS200", "ERROR: malloc failure");
                // Try to keep going?
        } 
        return new_v;
    }

This actually has multiple levels of tests:

Validation of called parameter values.
Detection of memory allocation failure.
Builtin debug tracing macros that can be enabled.

A more advanced version could also attempt to check pointer addresses are valid and have not been previously freed, and a variety of other memory errors. Coming soon!
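As one possible shape for the "previously freed" check, the sketch below tracks live and freed addresses in hash sets. It is an illustration under assumptions, not the library's implementation: the names tracked_malloc, tracked_free, g_live_blocks, and g_freed_blocks are hypothetical, and the code is not thread-safe.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <unordered_set>

// Hypothetical tracking tables for this sketch.
std::unordered_set<void*> g_live_blocks;
std::unordered_set<void*> g_freed_blocks;
int g_free_errors = 0;

void* tracked_malloc(std::size_t n)
{
    void* p = std::malloc(n);
    if (p != nullptr) {
        g_live_blocks.insert(p);
        g_freed_blocks.erase(p);  // The allocator may reuse a freed address
    }
    return p;
}

void tracked_free(void* p)
{
    if (p == nullptr) return;  // free(NULL) is legal
    if (g_live_blocks.erase(p) == 0) {
        // Not a live block: either freed already or never allocated here
        g_free_errors++;
        std::fprintf(stderr, "free error: %s address %p\n",
            g_freed_blocks.count(p) ? "double free of" : "unknown", p);
        return;  // Do not pass a bad pointer to the real free
    }
    g_freed_blocks.insert(p);
    std::free(p);
}
```

The freed set grows without bound in this form; a real library would probably cap it or keep only the most recent addresses, which is the cheaper layer mentioned earlier.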

Standard C++ Debug Wrapper Functions

It can be helpful during debugging to wrap several standard C++ library function calls with your own versions, so as to add additional parameter validation and self-checking code. Some of the functions which you might consider wrapping include:

malloc
calloc
memset
memcpy
memcmp

If you’re doing string operations in your code, you might consider wrapping these:

strdup
strcmp
strcpy
sprintf

Note that you can wrap the C++ “new” and “delete” operators at the linker level by defining your own versions, but not as macro intercepts. You can also intercept the “new[]” and “delete[]” array allocation versions at link-time.

There are different approaches to consider when wrapping library functions, which we examine using memset as an example:

Leave “memset” calls in your code (auto-intercepts)
Use “memset_wrapper” in your code instead (manual intercepts)

Macro auto-intercepts: You might want to leave your code unchanged using memset. To leave “memset” in your code, but have it automatically call “memset_wrapper” you can use a macro intercept in a header file.

    #undef memset // ensure no prior definition
    #define memset memset_wrapper  // Intercept

Note that you can also use preprocessor macros to add context information to the debug wrapper functions. For example, you could add extra parameters to “memset_wrapper” such as:

    #define memset(x,y,z)  memset_wrapper((x),(y),(z),__FILE__,__LINE__,__func__)

Note that in the above version, the macro parameters must be parenthesized even between commas, because there’s a C++ comma operator that could occur in a passed-in expression. Also note that these context macros (e.g., __FILE__) aren’t necessary if you have a C++ stack trace library, such as std::stacktrace, on your platform.
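Here is a sketch of how the wrapper side of such a context-carrying macro might look. The six-parameter signature and the g_memset_errors counter are assumptions for illustration. The macro is defined after the wrapper body so that the wrapper's own call to the real memset is not rewritten.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

int g_memset_errors = 0;  // Hypothetical error counter for this sketch

// Wrapper that receives the call site from the macro below.
void* memset_wrapper(void* dest, int val, std::size_t sz,
                     const char* file, int line, const char* func)
{
    if (dest == nullptr || sz == 0) {
        g_memset_errors++;
        std::fprintf(stderr, "memset problem at %s:%d in %s()\n",
                     file, line, func);
        return dest;  // Report and keep going
    }
    return std::memset(dest, val, sz);  // Real one: macro not yet defined
}

// Define the intercept only after the wrapper body.
#undef memset
#define memset(x,y,z) \
    memset_wrapper((x),(y),(z),__FILE__,__LINE__,__func__)
```

A call such as memset(buf, 0, sizeof(buf)) now reports its own file, line, and function on failure. One side effect of a function-like macro is that any later code writing std::memset(...) is rewritten too and will no longer compile, so the intercept header needs to come after such code.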

Variadic preprocessor macros: Note also that there is varargs support in C++ #define macros. If you want to track variable-argument functions like sprintf, printf, or fprintf, or other C++ overloaded functions, you can use “...” and “__VA_ARGS__” in preprocessor macros as follows. Note that the first macro parameter here is sprintf’s destination buffer, and the format string travels as the first of the variable arguments.

    #define sprintf(dest,...)  sprintf_wrapper((dest),__FILE__,__LINE__,__func__, __VA_ARGS__ )
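A matching wrapper function has to accept the variable arguments and forward them. The following is a sketch of one way to do that with va_list and vsprintf. The parameter order and the g_sprintf_errors counter are assumptions for this example.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstring>

int g_sprintf_errors = 0;  // Hypothetical error counter for this sketch

int sprintf_wrapper(char* dest, const char* file, int line, const char* func,
                    const char* fmt, ...)
{
    if (dest == nullptr || fmt == nullptr) {
        g_sprintf_errors++;
        std::fprintf(stderr, "sprintf null argument at %s:%d in %s()\n",
                     file, line, func);
        return -1;  // Report and keep going
    }
    va_list ap;
    va_start(ap, fmt);
    int ret = std::vsprintf(dest, fmt, ap);  // Forward the variable arguments
    va_end(ap);
    return ret;
}

// The destination buffer is the named parameter; the format string and
// its arguments all arrive through __VA_ARGS__.
#undef sprintf
#define sprintf(dest, ...) \
    sprintf_wrapper((dest), __FILE__, __LINE__, __func__, __VA_ARGS__)
```

Because the format string is always present in a sprintf call, __VA_ARGS__ is never empty here, which sidesteps the trailing-comma problem that variadic macros otherwise have.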

Manual Wrapping: Alternatively, you might want to individually change the calls to memset to call memset_wrapper without hiding it behind a macro. If you’d rather control whether or not the wrapper is called, then you can use both in the program, wrapped or non-wrapped. Or if you want them all changed, but want the intercept to be less hidden (e.g., later during code maintenance), then you might consider adding a helpful reminder instead:

    #undef memset
    #define memset dont_use_memset_please

This trick will give you a compilation error at every call to memset that hasn’t been changed to memset_wrapper.

Example: memset Wrapper Self-Checks

Here’s an example of what you can do in a wrapper function called “memset_wrapper” from one of the Aussie AI projects:

    void *memset_wrapper(void *dest, int val, int sz)  // Wrap memset
    {
        if (dest == NULL) {
                aussie_assert2(dest != NULL, "memset null dest");
                return NULL;
        }
        if (sz < 0) {
                // Why we have "int sz" not "size_t sz" above
                aussie_assert2(sz >= 0, "memset size negative");
                return dest;  // fail
        }
        if (sz == 0) {
                aussie_assert2(sz != 0, "memset zero size (reorder params?)");
                return dest;
        }
        if (sz <= (int)sizeof(void*)) {
                // Suspiciously small size (cast avoids signed/unsigned comparison)
                aussie_assert2(sz > (int)sizeof(void*), "memset with sizeof array parameter?");
                // Allow it, keep going
        }
        if (val >= 256) {
                aussie_assert2(val < 256, "memset value not char");
                return dest; // fail
        }
        // Call real one! (the memset macro intercept must not be active in this file)
        void* sret = ::memset(dest, val, sz);
        return sret;
    }

It’s a judgement call whether or not to leave the debug wrappers in place, in the vein of speed versus safety. Do you prefer sprinting to make your flight, or arriving two hours early? Here’s one way to remove the wrapper functions completely with the preprocessor if you’ve been manually changing them to the wrapper names:

    #if DEBUG
        // Debug mode, leave wrappers..
    #else // Production (remove them all)
        #define memset_wrapper memset
        //... others
    #endif

Compile-time self-testing macro wrappers

Here’s an idea for combining the runtime debug wrapper function idea with some additional compile-time tests using static_assert.

    #define memset_wrapper(addr,ch,n) ( \
        static_assert(n != 0), \
        static_assert(ch == 0), \
        memset_wrapper((addr),(ch),(n),__FILE__,__LINE__,__func__))

The idea is interesting, but it doesn’t really work. First, static_assert is a declaration, not an expression, so it cannot appear as an operand of the comma operator, and the macro will not compile as written. Second, even in a form that compiles, not all calls to the memset wrapper will have constant arguments for the character or the number of bytes, so the static_assert tests will fail in that case. You could use standard assertions, but this adds runtime cost. Note that it’s a self-referential macro, but that C++ guarantees it only gets expanded once (i.e., there’s no infinite recursion of preprocessor macros).
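One way to get compile-time checks for the calls that do have constant sizes is to move the size into a template parameter, leaving the ordinary runtime wrapper for everything else. This is a swapped-in technique rather than the macro approach above, and the names memset_fixed and memset_array are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Size known at compile time: checked by static_assert at zero runtime cost.
template <std::size_t N>
void* memset_fixed(void* dest, int ch)
{
    static_assert(N != 0, "memset size is zero");
    return std::memset(dest, ch, N);
}

// Array overload deduces the size, avoiding sizeof-of-a-pointer mistakes,
// and rejects element types that are unsafe to overwrite byte-by-byte.
template <typename T, std::size_t N>
void* memset_array(T (&arr)[N], int ch)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset on a non-trivially-copyable type");
    return std::memset(arr, ch, sizeof(arr));
}
```

A call like memset_fixed<0>(p, 0) fails to compile, while calls with runtime sizes simply keep using the runtime wrapper.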

Generalized Self-Testing Debug Wrappers

The technique of debug wrappers can be extended to offer a variety of self-testing and debug capabilities. The types of messages that can be emitted by debug wrappers include:

Input parameter validation failures (e.g., non-null)
Failure returns (e.g., allocation failures)
Common error usages
Informational tracing messages
Statistical tracking (e.g., call counts)

Personally, I’ve built some quite extensive debug wrapping layers over the years. It always surprises me that this can be beneficial, because it would be easier if it were done fully by the standard libraries of compiler vendors. The level of debugging checks has been increasing significantly (e.g., in GCC), but I still find value in adding my own wrappers.

There are several major areas where you can really self-check for a lot of problems with runtime debug wrappers:

File operations
Memory allocation
String operations

Wrapping Math Functions

It might seem that it's not worth wrapping the mathematical functions, as their failures are rare. However, these are some things you can check:

errno is already set on entry.
errno is set afterwards (if not already set).
Function returns NaN.
Function returns negative zero.

Most of these can be implemented as a single integer test (e.g., errno) or as a bitwise trick on the underlying floating-point representation (e.g., convert float to an unsigned). There are also builtin library functions to detect floating-point categories such as NaN.

In this way, a set of math wrapper functions automates a lot of your detection of common issues. These aren't as common as memory issues, but it's yet another way to move towards a safe C++ implementation.
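The four checks listed above could be sketched for sqrt as follows. The names sqrt_wrapper and g_math_warnings are assumptions for this example. Whether errno is actually set on a domain error depends on the platform and compiler flags, so the NaN test is the more portable of the two.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstdio>

int g_math_warnings = 0;  // Hypothetical warning counter for this sketch

double sqrt_wrapper(double x)
{
    if (errno != 0) {
        // A stale error from an earlier call that nobody checked
        g_math_warnings++;
        std::fprintf(stderr, "sqrt: errno already set on entry (%d)\n", errno);
        errno = 0;  // Clear it so the test after the call is meaningful
    }
    double r = std::sqrt(x);
    if (errno != 0) {
        g_math_warnings++;
        std::fprintf(stderr, "sqrt(%g): errno set to %d\n", x, errno);
    }
    if (std::isnan(r)) {
        g_math_warnings++;
        std::fprintf(stderr, "sqrt(%g): returned NaN\n", x);
    }
    if (r == 0.0 && std::signbit(r)) {
        g_math_warnings++;
        std::fprintf(stderr, "sqrt(%g): returned negative zero\n", x);
    }
    return r;
}
```

The wrapper leaves errno as the real function set it, so callers that do check errno see the same behavior as before.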

Wrapping File Operations

Many of the file operations are done via function calls, and are a good candidate for debug wrapper functions. Examples of standard C++ and POSIX functions that you could intercept include:

fopen, fread, fwrite, fseek, fclose
open, read, write, creat, close

Note that intercepting fstream operations in this way is not workable. They don't use a function-like syntax for file operations.

Using the approach of wrapping file operations can add error detection, error prevention, and tracing capabilities to these operations. Undefined situations and errors that can be auto-detected include:

File did not open (i.e., trace this).
Read or write failed or was truncated.
Read and write without intervening seek operation.
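The first two detections in this list could be sketched as below. The names fopen_wrapper, fread_wrapper, and g_file_reports are hypothetical, and the sketch only reports problems rather than preventing them.

```cpp
#include <cassert>
#include <cstdio>

int g_file_reports = 0;  // Hypothetical report counter for this sketch

std::FILE* fopen_wrapper(const char* path, const char* mode)
{
    if (path == nullptr || mode == nullptr) {
        g_file_reports++;
        std::fprintf(stderr, "fopen: null argument\n");
        return nullptr;
    }
    std::FILE* fp = std::fopen(path, mode);
    if (fp == nullptr) {
        g_file_reports++;
        std::fprintf(stderr, "fopen failed: '%s' (mode '%s')\n", path, mode);
    }
    return fp;
}

std::size_t fread_wrapper(void* buf, std::size_t size, std::size_t count,
                          std::FILE* fp)
{
    if (buf == nullptr || fp == nullptr) {
        g_file_reports++;
        std::fprintf(stderr, "fread: null argument\n");
        return 0;
    }
    std::size_t got = std::fread(buf, size, count, fp);
    if (got < count) {
        // Short read: distinguish a real error from end-of-file
        g_file_reports++;
        std::fprintf(stderr, "fread short: %zu of %zu items (%s)\n",
                     got, count, std::ferror(fp) ? "error" : "end of file");
    }
    return got;
}
```

Detecting a read after a write with no intervening seek would need per-file state, for example a small table keyed by the FILE pointer that records the last operation.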

Link-Time Interception: new and delete

Macro interception works for standard C++ functions like malloc and free, but you can’t macro-intercept the new and delete operators, because they don’t use function-like syntax. Fortunately, you can use link-time interception of these operators instead, simply by defining your own versions. This is a standard feature of C++ that has been long supported.

Note that defining class-level versions of the new and delete operators is a well-known optimization for a class to manage its own memory allocation pool, but this isn’t what we’re doing here. Instead, this link-time interception requires defining four operators at global scope:

new
new[]
delete
delete[]

You cannot use the real new and delete inside these link-time wrappers. They would get intercepted again, and you’d have infinite stack recursion.

However, you can call malloc and free instead, assuming they aren’t also macro-intercepted in this code. Here’s the simplest versions:

    void * operator new(size_t n)
    {
        return malloc(n);        
    }

    void* operator new[](size_t n)
    {
        return malloc(n);        
    }

    void operator delete(void* v)
    {
        free(v);
    }

    void operator delete[](void* v)
    {
        free(v);
    }

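This method of link-time interception is an officially sanctioned standard C++ language feature since the 1990s. Be careful, though, that the return types and parameter types are precise, using size_t and void*, as you cannot use int or char*. Also, the standard does not permit these replacement functions to be declared inline; doing so typically gets a compilation warning, and is presumably ignored by the compiler, as this requires link-time interception. Strictly, a replacement operator new must also not return NULL on failure, since the standard expects it to throw std::bad_alloc, so a production version needs to handle a failed malloc.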

Here’s an example with some basic possible checks:

    #define AUSSIE_ERROR(mesg, ...) \
        ( printf((mesg) __VA_OPT__(,) __VA_ARGS__ ) )

    void * operator new(size_t n)
    {
        if (n == 0) {
            AUSSIE_ERROR("new operator size is zero\n");
        }
        void *v = malloc(n);        
        if (v == NULL) {
            AUSSIE_ERROR("new operator: allocation failure\n");
        }        
        return v;
    }

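Note that you can’t use __FILE__ or __LINE__ as these are link-time intercepts, not macros. Maybe you could use std::stacktrace instead, but I have my doubts, not least because capturing a stack trace may itself allocate memory and re-enter the intercepted operator.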
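Link-time intercepts are also a natural place for statistical tracking. The sketch below counts allocations and deallocations, which gives a crude leak indicator at exit. The counter names are hypothetical. Unlike the simpler versions above, this sketch throws std::bad_alloc on failure, because a replacement operator new that returns null has undefined behavior.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <new>

long g_new_count = 0;     // Hypothetical counters for this sketch
long g_delete_count = 0;

void* operator new(std::size_t n)
{
    g_new_count++;
    void* v = std::malloc(n != 0 ? n : 1);  // malloc(0) may return NULL
    if (v == nullptr) {
        std::fprintf(stderr, "new operator: allocation failure\n");
        throw std::bad_alloc();  // Required by the standard on failure
    }
    return v;
}

void* operator new[](std::size_t n)
{
    return operator new(n);  // Share the checks and the counter
}

void operator delete(void* v) noexcept
{
    if (v != nullptr) g_delete_count++;
    std::free(v);
}

void operator delete[](void* v) noexcept
{
    operator delete(v);
}
```

If g_new_count and g_delete_count differ at program exit, something was leaked or deleted twice. The counters are plain variables, so a multi-threaded program would need atomics instead.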

Destructor Problems with Debug Wrappers

The use of a debug wrapper library can be very valuable. However, there are a few problematic areas:

Destructors should not throw an exception.
Destructors should not call exit or abort.
Destructor issues with assert.

Any of these happenstances can turn a reported error into a crash or an infinite loop situation. Since C++11, destructors are implicitly noexcept, so an exception that escapes a destructor calls std::terminate, and the same happens if a destructor throws while the stack is already unwinding from another exception. Exiting in a destructor may trigger global variable destruction, which calls the same destructor, which tries to exit again. Be careful of the system assert macro inside destructors, because it's a hidden call to abort if it fails.

Although these infinite-looping problems are serious, it would seem that these are minor issues to add to your coding standards: don't do these things inside a destructor. However, we're talking about debug wrapper libraries, rather than explicit calls, and destructors often need to:

De-allocate memory
Close files

Both of these tasks are often intercepted by debug wrapper libraries, whether macro-intercepted or at link-time. Hence, the issue we have is that any failure detected by the debug wrapper code may trigger one of the above disallowed calls, depending on our policy for handling a detected failure.

Unfortunately, I'm not aware of an API that checks if "I'm running a destructor" in C++. Hence, it's hard for the debug library to address this issue itself. There are a few mitigations you can use in coding destructors:

Recursive re-entry detection inside destructors using a static local variable.
Modify the debug library's error handling flags on entry and exit of a destructor.
Have global flags called "I'm exiting" or "I'm failing" that are checked by all your destructors, in which case it should probably do nothing.

Alternatively, you could manage your own global flag "I'm in a destructor" in every destructor function. More accurately, this is not a flag, but a counter of destructor depth. This flag or counter is then checked by the debug library to check if it's in a destructor before it throws an exception, exits, or aborts.
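The destructor-depth counter could be managed with a small RAII guard, as in this sketch. All names here (g_destructor_depth, DestructorScope, debuglib_report, Resource) are hypothetical, and the failing call inside the destructor is simulated.

```cpp
#include <cassert>
#include <cstdio>

int g_destructor_depth = 0;   // Hypothetical "I'm in a destructor" counter
int g_deferred_failures = 0;  // Failures noted inside destructors

// RAII guard: declare one at the top of each destructor body.
struct DestructorScope {
    DestructorScope()  { g_destructor_depth++; }
    ~DestructorScope() { g_destructor_depth--; }
};

// The debug library's failure handler: always log, and consult the
// counter before doing anything more drastic.
void debuglib_report(const char* mesg)
{
    bool in_dtor = (g_destructor_depth > 0);
    std::fprintf(stderr, "DEBUGLIB: %s%s\n", mesg,
                 in_dtor ? " (inside destructor)" : "");
    if (in_dtor) g_deferred_failures++;
}

struct Resource {
    ~Resource() {
        DestructorScope scope;
        debuglib_report("close failed");  // Stand-in for a failing wrapped call
    }
};
```

The guard has to be added by hand to every destructor, which is exactly the maintenance burden that makes this approach unattractive; a destructor that forgets it is invisible to the library.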

But I'm not sure what the debug library should do instead. Maybe it can itself set a global flag saying "I want to exit soon" and then it will later detect this flag is set on the next intercepted call to the debug library, provided that it's not still inside a destructor. Perhaps your application's main processing loop could regularly check with the debug library whether it wants to quit, by just checking that global variable often.

Ugh! None of that sounds workable.

A better plan is probably that your debugging library wrapper functions should never throw an exception, exit, abort, or use the builtin system assert function, because it can't ever be sure it's not inside a destructor. Instead, report errors and log errors in another way, but try to keep going, which is a good idea anyway.
