Aussie AI

Chapter 14. CUDA Assertions

Book Excerpt from "CUDA C++ Debugging: Safer GPU Kernel Programming"

by David Spuler

Why Use Assertions?

Of all the self-testing code techniques, my favorite one is definitely assertions. They’re just so easy to add! The use of assertions in CUDA C++ programs can be a very valuable part of improving the quality of your work over the long term. They ensure that you find bugs early in the life cycle of code, and they don’t have much impact on performance (if used correctly). I find them especially useful in getting rid of obvious glitches when I’m writing new code, but then I usually leave them in there.

The standard C++ library has had an “assert” macro since back when it was called C. In CUDA C++, things are a little more complex, because there are two aspects of using assertions:

Device code assertions — use the assert builtin function or printf.
Host code assertions — lots of options!

The use of assertions in kernel C++ code is very limited, but host code runs on standard C++ compilers on the CPU, so you can use the many available techniques for host platforms.

The simplest idea is just to use the builtin assert macro, which works in both device and kernel code. The assert macro is a convenient method of performing simple tests. The basic usage is illustrated to validate the inputs of a simple vector kernel:

    #include <assert.h>
    ...
    __device__ vector_sum(float v[], int n)
    {
        assert(v != NULL);  // Easy!
        // ... etc
    }

Compile-Time Assertions: static_assert

Runtime assertions have been a staple of C++ code reliability since the beginning of time. However, there’s often been a disagreement over whether or not to leave the assertions in production code, because they inherently slow things down.

The modern answer to this conundrum is the C++ “static_assert” directive. This is like a runtime assertion, but it is fully evaluated at compile-time, so it’s super-fast. Failure of the assertion triggers a compile-time error, preventing execution, and the code completely disappears at run-time.

Unfortunately, there really aren’t that many things you can assert at compile-time. Most computations are dynamic and stored in variables at runtime. However, the static_assert statement can be useful for things like blocking inappropriate use of template instantiation code, or for portability checking such as:

    static_assert(sizeof(float) == 4, "float is not 32 bits");

This statement is an elegant and language-standardized method to prevent compilation on a platform where a “float” data type is 64-bits, alerting you to a portability problem.

Device code assertions

There are two basic methods to implement assertions on device code:

CUDA assert primitive
Custom macro with printf

Note that you can also use static_assert in both host and device code, assuming you have a compile-time condition (e.g., const or constexpr result).

CUDA assert method for devices. Since compute capability 2.0, there has been an “assert” primitive in the CUDA runtime library that works on devices. It’s in the C++ Programmer Guide, with declaration:

    void assert(int expression);

If the assertion is successful with a non-zero expression value, nothing happens. If the assertion fails with a zero value, there are a few effects:

Thread termination for all threads where it fails.
Sets the cudaErrorAssert return error code (value 710).
Error message printed to standard error (after kernel completion).
Next CUDA call with synchronization with fail with error cudaErrorAssert

Here’s an example of an assertion failure message:

    aussie-clear-vector-test-assertions.cu:40: 
    void aussie_clear_vector_kernel_basic(float *, int): block: [0,0,0], thread: [5,0,0] 
    Assertion `id < 5` failed.

An important point is that the assertion failure message does not appear immediately. Rather, the assertion message appears when the kernel has finished, at the next synchronization with the host code. This behavior is the same as the built-in printf function, when executed in kernel code. It gets buffered until the CPU is ready to print it out. However, note that printf output goes to stdout, whereas assertion failures print to stderr.

I’m not sure if assertion messages are printed if the host code on the CPU has already exited (i.e., didn’t wait to synchronize). The behavior for buffered device printf messages is they are discarded in this situation, so maybe assertions use the same mechanism.

Removing device assertions from production code. This method is very similar to the original standard C assertions, which were declared in <assert.h>. As with the old-school C assertions, you can remove CUDA assertions from device code by defining the “NDEBUG” preprocessor macro at compile-time. Hence, the production build needs re-compilation of all CUDA C++ source files, not just re-linking.

Should you leave assertions in production code? There’s a school of thought that it’s worth the expense of extra assertion checking to get the supportability benefits of having your users finding your bugs for you. However, CUDA kernels are probably not the right place for this idea, since efficiency is critical in these code sections, but you might want to consider this policy for host code.

Custom printf assertions for device code. Since printf statements are allowed in device code, you can also declare your own custom assertion macro. For example, you might want an “assert warning” macro that doesn’t abort the thread, or perhaps have a more graceful shutdown of the kernel in some way.

On the other hand, a custom device assertion is not really the preferred method in general, because CUDA assert failures are more properly handled, and meshed into the CUDA return code handling. If a CUDA device assertion fails, the error code cudaErrorAssert (710) is returned by the next synchronization primitive on the host.

Note that you can’t even use fprintf in device code, but only printf, so it’s hard to print to stderr. Here’s an error message for device code:

    aussie-clear-vector-test-assertions.cu(41): error: 
    calling a __host__ function("fprintf") from a __global__ function
    ("aussie_clear_vector_kernel_basic") is not allowed

If you want to define your own custom assertion macro for device code, make sure it has these features:

Prints a message (obviously)
Compiles to nothing if NDEBUG is declared.
Aborts the thread (optionally), such as by assert(0) or asm("trap;").

Assertions for both host and device code. If you want consistent assertion handling in both types of CUDA C++ code, there are a few options:

assert primitive
Custom assertion macro with printf

The builtin CUDA assert macro is a little idiosyncratic across device versus host code. For example, assert works fine in device code without any header include, but gets a compilation error in host code. I had to include <assert.h> for it to work in host code.

Note that it’s somewhat difficult to define your own custom assertion macro in a way that it works on both device and host code. For example, I don’t know of an easy way to get your custom assertion failure message to appear on stderr, since you’re limited to using printf on devices. You could launch your kernels in a subprocess and use freopen to redirect the file pointers, but that seems a bit extreme to me.

Note that trying to define two different versions of the same assertion macro on device versus host code is very difficult to do with the same macro name. There are at least two obstacles:

(a) you can’t use the __CUDA_ARCH__ macro to separate them, because this macro is actually undefined in host code, and

(b) nvcc is always in host compilation mode in header files.

You can, of course, declare two different assertion macros with different names for device and host C++ code.

    #define aussie_assert_HOST(cond)   // etc...
    #define aussie_assert_DEVICE(cond) // etc...

If you want to ensure they get used in the right code, just trigger compiler errors by putting fprintf in the host version, and __CUDA_ARCH__ in the device version. Actually, no, that idea of using __CUDA_ARCH__ to prevent misuse didn’t quite work, but you can instead use assertion macros that includes calls to wrong cross-mode functions, by declaring two “assertion failure” functions to call, which are declared as either __device__ or __host__.

Custom Assertion Macros

An important point about the default “assert” macro on both host and device code is that its failure handling may not be what you want. The default device code assertion failure will trigger a cudaErrorAssert CUDA Runtime error when the condition fails. And the default C++ assert macro on the host CPU code will literally crash your program by calling the standard “abort” function, which triggers a fatal exception on Windows or a core dump on Linux.

That is fine for debugging, but it isn’t usually what you want for production code. Hence, most professional C++ programmers declare their own custom assertion macros instead.

For example, here’s my own “aussie_assert” macro in my own header file:

    #define aussie_assert(cond) ( (cond) || aussie_assert_fail(#cond, __FILE__, __LINE__) )

This tricky macro uses the short-circuiting of the “||” operator, which has a meaning like “or-else”. So, think of it this way: the condition is true, or else we call the failure function. The effect is similar to an if-else statement, but an expression is cleaner in a macro.

The __FILE__ and __LINE__ preprocessor macros expand to the current filename and line number. The filename is a string constant, whereas the line number is an integer constant. The expression “#cond” is the “stringize” operator, which only works in preprocessor macros, and creates a string constant out of its argument.

Note that you can add “__func__” to also report the current function name if you wish. There’s also an older non-standard __FUNCTION__ version of the macro. Note that the need for all these macros goes away once there is widespread C++ support for std::stacktrace, as standardized in C++23, in which case a failing assertion could simply report its own call stack in an error message.

When Assertions Fail. This aussie_assert macro relies on a function that is called only when an assertion has failed. And the function has to have a dummy return type of “bool” so that it can be used as an operand of the || operator, whereas a “void” return type would give a compilation error. Hence, the declaration is:

    bool aussie_assert_fail(char* str, char* fname, int ln);  // Assertion failed

And here’s the definition of the function:

    bool aussie_assert_fail(char* str, char* fname, int ln)  
    {
        // Assertion failure has occurred...
        g_aussie_assert_failure_count++;
        printf("AUSSIE ASSERTION FAILURE: %s, %s:%d\n", str, fname, ln);
        return false;  // Always fails
    }

This assertion failure function must always return “false” so that the assertion macro can be used in an if-statement condition.

Assertion Failure Extra Message

The typical assertion macro will report a stringized version of the condition argument (i.e., #cond is the special stringize operator), plus the source code filename, line number, and function name. This can be a little cryptic, so a more human-friendly extra message is often added. The longstanding hack to do this has been:

    aussie_assert(fp != NULL && "File open failed");   // Works

The trick is that a string constant has a non-null address, so && on a string constant is like doing “and true” (and is hopefully optimized out). This gives the extra message in the assertion failure because the string constant is stringized into the condition (although you’ll also see the “&&” and the double quotes, too). Note that an attempt to be tricky with comma operator fails:

    aussie_assert(fp != NULL, "File open failed");   // Bug

There are two problems. Firstly, it doesn’t compile because it’s not the comma operator, but two arguments to the aussie_assert macro. Even if this worked, or if we wrapped it in double-parentheses, there’s a runtime problem: this assertion condition will never fail. The result of the comma operator is the string literal address, which is never false.

Optional Assertion Failure Extra Message: The above hacks motivate us to see if we could allow an optional second parameter to assertions. We need two usages, similar to how “static_assert” currently works in C++:

    aussie_assert(fp != NULL);
    aussie_assert(fp != NULL, "File open failed");

Obviously, we can do this if “aussie_assert” was a function, using basic C++ function default arguments or function overloading. If you have faith in your C++ compiler, just declare the functions “inline” and go get lunch. But if we don’t want to call a function just to check a condition, we can also use C++ variadic macros.

Variadic Macro Assertions

C++ allows #define preprocessor macros to have variable arguments using the “...” and “__VA_ARG__” special tokens. Our aussie_assert macro changes to:

    #define aussie_assert(cond, ...) \
       ( (cond) || \
          aussie_assert_fail(#cond, __FILE__, __LINE__, __VA__ARG__) )

And we change our “aussie_assert_fail” to have an extra optional “message” parameter.

    bool aussie_assert_fail(char* str, char* fname, int ln, char *mesg=0);

This all works fine if the aussie_assert macro has 2 arguments (condition and extra message) but we get a bizarre compilation error if we omit the extra message (i.e., just a basic assertion with a condition). The problem is that __VA_ARG__ expands to nothing (because there’s no optional extra message argument), and the replacement text then has an extra “,” just hanging there at the end of the argument list, causing a syntax error.

Fortunately, the deities who define C++ standards noticed this problem and added a solution in C++17. There’s a dare-I-say “hackish” way to fix it with the __VA__OPT__ special token. This is a special token whose only purpose is to disappear along with its arguments if there’s zero arguments to __VA_ARG__ (i.e., it takes the ball and goes home if there’s no-one else to play with). Hence, we can hide the comma from the syntax parser by putting it inside __VA_OPT__ parentheses. The final version becomes:

    #define aussie_assert(cond, ...) \
       ( (cond) || \
          aussie_assert_fail(#cond, __FILE__, __LINE__ \
              __VA_OPT__(,) __VA__ARG__) )

Note that the comma after __LINE__ is now inside of a __VA_OPT__ special macro. Actually, that’s not the final, final version. We really should add “__func__” in there, too, to report the function name. Heck, why not add __DATE__ and __TIME__ while we’re at it? Why isn’t there a standard __DEVELOPER__ macro that adds my name?

Assertless Production Code

Not everyone likes assertions, and coincidentally some people wear sweaters with reindeer on them. If you want to compile out all of the assertions from the production code, you can use this:

   #define aussie_assert(cond)  // nothing

But this is not perfect, and has an insidious bug that occurs rarely (if you forget the semicolon). A more professional version is to use “0” and this works by itself, but even better is a “0” that has been typecast to type “void” so it cannot be accidentally used in any expression:

    #define aussie_assert(cond) ( (void)0 )

The method to remove calls to the aussie_assert variadic macro version uses the “...” token:

    #define aussie_assert(cond, ...) ( (void)0 )

Personally, I don’t recommend doing this at all, as I think that assertions should be left in the production code for improved supportability. I mean, come on, recycle and reuse, remember? Far too many perfectly good assertions get sent to landfill every year.

Assertion Return Value Usage

Some programmers like to use an assertion style that tests the return code of their assert macro:

    if (assert(ptr != NULL)) {  // Risky
        // Normal code
        ptr->count++;
    }
    else {
        // Assertion failed
    }

This assertion style can be used if you like it, but I don’t particularly recommend it, because it has a few risks:

1. The hidden assert failure function must return “false” so that “if” test fails when the assertion fails.

2. Embedding assertions deeply into the main code expressions increases the temptation to use side effects like “++” in the condition, which can quietly disappear if you ever remove the assertions from a production build:

    if (assert(++i >= 0)) { // Risky
        // ...
    }

3. The usual assertion removal method of “((void)0)” will fail with compilation errors in an if statement. Also using a dummy replacement value of “0” is incorrect, and even “1” is not a great option, since the “if(assert(ptr!=NULL))” test becomes the unsafe “if(1)”. A safer removal method is a macro:

    #define assert(cond) (cond)

Or you can use an inline function:

    inline void assert(bool cond) { } // Empty

This avoids crashes, but may still leave debug code running (i.e., a slug, not a bug). It relies on the optimizer to remove any assertions that are not inside an “if” condition, which just leave a null-effect condition sitting there. Note also that this removal method with “(cond)” is also safer because keeping the condition also retains any side-effects in that condition (i.e., the optimizer won’t remove those!).

Generalized Assertions

Once you’ve used assertions for a while, they begin to annoy you a little bit. They can fail a lot, especially during initial module development and unit testing of new code. And that’s the first time they get irritating, because the assertion failure reports don’t actually give you enough information to help debug the problem. However, you can set a breakpoint on the assertion failure code when running in cuda-gdb, so that’s usually good enough.

The second time that assertions are annoying is when you ship the product. That’s when you see assertion failures in the logs as an annoying reminder of your own imperfections. Again, there’s often not enough information to reproduce the bug.

So, for your own sanity, and for improved supportability, consider extending your own assertion library into a kind of simplified unit-testing library. The extensions you should consider:

Add std::stacktrace capabilities if you can, or use Boost Stacktrace or GCC backtrace as a backup. Printing the whole stack trace on an assertion failure is a win.
Add extra assertion messages as a second argument.
Add __func__ to show the function name, if you haven’t already.

And you can also generalize assertions to cover some other common code failings.

Unreachable code assertion
“Null pointer” assertion
Integer value assertions
Floating-point value assertions
Range value assertions

Creating specialized assertion macros for these special cases also means the error messages become more specific.

Unreachable code assertion

This is an assertion failure that triggers when code that should be unreachable actually got executed somehow. The simple way that programmers have done this in the past is:

   aussie_assert(0);  // unreachable

And you can finesse that a little with just a better name:

    #define aussie_assert_not_reached()   ( aussie_assert(false) )
    ...
    aussie_assert_not_reached(); // unreachable

Here’s a nicer version with a better error message:

    #define aussie_assert_not_reached() \
        ( aussie_assert_fail("Unreachable code was reached", __FILE__, __LINE__) )

Once-only execution assertion

Want to ensure that code is never executed twice? A function that should only ever be called once? Here’s an idea for an assertion that triggers on the second execution of a code pathway, by using its own hidden “static” call counter local variable (only works in host code):

    #define aussie_assert_once()  do { \
        static int s_count = 0; \
            ++s_count; \
            if (s_count > 1) { \
            aussie_assert_fail("Code executed twice", \
                    __FILE__, __LINE__); \
            } \
       } while(0)

Restricting any block of code to once-only execution is as simple as adding a statement like this:

    aussie_assert_once();   // Not twice!

This can be added at the start of a function, or inside any if statement or else clause, or at the top of a loop body (although why is it coded as a loop if you only want it executed once?). Note that this macro won’t detect the case where the code is never executed. Also note that you could customize this macro to return an error code, or throw a different type of exception, or other exception handling method when it detects double-executed code.

Function Call Counting

The idea of once-only code assertions can be generalized to a count. For example, if you want to ensure a function isn’t called too many times, use this code:

    aussie_assert_N_times(1000);

Here’s the macro, similar to aussie_assert_once, but with a parameter:

    #define aussie_assert_N_times(ntimes)  do { \
            static int s_count = 0; \
                ++s_count; \
                if (s_count > (ntimes)) { \
                aussie_assert_fail( \
               "Code executed more than " #ntimes " times", \
                                __FILE__, __LINE__); \
                } \
       } while(0)

This checks for too many invocations of the code block. Checking for “too few” is a little trickier, and would need a static smart counter object with a destructor. Again, this only works in host code, as we don’t have static local parameters in device code, and we’d need some other approach.

Detecting Spinning Loops

Note that the above call-counting macro doesn’t work for checking that a loop isn’t spinning. It might seem that we can use the above macro at the top of the loop body to avoid a loop iterating more than 1,000 times. But it doesn’t work, because it will count multiple times that the loop is entered, not just a single time. If we want to track a loop call count, the counter should not be a “static” variable, and it’s more difficult to do in a macro. The simplest method is to hand-code the test:

    int loopcount = 0;
    while (...) {
        if (++loopcount > 1000) {  // Spinning?
            // Warn...
        }
    }

The upside of using a simple approach: this should work on both device and host code.

Generalized Variable-Value Assertions

Various generalized assertion macros can not only check values of variables, but also print out the value when the assertion fails. The basic method doesn’t print out the variable’s value:

    aussie_assert(n == 10);

A better way is:

    aussie_assertieq(n, 10);  // n == 10

The assertion macro looks like:

   #define aussie_assertieq(x,y) \
        (( (x) == (y)) || \
         aussie_assert_fail_int(#x "==" #y, \
             (x), "==", (y), \
            __FILE__, __LINE__))

The assertion failure function has extra parameters for the variables and operator string:

    bool aussie_assert_fail_int(char* str, int x, char *opstr, int y, char* fname, int ln)  
    {
        // Assert failure has occurred...
        g_aussie_assert_failure_count++;
        fprintf(stderr, "AUSSIE INT ASSERT FAILURE: %s, %d %s %d, %s:%d\n", str, x, opstr, y, fname, ln);
        return false;  // Always fails
    }

If you don’t mind lots of assertion macros with similar names, then you can define named versions for each operator, such as:

aussie_assertneq — !=
aussie_assertgtr — >
aussie_assertgeq — >=
aussie_assertlss — <
aussie_assertleq — <=

If you don’t mind ugly syntax, you can generalize this to specify an operator as a parameter:

   aussie_assertiop(n, ==, 10);

The macro with an “op” parameter is:

   #define aussie_assertiop(x, op, y) \
        (( (x) op (y)) || \
         aussie_assert_fail_int(#x #op #y, \
             (x), #op, (y), \
            __FILE__, __LINE__))

And finally, you have to duplicate all of this to change from int to float type variables. For example, there’s macros named “aussie_assertfeq”, “aussie_assertfop”, and a failure function named “aussie_assert_fail_float”. There’s probably a fancy way to avoid this using function overloading or C++ templates and compile-time type traits, but only if you’re smarter than me.

Assertions for Function Parameter Validation

Assertions and toleration of exceptions have some tricky overlaps. Consider the modified version of vector summation with my own “aussie_assert” macro instead:

    __device__ float vector_sum(float v[], int n)
    {
        aussie_assert(v != NULL);
        // etc..
    }

What happens when this assertion fails in a custom assertion macro? In both host and device code, the execution will progress after the assertion, in which case any use of v will be a null pointer dereference. The code is not very resilient.

Hence, the above code works fine only if your custom “aussie_assert” assertion macro throws an exception on the host. This doesn’t work on the host, but your custom macro could call the builtin assert primitive on the device. This requires that you have a robust exception handling mechanism in place above it, for the caught exception on the host, or the cudaErrorAssert code from the device, which is a significant amount of work.

The alternative is to both assert and handle the error in the same place, which makes for a complex block of code:

   aussie_assert(v != NULL); 
   if (v == NULL) {
        return 0.0;  // Tolerate
    }

Slightly more micro-efficient is to only test once:

    if (v == NULL) {
        aussie_assert(v != NULL); // Always triggers
        return 0.0;  // Tolerate
    }

This is a lot of code that can get repeated all over the place. Various copy-paste coding errors are inevitable.

Assert Parameter and Return

An improved solution is an assertion macro that captures the logic “check parameter and return zero” in one place. Such a macro first tests a function parameter and if it fails, the macro will not only emit an assertion failure message, but will also tolerate the error by returning a specified default value from the function.

Here’s a generic version for any condition:

    #define aussie_assert_and_return(cond,retval) \
        if (cond) {} else { \
            aussie_assert_fail(#cond " == NULL", __FILE__, __LINE__); \
            return (retval); \
        }

The usage of this function is:

    float aussie_vector_something(float v[], int n) 
    {
            aussie_assert_and_return(v != NULL, 0.0f);
        ...
    }

The above version works for any condition. Here’s another version specifically for testing an incoming function parameter for a NULL value:

    #define aussie_assert_param_tolerate_null(var,retval) \
        if ((var) != NULL) {} else { \
            aussie_assert_fail(#var " == NULL", __FILE__, __LINE__); \
            return (retval); \
        }

The usage of this function is:

    aussie_assert_param_tolerate_null(v, 0.0f);

If you want to be picky, a slightly better version wraps the “if-else” logic inside a “do-while(0)” trick. This is a well-known trick to make a macro act more function-like in all statement structures.

    #define aussie_assert_param_tolerate_null2(var,retval) \
        do { if ((var) != NULL) {} else { \
            aussie_assert_fail(#var " == NULL", __FILE__, __LINE__); \
            return (retval); \
        }} while(0)

The idea of this macro is to avoid lots of parameter-checking boilerplate that will be laborious and error-prone. But it’s also an odd style to hide a return statement inside a function-like preprocessor macro, so this is not a method that will suit everyone.

Next-Level Assertion Extensions

Here are some final thoughts on how to further improve your assertions:

Change any often-triggered assertions into proper error messages with fault tolerance. Users don’t like seeing assertion messages. They’re kind of like gibberish to ordinary mortals.
Add extra context information in the assertion message (i.e., add an extra information string). This is much easier to read than a stringized expression, filename with line number, or multi-line stack trace.
Add unique codes to assertion messages for increased supportability. Although, maybe not, because any assertion that’s triggering often enough to need a code, probably shouldn’t remain an assertion!
inline assertion function? Why use macros? Maybe these assertions should instead be an inline function in modern C++? And it could report context using std::backtrace. All I can say is that old habits die hard, and I still don’t trust the optimizer to actually optimize much.

The downside of assertions is mainly that they make you lazy as a programmer because they’re so easy to add. But sometimes no matter how good they seem, you have to throw an assertion into the fires of Mordor. The pitfalls include:

Don’t use assertions instead of user input validation.
Don’t use assertions to check program configurations.
Don’t use assertions as unit tests (it works, but bypasses the test harness statistics).
Don’t use assertions to check if a file opened.

You need to step up and code the checks of input and configurations as part of proper exception handling. For example, it has to check the values, and then emit a useful error code if they’ve failed, and ideally it’s got a unique error code as part of the message, so that users can give a code to support if they need. You really don’t want users to see the dirty laundry of an assertion message with its source file, function name, and line number.

• Online: Table of Contents

• PDF: Free PDF book download

• Buy: CUDA C++ Debugging: Safer GPU Kernel Programming

The new CUDA C++ Debugging book:

Debugging CUDA C++ kernels
Tools & techniques
Self-testing & reliability
Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging