Aussie AI

Chapter 7. Non-Memory Safety Issues

Book Excerpt from "Safe C++: Fixing Memory Safety Issues"

by David Spuler

The Other 30%

If Microsoft and Google both report that 70% of their security issues are related to memory safety problems, then the other 30% are not. What types of non-memory errors are problematic? Some of the main examples include:

Logic errors (e.g., simple programming mistakes or misunderstood algorithms).
Arithmetic overflow and underflow.
File I/O errors.
Multithreading concurrency problems (e.g., race conditions).

Some of these are specific to C++ syntax, whereas others arise in all programming languages. Multithreaded code and other types of concurrency are simply an order of magnitude more complex for programmers than sequential coding. Programmers are also prone to all sorts of dumb mistakes and simple misunderstandings of the algorithm they are coding.

Code Blindness and Copy-Paste Errors

Serious errors in C++ software don't always come from intrinsic properties of the programming language, nor are they all related to memory safety or other undefined behavior. Many are simple logic errors arising from programmer fallibility, and these are especially common wherever code has been copied and pasted.

Many common programming idioms carry the risk of occasional serious errors. One of the main ways these errors get introduced is "copy-paste" of a block of code.

For example, one of the most common idioms is the use of an integer loop variable in a for loop. A correct for loop header looks like:

    for (i = 1; i <= 10; i++)

However, certain errors often arise when programmers "copy-and-paste" program statements. When asked to loop down from 10 to 1, a lazy programmer will copy and change the above for loop header, a highly error-prone practice. One such error is failing to change ++ to --, as below:

    for (i = 10; i >= 1; i++) // ERROR

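This causes a loop that is (almost) infinite. In practice, it typically terminates only when i passes INT_MAX and integer overflow wraps it around to a negative value. Strictly speaking, signed overflow is undefined behavior, so an optimizing compiler is also free to turn this into a truly infinite loop.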
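For comparison, a correct downward loop changes the start value, the comparison, and the step. A minimal sketch, using a hypothetical count_down() helper that records each value of i so the loop can be checked:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: collects each loop value so the direction can be checked.
// Reversing a loop means changing the start value, the comparison, AND the
// step (i-- here, rather than the copied i++).
std::vector<int> count_down() {
    std::vector<int> values;
    for (int i = 10; i >= 1; i--) {
        values.push_back(i);
    }
    return values;
}
```

Writing the reversed loop header from scratch, rather than editing a copied one, avoids leaving any of the three parts unchanged.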

The same careless copy-and-paste has caused a similar error in the nested loops below:

    for (i = 1; i < n; i++)
        for (j = 0; j < n; i++) // ERROR
            arr[i][j] = 0;

Can you see the bug? It's hidden by "code blindness" if you can't.
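To spell out the bug: the inner loop increments i instead of j, so j stays at 0, the test j < n stays true (for any positive n), and i keeps growing until arr[i][j] writes past the end of the array. A corrected sketch follows. The fixed dimension N and the function name clear_matrix are hypothetical, and starting the outer loop at 0 assumes the intent is to clear every row:

```cpp
#include <cassert>

const int N = 4;  // hypothetical fixed dimension for this sketch

// Corrected nested loops: the inner loop now increments j (not i), and the
// outer loop starts at 0 so that row 0 is cleared as well.
void clear_matrix(int arr[N][N], int n) {
    assert(n >= 0 && n <= N);  // n must not exceed the real array size
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            arr[i][j] = 0;
}
```

Declaring i and j inside the loop headers also limits their scope, which makes it slightly harder to increment the wrong variable by accident.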

Arithmetic Overflow and Underflow

Surprisingly, arithmetic errors are a reasonably common attack vector for malicious actors, yet many C++ programs largely ignore the issue of overflow. Since integers are often used as indices into arrays and strings, an attacker who can deliberately overflow an index variable may be able to make the program modify memory at unintended locations. This is one level of indirection removed from buffer overflows, but the end result is the same.

What are the solutions to arithmetic overflow?

Compiler-supported "safe arithmetic" modes.
Manual self-checks for overflow and underflow.
Safe integer wrapper classes.

Having a compiler safe mode that checks every operation for arithmetic overflow is likely to be prohibitively expensive. Every integer operator would need an overflow check, and all floating-point arithmetic would similarly need checks for NaN or similar problems.
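Of the options listed above, a safe integer wrapper class lets you pay the checking cost only for the variables you choose. A minimal sketch, assuming a hypothetical CheckedInt class that handles only int addition and subtraction, pre-tests each operation against the limits of int, and throws std::overflow_error instead of overflowing:

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>

// Hypothetical minimal safe integer wrapper (addition and subtraction only).
class CheckedInt {
public:
    explicit CheckedInt(int v = 0) : m_val(v) {}
    int value() const { return m_val; }

    CheckedInt operator+(CheckedInt rhs) const {
        const int b = rhs.m_val;
        // Pre-test: these comparisons themselves cannot overflow.
        if ((b > 0 && m_val > std::numeric_limits<int>::max() - b) ||
            (b < 0 && m_val < std::numeric_limits<int>::min() - b))
            throw std::overflow_error("CheckedInt: addition overflow");
        return CheckedInt(m_val + b);
    }

    CheckedInt operator-(CheckedInt rhs) const {
        const int b = rhs.m_val;
        if ((b < 0 && m_val > std::numeric_limits<int>::max() + b) ||
            (b > 0 && m_val < std::numeric_limits<int>::min() + b))
            throw std::overflow_error("CheckedInt: subtraction overflow");
        return CheckedInt(m_val - b);
    }

private:
    int m_val;
};
```

A fuller wrapper would also need multiplication, division (where INT_MIN / -1 overflows), and conversions from other integer types.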

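Fortunately, the effects of integer overflow and underflow are predictable in practice. Unsigned arithmetic is defined by the standard to wrap around, whereas signed overflow is officially "undefined behavior", although on typical two's-complement hardware it also wraps around (unless the optimizer takes advantage of the undefined behavior). The usual effects are: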

Signed integer overflow — from INT_MAX to a large negative (INT_MIN).
Signed integer underflow — from INT_MIN (negative) to INT_MAX (positive).
Unsigned integer overflow — from UINT_MAX around to zero.
Unsigned integer underflow — from zero around to UINT_MAX.
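Because the standard defines unsigned arithmetic to wrap around modulo 2^N, the two unsigned cases above can be demonstrated portably; the signed cases cannot, since actually triggering them is undefined behavior. A minimal sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <climits>

// Unsigned wraparound is well-defined, so these results are portable.
unsigned wrap_increment(unsigned u) { return u + 1u; }  // UINT_MAX + 1 -> 0
unsigned wrap_decrement(unsigned u) { return u - 1u; }  // 0 - 1 -> UINT_MAX
```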

Self-testing arithmetic overflow. Self-tests for integer overflow can look like this:

    int i = sz * sizeof(float);
    assert(i > 0);

However, although the above may detect overflow in the lab, it doesn't actually stop a malicious actor in production if assertions are compiled out (the standard assert is disabled when NDEBUG is defined) or if you use "soft assertions" that don't abort. Instead, you could write the check out manually:

    if (i <= 0) {
        assert(i > 0);
        abort();
    }

However, that block of code reeks of copy-paste errors, because the condition has to be written twice. Maybe you need a way of defining "hard assertions" like:

    assert_abort(i > 0);
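The text doesn't show how assert_abort is defined. One possible definition, as a sketch: a macro that, unlike the standard assert, is not disabled by NDEBUG, prints the failed condition with its file and line, and then calls std::abort(). The name assert_abort and the get_element() usage function are hypothetical:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical "hard assertion": always compiled in, always aborts on failure.
#define assert_abort(cond) \
    do { \
        if (!(cond)) { \
            std::fprintf(stderr, "Hard assertion failed: %s (%s:%d)\n", \
                         #cond, __FILE__, __LINE__); \
            std::abort(); \
        } \
    } while (0)

// Example use: a debug-only check next to a check that survives release builds.
int get_element(const int* arr, int n, int idx) {
    assert(arr != nullptr);             // soft: removed when NDEBUG is defined
    assert_abort(idx >= 0 && idx < n);  // hard: checked in every build
    return arr[idx];
}
```

The do { ... } while (0) wrapper makes the macro behave like a single statement, so it is safe inside an unbraced if/else.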

Testing for signed integer overflow becomes:

    i++;
    assert_abort(i > 0);

Testing for signed integer underflow becomes:

    i--;
    assert_abort(i < 0);

Testing for unsigned integer overflow is:

    u++;
    assert_abort(u != 0);

Testing for unsigned integer underflow is:

    u--;
    assert_abort(u != UINT_MAX);

Note that all of these are tests after the overflow or underflow has already occurred. This idea of "post-testing" for integer overflow also generalizes to other arithmetic operations, such as addition or multiplication. There is also hardware support in some CPUs for detecting an arithmetic operation that caused an overflow.
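For unsigned types, the generalization to addition and multiplication can be sketched as below. Because unsigned wraparound is well-defined, computing the result first and then checking it is safe. The helper names are hypothetical:

```cpp
#include <cassert>
#include <climits>

// Post-test for unsigned addition: a wrapped sum is smaller than either operand.
bool add_wrapped(unsigned a, unsigned b, unsigned* result) {
    assert(result != nullptr);
    *result = a + b;
    return *result < a;
}

// Post-test for unsigned multiplication: if the product wrapped,
// dividing it by a no longer gives back b.
bool mul_wrapped(unsigned a, unsigned b, unsigned* result) {
    assert(result != nullptr);
    *result = a * b;
    return a != 0 && *result / a != b;
}
```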

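Pre-testing. Post-testing is probably acceptable, since it's not the actual arithmetic overflow or underflow that causes the vulnerability, but the misuse of the integer variable afterwards. One caveat applies to signed types: because signed overflow is undefined behavior, an optimizing compiler may assume it never happens and remove a signed post-test altogether. Pre-testing avoids this problem, and simple forms of integer overflow, such as from increment or decrement, are easy to pre-test: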

    assert_abort(i != INT_MAX);    // Signed overflow pre-test
    i++;
    assert_abort(i != INT_MIN);    // Signed underflow pre-test
    i--;
    assert_abort(u != UINT_MAX);   // Unsigned overflow pre-test
    u++;
    assert_abort(u != 0);          // Unsigned underflow pre-test
    u--;
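Pre-testing also extends to addition: check the operands against the limits before adding, so that the signed overflow never happens. A minimal sketch with a hypothetical add_would_overflow() helper. The second function uses __builtin_add_overflow, a GCC and Clang extension (not standard C++) that performs the addition and reports whether it overflowed, and which can compile down to the CPU overflow-flag check mentioned earlier:

```cpp
#include <cassert>
#include <climits>

// Portable pre-test for signed addition; the comparisons cannot overflow.
bool add_would_overflow(int a, int b) {
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

#if defined(__GNUC__) || defined(__clang__)
// Compiler extension: stores the (wrapped) sum and returns true on overflow.
bool add_checked_builtin(int a, int b, int* result) {
    return __builtin_add_overflow(a, b, result);
}
#endif
```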

Note that the unsigned tests also apply to commonly used unsigned typedefs such as size_t, although the limit to test against is then SIZE_MAX rather than UINT_MAX.

Insidious C++ Coding Errors

If you're one of the many programmers who often ignore C++ compiler warnings, here are a few examples of things that cause insidious program failures. The only redeeming point: many of them trigger warnings from the C++ compiler.

Aliasing in the overloaded assignment operator

The definition of an overloaded "operator=" function for a class should always check for an assignment to itself (i.e., of the form "x=x"). Consider the following simple MyString class:

    #include <cstdio>
    #include <cstring>

    class MyString {
    private:
        char* m_str;
    public:
        MyString() { m_str = new char[1]; m_str[0] = '\0'; }
        MyString(const char* s)
        {
                m_str = new char[strlen(s) + 1]; strcpy(m_str, s);
        }
        void operator =(const MyString& s);
        ~MyString() { delete[] m_str; }
        void print() { printf("STRING: %s\n", m_str); }
    };

    void MyString::operator = (const MyString& s)
    {
        delete[] m_str; // delete old string
        m_str = new char[strlen(s.m_str) + 1]; // allocate memory
        strcpy(m_str, s.m_str); // copy new string
    }

The above code looks fine, but it contains a hidden error that appears only if a string is assigned to itself. Consider the effect of this code:

        MyString s("abc");
        s = s;
        s.print();

When the assignment operator is called, the argument s is the same object as the one to which the member function is applied. Therefore, m_str and s.m_str are the same pointer, and the delete operator deallocates memory that is immediately used in the subsequent strlen and strcpy function calls. These operations therefore access freed memory, which is undefined behavior and typically shows up as a crash or garbage output.

This error is an example of a general problem of aliasing in the use of overloaded operators, especially the = operator. The object to which the operator is applied is an alias for the object passed as the argument, so any modifications to the data members also affect the data in the argument object. This type of error is very difficult to track down because it occurs only for one particular special case, and this case may not occur very often. The error is not restricted to operator=, although this is its most common appearance. Similar aliasing errors may also occur in other operators such as +=, or in non-operator member functions that accept objects of the same type.

The correct idiom to avoid this problem of aliasing is to compare the implicit pointer, this, with the address of the argument object (which must be passed as a reference type). If these addresses are the same, the two objects are identical and appropriate action can be taken for this special case. For example, in the MyString class the correct action when assigning a string to itself is to make no changes, and the operator= function becomes:

    void MyString::operator = (const MyString& s)
    {
        if (this != &s) {  // Correct!
                delete[] m_str; 
                m_str = new char[strlen(s.m_str) + 1]; 
                strcpy(m_str, s.m_str); 
        }
    }
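Putting it together, here is a minimal self-contained sketch of the class with a self-assignment-safe operator=. It goes slightly beyond the version above, and these additions are assumptions of the sketch: operator= returns MyString& (the usual convention, which allows chained a = b = c), a copy constructor is added because a class owning heap memory needs one to avoid double deletion, the new buffer is allocated before the old one is freed so that a failed allocation leaves the object intact, and c_str() is a hypothetical accessor used for testing:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

class MyString {
private:
    char* m_str;
public:
    MyString() : m_str(new char[1]) { m_str[0] = '\0'; }
    explicit MyString(const char* s) : m_str(nullptr) {
        assert(s != nullptr);
        m_str = new char[std::strlen(s) + 1];
        std::strcpy(m_str, s);
    }
    // Copy constructor: without it, copies would share (and double-delete) m_str.
    MyString(const MyString& other)
        : m_str(new char[std::strlen(other.m_str) + 1]) {
        std::strcpy(m_str, other.m_str);
    }
    MyString& operator=(const MyString& s) {
        if (this != &s) {                                     // skip x = x
            char* copy = new char[std::strlen(s.m_str) + 1];  // allocate first
            std::strcpy(copy, s.m_str);
            delete[] m_str;                                   // then free old buffer
            m_str = copy;
        }
        return *this;                                         // enables a = b = c
    }
    ~MyString() { delete[] m_str; }
    const char* c_str() const { return m_str; }  // hypothetical accessor
    void print() const { std::printf("STRING: %s\n", m_str); }
};
```

With this version, the s = s example above leaves s unchanged instead of reading freed memory.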

Accidental empty loop

A common novice error with loops is to place a semicolon just after the header of a for or while loop. This is syntactically valid, so the compiler gives no error message (although some compilers can warn about it). However, it changes the meaning of the loop. For example, consider the code:

    for (i = 1; i <= 10; i++); // Extra semicolon
    {
        ... // body of loop
    }

This is interpreted as:

    for (i = 1; i <= 10; i++)
        ;   // empty loop
    {
        ... // body of loop executed only once
    }

A lone semicolon is a null statement in C++, so the compiler treats it as the entire body of the loop, which therefore does nothing. The block after the loop header (the intended loop body) is executed only once, after the loop has finished. Worse still, the accidental empty loop may cause an infinite loop if nothing in the header changes the condition, as with a while loop whose counter is updated only inside the intended body.
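A small sketch that counts how many times the braced block runs makes the difference concrete. The function names are hypothetical, and some compilers will warn about the stray semicolon, which is exactly the warning worth heeding:

```cpp
#include <cassert>

// Stray semicolon: the null statement is the loop body, and the braced
// block is an ordinary block that runs once after the loop ends.
int body_runs_with_stray_semicolon() {
    int runs = 0;
    int i;
    for (i = 1; i <= 10; i++);   // Extra semicolon
    {
        runs++;
    }
    return runs;
}

// Intended version: the braced block is the loop body.
int body_runs_correct() {
    int runs = 0;
    for (int i = 1; i <= 10; i++) {
        runs++;
    }
    return runs;
}
```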

Dangling else error

The rule that an else always matches the closest unmatched if is usually satisfactory. However, there are occasions where "dangling else" errors can arise in nested if statements such as:

    if (y < 0)
        if (x < 0)
            x = 0;
    else   // Bug!
        y = 0;

Based on the indentation used by the programmer, the else clause is presumably intended to match the first if. However, the compiler matches the else with the second (closest) if, and compiles the code as if it were written as:

    if (y < 0) {
        if (x < 0)
            x = 0;
        else
            y = 0;
    }

The way to avoid this error is to always use braces around the inner if statement when nesting if statements. GCC and Clang can also warn about this ambiguity.

    if (y < 0) {    // Correct
        if (x < 0)
            x = 0;
    }
    else
        y = 0;
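The difference in behavior can be checked with a sketch that writes out both readings with explicit braces (so neither triggers a dangling-else warning). The function names are hypothetical, and each returns the final value of y:

```cpp
#include <cassert>

// How the compiler parses the unbraced code: the else binds to the inner if.
int dangling_as_parsed(int x, int y) {
    if (y < 0) {
        if (x < 0)
            x = 0;
        else
            y = 0;
    }
    return y;
}

// The braced version that matches the programmer's indentation.
int braced_as_intended(int x, int y) {
    if (y < 0) {
        if (x < 0)
            x = 0;
    }
    else
        y = 0;
    return y;
}
```

For x = 5, y = -1, the parsed version sets y to 0 while the intended version leaves it at -1; for y = 1 the results are reversed.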

sizeof array parameter

The sizeof operator can compute surprising results when applied to a function parameter of array type. The error is illustrated by the following function:

    void test_sizeof(int arr[3])
    {
        printf("Size is %d\n", (int) sizeof(arr));
    }

The computed size is expected to be 3*sizeof(int), usually 12. However, the actual result is the size of a pointer, commonly 4 or 8. This is because an array parameter is converted to the corresponding pointer type (here int*), and it is this type that sizeof is applied to. The output is therefore exactly sizeof(int*).
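A sketch contrasting the decayed parameter with a parameter passed by reference to an array, which keeps the array type so that sizeof, and a template-deduced length N, reflect the real array. The function names are hypothetical, and compilers such as GCC may warn about sizeof on the array parameter:

```cpp
#include <cassert>
#include <cstddef>

// The array parameter decays to int*, so sizeof gives the pointer size.
std::size_t size_of_param(int arr[3]) {
    return sizeof(arr);   // same as sizeof(int*)
}

// A reference to an array keeps the array type (and its length N).
template <std::size_t N>
std::size_t size_of_array_ref(const int (&arr)[N]) {
    return sizeof(arr);   // N * sizeof(int)
}

template <std::size_t N>
constexpr std::size_t element_count(const int (&)[N]) {
    return N;
}
```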

Accidental string literal concatenation

String literal concatenation is a relatively obscure feature of C++ that merges consecutive string literals into a single string literal. It takes place after the usual preprocessing tasks (i.e., after macro expansion), but before parsing.

For example, consider the following code:

    const char *prompt = "Hello "
                         "world";

This looks like a typo to beginner C++ programmers, but is totally valid C++ that will be equivalent to:

    const char *prompt = "Hello world";

Once you get used to it, this is a very helpful C++ feature, most useful for writing long string literals across multiple lines. In particular, it avoids the whitespace pitfalls of line splicing (a backslash at the end of a line inside a string literal), where any indentation on the continuation line becomes part of the string.

Unfortunately, the fact that the compiler (or preprocessor) performs this concatenation automatically without any warning can also lead to strange errors. Consider the following definition of an array of strings:

    const char *arr[] = { "a", "b" "c" };   // Bug (missing comma)

The absence of the second comma causes "b" and "c" to be concatenated to produce "bc", so arr is defined to hold 2 strings instead of 3. Even if the array size were explicitly declared as 3 (i.e., const char *arr[3]), many compilers would still not produce a warning, since having too few initializers is not an error; the third element is silently initialized to a null pointer.
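When the intended number of strings is known, a compile-time check on the element count turns this silent bug into a build error. In the sketch below, the static_assert is commented out because it would (deliberately) fail for the buggy array; arr_count is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const char* arr[] = { "a", "b" "c" };   // Bug (missing comma): holds "a", "bc"

const std::size_t arr_count = sizeof(arr) / sizeof(arr[0]);   // 2, not 3

// Uncommenting this check would stop the build for the buggy array above:
// static_assert(sizeof(arr) / sizeof(arr[0]) == 3, "expected 3 strings");
```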

Octal integer constants

Any integer constant beginning with 0 is treated as an octal constant. This creates no problem with 0 itself since its value is the same in both octal and decimal, but there are dangers in using prefix zeros on integer constants. Nevertheless, the temptation to use initial zeros can arise occasionally. For example, consider representing 4-digit phone extension numbers as integers:

    struct { const char *name; int ext_number; } arr[] = {
        { "Mary", 7234 },
        { "John", 3467 },
        { "Elaine", 0135 }   // Bug!
    };

The phone number 0135 will be interpreted as an octal constant, and won't equal decimal 135. Its value is 1*64 + 3*8 + 5 = 93 in decimal.
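The compiler's interpretation can be confirmed at compile time. If leading zeros are meaningful, one option (a design choice for this sketch, not the only fix) is to store the extensions as strings; the Extension type and extensions array are hypothetical. Note also that a literal such as 0189 would not compile at all, since 8 and 9 are not octal digits.

```cpp
#include <cassert>

// A leading zero makes the literal octal: 0135 is 1*64 + 3*8 + 5 == 93.
static_assert(0135 == 93, "leading zero makes 0135 an octal literal");

// Storing extensions as strings preserves the leading zero.
struct Extension {
    const char* name;
    const char* ext_number;
};

const Extension extensions[] = {
    { "Mary", "7234" },
    { "John", "3467" },
    { "Elaine", "0135" },
};
```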

Nested comments hide statements

C++ does not support nested /* comments: a /* inside a comment is ignored, and the first */ ends the whole comment, although compilers may warn about a /* that appears inside a comment. This creates an insidious problem if you accidentally leave off or mistype the closing */ of a comment. Note that there's no similar issue with the // style of commenting. Consider this CUDA C++ code:

    __global__ void matrix_add_safe_puzzle8(
        float *m3, const float *m1, const float *m2, 
        int nx, int ny)
    {
       int x = blockIdx.x * blockDim.x + threadIdx.x;
       int y = blockIdx.y * blockDim.y + threadIdx.y;
       if (x < nx /*X* / && y < nx /*Y*/ ) {
           int id = x + y * nx; // Linearize
           m3[id] = m1[id] + m2[id];
       }
    }

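There’s a nested comment problem that will comment out the “y < nx” test, because there’s an accidental space between “*” and “/” in the first comment. That comment therefore doesn’t end at “* /” but continues until the “*/” after Y, leaving only the “x < nx” test in the condition. You’d probably get a compiler warning, and hopefully you pay attention to it!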
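A plain C++ sketch (not CUDA) reproduces the effect. In the buggy version, the comment that starts at /*X runs all the way to the */ after Y, so only x is tested. The fixed version uses // comments, so a typo in one comment cannot swallow code. The function names are hypothetical, and both keep the original “y < nx” bound:

```cpp
#include <cassert>

// Reproduces the bug: "* /" (with a space) does not close the comment.
bool in_bounds_buggy(int x, int y, int nx) {
    (void)y;   // y is never tested: its check is inside the comment
    return x < nx /*X* / && y < nx /*Y*/ ;
}

// With // comments, each comment ends at the end of its own line.
bool in_bounds_fixed(int x, int y, int nx) {
    return x < nx    // X
        && y < nx;   // Y
}
```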
