Aussie AI Blog
Safe C++ Text Buffers with snprintf
Oct 29th, 2024
by David Spuler, Ph.D.
C++ sprintf is unsafe
The C++ sprintf function is a long-standing part of C and C++,
but it's also unsafe.
It can easily overflow a buffer,
and there's no way to know
without inspecting the parameters in greater detail.
Consider this code:
    char buf[100];
    sprintf(buf, "%s", str);   // Buffer overflow?
One marginally safer way is to use the precision markers, such as in:
    char buf[100];
    sprintf(buf, "%.100s", str);   // Still overflows
In this way, the output is limited to 100 characters, but this can still overflow by one byte, because sprintf also writes the terminating null byte after them. We really need this:
    char buf[100];
    sprintf(buf, "%.99s", str);   // No buffer overflow
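Hard-coding 99 is fragile if the buffer size changes later. One possible variation, sketched here, uses the standard "%.*s" form, where the precision is passed as an int argument and can be computed from the array size. The helper name copy_limited is hypothetical and only serves the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: copies at most N-1 characters of str into buf.
// Taking the array by reference lets the compiler deduce its size N.
template <std::size_t N>
int copy_limited(char (&buf)[N], const char* str)
{
    // The precision argument for %.*s must be an int.
    // N-1 leaves room for the terminating null byte.
    return std::sprintf(buf, "%.*s", static_cast<int>(N - 1), str);
}

void copy_limited_demo()
{
    char small[8];
    int n = copy_limited(small, "hello world");
    assert(n == 7);                              // 7 characters stored
    assert(std::strcmp(small, "hello w") == 0);  // plus the null byte
}
```

This only limits a single %s conversion. Any other text in the format string still counts against the buffer, which is one reason snprintf is the better tool.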
snprintf is safer
The snprintf function is safer than sprintf.
On some platforms, there is also the sprintf_s safe function.
Here's how snprintf works:
    char buf[100];
    snprintf(buf, 100, "%s", str);   // Safer
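We can write this more maintainably with sizeof, so that the size is stated in only one place (this works when buf is an array, not a pointer):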
    char buf[100];
    snprintf(buf, sizeof buf, "%s", str);   // Safer
Problems with snprintf
Although using snprintf will avoid a buffer overrun and a crash
(whereas sprintf didn't),
there are still some limitations:
- Not easy to detect if any overflow occurs (i.e., was prevented).
- Difficult to use snprintf in the middle of a string.
- Appending with snprintf is similarly tricky.
Detecting truncated overflows with snprintf
In many applications, you might want to know that a buffer overflow was avoided,
such as by emitting an error message or throwing an exception.
By default, snprintf will quietly truncate the output and do nothing else.
It is possible to examine the return value of snprintf to know
whether an overflow has been prevented and the output truncated.
The returned value is an integer and it's rather weird:
it is the number of bytes that would have been output if there were enough room in the buffer.
If there's no overflow, then snprintf returns the number of bytes output (excluding the terminating null byte),
just like unsafe sprintf.
If there's an overflow, then the return value will be more than (or equal to) the size of the buffer.
This seems odd, but it's actually quite useful,
because the way to detect an overflow is simply
to compare the return code to the buffer size:
    int bufsize = sizeof buf;
    int ret = snprintf(buf, bufsize, "%s", s);
    if (ret < 0) {
        // snprintf failure (rare, but possible, e.g., on an encoding error)
    }
    else if (ret >= bufsize) {
        // Overflow has occurred! (Truncated text)
    }
    else {
        // Normal case. 
        // The string and its null byte fit in the buffer.
    }
Note that if the return code exactly equals the buffer size (i.e., ret==bufsize), this is still an overflow
because the extra null byte didn't fit, 
and snprintf has truncated one character from the output string
so as to leave room for the null byte.
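The three cases above, including the ret==bufsize edge case, can be checked with a small buffer. This is a minimal sketch. The helper name was_truncated is made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: true if snprintf failed or had to truncate.
inline bool was_truncated(int ret, std::size_t bufsize)
{
    return ret < 0 || static_cast<std::size_t>(ret) >= bufsize;
}

void truncation_demo()
{
    char buf[6];

    // 5 characters + null byte = 6 bytes: fits exactly.
    int ret = std::snprintf(buf, sizeof buf, "%s", "abcde");
    assert(ret == 5 && !was_truncated(ret, sizeof buf));
    assert(std::strcmp(buf, "abcde") == 0);

    // 6 characters: ret == bufsize, so one character is dropped.
    ret = std::snprintf(buf, sizeof buf, "%s", "abcdef");
    assert(ret == 6 && was_truncated(ret, sizeof buf));
    assert(std::strcmp(buf, "abcde") == 0);

    // 11 characters: ret reports the full length that was wanted.
    ret = std::snprintf(buf, sizeof buf, "%s", "hello world");
    assert(ret == 11 && was_truncated(ret, sizeof buf));
    assert(std::strcmp(buf, "hello") == 0);
}
```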
Macro wrapping snprintf return codes
The above code sequence is rather a lot of typing if you're
going to do that for every call to snprintf.
Here's a way to automate it, using a preprocessor macro intercept
and an inline function to check the return code:
    #undef snprintf
    #define snprintf(dest, bufsize, ...) \
        aussie_snprintf_return_check(snprintf(dest, bufsize,__VA_ARGS__), \
                  bufsize, __func__, __FILE__, __LINE__)
This looks dangerous since the macro snprintf is also in the macro value.
However, C++ preprocessor macros that are self-referential are only expanded once.
This is standard functionality since inception for both C and C++.
Note that this is using variadic macros, which have been standard since C99 and C++11.
These include the "..." and the "__VA_ARGS__" tokens.
There's also a useful __VA_OPT__ feature (added in C++20), but we don't need it here.
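The no-recursion rule is easy to confirm in isolation. This small sketch uses a made-up function named scale rather than snprintf, so nothing from the standard library is redefined.

```cpp
#include <cassert>

// An ordinary function, plus a macro of the same name that wraps it.
inline int scale(int x) { return 2 * x; }

// The inner "scale" is not expanded again, so it calls the real function.
#define scale(...) (scale(__VA_ARGS__) + 1)

void macro_demo()
{
    int via_macro = scale(5);     // expands to (scale(5) + 1)
    int direct = (scale)(5);      // parenthesized name: macro not applied
    assert(via_macro == 11);
    assert(direct == 10);
}
```

Wrapping the name in parentheses, as in (scale)(5), is a standard way to reach the underlying function when a function-like macro of the same name is active.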
The above macro simply wraps the call to snprintf with another function
whose only task is to check the return value.
Here's an example of that definition:
    inline int aussie_snprintf_return_check(
        int snprintf_retval, int bufsize, 
        const char* func, const char* file, int line
        )
    {
        // PURPOSE: Wrapper for snprintf return value 
        // ... snprintf_retval is the value that was returned by snprintf 
        // ... (sent here by macro interception of snprintf)
        if (snprintf_retval < 0) {
            AUSSIE_ERROR_CONTEXT("AUS053", "snprintf returned negative failure", func, file, line);
            return snprintf_retval;  // pass through
        }
        else if (snprintf_retval >= bufsize) {
            int bytes_truncated = snprintf_retval - bufsize + 1;
            // TODO: report the bytes truncated, bufsize, etc., as extra error context...
            AUSSIE_ERROR_CONTEXT("AUS054", "snprintf overflow truncated buffer", func, file, line);
            return snprintf_retval;  // pass through
        }
        return snprintf_retval;  // pass through
    }
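To try the intercept end to end, something has to stand in for AUSSIE_ERROR_CONTEXT, which is not defined in this article. The sketch below assumes a trivial stub that prints to stderr and counts reports. The stub and the g_error_count counter are hypothetical, and the checker is shortened to its essentials.

```cpp
#include <cassert>
#include <cstdio>   // must be included before the snprintf macro is defined

// Hypothetical stand-in for the error reporter: log to stderr and count.
static int g_error_count = 0;
#define AUSSIE_ERROR_CONTEXT(code, msg, func, file, line) \
    (++g_error_count, \
     std::fprintf(stderr, "%s: %s in %s (%s:%d)\n", code, msg, func, file, line))

inline int aussie_snprintf_return_check(
    int snprintf_retval, int bufsize,
    const char* func, const char* file, int line)
{
    if (snprintf_retval < 0) {
        AUSSIE_ERROR_CONTEXT("AUS053", "snprintf returned negative failure", func, file, line);
    }
    else if (snprintf_retval >= bufsize) {
        AUSSIE_ERROR_CONTEXT("AUS054", "snprintf overflow truncated buffer", func, file, line);
    }
    return snprintf_retval;  // pass through
}

// The macro intercept from above.
#undef snprintf
#define snprintf(dest, bufsize, ...) \
    aussie_snprintf_return_check(snprintf(dest, bufsize, __VA_ARGS__), \
              bufsize, __func__, __FILE__, __LINE__)

void intercept_demo()
{
    char buf[4];
    g_error_count = 0;
    snprintf(buf, sizeof buf, "%s", "abc");    // fits: no report
    assert(g_error_count == 0);
    snprintf(buf, sizeof buf, "%s", "abcd");   // truncated: one report
    assert(g_error_count == 1);
}
```

Two caveats with this kind of intercept: the bufsize argument is evaluated twice, so it should not have side effects, and any call written as std::snprintf(...) after the #define will also be rewritten by the macro and will no longer compile.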
Unsafe Buffer Appending with sprintf
It's tricky to append to a string using sprintf or snprintf.
Here's the basic idiom for unsafe sprintf appending using strlen:
    char xbuf[1000] = "";
    sprintf(xbuf + strlen(xbuf), "abc");
    sprintf(xbuf + strlen(xbuf), "def");
    sprintf(xbuf + strlen(xbuf), "xyz");
Note that this works even for the special case of an empty string,
where strlen will return 0, and add nothing to the location.
If you do this a lot, or the buffer is a massive text string (e.g., a long HTML document in memory),
then the call to strlen is a slug.
Marginally better is to maintain an incremental buffer pointer, so that the strlen calls
are only from the current location, which is faster.
    char* where = xbuf;
    sprintf(where, "abc");
    where += strlen(where); // append
    sprintf(where, "def");
    where += strlen(where); // append
    sprintf(where, "xyz");
And you can micro-optimize this using the return value of sprintf,
which is the number of bytes output (excluding the null byte).
    char* where = xbuf;
    where += sprintf(where, "abc");
    where += sprintf(where, "def");
    where += sprintf(where, "xyz");
But beware a pitfall: don't do this trick for snprintf, because it doesn't always
return the actual bytes output, but returns the bytes it would have output,
had it been in the right frame of mind.
There's only one problem with all those appending tricks: none of them are safe!
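To make the snprintf pitfall concrete: if "where += snprintf(...)" is used and the output is truncated, the pointer moves past the end of the buffer, and the next "remaining size" calculation goes negative and wraps around to a huge unsigned value. A minimal sketch of a cursor that clamps the return value is shown here. The helper name append_clamped is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: appends text at 'where' without writing past 'end'
// (one past the last byte of the buffer). Advances 'where' by the bytes
// actually stored. Returns false on truncation or snprintf failure.
inline bool append_clamped(char*& where, char* end, const char* text)
{
    std::size_t room = static_cast<std::size_t>(end - where);
    if (room == 0) return false;          // no space, not even for a null byte
    int ret = std::snprintf(where, room, "%s", text);
    if (ret < 0) return false;            // failure: leave the cursor alone
    if (static_cast<std::size_t>(ret) >= room) {
        where += room - 1;                // only room-1 characters were stored
        return false;
    }
    where += ret;
    return true;
}

void clamped_demo()
{
    char xbuf[8] = "";
    char* where = xbuf;
    char* end = xbuf + sizeof xbuf;
    bool ok1 = append_clamped(where, end, "abc");
    bool ok2 = append_clamped(where, end, "def");
    bool ok3 = append_clamped(where, end, "xyz");   // only "x" fits
    assert(ok1 && ok2 && !ok3);
    assert(std::strcmp(xbuf, "abcdefx") == 0);
    assert(where == xbuf + 7);                      // still inside the buffer
}
```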
Safe Buffer Appending with snprintf
How do we append safely to a buffer? We want to do this:
    char xbuf[1000] = "abc";
    snprintf_append(xbuf, sizeof xbuf, "def");
But this function doesn't exist. We have to try to define our own via a macro:
    #define snprintf_append(dest, bufsize, ...) \
        do { \
         int snplentmp = (int)strlen((char*)(dest)); \
         snprintf((char*)(dest) + snplentmp, (bufsize) - snplentmp, __VA_ARGS__); \
        } while(0)
As you can see, this figures out how far along the buffer to append using strlen.
Then it adds that byte count to the location, but also reduces the buffer size by that amount.
It's difficult to return the value of snprintf in this statement-like macro.
However, if we're using the macro intercept 
with #define snprintf (as in prior sections), then the wrapped return value checking
will also be occurring in this usage of snprintf,
so maybe we don't need to return the value to the caller.
Again, the call to strlen can become a slug for large buffers,
because it's always scanning from the very start of the buffer.
The alternative is to maintain a pointer to the end of the string,
which is the location from which to append.
Pointer arithmetic can compute the byte count more efficiently.
    #define snprintf_append_end(dest, bufsize, endstr, ...) \
        do { \
         long int snplentmp = (long) ( (char*)(endstr) - (char*)(dest)); \
         snprintf((char*)(dest) + snplentmp, (bufsize) - snplentmp, __VA_ARGS__); \
        } while(0)
If we really do need to pass the return code through to the caller, then it's hard to do this in a macro
that expands to a code block rather than to an expression.
Instead of using a macro, you can define a C++ function with variable arguments,
and then have it call the vsnprintf function.
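    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>
    // Note: format must be const char* so that string literals can be passed in C++
    int snprintf_append_function(char *dest, int bufsize, const char* format, ...)
    {
        va_list ap;
        int len = (int)strlen(dest);  // Find the current end of the string
        va_start(ap, format);
        int ret = vsnprintf(dest + len, bufsize - len, format, ap);
        va_end(ap);
        return ret;  // Same meaning as the snprintf return value
    }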
Again, we can avoid the slowdown from the strlen call if we maintain
another pointer to the end (or middle) of the text buffer:
    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>
    // Note: format must be const char* so that string literals can be passed in C++
    int snprintf_append_end_function(char* dest, int bufsize, char *endstr, const char* format, ...)
    {
        va_list ap;
        if (*endstr != 0) endstr += strlen(endstr);  // Safe: move to the end if not already
        long int len = (long)((char*)endstr - (char*)dest);
        va_start(ap, format);
        int ret = vsnprintf(dest + len, bufsize - len, format, ap);
        va_end(ap);
        return ret;
    }
Actually, for a further optimization,
the parameter endstr probably should be a reference parameter,
so that its value is automatically updated in the calling code whenever it gets moved to the end.
And one final safety point: we need to check the return value of vsnprintf,
so that we know when an overflow caused a truncation.
This is possible
either through another macro intercept, like we did above for snprintf,
or by adding extra code directly into the above varargs functions.
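Both of those last two ideas can be combined in one function. The following is only a sketch under stated assumptions: the function name snprintf_append_ref, the extra truncated output flag, and the choice to return -1 when the cursor is outside the buffer are all made up for the illustration.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical sketch: appends formatted text at endstr, which must point at
// the terminating null byte inside dest[0..bufsize-1]. endstr is a reference,
// so the caller's pointer is advanced by the bytes actually stored.
// Returns vsnprintf's value and sets 'truncated' if the output was cut short.
int snprintf_append_ref(char* dest, int bufsize, char*& endstr,
                        bool& truncated, const char* format, ...)
{
    long len = static_cast<long>(endstr - dest);
    long room = bufsize - len;             // bytes left, including the null byte
    truncated = true;
    if (room <= 0) return -1;              // assumption: treat as a failure

    va_list ap;
    va_start(ap, format);
    int ret = std::vsnprintf(endstr, static_cast<std::size_t>(room), format, ap);
    va_end(ap);

    if (ret < 0) return ret;               // output error: cursor unchanged
    truncated = (ret >= room);
    endstr += truncated ? room - 1 : ret;  // never move past the null byte
    return ret;
}

void append_ref_demo()
{
    char buf[10] = "";
    char* end = buf;
    bool cut = false;

    int ret = snprintf_append_ref(buf, sizeof buf, end, cut, "%d-%s", 42, "ab");
    assert(ret == 5 && !cut && end == buf + 5);

    ret = snprintf_append_ref(buf, sizeof buf, end, cut, "%s", "cdefgh");
    assert(ret == 6 && cut && end == buf + 9);   // 6 wanted, only 4 stored
    assert(std::strcmp(buf, "42-abcdef") == 0);
}
```

Because the cursor is clamped, a later call after a truncation simply finds one byte of room, stores nothing, and reports truncation again, rather than writing outside the buffer.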
Related Memory Safety Blog Articles
See also these articles:
- DIY Preventive C++ Memory Safety
- Canary Values & Redzones for Memory-Safe C++
- Use-After-Free Memory Errors in C++
- Array Bounds Violations and Memory Safe C++
- Poisoning Memory Blocks for Safer C++
- Uninitialized Memory Safety in C++
- DIY Memory Safety in C++
- CUDA C++ Floating Point Exceptions
- Memory Safe C++ Library Functions
- Smart Stack Buffers for Memory Safe C++