Aussie AI
Chapter 18. Portability
Book Excerpt from "Safe C++: Fixing Memory Safety Issues"
by David Spuler
Portability Strategy
You do need portability if your users have different platforms. There are also various generic benefits from having most of the C++ code standardized and portable. Using portable C++ in the general code areas means being able to run unit tests on lots of utility code on developers' boxes, no matter what the deployment platform. Good code design generally dictates that the non-portable parts should at least be wrapped and isolated.
Portability in C++ programming of AI applications involves running correctly on the underlying tech stack, including the operating system, CPU, and GPU capabilities. Conceptually, there are two levels:
1. Toleration. The first level of portability is “toleration” where the program must at least work correctly on whatever platform it finds itself.
2. Exploitation. The second level is “exploiting” the specific features of a particular tech stack, such as making the most of whatever CPU or GPU hardware is available.
This is generally true for any application, but especially true for AI engines. To get it running fast, you'll need a whole boatload of exploitation deep in your C++ backends. Hence, the basic approach to writing portable code is:
1. Write generic code portably, and
2. Write platform-specific code where needed.
Writing portable standard C++ code. Wherever your application doesn't need a GPU, your C++ code should be written in portable C++. The majority of the C++ programming language is well-standardized, and a lot of code can be written that simply compiles on all platforms and produces consistent results. You just have to avoid the portability pitfalls.
Platform-specific coding. Most C++ programmers are familiar with using #if or #ifdef preprocessor directives to handle different platforms, and the various flavors of this are discussed further below. The newer C++17 alternative is “if constexpr” statements for compile-time processing, although unlike #if, the branch not taken must still be syntactically valid C++. Small or sometimes large sections of C++ code will need to be written differently on each platform.
Likely major areas that will be non-portable include:
- Hardware acceleration (GPU interfaces)
- Intrinsic functions (CPU acceleration)
- FP16/BF16 floating-point types
- User interfaces (Windows vs Mac vs X Windows)
- Android vs iOS (not just the GUI)
- Multi-threading (Linux vs Windows threads)
- Text file differences (You've heard of \r, right?)
- File system issues (Directory hierarchies, permissions, etc.)
- “Endian” issues in integer representations.
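As a rough illustration of the two compile-time techniques mentioned above, the sketch below picks a platform name with #if and a pointer-width label with if constexpr (C++17). The names kPlatformName, kPathSeparator, and pointer_model are made up for this example; _WIN32, __APPLE__, and __linux__ are the macros that the mainstream compilers predefine.

```cpp
// Preprocessor approach: the compiler only ever sees one of these branches,
// so each branch may use headers or APIs that exist on just one platform.
#if defined(_WIN32)
constexpr const char* kPlatformName = "Windows";
constexpr char kPathSeparator = '\\';
#elif defined(__APPLE__)
constexpr const char* kPlatformName = "Apple";
constexpr char kPathSeparator = '/';
#elif defined(__linux__)
constexpr const char* kPlatformName = "Linux";
constexpr char kPathSeparator = '/';
#else
constexpr const char* kPlatformName = "Unknown";
constexpr char kPathSeparator = '/';
#endif

// if constexpr approach (C++17): the condition is evaluated at compile time,
// but outside a template every branch must still be valid C++ on all platforms.
constexpr const char* pointer_model() {
    if constexpr (sizeof(void*) == 8) {
        return "64-bit";
    } else if constexpr (sizeof(void*) == 4) {
        return "32-bit";
    } else {
        return "other";
    }
}
```

The practical difference is that #if can hide code that would not even compile elsewhere (such as a platform-only header), whereas if constexpr is better suited to choosing between alternatives that are all legal C++.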
Consider your code choices carefully. Some other areas where you can create portability pain for yourself include:
- Third-party libraries (i.e., if they are not as widely used as STL or Boost).
- Newer C++ standard language features (e.g., C++23 features won't be widely supported yet).
Compilation Problems
If you want your C++ code to run on both Linux and Windows, you might need to get past the compiler errors first! C++ has been standardized for decades, or it seems like that. So, I feel like it should be easier to get C++ code to compile. And yet, I find myself sometimes spending an hour or two getting past a few darn compiler errors. Most compilers have a treat-warnings-as-errors mode. Come on, I want the reverse.
Some of the main issues that will have a C++ program compile on one C++ compiler (e.g., MSVS) but not on another (e.g., GCC) include:
- const correctness
- Permissive versus conformant modes
- Pointer type casting
const correctness refers to the careful use of “const” to mark not just named constants, but also all unchanging read-only data. If it's “const” then it cannot be changed; if it's non-const, then it's writable. People have different levels of feelings about whether this is a good idea. There are the fastidious Vogon-relative rule-followers who want it, and the normal reasonable pragmatic people who don't. Can you see which side I'm on?
Anyway, to get non-const-correct code (i.e., mine) to compile on GCC or MSVS, you need to turn off the fussy modes. On MSVS, set “Conformance Mode” in Project Settings to “No”, which removes the “/permissive-” flag. On GCC, the “-fpermissive” option downgrades some of these errors to warnings.
Pointer type casting is another issue. C++ for AI has a lot of problems with pointer types, mainly because C++ standardizers back in the 1990s neglected to create a “short float” 16-bit floating-point type. Theoretically, you're not supposed to cast between different pointer types, like “int*” and “char*”. And theoretically, you're supposed to use “void*” for generic addresses, rather than “char*” or “unsigned char*”. But, you know, this is AI, so them rules is made to be broken, and the C++ standardizer committees admitted as much when they created the various special types of casts (e.g., reinterpret_cast).
Anyway, the strategies for getting a non-compiling pointer cast to work include:
- Just casting it to whatever you want.
- Turning on permissive mode
- Casting it to void* and back again (e.g., “x=*(int*)(void*)(char*)&c”)
- Using “reinterpret_cast” like a Goody Two-Shoes.
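For the Goody Two-Shoes route, here is a minimal sketch of two idioms that tend to compile cleanly on both GCC and MSVS: reinterpret_cast to “unsigned char*” for looking at raw bytes, and std::memcpy for moving bits between unrelated types. The helper names (as_bytes, read_int, float_bits) are invented for this example, and float_bits assumes a 32-bit float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// View any object as raw bytes. Reading an object through unsigned char*
// is allowed for every type, so this cast is one of the safe ones.
template <typename T>
const unsigned char* as_bytes(const T& obj) {
    return reinterpret_cast<const unsigned char*>(&obj);
}

// Rebuild an int from a byte buffer that may not be int-aligned.
// Copying with memcpy avoids dereferencing a mis-typed pointer.
inline int read_int(const unsigned char* buf) {
    assert(buf != nullptr);
    int value = 0;
    std::memcpy(&value, buf, sizeof(value));
    return value;
}

// Get the bit pattern of a float (assumes float is 32 bits, checked here).
inline std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}
```

Compilers generally optimize these small memcpy calls into plain loads and stores, so the portable version need not be slower. C++20 also offers std::bit_cast for the float_bits style of conversion.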
Runtime Portability Glitches
A bug that occurs on every platform is just that: a bug. A portability glitch is one with different behavior on different platforms. Some examples of the types that can occur:
- The code doesn't compile on a platform.
- The code has different results on different platforms.
- Sluggish processing on one platform.
- Crashes, hangs, or spins on one platform.
Some other types of weird problems that might indicate a portability glitch:
- Code runs fine in normal mode, but fails when the optimizer is enabled, or if the optimization level is increased.
- Code crashes in production, but runs just fine in the debugger (i.e., you cannot reproduce it).
- Code intermittently fails (e.g., it could be a race condition or other timing issue).
A lot of these types of symptoms are screaming “memory error!” And indeed, that's got to be top of the list. You might want to run your memory debugging tools again (e.g., Valgrind), even on a different platform to the one that's crashing.
However, it's not always memory or pointers. There are various other insidious bugs that can cause weird behavior in the 0.001% of cases where it's not a memory glitch:
- Uninitialized variables or object members.
- Numeric overflow or underflow (of integers or float type).
- Data size problems (e.g., 16-bit, 32-bit, or 64-bit).
- Undefined language features. Your code might be relying on something that isn't actually guaranteed in C++.
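Signed integer overflow is a good example of the last two items combined: it is undefined behavior in C++, so it can behave differently across compilers and optimization levels. The sketch below shows one way to test for overflow before doing the addition; the function names add_would_overflow and checked_add are hypothetical.

```cpp
#include <limits>

// Returns true if a + b would overflow the range of int.
// The check itself never performs the overflowing addition.
constexpr bool add_would_overflow(int a, int b) {
    return (b > 0 && a > std::numeric_limits<int>::max() - b) ||
           (b < 0 && a < std::numeric_limits<int>::min() - b);
}

// Adds two ints, reporting failure instead of overflowing.
// Returns false and leaves 'result' untouched if the sum does not fit.
inline bool checked_add(int a, int b, int& result) {
    if (add_would_overflow(a, b)) {
        return false;
    }
    result = a + b;
    return true;
}
```

Because the limits come from std::numeric_limits rather than hard-coded constants, the same check works whatever size int happens to be on the platform.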
Data Type Sizes
There are a variety of portability issues with the sizes of basic data types in C++. Some of the problems include:
- Fundamental data type byte sizes (e.g., how many bytes is an “int”).
- Pointer versus integer sizes (e.g., do void pointers fit inside an int?).
- size_t is usually unsigned long, not unsigned int.
Typical AI engines work with 32-bit floating-point (float type). Note that for 32-bit integers you cannot assume that int is 32 bits, but must define a specific type. Furthermore, if you assume that short is 16-bit, int is 32-bit, and long is 64-bit, well, you’d be incorrect. On most 64-bit platforms int is still 32 bits, while long is 64 bits on Linux and macOS but only 32 bits on Windows. The C++ standard only requires minimum and relative sizes, such as that long is at least as big as int.
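One common way to “define a specific type” is to build on the exact-width typedefs in the standard <cstdint> header. The sketch below is only an illustration: the alias names (i16, i32, i64, u32) and the widen_product helper are invented here, and the exact-width types are technically optional, existing only on platforms that can provide them.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical project-wide aliases: use these instead of short/int/long
// whenever the exact width matters (file formats, tensor data, hashing).
using i16 = std::int16_t;
using i32 = std::int32_t;
using i64 = std::int64_t;
using u32 = std::uint32_t;

// These checks mostly document the intent, since the exact-width types
// will not exist at all on a platform that cannot supply them.
static_assert(CHAR_BIT == 8, "bytes are assumed to be 8 bits");
static_assert(sizeof(i16) == 2, "i16 must be 2 bytes");
static_assert(sizeof(i32) == 4, "i32 must be 4 bytes");
static_assert(sizeof(i64) == 8, "i64 must be 8 bytes");

// Width-sensitive code then reads the same on every platform:
// the product of two 32-bit values always fits in 64 bits.
constexpr i64 widen_product(i32 a, i32 b) {
    return static_cast<i64>(a) * static_cast<i64>(b);
}
```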
Your startup portability check should check that sizes are what you want:
// Test basic numeric sizes
aussie_assert(sizeof(int) == 4);
aussie_assert(sizeof(float) == 4);
aussie_assert(sizeof(short) == 2);
Another more efficient way is the compile-time static_assert method:
static_assert(sizeof(int) == 4);
static_assert(sizeof(float) == 4);
static_assert(sizeof(short) == 2);
And you should also print them out in a report, or to a log file, for supportability reasons.
Here’s a useful way with a macro
that uses the “#” stringize preprocessor operator and also the standard adjacent string concatenation feature of C++.
#define PRINT_TYPE_SIZE(type) \
    printf("Config: sizeof " #type " = %d bytes (%d bits)\n", \
    (int)sizeof(type), 8*(int)sizeof(type))
You can print out whatever types you need:
PRINT_TYPE_SIZE(int);
PRINT_TYPE_SIZE(float);
PRINT_TYPE_SIZE(short);
Here’s the output on my Windows laptop with MSVS:
Config: sizeof int = 4 bytes (32 bits)
Config: sizeof float = 4 bytes (32 bits)
Config: sizeof short = 2 bytes (16 bits)
Standard Library Types:
Other data types to consider are the typedef names defined by the standard library.
I’m looking at you, size_t and time_t, and a few others that belong on Santa’s naughty list.
People often assume that size_t is the same as “unsigned int”
but on 64-bit platforms it’s usually “unsigned long” (or “unsigned long long” on Windows).
Here’s a partial solution:
PRINT_TYPE_SIZE(size_t);
PRINT_TYPE_SIZE(clock_t);
PRINT_TYPE_SIZE(ptrdiff_t);
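A slightly fuller version of this report might look like the sketch below. It is a variation on the macro above, not a replacement for it: REPORT_TYPE_SIZE, kSizeTIsUnsignedInt, and report_library_type_sizes are names invented for this example. It prints with the “%zu” format, which is the printf conversion for size_t, and it records whether the “size_t is unsigned int” assumption actually holds on the current platform.

```cpp
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <type_traits>

// Hypothetical variant of the size-report macro: %zu is the printf format
// for size_t, so no cast to int is needed.
#define REPORT_TYPE_SIZE(type) \
    std::printf("Config: sizeof " #type " = %zu bytes\n", sizeof(type))

// True only on platforms where the common assumption actually holds.
constexpr bool kSizeTIsUnsignedInt =
    std::is_same<std::size_t, unsigned int>::value;

// Properties that are safe to rely on everywhere.
static_assert(std::is_unsigned<std::size_t>::value, "size_t is unsigned");
static_assert(std::is_signed<std::ptrdiff_t>::value, "ptrdiff_t is signed");

// Print the sizes of the standard library typedefs for the support log.
inline void report_library_type_sizes() {
    REPORT_TYPE_SIZE(std::size_t);
    REPORT_TYPE_SIZE(std::ptrdiff_t);
    REPORT_TYPE_SIZE(std::time_t);
    REPORT_TYPE_SIZE(std::clock_t);
    std::printf("Config: size_t is %s unsigned int\n",
                kSizeTIsUnsignedInt ? "the same type as" : "a different type from");
}
```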
Data Representation Pitfalls
Portability of C++ to platforms also has data representation issues such as:
- Floating-point oddities (e.g., negative zero, Inf, and NaN).
- Whether “char” means “signed char” or “unsigned char”.
- Endian-ness of integer byte storage (i.e., do you prefer “big endian” or “little endian”?).
- Whether zero bytes represent zero integers, zero floating-point, and null pointers.
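Several of the items in this list can be probed directly. The following is a minimal sketch, assuming IEEE 754 floating-point (which the code itself checks); the names kCharIsSigned, is_little_endian, and float_representation_self_test are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Plain char may be signed or unsigned; this records which one we got.
constexpr bool kCharIsSigned = std::numeric_limits<char>::is_signed;

// Runtime endianness probe: look at the first byte in memory of the value 1.
inline bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Startup checks for the floating-point oddities (assumes IEEE 754 floats).
inline void float_representation_self_test() {
    assert(std::numeric_limits<float>::is_iec559);

    const float neg_zero = -0.0f;
    assert(neg_zero == 0.0f);        // negative zero compares equal to zero
    assert(std::signbit(neg_zero));  // but its sign bit is set

    const float not_a_number = std::numeric_limits<float>::quiet_NaN();
    assert(not_a_number != not_a_number);  // NaN is never equal to itself

    const float inf = std::numeric_limits<float>::infinity();
    assert(inf > std::numeric_limits<float>::max());
}
```

Note that aggressive floating-point optimization flags can change how NaN and negative zero comparisons behave, so these checks are best run in the same build configuration that ships.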
Zero is not always zero? You probably assume that a 4-byte integer containing “0” has all four individual bytes equal to zero. It seems completely reasonable, and is correct on many platforms, but not all. There’s a theoretical portability problem on a few obscure platforms. There are computers where integer zero or floating-point 0.0 is not four zero bytes. If you want to check, here’s a few lines of code for your platform portability self-check code at startup:
int i2 = 0;
unsigned char* cptr2 = (unsigned char*)&i2;
for (size_t i = 0; i < sizeof(int); i++) {
    assert(cptr2[i] == 0);
}
Are null pointers all-bytes-zero, too?
Here’s the code to check NULL in a “char*” type:
// Test pointer NULL portability
char *ptr1 = NULL;
unsigned char* cptr3 = (unsigned char*)&ptr1;
for (size_t i = 0; i < sizeof(char*); i++) {
    assert(cptr3[i] == 0);
}
What about 0.0 in floating-point? You can test it explicitly with portability self-testing code:
// Test float zero portability
float f1 = 0.0f;
unsigned char* cptr4 = (unsigned char*)&f1;
for (size_t i = 0; i < sizeof(float); i++) {
    assert(cptr4[i] == 0);
}
It is important to include these tests in a portability self-test,
because you’re relying on this whenever you use memset or calloc.
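The three zero tests above follow the same pattern, so they could be folded into one helper. This is just a sketch of that idea: all_bytes_zero and zero_representation_self_test are names made up for the example, and it uses the standard assert rather than the aussie_assert macro seen earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if every byte of 'value' is zero.
template <typename T>
bool all_bytes_zero(const T& value) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (bytes[i] != 0) {
            return false;
        }
    }
    return true;
}

// Startup self-test covering the zero representations that
// memset and calloc silently rely on.
inline void zero_representation_self_test() {
    assert(all_bytes_zero(0));                            // int
    assert(all_bytes_zero(0L));                           // long
    assert(all_bytes_zero(0.0f));                         // float
    assert(all_bytes_zero(0.0));                          // double
    assert(all_bytes_zero(static_cast<char*>(nullptr)));  // null pointer
}
```

Adding another type to the self-test is then a one-line change, which makes it more likely that the check gets extended when a new data type starts being cleared with memset.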
Pointers versus Integer Sizes
You didn’t hear this from me, but apparently you can store pointers in integers, and vice-versa, in C++ code. Weirdly, you can even get paid for doing this. But it only works if the byte sizes are big enough, and it’s best to self-test this portability risk during program startup. What exactly you want to test depends on what you’re (not) doing, but here’s one example:
// Test LONGs can be stored in pointers
aussie_assert(sizeof(char*) >= sizeof(long));
aussie_assert(sizeof(void*) >= sizeof(long));
aussie_assert(sizeof(int*) >= sizeof(long));
// ... and more
Note that a better version in modern C++ would use “static_assert” to test these sizes at compile-time,
with zero runtime cost.
static_assert(sizeof(char*) >= sizeof(long));
static_assert(sizeof(void*) >= sizeof(long));
static_assert(sizeof(int*) >= sizeof(long));
In this way, you can perfectly safely mix pointers and integers in a single variable. Just don’t tell the SOC compliance officer.
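For the opposite direction, storing a pointer in an integer, the standard <cstdint> header has a type intended for the job: std::uintptr_t. Here is a minimal sketch; the helper names are invented for this example, and uintptr_t is formally an optional type, although it is available on the mainstream desktop and mobile platforms.

```cpp
#include <cassert>
#include <cstdint>

// uintptr_t is optional in the standard, but where it exists it can hold any void*.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t must be able to hold a pointer");

// Store a pointer in an integer.
inline std::uintptr_t pointer_to_integer(void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Recover the pointer from the integer.
inline void* integer_to_pointer(std::uintptr_t v) {
    return reinterpret_cast<void*>(v);
}

// Startup self-test: a pointer survives the round trip through an integer.
inline void pointer_round_trip_self_test() {
    int local = 42;
    void* original = &local;
    void* restored = integer_to_pointer(pointer_to_integer(original));
    assert(restored == original);
    assert(*static_cast<int*>(restored) == 42);
}
```

Using uintptr_t rather than int or long sidesteps the size question, since long is only 32 bits on 64-bit Windows while pointers there are 64 bits.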