Aussie AI
Chapter 21. DIY Memory Safety
Book Excerpt from "Advanced C++ Memory Techniques: Efficiency and Safety"
by David Spuler, Ph.D.
Why DIY Memory Safety?
Well, because you can fix some bugs yourself!
Instead of waiting for compiler vendors to add a “-safe” option,
or for the standards organizations to define a “Safe C++” language,
you do it yourself!
These are the main memory safety issues in C++:
- Array bounds writes (buffer overflow writes)
- Array bounds reads (buffer overflow reads)
- Uninitialized memory usage (e.g., malloc, new, stack buffers).
- Use-after-deallocation (i.e., reads or writes after free or delete).
- Double-deallocation (i.e., double-free, double-delete).
There are also other special cases of memory issues:
- File pointer misuses (e.g., double-fclose).
- Text buffer overruns (e.g., string copy overwrites).
Strategies for DIY Memory Safety
There are two overarching strategies, which are the opposite of each other:
- Make some failures harmless (e.g., get rid of uninitialized memory usage errors by always initializing memory to zero).
- Detect more failures by intentionally making memory problems trigger visible errors (e.g., filling uninitialized memory with poison values).
You can pick one of these and do it for both developer testing and production runs by customers. Or you can vary the idea:
- Detect more bugs in developer mode.
- Make the bugs harmless in production mode.
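One way to vary the idea by build mode is a single fill byte chosen at compile time. This is only a minimal sketch: `mode_alloc` and `kFillByte` are hypothetical names, and it assumes the standard `NDEBUG` macro is what distinguishes production builds from developer builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical names; NDEBUG is assumed to mark a production build.
#ifdef NDEBUG
// Production mode: zero fill makes stray reads of fresh memory harmless.
constexpr unsigned char kFillByte = 0x00;
#else
// Developer mode: a recognizable poison byte makes stray reads stand out.
constexpr unsigned char kFillByte = 0xCD;
#endif

// Allocate n bytes and fill them according to the build mode.
void* mode_alloc(std::size_t n) {
    if (n == 0) n = 1;  // avoid the implementation-defined malloc(0) case
    void* p = std::malloc(n);
    if (p != nullptr) std::memset(p, kFillByte, n);
    return p;
}
```

The same allocation call then serves both strategies, and the only cost in either mode is one `memset` per block.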
Why would we do this?
Why not just run AddressSanitizer or valgrind?
There are a few reasons:
- The sanitizers run too slowly, so we cannot use them all the time, or in production.
- If we implement fast DIY methods, we can use them continually during testing.
- If they’re really fast, we might even leave the self-checks in for production runs.
The DIY techniques to detect more bugs inside your own code include:
- Canary regions (“redzones”) around memory blocks.
- Poisoning memory inside the blocks with error-triggering values.
- Magic values for statuses stored in buffers.
- Full address tracking (i.e., your own hash table of memory block addresses).
Hence, there are multiple levels of error detection,
ranging from super-fast to almost-as-slow-as-valgrind.
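The first three techniques can be sketched together in one small allocator wrapper. Everything here is an assumption made for illustration: the `diy` namespace, the function names, the byte values, and the 16-byte redzone size are not taken from any library. The double-free check is best-effort only, because it inspects memory that has already been released.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

namespace diy {  // hypothetical namespace and names throughout

constexpr std::uint32_t kMagicLive  = 0xA110CA7Eu;  // status: block is allocated
constexpr std::uint32_t kMagicFreed = 0xDEADBEEFu;  // status: block was freed
constexpr unsigned char kCanaryByte = 0xFD;         // fills both redzones
constexpr unsigned char kPoisonByte = 0xCD;         // fills the user bytes
constexpr std::size_t   kRedzone    = 16;
static_assert(kRedzone % alignof(std::max_align_t) == 0, "keeps user bytes aligned");

enum class BlockStatus { Ok, BadMagic, DoubleFree, Underrun, Overrun };

// Bookkeeping stored before the front redzone.
struct alignas(std::max_align_t) Header {
    std::uint32_t magic;
    std::size_t   size;
};
constexpr std::size_t kPrefix = sizeof(Header) + kRedzone;

// Layout: [Header][front redzone][user bytes][back redzone]
void* checked_alloc(std::size_t n) {
    auto* raw = static_cast<unsigned char*>(std::malloc(kPrefix + n + kRedzone));
    if (raw == nullptr) return nullptr;
    auto* h = reinterpret_cast<Header*>(raw);
    h->magic = kMagicLive;
    h->size = n;
    unsigned char* user = raw + kPrefix;
    std::memset(user - kRedzone, kCanaryByte, kRedzone);
    std::memset(user, kPoisonByte, n);
    std::memset(user + n, kCanaryByte, kRedzone);
    return user;
}

static bool all_canary(const unsigned char* p) {
    for (std::size_t i = 0; i < kRedzone; ++i)
        if (p[i] != kCanaryByte) return false;
    return true;
}

// Inspect the magic value and both redzones of a block from checked_alloc.
BlockStatus check_block(const void* p) {
    if (p == nullptr) return BlockStatus::BadMagic;
    const auto* user = static_cast<const unsigned char*>(p);
    const auto* h = reinterpret_cast<const Header*>(user - kPrefix);
    if (h->magic == kMagicFreed) return BlockStatus::DoubleFree;
    if (h->magic != kMagicLive) return BlockStatus::BadMagic;
    if (!all_canary(user - kRedzone)) return BlockStatus::Underrun;
    if (!all_canary(user + h->size)) return BlockStatus::Overrun;
    return BlockStatus::Ok;
}

// Check the block, mark it freed, then release it. Returns what was found.
BlockStatus checked_free(void* p) {
    if (p == nullptr) return BlockStatus::Ok;  // like free(NULL)
    BlockStatus st = check_block(p);
    if (st == BlockStatus::BadMagic || st == BlockStatus::DoubleFree) return st;
    auto* user = static_cast<unsigned char*>(p);
    auto* h = reinterpret_cast<Header*>(user - kPrefix);
    // Volatile store so the optimizer does not drop a write just before free.
    *static_cast<volatile std::uint32_t*>(&h->magic) = kMagicFreed;
    std::free(h);
    return st;
}

}  // namespace diy
```

A one-byte write past the end of a block lands in the back redzone, so `check_block` reports `Overrun` without any per-access instrumentation. The cost is a few dozen extra bytes per block plus a short scan at each check.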
Making Uninitialized Accesses Harmless
There’s another option: just fix it! Instead of trying to find the bugs, just make them disappear by becoming harmless. This is particularly true of the whole class of memory bugs based on uninitialized memory reads.
Why are these even bugs? They seem more like language design failures, with too great a focus on speed. The basic problem with standard C++ and memory initialization is this patchwork of choices:
- Global variables are initialized to zero (hooray!).
- Local static variables are initialized to zero (hooray!).
- Stack variables are not initialized to zero (boo!).
- Heap-allocated memory blocks are sometimes initialized to zero (boo!).
For heap memory allocation, we again have a patchwork:
- malloc memory is never initialized.
- calloc initializes to zero always.
- new of object types relies on constructors to initialize.
- new of arrays of objects relies on (many) constructors to initialize.
- new of primitive data types does not initialize at all (single variables or arrays), unless you write an explicit initializer such as () or {}.
- realloc does not initialize extra memory.
Really we want: change all malloc and new calls to calloc.
Then a whole class of memory safety issues just disappears!
Honestly, rather than detecting uninitialized memory uses, shouldn’t we just make them a non-issue?
Why would we even bother trying the other strategy of filling uninitialized memory with poisoned values,
when we could just fix it everywhere?
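As a rough sketch of what "zero everything" could look like in code, the helpers below cover the malloc, realloc, primitive new, and stack cases from the lists above. All of the names (`zero_malloc`, `zero_realloc`, `zero_new_ints`, `Record`, `all_zero`) are hypothetical, and the realloc wrapper assumes the caller can supply the old block size, which plain `realloc` does not track for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// malloc replacement: calloc always returns zeroed bytes.
void* zero_malloc(std::size_t n) {
    return std::calloc(1, n ? n : 1);
}

// realloc replacement: zero only the newly added tail.
// Assumes the caller knows old_size (pass 0 when p is null).
void* zero_realloc(void* p, std::size_t old_size, std::size_t new_size) {
    void* q = std::realloc(p, new_size ? new_size : 1);
    if (q != nullptr && new_size > old_size) {
        std::memset(static_cast<unsigned char*>(q) + old_size, 0,
                    new_size - old_size);
    }
    return q;
}

// new of primitive types: the trailing () requests value-initialization.
int* zero_new_ints(std::size_t count) {
    return new int[count]();
}

// Stack variables: empty braces zero every member of an aggregate.
struct Record {
    int  counts[4];
    char name[16];
};

Record make_record() {
    Record r{};
    return r;
}

// Small helper to verify that a byte range is all zero.
bool all_zero(const void* p, std::size_t n) {
    const auto* bytes = static_cast<const unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i)
        if (bytes[i] != 0) return false;
    return true;
}
```

The price is one extra pass over each new block, which is the speed the language design was trying to save.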
Intercepting C++ Primitives
Here are the basic strategies for how to integrate safety into your code with DIY fixes to your codebase:
- Coding style to require calling safe functions.
- Wrapper functions to automatically fix or detect issues.
The way that debug wrapper functions work includes these ideas:
- Macro intercepts of malloc, calloc, and free.
- Link-time intercepts of new and delete operators.
- Macro intercepts for strlen and strcpy, etc.
- Macro intercepts for fopen and fclose.
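A macro intercept of malloc and free might look something like the sketch below. The wrapper names `debug_malloc` and `debug_free` and the `g_live_blocks` counter are hypothetical, and the example assumes every standard header has already been included before the macros are defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical bookkeeping: blocks allocated but not yet freed.
static long g_live_blocks = 0;

// Wrapper that zero-fills the block and records where a failure happened.
void* debug_malloc(std::size_t n, const char* file, int line) {
    void* p = std::calloc(1, n ? n : 1);
    if (p == nullptr) {
        std::fprintf(stderr, "%s:%d: allocation of %zu bytes failed\n",
                     file, line, n);
        return nullptr;
    }
    ++g_live_blocks;
    return p;
}

// Wrapper that flags a free when no blocks are outstanding.
void debug_free(void* p, const char* file, int line) {
    if (p == nullptr) return;  // free(NULL) is legal
    if (g_live_blocks <= 0) {
        std::fprintf(stderr, "%s:%d: free with no live blocks\n", file, line);
        return;  // leaking is safer than a possible double free
    }
    --g_live_blocks;
    std::free(p);
}

// The intercepts must come after every standard header, because they
// rewrite any later use of these names, including declarations.
#define malloc(n) debug_malloc((n), __FILE__, __LINE__)
#define free(p)   debug_free((p), __FILE__, __LINE__)

// Ordinary-looking code below this point is silently redirected.
int* make_counters(std::size_t count) {
    return static_cast<int*>(malloc(count * sizeof(int)));
}
```

One side effect of this style is that code spelling the call as `std::malloc` no longer compiles after the macro, because the macro rewrites the name after the `std::` qualifier.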
We have to be aware of a few issues:
- Macro intercepts won’t get any allocations from any less-used primitives we don’t intercept.
- Macro intercepts won’t see anything in third-party libraries (including Standard C++/STL).
- Link-time new and delete intercepts will see Standard C++ calls (which can be good or bad).
- Link-time new and delete intercepts must define four versions, two for objects, and two array versions.
- There’s no simple way to intercept stack-based memory operations for local variables (i.e., from function calls or returns).
- We can macro-intercept stack-based alloca calls, but it’s hard to know when the function returns.
- We can macro-intercept fopen-type file operations, but it’s hard for C++ fstream types.
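For the link-time approach, the four replacement functions could be sketched as follows. The counters and helper names are hypothetical, the counters are not thread-safe, and this sketch leaves the nothrow, sized, and over-aligned forms of the operators at their defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters (not thread-safe in this sketch).
static long g_new_calls = 0;
static long g_delete_calls = 0;

// Shared allocation path: zero-filled memory, or throw like standard new.
static void* zeroed_or_throw(std::size_t n) {
    void* p = std::calloc(1, n ? n : 1);
    if (p == nullptr) throw std::bad_alloc();
    ++g_new_calls;
    return p;
}

// Shared release path.
static void counted_release(void* p) noexcept {
    if (p == nullptr) return;
    ++g_delete_calls;
    std::free(p);
}

// The four replacements: two for single objects, two for arrays.
// Defining them in any one source file replaces the library versions
// for the whole program, including calls made inside Standard C++ code.
void* operator new(std::size_t n)   { return zeroed_or_throw(n); }
void* operator new[](std::size_t n) { return zeroed_or_throw(n); }
void operator delete(void* p) noexcept   { counted_release(p); }
void operator delete[](void* p) noexcept { counted_release(p); }
```

Because allocations made by the standard containers also pass through these functions, the counters rise even in code that never writes `new` explicitly. Note that zero-filling here only makes stray reads harmless in practice: the language still treats a primitive created by plain `new` as uninitialized.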
Overall, the DIY memory safety approach is a patchwork of techniques in itself.
It would be so much easier if the compiler vendors would just add a “-safe” flag that
does all this!