Aussie AI
Chapter 2. Rust versus C++
Book Excerpt from "Safe C++: Fixing Memory Safety Issues"
by David Spuler
Why Rust Memory Safety?
We didn't get here overnight. The concerns about C++ memory glitches have been well-known for years. A variety of techniques and tools have arisen to mitigate these problems, but they are not perfect.
More recently came the focus on security vulnerabilities. The problem with memory safety issues is not only that they cause a crash or a glitch for our users, but also that they expose security vulnerabilities that can be exploited by malicious actors.
The classic attack vector is to use a buffer overrun to cause the program to execute malicious code. The attack is rather involved, because the overrun has to cause the attacker's machine code to be executed. However, exploiting these memory errors, especially on the stack, has become routine.
Many companies have been running defense against security exploits, and have spent a lot of resources doing so. Both Microsoft and Google report that over 70% of their C++ vulnerabilities are related to memory safety failures.
Maybe we should fix that!
But the initiatives for C++ safety didn't really get a head of steam until the U.S. Government began reporting on security vulnerabilities related to memory safety weaknesses in programming languages. The recent White House initiative to convert usage to memory-safe programming languages was seen as a challenge to the very existence of C++ as a programming language.
Hence, we get a debate on Rust versus C++ and whether programmers should switch to a memory-safe language. This has subsequently led to the development of the Safe C++ proposal.
C++ Versus Rust
Rust is newer and is gaining a lot of supporters in the development community. The pros of Rust over C++ include:
- Memory safety
- Thread safety (concurrency control)
- Advanced modern language features
The advantages that C++ retains over Rust include:
- Tested and well-understood
- Developer community
- Longstanding codebases
- Large ecosystem of tools and libraries
- Standardized (e.g., C++11/C++14/C++17/C++20/C++23)
Rust vs C++ Syntax. Some of the differences in the low-level syntax of the two languages:
- Rust uses "let" to declare variables.
- Rust memory management uses "borrow" and "lifetime" annotations for compile-time validated memory safety.
- Rust does not need a garbage collection mechanism (unlike various other memory-safe languages such as Go or Java).
- Rust has a "println!" macro for output.
- Rust uses "struct" (structure) and "impl" (implementation) for class-like modularity.
Why Not Other Memory-Safe Languages?
The push for an alternative memory-safe programming language has coalesced around Rust as the main alternative. But it's not the only memory-safe language. Why not others like Go or Java?
The primary reasons are:
- Memory safety compile-time enforcement, and
- No garbage collection.
Whereas Java has memory safety, it also relies on garbage collection to reclaim memory. This is a significant runtime cost, and hinders the use of Java in latency-critical applications and low-level operating system code.
By way of comparison, Rust's ownership and borrowing rules mean that there's no need for garbage collection. The de-allocation of memory is automatic, happening when the owning variable goes out of scope. Hence, Rust has a reputation as a strong choice for low-latency coding, and notably, is now being used as part of the code for the Linux kernel.
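Rust code is not shown here, but a rough C++ analogy may help with the idea of automatic de-allocation without a garbage collector. The sketch below uses std::unique_ptr so that memory is released at a known point, the end of the owning scope. The names Tracked and released_at_scope_exit are hypothetical, invented only for this illustration, and the C++ version lacks the compile-time borrow checking that Rust adds on top.

```cpp
#include <memory>

// Hypothetical resource that records when it is destroyed.
struct Tracked {
    bool& destroyed;
    explicit Tracked(bool& flag) : destroyed(flag) { destroyed = false; }
    ~Tracked() { destroyed = true; }
};

// Returns true if the object was released exactly when its owner
// went out of scope, with no garbage collector involved.
bool released_at_scope_exit() {
    bool destroyed = false;
    {
        auto owner = std::make_unique<Tracked>(destroyed);
        if (destroyed) return false;  // still alive inside the scope
    }  // owner's destructor runs here, deterministically
    return destroyed;
}
```

Calling released_at_scope_exit() returns true, because the destructor has already run by the time the inner block closes.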
In Defense of C++
C++ has a lot going for it, and a wholesale move to Rust would involve massive upheaval. Advantages include:
- Large number of experienced and new developers.
- Strong ecosystem of tools and components.
- Standardized libraries of code (huge efforts).
- Existing installation codebase around the world.
Such an ecosystem didn't grow without a reason. Let us take a moment to remind ourselves of the inherent positives of the C++ programming language itself:
- Object oriented programming
- Modularity (classes)
- Type safety
- Speed and efficiency
- Portability (high-level language)
- Exception handling mechanisms
Types of Memory Safety
Memory errors are a large class of problems in C++ programs. Both Microsoft and Google reported that approximately 70% of their C++ vulnerabilities were related to memory safety. The main impacts are:
- Safety — glitches and crashes in programs.
- Security — buffer overflows and related memory vulnerabilities are attack vectors.
Memory safety errors can be split into two main types:
- Spatial (location-based)
- Temporal (time-based)
Spatial memory errors are those related to a bad address. Examples in C++ would include:
- Array address out-of-bounds
- Array address underflow
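As a minimal sketch of how a spatial error can be caught rather than executed, the following uses the bounds-checked at() member of std::array in place of operator[]. The helper name checked_read is hypothetical, and the fixed array size of 4 is only an assumption for the example.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: reads arr[i] into out and returns true,
// or returns false (leaving out untouched) if i is out of bounds.
// at() throws std::out_of_range for a bad index, whereas operator[]
// would silently access memory outside the array.
bool checked_read(const std::array<int, 4>& arr, std::size_t i, int& out) {
    try {
        out = arr.at(i);  // bounds-checked access
        return true;
    } catch (const std::out_of_range&) {
        return false;  // spatial error detected before any bad access
    }
}
```

An index of 4 is rejected as out-of-bounds. An "underflow" index such as -1 converted to std::size_t becomes a very large value, so it is rejected by the same check.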
Temporal memory errors are time-related errors in the sequence of memory usage. The memory was previously valid, but is now invalid. Uninitialized memory is another example where the memory is not yet valid. Examples in C++ include:
- Double de-allocation.
- Use of stack addresses after stack unwinding.
- Use of de-allocated memory.
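One possible way to avoid using de-allocated memory in standard C++ is to observe an object through std::weak_ptr, which can report that the object no longer exists. This is only a sketch of one technique, not the approach of any particular tool, and the function name read_or_default is hypothetical.

```cpp
#include <memory>

// Hypothetical helper: returns the observed value, or fallback if the
// object has already been destroyed.
int read_or_default(const std::weak_ptr<int>& observer, int fallback) {
    if (auto alive = observer.lock()) {  // non-null only while the object exists
        return *alive;
    }
    return fallback;  // object already de-allocated: no dangling access
}
```

While a std::shared_ptr still owns the integer, the function returns its value. Once the last owner is reset, lock() yields an empty pointer and the fallback is returned instead of touching freed memory.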
Concurrent and multi-threaded programming in C++ gives additional examples of temporal issues in parallel programming:
- Race conditions (write-after-read, read-after-write, write-after-write).
- Synchronization errors (underlying cause).
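A small sketch of the synchronization fix for a write-after-write race is shown below, assuming a shared counter updated by several threads. The function name count_with_mutex and its parameters are hypothetical. Without the lock, the increments from different threads could interleave and some updates could be lost.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical demo: several threads increment one shared counter.
// The mutex serializes each read-modify-write, so no update is lost.
long count_with_mutex(int num_threads, int increments_per_thread) {
    long counter = 0;
    std::mutex counter_mutex;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, &counter_mutex, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // wait for every thread before reading the result
    }
    return counter;
}
```

With 4 threads and 10000 increments each, the result is always 40000. The cost is the locking overhead on every increment, which is one face of the speed versus safety trade-off.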
Detection versus Toleration
There are many areas where there is tension between detecting errors and resiliently tolerating problems. This is the age-old debate about whether to leave debug code in production or not. If there is a failure for a customer, do we want it to be detected, bearing in mind that the customer will perceive this as a software failure, little different from other less graceful crashes? Or would we rather that the software quietly handles the error, and is thus resilient for the customer? An intermediate method would be to do both:
(a) Detect the internal error and log it, and
(b) Tolerate the error and continue execution.
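The two steps above can be sketched as follows, assuming the internal error is a bad index into a std::vector and the log is simply the standard error stream. The helper name get_or_default is hypothetical, and a production version would likely use the application's own logging facility.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical helper: read element i, logging and tolerating a bad index.
int get_or_default(const std::vector<int>& v, std::size_t i, int fallback) {
    if (i >= v.size()) {
        // (a) Detect the internal error and log it.
        std::cerr << "internal error: index " << i
                  << " out of range (size " << v.size() << ")\n";
        // (b) Tolerate the error and continue execution.
        return fallback;
    }
    return v[i];
}
```

The caller keeps running with the fallback value, while the log records that something went wrong internally.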
Speed. The other issue: speed versus safety. How much performance in terms of compute efficiency are we willing to give up to achieve these different types of error detection and resilient capabilities?
Generally speaking, there is a trade-off between how many errors can be detected and the execution time penalty for doing the additional checking. For example, in trying to detect memory errors by filling blocks with magic values or tracking their addresses, we could use:
- None
- Magic value in the first address of a memory block.
- Magic values in the whole memory block.
- Hash table of addresses for tracking of blocks.
- Hash table for addresses and magic values in blocks.
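As an illustration of the hash table option, here is a minimal sketch of wrappers around malloc and free that track live block addresses in a std::unordered_set. The names tracked_alloc, tracked_free, and live_blocks are hypothetical and do not come from any real library. The sketch is not thread-safe, and every allocation now pays for a hash table insertion.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Hypothetical debug wrappers around malloc/free (not a standard API).
// Not thread-safe; a real version would need a lock around the set.
static std::unordered_set<void*>& live_blocks() {
    static std::unordered_set<void*> blocks;
    return blocks;
}

void* tracked_alloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) {
        live_blocks().insert(p);  // remember this address as live
    }
    return p;
}

// Returns true if the block was live and has now been freed.
// Returns false for an unknown address or a double de-allocation,
// in which case free() is not called at all.
bool tracked_free(void* p) {
    if (live_blocks().erase(p) == 0) {
        return false;  // not a live block: error detected and tolerated
    }
    std::free(p);
    return true;
}
```

Calling tracked_free twice on the same pointer returns true and then false, so the second call is caught without reading the freed block. Note that the second call only compares the stale pointer value against the set, which works on mainstream platforms but is formally implementation-defined in C++.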
Uninitialized memory errors. The incorrect use of uninitialized memory presents a classic example of detection versus resilience. Our debugging memory library can fill uninitialized memory with data, using one of two strategies:
- Canary strategy — fill with magic non-zero values.
- Toleration strategy — fill with zeros (i.e., initialize it).
The canary strategy will detect the error, whereas the toleration strategy will make it harmless. Which one is better?
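The two fill strategies can be sketched as a small allocation wrapper. Everything here is an assumption made for illustration: the names debug_alloc, FillMode, and looks_uninitialized are hypothetical, and the canary byte 0xCD is just one possible non-zero pattern.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

enum class FillMode { None, Canary, Zero };

// Hypothetical magic byte; any distinctive non-zero pattern would do.
constexpr unsigned char kCanaryByte = 0xCD;

// Allocate a block and pre-fill it according to the chosen strategy.
void* debug_alloc(std::size_t size, FillMode mode) {
    void* p = std::malloc(size);
    if (p == nullptr) return nullptr;
    if (mode == FillMode::Canary) {
        std::memset(p, kCanaryByte, size);  // detection: magic values
    } else if (mode == FillMode::Zero) {
        std::memset(p, 0, size);            // toleration: initialize it
    }
    return p;
}

// Returns true if every byte still holds the canary, i.e. the block
// looks like it was never written by the caller.
bool looks_uninitialized(const void* p, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    for (std::size_t i = 0; i < size; ++i) {
        if (bytes[i] != kCanaryByte) return false;
    }
    return true;
}
```

With FillMode::Canary, a later check can flag a block that was read before being written. With FillMode::Zero, the same bug reads zeros and usually goes unnoticed, which is the whole point of the toleration strategy.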
Compiler-Supported Memory Safety
Some of the specific features that could be used to improve memory safety include:
- Heap memory initialization (e.g., malloc, new)
- Stack memory initialization
- Double-deallocation detection
- Uninitialized memory detection
- Use-after-free memory detection
- Use after stack unwind memory detection
Note that these methods that initialize memory could either use a canary strategy with non-zero magic values to detect memory issues, or could zero the memory to make uninitialized-use errors harmless. Hence, these safety methods should have different options for handling uninitialized memory usage checking:
- Nothing
- Canary (detection with magic value filling)
- Zeroing (harmless toleration)
Memory safety is only one aspect in safe programming, although it's a major problem in C++. Other issues in C++ (and other languages, too) include:
- Arithmetic overflow and underflow
- Undefined behavior (non-standardized features)
- String and character processing
- File processing
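To make the first item concrete, here is a minimal sketch of a checked signed addition in portable C++. The function name checked_add is hypothetical. The test is done before the addition, because signed overflow itself is undefined behavior in C++.

```cpp
#include <limits>

// Hypothetical helper: stores a + b in result and returns true if the
// sum fits in an int; returns false (leaving result untouched) if the
// addition would overflow in either direction.
bool checked_add(int a, int b, int& result) {
    if (b > 0 && a > std::numeric_limits<int>::max() - b) return false;
    if (b < 0 && a < std::numeric_limits<int>::min() - b) return false;
    result = a + b;
    return true;
}
```

Adding 1 to the largest int is rejected, as is adding -1 to the smallest int, while ordinary sums pass through unchanged.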
Languages Can't Fix Everything
There are some things that neither Rust nor Safe C++ could possibly fix:
- Platform-specific features
- Low-level features
Areas of portability that are unlikely to be sorted by your programming language include:
- Data type sizes (e.g., 32-bit vs 64-bit).
- Files and directories
- Database integrations
- Devices and peripherals
- Signals and interrupts
- Assembly language
There are also some more obscure C++ coding issues that are problematic for all languages:
- Endian-ness of numeric representations.
- ASCII versus EBCDIC character set.
- Internationalization with UTF8 and Unicode.
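For the endian-ness item, the usual workaround lives in the code rather than the language. The sketch below, with hypothetical function names, probes the host byte order and also shows how to store a 32-bit value in a fixed big-endian layout using shifts, so the stored bytes are the same on any platform.

```cpp
#include <cstdint>
#include <cstring>

// Returns true on a little-endian host (least significant byte first).
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);  // inspect the lowest-address byte
    return first_byte == 1;
}

// Writes value as 4 big-endian bytes regardless of host byte order.
void store_be32(std::uint32_t value, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(value >> 24);
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}

// Reads 4 big-endian bytes back into a host-order value.
std::uint32_t load_be32(const unsigned char in[4]) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
           static_cast<std::uint32_t>(in[3]);
}
```

Because the shifts operate on values rather than on memory layout, store_be32 and load_be32 round-trip correctly whether or not host_is_little_endian() returns true.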