Chapter 21. Reliability

  • Book Excerpt from "Safe C++: Fixing Memory Safety Issues"
  • by David Spuler

Code Reliability

Code reliability means that the execution is predictable and produces the desired results. Sequential coding is hard enough, but parallelized code of any kind (e.g., CPU or GPU vectorization, multi-threaded, multi-GPU, etc.) multiplies this complexity by another order of magnitude. Hence, the basics of high-quality coding practices are all the more important for code reliability, such as:

  • Unit tests
  • Assertions
  • Self-testing code
  • Debug tracing methods
  • Automated system tests
  • Function argument validation
  • Error detection (e.g., starting with checking error return codes)
  • Exception handling (wrapping code in a full exception handling stack)
  • Resilience and failure tolerance
  • Regression testing
  • Test automation
  • Test coverage measurement

One useful method of catching program failures is making the program apply checks to itself. Assertions and other self-testing code have the major advantage that they catch such errors early, near the real cause, rather than letting the program continue and fail much later.
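
For example, here's a minimal sketch of the idea (not from the book; the function and its checks are invented for illustration). The assertions fail close to the real cause, instead of letting a bad size quietly corrupt memory and crash the program somewhere far away:

    #include <assert.h>
    #include <stddef.h>

    void copy_block(char* dest, const char* src, size_t nbytes, size_t dest_capacity)
    {
        assert(dest != NULL);               // fail early, near the real cause,
        assert(src != NULL);                // instead of crashing much later
        assert(nbytes <= dest_capacity);    // inside some unrelated code
        for (size_t i = 0; i < nbytes; ++i)
            dest[i] = src[i];
    }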

All of these techniques involve a significant chunk of extra coding work. The old rule of thumb says that full error and exception handling can be 80% of a finished software product, which makes it a four-fold amount of extra work on top of the core functionality! Maybe that estimate is a little outdated, given improvements in modern tech stacks, but it still contains many grains of truth.

There are many programming tools to help improve code reliability throughout the development, testing and debugging cycle:

  • Memory debugging tools (e.g., valgrind).
  • Performance profiling tools (for “de-slugging”).
  • Synchronization debugging tools (e.g., race condition checkers).
  • Memory usage tracking (i.e., memory leaks and allocated memory measurement).
  • Interactive debugging tools (IDE and command-line).
  • Static analysis tools (“linters”), and turning on more compiler warnings.
  • Bug tracking databases (for cussing at each other).

Refactoring versus Rewriting

Refactoring was something I was doing for years, but I called it “code cleanup.” The seminal work on refactoring is Martin Fowler's book “Refactoring” from 1999. It was the first work to gain wide traction, popularizing and formalizing code cleanup as a disciplined approach.

Refactoring is a code maintenance task that you do mainly for code quality reasons, and it needs to be considered an overhead cost. True refactoring does not add any new functionality for customers, and marketing won't be happy if you do refactoring all day long. But refactoring is a powerful way to achieve consistency in code quality and adhere to principles such as DRY. In highly technical special cases such as writing an API, you'll need to refactor multiple times until the API is “good.”

Rewriting is where you pick up the dusty server containing the old source code repo, walk over to the office window and toss it out. You watch it smash ten floors below, drive over to CompUSA to buy a new server, and then start tapping away with a big smile on your face.

The goals of refactoring and rewriting are quite different. Refactoring aims to:

  • Make the existing code “better” (e.g., modularized, layered).
  • Add unit testing and other formality.
  • Retain all the old features and functionality.
  • Not add any new functionality.

Rewriting projects tend to:

  • Throw away all the existing code.
  • Choose a new tech stack, new UI, new tools, etc.
  • Not support backward compatibility.
  • Add some new functionality.

In practice, refactoring and rewriting sit close together on a spectrum, and there's a lot of middle ground between them. If you're fixing some old code by rewriting one of its main modules, is that refactoring or rewriting?

The reality is that rewriting versus refactoring is always an engineering choice, and it's a difficult one without a clear right or wrong answer. You can't try both to see which one works better, so there's never any proof either way.

Defensive Programming

Defensive programming is a mindset where you assume that everything will go wrong. The user input will be garbage. Anyone else's code will be broken. The operating system intrinsics will fail. And your poor helpless C++ application needs to keep chugging along.

Many of the high-level types of defensive coding are discussed elsewhere in this book. Good practices that attempt to prevent bugs include: assertions, self-testing code, unit tests, regression tests, check return codes, validate incoming parameter values, exception handling, error logging, debug tracing code, warning-free compilation, memory debugging tools, static analysis tools, document your code, and call your mother on Sunday.

Using Compiler Errors for Good, not Evil: One of the advanced types of defensive programming is to intentionally trigger compiler errors for coding patterns you want to ban. For example, you can enforce security coding policies:

    #define tmpnam dont_use_tmpnam_please

Or if you are using debug wrappers for some low-level system functions, you can enforce that:

    #define memset please_use_memset_wrapper

Politeness is always required. You don't want to be rude to your colleagues.
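
For what it's worth, here's a rough sketch of what the wrapper side might look like. The name memset_wrapper, the extra file/line parameters, and the logging are all illustrative assumptions, not a prescribed implementation:

    // memset_wrapper.cpp -- a hypothetical debug wrapper for memset.
    // Some project-wide header does: #define memset please_use_memset_wrapper
    // so raw calls to memset() won't compile, pushing everyone to the wrapper.
    #include <string.h>
    #include <stdio.h>
    #undef memset    // the wrapper itself still needs the real function

    void* memset_wrapper(void* dest, int value, size_t nbytes,
                         const char* file, int line)
    {
        if (dest == NULL) {
            fprintf(stderr, "MEMSET ERROR: null pointer at %s:%d\n", file, line);
            return NULL;
        }
        return memset(dest, value, nbytes);
    }

    // Callers would use something like: memset_wrapper(buf, 0, n, __FILE__, __LINE__);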

Defensive Coding Style Policies: You might want to consider some specific bug-prevention coding styles, for defensive programming, maintainability, and general reliability. Some examples might be:

  • All variables must be initialized when declared. Don't want to see this anymore: int x;
  • All switch statements need a default clause.
  • Null the pointer after every delete. You can define a macro to help (there's a sketch after this list).
  • Null the pointer after every free. If you use a debug wrapper for free, make it pass-by-reference and NULL the pointer's value inside the wrapper function.
  • Null the file pointer after fclose. This can also be handled by a wrapper function.
  • Unreachable code should be marked as such with an assertion (a special type).
  • Prefer inline functions to preprocessor macros.
  • Define numeric constants using const rather than #define.
  • Validate enum variables are in range. Add a dummy EOL item at the end of an enum list, which can be used as an upper-bound to range-check any enum has a valid value. Define a self-test macro to range-check the value.
  • Use [[nodiscard]] attributes for functions. All of them.
  • Start different enums at different numbers (e.g., token numbers start at 10,000 and some other IDs start at 200,000), so that they can't get mixed up, even if they end up in int variables. And you'll need a bottom and top value to range-check their validity. You have to remove the commas from these numbers, though!
  • All allocated memory must be zeroed. This might be a policy for each coder, or it could be auto-handled by intercepting the new operator and malloc/calloc into debug wrappers, and only returning cleared memory.
  • Constructors should use memset to zero their own memory. This seems like bad coding style in a way, but how many times have you forgotten to initialize a data member in a constructor?
  • Zero means “not set” for every flag, enum, status code, etc. This is a policy supporting the “zero all memory” defensive idea.
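
A couple of these policies are easy to show in code. Here's a minimal sketch, with invented names for the macro, the enum, and the constant (an illustration of the idea, not a prescribed implementation):

    #include <assert.h>

    // Null the pointer after every delete, via a helper macro:
    #define DELETE_AND_NULL(p)  do { delete (p); (p) = nullptr; } while (0)

    // A named constant via const, not #define:
    const int MAX_BUFFER = 512;

    // Zero means "not set", and a dummy EOL item bounds the valid range:
    enum Status { STATUS_NONE = 0, STATUS_OK, STATUS_FAILED, Status_EOL_Dummy };

    // Self-test macro to range-check any Status value:
    #define CHECK_STATUS(s)  assert((s) >= STATUS_NONE && (s) < Status_EOL_Dummy)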

Assume failures will happen: Plan ahead to make failures easier to detect and debug (supportability!), even when they happen in production code:

  • Use extra messages in assertions, and make them brief but useful.
  • If an assertion keeps failing in testing, or fails in production for users, change it to more detailed self-checking code that emits a more detailed error.
  • Add unique code numbers to error messages to make identifying causes easier (supportability).
  • Separately check different error occurrences. Don't use only one combined assertion like assert(s && *s); there's a sketch after this list.
  • Review assertions for cases where lazy code jockeys have used them to check return codes (e.g., file not found).
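
Here's a small sketch that puts a few of these points together. The function, the messages, and the error code E1042 are all invented for illustration:

    #include <assert.h>
    #include <stdio.h>

    void load_config(const char* filename)
    {
        // Separate assertions, each with a brief message, instead of assert(s && *s):
        assert(filename != NULL && "config filename pointer is null");
        assert(*filename != '\0' && "config filename is empty");

        FILE* fp = fopen(filename, "r");
        if (fp == NULL) {
            // A missing file is a normal runtime error, not an assertion failure.
            // The unique code E1042 makes the cause easier to identify in support tickets.
            fprintf(stderr, "ERROR E1042: cannot open config file '%s'\n", filename);
            return;
        }
        // ... parse the file ...
        fclose(fp);
    }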

Maintainability

My first job as a Software Engineer was maintaining low-level systems management code on a lumbering Ultrix box, written in C with hardly any comments. You'd think I hate code maintenance, right? No, I had the opposite reaction: it was the best job ever!

If you think you don't like code maintenance, consider this: Code maintenance is what you do every day. I mean, except for those rare days where you're starting a new project from scratch, you're either maintaining your own code or someone else's, or both. There are two main modes: you're either debugging issues or extending the product with new features, but in both cases it is at some level a maintenance activity.

So, how do you improve future maintainability of code? And how do you fix up old code that's landed on your desk, flapping around like a seagull, because your company acquired a small startup?

Let's consider your existing code. How would you make your code better so that a future new hire can be quickly productive? The answer is probably not that different to the general approach to improving reliability of your code. Things like unit tests, regression testing, exception handling, and so on will make it easier for a new hire. You can't stop that college intern from re-naming all the source code files or re-indenting the whole codebase, but at least you can help them to not break stuff.

One way to think about future maintainability is to take a step back and think of it as a “new hire induction” problem. After you've shown your new colleague the ping pong table in the lunch room and the restrooms, they need to know:

  • Where is the code, and how do I check it out of the repo?
  • How do I build it? Run it? Test it?
  • Where's the bug database, requirements documents, or enhancements list?
  • What are the big code libraries? Which directories?

After that, then you can get into the nitty-gritty of how the C++ is laid out. Where are the utility libraries that handle low-level things like files, memory allocation, strings, hash tables, and whatnot? Where do I add a new unit test? A new command-line argument or configuration property?

Maintenance safety nets: How do you make your actual C++ code resilient to the onslaught of a new hire programmer? Assume that future changes to the code will often introduce bugs, and try to plan ahead to catch them using various coding tricks. Actually, the big things in preventing future bugs are the large code reliability techniques (e.g., unit tests, assertions, comment your code, blah blah blah). There are a lot of little things you can do, which are really quite marginal compared to the big things, but are much more fun, so here's my list:

  • All variables should be initialized, even if the value will be immediately overwritten (i.e., “int x=3;” never just “int x;”). The temptation to not initialize is mainly from variables that are only declared so as to be passed into some other function to be set as a reference parameter. And yes, in this case, it's an intentional micro-inefficiency to protect against a future macro-crashability.
  • Unreachable code should be marked with at least a comment or preferably an attribute or assertion (e.g., use the “assert_not_reached” assertion idea; there's a sketch after this list).
  • Prefer statement blocks with curly braces to single-statements in any if, else, or loop body. Also for case and default. Use braces even if all fits on one line. Otherwise, some newbie will add a second statement, guaranteed.
  • Once-only initialization code that isn't in a constructor should also be protected (e.g., the “assert_once” idea).
  • All switch statements need a default (even if it just triggers an assertion).
  • Don't use case fallthrough, except that it's allowed for Duff's Device and any other really cool code abuses. Tag it with [[fallthrough]] if you must use it.
  • Avoid preprocessor macros. Prefer inline functions rather than function-like macro tricks, and do named constants using const or enum names rather than #define. I've only used macros in this book for educational purposes, and you shouldn't even be looking at my dubious coding style.
  • Declare a dummy enum at the end of an enum list (e.g., “MyEnum_EOL_Dummy”), and use this EOL name in any range-checking of values of enum variables. Otherwise, it breaks when someone adds a new enum at the end. EOL means “end-of-list” if you were wondering.
  • Add some range-checking of your enum variables, because you forgot about that. Otherwise array indices and enum variables tend to get mixed up when you have a lot of int variables.
  • Assert the exact numeric values of a few random enum symbols, and put cuss words in the optional message, telling newbie programmers that they shouldn't add a new enum at the top of the list.
  • sizeof(varname) is better than sizeof(int) for when someone later changes the variable to a long. Similarly, use sizeof(arr[0]) and sizeof(*ptr). No, the * operator doesn't actually dereference anything, because sizeof never evaluates its operand.
  • All classes should have the “big four” (constructor, destructor, copy constructor, and assignment operator), even if they're silly, like when the destructor is just {}.
  • If your class should not ever be bitwise-copied, then declare a dummy copy constructor and assignment operator (i.e., as “private” and without a function body), so the compiler prevents a newbie from accidentally making a bitwise copy of an object.
  • If your code needs a mathematical constant, like the reciprocal of the square root of pi, just work it out on your calculator and type the number in directly. Job security.
  • A switch over an enum should usually have the default clause as an error or assertion. This detects the code maintenance situation where a newly added enum code isn't being handled.
  • Avoid long if-else-if sequences. They get confusing. They also break completely if someone adds a new “if” section in the middle, but forgets it should be “else if” instead.
  • Instigate a rule that whoever breaks the build has to bring kolaches tomorrow.
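
Here's a sketch of a few of these safety nets in one place. The enum, the function, and the macro names are invented for illustration; the “= delete” comment is the modern alternative to the private-declaration trick:

    #include <assert.h>

    #define assert_not_reached()  assert(!"unreachable code was reached")

    enum Token { TOKEN_NONE = 0, TOKEN_PLUS, TOKEN_MINUS, Token_EOL_Dummy };

    const char* token_name(Token t)
    {
        assert(t >= TOKEN_NONE && t < Token_EOL_Dummy);  // range-check the enum value
        switch (t) {
            case TOKEN_NONE:  return "none";
            case TOKEN_PLUS:  return "plus";
            case TOKEN_MINUS: return "minus";
            default:
                assert_not_reached();  // a newly added token isn't handled here yet
                return "unknown";
        }
    }

    class NoBitwiseCopy {   // a class that must never be copied
    private:
        NoBitwiseCopy(const NoBitwiseCopy&);             // declared, never defined
        NoBitwiseCopy& operator=(const NoBitwiseCopy&);  // (or use "= delete")
    public:
        NoBitwiseCopy() {}
    };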

But don't sweat it. New hires will break your code, and then just comment out the unit test that fails.

Maintaining OPC (other people's code). What about code that's brand new to you? It's from that startup that got acquired, and it's really just a hacked-up prototype that should never have shipped. Now it's landed on your desk with a big red bow wrapped around it and a nice note from your boss telling you how much it'll be appreciated if you could have a little look at this. At least it's a challenge, and maybe you could even learn a little Italian, because that's the language the comments are written in.

So, refactoring has to be top of the list. You need to move code around so that it is modular, easier to unit test, and so on. Split out smaller functions and group all the low-level factory-type routines. Writing some internal documentation about the new code doesn't hurt either! And “canale” means “channel” in Italian, so now you're bilingual.

Technical Debt

When programmers talk in disparaging tones about “technical debt” in code, what they often mean is that the code wasn't written “properly.” A prototype got shipped long ago, and was never designed well, or in fact, was never designed at all. Some other giveaways of high technical debt are basically:

  • Unit tests? That's someone else's job.
  • Documentation? Never heard of it. Oh, you meant code comments? We don't use those.
  • File Explorer is a source code control system.
  • And a backup tool.
  • Bug tracking tool? Do you mean the whiteboard?
  • Requirements documentation. Also the whiteboard.
  • Test plan? Eating free bananas while I test it.

Or to summarize all these points into one:

  • You work at an AI startup.

Debt-Free Code: The good news is that there is a popular software development paradigm that has zero technical debt. It's called Properly-Written Code (PWC) and programmers are always talking about it in hushed or strident tones. Personally, I've been watching for years, but haven't yet been fortunate enough to actually see any, but apparently it exists somewhere out in the wild, kind of like the Loch Ness Monster, but with semicolons.

Exactly what properly-written code means is rather vague, but the suggested solution is usually a refactor or a rewrite. Personally, I favor refactoring, because I think rewrites actually increase technical debt, since the brand-new code:

    a) Lacks unit tests.

    b) Lacks internal documentation.

    c) Hasn't been “tested by fire” in real customer usage.

    d) Hasn't been tested by anyone, for that matter.

    e) Is a “version 1.0” no matter how you try to spin it.

So, here's my probably-unpopular list of suggestions for reducing technical debt without rewriting anything:

  • Comment your code!
  • Fix compiler warnings to get warning-free compilation.
  • Add more assertions and self-checking code.
  • Check return codes from system functions (e.g., file operations).
  • Add parameter validation checks to your functions.
  • Add debug wrapper functions for selected system calls.
  • Add automated tests (unit tests or regression tests).
  • Port the platform-independent code modules to another platform, even if only to get a new set of compiler warnings and run the tests there.
  • Add performance instrumentation (i.e., time); there's a sketch after this list.
  • Add memory usage instrumentation (i.e., space).
  • Add file usage instrumentation.
  • Document the architecture, APIs, classes, data formats, or interfaces. With words.
  • Add unique codes to error messages (for supportability).
  • Document your DevOps procedures.
  • Run nightly builds, with tests running, too.
  • Do a backup once in a while.
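
As one example from this list, here's a minimal sketch of performance instrumentation. The ScopedTimer class name and the output format are invented assumptions; it's just one simple way to time a code section:

    #include <chrono>
    #include <stdio.h>

    // RAII timer: prints elapsed milliseconds for a named code section.
    class ScopedTimer {
    public:
        explicit ScopedTimer(const char* label)
            : m_label(label), m_start(std::chrono::steady_clock::now()) {}
        ~ScopedTimer() {
            auto elapsed = std::chrono::steady_clock::now() - m_start;
            double ms = std::chrono::duration<double, std::milli>(elapsed).count();
            fprintf(stderr, "TIMER: %s took %.3f ms\n", m_label, ms);
        }
    private:
        const char* m_label;
        std::chrono::steady_clock::time_point m_start;
    };

    // Usage: declare ScopedTimer t("load_config"); at the top of a function or block.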

And if you're at a startup or a new project, get your tools sorted out for professional software development workflows:

  • Compilers and IDEs
  • Memory error detection
  • Source code control (e.g., svn or git or cvs)
  • CI/CD/CT build system
  • Bug tracking system
  • Internal documentation tools
  • User support database

What really makes better code? Well, that's a rather big question about the entirety of software development practices, so I'll offer only one final suggestion: humans. My overarching view is that the quality of code is most impacted by the ability and motivation of the programmers, rather than by new tools or a trendy programming language (or even an AI coding copilot). A small team that is “on fire” can outpace a hundred coders sitting in meetings talking about the right way to do agile development processes. Hence, morale of the team is important, too.

 
