Aussie AI
Chapter 11. Linters and Static Analysis
Book Excerpt from "Safe C++: Fixing Memory Safety Issues"
by David Spuler
Linters for C++
Linters, or “static analyzers,” are tools that examine your source code for errors or stylistic concerns. Their main appeal is that they improve safety "for free," with no runtime impact for your customers. Judicious use of static checkers on your C++ source code can detect a variety of errors before they reach customers. Advantages of linters include:
- Detect coding errors before the code is even run.
- Both security vulnerabilities and bugs can be flagged.
- General improvement in coding quality.
- Reduced debugging time because the number of live bugs is reduced.
- Can be used for stylistic issues or coding policy guidelines.
There are also some linters that focus on "reformatting" and "beautification" of source code. Similarly, there are source code analysis tools that aim to auto-generate internal code documentation. I'm not really talking about those ones in this section. I'm hunting bugs!
Linters are not for everyone, and are less popular with developers than runtime memory checkers. Disadvantages of linters include:
- Additional cost and time to implement and address issues in an ongoing way.
- Fixing harmless warnings (e.g., need to fix warnings that aren't real bugs).
- Stylistic warnings are never popular with developers.
- Configuration of some linters is onerous (e.g., they need all the include paths set up).
You should consider linters as an additional safety technique, which is orthogonal to runtime techniques such as runtime memory checking tools. Linting is an add-on technique for additional improvements to overall quality. General advice in regard to using linters for C++ programming is:
- Use compiler warnings as free linting.
- Turn off less serious stylistic warnings when introducing linting.
- Use a separate linter build sequence.
- Have two linter paths (i.e., one for bugs, one for style).
- Use multiple compilers and linters for extra coverage.
- Automate linting into the nightly build.
In the past, linters have gained a somewhat poor reputation because of two factors:
- Not finding many bugs, and
- Emitting a huge swathe of warnings for minor, stylistic nitpicks.
These concerns largely no longer apply to either open source or commercial linting tools. Linting tools can now detect a huge range of real bugs in your code, and many can be configured to emit only serious bugs or security vulnerabilities. You can, of course, turn on all of the stylistic warnings if you want to use a linter to enforce a company-wide C++ coding policy, in which case the linter won't be popular with the team.
Using GCC as a Linter
If you want more warnings, and who doesn’t, you can enable more warnings
in gcc on Linux.
You can either do this in your main build by enabling more compiler warnings,
or use a separate build path (e.g., choose an inspiring name like: “make lint”)
so that the main build is not inundated with new warnings.
There are some gcc flags that are specific to static analysis of source code:
- -fanalyzer: enables the static analyzer.
- -Wanalyzer-SUBAREA: controls the static analyzer's warnings.
Some useful gcc warning flags include:
- -Wall: “all” warnings (well, actually, some).
- -Wextra: the “extra” warnings not enabled by “-Wall”.
- -Wpedantic: yet more of the fun ones.
You know, I really cannot say that I am a fan of endlessly scrolling warnings
from the “pedantic” mode.
Maybe, turn that one off, and pick-and-choose from the list of flags
in the “pedantic” list.
For example, I have used “-Wpointer-arith” in projects.
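To make this a little more concrete, here is a small sketch of the kind of thing these flags report. The file name, function names, and the "before" lines in the comments are all made up for illustration, and the exact wording of the diagnostics will vary by GCC version. The code shown is the cleaned-up version, so it should compile quietly.

```cpp
// lint_demo.cpp (hypothetical file name)
// One possible lint-only invocation:
//   g++ -std=c++17 -Wall -Wextra -fsyntax-only lint_demo.cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: for (int i = 0; i < values.size(); ++i)
//   -Wsign-compare (enabled by -Wall for C++) reports the comparison
//   between a signed int and the unsigned result of size().
// After: the index type matches the type returned by size().
double average(const std::vector<int>& values) {
    assert(!values.empty());  // precondition: at least one element
    long long sum = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        sum += values[i];
    }
    return static_cast<double>(sum) / static_cast<double>(values.size());
}

// Before: int scale(int value, int factor) { return value * 2; }
//   -Wunused-parameter (reported when -Wall and -Wextra are both on)
//   notes that 'factor' is never used, which here was hiding a real bug.
// After: the parameter is actually used.
int scale(int value, int factor) {
    return value * factor;
}
```

The second case is the sort of warning worth keeping in the "bugs" lint path: it looks stylistic, but an unused parameter is often a sign that the function does not do what its signature promises.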
Fixing Linter Warnings
Here’s some advice about fixing the code to address linter concerns:
- Aim for a warning-free compilation of bug-level messages.
- Don’t overdo code changes to fix any stylistic complaints.
Fix the bugs found by warnings (obviously), but as far as the stylistic type warnings are concerned, be picky. I say, aim for code quality and resilience, not code aesthetic perfection.
Warning-free linting. As with the main build, if you’re not fixing the less severe linter warnings, turn them off, or have two separate build sequences for the main anti-bug linting versus stylistic linting. You want any newly found serious problems to be visible, not lost in a stream of a hundred other spurious warnings. Hence, high quality code requires achieving a warning-free linting status for the main warnings.
On the other hand, you don’t want programmers doing too much “busy work” fixing minor coding style warnings with little practical impact on code reliability. Hence, you might find that your policy of “warning-free linting” needs to suppress some of the pickier warnings. And that’ll be a fun meeting to have.
Linter Products
There are many existing linter tools that are available in open source or commercially. Some examples include:
- Sonar Lint
- cppcheck
- cpplint
- oclint
- clang-tidy
Existing compilers and IDEs also include linters and static analysis tools:
- Microsoft Visual Studio's "static analyzer"
- Clang static analyzer
- GCC static analyzer
There are many more. Wikipedia has an extensive list of them on its "List of tools for static code analysis" page.
Note that we have an active project for a C++ linter. Find more information about Aussie Lint at https://www.aussieai.com/safe/projects.
Linter Capabilities
There are many linters available, and a whole range of features. There are various different types of linting capabilities that you might consider in a project:
- Bug detection
- Security vulnerability detection
- Coding policy adherence
Note that the state-of-the-art has progressed rapidly in the area of static analysis. These tools can identify a variety of pitfalls in C++ programming, including:
- Lexical oddities (e.g., nested comments).
- Preprocessor errors (e.g., macro operator precedence errors).
- Expression errors (e.g., wrong logical operators).
- Control flow errors (e.g., unreachable code, never-failing conditions, etc.).
- C++ class errors (e.g., consistent types of constructor and destructor declarations).
- Function call graph errors (e.g., indirect recursion).
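As an illustration of the preprocessor item in the list above, here is a minimal sketch of a macro operator precedence error and two possible fixes. The macro and function names are hypothetical, and whether a given linter flags the unsafe macro depends on the tool and its configuration.

```cpp
#include <cassert>

// Unparenthesized body and argument: a classic precedence pitfall.
#define SQUARE_UNSAFE(x) x * x

// Fully parenthesized version of the same macro.
#define SQUARE_SAFE(x) ((x) * (x))

// A constexpr function avoids the textual-substitution problem entirely.
constexpr int square(int x) { return x * x; }

// Expands to 1 + 2 * 1 + 2, which is 5 rather than the intended 9.
int unsafe_result() { return SQUARE_UNSAFE(1 + 2); }

// Expands to ((1 + 2) * (1 + 2)), which is 9.
int safe_result() { return SQUARE_SAFE(1 + 2); }

void check_macros() {
    assert(unsafe_result() == 5);
    assert(safe_result() == 9);
    static_assert(square(1 + 2) == 9, "function version evaluates its argument first");
}
```

The bug only appears when the macro argument is itself an expression, which is why this kind of error can sit unnoticed in a code base for a long time.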
Some examples of specific and simple warnings in coding style may include:
- Deprecated functions (e.g., gets versus fgets).
- Inefficient older functions (e.g., rand and srand).
- Security-vulnerable functions (e.g., tmpnam).
- Buffer-overflow prone functions (e.g., sprintf).
- Unsafe functions to change to "safe" versions (e.g., strcpy versus strncpy).
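Here is a rough sketch of what fixing the last two kinds of warnings might look like. The helper names format_label and copy_string are hypothetical, not from any library, and this is only one of several reasonable ways to bound the writes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Bounded formatting: std::snprintf writes at most 'size' bytes, including
// the terminating null, whereas sprintf has no limit at all.
// Returns true if the whole label fit in the buffer.
bool format_label(char* buffer, std::size_t size, const char* name, int id) {
    assert(buffer != nullptr && size > 0);
    int needed = std::snprintf(buffer, size, "%s-%d", name, id);
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}

// Bounded copy that always null-terminates the destination.
// Note that strncpy on its own does not add a terminator when the
// source is too long, so it is not a drop-in fix for strcpy.
bool copy_string(char* dest, std::size_t size, const char* src) {
    assert(dest != nullptr && size > 0);
    std::size_t length = std::strlen(src);
    bool fits = length < size;
    std::size_t count = fits ? length : size - 1;
    std::memcpy(dest, src, count);
    dest[count] = '\0';
    return fits;
}
```

Both helpers report truncation through the return value, so the caller can decide whether a shortened string is acceptable or an error.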
Linter Research
Linters have an advanced base of theory these days. It's similar to compiler design theory, but with a different focus, since linters do not need the "code generation" phase of compilation. Some of the main techniques that linters use include:
- Expression trees
- Control flow graphs
- Function call graphs
Expression trees express operator precedence and parenthesized sub-expressions as a hierarchical tree. This is a tree rather than a general graph, since there are no cross-edges between subtrees.
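A toy version of an expression tree might look like the following. The Expr type and helper names are my own invention for illustration, and a real linter would carry far more information in each node (types, source locations, and so on).

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical node type: a literal leaf or a binary operator.
struct Expr {
    char op;                      // '+' or '*', or '\0' for a literal leaf
    int value;                    // used only by literal leaves
    std::unique_ptr<Expr> left;   // owned subtrees: no sharing, so no cross-edges
    std::unique_ptr<Expr> right;

    Expr(char o, int v, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), value(v), left(std::move(l)), right(std::move(r)) {}
};

std::unique_ptr<Expr> literal(int v) {
    return std::make_unique<Expr>('\0', v, nullptr, nullptr);
}

std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    assert(op == '+' || op == '*');
    return std::make_unique<Expr>(op, 0, std::move(l), std::move(r));
}

// Precedence and parentheses are already encoded in the tree shape,
// so evaluation is a plain bottom-up walk.
int evaluate(const Expr& e) {
    if (e.op == '\0') {
        return e.value;
    }
    assert(e.left && e.right);
    int l = evaluate(*e.left);
    int r = evaluate(*e.right);
    return e.op == '+' ? l + r : l * r;
}
```

With these helpers, 2 + 3 * 4 is built as binary('+', literal(2), binary('*', literal(3), literal(4))), while (2 + 3) * 4 puts the '+' node underneath the '*' node. The source text differs only by parentheses, but the trees have different shapes.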
Control flow graphs express the flow of control through a function
or a code block.
These primarily focus on if statements, loops, and switch statements.
Aspects of short-circuited operators and the ternary operator may sometimes be involved,
or these may be handled in the expression trees.
Note that control flow graphs may contain cycles due to loops.
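One simple use of a control flow graph is finding unreachable code. The sketch below assumes a very simplified representation, where each basic block is just an index and the graph is an adjacency list; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG: block i can transfer control to each block in cfg[i].
using ControlFlowGraph = std::vector<std::vector<std::size_t>>;

// Returns the indices of blocks that cannot be reached from the entry block.
std::vector<std::size_t> unreachable_blocks(const ControlFlowGraph& cfg,
                                            std::size_t entry) {
    assert(entry < cfg.size());
    std::vector<bool> visited(cfg.size(), false);
    std::vector<std::size_t> worklist{entry};
    visited[entry] = true;

    // Depth-first traversal; the visited flags stop loops from cycling forever.
    while (!worklist.empty()) {
        std::size_t block = worklist.back();
        worklist.pop_back();
        for (std::size_t next : cfg[block]) {
            assert(next < cfg.size());
            if (!visited[next]) {
                visited[next] = true;
                worklist.push_back(next);
            }
        }
    }

    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < cfg.size(); ++i) {
        if (!visited[i]) {
            result.push_back(i);
        }
    }
    return result;
}
```

For example, with the graph {{1}, {2, 4}, {1}, {4}, {}}, blocks 1 and 2 form a loop, block 4 is the exit, and block 3 has no incoming edge (say, code placed after a return statement), so block 3 is the one reported.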
Function call graphs express the hierarchy of function calls.
Never-returning functions such as exit or abort need to be handled specially.
This analysis is primarily based on the static calls to function names,
and will have difficulty if virtual functions or function pointers are used for dynamic function calls.
Nevertheless, the call graph can be useful to detect various errors.
Note that the call graph may contain cycles in the event that recursion, directly or indirectly,
is used in any functions.
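Detecting direct or indirect recursion amounts to looking for a cycle through a given function in the call graph. Here is a minimal sketch, assuming the call graph has already been extracted as a map from each function name to its statically known callees. The names are hypothetical, and it deliberately ignores virtual calls and function pointers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first search: can 'target' be reached by following calls from 'from'?
bool reaches(const CallGraph& graph, const std::string& from,
             const std::string& target, std::set<std::string>& seen) {
    auto it = graph.find(from);
    if (it == graph.end()) {
        return false;  // library function such as exit: no known callees
    }
    for (const std::string& callee : it->second) {
        if (callee == target) {
            return true;
        }
        // insert() reports whether the callee is new, which stops re-visits.
        if (seen.insert(callee).second && reaches(graph, callee, target, seen)) {
            return true;
        }
    }
    return false;
}

// True if the function can call itself, directly or through other functions.
bool is_recursive(const CallGraph& graph, const std::string& function) {
    assert(graph.count(function) == 1);  // precondition: function is defined
    std::set<std::string> seen;
    return reaches(graph, function, function, seen);
}
```

Given a graph where expr calls term and term calls expr, both are reported as recursive, whereas a parse function that merely calls expr is not, because no path leads back to parse itself.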
Variable analysis. Particularly interesting is that static analyzers now use "flow propagation" to track errors throughout execution pathways. The idea is similar to a compiler's "constant propagation" but can relate to categories of values for a variable, rather than just a single constant value. Aspects of the value of a variable can be propagated through expression trees and also along the edges of the control-flow graph. In advanced cases, it may also be propagated through the function call graph.
For example, pointers can be tracked as null versus non-null, as a simple binary condition. Integral variables can have sets of possible values propagated through control flow statements, so that always-succeed or always-fail tests on these variables can be detected.
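To give a flavor of how this works, here is a heavily simplified sketch. It assumes an inclusive range as the approximation for a set of integer values, plus a three-state category for pointers; all of the type and function names are hypothetical, and real analyzers use much richer representations.

```cpp
#include <algorithm>
#include <cassert>

// Category of value tracked for a pointer variable.
enum class Nullness { Null, NonNull, MaybeNull };

// Merge the facts arriving along two control-flow edges.
Nullness join(Nullness a, Nullness b) {
    return a == b ? a : Nullness::MaybeNull;
}

// A dereference is worth a warning unless the pointer is known non-null.
bool dereference_is_suspicious(Nullness p) {
    return p != Nullness::NonNull;
}

// Approximate set of possible integer values, as inclusive bounds.
struct Range {
    int low;
    int high;
};

// Merge ranges where two paths meet (e.g., after an if-else).
Range join(Range a, Range b) {
    return Range{std::min(a.low, b.low), std::max(a.high, b.high)};
}

enum class Verdict { AlwaysTrue, AlwaysFalse, Unknown };

// Classify the condition "x < limit" given what is known about x.
Verdict test_less_than(Range x, int limit) {
    assert(x.low <= x.high);
    if (x.high < limit) {
        return Verdict::AlwaysTrue;   // every possible value passes
    }
    if (x.low >= limit) {
        return Verdict::AlwaysFalse;  // no possible value passes
    }
    return Verdict::Unknown;
}
```

Suppose x is assigned 3 on one branch and 7 on the other. Joining the two facts gives the range 3 to 7, so a later test of x < 10 is classified as always true, which is the kind of never-failing condition a linter can then report.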
The level of error detection from these approaches is quite amazing. If you've tried static analysis tools for C++ in the past, and been underwhelmed, you really should give them another try!