Aussie AI
Chapter 12. Zero Runtime Cost Operations
Book Excerpt from "C++ AVX Optimization: CPU SIMD Vectorization"
by David Spuler, Ph.D.
You want free CPU cycles? You got it! There are plenty of “freebies” in C++!
We’ve already talked about compile-time operations in C++, but here’s a summary of some of the “hints” you can give to the compiler for a free gain, usually via helping the optimizer to do fancier optimizations:
- inline
- template
- const
- constexpr (also consteval and constinit)
- noexcept
- static_assert
- Restricted pointers (e.g., __restrict)
- likely/unlikely or __builtin_expect (expressions)
- [[likely]] and [[unlikely]] path attributes
I’ve missed a bunch of them, so you should re-read those chapters. Those are well-known optimizations via programmer hints.
Here are some other ones that are useful. If you see these keywords, these are free or compile-time operations:
- auto types (type deduction)
- decltype
- final
- override
- explicit
- [[nodiscard]] (function attribute)
- = delete
But there’s always more. Here are some advanced C++ language features that you might think cost real CPU juice, but are free for various language design reasons:
- Type traits: compile-time type operators (not RTTI).
- Concepts (C++20): compile-time guarantees.
- Static reflection (C++26): fixing RTTI inefficiencies.
- Profiles (proposed): safety with compile-time validation.
- Curiously Recurring Template Pattern (CRTP): useful for devirtualization.
- Structured bindings: grouped assignments are compile-time processed.
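As a rough illustration of the CRTP item in the list above, here is a minimal sketch of static dispatch without a vtable. The names Shape, Square, area_impl, and total_area are hypothetical, invented for this example only.

```cpp
#include <cassert>

// CRTP base: the derived type is a template parameter, so the call to
// area_impl() is resolved at compile time (no virtual function, no vtable).
template <typename Derived>
struct Shape {
    double area() const {
        return static_cast<const Derived*>(this)->area_impl();
    }
};

// Hypothetical derived class that supplies the implementation.
struct Square : Shape<Square> {
    double side;
    explicit Square(double s) : side(s) { assert(s >= 0.0); }
    double area_impl() const { return side * side; }
};

// Works for any CRTP-derived shape; the calls are candidates for inlining.
template <typename T>
double total_area(const Shape<T>& a, const Shape<T>& b) {
    return a.area() + b.area();
}
```

Calling total_area(Square(2.0), Square(3.0)) involves no virtual call; whether the optimizer actually inlines everything is still worth confirming in the assembly.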
Type traits are a form of Compile-Time Type Information (CTTI)
and work at compile-time.
Some examples are operations like std::is_trivial or std::is_same.
However, note that you have to be careful not to move across into the
darker side of RTTI, which is dynamic_cast and typeid.
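Here is a small sketch of type traits at work, assuming C++17 for the _v shorthands and if constexpr. The types Pod and NonTrivial, the enum Kind, and the function kind_of are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

struct Pod { int x; float y; };                  // hypothetical trivial type
struct NonTrivial { NonTrivial() {} int x; };    // user-provided constructor

// All of these are evaluated by the compiler; no code is generated for them.
static_assert(std::is_trivial_v<Pod>);
static_assert(!std::is_trivial_v<NonTrivial>);
static_assert(std::is_same_v<int, signed int>);

enum class Kind { Integral, Floating, Other };

// The branch is selected at compile time, so each instantiation
// reduces to returning a constant.
template <typename T>
constexpr Kind kind_of() {
    if constexpr (std::is_integral_v<T>) return Kind::Integral;
    else if constexpr (std::is_floating_point_v<T>) return Kind::Floating;
    else return Kind::Other;
}

static_assert(kind_of<long>() == Kind::Integral);

// Small runtime demo; the comparisons are against compile-time constants.
inline void demo_traits() {
    assert(kind_of<double>() == Kind::Floating);
    assert(kind_of<Pod>() == Kind::Other);
}
```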
Free Type Cast Operations
There are various arithmetic operations that can look real, but actually disappear in a puff of compiler smoke. The first item on the list is type casts, which have many freebies:
- reinterpret_cast
- static_cast
- const_cast
- std::move (move semantics)
- std::forward (perfect forwarding)
Note that the one-argument std::move (declared in <utility>) is effectively a compile-time type cast,
which turns an l-value into an r-value (I’m simplifying the idea here).
However, there is also a different std::move in <algorithm>,
which takes three arguments (an iterator range and a destination), or four with an execution policy,
and really does move elements at runtime (often reducing to memmove for trivially copyable types),
so be aware of the distinction
between free uses of std::move for move semantics versus real byte movers.
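To make the two meanings of std::move concrete, here is a hedged sketch. The helper names take_ownership and move_all are hypothetical; the two std::move calls are the standard ones from <utility> and <algorithm>.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One-argument std::move (<utility>): only a cast to an r-value reference.
// The cast itself is free; the string's move constructor then does the
// (cheap) pointer-stealing work.
inline std::string take_ownership(std::string& src) {
    return std::move(src);
}

// Three-argument std::move (<algorithm>): a real runtime loop that
// move-assigns every element of the range into the destination.
inline std::vector<std::string> move_all(std::vector<std::string>& src) {
    std::vector<std::string> dst(src.size());
    std::move(src.begin(), src.end(), dst.begin());
    return dst;
}

inline void demo_move() {
    std::vector<std::string> names{"alpha", "beta"};
    std::vector<std::string> moved = move_all(names);
    assert(moved.size() == 2 && moved[1] == "beta");
}
```

The moved-from strings are left in a valid but unspecified state, so the sketch never inspects their contents.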
Arithmetic type casts between similarly represented numbers can often be optimized away. For example, these are usually free, or at least very fast:
- Downsizing integer type casts (e.g., int to char).
- Upsizing integer type casts (e.g., char to int)
- Floating-point type conversions (e.g., float to double)
Differently sized integer types seem like they would cost real instructions
to convert between them.
If a char is one byte and an int is four bytes, you’d think there’s an
operation that adds or removes three bytes.
However, the compiler has many tricks up its sleeves here, such as:
- Copy propagation
- Register allocation
- Peephole optimizations
This is often true of the conversions between any of the many and varied integer types,
from a 1-byte char to an 8-byte long long (the typical size on mainstream platforms).
In the cases where the compiler cannot find a way to do it for free,
the operation is very inexpensive anyway.
But note that not all type casts are free. In particular, converting between integers and floating-point types is expensive, in both directions, because the way these two types of values are represented is very different. Be careful with explicit type casts, but also any expressions that mix integer and floating-point types may have implicit type casts.
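A sketch of the difference, using hypothetical helper functions. The instruction names in the comments are what x86-64 compilers typically emit, not a guarantee; check your own assembly output.

```cpp
#include <cassert>

// Widening: typically folded into the load as a sign-extending move,
// or it disappears entirely once the value is already in a register.
inline int widen(signed char c) { return c; }

// Narrowing: typically just uses the low byte of the same register.
inline unsigned char narrow(unsigned int x) {
    return static_cast<unsigned char>(x);
}

// int -> float: needs a real conversion instruction
// (typically cvtsi2ss on x86-64).
inline float to_float(int x) { return static_cast<float>(x); }

// float -> int: also a real instruction (typically cvttss2si),
// and it truncates toward zero.
inline int to_int(float f) { return static_cast<int>(f); }

// Implicit version of the same cost: the int operand is converted
// to float before the multiply.
inline float scale(int count, float factor) { return count * factor; }

inline void demo_casts() {
    assert(widen(-5) == -5);
    assert(to_int(2.9f) == 2);
}
```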
Optimized Away
Here’s a somewhat random list of stuff that should get optimized away by the compiler. We can be reasonably sure these are free:
- Constant expressions (via “constant folding” and newer constexpr features)
- Small getter member functions (via inlining)
- Null-effect expressions (useful for compiling-out assertions)
- Unnecessary temporary variables (removed by copy propagation, peephole optimizations, and register allocation)
- Wrongly typed constants (e.g., using 1 or 1U or 1.0 or 1.0f should be implicitly type-converted at compile-time).
- Double negation (using “!!(x)” is a common trick).
- Algebraic simplifications (e.g., plus zero, subtract zero, times one, and many more).
- Explicit zero conditional tests (e.g., if (x != 0) or if (ptr != nullptr) equates to if(x) or if(ptr) at runtime).
- First data member in an object or structure (its offset is zero, so there’s a “plus zero” in the address calculation that is optimized away).
- Assertions and #if DEBUG (if compiled-out for production).
The compiler optimization of “dead code elimination” will make these control flow features free:
- while(1) (using for(;;) isn’t faster!)
- if(true) or if(1) or if(0) or whatever
- do...while(0) (a common macro trick)
- Short-circuited constants in || or && operators
- Tested constants in the ?: ternary operator
You can always check the assembly code with “gcc -S”
or the Disassembly window in the Microsoft Visual Studio debugger.
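Here is a short sketch that packs several of these freebies into one file, so you can compile it with “gcc -O2 -S” and look for what is missing from the output. All names here (kKiB, kBufferBytes, SWAP_INTS, scaled, is_set) are invented for the example.

```cpp
#include <cassert>

// Constant folding: the whole expression is computed by the compiler.
constexpr int kKiB = 1024;
constexpr int kBufferBytes = 4 * kKiB + 0;
static_assert(kBufferBytes == 4096);

// The do...while(0) macro trick: behaves like one statement, and the
// loop test on a constant is removed by dead code elimination.
#define SWAP_INTS(a, b) \
    do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

inline int scaled(int x) {
    if (false) { x = -1; }   // dead code: expected to be eliminated
    return (x + 0) * 1;      // integer identities: expected to become "return x"
}

// Double negation is the same test as (x != 0).
inline bool is_set(int x) { return !!x; }

inline void demo_freebies() {
    int a = 1, b = 2;
    if (a < b) SWAP_INTS(a, b); else a = 0;  // macro is safe inside if/else
    assert(a == 2 && b == 1);                // compiled out when NDEBUG is defined
}
```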
Standard Container Operations
The standard containers are templates whose small member functions are defined in headers, which makes them prime candidates for inlining.
Hence, if you’re using std::vector<int>,
you can expect operations like push_back to be inlined and very fast (except when the vector has to grow).
All of the contiguous containers have a simple structure,
and the node-based linked containers keep track of their first and last positions,
making begin() and end() calls very fast.
Similarly, most of the containers maintain a count of the objects inside (or can compute it by subtracting two pointers), so
calls to size() or std::size are as fast as a getter accessing an integer data member
(inlined, of course). The exception is std::forward_list, which has no size() at all.
There are some relatively simple standard C++ data types where operations can often be inlined or optimized away by the compiler:
- std::pair
- std::tuple
- std::optional
- std::expected
- std::variant (modern C++ unions)
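As a hedged illustration of these simple types, the sketch below returns a std::optional holding a std::pair and unpacks it with structured bindings (C++17). The function min_max is hypothetical. With optimization on, wrappers like these usually compile down to plain register or stack values, but check the assembly if it matters.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper: smallest and largest element in one pass,
// or an empty optional for an empty vector.
inline std::optional<std::pair<int, int>> min_max(const std::vector<int>& v) {
    if (v.empty()) return std::nullopt;
    int lo = v.front();
    int hi = v.front();
    for (int x : v) {
        if (x < lo) lo = x;
        if (x > hi) hi = x;
    }
    return std::make_pair(lo, hi);
}

inline void demo_min_max() {
    const std::vector<int> data{3, 1, 2};
    if (auto result = min_max(data)) {
        auto [lo, hi] = *result;   // structured binding: no runtime lookup
        assert(lo == 1 && hi == 3);
    }
}
```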
Finally, note that some calls to containers can lead to memory allocations, which is a slowdown. And various containers when used on your own non-scalar objects can trigger many calls to constructors or assignment operators, which is slow regardless of whether it calls copy or move versions. I mean, moving is better than copying an object, but the optimizer can only do so much.
The Opposite of Free
There are also features of C++ that look like they should be free, but are actually costly. Perhaps we should call them “costlies”?
Elegance and the beauty of short code sequences are not the same thing as speed. Here are some examples of beautiful things that can be slow:
- Calls to virtual functions
- RTTI (i.e., dynamic_cast and typeid)
- Lambdas, functors and other function objects
- std::function
- Comparators (except maybe standard ones like std::less)
- Fold expressions
- Exception handling
The issue with lambdas and function objects is not clear-cut.
If you use a lambda with a simple capture and call it directly,
the optimizer can probably handle this and inline the function call.
The same is usually true when you pass a lambda as a comparator to a function template
(e.g., to std::sort):
every lambda has its own unique type, so the template is instantiated for that exact comparator,
and the calls can be inlined if the body is small.
The trouble starts when the callable's type is hidden from the optimizer,
such as when the lambda is wrapped in std::function or converted to a function pointer,
or when its body is too complex to inline.
Then every comparison is a real function call,
leading to a performance bottleneck.
A builtin comparator like std::greater passed to std::sort or other library functions
works the same way: it is a tiny class with an inline operator(),
meaning it typically won’t
really be used as a function call.
However, you might want to benchmark this or look at the generated assembly to confirm!
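One way to settle questions like this for your own compiler is to write the variants side by side and compare timings or assembly. The sketch below is only a starting point, and the three function names are hypothetical. Which variants get inlined depends on the compiler, the optimization level, and the standard library implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Variant 1: a lambda passed straight to the std::sort template.
inline void sort_desc_lambda(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
}

// Variant 2: the standard comparator object.
inline void sort_desc_std(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), std::greater<int>{});
}

// Variant 3: the same lambda hidden behind std::function (type erasure),
// which generally forces an indirect call per comparison.
inline void sort_desc_erased(std::vector<int>& v) {
    std::function<bool(int, int)> cmp = [](int a, int b) { return a > b; };
    std::sort(v.begin(), v.end(), cmp);
}

inline void demo_sorts() {
    std::vector<int> v{2, 9, 4};
    sort_desc_lambda(v);
    assert(v.front() == 9 && v.back() == 2);
}
```

All three produce the same ordering, so any difference you measure comes from how the comparator is called.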
And here are some more slugs that are less obvious, because the code is concise and looks like it should be fast:
- Operator overloading (looks like a single instruction, but it’s a function call, even if it’s inlined).
- Initializer lists (can call lots of copy constructors).
- Pointer-to-function types (calls through them usually cannot be inlined).
- Implicit type conversions (especially via overloaded type cast operators).
- Temporary object creation (accidental)
- Type casts between int and float (explicit or implicit)
- Container resize() calls
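To see the initializer list cost from the list above, here is a sketch with a hypothetical type, Counted, that counts its own copy constructor calls. The braces look free, but the elements are first copied into the initializer list's backing array and then copied again into the vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instrumented type: counts copy constructions.
struct Counted {
    static inline int copies = 0;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept = default;
};

// Brace initialization from named objects: 3 copies into the backing
// array plus 3 copies from the array into the vector.
inline int copies_via_initializer_list() {
    Counted::copies = 0;
    Counted a, b, c;
    std::vector<Counted> v{a, b, c};
    assert(v.size() == 3);
    return Counted::copies;
}

// Reserving and constructing in place avoids the copies altogether.
inline int copies_via_emplace() {
    Counted::copies = 0;
    std::vector<Counted> v;
    v.reserve(3);
    for (int i = 0; i < 3; ++i) v.emplace_back();
    assert(v.size() == 3);
    return Counted::copies;
}
```

Elements of an initializer list are const, so even wrapping the arguments in std::move does not avoid the second round of copies.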
Modern C++ has become a complex language with conflicting goals of elegance and performance, so it’s hard to know which things are freebies or costlies.