Aussie AI

Instruction Cache Locality

Book Excerpt from "C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations"

by David Spuler, Ph.D.


The instruction cache is a CPU hardware cache that holds recently executed machine code instructions. A separate mechanism, “instruction prefetching”, tries to load upcoming instructions before they are needed. As part of prefetching, the CPU also performs “branch prediction”, which attempts to guess which of the two directions a conditional branch will take.

To get the best out of these instruction speedups, our C++ code should generally use:

Short and tight loops
Fewer branches

Keeping loops short means that the CPU stays within the same small block of code, maximizing the chance that the next instruction is already in its cache. Interestingly, this means that some common code optimizations can be bad for instruction cache locality:

Inlining of functions
Loop unrolling

Each of these cuts both ways: it reduces branches, but it also lengthens the code. Whenever you’re tempted to maximize your use of such optimizations, think about the plight of the poor instruction cache as it tries to keep up.
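To make the loop unrolling trade-off concrete, here is a minimal sketch of the same summation written rolled and manually unrolled by four. The function names `sum_rolled` and `sum_unrolled4` are made up for illustration. Compilers often unroll loops on their own at higher optimization levels, so the machine code you get may differ from what the source suggests.

```cpp
#include <cstddef>

// Rolled loop: compact code, one loop-condition branch per element.
long long sum_rolled(const int* data, std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Unrolled by 4 (illustrative): one loop-condition branch per four
// elements, but the loop body is roughly four times larger and a
// separate cleanup loop is needed for the leftover elements.
long long sum_unrolled4(const int* data, std::size_t n) {
    long long total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    for (; i < n; ++i) {  // remainder: at most 3 iterations
        total += data[i];
    }
    return total;
}
```

Both functions return the same result. The unrolled version trades fewer branches for more instruction bytes, and whether that is a net win depends on how much other hot code is competing for the instruction cache, so it is worth measuring rather than assuming.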

Branches are a separate issue from short code blocks. In fact, long sequences of pure compute instructions are fine for branch prediction, because they contain no branches to predict. To maximize the CPU’s branch prediction capability, we should either have few branches, or at least make the branches very predictable. At the limit, we could use branchless programming, which is a set of tricks to get rid of branches altogether. See Chapter 4 for more on branch prediction and branchless coding methods.
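As a small illustration of the branchless idea, the sketch below counts the elements of a vector that fall below a threshold in two ways. The function names are hypothetical, and this is only one simple trick, not necessarily one of those covered in Chapter 4. It relies on the fact that a C++ `bool` converts to 0 or 1.

```cpp
#include <cstddef>
#include <vector>

// Branchy version: the if statement is a data-dependent branch,
// which is hard to predict when the input values are random.
std::size_t count_below_branchy(const std::vector<int>& v, int threshold) {
    std::size_t count = 0;
    for (int x : v) {
        if (x < threshold) {
            ++count;
        }
    }
    return count;
}

// Branchless version: the comparison result (false = 0, true = 1)
// is added directly, so the loop body has no conditional jump
// at the source level.
std::size_t count_below_branchless(const std::vector<int>& v, int threshold) {
    std::size_t count = 0;
    for (int x : v) {
        count += static_cast<std::size_t>(x < threshold);
    }
    return count;
}
```

An optimizing compiler may already turn the first version into branchless machine code, so check the generated assembly or benchmark before assuming the rewrite helps.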

