Aussie AI

Instruction Reordering Optimizations

  • Book Excerpt from "C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations"
  • by David Spuler, Ph.D.

Instruction reordering is a form of Instruction-Level Parallelism (ILP). It is an optimization performed inside the CPU, which executes machine code instructions out of order while still committing their results in program order. In simple terms, it works like this:

  • Delay any instructions that don’t yet have the data they need (e.g., a value still being loaded from memory).
  • Run any instructions whose inputs are ready as soon as possible.
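To make those two rules concrete, here is a minimal sketch of why independent work matters. The function names are hypothetical and not from the book. The first loop forms one long dependency chain, because every add needs the result of the previous add. The second loop keeps four independent running totals, so an out-of-order CPU has more instructions that are "ready" at any moment.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: each add depends on the previous add,
// so the adds form a single serial dependency chain.
std::uint64_t sum_single_chain(const std::vector<std::uint32_t>& v) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Four accumulators: the four adds in each iteration do not depend
// on each other, so the CPU is free to overlap them.
std::uint64_t sum_four_chains(const std::vector<std::uint32_t>& v) {
    std::uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i) {  // leftover elements when n is not a multiple of 4
        s0 += v[i];
    }
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result for integers. Whether the second is actually faster depends on the compiler and the CPU, since an optimizer may already unroll or vectorize the first loop, so treat this as something to measure rather than a guaranteed win. With floating-point values, splitting the sum this way can also change the rounding.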

There’s a whole smash of fun to be had researching how this all works in the CPU. There are schedulers, “reservation stations,” and various queues and caches. Kudos to all those hardware engineers.

Another special type of fun is for compiler engineers. Compilers such as GCC do a lot of fancy instruction scheduling in the code generation backend, choosing an instruction order that suits the CPU’s pipeline.

But what about C++? Is there anything you can do in C++ to optimize your code? Or with inline assembly instructions?

Safety first. Most of the discussion of out-of-order execution and C++ occurs in relation to safety. Within a single thread, the CPU and compiler preserve the appearance of program order, but problems can arise across multiple threads if another thread observes the reads and writes from our C++ statements out of order. I mean, how can it be good to just run my C++ code in any random order that the CPU chooses?

Preventing out-of-order errors involves “memory order.” The C++ memory orders (the std::memory_order options on atomic operations) are especially useful for correctly implementing lock-free algorithms with atomics, and the stronger orders also act as memory barriers that can prevent undesirable types of reordering around the atomic operation.
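As a minimal sketch of memory order in practice, the following shows the common release/acquire "publish" pattern using std::atomic. The function name publish_and_read and the variable names are hypothetical, chosen only for illustration. The release store keeps the earlier write to the plain variable from being reordered after it, and the acquire load keeps the later read from being reordered before it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: one thread writes a payload and publishes it,
// another thread waits for the flag and then reads the payload.
int publish_and_read(int value) {
    int payload = 0;                 // ordinary non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
    int seen = -1;

    std::thread producer([&] {
        payload = value;  // 1. write the data
        // 2. release: the write above cannot be reordered after this store
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // 3. acquire: the read below cannot be reordered before this load
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // 4. guaranteed to observe the producer's write
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling publish_and_read(42) returns 42. If both operations on the flag used std::memory_order_relaxed instead, the C++ memory model would no longer guarantee that the consumer sees the payload write, and the access to payload would be a data race.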

