
  • Book Excerpt from "C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations"
  • by David Spuler, Ph.D.

Chapter 34. Core Pinning

What is Core Pinning?

Core pinning is a multithreading optimization where a thread is “pinned” to one of the CPU cores to give it higher priority on that core. This means that an important thread running the hotpath can have guaranteed CPU availability, rather than waiting on the default thread scheduling algorithms. Hence, core pinning can be one way to avoid lock contention worries or excessive context switches in the main hotpath thread.

Core pinning is also called “thread affinity” and has multiple other names (e.g., “processor affinity” or “CPU affinity” or “CPU pinning”), but if you hear the words “pinning” or “affinity” in relation to threads, this is it.

Pinning has other meanings in related areas. There’s a higher-level type of pinning whereby whole processes or applications are pinned to a CPU core by the operating system, rather than just a single thread, which isn’t quite the same thing. Note also that CUDA C++ has another type of “pinned memory” for GPUs, but that’s a memory transfer optimization rather than a compute improvement.

The other side of core pinning is that you obviously don’t pin the less important threads. All the lower-priority threads have fewer cores available, and are downgraded.

Pros and Cons

The use of core pinning is a very powerful type of hotpath optimization. The main pathways are super-optimized because of these factors:

  • No context switches
  • Fewer cache misses (the pinned core’s caches stay warm)
  • Highest priority execution
  • Guaranteed core availability (no delay)

The downsides are fairly obvious:

  • That core isn’t available for other work.
  • Load balancing is only possible across the other cores.

And also, you can’t do it too many times, because the CPU only has a fixed number of cores.

Counting Cores

The code to set up core pinning is really a two-part procedure with these steps:

    1. Determine how many CPU cores are available.

    2. Pin a thread to one of them.

There are various non-standard ways to interrogate the system for its CPU settings. The standard method is to call hardware_concurrency() in the standard thread library, which tells you how many hardware threads (logical cores) the CPU supports. Note that this may exceed the number of physical cores if hyperthreading is enabled, and the call can return zero if the count cannot be determined.

    #include <thread>

    int number_of_cores()
    {
        // Number of hardware threads (logical cores); may be 0 if unknown.
        return std::thread::hardware_concurrency();
    }

This has been a standard method since C++11, so it should be available to you. Alternatively, non-standard methods include:

  • sysconf() — POSIX version in <unistd.h> for Linux.
  • GetSystemInfo() — Win32 API in <windows.h>.
  • __cpuid() — low-level intrinsic that wraps the CPUID machine instruction on x86 CPUs (Intel/AMD); declared in <intrin.h> for MSVC, with a similar __get_cpuid() available in <cpuid.h> for GCC/Clang.

All of these functions offer a wealth of other hardware and system information, beyond just the number of cores.
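
For example, here is a minimal Linux-only sketch using sysconf() to query the number of online logical cores (this assumes glibc, where _SC_NPROCESSORS_ONLN is available as a common extension rather than strict POSIX):

    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        // Number of logical cores currently online (glibc extension)
        long ncores = sysconf(_SC_NPROCESSORS_ONLN);
        if (ncores < 1) {
            std::printf("Could not determine core count\n");
            return 1;
        }
        std::printf("Online logical cores: %ld\n", ncores);
        return 0;
    }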

Setting Up Core Pinning

There’s no standard way to set up core pinning using the C++11 std::thread library, nor does anything appear forthcoming in C++26 for this area. However, there are longstanding platform-specific functions to do this.

Sometimes, you don’t need to code up core pinning in C++, but can use OS settings or commands. On Windows, you can set process-level CPU affinity for an application via the Task Manager GUI (“Set affinity”). On Linux, the “taskset” command runs a program with a given CPU affinity (e.g., “taskset -c 2 ./myprog” launches a program pinned to core 2).

Both Windows and Linux have non-standard APIs that can set up core pinning for either a process or a thread. Linux uses the “pthreads” library (and related scheduler calls) to do core pinning, and Windows has Win32 functions. The sequence at a high level looks like:

    1. Get a native thread id.

    2. Call the platform-specific core pinning API.

To implement core pinning in C++ on Linux you need to bypass std::thread to get to the underlying POSIX thread id, which has type pthread_t as defined in <pthread.h>. This is required because the core pinning calls are POSIX functions on Linux. There are at least two ways to do this:

  • pthread_self() — POSIX call to return the id of the current thread.
  • std::thread::native_handle() — returns the “native” thread ID of a standard C++ thread object, which is a POSIX thread id on Linux.
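
As a quick illustration, here is a sketch of both approaches (assuming Linux with libstdc++ or libc++, where native_handle() yields a pthread_t):

    #include <pthread.h>
    #include <thread>

    void thread_id_examples()
    {
        // Id of the thread currently executing this code:
        pthread_t self_id = pthread_self();

        // Id of a std::thread object, via its native handle:
        std::thread worker([]{ /* ... hotpath work ... */ });
        pthread_t worker_id = worker.native_handle();

        // Either id can be passed to the affinity functions listed below.
        (void)self_id;
        (void)worker_id;
        worker.join();
    }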

Once you have a valid thread id, then you can set up core pinning for that thread. The programmatic C++ APIs on Linux are:

  • Pinning processes — sched_setaffinity()
  • Pinning threads — pthread_setaffinity_np() or pthread_attr_setaffinity_np()

On Windows, these are the C++ APIs:

  • Pinning processes — SetProcessAffinityMask()
  • Pinning threads — SetThreadAffinityMask()
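
As a minimal sketch of the Windows version (assuming the thread should be restricted to a single core), SetThreadAffinityMask() takes a thread handle and a bitmask of allowed cores:

    #include <windows.h>

    bool pin_me_windows(int corenum)
    {
        // Build a mask with only the bit for this core set.
        DWORD_PTR mask = (DWORD_PTR)1 << corenum;
        // Pin the current thread; returns the previous mask, or 0 on failure.
        DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), mask);
        return previous != 0;
    }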

Now let’s look at a full example on Linux.

Linux Core Pinning

Here’s a native pthreads sequence to pin the current thread to a core:

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE 1   // For pthread_setaffinity_np() and the CPU_* macros (glibc)
    #endif
    #include <pthread.h>
    #include <unistd.h>
    #include <sched.h>

    bool pin_me(int corenum)
    {
        pthread_t tid = pthread_self();  // Get current thread id
        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);               // Clear all core bit flags
        CPU_SET(corenum, &cpuset);       // Set one core bit flag
        // Pin the thread!
        int ret = pthread_setaffinity_np(tid, sizeof(cpuset), &cpuset);
        return ret == 0;                 // Zero return means success
    }

Note that failures can occur when attempting to pin a thread to a core. The process needs adequate permissions to do so, and the core number needs to be valid for the given system.

This code uses “cpu_set_t” from <sched.h>, which is an opaque structure (effectively a bitmask) representing a set of one or more cores. There are various bit manipulation macros also defined in <sched.h> for use with this type:

  • CPU_ZERO() — clears all the bits.
  • CPU_SET() — sets one bit.
  • CPU_CLR() — unsets one bit.
  • CPU_ISSET() — tests one bit.
  • CPU_COUNT() — counts how many bits are set.

There are also some set operations on the CPU bitmasks in <sched.h>:

  • CPU_EQUAL() — tests whether two CPU sets are equal.
  • CPU_AND() — bitwise-and on all bits.
  • CPU_OR() — bitwise-or on all bits.
  • CPU_XOR() — bitwise-xor on all bits.

The CPU bitmask type cpu_set_t is not a C++ class type, but a plain C structure, which means it can be copied or moved by bitwise copy (e.g., using memcpy) or by simple assignment.

Note that pthread_setaffinity_np() can be passed a CPU set with more than one bit set, in which case the thread is allowed to run on (and migrate between) any of those cores. You can also examine a thread’s current affinity bitmask via pthread_getaffinity_np().
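
Putting these pieces together, here is a sketch (assuming Linux with glibc, and that core 2 exists on the machine) that pins a std::thread worker to core 2 via its native_handle(), then reads the affinity mask back to verify it:

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE 1   // For pthread_{set,get}affinity_np() and CPU_* macros
    #endif
    #include <pthread.h>
    #include <sched.h>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    int main()
    {
        std::thread worker([]{
            // Stand-in for the real hotpath work:
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        });

        // Build a CPU set containing only core 2 (assumed to exist).
        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(2, &cpuset);

        // Pin the worker thread via its native POSIX thread id.
        pthread_t tid = worker.native_handle();
        if (pthread_setaffinity_np(tid, sizeof(cpuset), &cpuset) != 0)
            std::printf("Pinning failed\n");

        // Read the affinity mask back to check the result.
        cpu_set_t check;
        CPU_ZERO(&check);
        if (pthread_getaffinity_np(tid, sizeof(check), &check) == 0) {
            std::printf("Thread may run on %d core(s); core 2 %s in the set\n",
                        CPU_COUNT(&check),
                        CPU_ISSET(2, &check) ? "is" : "is not");
        }

        worker.join();
        return 0;
    }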

Isolating Linux Cores

To fully dedicate a particular core to a pinned thread on Linux, some further actions may be needed. Changes to Linux kernel settings can be used to do things like:

  • Isolating the core
  • Disabling interrupts

Some of the Linux kernel parameters you may need to adjust include the following (an example boot line is shown after this list):

  • nohz or nohz_full
  • isolcpus
  • irqaffinity
  • rcu_nocbs
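
For example, a hypothetical boot-line fragment that reserves cores 2 and 3 for pinned application threads might look like the following (the exact parameters and values depend on your kernel version and workload; see the kernel documentation in the references):

    isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3 irqaffinity=0,1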

There is some industry wisdom to avoid core zero on Linux systems, because that is the core where the kernel tends to run its housekeeping tasks, as described in Bernhardt (2023). There’s also a discussion of some odd issues with the first CPU core on Linux in Dawson (2023).

References

  1. Machinet, March 13, 2024, How to optimize C++ code for use in high-frequency trading algorithms?, https://www.machinet.net/tutorial-eng/optimize-cpp-code-high-frequency-trading-algorithms
  2. Dung Le, August 13, 2020, Optimizations for C++ multi-threaded programming, https://medium.com/distributed-knowledge/optimizations-for-c-multi-threaded-programs-33284dee5e9c
  3. Larry Jones, February 27, 2025, Mastering Concurrency and Multithreading in C++: Unlock the Secrets of Expert-Level Skills, https://www.amazon.com.au/Mastering-Concurrency-Multithreading-Secrets-Expert-Level-ebook/dp/B0DYSB519C/
  4. Eli Bendersky, January 17, 2016, C++11 threads, affinity and hyperthreading, https://eli.thegreenplace.net/2016/c11-threads-affinity-and-hyperthreading/
  5. Bytefreaks, November 23, 2016, C/C++: Set Affinity to process thread – Example Code 3, https://bytefreaks.net/programming-2/c/cc-set-affinity-to-process-thread-example-code
  6. Mark Dawson, Jr., February 12, 2023, My Fear of Commitment to the 1st CPU Core, https://www.jabperf.com/my-fear-of-commitment-to-the-1st-cpu-core/ (on avoiding the first CPU core for CPU affinity).
  7. Manuel Bernhardt, November 16, 2023, On pinning and isolating CPU cores, https://manuel.bernhardt.io/posts/2023-11-16-core-pinning/ (examines costs of arithmetic operations versus cache mispredictions and context switches).
  8. Davood Ghatreh Samani, Chavit Denninnart, Josef Bacik, Mohsen Amini Salehi, June 3, 2020, The Art of CPU-Pinning: Evaluating and Improving the Performance of Virtualization and Containerization Platforms, https://arxiv.org/abs/2006.02055
  9. Kernel.org, May 2025 (accessed), The kernel’s command-line parameters, https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html

 
