Chapter 3. CPU Platform Detection
Book Excerpt from "C++ AVX Optimization: CPU SIMD Vectorization"
by David Spuler, Ph.D.
Portability Checking of AVX Versions
AVX support has changed over the years, with different CPUs having different capabilities, not only across AVX, AVX-2, and AVX-512, but also across their sub-releases. The future is also a little unclear, with reports that some of the newer Intel chips have AVX-512 disabled.
If you write some code using AVX-512 intrinsics, and compile your C++ into an executable with the AVX-512 flags on, and then it runs on a lower-capability CPU without AVX-512, what happens? Do the AVX-512 intrinsics fail, or are they simulated somehow so that they’re slower but still work?
Answer: kaboom on MSVS.
In the MSVS IDE, if you try to call these intrinsics on a CPU that doesn't support them, you get "unhandled exception: illegal instruction." In other words, the C++ compiler still emits the AVX-512 instruction codes, but they aren't valid on that CPU, so the program faults at runtime.
Hence, the calls to AVX-512 are not emulated at run-time on lower-capability CPUs. And they aren’t checked, either. That’s up to you!
Preprocessor Macro Tests
Firstly, you cannot generally use the preprocessor to decide what version of AVX you have (if any). This only works if:
1. There’s only one platform, and
2. You’re compiling on (or for) the same platform that will run the binary.
In other words, it’s either you and your one box doing everything, or else you’re carefully maintaining lots of different executable binaries for each platform.
Note that you can modify the default CPU platform target via compiler mode settings. During compilation, you can either take whatever platform you’re on, or you can modify the setting with compiler flags for different compile-time platform effects:
-mavx — GCC/Clang compiler
-march=native — GCC/Clang compiler
/arch:AVX — MSVC compiler
/arch:AVX2 — MSVC compiler
In those limited circumstances, you can use the builtin preprocessor macros:
__AVX__
__AVX2__
__AVX512F__
There are also the SSE versions of these macros:
__MMX__
__SSE__
__SSE2__
__SSE3__
__SSE4_1__
__SSE4_2__
There are also some macros for specific types of CPU functionality or individual machine codes:
__FMA__ — fused multiply-add.
__BMI__ — bit manipulation instructions.
__POPCNT__ — popcount (set bits count instruction).
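As a quick illustration, here is a minimal sketch (not from any particular codebase) that reports which instruction set a binary was compiled for, using these predefined macros. Note that this only reflects the compiler flags used at build time, not what the CPU actually running the program supports:

#include <cstdio>

int main() {
    // Reports what the *binary* was compiled for, based on compiler flags,
    // not what the CPU it eventually runs on actually supports.
#if defined(__AVX512F__)
    std::printf("Compiled with AVX-512F enabled\n");
#elif defined(__AVX2__)
    std::printf("Compiled with AVX2 enabled\n");
#elif defined(__AVX__)
    std::printf("Compiled with AVX enabled\n");
#else
    std::printf("Compiled without AVX\n");
#endif
    return 0;
}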
If you’re also supporting non-AVX platforms, your AVX code should probably have a check like this somewhere:
#if defined(_M_ARM) || defined(_M_ARM64) || defined(_M_HYBRID_X86_ARM64) \
    || defined(_M_ARM64EC) || defined(__arm__) || defined(__aarch64__)
#error AVX not supported on ARM platform
#endif
Source: GGML AI inference backend open-source code (see Appendix for license details).
Runtime CPU Feature Checking
In general, for shipping a binary to customers, you can’t test #if or #ifdef for whether you’ve got AVX-512 in the CPU or not. You can use the preprocessor to distinguish between different platforms where you’ll compile a separate binary (e.g., ARM Neon for phones or Apple M1/M2/M3 chipsets).
Preprocessor checks can help with the non-AVX platforms, but not so much on x86 CPUs. You cannot choose between AVX, AVX-2, and AVX-512 at compile-time, unless you really plan to ship three separate binary executables. Well, you probably could do this if you really, really wanted to. Go ahead, prove me wrong!
The other thing you don’t really want to do is low-level testing of capabilities. You don’t want to test a flag right in front of every AVX-512 intrinsic call; otherwise, you’ll lose most of the speedup benefits. Instead, you want this test done much higher up, and then have multiple versions of the higher-level kernel operations (e.g., vector add, vector multiply, vector dot product, etc.).
CPUID Instruction
Given the preprocessor limitations, it is important to check your CPU platform has the AVX support that you need. What this means is that you have to check in your runtime code what the CPU’s capabilities are, at a very high level in your program, usually during initialization.
Fortunately, every CPU has a builtin machine-code instruction called “CPUID” that is very fast and provides this information. The main features of CPUID include:
1. It’s a hardware opcode! (fast), and
2. The bit flags are very obscure, and therefore
3. Using it directly is a real pain.
The main way to do this is via one of several possible “cpuid” intrinsic functions at program startup.
There are several versions of this non-standard C++ intrinsic:
cpuid — the main CPU instruction.
__cpuid() — basic CPU information (MSVC)
__cpuidex() — extended information (MSVC)
__get_cpuid() — GCC/Clang version in <cpuid.h>
__cpuid_count() — also GCC/Clang, but more specific.
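As a hedged sketch of the raw approach, here is one way to query the AVX-512 Foundation feature bit (CPUID leaf 7, sub-leaf 0, EBX bit 16), using __cpuidex on MSVC and __get_cpuid_count on GCC/Clang. The function name cpu_has_avx512f is just an illustrative choice, and a fully robust check would also confirm operating system support for the wider registers (OSXSAVE/XGETBV), which is omitted here:

#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <cpuid.h>
#endif

// Returns true if the CPU reports the AVX-512 Foundation feature bit
// (CPUID leaf 7, sub-leaf 0, EBX bit 16). OS-support checks are omitted.
bool cpu_has_avx512f() {
#if defined(_MSC_VER)
    int regs[4] = { 0 };                  // EAX, EBX, ECX, EDX
    __cpuid(regs, 0);                     // EAX = highest supported standard leaf
    if (regs[0] < 7) return false;        // leaf 7 not available
    __cpuidex(regs, 7, 0);                // leaf 7, sub-leaf 0
    return (regs[1] & (1 << 16)) != 0;    // EBX is regs[1]
#else
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;                     // CPU does not support leaf 7
    }
    return (ebx & (1u << 16)) != 0;
#endif
}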
GCC also has a more user-friendly version without any bit flags needed:
__builtin_cpu_supports("NAME") — look up CPU features by name (e.g., “sse”).
int _may_i_use_cpu_feature(unsigned __int64 a) — an old version.
The GCC version is current and quite easy to use. The other one looks like a bad AI hallucination, but it’s in some 2022 Intel documentation, so best of luck with that.
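For comparison, a minimal sketch of the GCC/Clang builtin; the feature names are lowercase strings such as “avx2” and “avx512f”, and the have_avx512 flag here is just an illustrative global:

// GCC/Clang only; not available in MSVC.
static bool have_avx512 = false;

void detect_cpu_features() {
    // Feature names are lowercase, e.g. "sse4.2", "avx", "avx2", "avx512f".
    if (__builtin_cpu_supports("avx512f")) {
        have_avx512 = true;
    }
}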
Then you have a dynamic flag that specifies whether you have AVX-512 or not, and you can then choose between an AVX-2 dot product or an AVX-512 dot product, or whatever else, during execution. Obviously, it gets a bit convoluted when you have to dynamically choose between versions for AVX, AVX-2 and AVX-512, not to mention all the AVX sub-capabilities and also AVX-10 coming soon!
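As a sketch of that kind of high-level dispatch (the kernel names, g_vec_dot, and init_cpu_dispatch below are hypothetical, not taken from any particular library), you pick the kernel once at startup and every later call goes through a function pointer:

#include <cstddef>

// Stand-in kernels: in real code each would live in its own translation
// unit, compiled with the matching -mavx2 / -mavx512f flags and written
// with the corresponding intrinsics. These scalar bodies are placeholders.
static float vec_dot_basic(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
static float vec_dot_avx2(const float* a, const float* b, std::size_t n) {
    return vec_dot_basic(a, b, n);   // placeholder for an AVX2 kernel
}
static float vec_dot_avx512(const float* a, const float* b, std::size_t n) {
    return vec_dot_basic(a, b, n);   // placeholder for an AVX-512 kernel
}

// Function pointer chosen once at startup.
static float (*g_vec_dot)(const float*, const float*, std::size_t) = vec_dot_basic;

void init_cpu_dispatch() {
#if defined(__GNUC__) || defined(__clang__)
    if (__builtin_cpu_supports("avx512f")) {
        g_vec_dot = vec_dot_avx512;
    } else if (__builtin_cpu_supports("avx2")) {
        g_vec_dot = vec_dot_avx2;
    }
#else
    // On MSVC, use a CPUID-based test instead (e.g., cpu_has_avx512f() above).
#endif
}

// Callers go through the dispatcher; the capability check is not
// repeated in front of every intrinsic call.
float vec_dot(const float* a, const float* b, std::size_t n) {
    return g_vec_dot(a, b, n);
}

Calling init_cpu_dispatch() once during program initialization keeps the capability test out of the inner loops, which is the whole point of doing the detection at a high level.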
References
- Microsoft, 2021, __cpuid, __cpuidex, https://github.com/MicrosoftDocs/cpp-docs/blob/main/docs/intrinsics/cpuid-cpuidex.md
- Microsoft, April 2025, DirectXMath: DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps, https://github.com/microsoft/DirectXMath/blob/main/Extensions/DirectXMathAVX.h
- Agner Fog, 2023, version2: Vector Class Library, latest version, https://github.com/vectorclass/version2/blob/master/instrset_detect.cpp, https://github.com/vectorclass/version2/blob/master/instrset.h
- Wikipedia, July 2025 (accessed), Advanced Vector Extensions, https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
- Stack Overflow, 2013, Intrinsics for CPUID like informations?, https://stackoverflow.com/questions/17758409/intrinsics-for-cpuid-like-informations
- GNU, July 2025 (accessed), x86 Built-in Functions, https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html
- Intel, 2022, Intel® C++ Compiler Classic Developer Guide and Reference, https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-8/overview.html, PDF: https://cdrdv2.intel.com/v1/dl/getContent/767250?fileName=cpp-compiler_developer-guide-reference_2021.8-767249-767250.pdf
- GGML, July 2025 (accessed), llama.cpp: LLM inference in C/C++, https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-cpu/arch/x86/cpu-feats.cpp (Testing the CPU platform.)