Aussie AI Blog

AGI might require higher precision FP32 or FP64

  • Feb 2nd, 2026
  • by David Spuler, Ph.D.

AGI and FP32 or FP64

I have a weird theory about getting to AGI: it might require the use of higher-precision arithmetic such as FP32 and FP64. There are no specific computational results that I can point to; it's just a feeling, based on this:

  • AI failings are (now) often subtle and nuanced.
  • Subtlety requires accuracy in choosing the right words of an answer.
  • The later model layers often choose between two or more tokens that are both reasonably good.

Certainly, the industry is currently based on quantization to avoid the extensive costs of GPU computation. BF16 has become the "de facto standard" for LLM training (see Lee et al., Mar 2025, https://arxiv.org/abs/2405.18710). Most inference compute is heading to lower bit-count quantization, such as INT8 or FP4/FP8. There's no palpable industry demand for FP32, let alone FP64. INT4 quantization is commonly used in inference and offers a reasonable trade-off between accuracy and cost. Indeed, the Blackwell and Rubin generations of NVIDIA GPUs are adding features like native FP4/FP8 tensor cores, without much mention of enhancements to FP32 or FP64.
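To make the precision trade-off concrete, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization. The function names and the simple round-to-nearest scheme are illustrative assumptions for this post, not any particular library's API:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0          # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 weights from INT8 codes."""
    return q.astype(np.float32) * scale
```

Every weight is recovered to within half a quantization step (scale/2), which shows exactly where the lost precision goes: the subtler the distinction between two weights, the more likely INT8 collapses them to the same code.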

Is FP64 required? How much precision might be needed to encode the nuances of AGI? FP64 is widely used in scientific computing, but hardly at all in AI. However, papers on FP64 compute are starting to appear. Another wrinkle is the variety of papers looking at using lower-end data types such as INT8 to perform "FP64 emulation". Native FP64 capability is available in data center GPUs, but these emulations may be desirable for on-device AI. There's also the analogous CUDA C++ optimization known as "BF16x9 FP32 emulation" on Blackwell GPUs, which emulates FP32 matrix multiplication on BF16 tensor cores: each FP32 operand is split into three BF16 components, so each FP32 product becomes nine BF16 products.
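As a rough illustration of the splitting idea behind such emulation schemes, here is a NumPy sketch that mimics BF16 by truncating the low 16 bits of an FP32 value, splits each operand into three BF16 pieces, and reconstructs the FP32 product from the nine partial products. The helper names are my own, and real hardware uses round-to-nearest and tensor-core matrix units rather than scalar arithmetic:

```python
import numpy as np

def to_bf16(x):
    """Truncate FP32 to BF16 precision (keep the top 16 bits of the word)."""
    a = np.asarray(x, dtype=np.float32)
    return (a.view(np.uint32) & np.uint32(0xFFFF0000)).view(np.float32)

def split3(x):
    """Split an FP32 value into three BF16 components: x ~= b0 + b1 + b2."""
    x = np.float32(x)
    b0 = to_bf16(x)                       # top 8 significand bits
    r1 = np.float32(x - b0)               # residual is exact in FP32
    b1 = to_bf16(r1)                      # next 8 bits
    b2 = to_bf16(np.float32(r1 - b1))     # last 8 bits of the 24-bit mantissa
    return b0, b1, b2

def bf16x9_mul(x, y):
    """Emulate an FP32 product via 9 BF16-precision products, FP32-accumulated."""
    xs, ys = split3(x), split3(y)
    # each 8-bit x 8-bit product is exactly representable in FP32
    return np.float32(sum(np.float32(a) * np.float32(b) for a in xs for b in ys))
```

A single truncated BF16 product loses several decimal digits, while the nine-product reconstruction lands within a few FP32 rounding errors of the true product, which is the whole appeal of these emulation tricks.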

Of course, AGI is going to require a lot of improvements, not just more bits in compute. Improvements to training data, tool integrations, and training algorithms are all ongoing. But I'm just wondering if maybe they need a few more bits for that!

References

  1. Joonhyung Lee, Jeongin Bae, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee, 25 Mar 2025 (v2), To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability, https://arxiv.org/abs/2405.18710 ("BrainFloat16 (BF16) precision has become the de facto standard for LLM training")
  2. Cole Brower, Samuel Rodriguez Bernabeu, Jeff Hammond, John Gunnels, Sotiris S. Xantheas, Martin Ganahl, Andor Menczer, Örs Legeza, 6 Oct 2025, Mixed-precision ab initio tensor network state methods adapted for NVIDIA Blackwell technology via emulated FP64 arithmetic, https://arxiv.org/abs/2510.04795
  3. Daichi Mukunoki, 25 Sep 2025 (v3), DGEMM without FP64 Arithmetic - Using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme, https://arxiv.org/abs/2508.00441
  4. Samuel Rodriguez, July 2025, Floating Point Emulation in NVIDIA Math Libraries: Optimizing Floating Point Precision, CERN, July 1-2, 2025, Geneva, Switzerland, https://indico.cern.ch/event/1538409/contributions/6521976/attachments/3096181/5485165/cern-talk.pdf

More AI Research Topics


AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging