Aussie AI
Block Floating-Point
Last Updated 31 May, 2026
by David Spuler, Ph.D.
Block Floating-Point (BFP): Book Excerpts and Blog Articles
Free online book excerpts (full-text chapters and PDF downloads) and related Aussie AI blog articles:
- David Spuler, Ph.D., Feb 6th, 2026 (updated), 500+ LLM Inference Optimization Techniques, Aussie AI Blog, https://www.aussieai.com/blog/llm-inference-optimization
- David Spuler, May 31st, 2026, Chapter 13. Fixed-Point and Block-Floating Point, in book LLM Inference Optimization: State-of-the-Art Research, Table of Contents: https://www.aussieai.com/book/llm-inference-optimization https://www.amazon.com/dp/B0H3FKR39T
Research on Block Floating-Point
Research papers include:
- J Wu, M Song, J Zhao, HKH So, 2024, A Case for Low Bitwidth Floating Point Arithmetic on FPGA for Transformer Based DNN Inference, https://wujiajunic.cn/publication/ipdpsw2024/IPDPSW2024.pdf
- Nils Kohl, Stephen F. McCormick, Rasmus Tamstorf, 30 Jun 2023, Multigrid Methods using Block Floating Point Arithmetic, https://arxiv.org/abs/2307.00124
- Mario Drumond, Tao Lin, Martin Jaggi, Babak Falsafi, 2 Dec 2018 (v4), Training DNNs with Hybrid Block Floating Point, NeurIPS, https://arxiv.org/abs/1804.01526 PDF: https://proceedings.neurips.cc/paper/2018/file/6a9aeddfc689c1d0e3b9ccc3ab651bc5-Paper.pdf
- Kobayashi, S., Fettweis, G.P., 2000, A Hierarchical Block-Floating-Point Arithmetic. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 24, 19–30 (2000). https://doi.org/10.1023/A:1008110410087 https://link.springer.com/article/10.1023/a:1008110410087 (A paper from 2000 on BFP theory in signal processing applications.)
- Yeong Foong Choo, Brian L. Evans, Alan Gatherer, 25 Oct 2017 (v2), Complex Block Floating-Point Format with Box Encoding For Wordlength Reduction in Communication Systems, https://arxiv.org/abs/1705.05217 (Use of BFP for wordlength reduction of complex-valued samples in communication systems.)
- Simla Burcu Harma, Ayan Chakraborty, Babak Falsafi, Martin Jaggi, Yunho Oh, 2023, Accuracy Boosters: Epoch-Driven Mixed-Mantissa Block Floating Point for DNN Training, ML for Computer Architecture and Systems (MLArchSys), ISCA 2023, https://openreview.net/pdf?id=nfmfqzQ4Mwl (Mixed precision version of BFP with per-block bit sizes, and integer arithmetic for dot product, but FP32 for other operations.)
- Wikipedia, April 2024 (accessed), Block floating point, https://en.wikipedia.org/wiki/Block_floating_point
- Chhabra, Arun; Iyer, Ramesh (December 1999). "TMS320C55x A Block Floating Point Implementation on the TMS320C54x DSP" (PDF) (Application report). Digital Signal Processing Solutions. Texas Instruments. SPRA610. Archived (PDF) from the original on 2018-07-11. Retrieved 2018-07-11. https://web.archive.org/web/20180711175625/http://www.eeng.dcu.ie/~ee206/pdf/block_flt_pt.pdf
- Elam, David; Iovescu, Cesar (September 2003). "A Block Floating Point Implementation for an N-Point FFT on the TMS320C55x DSP" (PDF) (Application report). TMS320C5000 Software Applications. Texas Instruments. SPRA948. Archived (PDF) from the original on 2018-07-11. Retrieved 2015-11-01. https://www.ti.com/lit/an/spra948/spra948.pdf
- Wilkinson, James Hardy (1963). Rounding Errors in Algebraic Processes (1 ed.). Englewood Cliffs, NJ, USA: Prentice-Hall, Inc. MR 0161456. https://books.google.com.au/books?id=yFogU9Ot-qsC&redir_esc=y
- Nikita Trukhanov, Ilya Soloveychik, 29 Mar 2024, Accurate Block Quantization in LLMs with Outliers, https://arxiv.org/abs/2403.20137 (Analyzes block floating point number formats in block quantization with a focus on the KV cache memory reduction, including the use of permutations to reorder tensor weight rows.)
- Microsoft, 2023, MX PyTorch Emulation Library, https://github.com/microsoft/microxcaling
- Nils Kohl, Stephen F. McCormick, Rasmus Tamstorf, 2023, Multigrid Methods Using Block Floating Point Arithmetic, https://doi.org/10.1137/23M1581819 https://epubs.siam.org/doi/abs/10.1137/23M1581819
- Lancheng Zou, Wenqian Zhao, Shuo Yin, Chen Bai, Qi Sun, Bei Yu, 2024, BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62978-62992, https://proceedings.mlr.press/v235/zou24d.html https://openreview.net/forum?id=DbyHDYslM7 https://openreview.net/pdf?id=DbyHDYslM7 https://www.cse.cuhk.edu.hk/~byu/papers/C229-ICML2024-BiE-slides.pdf
- Yongqi Xu, Yujian Lee, Gao Yi, Bosheng Liu, Yucong Chen, Peng Liu, Jigang Wu, Xiaoming Chen, Yinhe Han, 25 Sep 2024, BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices, https://arxiv.org/abs/2409.17093
- Hui Wang, Yuan Cheng, Xiaomeng Han, Zhengpeng Zhao, Dawei Yang, Zhe Jiang, 21 Jan 2025, Pushing the Limits of BFP on Narrow Precision LLM Inference, https://arxiv.org/abs/2502.00026
- Jude Haris, José Cano, 15 Oct 2025, F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs, https://arxiv.org/abs/2510.13401
- Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, Shiyu Li, Ziru Li, Mingyuan Ma, Tergel Molom-Ochir, Benjamin Morris, Haoxuan Shan, Jingwei Sun, Yitu Wang, Chiyue Wei, Xueying Wu, Yuhao Wu, Hao Frank Yang, Jingyang Zhang, Junyao Zhang, Qilin Zheng, Guanglei Zhou, Hai (Helen) Li, Yiran Chen, 8 Oct 2024, A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models, https://arxiv.org/abs/2410.07265
- Alireza Khodamoradi, Kristof Denolf, Eric Dellinger, 15 Oct 2024, Error Diffusion: Post Training Quantization with Block-Scaled Number Formats for Neural Networks, https://arxiv.org/abs/2410.11203 https://github.com/ROCm/tensorcast
- Ghada Alsuhli, Vasilis Sakellariou, Mahmoud Al-Qutayri, Thanos Stouraitis, 2025, A Survey and Comparative Analysis of Number Systems for Deep Neural Networks, https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=11053145
- Jonathan Bentz, Tony Scudiero, Jon Waxman, Rob Armstrong, Aug 06, 2025, What’s New and Important in CUDA Toolkit 13.0, https://developer.nvidia.com/blog/whats-new-and-important-in-cuda-toolkit-13-0/
- Xiaomeng Han, Yuan Cheng, Jing Wang, Junyang Lu, Hui Wang, X.x. Zhang, Ning Xu, Dawei Yang, Zhe Jiang, 22 Apr 2025, BBAL: A Bidirectional Block Floating Point-Based Quantisation Accelerator for Large Language Models, https://arxiv.org/abs/2504.15721
- Weihu Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, and Dazhao Cheng. 2025. MXBLAS: Accelerating 8-bit Deep Learning with a Unified Micro-Scaled GEMM Library. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '25). Association for Computing Machinery, New York, NY, USA, 1590–1603. https://doi.org/10.1145/3712285.3759809 https://dl.acm.org/doi/full/10.1145/3712285.3759809 (GEMM using "microscaling format" of 8-bit values with scaling factors, with effect similar to block-level mixed-precision quantization and block-floating point numeric formats.)
- Tanmaey Gupta, Hayden Prairie, Xiaoxia Wu, Reyna Abhyankar, Qingyang Wu, Austin Silveria, Pragaash Ponnusamy, Jue Wang, Ben Athiwaratkun, Leon Song, Tri Dao, Daniel Y. Fu, Chris De Sa, 12 May 2026, Search Your Block Floating Point Scales!, https://arxiv.org/abs/2605.12464
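Several annotations in this list mention integer mantissas, per-block scaling factors, and integer dot products. As a rough illustration only, the following Python sketch shows one common way to express that idea: each block of values shares a single power-of-two exponent, and each value keeps only a small signed integer mantissa. The function names (`bfp_quantize`, `bfp_dequantize`, `bfp_dot`) and the 8-bit default are hypothetical choices made for this sketch and are not taken from any of the papers or libraries listed here.

```python
import math

def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to signed integer mantissas plus one shared exponent.

    Hypothetical helper for illustration; real BFP formats differ in block size,
    rounding mode, and exponent encoding.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1.
    _, exp = math.frexp(max_abs)
    # Choose the shared exponent so the largest value uses the full mantissa range.
    shared_exp = exp - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = []
    for x in block:
        q = round(math.ldexp(x, -shared_exp))        # x / 2**shared_exp, rounded
        mantissas.append(max(-limit, min(limit, q)))  # clip to the signed range
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp):
    """Recover approximate floats: value = mantissa * 2**shared_exp."""
    return [math.ldexp(m, shared_exp) for m in mantissas]

def bfp_dot(a_mant, a_exp, b_mant, b_exp):
    """Dot product of two blocks: integer multiply-accumulate, then one final scaling."""
    acc = sum(x * y for x, y in zip(a_mant, b_mant))
    return math.ldexp(acc, a_exp + b_exp)

if __name__ == "__main__":
    a_m, a_e = bfp_quantize([1.0, -0.5, 0.25, 0.0])
    b_m, b_e = bfp_quantize([0.5, 0.5, 0.5, 0.5])
    print(a_m, a_e)
    print(bfp_dequantize(a_m, a_e))
    print(bfp_dot(a_m, a_e, b_m, b_e))
```

In this sketch the inner loop of the dot product uses only integer arithmetic, and the two shared exponents are applied once at the end. Small values in a block that also contains a large outlier lose precision, which is the problem that the outlier-aware and bi-exponent variants listed above set out to address.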
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research