Aussie AI
Block Floating-Point
Last Updated 10 March, 2026
by David Spuler, Ph.D.
Research on Block Floating-Point
Research papers include:
- J Wu, M Song, J Zhao, HKH So, 2024, A Case for Low Bitwidth Floating Point Arithmetic on FPGA for Transformer Based DNN Inference, https://wujiajunic.cn/publication/ipdpsw2024/IPDPSW2024.pdf
- Nils Kohl, Stephen F. McCormick, Rasmus Tamstorf, 30 Jun 2023, Multigrid Methods using Block Floating Point Arithmetic, https://arxiv.org/abs/2307.00124 (Journal version: https://doi.org/10.1137/23M1581819)
- Mario Drumond, Tao Lin, Martin Jaggi, Babak Falsafi, 2 Dec 2018 (v4), Training DNNs with Hybrid Block Floating Point, NeurIPS, https://arxiv.org/abs/1804.01526 PDF: https://proceedings.neurips.cc/paper/2018/file/6a9aeddfc689c1d0e3b9ccc3ab651bc5-Paper.pdf
- Kobayashi, S., Fettweis, G.P., 2000, A Hierarchical Block-Floating-Point Arithmetic, The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 24, 19–30, https://doi.org/10.1023/A:1008110410087 https://link.springer.com/article/10.1023/a:1008110410087 (A paper from 2000 on BFP theory in signal processing applications.)
- Yeong Foong Choo, Brian L. Evans, Alan Gatherer, 25 Oct 2017 (v2), Complex Block Floating-Point Format with Box Encoding For Wordlength Reduction in Communication Systems, https://arxiv.org/abs/1705.05217 (Use of BFP for wordlength reduction in communication systems, 2017.)
- Simla Burcu Harma, Ayan Chakraborty, Babak Falsafi, Martin Jaggi, Yunho Oh, 2023, Accuracy Boosters: Epoch-Driven Mixed-Mantissa Block Floating Point for DNN Training, ML for Computer Architecture and Systems (MLArchSys), ISCA 2023, https://openreview.net/pdf?id=nfmfqzQ4Mwl (Mixed precision version of BFP with per-block bit sizes, and integer arithmetic for dot product, but FP32 for other operations.)
- Wikipedia, April 2024 (accessed), Block floating point https://en.wikipedia.org/wiki/Block_floating_point
- Chhabra, Arun; Iyer, Ramesh, December 1999, TMS320C55x A Block Floating Point Implementation on the TMS320C54x DSP, Application Report SPRA610, Texas Instruments, https://web.archive.org/web/20180711175625/http://www.eeng.dcu.ie/~ee206/pdf/block_flt_pt.pdf
- Elam, David; Iovescu, Cesar, September 2003, A Block Floating Point Implementation for an N-Point FFT on the TMS320C55x DSP, Application Report SPRA948, Texas Instruments, https://www.ti.com/lit/an/spra948/spra948.pdf
- Wilkinson, James Hardy, 1963, Rounding Errors in Algebraic Processes (1st ed.), Prentice-Hall, Englewood Cliffs, NJ, USA, MR 0161456, https://books.google.com.au/books?id=yFogU9Ot-qsC&redir_esc=y
- Nikita Trukhanov, Ilya Soloveychik, 29 Mar 2024, Accurate Block Quantization in LLMs with Outliers, https://arxiv.org/abs/2403.20137 (Analyzes block floating point number formats in block quantization with a focus on the KV cache memory reduction, including the use of permutations to reorder tensor weight rows.)
- Microsoft, 2023, MX PyTorch Emulation Library, https://github.com/microsoft/microxcaling
- Lancheng Zou, Wenqian Zhao, Shuo Yin, Chen Bai, Qi Sun, Bei Yu, 2024, BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62978-62992, https://proceedings.mlr.press/v235/zou24d.html https://openreview.net/forum?id=DbyHDYslM7 https://openreview.net/pdf?id=DbyHDYslM7 https://www.cse.cuhk.edu.hk/~byu/papers/C229-ICML2024-BiE-slides.pdf
- Yongqi Xu, Yujian Lee, Gao Yi, Bosheng Liu, Yucong Chen, Peng Liu, Jigang Wu, Xiaoming Chen, Yinhe Han, 25 Sep 2024, BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices. https://arxiv.org/abs/2409.17093
- Hui Wang, Yuan Cheng, Xiaomeng Han, Zhengpeng Zhao, Dawei Yang, Zhe Jiang, 21 Jan 2025, Pushing the Limits of BFP on Narrow Precision LLM Inference, https://arxiv.org/abs/2502.00026
- Jude Haris, José Cano, 15 Oct 2025, F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs, https://arxiv.org/abs/2510.13401
- Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, Shiyu Li, Ziru Li, Mingyuan Ma, Tergel Molom-Ochir, Benjamin Morris, Haoxuan Shan, Jingwei Sun, Yitu Wang, Chiyue Wei, Xueying Wu, Yuhao Wu, Hao Frank Yang, Jingyang Zhang, Junyao Zhang, Qilin Zheng, Guanglei Zhou, Hai (Helen) Li, Yiran Chen, 8 Oct 2024, A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models, https://arxiv.org/abs/2410.07265
- Alireza Khodamoradi, Kristof Denolf, Eric Dellinger, 15 Oct 2024, Error Diffusion: Post Training Quantization with Block-Scaled Number Formats for Neural Networks, https://arxiv.org/abs/2410.11203 https://github.com/ROCm/tensorcast
- Ghada Alsuhli, Vasilis Sakellariou, Mahmoud Al-Qutayri, Thanos Stouraitis, 2025, A Survey and Comparative Analysis of Number Systems for Deep Neural Networks, https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=11053145
- Jonathan Bentz, Tony Scudiero, Jon Waxman, Rob Armstrong, Aug 06, 2025, What's New and Important in CUDA Toolkit 13.0, https://developer.nvidia.com/blog/whats-new-and-important-in-cuda-toolkit-13-0/
- Xiaomeng Han, Yuan Cheng, Jing Wang, Junyang Lu, Hui Wang, X.x. Zhang, Ning Xu, Dawei Yang, Zhe Jiang, 22 Apr 2025, BBAL: A Bidirectional Block Floating Point-Based Quantisation Accelerator for Large Language Models, https://arxiv.org/abs/2504.15721
- Weihu Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, and Dazhao Cheng. 2025. MXBLAS: Accelerating 8-bit Deep Learning with a Unified Micro-Scaled GEMM Library. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '25). Association for Computing Machinery, New York, NY, USA, 1590–1603. https://doi.org/10.1145/3712285.3759809 https://dl.acm.org/doi/full/10.1145/3712285.3759809 (GEMM using "microscaling format" of 8-bit values with scaling factors, with effect similar to block-level mixed-precision quantization and block-floating point numeric formats.)
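The papers above share a common core idea: a block of values stores one common exponent (or scale) plus a small integer mantissa per value, so multiply-accumulate work can run in integer arithmetic. A minimal NumPy sketch of this shared-exponent scheme is below; the function names, block size, and mantissa width are illustrative choices, not taken from any of the cited papers.

```python
import numpy as np

def bfp_quantize(x, block_size=16, mantissa_bits=8):
    """Quantize a 1-D array to block floating-point: each block of
    values shares one exponent; values become integer mantissas."""
    pad = (-len(x)) % block_size
    blocks = np.pad(x.astype(float), (0, pad)).reshape(-1, block_size)
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)
    safe = np.where(max_abs > 0, max_abs, 1.0)  # avoid log2(0)
    # Shared per-block exponent, chosen so the largest magnitude
    # in the block fits in a signed mantissa of mantissa_bits bits.
    exps = np.floor(np.log2(safe)).astype(int) + 1
    scale = 2.0 ** (exps - (mantissa_bits - 1))
    lim = 2 ** (mantissa_bits - 1)
    mantissas = np.clip(np.round(blocks / scale), -lim, lim - 1).astype(int)
    return mantissas, scale

def bfp_dequantize(mantissas, scale, n):
    """Reconstruct approximate floats from mantissas and block scales."""
    return (mantissas * scale).reshape(-1)[:n]
```

Values that are exact powers of two within a block round-trip losslessly; in general the block's smallest values lose the most precision, which is why several of the papers above focus on outliers and per-block mixed precision.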
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book: Get your copy from Amazon: Generative AI Applications
- Generative AI programming book: Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book: Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book: Get your copy from Amazon: CUDA C++ Debugging
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research
- « Research Home