Aussie AI

Addition Optimization

  • Last Updated 27 August, 2025
  • by David Spuler, Ph.D.

Addition is a smaller bottleneck than multiplication in neural network inference, but the addition operation itself can be made faster, and addition can also be used as a building block in other neural network optimizations.

Addition has a role in optimization techniques such as:

Addition operations are a secondary bottleneck in a vanilla Transformer architecture, but reducing them can yield extra speedup. Ways to remove addition operations from Transformers without changing any multiplications include:

  • Bias vector pruning (i.e., don't add any bias vectors in FFNs)
  • Residual connection pruning (i.e., remove skip connections)
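As a rough illustration of the two pruning ideas above, here is a minimal pure-Python sketch of a feed-forward block with the bias additions and the residual addition made optional. The function names (`ffn_block`, `explicit_additions`) and the ReLU activation are assumptions chosen for the example, not taken from any particular Transformer implementation. Note that the additions inside the matrix-vector products (the accumulations) are untouched; only the elementwise additions outside them are removed.

```python
def matvec(W, x):
    """Matrix-vector product; its internal accumulations are not pruned here."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def ffn_block(x, W1, b1, W2, b2, use_bias=True, use_residual=True):
    """Hypothetical FFN block: W2 * relu(W1 * x + b1) + b2, plus a skip connection."""
    h = matvec(W1, x)
    if use_bias:                      # bias vector pruning skips this add
        h = [a + b for a, b in zip(h, b1)]
    h = relu(h)
    y = matvec(W2, h)
    if use_bias:                      # and this one
        y = [a + b for a, b in zip(y, b2)]
    if use_residual:                  # residual connection pruning skips this add
        y = [a + b for a, b in zip(y, x)]
    return y


def explicit_additions(d_model, d_hidden, use_bias=True, use_residual=True):
    """Count the elementwise additions outside the matrix multiplications."""
    count = 0
    if use_bias:
        count += d_hidden + d_model
    if use_residual:
        count += d_model
    return count
```

The pruned block computes a different function from the full block, so the effect on model accuracy would need to be checked separately, for example by training or fine-tuning without those terms.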

Addition Optimization

Research papers on optimizing the arithmetic addition operation:

Approximate Addition Algorithms

Early papers examined the use of approximate addition. This is probably only of interest to hardware designers now.
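To give a feel for what approximate addition means, the sketch below simulates one well-known style of approximate adder in software: a lower-part OR adder, where the low k bits are combined with a bitwise OR instead of a true add, and only the high bits are added exactly. This design is used purely as an illustration and is not claimed to be the method of any specific paper listed here; the function name `loa_add` and its parameters are assumptions for the example.

```python
def loa_add(a, b, k):
    """Approximate sum of two non-negative integers.

    The low k bits are ORed rather than added, and the carry into the
    exact upper adder is approximated by ANDing the top bits of the two
    lower parts. With k == 0 the result is the exact sum.
    """
    if k == 0:
        return a + b
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask                       # approximate lower part
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1    # approximate carry-in
    high = (a >> k) + (b >> k) + carry             # exact upper part
    return (high << k) | low


def max_abs_error(k, bits=8):
    """Exhaustively measure the worst-case error over all bits-wide operands."""
    limit = 1 << bits
    return max(abs(loa_add(a, b, k) - (a + b))
               for a in range(limit) for b in range(limit))
```

Under this scheme the error equals the approximated carry times 2^k minus the bitwise AND of the two lower parts, so its magnitude is at most 2^(k-1). In hardware, the appeal is that the OR gates replace part of the carry chain; in software this simulation is only useful for studying the accuracy impact.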

  • V. Gupta, D. Mohapatra, S.P. Park, A. Raghunathan, “IMPACT: IMPrecise adders for low-power approximate computing”, International Symposium on Low Power Electronics and Design (ISLPED), pp. 409–414, 2011, https://dl.acm.org/doi/10.5555/2016802.2016898
  • V. Gupta, D. Mohapatra, A. Raghunathan, K. Roy, “Low-Power Digital Signal Processing Using Approximate Adders”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32(1): 124-137, 2013, https://dl.acm.org/doi/10.1109/TCAD.2012.2217962
  • M. Shafique, W. Ahmad, R. Hafiz, J. Henkel, “A Low Latency Generic Accuracy Configurable Adder”, IEEE/ACM Design Automation Conference (DAC), 2015, https://ieeexplore.ieee.org/abstract/document/7167270
  • R. Ye, T. Wang, F. Yuan, R. Kumar, Q. Xu, “On reconfiguration-oriented approximate adder design and its application”, International Conference on Computer-Aided Design (ICCAD), pp. 48-54, 2013, PDF: https://www.cse.cuhk.edu.hk/~qxu/ye-iccad13.pdf
  • J. Miao, K. He, A. Gerstlauer, M. Orshansky, “Modeling and synthesis of quality-energy optimal approximate adders”, International Conference on Computer Aided Design (ICCAD), pp. 728-735, 2012, https://ieeexplore.ieee.org/document/6386754
  • A. B. Kahng, S. Kang, “Accuracy-configurable adder for approximate arithmetic designs”, IEEE/ACM Design Automation Conference (DAC), pp. 820-825, 2012, https://ieeexplore.ieee.org/document/6241600
  • S. Mazahir, O. Hasan, R. Hafiz, M. Shafique, J. Henkel, “An Area-Efficient Consolidated Configurable Error Correction for Approximate Hardware Accelerators”, ACM/EDAC/IEEE 53rd Design Automation Conference (DAC), 2016, https://ieeexplore.ieee.org/document/7544339
  • N. Zhu, W.-L. Goh, K.-S. Yeo, “An enhanced low-power high-speed Adder for Error-Tolerant application”, 12th International Symposium on Integrated Circuits (ISIC), 2009, https://ieeexplore.ieee.org/document/5403865
  • S. Ryu, H. Kim, W. Yi, J.-J. Kim, “BitBlade: Area and energy-efficient precision-scalable neural network accelerator with bitwise summation”, 56th Annual Design Automation Conference (DAC), pp. 1-6, June 2019, https://ieeexplore.ieee.org/document/8807054
  • Ao Ren, Ji Li, Zhe Li, Caiwen Ding, Xuehai Qian, Qinru Qiu, Bo Yuan, Yanzhi Wang, “SC-DCNN: Highly-scalable deep convolutional neural network using stochastic computing”, ACM SIGPLAN Notices, vol. 52, no. 4, pp. 405-418, 2017, https://arxiv.org/abs/1611.05939 (Stochastic method with multiplication and addition approximations via AND gates and multiplexers.)
  • Salar Shakibhamedan, Amin Aminifar, Nima TaheriNejad, Axel Jantsch, 2024, EASE: Energy Optimization through Adaptation — A Review of Runtime Energy-Aware Approximate Deep Learning Algorithms, https://eclectx.org/Publications/2024_M13.pdf (Survey paper on techniques for adaptive inference with a focus on approximations of inference, including loop performance, stochastic algorithms, approximate arithmetic, quantization, pruning and low-rank.)
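
The stochastic computing idea mentioned in the SC-DCNN entry above (multiplication via AND gates, addition via multiplexers) can be sketched in a few lines. This is a generic textbook-style simulation under stated assumptions, not the SC-DCNN implementation: values in [0, 1] are encoded as random bitstreams whose fraction of 1 bits equals the value, and all function names here are hypothetical.

```python
import random


def to_stream(p, n, rng):
    """Encode a probability p in [0, 1] as n random bits with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def from_stream(bits):
    """Decode a bitstream back to a value: the fraction of 1 bits."""
    return sum(bits) / len(bits)


def sc_multiply(xs, ys):
    """AND of two independent streams approximates the product of their values."""
    return [x & y for x, y in zip(xs, ys)]


def sc_scaled_add(xs, ys, select):
    """A multiplexer driven by a 0.5-probability select stream approximates
    the scaled sum (x + y) / 2, which keeps the result inside [0, 1]."""
    return [x if s else y for x, y, s in zip(xs, ys, select)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    a, b = to_stream(0.6, n, rng), to_stream(0.5, n, rng)
    sel = to_stream(0.5, n, rng)
    print("product estimate:", from_stream(sc_multiply(a, b)))
    print("scaled-sum estimate:", from_stream(sc_scaled_add(a, b, sel)))
```

The results are statistical estimates, so accuracy depends on the stream length and on the streams being independent; longer streams reduce the error at the cost of more cycles.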
