Aussie AI
AI Matrix Algebra
-
Last Updated 30 August, 2025
-
by David Spuler, Ph.D.
Neural networks run on matrices and vectors. The main primitive is matrix multiplication, which involves a large number of multiplication and addition operations, and a matrix multiplication is really a long series of vector dot product computations. Modern AI frameworks and research papers tend to use abbreviations when talking about matrix multiplication, such as "MatMul" (matrix multiplication operations) and "GEMM" (General Matrix Multiplication).
Various attempts to optimize models involve manipulating matrix algebra. Some of the research areas include:
- Matrix multiplication algorithms (e.g. Strassen, Winograd, FFT)
- Alternative and advanced types of matrices (e.g. Butterfly, Monarch)
- Advanced matrix algebra
- Approximate matrix multiplication algorithms
- Vector dot product optimizations
- Matrix/tensor decomposition (factorizing matrices into smaller sub-matrices)
- Low-rank matrices (smaller factors of matrices)
The individual multiplication operations between numbers inside a matrix can also be optimized:
- Alternatives to using arithmetic multiplication (e.g. bitshifting, logarithms, and other zero-multiplication tricks; see the sketch after this list)
- Faster algorithms for scalar arithmetic multiplication (mostly hardware acceleration algorithms for chip designers)
- Approximate scalar arithmetic multiplication (faster ways to multiply two numbers by allowing errors)
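As a hedged illustration of the bitshift idea, here is a minimal C++ sketch of multiplying an integer activation by a weight that has been quantized to a power of two, so that the multiplication becomes a shift. The quantization scheme and function names are illustrative assumptions, not taken from any particular paper.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Quantize a positive weight to the nearest power of two, returning the exponent.
// (Sign handling and zero weights are omitted to keep the sketch short.)
int quantize_weight_to_shift(float w) {
    return (int)std::lround(std::log2(w));
}

// "Multiply" an integer activation by the quantized weight using a shift.
int32_t shift_multiply(int32_t x, int shift) {
    return (shift >= 0) ? (x << shift) : (x >> -shift);
}

int main() {
    float w = 0.26f;                          // original weight
    int shift = quantize_weight_to_shift(w);  // roughly 2^-2 = 0.25
    int32_t x = 100;                          // integer activation
    printf("exact=%g approx=%d\n", x * w, shift_multiply(x, shift));
    return 0;
}
```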
Matrix Algebra
Matrix algebra underpins most of AI model inference (and training). The term "tensor" in AI theory mainly refers to multi-dimensional matrices, and the multiplication algorithm is the standard one you learned in high school.
There are several areas of active research that use matrix algebra in different ways to reduce the total number of multiplication operations. Some of the types of matrices studied include:
- Structured matrices (the general class)
- Submatrices
- Low-rank matrices
- Sparse matrix optimizations
- Monarch matrices
- Butterfly matrices
The theory of matrix algebra can be applied to neural networks, since matrix multiplication is at the core of inference and training. Inference of a model involves executing matrix multiplications on vectors of activations, and multiplying a matrix by a vector means multiplying each row of that matrix against the vector, which is computing the vector dot product of two vectors. Underneath all of those vector iterations are the low-level multiplication operations on pairs of numbers, usually floating-point, but possibly integers in quantized models that use integer-only arithmetic.
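To make that connection concrete, here is a minimal C++ sketch of a matrix-vector multiplication written as one vector dot product per output row; the row-major layout and function names are illustrative assumptions.

```cpp
#include <vector>
#include <cstdio>

// Dot product of two vectors of length n.
float dot_product(const float* a, const float* b, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// y = W * x, where W is rows x cols in row-major order.
// Each output element y[r] is the dot product of row r of W with x.
void matrix_vector_multiply(const float* W, const float* x, float* y,
                            int rows, int cols) {
    for (int r = 0; r < rows; ++r) {
        y[r] = dot_product(&W[r * cols], x, cols);
    }
}

int main() {
    const int rows = 2, cols = 3;
    std::vector<float> W = {1, 2, 3,
                            4, 5, 6};
    std::vector<float> x = {1, 0, 1};
    std::vector<float> y(rows);
    matrix_vector_multiply(W.data(), x.data(), y.data(), rows, cols);
    printf("y = [%g, %g]\n", y[0], y[1]);  // expected [4, 10]
    return 0;
}
```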
General Research on Matrix Algebra Theory
Some of the papers on theory of matrix algebra include:
- Paul Cull, A Matrix Algebra for Neural Nets, https://link.springer.com/chapter/10.1007/978-1-4757-0555-3_43
- Xian-Da Zhang, A Matrix Algebra Approach to Artificial Intelligence, January 2020, DOI:10.1007/978-981-15-2770-8, ISBN: 978-981-15-2769-2, https://www.researchgate.net/publication/341581565_A_Matrix_Algebra_Approach_to_Artificial_Intelligence
- Hillar, C. J. & Lim, L.-H., Most tensor problems are NP-hard. J. ACM 60, 1–39 (2013) https://arxiv.org/abs/0911.1393
- Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean, Nov 2022, Efficiently Scaling Transformer Inference, Google Research, https://arxiv.org/abs/2211.05102
- Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, and Torsten Hoefler. Nov 2021, Data movement is all you need: A case study on optimizing transformers, Proceedings of Machine Learning and Systems, 3, 2021. https://arxiv.org/abs/2007.00072 Code: https://github.com/spcl/substation
- C. Deng, S. Liao, Y. Xie, K. K. Parhi, X. Qian and B. Yuan, "PermDNN: Efficient compressed DNN architecture with permuted diagonal matrices", Proc. 51st Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), pp. 189-202, Oct. 2018. https://arxiv.org/abs/2004.10936
- Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson, 10 Jun 2024, Compute Better Spent: Replacing Dense Layers with Structured Matrices, https://arxiv.org/abs/2406.06248
- Haque, S.A.; Choudhury, N.; Hossain, S. Matrix Multiplication with Diagonals: Structured Sparse Matrices and Beyond. In Proceedings of the 2023 7th International Conference on High Performance Compilation, Computing and Communications, Jinan, China, 17–19 June 2023; pp. 69–76. https://doi.org/10.1145/3606043.3606053
- Sardar Anisul Haque, Mohammad Tanvir Parvez, Shahadat Hossain, Jan 2024, GPU Algorithms for Structured Sparse Matrix Multiplication with Diagonal Storage Schemes, https://www.mdpi.com/1999-4893/17/1/31
- Josh Alman, Zhao Song, 6 Oct 2023, How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation, https://arxiv.org/abs/2310.04064
- Ishna Satyarth, Chao Yin, RuQing G. Xu, Devin A. Matthews, 15 Nov 2024, Skew-Symmetric Matrix Decompositions on Shared-Memory Architectures, https://arxiv.org/abs/2411.09859
Advanced Types of Matrices
The simplest observations are that smaller (lower-rank) matrices and sparse matrices require less computation. Beyond these, various advanced subtypes of matrices with specific constraints have properties that reduce the number of multiplications needed to evaluate a matrix multiplication or matrix-vector multiplication.
Lower-Rank Submatrices: There are many theories that typical models are "over-parameterized," which means their matrices are bigger than they need to be. The number of weights is quadratic in the dimensions of the matrix, so reducing to smaller, lower-rank matrices improves performance (see the cost sketch below). There is much theory about finding the "smaller models" inside the "big models" whose accuracy is not much worse than that of the larger matrices. This is sometimes called "parameter factorization" of models. The non-mathematical technique of "distillation" is also related, as it finds a smaller model with similar accuracy. See low-rank matrix factorization research.
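As a rough sketch of why lower-rank matrices help, assume a d x d weight matrix W is approximated by the product of A (size d x r) and B (size r x d). Computing y = A(Bx) then costs about 2*d*r multiply-add operations instead of d*d for y = Wx, a large saving when the rank r is much smaller than d. The toy code below simply counts those multiplications; the dimensions are illustrative assumptions.

```cpp
#include <cstdio>

// Multiplication counts for a dense d x d matrix-vector product
// versus a factored low-rank product y = A * (B * x), with A: d x r, B: r x d.
long long dense_mults(long long d)                { return d * d; }
long long lowrank_mults(long long d, long long r) { return 2 * d * r; }

int main() {
    long long d = 4096, r = 64;   // a typical transformer dimension, and a small rank
    printf("dense: %lld multiplies\n", dense_mults(d));             // 16,777,216
    printf("rank-%lld: %lld multiplies\n", r, lowrank_mults(d, r)); // 524,288
    return 0;
}
```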
Sparse Matrices: The idea with sparse matrices is that if a matrix has a lot of zero weights, then the inference algorithm is doing a lot of unnecessary multiplications by zero. Various algorithms restrict the multiplication work to only the parts of the matrix that contain nonzero data (see the CSR sketch below). Sparsity is closely related to model pruning, and these matrix sparsity techniques can also be amplified using dynamic pruning of near-zero weights, to further increase the total number of zeros in the matrix. See sparse matrices and sparsification.
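Here is a minimal hedged sketch (not from any specific library) of a sparse matrix-vector multiply in compressed sparse row (CSR) format, which performs multiplications only for the stored nonzero weights.

```cpp
#include <vector>
#include <cstdio>

// Sparse matrix-vector multiply with CSR storage: only nonzeros are multiplied.
// row_ptr has rows+1 entries; col_idx/values hold the nonzeros row by row.
void csr_matvec(const std::vector<int>& row_ptr,
                const std::vector<int>& col_idx,
                const std::vector<float>& values,
                const std::vector<float>& x,
                std::vector<float>& y) {
    int rows = (int)row_ptr.size() - 1;
    for (int r = 0; r < rows; ++r) {
        float sum = 0.0f;
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
            sum += values[k] * x[col_idx[k]];
        }
        y[r] = sum;
    }
}

int main() {
    // 3x3 matrix [[2,0,0],[0,0,3],[0,1,0]] stored as CSR.
    std::vector<int> row_ptr = {0, 1, 2, 3};
    std::vector<int> col_idx = {0, 2, 1};
    std::vector<float> values = {2, 3, 1};
    std::vector<float> x = {1, 2, 3};
    std::vector<float> y(3);
    csr_matvec(row_ptr, col_idx, values, x, y);
    printf("y = [%g, %g, %g]\n", y[0], y[1], y[2]);  // expected [2, 9, 2]
    return 0;
}
```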
Butterfly Matrices: These special matrices are one approach to using matrix algebra to achieve sparsity. Research includes:
- Beidi Chen, Tri Dao and Chris Ré, Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models, Jan 17, 2022, https://hazyresearch.stanford.edu/blog/2022-01-17-Sparsity-3-Pixelated-Butterfly
- Keivan Alizadeh Vahid, Anish Prabhu, Ali Farhadi, Mohammad Rastegari Apr 2020, Butterfly Transform: An Efficient FFT Based Neural Architecture Design, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), https://ieeexplore.ieee.org/abstract/document/9157695/, https://arxiv.org/abs/1906.02256
- L Zheng, G Puy, E Riccietti, P Pérez, R Gribonval, Butterfly factorization by algorithmic identification of rank-one blocks, July 2023, https://arxiv.org/abs/2307.00820
- L Zheng, E Riccietti, R Gribonval, 2023, Efficient Identification of Butterfly Sparse Matrix Factorizations, SIAM Journal on Mathematics of Data Science, Vol. 5, Iss. 1 (2023), DOI 10.1137/22M1488727, https://epubs.siam.org/doi/abs/10.1137/22M1488727, https://arxiv.org/abs/2110.01230
- J Peca-Medlin, T Trogdon, 2023, Growth factors of random butterfly matrices and the stability of avoiding pivoting, SIAM Journal on Matrix Analysis and Applications, 2023, https://epubs.siam.org/doi/abs/10.1137/22M148762X, https://arxiv.org/abs/2203.15921
- D. Stott Parker, 1995, "Random butterfly transformations with applications in computational linear algebra", Technical report, UCLA Computer Science Dept, 1995. https://searchworks.stanford.edu/view/4640257
- D. Stott Parker, 1995, A randomizing butterfly transformation useful in block matrix computations, Technical report, UCLA Computer Science Dept. https://searchworks.stanford.edu/view/4640258
- Tatsuya Member and Member Kazuyoshi, "Bidirectional learning for neural network having butterfly structure", Systems and Computers in Japan, vol. 26, pp. 64-73, 04 1995. https://onlinelibrary.wiley.com/doi/abs/10.1002/scj.4690260407
- Yingzhou Li, Haizhao Yang, Eileen R. Martin, Kenneth L. Ho and Lexing Ying, "Butterfly factorization", Multiscale Modeling & Simulation, vol. 13, pp. 714-732, 2015. https://arxiv.org/abs/1502.01379
- Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra and Christopher Ré, Learning fast algorithms for linear transforms using butterfly factorizations, 2019. http://proceedings.mlr.press/v97/dao19a.html
- Y Li, X Cheng, J Lu, Apr 2020, Butterfly-Net: Optimal function representation based on convolutional neural networks, arXiv preprint arXiv:1805.07451, https://arxiv.org/abs/1805.07451
- Rui Lin, Jie Ran, King Hung Chiu, Graziano Chesi, Ngai Wong, Mar 2022, Deformable butterfly: A highly structured and sparse linear transform NeurIPS 2021, https://arxiv.org/abs/2203.13556, https://proceedings.neurips.cc/paper/2021/file/86b122d4358357d834a87ce618a55de0-Paper.pdf
- N Ailon, O Leibovitch, V Nair, July 2021, Sparse linear networks with a fixed butterfly structure: theory and practice, Uncertainty in Artificial Intelligence, UAI 2021, https://arxiv.org/abs/2007.08864 https://proceedings.mlr.press/v161/ailon21a/ailon21a.pdf
- Daniel Y. Fu, Elliot L. Epstein, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré, Feb 2023, Simple Hardware-Efficient Long Convolutions for Sequence Modeling, https://arxiv.org/abs/2302.06646 (FlashButterfly algorithm.)
Monarch Matrices: Monarch matrices are a superset of butterfly matrices, named after the orange-winged monarch butterfly. These special types of matrices exploit operations on submatrices to reduce the overall computational overhead of matrix multiplication (see the block-diagonal sketch after the reference list below).
- Dan Fu, Simran Arora, Chris Ré, Monarch Mixer: Revisiting BERT, Without Attention or MLPs, Jul 25, 2023, https://hazyresearch.stanford.edu/blog/2023-07-25-m2-bert (Monarch Matrices) (Code: https://github.com/HazyResearch/m2)
- Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré, Apr 2022, Monarch: Expressive Structured Matrices for Efficient and Accurate Training, Proceedings of the 39 th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. https://arxiv.org/abs/2204.00595 https://proceedings.mlr.press/v162/dao22a/dao22a.pdf
- Sunil Babu Melingi, Ramesh Kumar Mojjada, C. Tamizhselvan, R. Surender, S. Yazhinian, 2022, A self-adaptive monarch butterfly optimization (MBO) algorithm based improved deep forest neural network model for detecting and classifying brain stroke lesions, Research on Biomedical Engineering, volume 38, pages 647–660 (2022), https://link.springer.com/article/10.1007/s42600-022-00214-2
- Dan Fu, Simran Arora, Chris Ré, Jul 25, 2023, Monarch Mixer: A new model architecture for increased efficiency, Together AI blog, https://together.ai/blog/monarch-mixer, Code: https://github.com/HazyResearch/m2 (An implementation by Together AI of Stanford's Hazy Research AI engines using Monarch matrices.)
- Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson, 10 Jun 2024, Compute Better Spent: Replacing Dense Layers with Structured Matrices, https://arxiv.org/abs/2406.06248
- Mimoun Mohamed, Valentin Emiya, Caroline Chaux. 17 Jan 2025, Learning Permutations in Monarch Factorization. 2025. hal-04887483 https://hal.science/hal-04887483/document
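Butterfly and Monarch factorizations replace a dense matrix with a product of sparse, block-structured factors (interleaved with permutations). As a hedged illustration of why that structure is cheap, the sketch below multiplies a vector by a single block-diagonal factor: with b blocks of size (d/b) x (d/b), this costs d*d/b multiplications instead of d*d for a dense matrix. The storage layout and function names are illustrative assumptions, not code from the cited papers.

```cpp
#include <vector>
#include <cstdio>

// y = D * x where D is block-diagonal with 'nblocks' dense blocks,
// each of size bsize x bsize (so d = nblocks * bsize).
// 'blocks' stores each block in row-major order, one after another.
void block_diagonal_matvec(const std::vector<float>& blocks,
                           const std::vector<float>& x,
                           std::vector<float>& y,
                           int nblocks, int bsize) {
    for (int b = 0; b < nblocks; ++b) {
        const float* B = &blocks[b * bsize * bsize];
        const float* xb = &x[b * bsize];
        float* yb = &y[b * bsize];
        for (int i = 0; i < bsize; ++i) {
            float sum = 0.0f;
            for (int j = 0; j < bsize; ++j) sum += B[i * bsize + j] * xb[j];
            yb[i] = sum;
        }
    }
}

int main() {
    // d = 4, two 2x2 blocks: diag([[1,2],[3,4]], [[5,6],[7,8]])
    std::vector<float> blocks = {1, 2, 3, 4,  5, 6, 7, 8};
    std::vector<float> x = {1, 1, 1, 1};
    std::vector<float> y(4);
    block_diagonal_matvec(blocks, x, y, /*nblocks=*/2, /*bsize=*/2);
    printf("y = [%g, %g, %g, %g]\n", y[0], y[1], y[2], y[3]);  // [3, 7, 11, 15]
    return 0;
}
```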
Matrix/Tensor Factorization (Decomposition)
Methods to factorize or decompose larger matrices into smaller matrices (see low-rank matrices for optimization), or into the specific subtypes of matrices above, require special algorithms. Two of the main theoretical decomposition algorithms for tensors are CANDECOMP/PARAFAC (CP) decomposition and Tucker decomposition.
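As a concrete illustration of what a CP decomposition represents, the sketch below reconstructs a single element of a 3-way tensor from its CP factor matrices, summing A[i][r] * B[j][r] * C[k][r] over the rank components r. The layout and names are illustrative assumptions, and this is only a reconstruction, not a decomposition algorithm itself.

```cpp
#include <vector>
#include <cstdio>

// Reconstruct element (i, j, k) of a 3-way tensor from CP factors.
// A is I x R, B is J x R, C is K x R, all row-major; R is the CP rank.
float cp_element(const std::vector<float>& A, const std::vector<float>& B,
                 const std::vector<float>& C, int R,
                 int i, int j, int k) {
    float sum = 0.0f;
    for (int r = 0; r < R; ++r) {
        sum += A[i * R + r] * B[j * R + r] * C[k * R + r];
    }
    return sum;
}

int main() {
    // Rank-1 example with I = J = K = 2: A = [1,2], B = [3,4], C = [5,6],
    // so T[i][j][k] = A[i] * B[j] * C[k].
    std::vector<float> A = {1, 2}, B = {3, 4}, C = {5, 6};
    printf("T[1][0][1] = %g\n", cp_element(A, B, C, /*R=*/1, 1, 0, 1));  // 2*3*6 = 36
    return 0;
}
```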
- Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer (2009), https://ieeexplore.ieee.org/abstract/document/5197422
- Defu Lian, Rui Liu, Yong Ge, Kai Zheng, Xing Xie, and Longbing Cao. 2017. Discrete content-aware matrix factorization. In SIGKDD. 325–334, https://dl.acm.org/doi/10.1145/3097983.3098008
- Andriy Mnih and Russ R Salakhutdinov. 2007. Probabilistic matrix factorization. NIPS 20 (2007), 1257–1264, https://dl.acm.org/doi/10.5555/2981562.2981720
- Carroll, J Douglas and Chang, Jih-Jie. Analysis of individual differences in multidimensional scaling via an n-way generalization of eckart-young decomposition. Psychometrika, 35(3):283–319, 1970. https://link.springer.com/article/10.1007/BF02310791 (CANDECOMP/PARAFAC decomposition/factorization.)
- Harshman, Richard A and Lundy, Margaret E. PARAFAC: Parallel factor analysis. Computational Statistics & Data Analysis, 18(1):39–72, 1994 https://www.sciencedirect.com/science/article/abs/pii/0167947394901325 (The P in CP decomposition/factorization.)
- Shashua, Amnon and Hazan, Tamir. Non-negative tensor factorization with applications to statistics and computer vision. In Proceedings of the 22nd international conference on Machine learning, pp. 792–799. ACM, 2005. PDF: https://icml.cc/imls/conferences/2005/proceedings/papers/100_NonNegative_ShashuaHazan.pdf (CANDECOMP/PARAFAC decomposition/factorization.)
- Tucker, Ledyard R. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3): 279–311, 1966 https://link.springer.com/article/10.1007/BF02289464 (Tucker decomposition.)
- De Lathauwer, Lieven, De Moor, Bart, and Vandewalle, Joos. A multilinear singular value decomposition. SIAM journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000. https://epubs.siam.org/doi/10.1137/S0895479896305696 (Tucker decomposition.)
- Kim, Y.-D. and Choi, S. Nonnegative Tucker decomposition. In Proceedings of the IEEE CVPR 2007 Workshop on Component Analysis Methods, Minneapolis, Minnesota, 2007. https://ieeexplore.ieee.org/document/4270403
- H Li, J Choi, Y Kwon, JH Ahn, Oct 2023, A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models, IEEE Computer Architecture Letters, https://ieeexplore.ieee.org/abstract/document/10285300/ (Tiled version of matrix multiplication with SVD factorization.)
- Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle, 29 May 2024, STAT: Shrinking Transformers After Training, https://arxiv.org/abs/2406.00061
- Chakshu Moar, 2024, Compressing Language Models using Low-Rank Decomposition and Characterizing the Accuracy- Efficiency Trade-offs, Master of Science Thesis, Electrical and Computer Engineering, University of California, Irvine, USA, https://escholarship.org/content/qt0t6967h4/qt0t6967h4.pdf
- Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Yipeng Ma, Tao Wang, Ji-Rong Wen, 21 May 2024, Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression, https://arxiv.org/abs/2405.12591
- Chakshu Moar, Michael Pellauer, Hyoukjun Kwon, 10 May 2024, Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models, https://arxiv.org/abs/2405.06626
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey (Broad survey with many optimizations including this topic.)
- V. Abronin, A. Naumov, D. Mazur, D. Bystrov, K. Tsarova, Ar. Melnikov, I. Oseledets, S. Dolgov, R. Brasher, M. Perelshtein, 29 Jan 2024, TQCompressor: improving tensor decomposition methods in neural networks via permutations, https://arxiv.org/abs/2401.16367 Code: https://huggingface.co/tq-ag/TQCompressedGPT2 Code: https://github.com/terra-quantum-public/TQCompressedGPT2 (A permutation-based enhancement to the Kronecker decomposition method.)
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 15 Mar 2024 (v5), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer (A large survey of a variety of LLM optimizations.)
- Han Guo, Philip Greengard, Eric P. Xing, Yoon Kim, Nov 2023, LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning, https://arxiv.org/abs/2311.12023
- Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun, Dec 2023, ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models, https://arxiv.org/abs/2312.05821 Code: https://github.com/hahnyuan/ASVD4LLM
- R Gribonval, T Mary, E Riccietti, 2023, Optimal quantization of rank-one matrices in floating-point arithmetic---with applications to butterfly factorizations https://inria.hal.science/hal-04125381/file/rank1_quant.pdf
- H Fan, T Chau, SI Venieris, R Lee, 2022, Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design https://ieeexplore.ieee.org/abstract/document/9923888/ https://arxiv.org/pdf/2209.09570
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv preprint arXiv:1909.11942, 2019, https://arxiv.org/abs/1909.11942 Code: https://github.com/google-research/ALBERT
- Arnav Chavan, Raghav Magazine, Shubham Kushwaha, Mérouane Debbah, Deepak Gupta, 24 Apr 2024 (v2), Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward, https://arxiv.org/abs/2402.01799 Code: https://github.com/nyunAI/Faster-LLM-Survey
- Jungi Lee, Wonbeom Lee, Jaewoong Sim, 16 Jun 2024, Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization, https://arxiv.org/abs/2406.12930 (Combining tensor decomposition and quantization with power-of-two scale factors.)
- Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre, 24 Jun 2024, Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers, https://arxiv.org/abs/2406.16450 Code: https://github.com/CLAIRE-Labo/StructuredFFN/tree/main
- Mirko Farina, Usman Ahmad, Ahmad Taha, Hussein Younes, Yusuf Mesbah, Xiao Yu, Witold Pedrycz, 2024, Sparsity in transformers: A systematic literature review, Neurocomputing, Volume 582, 14 May 2024, 127468, https://www.sciencedirect.com/science/article/abs/pii/S092523122400239X (General survey of sparsity methods, and techniques that create sparsity.)
- Yao Yao, Zuchao Li, Hai Zhao, 21 May 2024, SirLLM: Streaming Infinite Retentive LLM, https://arxiv.org/abs/2405.12528 (Low-rank decomposition to compress KV cache heads.)
- Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, https://arxiv.org/abs/2312.00678
- Zeyu Zhang, Haiying Shen, 7 Aug 2024, Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference, https://arxiv.org/abs/2408.04107
- Jiuxiang Gu, Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Junze Yin, 8 May 2024, Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers, https://arxiv.org/abs/2405.05219 (Attention optimization using multiple low-rank matrices.)
- Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna, 14 May 2024, Sparse MTTKRP Acceleration for Tensor Decomposition on GPU, https://arxiv.org/abs/2405.08470
- Tugba Torun, Eren Yenigul, Ameer Taweel, Didem Unat, 8 May 2024, A Sparse Tensor Generator with Efficient Feature Extraction, https://arxiv.org/abs/2405.04944 https://github.com/sparcityeu/feaTen https://github.com/sparcityeu/genTen
- Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna, 31 Mar 2024 (v2), Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition, https://arxiv.org/abs/2403.07953
- Jan Laukemann, Ahmed E. Helal, S. Isaac Geronimo Anderson, Fabio Checconi, Yongseok Soh, Jesmin Jahan Tithi, Teresa Ranadive, Brian J Gravelle, Fabrizio Petrini, Jee Choi, 11 Mar 2024, Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation, https://arxiv.org/abs/2403.06348
- Nishant Yadav, May 2024, Efficient K-Nearest Neighbor Search With Black-Box Neural Similarity Functions, Ph.D. Dissertation, Manning College of Information and Computer Sciences, University of Massachusetts Amherst, https://scholarworks.umass.edu/bitstreams/5572c0b9-cd96-46d8-983d-f877c9d0e22e/download
- Hongyaoxing Gu, 27 May 2024, LRAMM -- Low precision approximates GEMM via RSVD, https://arxiv.org/abs/2405.16917
- Ignacio Hounie, Charilaos Kanatsoulis, Arnuv Tandon, Alejandro Ribeiro, 5 Oct 2024, LoRTA: Low Rank Tensor Adaptation of Large Language Models, https://arxiv.org/abs/2410.04060
- Haoran Guan, Yuwei Fan, 9 Oct 2024, CholeskyQR for sparse matrices, https://arxiv.org/abs/2410.06525
- D.Breen, Oct 2024, Towards Sustainable CNNs: Tensor Decompositions for Green AI Solutions: Exploring Energy Consumption of Large CNNs, Master's Thesis, Systems and Control & Robotics, Delft University of Technology, https://repository.tudelft.nl/file/File_8208301f-51ef-4edf-bd12-d6ec3d5a8711
- Yubin Qin, Yang Wang, Zhiren Zhao, Xiaolong Yang, Yang Zhou, Shaojun Wei, Yang Hu, Shouyi Yin, 2024, MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition, 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), Year: 2024, Pages: 1032-1047, DOI Bookmark: 10.1109/ISCA59077.2024.00079, https://www.computer.org/csdl/proceedings-article/isca/2024/265800b032/1Z3pCEBnapO
- Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu, 30 Jul 2024, Palu: Compressing KV-Cache with Low-Rank Projection, https://arxiv.org/abs/2407.21118 https://github.com/shadowpa0327/Palu
- Shi, J., Shi, C. (2025). Improve LLM Inference Performance with Matrix Decomposition Strategies. In: Shi, Z., Witbrock, M., Tian, Q. (eds) Intelligence Science V. ICIS 2024. IFIP Advances in Information and Communication Technology, vol 720. Springer, Cham. https://doi.org/10.1007/978-3-031-71253-1_12 https://link.springer.com/chapter/10.1007/978-3-031-71253-1_12 (Speed up matrix operations with SVD and NMF via adaptive block sizing based on batching.)
- Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu, 31 Oct 2024, BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments, https://arxiv.org/abs/2410.23918 https://github.com/xinghaow99/BitStack
- Ishna Satyarth, Chao Yin, RuQing G. Xu, Devin A. Matthews, 15 Nov 2024, Skew-Symmetric Matrix Decompositions on Shared-Memory Architectures, https://arxiv.org/abs/2411.09859
- Kwangryeol Park, Seulki Lee, 12 Dec 2024, SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization, https://arxiv.org/abs/2412.08894 (Gradient optimizer Adam optimized using low-rank matrix factorization.)
- Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
- Sanghyeon Park, Soo-Mook Moon, 8 Jan 2025, CURing Large Models: Compression via CUR Decomposition, https://arxiv.org/abs/2501.04211
- Doruk Aksoy, David J. Gorsich, Shravan Veerapaneni, Alex A. Gorodetsky, 18 Sep 2023 (v2), An Incremental Tensor Train Decomposition Algorithm, https://arxiv.org/abs/2211.12487
- Ryuta Matsuno, 14 Aug 2025, Source Component Shift Adaptation via Offline Decomposition and Online Mixing Approach, https://arxiv.org/abs/2508.10257
- Rui Wu, Nikola Kovachki, Burigede Liu, 23 Jul 2025, A Learning-based Domain Decomposition Method, https://arxiv.org/abs/2507.17328
- Le-Trung Nguyen, Ael Quelennec, Van-Tam Nguyen, Enzo Tartaglione, 24 Jul 2025, Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning, https://arxiv.org/abs/2505.05086
- Jianhong Chen, Meng Zhao, Mostafa Reisi Gahrooei, Xubo Yue, 18 Jul 2025, Toward Temporal Causal Representation Learning with Tensor Decomposition, https://arxiv.org/abs/2507.14126
- Quang-Binh Nguyen, Minh Luu, Quang Nguyen, Anh Tran, Khoi Nguyen, 18 Jul 2025, CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models, https://arxiv.org/abs/2507.13984
- Z.Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z.F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, Chong Ruan, 18 Jul 2025, DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition, https://arxiv.org/abs/2504.21801
- Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen, Allan Jie, Xinbo Zhang, Haocheng Wang, Luong Trung, Rong Ye, Phan Nhat Hoang, Huishuai Zhang, Peng Sun, Hang Li, 21 Jul 2025, Solving Formal Math Problems by Decomposition and Iterative Reflection, https://arxiv.org/abs/2507.15225
- Yunfeng Li, Junhong Liu, Zhaohui Yang, Guofu Liao, Chuyun Zhang, 19 Jul 2025, Rec-AD: An Efficient Computation Framework for FDIA Detection Based on Tensor Train Decomposition and Deep Learning Recommendation Model, https://arxiv.org/abs/2507.14668
- Daniel Ayomide Olanrewaju, 20 Jul 2025, Partial Symmetry Enforced Attention Decomposition (PSEAD): A Group-Theoretic Framework for Equivariant Transformers in Biological Systems, https://arxiv.org/abs/2507.14908
- Lu Chenggang, 10 Aug 2025, A Globally Optimal Analytic Solution for Semi-Nonnegative Matrix Factorization with Nonnegative or Mixed Inputs, https://arxiv.org/abs/2508.07134
- Wenpeng Xing, Jie Chen, Zaifeng Yang, Tiancheng Zhao, Gaolei Li, Changting Lin, Yike Guo, Meng Han, 8 Aug 2025, CoDe-NeRF: Neural Rendering via Dynamic Coefficient Decomposition, https://arxiv.org/abs/2508.06632
- Qin Xu, Lili Zhu, Xiaoxia Cheng, Bo Jiang, 9 Aug 2025, Beyond Frequency: Seeing Subtle Cues Through the Lens of Spatial Decomposition for Fine-Grained Visual Classification, https://arxiv.org/abs/2508.06959
- Matthew Fahrbach, Mehrdad Ghadiri, 8 Aug 2025, A Tight Lower Bound for the Approximation Guarantee of Higher-Order Singular Value Decomposition, https://arxiv.org/abs/2508.06693
- Runshi Tang and Tamara Kolda and Anru R. Zhang, 9 Aug 2025, Tensor Decomposition with Unaligned Observations, https://arxiv.org/abs/2410.14046
- Valentin Six, Evan Dufraisse, Gaël de Chalendar, 11 Aug 2025, DAGR: Decomposition Augmented Graph Retrieval with LLMs, https://arxiv.org/abs/2506.13380
- Zhengqi Lin and Andrzej Ruszczyński, 25 Jul 2025, Federated Calculation of the Free-Support Transportation Barycenter by Single-Loop Dual Decomposition, https://arxiv.org/abs/2507.19627
- Rongyao Cai, Ming Jin, Qingsong Wen, Kexin Zhang, 28 Jul 2025, From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation, https://arxiv.org/abs/2507.20968
- Sara M. Ichinaga, Steven L. Brunton, Aleksandr Y. Aravkin, J. Nathan Kutz, 26 Jul 2025, Sparse-mode Dynamic Mode Decomposition for Disambiguating Local and Global Structures, https://arxiv.org/abs/2507.19787
- Yukino Terui, Yuka Inoue, Yohei Hamakawa, Kosuke Tatsumura, Kazue Kudo, 29 Jul 2025, Collaborative filtering based on nonnegative/binary matrix factorization, https://arxiv.org/abs/2410.10381
- Zerui Tao, Yuhta Takida, Naoki Murata, Qibin Zhao, Yuki Mitsufuji, 31 Jul 2025, Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models, https://arxiv.org/abs/2501.08727
- Steffen Limmer, Steffen Udluft, Clemens Otte, 31 Jul 2025, Neural-ANOVA: Analytical Model Decomposition using Automatic Integration, https://arxiv.org/abs/2408.12319
- Agustín Borda, Juan Bautista Cabral, Gonzalo Giarda, Diego Nicolás Gimenez Irusta, Paula Pacheco, Alvaro Roy Schachner, 31 Jul 2025, Algorithmic Detection of Rank Reversals, Transitivity Violations, and Decomposition Inconsistencies in Multi-Criteria Decision Analysis, https://arxiv.org/abs/2508.00129
- Willem Diepeveen, Jon Schwenk, Andrea Bertozzi, 1 Aug 2025, Latent Diffeomorphic Dynamic Mode Decomposition, https://arxiv.org/abs/2505.06351
- Jun Lu, 1 Aug 2025, Matrix Decomposition and Applications, https://arxiv.org/abs/2201.00145
- Hang Yin, Zipeng Liu, Xiaoyong Peng, Liyao Xiang, 4 Aug 2025, Graph Unlearning via Embedding Reconstruction -- A Range-Null Space Decomposition Approach, https://arxiv.org/abs/2508.02044
- Amitava Das, Abhilekh Borah, Vinija Jain, Aman Chadha, 4 Aug 2025, AlignGuard-LoRA: Alignment-Preserving Fine-Tuning via Fisher-Guided Decomposition and Riemannian-Geodesic Collision Regularization, https://arxiv.org/abs/2508.02079
- Nicolas Langrené, Xavier Warin, Pierre Gruet, 3 Aug 2025, Fast Gaussian process inference by exact Matérn kernel decomposition, https://arxiv.org/abs/2508.01864
- Ziqin He, Mengqi Hu, Yifei Lou, Can Chen, 4 Aug 2025, Tensor Dynamic Mode Decomposition, https://arxiv.org/abs/2508.02627
- Kang Du, Zhihao Liang, Yulin Shen and Zeyu Wang, 4 Aug 2025, GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors, https://arxiv.org/abs/2408.08524
- Pusen Dong, Tianchen Zhu, Yue Qiu, Haoyi Zhou, Jianxin Li, 5 Aug 2025, From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning, https://arxiv.org/abs/2412.08920
- Yang Li, Daniel Agyei Asante, Changsheng Zhao, Ernie Chang, Yangyang Shi, Vikas Chandra, 6 Aug 2025, Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications, https://arxiv.org/abs/2405.15877
- Yossi Arjevani, Gal Vinograd, 6 Aug 2025, Symmetry & Critical Points for Symmetric Tensor Decomposition Problems, https://arxiv.org/abs/2306.07886
- Alex Glushkovsky, 8 Aug 2025, Dual Signal Decomposition of Stochastic Time Series, https://arxiv.org/abs/2508.05915
- Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Zhiying Li, Guanggang Geng, 6 Aug 2025, Log2Sig: Frequency-Aware Insider Threat Detection via Multivariate Behavioral Signal Decomposition, https://arxiv.org/abs/2508.05696
- Luke Li, 18 Aug 2025, Deep Learning-Based Financial Time Series Forecasting via Sliding Window and Variational Mode Decomposition, https://arxiv.org/abs/2508.12565
- Ying Huang, Yuanbin Man, Wenqi Jia, Zhengzhong Tu, Junzhou Huang, Miao Yin, 16 Aug 2025, AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition, https://arxiv.org/abs/2508.11870
- Yuannuo Feng, Wenyong Zhou, Yuexi Lyu, Hanjie Liu, Zhengwu Liu, Ngai Wong, Wang Kang, 16 Aug 2025, HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware, https://arxiv.org/abs/2508.11935
- Shiwei Li, Xiandi Luo, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li, 17 Aug 2025, The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning, https://arxiv.org/abs/2505.23176
- Suryanarayana Sankagiri, Jalal Etesami, Matthias Grossglauser, 19 Aug 2025, Recommendations with Sparse Comparison Data: Provably Fast Convergence for Nonconvex Matrix Factorization, https://arxiv.org/abs/2502.20033
- Nícolas Roque dos Santos, Dawon Ahn, Diego Minatel, Alneu de Andrade Lopes, Evangelos E. Papalexakis, 20 Aug 2025, Multi-view Graph Condensation via Tensor Decomposition, https://arxiv.org/abs/2508.14330
- Sebastian Musiał, Bartosz Zieliński, Tomasz Danel, 20 Aug 2025, Fragment-Wise Interpretability in Graph Neural Networks via Molecule Decomposition and Contribution Analysis, https://arxiv.org/abs/2508.15015
- Muchammad Daniyal Kautsar, Afra Majida Hariono, Widyawan, Syukron Abu Ishaq Alfarozi and Kuntpong Wararatpanya, 21 Aug 2025, CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression, https://arxiv.org/abs/2508.16680
- Paul Fogel, Christophe Geissler, George Luta, 22 Aug 2025, The Target Polish: A New Approach to Outlier-Resistant Non-Negative Matrix Factorization, https://arxiv.org/abs/2507.10484
- Mathieu Godbout and Audrey Durand, 18 Jul 2025, On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes, https://arxiv.org/abs/2507.14005
- Filip de Roos and Fabio Muratore, 28 Jul 2025, Novel Pivoted Cholesky Decompositions for Efficient Gaussian Process Inference, https://arxiv.org/abs/2507.20678
- Michael Aichm\"uller, Hector Geffner, 15 Aug 2025, Sketch Decompositions for Classical Planning via Deep Reinforcement Learning, https://arxiv.org/abs/2412.08574
Low-Rank Matrix Factorization
Matrix factorization (decomposition) can be used to find low-rank matrices that approximate the larger weight matrices, as sketched below.
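At inference time, a factorized layer replaces y = Wx with y = A(Bx), so the full weight matrix never needs to be materialized. The sketch below is a hedged illustration with an assumed row-major layout and illustrative names.

```cpp
#include <vector>
#include <cstdio>

// Apply a low-rank factorized layer: y = A * (B * x),
// where A is d_out x r and B is r x d_in, both row-major, with rank r small.
void lowrank_layer(const std::vector<float>& A, const std::vector<float>& B,
                   const std::vector<float>& x, std::vector<float>& y,
                   int d_out, int d_in, int r) {
    std::vector<float> t(r, 0.0f);            // t = B * x
    for (int i = 0; i < r; ++i)
        for (int j = 0; j < d_in; ++j)
            t[i] += B[i * d_in + j] * x[j];
    y.assign(d_out, 0.0f);                     // y = A * t
    for (int i = 0; i < d_out; ++i)
        for (int j = 0; j < r; ++j)
            y[i] += A[i * r + j] * t[j];
}

int main() {
    // Rank-1 factorization of the 2x2 matrix [[1,2],[2,4]]: A = [1,2]^T, B = [1,2].
    std::vector<float> A = {1, 2}, B = {1, 2}, x = {1, 1}, y;
    lowrank_layer(A, B, x, y, /*d_out=*/2, /*d_in=*/2, /*r=*/1);
    printf("y = [%g, %g]\n", y[0], y[1]);      // expected [3, 6]
    return 0;
}
```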
- Genta Indra Winata, Andrea Madotto, Jamin Shin, Elham J Barezi, and Pascale Fung. 2019. On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression. arXiv preprint arXiv:1908.09982. https://arxiv.org/abs/1908.09982
- Ashish Khetan and Zohar Karnin. "schuBERT: Optimizing Elements of BERT". arXiv preprint arXiv:2005.06628 (2020) https://arxiv.org/abs/2005.06628
- Jian Xue, Jinyu Li, and Yifan Gong. 2013. Restructuring of deep neural network acoustic models with singular value decomposition. In Interspeech, pages 2365–2369. https://www.academia.edu/72568360/Restructuring_of_deep_neural_network_acoustic_models_with_singular_value_decomposition
- Patrick Chen, Si Si, Yang Li, Ciprian Chelba, and Cho-Jui Hsieh. 2018. GroupReduce: Block-wise low-rank approximation for neural language model shrinking. In Advances in Neural Information Processing Systems, pages 10988–10998. https://arxiv.org/abs/1806.06950
- Xiyu Yu, Tongliang Liu, Xinchao Wang, and Dacheng Tao. 2017. On compressing deep models by low rank and sparse decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7370–7379. PDF: https://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_On_Compressing_Deep_CVPR_2017_paper.pdf
- Zi Lin, Jeremiah Zhe Liu, Zi Yang, Nan Hua, Dan Roth. “Pruning Redundant Mappings in Transformer Models via SpectralNormalized Identity Prior”. arXiv preprint arXiv:2010.01791 (2020) https://arxiv.org/abs/2010.01791
- Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. Adaptive budget allocation for parameter-efficient fine-tuning. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. https://arxiv.org/abs/2303.10512, Code: https://github.com/QingruZhang/AdaLoRA
- Arnav Chavan, Zhuang Liu, Deepak K. Gupta, Eric P. Xing, and Zhiqiang Shen. One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning. CoRR, abs/2306.07967, 2023. https://arxiv.org/abs/2306.07967
- Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, and Ali Ghodsi. DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation, In Andreas Vlachos and Isabelle Augenstein, editors, Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, Dubrovnik, Croatia, May 2-6, 2023, pages 3266–3279. Association for Computational Linguistics, 2023. https://arxiv.org/abs/2210.07558
- Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, LoRA: Low-Rank Adaptation of Large Language Models, In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022. https://arxiv.org/abs/2106.09685
- Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, and Sanjeev Khudanpur. Semi-orthogonal low-rank matrix factorization for deep neural networks. In B. Yegnanarayana, editor, Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018, pages 3743– 3747. ISCA, 2018. PDF: https://www.isca-speech.org/archive/pdfs/interspeech_2018/povey18_interspeech.pdf
- Yerlan Idelbayev; Miguel Á. Carreira-Perpiñán, Low-Rank Compression of Neural Nets: Learning the Rank of Each Layer. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pages 8046–8056. Computer Vision Foundation / IEEE, 2020. https://ieeexplore.ieee.org/document/9157223/
- Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang, A Survey on Model Compression for Large Language Models, arXiv preprint arXiv:2308.07633, Aug 2023 https://arxiv.org/abs/2308.07633 (Recent 2023 survey paper on various model compression approaches including low-rank matrices.)
Singular Value Decomposition (SVD)
SVD is a specific type of matrix decomposition: it factorizes a matrix into two orthogonal matrices and a diagonal matrix of singular values, and truncating to only the largest singular values gives the best low-rank approximation of the original matrix.
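As a hedged illustration, the sketch below uses power iteration to estimate only the largest singular value and its singular vectors, which corresponds to the rank-1 truncation of the SVD. A production implementation would call a library routine (such as LAPACK or Eigen); all names and parameters here are illustrative assumptions.

```cpp
#include <vector>
#include <cmath>
#include <cstdio>

// Estimate the largest singular value of an m x n row-major matrix W
// by power iteration on W^T * W. Returns sigma_1 and fills u (size m) and
// v (size n) with the corresponding singular vectors, giving the rank-1
// SVD truncation W ~= sigma_1 * u * v^T.
float top_singular_value(const std::vector<float>& W, int m, int n,
                         std::vector<float>& u, std::vector<float>& v,
                         int iterations = 100) {
    v.assign(n, 1.0f);                     // arbitrary nonzero start vector
    std::vector<float> Wv(m);
    for (int it = 0; it < iterations; ++it) {
        // Wv = W * v
        for (int i = 0; i < m; ++i) {
            float s = 0.0f;
            for (int j = 0; j < n; ++j) s += W[i * n + j] * v[j];
            Wv[i] = s;
        }
        // v = W^T * Wv, then normalize v
        for (int j = 0; j < n; ++j) {
            float s = 0.0f;
            for (int i = 0; i < m; ++i) s += W[i * n + j] * Wv[i];
            v[j] = s;
        }
        float norm = 0.0f;
        for (int j = 0; j < n; ++j) norm += v[j] * v[j];
        norm = std::sqrt(norm);
        for (int j = 0; j < n; ++j) v[j] /= norm;
    }
    // sigma_1 = ||W * v||, u = W * v / sigma_1
    u.assign(m, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) u[i] += W[i * n + j] * v[j];
    float sigma = 0.0f;
    for (int i = 0; i < m; ++i) sigma += u[i] * u[i];
    sigma = std::sqrt(sigma);
    for (int i = 0; i < m; ++i) u[i] /= sigma;
    return sigma;
}

int main() {
    // 2x2 diagonal example: singular values are 3 and 1.
    std::vector<float> W = {3, 0,
                            0, 1};
    std::vector<float> u, v;
    float sigma1 = top_singular_value(W, 2, 2, u, v);
    printf("sigma_1 ~= %g (expected 3)\n", sigma1);
    return 0;
}
```

Research papers on SVD include: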
- Zeyu Zhang, Haiying Shen, 7 Aug 2024, Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference, https://arxiv.org/abs/2408.04107
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 1 May 2024 (v6), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer
- Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu, 30 Jul 2024, Palu: Compressing KV-Cache with Low-Rank Projection, https://arxiv.org/abs/2407.21118 https://github.com/shadowpa0327/Palu
- Hongyaoxing Gu, 27 May 2024, LRAMM -- Low precision approximates GEMM via RSVD, https://arxiv.org/abs/2405.16917
- Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao 12 Aug 2024 (v3), A Survey on LoRA of Large Language Models, https://arxiv.org/abs/2407.11046 https://github.com/ZJU-LLMs/Awesome-LoRAs.git
- Shi, J., Shi, C. (2025). Improve LLM Inference Performance with Matrix Decomposition Strategies. In: Shi, Z., Witbrock, M., Tian, Q. (eds) Intelligence Science V. ICIS 2024. IFIP Advances in Information and Communication Technology, vol 720. Springer, Cham. https://doi.org/10.1007/978-3-031-71253-1_12 https://link.springer.com/chapter/10.1007/978-3-031-71253-1_12 (Speed up matrix operations with SVD and NMF via adaptive block sizing based on batching.)
- Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu, 31 Oct 2024, BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments, https://arxiv.org/abs/2410.23918 https://github.com/xinghaow99/BitStack
- Shengwen Ding, Chenhui Hu, 24 Nov 2024, eFedLLM: Efficient LLM Inference Based on Federated Learning, https://arxiv.org/abs/2411.16003
- Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
- Hong Yankun, Li Xing, Zhen Hui-Ling, Yu Xianzhi, Liu Wulong, Yuan Mingxuan, 21 Feb 2025, SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention, https://arxiv.org/abs/2502.15304
- Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang, 16 Mar 2025, SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression, https://arxiv.org/abs/2503.12340 https://github.com/AIoT-MLSys-Lab/SVD-LLM
- Jiujun He, Huazhen Lin, 10 Jun 2025, Olica: Efficient Structured Pruning of Large Language Models without Retraining, https://arxiv.org/abs/2506.08436
- Tavor Z. Baharav, Phillip B. Nicol, Rafael A. Irizarry, Rong Ma, 29 Jul 2025, Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration, https://arxiv.org/abs/2507.22170
- Jiayu Fang, Zhiqi Shao, S T Boris Choy, Junbin Gao, 19 Aug 2025, SVDformer: Direction-Aware Spectral Graph Embedding Learning via SVD and Transformer, https://arxiv.org/abs/2508.13435
- Mete Erdogan, Sebnem Demirtas, 25 Aug 2025, SVD Based Least Squares for X-Ray Pneumonia Classification Using Deep Features, https://arxiv.org/abs/2504.20970
Tucker Decomposition
Research papers on Tucker decomposition:
- Chakshu Moar, 2024, Compressing Language Models using Low-Rank Decomposition and Characterizing the Accuracy- Efficiency Trade-offs, Master of Science Thesis, Electrical and Computer Engineering, University of California, Irvine, USA, https://escholarship.org/content/qt0t6967h4/qt0t6967h4.pdf
- Chakshu Moar, Michael Pellauer, Hyoukjun Kwon, 10 May 2024, Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models, https://arxiv.org/abs/2405.06626
- Tucker, Ledyard R. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3): 279–311, 1966 https://link.springer.com/article/10.1007/BF02289464 (Tucker decomposition.)
- De Lathauwer, Lieven, De Moor, Bart, and Vandewalle, Joos. A multilinear singular value decomposition. SIAM journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000. https://epubs.siam.org/doi/10.1137/S0895479896305696 (Tucker decomposition.)
- Kim, Y.-D. and Choi, S. Nonnegative Tucker decomposition. In Proceedings of the IEEE CVPR 2007 Workshop on Component Analysis Methods, Minneapolis, Minnesota, 2007. https://ieeexplore.ieee.org/document/4270403
- Federica Stolf, Antonio Canale, 15 Nov 2024, Bayesian Adaptive Tucker Decompositions for Tensor Factorization, https://arxiv.org/abs/2411.10218
- Matthew Pietrosanu, Bei Jiang, Linglong Kong, 13 Jun 2024, Oblivious subspace embeddings for compressed Tucker decompositions, https://arxiv.org/abs/2406.09387
- Tobias Weber, Jakob Dexl, David Rügamer, Michael Ingrisch, 18 Apr 2024 (v2), Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition, https://arxiv.org/abs/2404.09683
- Ruizhong Qiu, Jun-Gi Jang, Xiao Lin, Lihui Liu, Hanghang Tong, 11 Jan 2025, TUCKET: A Tensor Time Series Data Structure for Efficient and Accurate Factor Analysis over Time Ranges, https://arxiv.org/abs/2501.06647 (Tucker decomposition to optimize time-series forecasting.)
Vector Dot Product Optimization
The computation of a vector dot product, also called the "scalar product", underlies almost all AI operations. Matrix multiplications are everywhere, and a matrix multiply operation is just a series of dot product operations: each element in the result of a matrix multiplication is the vector dot product of a row in one matrix with a column in the other matrix.
Given the importance of vector dot products to the speed of AI, various attempts have been made to speed them up. Options include hardware-accelerated dot products, faster vector dot product algorithms, and approximations that trade accuracy for speed.
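Several of the papers below target accuracy as much as raw speed. As a small hedged example of that theme, here is a dot product using the classic Kahan compensated-summation technique to reduce floating-point rounding error, alongside a naive version for comparison; this is a generic textbook method, not the specific algorithm of any cited paper.

```cpp
#include <vector>
#include <cstdio>

// Naive dot product: rounding errors can accumulate over long vectors.
float dot_naive(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Kahan compensated summation: carries a running error-correction term.
float dot_kahan(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f, c = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        float y = a[i] * b[i] - c;
        float t = sum + y;
        c = (t - sum) - y;   // recovers the low-order bits lost in the addition
        sum = t;
    }
    return sum;
}

int main() {
    // Many small terms after one large term: a stress test for float accumulation.
    std::vector<float> a(1000001, 1.0f), b(1000001, 1e-4f);
    a[0] = 1.0f; b[0] = 1e8f;
    printf("naive: %f\n", dot_naive(a, b));
    printf("kahan: %f\n", dot_kahan(a, b));   // much closer to the true 1e8 + 100
    return 0;
}
```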
Research on dot product optimization includes:
- N Yamanaka, T Ogita, SM Rump, S Oishi, 2008, A parallel algorithm for accurate dot product, Parallel Computing, https://www.sciencedirect.com/science/article/pii/S016781910800032X, PDF: https://ogilab.w.waseda.jp/ogita/math/doc/2008_YaOgRuOi.pdf
- W Kamp, A Bainbridge-Smith, 2007, Multiply accumulate unit optimised for fast dot-product evaluation, 2007 International Conference on Field-Programmable Technology, https://ieeexplore.ieee.org/abstract/document/4439283/
- Chitta Ranjan May 9, 2019, Understanding the Kernel Trick with fundamentals, Towards Data Science https://towardsdatascience.com/truly-understanding-the-kernel-trick-1aeb11560769
- J Diffenderfer, D Osei-Kuffuor, H Menon, March 2021, A framework for error-bounded approximate computing, with an application to dot products, SIAM Journal on Scientific Computing, https://www.osti.gov/servlets/purl/1959416
- J Diffenderfer, D Osei-Kuffuor, H Menon, 2021, QDOT: Quantized dot product kernel for approximate high-performance computing, arXiv preprint arXiv:2105.00115, https://arxiv.org/abs/2105.00115
- Jean-Michel Muller, Nicolas Brunie, Florent de Dinechin, Claude-Pierre Jeannerod, Mioara Joldes, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, Serge Torres, 2018, Enhanced Floating-Point Sums, Dot Products, and Polynomial Values, In: Handbook of Floating-Point Arithmetic, pp. 163–192, https://link.springer.com/chapter/10.1007/978-3-319-76526-6_5
- NM Ho, DT Nguyen, JL Gustafson, WF Wong, 2023, Bedot: Bit Efficient Dot Product for Deep Generative Models, CoNGA 2023: Next Generation Arithmetic, pp. 19–37, https://link.springer.com/chapter/10.1007/978-3-031-32180-1_2, PDF: https://www.comp.nus.edu.sg/~wongwf/papers/CONGA23-Bedot.pdf
- Lucas Klemmer; Saman Froehlich; Rolf Drechsler; Daniel Große, 2021, XbNN: Enabling CNNs on edge devices by approximate on-chip dot product encoding, 2021 IEEE International Symposium on Circuits and Systems (ISCAS), https://ieeexplore.ieee.org/document/9401780, PDF: https://agra.informatik.uni-bremen.de/doc/konf/2021_ISCAS_XBNN.pdf
- Y. Nievergelt, Scalar fused multiply-add instructions produce floating-point matrix arithmetic provably accurate to the penultimate digit, ACM Trans. Math. Softw., 29 (2003), pp. 27–48, https://dl.acm.org/doi/10.1145/641876.641878
- S Graillat, V Ménissier-Morain, 2012, Accurate summation, dot product and polynomial evaluation in complex floating point arithmetic, Information and Computation, Volume 216, July 2012, Pages 57-71, https://www.sciencedirect.com/science/article/pii/S0890540112000715
- AM Zaki, MH El-Shafey, AMB Eldin, 2010, A new architecture for accurate dot product of floating point numbers, The 2010 International Conference on Computer Engineering & Systems, https://ieeexplore.ieee.org/abstract/document/5674841/
- K He, R Barrio, L Chen, H Jiang, J Liu, T Gu, 2021, A Class of Fast and Accurate Multi-layer Block Summation and Dot Product Algorithms, IFIP International Conference on Network and Parallel Computing, NPC 2021: Network and Parallel Computing, pp. 64–75, https://link.springer.com/chapter/10.1007/978-3-030-93571-9_6
- S. Graillat, P. Langlois, N. Louvet, 15 September 2006, Choosing a twice more accurate dot product implementation, https://www.researchgate.net/publication/250769076_Choosing_a_Twice_More_Accurate_Dot_Product_Implementation, PDF: https://www-pequan.lip6.fr/~graillat/papers/icnaam06.pdf
- A Knofel, 1991, Fast hardware units for the computation of accurate dot products, Proceedings 10th IEEE Symposium on Computer Arithmetic, https://ieeexplore.ieee.org/document/145536, PDF: https://scholar.archive.org/work/cp6cgjq7g5enzfqoqtte2bb6k4/access/wayback/http://www.acsel-lab.com/arithmetic/papers/ARITH10/ARITH10_Knofel.pdf
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book: Get your copy from Amazon: Generative AI Applications
- Generative AI programming book: Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book: Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book: Get your copy from Amazon: CUDA C++ Debugging
More AI Research
Read more about:
- Low-rank matrices
- Sparse matrices
- Advanced AI Mathematics
- Zero-Multiplication Models
- Logarithmic Models
- Approximate Computing
- Inference Optimizations
- « Research Home