Aussie AI
AI Matrix Algebra
-
Last Updated 30 August, 2025
-
by David Spuler, Ph.D.
Neural networks run on matrices and vectors. The main primitive is matrix multiplication, which involves a large number of multiplication and addition operations, and a matrix multiplication is really a long series of vector dot product computations. Modern AI frameworks and research papers tend to use abbreviations when talking about matrix multiplication, such as "MatMul" (matrix multiplication operations) and "GEMM" (General Matrix Multiplication).
Various attempts to optimize models involve manipulating matrix algebra. Some of the research areas include:
- Matrix multiplication algorithms (e.g. Strassen, Winograd, FFT)
- Alternative and advanced types of matrices (e.g. Butterfly, Monarch)
- Advanced matrix algebra
- Approximate matrix multiplication algorithms
- Vector dot product optimizations
- Matrix/tensor decomposition (factorizing matrices into smaller sub-matrices)
- Low-rank matrices (smaller factors of matrices)
The individual multiplication operations between numbers inside a matrix can also be optimized:
- Alternatives to using arithmetic multiplication (e.g. bitshifting, logarithms, and other zero-multiplication tricks; see the sketch after this list)
- Faster algorithms for scalar arithmetic multiplication (mostly hardware acceleration algorithms for chip designers)
- Approximate scalar arithmetic multiplication (faster ways to multiply two numbers by allowing errors)
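As a hedged illustration of the bitshift idea, here is a minimal C++ sketch of multiplying an integer activation by a weight that has been quantized to a power of two, so that the multiplication becomes a shift. The quantization scheme and function names are illustrative assumptions, not taken from any particular paper.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Quantize a positive weight to the nearest power of two, returning the exponent.
// (Sign handling and zero weights are omitted to keep the sketch short.)
int quantize_weight_to_shift(float w) {
    return (int)std::lround(std::log2(w));
}

// "Multiply" an integer activation by the quantized weight using a shift.
int32_t shift_multiply(int32_t x, int shift) {
    return (shift >= 0) ? (x << shift) : (x >> -shift);
}

int main() {
    float w = 0.26f;                          // original weight
    int shift = quantize_weight_to_shift(w);  // roughly 2^-2 = 0.25
    int32_t x = 100;                          // integer activation
    printf("exact=%g approx=%d\n", x * w, shift_multiply(x, shift));
    return 0;
}
```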
Matrix Algebra
Matrix algebra underpins most of AI model inference (and training). The term "tensor" in AI theory mainly refers to multi-dimensional matrices, and the multiplication algorithm is the standard one you learned in high school.
There are several areas of active research that use matrix algebra in different ways to reduce the total number of multiplication operations. Some of the types of matrices studied include:
- Structured matrices (the general class)
- Submatrices
- Low-rank matrices
- Sparse matrix optimizations
- Monarch matrices
- Butterfly matrices
The theory of matrix algebra can be applied to neural networks, since matrix multiplication is at the core of inference and training. Inference of a model involves executing matrix multiplications on vectors of activations, and multiplying a matrix by a vector means multiplying each row of that matrix against the vector, which is computing the vector dot product of two vectors. Underneath all of those vector iterations are the low-level multiplication operations on pairs of numbers, usually floating-point, but possibly integers in quantized models that use integer-only arithmetic.
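To make that connection concrete, here is a minimal C++ sketch of a matrix-vector multiplication written as one vector dot product per output row; the row-major layout and function names are illustrative assumptions.

```cpp
#include <vector>
#include <cstdio>

// Dot product of two vectors of length n.
float dot_product(const float* a, const float* b, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// y = W * x, where W is rows x cols in row-major order.
// Each output element y[r] is the dot product of row r of W with x.
void matrix_vector_multiply(const float* W, const float* x, float* y,
                            int rows, int cols) {
    for (int r = 0; r < rows; ++r) {
        y[r] = dot_product(&W[r * cols], x, cols);
    }
}

int main() {
    const int rows = 2, cols = 3;
    std::vector<float> W = {1, 2, 3,
                            4, 5, 6};
    std::vector<float> x = {1, 0, 1};
    std::vector<float> y(rows);
    matrix_vector_multiply(W.data(), x.data(), y.data(), rows, cols);
    printf("y = [%g, %g]\n", y[0], y[1]);  // expected [4, 10]
    return 0;
}
```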
General Research on Matrix Algebra Theory
Some of the papers on theory of matrix algebra include:
- Paul Cull, A Matrix Algebra for Neural Nets, https://link.springer.com/chapter/10.1007/978-1-4757-0555-3_43
- Xian-Da Zhang, A Matrix Algebra Approach to Artificial Intelligence, January 2020, DOI:10.1007/978-981-15-2770-8, ISBN: 978-981-15-2769-2, https://www.researchgate.net/publication/341581565_A_Matrix_Algebra_Approach_to_Artificial_Intelligence
- Hillar, C. J. & Lim, L.-H., Most tensor problems are NP-hard. J. ACM 60, 1–39 (2013) https://arxiv.org/abs/0911.1393
- Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean, Nov 2022, Efficiently Scaling Transformer Inference, Google Research, https://arxiv.org/abs/2211.05102
- Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, and Torsten Hoefler. Nov 2021, Data movement is all you need: A case study on optimizing transformers, Proceedings of Machine Learning and Systems, 3, 2021. https://arxiv.org/abs/2007.00072 Code: https://github.com/spcl/substation
- C. Deng, S. Liao, Y. Xie, K. K. Parhi, X. Qian and B. Yuan, "PermDNN: Efficient compressed DNN architecture with permuted diagonal matrices", Proc. 51st Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), pp. 189-202, Oct. 2018. https://arxiv.org/abs/2004.10936
- Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson, 10 Jun 2024, Compute Better Spent: Replacing Dense Layers with Structured Matrices, https://arxiv.org/abs/2406.06248
- Haque, S.A.; Choudhury, N.; Hossain, S. Matrix Multiplication with Diagonals: Structured Sparse Matrices and Beyond. In Proceedings of the 2023 7th International Conference on High Performance Compilation, Computing and Communications, Jinan, China, 17–19 June 2023; pp. 69–76. https://doi.org/10.1145/3606043.3606053
- Sardar Anisul Haque, Mohammad Tanvir Parvez, Shahadat Hossain, Jan 2024, GPU Algorithms for Structured Sparse Matrix Multiplication with Diagonal Storage Schemes, https://www.mdpi.com/1999-4893/17/1/31
- Josh Alman, Zhao Song, 6 Oct 2023, How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation, https://arxiv.org/abs/2310.04064
- Ishna Satyarth, Chao Yin, RuQing G. Xu, Devin A. Matthews, 15 Nov 2024, Skew-Symmetric Matrix Decompositions on Shared-Memory Architectures, https://arxiv.org/abs/2411.09859
Advanced Types of Matrices
The simplest observations are that smaller (lower-rank) matrices and sparse matrices require less computation. Beyond these, various advanced subtypes of matrices with specific constraints have properties that reduce the number of multiplications needed to evaluate a matrix multiplication or matrix-vector multiplication.
Lower-Rank Submatrices: There are many theories that typical models are "over-parameterized," which means their matrices are bigger than they need to be. The number of weights is quadratic in the dimensions of the matrix, so reducing to smaller, lower-rank matrices improves performance (see the cost sketch below). There is much theory about finding the "smaller models" inside the "big models" whose accuracy is not much worse than that of the larger matrices. This is sometimes called "parameter factorization" of models. The non-mathematical technique of "distillation" is also related, as it finds a smaller model with similar accuracy. See low-rank matrix factorization research.
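As a rough sketch of why lower-rank matrices help, assume a d x d weight matrix W is approximated by the product of A (size d x r) and B (size r x d). Computing y = A(Bx) then costs about 2*d*r multiply-add operations instead of d*d for y = Wx, a large saving when the rank r is much smaller than d. The toy code below simply counts those multiplications; the dimensions are illustrative assumptions.

```cpp
#include <cstdio>

// Multiplication counts for a dense d x d matrix-vector product
// versus a factored low-rank product y = A * (B * x), with A: d x r, B: r x d.
long long dense_mults(long long d)                { return d * d; }
long long lowrank_mults(long long d, long long r) { return 2 * d * r; }

int main() {
    long long d = 4096, r = 64;   // a typical transformer dimension, and a small rank
    printf("dense: %lld multiplies\n", dense_mults(d));             // 16,777,216
    printf("rank-%lld: %lld multiplies\n", r, lowrank_mults(d, r)); // 524,288
    return 0;
}
```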
Sparse Matrices: The idea with sparse matrices is that if a matrix has a lot of zero weights, then the inference algorithm is doing a lot of unnecessary multiplications by zero. Various algorithms restrict the multiplication work to only the parts of the matrix that contain nonzero data (see the CSR sketch below). Sparsity is closely related to model pruning, and these matrix sparsity techniques can also be amplified using dynamic pruning of near-zero weights, to further increase the total number of zeros in the matrix. See sparse matrices and sparsification.
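Here is a minimal hedged sketch (not from any specific library) of a sparse matrix-vector multiply in compressed sparse row (CSR) format, which performs multiplications only for the stored nonzero weights.

```cpp
#include <vector>
#include <cstdio>

// Sparse matrix-vector multiply with CSR storage: only nonzeros are multiplied.
// row_ptr has rows+1 entries; col_idx/values hold the nonzeros row by row.
void csr_matvec(const std::vector<int>& row_ptr,
                const std::vector<int>& col_idx,
                const std::vector<float>& values,
                const std::vector<float>& x,
                std::vector<float>& y) {
    int rows = (int)row_ptr.size() - 1;
    for (int r = 0; r < rows; ++r) {
        float sum = 0.0f;
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
            sum += values[k] * x[col_idx[k]];
        }
        y[r] = sum;
    }
}

int main() {
    // 3x3 matrix [[2,0,0],[0,0,3],[0,1,0]] stored as CSR.
    std::vector<int> row_ptr = {0, 1, 2, 3};
    std::vector<int> col_idx = {0, 2, 1};
    std::vector<float> values = {2, 3, 1};
    std::vector<float> x = {1, 2, 3};
    std::vector<float> y(3);
    csr_matvec(row_ptr, col_idx, values, x, y);
    printf("y = [%g, %g, %g]\n", y[0], y[1], y[2]);  // expected [2, 9, 2]
    return 0;
}
```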
Butterfly Matrices: These special matrices are one approach to using matrix algebra to achieve sparsity. Research includes:
- Beidi Chen, Tri Dao and Chris Ré, Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models, Jan 17, 2022, https://hazyresearch.stanford.edu/blog/2022-01-17-Sparsity-3-Pixelated-Butterfly
- Keivan Alizadeh Vahid, Anish Prabhu, Ali Farhadi, Mohammad Rastegari Apr 2020, Butterfly Transform: An Efficient FFT Based Neural Architecture Design, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), https://ieeexplore.ieee.org/abstract/document/9157695/, https://arxiv.org/abs/1906.02256
- L Zheng, G Puy, E Riccietti, P Pérez, R Gribonval, Butterfly factorization by algorithmic identification of rank-one blocks, July 2023, https://arxiv.org/abs/2307.00820
- L Zheng, E Riccietti, R Gribonval, 2023, Efficient Identification of Butterfly Sparse Matrix Factorizations, SIAM Journal on Mathematics of Data Science, Vol. 5, Iss. 1 (2023), DOI 10.1137/22M1488727, https://epubs.siam.org/doi/abs/10.1137/22M1488727, https://arxiv.org/abs/2110.01230
- J Peca-Medlin, T Trogdon, 2023, Growth factors of random butterfly matrices and the stability of avoiding pivoting, SIAM Journal on Matrix Analysis and Applications, 2023, https://epubs.siam.org/doi/abs/10.1137/22M148762X, https://arxiv.org/abs/2203.15921
- D. Stott Parker, 1995, "Random butterfly transformations with applications in computational linear algebra", Technical report, UCLA Computer Science Dept, 1995. https://searchworks.stanford.edu/view/4640257
- D. Stott Parker, 1995, A randomizing butterfly transformation useful in block matrix computations, Technical report, UCLA Computer Science Dept. https://searchworks.stanford.edu/view/4640258
- Tatsuya Member and Member Kazuyoshi, "Bidirectional learning for neural network having butterfly structure", Systems and Computers in Japan, vol. 26, pp. 64-73, 04 1995. https://onlinelibrary.wiley.com/doi/abs/10.1002/scj.4690260407
- Yingzhou Li, Haizhao Yang, Eileen R. Martin, Kenneth L. Ho and Lexing Ying, "Butterfly factorization", Multiscale Modeling & Simulation, vol. 13, pp. 714-732, 2015. https://arxiv.org/abs/1502.01379
- Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra and Christopher Ré, Learning fast algorithms for linear transforms using butterfly factorizations, 2019. http://proceedings.mlr.press/v97/dao19a.html
- Y Li, X Cheng, J Lu, Apr 2020, Butterfly-Net: Optimal function representation based on convolutional neural networks, arXiv preprint arXiv:1805.07451, https://arxiv.org/abs/1805.07451
- Rui Lin, Jie Ran, King Hung Chiu, Graziano Chesi, Ngai Wong, Mar 2022, Deformable butterfly: A highly structured and sparse linear transform NeurIPS 2021, https://arxiv.org/abs/2203.13556, https://proceedings.neurips.cc/paper/2021/file/86b122d4358357d834a87ce618a55de0-Paper.pdf
- N Ailon, O Leibovitch, V Nair, July 2021, Sparse linear networks with a fixed butterfly structure: theory and practice, Uncertainty in Artificial Intelligence, UAI 2021, https://arxiv.org/abs/2007.08864 https://proceedings.mlr.press/v161/ailon21a/ailon21a.pdf
- Daniel Y. Fu, Elliot L. Epstein, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré, Feb 2023, Simple Hardware-Efficient Long Convolutions for Sequence Modeling, https://arxiv.org/abs/2302.06646 (FlashButterfly algorithm.)
Monarch Matrices: Monarch matrices are a superset of butterfly matrices, named after the orange-winged monarch butterfly. These special types of matrices exploit operations on submatrices to reduce the overall computational overhead of matrix multiplication (see the block-diagonal sketch after the reference list below).
- Dan Fu, Simran Arora, Chris Ré, Monarch Mixer: Revisiting BERT, Without Attention or MLPs, Jul 25, 2023, https://hazyresearch.stanford.edu/blog/2023-07-25-m2-bert (Monarch Matrices) (Code: https://github.com/HazyResearch/m2)
- Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré, Apr 2022, Monarch: Expressive Structured Matrices for Efficient and Accurate Training, Proceedings of the 39 th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. https://arxiv.org/abs/2204.00595 https://proceedings.mlr.press/v162/dao22a/dao22a.pdf
- Sunil Babu Melingi, Ramesh Kumar Mojjada, C. Tamizhselvan, R. Surender, S. Yazhinian, 2022, A self-adaptive monarch butterfly optimization (MBO) algorithm based improved deep forest neural network model for detecting and classifying brain stroke lesions, Research on Biomedical Engineering, volume 38, pages 647–660 (2022), https://link.springer.com/article/10.1007/s42600-022-00214-2
- Dan Fu, Simran Arora, Chris Ré, Jul 25, 2023, Monarch Mixer: A new model architecture for increased efficiency, Together AI blog, https://together.ai/blog/monarch-mixer, Code: https://github.com/HazyResearch/m2 (An implementation by Together AI of Stanford's Hazy Research AI engines using Monarch matrices.)
- Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson, 10 Jun 2024, Compute Better Spent: Replacing Dense Layers with Structured Matrices, https://arxiv.org/abs/2406.06248
- Mimoun Mohamed, Valentin Emiya, Caroline Chaux. 17 Jan 2025, Learning Permutations in Monarch Factorization. 2025. hal-04887483 https://hal.science/hal-04887483/document
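Butterfly and Monarch factorizations replace a dense matrix with a product of sparse, block-structured factors (interleaved with permutations). As a hedged illustration of why that structure is cheap, the sketch below multiplies a vector by a single block-diagonal factor: with b blocks of size (d/b) x (d/b), this costs d*d/b multiplications instead of d*d for a dense matrix. The storage layout and function names are illustrative assumptions, not code from the cited papers.

```cpp
#include <vector>
#include <cstdio>

// y = D * x where D is block-diagonal with 'nblocks' dense blocks,
// each of size bsize x bsize (so d = nblocks * bsize).
// 'blocks' stores each block in row-major order, one after another.
void block_diagonal_matvec(const std::vector<float>& blocks,
                           const std::vector<float>& x,
                           std::vector<float>& y,
                           int nblocks, int bsize) {
    for (int b = 0; b < nblocks; ++b) {
        const float* B = &blocks[b * bsize * bsize];
        const float* xb = &x[b * bsize];
        float* yb = &y[b * bsize];
        for (int i = 0; i < bsize; ++i) {
            float sum = 0.0f;
            for (int j = 0; j < bsize; ++j) sum += B[i * bsize + j] * xb[j];
            yb[i] = sum;
        }
    }
}

int main() {
    // d = 4, two 2x2 blocks: diag([[1,2],[3,4]], [[5,6],[7,8]])
    std::vector<float> blocks = {1, 2, 3, 4,  5, 6, 7, 8};
    std::vector<float> x = {1, 1, 1, 1};
    std::vector<float> y(4);
    block_diagonal_matvec(blocks, x, y, /*nblocks=*/2, /*bsize=*/2);
    printf("y = [%g, %g, %g, %g]\n", y[0], y[1], y[2], y[3]);  // [3, 7, 11, 15]
    return 0;
}
```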
Matrix/Tensor Factorization (Decomposition)
Methods to factorize or decompose larger matrices into smaller matrices (see low-rank matrices for optimization), or into the specific subtypes of matrices above, require special algorithms. Two of the main theoretical decomposition algorithms for tensors are CANDECOMP/PARAFAC (CP) decomposition and Tucker decomposition.
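As a concrete illustration of what a CP decomposition represents, the sketch below reconstructs a single element of a 3-way tensor from its CP factor matrices, summing A[i][r] * B[j][r] * C[k][r] over the rank components r. The layout and names are illustrative assumptions, and this is only a reconstruction, not a decomposition algorithm itself.

```cpp
#include <vector>
#include <cstdio>

// Reconstruct element (i, j, k) of a 3-way tensor from CP factors.
// A is I x R, B is J x R, C is K x R, all row-major; R is the CP rank.
float cp_element(const std::vector<float>& A, const std::vector<float>& B,
                 const std::vector<float>& C, int R,
                 int i, int j, int k) {
    float sum = 0.0f;
    for (int r = 0; r < R; ++r) {
        sum += A[i * R + r] * B[j * R + r] * C[k * R + r];
    }
    return sum;
}

int main() {
    // Rank-1 example with I = J = K = 2: A = [1,2], B = [3,4], C = [5,6],
    // so T[i][j][k] = A[i] * B[j] * C[k].
    std::vector<float> A = {1, 2}, B = {3, 4}, C = {5, 6};
    printf("T[1][0][1] = %g\n", cp_element(A, B, C, /*R=*/1, 1, 0, 1));  // 2*3*6 = 36
    return 0;
}
```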
- Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer (2009), https://ieeexplore.ieee.org/abstract/document/5197422
- Defu Lian, Rui Liu, Yong Ge, Kai Zheng, Xing Xie, and Longbing Cao. 2017. Discrete content-aware matrix factorization. In SIGKDD. 325–334, https://dl.acm.org/doi/10.1145/3097983.3098008
- Andriy Mnih and Russ R Salakhutdinov. 2007. Probabilistic matrix factorization. NIPS 20 (2007), 1257–1264, https://dl.acm.org/doi/10.5555/2981562.2981720
- Carroll, J Douglas and Chang, Jih-Jie. Analysis of individual differences in multidimensional scaling via an n-way generalization of eckart-young decomposition. Psychometrika, 35(3):283–319, 1970. https://link.springer.com/article/10.1007/BF02310791 (CANDECOMP/PARAFAC decomposition/factorization.)
- Harshman, Richard A and Lundy, Margaret E. PARAFAC: Parallel factor analysis. Computational Statistics & Data Analysis, 18(1):39–72, 1994 https://www.sciencedirect.com/science/article/abs/pii/0167947394901325 (The P in CP decomposition/factorization.)
- Shashua, Amnon and Hazan, Tamir. Non-negative tensor factorization with applications to statistics and computer vision. In Proceedings of the 22nd international conference on Machine learning, pp. 792–799. ACM, 2005. PDF: https://icml.cc/imls/conferences/2005/proceedings/papers/100_NonNegative_ShashuaHazan.pdf (CANDECOMP/PARAFAC decomposition/factorization.)
- Tucker, Ledyard R. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3): 279–311, 1966 https://link.springer.com/article/10.1007/BF02289464 (Tucker decomposition.)
- De Lathauwer, Lieven, De Moor, Bart, and Vandewalle, Joos. A multilinear singular value decomposition. SIAM journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000. https://epubs.siam.org/doi/10.1137/S0895479896305696 (Tucker decomposition.)
- Kim, Y.-D. and Choi, S. Nonnegative Tucker decomposition. In Proceedings of the IEEE CVPR 2007 Workshop on Component Analysis Methods, Minneapolis, Minnesota, 2007. https://ieeexplore.ieee.org/document/4270403
- H Li, J Choi, Y Kwon, JH Ahn, Oct 2023, A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models, IEEE Computer Architecture Letters, https://ieeexplore.ieee.org/abstract/document/10285300/ (Tiled version of matrix multiplication with SVD factorization.)
- Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle, 29 May 2024, STAT: Shrinking Transformers After Training, https://arxiv.org/abs/2406.00061
- Chakshu Moar, 2024, Compressing Language Models using Low-Rank Decomposition and Characterizing the Accuracy- Efficiency Trade-offs, Master of Science Thesis, Electrical and Computer Engineering, University of California, Irvine, USA, https://escholarship.org/content/qt0t6967h4/qt0t6967h4.pdf
- Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Yipeng Ma, Tao Wang, Ji-Rong Wen, 21 May 2024, Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression, https://arxiv.org/abs/2405.12591
- Chakshu Moar, Michael Pellauer, Hyoukjun Kwon, 10 May 2024, Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models, https://arxiv.org/abs/2405.06626
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey (Broad survey with many optimizations including this topic.)
- V. Abronin, A. Naumov, D. Mazur, D. Bystrov, K. Tsarova, Ar. Melnikov, I. Oseledets, S. Dolgov, R. Brasher, M. Perelshtein, 29 Jan 2024, TQCompressor: improving tensor decomposition methods in neural networks via permutations, https://arxiv.org/abs/2401.16367 Code: https://huggingface.co/tq-ag/TQCompressedGPT2 Code: https://github.com/terra-quantum-public/TQCompressedGPT2 (A permutation-based enhancement to the Kronecker decomposition method.)
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 15 Mar 2024 (v5), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer (A large survey of a variety of LLM optimizations.)
- Han Guo, Philip Greengard, Eric P. Xing, Yoon Kim, Nov 2023, LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning, https://arxiv.org/abs/2311.12023
- Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun, Dec 2023, ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models, https://arxiv.org/abs/2312.05821 Code: https://github.com/hahnyuan/ASVD4LLM
- R Gribonval, T Mary, E Riccietti, 2023, Optimal quantization of rank-one matrices in floating-point arithmetic---with applications to butterfly factorizations https://inria.hal.science/hal-04125381/file/rank1_quant.pdf
- H Fan, T Chau, SI Venieris, R Lee, 2022, Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design https://ieeexplore.ieee.org/abstract/document/9923888/ https://arxiv.org/pdf/2209.09570
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv preprint arXiv:1909.11942, 2019, https://arxiv.org/abs/1909.11942 Code: https://github.com/google-research/ALBERT
- Arnav Chavan, Raghav Magazine, Shubham Kushwaha, Mérouane Debbah, Deepak Gupta, 24 Apr 2024 (v2), Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward, https://arxiv.org/abs/2402.01799 Code: https://github.com/nyunAI/Faster-LLM-Survey
- Jungi Lee, Wonbeom Lee, Jaewoong Sim, 16 Jun 2024, Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization, https://arxiv.org/abs/2406.12930 (Combining tensor decomposition and quantization with power-of-two scale factors.)
- Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre, 24 Jun 2024, Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers, https://arxiv.org/abs/2406.16450 Code: https://github.com/CLAIRE-Labo/StructuredFFN/tree/main
- Mirko Farina, Usman Ahmad, Ahmad Taha, Hussein Younes, Yusuf Mesbah, Xiao Yu, Witold Pedrycz, 2024, Sparsity in transformers: A systematic literature review, Neurocomputing, Volume 582, 14 May 2024, 127468, https://www.sciencedirect.com/science/article/abs/pii/S092523122400239X (General survey of sparsity methods, and techniques that create sparsity.)
- Yao Yao, Zuchao Li, Hai Zhao, 21 May 2024, SirLLM: Streaming Infinite Retentive LLM, https://arxiv.org/abs/2405.12528 (Low-rank decomposition to compress KV cache heads.)
- Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, https://arxiv.org/abs/2312.00678
- Zeyu Zhang, Haiying Shen, 7 Aug 2024, Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference, https://arxiv.org/abs/2408.04107
- Jiuxiang Gu, Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Junze Yin, 8 May 2024, Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers, https://arxiv.org/abs/2405.05219 (Attention optimization using multiple low-rank matrices.)
- Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna, 14 May 2024, Sparse MTTKRP Acceleration for Tensor Decomposition on GPU, https://arxiv.org/abs/2405.08470
- Tugba Torun, Eren Yenigul, Ameer Taweel, Didem Unat, 8 May 2024, A Sparse Tensor Generator with Efficient Feature Extraction, https://arxiv.org/abs/2405.04944 https://github.com/sparcityeu/feaTen https://github.com/sparcityeu/genTen
- Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna, 31 Mar 2024 (v2), Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition, https://arxiv.org/abs/2403.07953
- Jan Laukemann, Ahmed E. Helal, S. Isaac Geronimo Anderson, Fabio Checconi, Yongseok Soh, Jesmin Jahan Tithi, Teresa Ranadive, Brian J Gravelle, Fabrizio Petrini, Jee Choi, 11 Mar 2024, Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation, https://arxiv.org/abs/2403.06348
- Nishant Yadav, May 2024, Efficient K-Nearest Neighbor Search With Black-Box Neural Similarity Functions, Ph.D. Dissertation, Manning College of Information and Computer Sciences, University of Massachusetts Amherst, https://scholarworks.umass.edu/bitstreams/5572c0b9-cd96-46d8-983d-f877c9d0e22e/download
- Hongyaoxing Gu, 27 May 2024, LRAMM -- Low precision approximates GEMM via RSVD, https://arxiv.org/abs/2405.16917
- Ignacio Hounie, Charilaos Kanatsoulis, Arnuv Tandon, Alejandro Ribeiro, 5 Oct 2024, LoRTA: Low Rank Tensor Adaptation of Large Language Models, https://arxiv.org/abs/2410.04060
- Haoran Guan, Yuwei Fan, 9 Oct 2024, CholeskyQR for sparse matrices, https://arxiv.org/abs/2410.06525
- D.Breen, Oct 2024, Towards Sustainable CNNs: Tensor Decompositions for Green AI Solutions: Exploring Energy Consumption of Large CNNs, Master's Thesis, Systems and Control & Robotics, Delft University of Technology, https://repository.tudelft.nl/file/File_8208301f-51ef-4edf-bd12-d6ec3d5a8711
- Yubin Qin, Yang Wang, Zhiren Zhao, Xiaolong Yang, Yang Zhou, Shaojun Wei, Yang Hu, Shouyi Yin, 2024, MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition, 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), Year: 2024, Pages: 1032-1047, DOI Bookmark: 10.1109/ISCA59077.2024.00079, https://www.computer.org/csdl/proceedings-article/isca/2024/265800b032/1Z3pCEBnapO
- Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu, 30 Jul 2024, Palu: Compressing KV-Cache with Low-Rank Projection, https://arxiv.org/abs/2407.21118 https://github.com/shadowpa0327/Palu
- Shi, J., Shi, C. (2025). Improve LLM Inference Performance with Matrix Decomposition Strategies. In: Shi, Z., Witbrock, M., Tian, Q. (eds) Intelligence Science V. ICIS 2024. IFIP Advances in Information and Communication Technology, vol 720. Springer, Cham. https://doi.org/10.1007/978-3-031-71253-1_12 https://link.springer.com/chapter/10.1007/978-3-031-71253-1_12 (Speed up matrix operations with SVD and NMF via adaptive block sizing based on batching.)
- Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu, 31 Oct 2024, BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments, https://arxiv.org/abs/2410.23918 https://github.com/xinghaow99/BitStack
- Ishna Satyarth, Chao Yin, RuQing G. Xu, Devin A. Matthews, 15 Nov 2024, Skew-Symmetric Matrix Decompositions on Shared-Memory Architectures, https://arxiv.org/abs/2411.09859
- Kwangryeol Park, Seulki Lee, 12 Dec 2024, SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization, https://arxiv.org/abs/2412.08894 (Gradient optimizer Adam optimized using low-rank matrix factorization.)
- Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
- Sanghyeon Park, Soo-Mook Moon, 8 Jan 2025, CURing Large Models: Compression via CUR Decomposition, https://arxiv.org/abs/2501.04211
- Doruk Aksoy, David J. Gorsich, Shravan Veerapaneni, Alex A. Gorodetsky, 18 Sep 2023 (v2), An Incremental Tensor Train Decomposition Algorithm, https://arxiv.org/abs/2211.12487
- Ryuta Matsuno, 14 Aug 2025, Source Component Shift Adaptation via Offline Decomposition and Online Mixing Approach, https://arxiv.org/abs/2508.10257
- Rui Wu, Nikola Kovachki, Burigede Liu, 23 Jul 2025, A Learning-based Domain Decomposition Method, https://arxiv.org/abs/2507.17328
- Le-Trung Nguyen, Ael Quelennec, Van-Tam Nguyen, Enzo Tartaglione, 24 Jul 2025, Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning, https://arxiv.org/abs/2505.05086
- Jianhong Chen, Meng Zhao, Mostafa Reisi Gahrooei, Xubo Yue, 18 Jul 2025, Toward Temporal Causal Representation Learning with Tensor Decomposition, https://arxiv.org/abs/2507.14126
- Quang-Binh Nguyen, Minh Luu, Quang Nguyen, Anh Tran, Khoi Nguyen, 18 Jul 2025, CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models, https://arxiv.org/abs/2507.13984
- Z.Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z.F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, Chong Ruan, 18 Jul 2025, DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition, https://arxiv.org/abs/2504.21801
- Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen, Allan Jie, Xinbo Zhang, Haocheng Wang, Luong Trung, Rong Ye, Phan Nhat Hoang, Huishuai Zhang, Peng Sun, Hang Li, 21 Jul 2025, Solving Formal Math Problems by Decomposition and Iterative Reflection, https://arxiv.org/abs/2507.15225
- Yunfeng Li, Junhong Liu, Zhaohui Yang, Guofu Liao, Chuyun Zhang, 19 Jul 2025, Rec-AD: An Efficient Computation Framework for FDIA Detection Based on Tensor Train Decomposition and Deep Learning Recommendation Model, https://arxiv.org/abs/2507.14668
- Daniel Ayomide Olanrewaju, 20 Jul 2025, Partial Symmetry Enforced Attention Decomposition (PSEAD): A Group-Theoretic Framework for Equivariant Transformers in Biological Systems, https://arxiv.org/abs/2507.14908
- Lu Chenggang, 10 Aug 2025, A Globally Optimal Analytic Solution for Semi-Nonnegative Matrix Factorization with Nonnegative or Mixed Inputs, https://arxiv.org/abs/2508.07134
- Wenpeng Xing, Jie Chen, Zaifeng Yang, Tiancheng Zhao, Gaolei Li, Changting Lin, Yike Guo, Meng Han, 8 Aug 2025, CoDe-NeRF: Neural Rendering via Dynamic Coefficient Decomposition, https://arxiv.org/abs/2508.06632
- Qin Xu, Lili Zhu, Xiaoxia Cheng, Bo Jiang, 9 Aug 2025, Beyond Frequency: Seeing Subtle Cues Through the Lens of Spatial Decomposition for Fine-Grained Visual Classification, https://arxiv.org/abs/2508.06959
- Matthew Fahrbach, Mehrdad Ghadiri, 8 Aug 2025, A Tight Lower Bound for the Approximation Guarantee of Higher-Order Singular Value Decomposition, https://arxiv.org/abs/2508.06693
- Runshi Tang and Tamara Kolda and Anru R. Zhang, 9 Aug 2025, Tensor Decomposition with Unaligned Observations, https://arxiv.org/abs/2410.14046
- Valentin Six, Evan Dufraisse, Gaël de Chalendar, 11 Aug 2025, DAGR: Decomposition Augmented Graph Retrieval with LLMs, https://arxiv.org/abs/2506.13380
- Zhengqi Lin and Andrzej Ruszczyński, 25 Jul 2025, Federated Calculation of the Free-Support Transportation Barycenter by Single-Loop Dual Decomposition, https://arxiv.org/abs/2507.19627
- Rongyao Cai, Ming Jin, Qingsong Wen, Kexin Zhang, 28 Jul 2025, From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation, https://arxiv.org/abs/2507.20968
- Sara M. Ichinaga, Steven L. Brunton, Aleksandr Y. Aravkin, J. Nathan Kutz, 26 Jul 2025, Sparse-mode Dynamic Mode Decomposition for Disambiguating Local and Global Structures, https://arxiv.org/abs/2507.19787
- Yukino Terui, Yuka Inoue, Yohei Hamakawa, Kosuke Tatsumura, Kazue Kudo, 29 Jul 2025, Collaborative filtering based on nonnegative/binary matrix factorization, https://arxiv.org/abs/2410.10381
- Zerui Tao, Yuhta Takida, Naoki Murata, Qibin Zhao, Yuki Mitsufuji, 31 Jul 2025, Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models, https://arxiv.org/abs/2501.08727
- Steffen Limmer, Steffen Udluft, Clemens Otte, 31 Jul 2025, Neural-ANOVA: Analytical Model Decomposition using Automatic Integration, https://arxiv.org/abs/2408.12319
- Agustín Borda, Juan Bautista Cabral, Gonzalo Giarda, Diego Nicolás Gimenez Irusta, Paula Pacheco, Alvaro Roy Schachner, 31 Jul 2025, Algorithmic Detection of Rank Reversals, Transitivity Violations, and Decomposition Inconsistencies in Multi-Criteria Decision Analysis, https://arxiv.org/abs/2508.00129
- Willem Diepeveen, Jon Schwenk, Andrea Bertozzi, 1 Aug 2025, Latent Diffeomorphic Dynamic Mode Decomposition, https://arxiv.org/abs/2505.06351
- Jun Lu, 1 Aug 2025, Matrix Decomposition and Applications, https://arxiv.org/abs/2201.00145
- Hang Yin, Zipeng Liu, Xiaoyong Peng, Liyao Xiang, 4 Aug 2025, Graph Unlearning via Embedding Reconstruction -- A Range-Null Space Decomposition Approach, https://arxiv.org/abs/2508.02044
- Amitava Das, Abhilekh Borah, Vinija Jain, Aman Chadha, 4 Aug 2025, AlignGuard-LoRA: Alignment-Preserving Fine-Tuning via Fisher-Guided Decomposition and Riemannian-Geodesic Collision Regularization, https://arxiv.org/abs/2508.02079
- Nicolas Langrené, Xavier Warin, Pierre Gruet, 3 Aug 2025, Fast Gaussian process inference by exact Matérn kernel decomposition, https://arxiv.org/abs/2508.01864
- Ziqin He, Mengqi Hu, Yifei Lou, Can Chen, 4 Aug 2025, Tensor Dynamic Mode Decomposition, https://arxiv.org/abs/2508.02627
- Kang Du, Zhihao Liang, Yulin Shen and Zeyu Wang, 4 Aug 2025, GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors, https://arxiv.org/abs/2408.08524
- Pusen Dong, Tianchen Zhu, Yue Qiu, Haoyi Zhou, Jianxin Li, 5 Aug 2025, From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning, https://arxiv.org/abs/2412.08920
- Yang Li, Daniel Agyei Asante, Changsheng Zhao, Ernie Chang, Yangyang Shi, Vikas Chandra, 6 Aug 2025, Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications, https://arxiv.org/abs/2405.15877
- Yossi Arjevani, Gal Vinograd, 6 Aug 2025, Symmetry & Critical Points for Symmetric Tensor Decomposition Problems, https://arxiv.org/abs/2306.07886
- Alex Glushkovsky, 8 Aug 2025, Dual Signal Decomposition of Stochastic Time Series, https://arxiv.org/abs/2508.05915
- Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Zhiying Li, Guanggang Geng, 6 Aug 2025, Log2Sig: Frequency-Aware Insider Threat Detection via Multivariate Behavioral Signal Decomposition, https://arxiv.org/abs/2508.05696
- Luke Li, 18 Aug 2025, Deep Learning-Based Financial Time Series Forecasting via Sliding Window and Variational Mode Decomposition, https://arxiv.org/abs/2508.12565
- Ying Huang, Yuanbin Man, Wenqi Jia, Zhengzhong Tu, Junzhou Huang, Miao Yin, 16 Aug 2025, AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition, https://arxiv.org/abs/2508.11870
- Yuannuo Feng, Wenyong Zhou, Yuexi Lyu, Hanjie Liu, Zhengwu Liu, Ngai Wong, Wang Kang, 16 Aug 2025, HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware, https://arxiv.org/abs/2508.11935
- Shiwei Li, Xiandi Luo, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li, 17 Aug 2025, The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning, https://arxiv.org/abs/2505.23176
- Suryanarayana Sankagiri, Jalal Etesami, Matthias Grossglauser, 19 Aug 2025, Recommendations with Sparse Comparison Data: Provably Fast Convergence for Nonconvex Matrix Factorization, https://arxiv.org/abs/2502.20033
- Nícolas Roque dos Santos, Dawon Ahn, Diego Minatel, Alneu de Andrade Lopes, Evangelos E. Papalexakis, 20 Aug 2025, Multi-view Graph Condensation via Tensor Decomposition, https://arxiv.org/abs/2508.14330
- Sebastian Musiał, Bartosz Zieliński, Tomasz Danel, 20 Aug 2025, Fragment-Wise Interpretability in Graph Neural Networks via Molecule Decomposition and Contribution Analysis, https://arxiv.org/abs/2508.15015
- Muchammad Daniyal Kautsar, Afra Majida Hariono, Widyawan, Syukron Abu Ishaq Alfarozi and Kuntpong Wararatpanya, 21 Aug 2025, CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression, https://arxiv.org/abs/2508.16680
- Paul Fogel, Christophe Geissler, George Luta, 22 Aug 2025, The Target Polish: A New Approach to Outlier-Resistant Non-Negative Matrix Factorization, https://arxiv.org/abs/2507.10484
- Mathieu Godbout and Audrey Durand, 18 Jul 2025, On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes, https://arxiv.org/abs/2507.14005
- Filip de Roos and Fabio Muratore, 28 Jul 2025, Novel Pivoted Cholesky Decompositions for Efficient Gaussian Process Inference, https://arxiv.org/abs/2507.20678
- Michael Aichm\"uller, Hector Geffner, 15 Aug 2025, Sketch Decompositions for Classical Planning via Deep Reinforcement Learning, https://arxiv.org/abs/2412.08574
Low-Rank Matrix Factorization
Matrix factorization (decomposition) can be used to find low-rank matrices that approximate the larger weight matrices, as sketched below.
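At inference time, a factorized layer replaces y = Wx with y = A(Bx), so the full weight matrix never needs to be materialized. The sketch below is a hedged illustration with an assumed row-major layout and illustrative names.

```cpp
#include <vector>
#include <cstdio>

// Apply a low-rank factorized layer: y = A * (B * x),
// where A is d_out x r and B is r x d_in, both row-major, with rank r small.
void lowrank_layer(const std::vector<float>& A, const std::vector<float>& B,
                   const std::vector<float>& x, std::vector<float>& y,
                   int d_out, int d_in, int r) {
    std::vector<float> t(r, 0.0f);            // t = B * x
    for (int i = 0; i < r; ++i)
        for (int j = 0; j < d_in; ++j)
            t[i] += B[i * d_in + j] * x[j];
    y.assign(d_out, 0.0f);                     // y = A * t
    for (int i = 0; i < d_out; ++i)
        for (int j = 0; j < r; ++j)
            y[i] += A[i * r + j] * t[j];
}

int main() {
    // Rank-1 factorization of the 2x2 matrix [[1,2],[2,4]]: A = [1,2]^T, B = [1,2].
    std::vector<float> A = {1, 2}, B = {1, 2}, x = {1, 1}, y;
    lowrank_layer(A, B, x, y, /*d_out=*/2, /*d_in=*/2, /*r=*/1);
    printf("y = [%g, %g]\n", y[0], y[1]);      // expected [3, 6]
    return 0;
}
```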
- Genta Indra Winata, Andrea Madotto, Jamin Shin, Elham J Barezi, and Pascale Fung. 2019. On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression. arXiv preprint arXiv:1908.09982. https://arxiv.org/abs/1908.09982
- Ashish Khetan and Zohar Karnin. "schuBERT: Optimizing Elements of BERT". arXiv preprint arXiv:2005.06628 (2020) https://arxiv.org/abs/2005.06628
- Jian Xue, Jinyu Li, and Yifan Gong. 2013. Restructuring of deep neural network acoustic models with singular value decomposition. In Interspeech, pages 2365–2369. https://www.academia.edu/72568360/Restructuring_of_deep_neural_network_acoustic_models_with_singular_value_decomposition
- Patrick Chen, Si Si, Yang Li, Ciprian Chelba, and Cho-Jui Hsieh. 2018. GroupReduce: Block-wise low-rank approximation for neural language model shrinking. In Advances in Neural Information Processing Systems, pages 10988–10998. https://arxiv.org/abs/1806.06950
- Xiyu Yu, Tongliang Liu, Xinchao Wang, and Dacheng Tao. 2017. On compressing deep models by low rank and sparse decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7370–7379. PDF: https://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_On_Compressing_Deep_CVPR_2017_paper.pdf
- Zi Lin, Jeremiah Zhe Liu, Zi Yang, Nan Hua, Dan Roth. “Pruning Redundant Mappings in Transformer Models via SpectralNormalized Identity Prior”. arXiv preprint arXiv:2010.01791 (2020) https://arxiv.org/abs/2010.01791
- Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. Adaptive budget allocation for parameter-efficient fine-tuning. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. https://arxiv.org/abs/2303.10512, Code: https://github.com/QingruZhang/AdaLoRA
- Arnav Chavan, Zhuang Liu, Deepak K. Gupta, Eric P. Xing, and Zhiqiang Shen. One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning. CoRR, abs/2306.07967, 2023. https://arxiv.org/abs/2306.07967
- Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, and Ali Ghodsi. DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation, In Andreas Vlachos and Isabelle Augenstein, editors, Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, Dubrovnik, Croatia, May 2-6, 2023, pages 3266–3279. Association for Computational Linguistics, 2023. https://arxiv.org/abs/2210.07558
- Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, LoRA: Low-Rank Adaptation of Large Language Models, In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022. https://arxiv.org/abs/2106.09685
- Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, and Sanjeev Khudanpur. Semi-orthogonal low-rank matrix factorization for deep neural networks. In B. Yegnanarayana, editor, Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018, pages 3743– 3747. ISCA, 2018. PDF: https://www.isca-speech.org/archive/pdfs/interspeech_2018/povey18_interspeech.pdf
- Yerlan Idelbayev; Miguel Á. Carreira-Perpiñán, Low-Rank Compression of Neural Nets: Learning the Rank of Each Layer. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pages 8046–8056. Computer Vision Foundation / IEEE, 2020. https://ieeexplore.ieee.org/document/9157223/
- Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang, A Survey on Model Compression for Large Language Models, arXiv preprint arXiv:2308.07633, Aug 2023 https://arxiv.org/abs/2308.07633 (Recent 2023 survey paper on various model compression approaches including low-rank matrices.)
Singular Value Decomposition (SVD)
SVD is a specific type of matrix decomposition: it factorizes a matrix into two orthogonal matrices and a diagonal matrix of singular values, and truncating to only the largest singular values gives the best low-rank approximation of the original matrix.
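As a hedged illustration, the sketch below uses power iteration to estimate only the largest singular value and its singular vectors, which corresponds to the rank-1 truncation of the SVD. A production implementation would call a library routine (such as LAPACK or Eigen); all names and parameters here are illustrative assumptions.

```cpp
#include <vector>
#include <cmath>
#include <cstdio>

// Estimate the largest singular value of an m x n row-major matrix W
// by power iteration on W^T * W. Returns sigma_1 and fills u (size m) and
// v (size n) with the corresponding singular vectors, giving the rank-1
// SVD truncation W ~= sigma_1 * u * v^T.
float top_singular_value(const std::vector<float>& W, int m, int n,
                         std::vector<float>& u, std::vector<float>& v,
                         int iterations = 100) {
    v.assign(n, 1.0f);                     // arbitrary nonzero start vector
    std::vector<float> Wv(m);
    for (int it = 0; it < iterations; ++it) {
        // Wv = W * v
        for (int i = 0; i < m; ++i) {
            float s = 0.0f;
            for (int j = 0; j < n; ++j) s += W[i * n + j] * v[j];
            Wv[i] = s;
        }
        // v = W^T * Wv, then normalize v
        for (int j = 0; j < n; ++j) {
            float s = 0.0f;
            for (int i = 0; i < m; ++i) s += W[i * n + j] * Wv[i];
            v[j] = s;
        }
        float norm = 0.0f;
        for (int j = 0; j < n; ++j) norm += v[j] * v[j];
        norm = std::sqrt(norm);
        for (int j = 0; j < n; ++j) v[j] /= norm;
    }
    // sigma_1 = ||W * v||, u = W * v / sigma_1
    u.assign(m, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) u[i] += W[i * n + j] * v[j];
    float sigma = 0.0f;
    for (int i = 0; i < m; ++i) sigma += u[i] * u[i];
    sigma = std::sqrt(sigma);
    for (int i = 0; i < m; ++i) u[i] /= sigma;
    return sigma;
}

int main() {
    // 2x2 diagonal example: singular values are 3 and 1.
    std::vector<float> W = {3, 0,
                            0, 1};
    std::vector<float> u, v;
    float sigma1 = top_singular_value(W, 2, 2, u, v);
    printf("sigma_1 ~= %g (expected 3)\n", sigma1);
    return 0;
}
```

Research papers on SVD include: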
- Zeyu Zhang, Haiying Shen, 7 Aug 2024, Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference, https://arxiv.org/abs/2408.04107
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 1 May 2024 (v6), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer
- Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu, 30 Jul 2024, Palu: Compressing KV-Cache with Low-Rank Projection, https://arxiv.org/abs/2407.21118 https://github.com/shadowpa0327/Palu
- Hongyaoxing Gu, 27 May 2024, LRAMM -- Low precision approximates GEMM via RSVD, https://arxiv.org/abs/2405.16917
- Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao 12 Aug 2024 (v3), A Survey on LoRA of Large Language Models, https://arxiv.org/abs/2407.11046 https://github.com/ZJU-LLMs/Awesome-LoRAs.git
- Shi, J., Shi, C. (2025). Improve LLM Inference Performance with Matrix Decomposition Strategies. In: Shi, Z., Witbrock, M., Tian, Q. (eds) Intelligence Science V. ICIS 2024. IFIP Advances in Information and Communication Technology, vol 720. Springer, Cham. https://doi.org/10.1007/978-3-031-71253-1_12 https://link.springer.com/chapter/10.1007/978-3-031-71253-1_12 (Speed up matrix operations with SVD and NMF via adaptive block sizing based on batching.)
- Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu, 31 Oct 2024, BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments, https://arxiv.org/abs/2410.23918 https://github.com/xinghaow99/BitStack
- Shengwen Ding, Chenhui Hu, 24 Nov 2024, eFedLLM: Efficient LLM Inference Based on Federated Learning, https://arxiv.org/abs/2411.16003
- Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
- Hong Yankun, Li Xing, Zhen Hui-Ling, Yu Xianzhi, Liu Wulong, Yuan Mingxuan, 21 Feb 2025, SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention, https://arxiv.org/abs/2502.15304
- Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang, 16 Mar 2025, SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression, https://arxiv.org/abs/2503.12340 https://github.com/AIoT-MLSys-Lab/SVD-LLM
- Jiujun He, Huazhen Lin, 10 Jun 2025, Olica: Efficient Structured Pruning of Large Language Models without Retraining, https://arxiv.org/abs/2506.08436
- Tavor Z. Baharav, Phillip B. Nicol, Rafael A. Irizarry, Rong Ma, 29 Jul 2025, Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration, https://arxiv.org/abs/2507.22170
- Jiayu Fang, Zhiqi Shao, S T Boris Choy, Junbin Gao, 19 Aug 2025, SVDformer: Direction-Aware Spectral Graph Embedding Learning via SVD and Transformer, https://arxiv.org/abs/2508.13435
- Mete Erdogan, Sebnem Demirtas, 25 Aug 2025, SVD Based Least Squares for X-Ray Pneumonia Classification Using Deep Features, https://arxiv.org/abs/2504.20970
Tucker Decomposition
Research papers on Tucker decomposition:
- Chakshu Moar, 2024, Compressing Language Models using Low-Rank Decomposition and Characterizing the Accuracy- Efficiency Trade-offs, Master of Science Thesis, Electrical and Computer Engineering, University of California, Irvine, USA, https://escholarship.org/content/qt0t6967h4/qt0t6967h4.pdf
- Chakshu Moar, Michael Pellauer, Hyoukjun Kwon, 10 May 2024, Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models, https://arxiv.org/abs/2405.06626
- Tucker, Ledyard R. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3): 279–311, 1966 https://link.springer.com/article/10.1007/BF02289464 (Tucker decomposition.)
- De Lathauwer, Lieven, De Moor, Bart, and Vandewalle, Joos. A multilinear singular value decomposition. SIAM journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000. https://epubs.siam.org/doi/10.1137/S0895479896305696 (Tucker decomposition.)
- Kim, Y.-D. and Choi, S. Nonnegative Tucker decomposition. In Proceedings of the IEEE CVPR 2007 Workshop on Component Analysis Methods, Minneapolis, Minnesota, 2007. https://ieeexplore.ieee.org/document/4270403
- Federica Stolf, Antonio Canale, 15 Nov 2024, Bayesian Adaptive Tucker Decompositions for Tensor Factorization, https://arxiv.org/abs/2411.10218
- Matthew Pietrosanu, Bei Jiang, Linglong Kong, 13 Jun 2024, Oblivious subspace embeddings for compressed Tucker decompositions, https://arxiv.org/abs/2406.09387
- Tobias Weber, Jakob Dexl, David Rügamer, Michael Ingrisch, 18 Apr 2024 (v2), Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition, https://arxiv.org/abs/2404.09683
- Ruizhong Qiu, Jun-Gi Jang, Xiao Lin, Lihui Liu, Hanghang Tong, 11 Jan 2025, TUCKET: A Tensor Time Series Data Structure for Efficient and Accurate Factor Analysis over Time Ranges, https://arxiv.org/abs/2501.06647 (Tucker decomposition to optimize time-series forecasting.)
Vector Dot Product Optimization
The computation of a vector dot product, also called the "scalar product", underlies almost all AI operations. Matrix multiplications are everywhere, and a matrix multiply operation is just a series of dot product operations: each element in the result of a matrix multiplication is the vector dot product of a row in one matrix with a column in the other matrix.
Given the importance of vector dot products to the speed of AI, various attempts have been made to speed them up. Options include hardware-accelerated dot products, faster vector dot product algorithms, and approximations that trade accuracy for speed.
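Several of the papers below target accuracy as much as raw speed. As a small hedged example of that theme, here is a dot product using the classic Kahan compensated-summation technique to reduce floating-point rounding error, alongside a naive version for comparison; this is a generic textbook method, not the specific algorithm of any cited paper.

```cpp
#include <vector>
#include <cstdio>

// Naive dot product: rounding errors can accumulate over long vectors.
float dot_naive(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Kahan compensated summation: carries a running error-correction term.
float dot_kahan(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f, c = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        float y = a[i] * b[i] - c;
        float t = sum + y;
        c = (t - sum) - y;   // recovers the low-order bits lost in the addition
        sum = t;
    }
    return sum;
}

int main() {
    // Many small terms after one large term: a stress test for float accumulation.
    std::vector<float> a(1000001, 1.0f), b(1000001, 1e-4f);
    a[0] = 1.0f; b[0] = 1e8f;
    printf("naive: %f\n", dot_naive(a, b));
    printf("kahan: %f\n", dot_kahan(a, b));   // much closer to the true 1e8 + 100
    return 0;
}
```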
Research on dot product optimization includes:
- N Yamanaka, T Ogita, SM Rump, S Oishi, 2008, A parallel algorithm for accurate dot product, Parallel Computing, https://www.sciencedirect.com/science/article/pii/S016781910800032X, PDF: https://ogilab.w.waseda.jp/ogita/math/doc/2008_YaOgRuOi.pdf
- W Kamp, A Bainbridge-Smith, 2007, Multiply accumulate unit optimised for fast dot-product evaluation, 2007 International Conference on Field-Programmable Technology, https://ieeexplore.ieee.org/abstract/document/4439283/
- Chitta Ranjan May 9, 2019, Understanding the Kernel Trick with fundamentals, Towards Data Science https://towardsdatascience.com/truly-understanding-the-kernel-trick-1aeb11560769
- J Diffenderfer, D Osei-Kuffuor, H Menon, March 2021, A framework for error-bounded approximate computing, with an application to dot products, SIAM Journal on Scientific Computing, https://www.osti.gov/servlets/purl/1959416
- J Diffenderfer, D Osei-Kuffuor, H Menon, 2021, QDOT: Quantized dot product kernel for approximate high-performance computing, arXiv preprint arXiv:2105.00115, https://arxiv.org/abs/2105.00115
- Jean-Michel Muller, Nicolas Brunie, Florent de Dinechin, Claude-Pierre Jeannerod, Mioara Joldes, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, Serge Torres, 2018, Enhanced Floating-Point Sums, Dot Products, and Polynomial Values, In: Handbook of Floating-Point Arithmetic, pp. 163–192, https://link.springer.com/chapter/10.1007/978-3-319-76526-6_5
- NM Ho, DT Nguyen, JL Gustafson, WF Wong, 2023, Bedot: Bit Efficient Dot Product for Deep Generative Models, CoNGA 2023: Next Generation Arithmetic, pp. 19–37, https://link.springer.com/chapter/10.1007/978-3-031-32180-1_2, PDF: https://www.comp.nus.edu.sg/~wongwf/papers/CONGA23-Bedot.pdf
- Lucas Klemmer; Saman Froehlich; Rolf Drechsler; Daniel Große, 2021, XbNN: Enabling CNNs on edge devices by approximate on-chip dot product encoding, 2021 IEEE International Symposium on Circuits and Systems (ISCAS), https://ieeexplore.ieee.org/document/9401780, PDF: https://agra.informatik.uni-bremen.de/doc/konf/2021_ISCAS_XBNN.pdf
- Y. Nievergelt, Scalar fused multiply-add instructions produce floating-point matrix arithmetic provably accurate to the penultimate digit, ACM Trans. Math. Softw., 29 (2003), pp. 27–48, https://dl.acm.org/doi/10.1145/641876.641878
- S Graillat, V Ménissier-Morain, 2012, Accurate summation, dot product and polynomial evaluation in complex floating point arithmetic, Information and Computation, Volume 216, July 2012, Pages 57-71, https://www.sciencedirect.com/science/article/pii/S0890540112000715
- AM Zaki, MH El-Shafey, AMB Eldin, 2010, A new architecture for accurate dot product of floating point numbers, The 2010 International Conference on Computer Engineering & Systems, https://ieeexplore.ieee.org/abstract/document/5674841/
- K He, R Barrio, L Chen, H Jiang, J Liu, T Gu, 2021, A Class of Fast and Accurate Multi-layer Block Summation and Dot Product Algorithms, IFIP International Conference on Network and Parallel Computing, NPC 2021: Network and Parallel Computing, pp. 64–75, https://link.springer.com/chapter/10.1007/978-3-030-93571-9_6
- S. Graillat, P. Langlois, N. Louvet, 15 September 2006, Choosing a twice more accurate dot product implementation, https://www.researchgate.net/publication/250769076_Choosing_a_Twice_More_Accurate_Dot_Product_Implementation, PDF: https://www-pequan.lip6.fr/~graillat/papers/icnaam06.pdf
- A Knofel, 1991, Fast hardware units for the computation of accurate dot products, Proceedings 10th IEEE Symposium on Computer Arithmetic, https://ieeexplore.ieee.org/document/145536, PDF: https://scholar.archive.org/work/cp6cgjq7g5enzfqoqtte2bb6k4/access/wayback/http://www.acsel-lab.com/arithmetic/papers/ARITH10/ARITH10_Knofel.pdf
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book: Get your copy from Amazon: Generative AI Applications
- Generative AI programming book: Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book: Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book: Get your copy from Amazon: CUDA C++ Debugging
More AI Research
Read more about:
- Low-rank matrices
- Sparse matrices
- Advanced AI Mathematics
- Zero-Multiplication Models
- Logarithmic Models
- Approximate Computing
- Inference Optimizations
- « Research Home