Aussie AI

Low-Rank Matrices

  • Last Updated 27 August 2025
  • by David Spuler, Ph.D.

Low-rank matrices are matrices whose rank is smaller than their dimensions, which means they can be stored as the product of two much smaller matrices (i.e., with fewer rows or columns). One form of model compression is to use matrix techniques to replace the large weight matrices with smaller "low-rank" matrices. This makes the model faster and smaller, but sometimes at the cost of reduced accuracy.

There are various approaches to finding smaller matrices to replace a full-sized matrix. One approach is simply to search for a smaller matrix that closely approximates the original large one. Another approach is "sparsification," which forces many of the weights to zero so that a smaller matrix can more easily stand in for the original. Yet another approach is to use matrix algebra to "factorize" (also called "decompose") the large matrix into two or more smaller matrices (see also AI matrix algebra), as sketched below.
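
As a rough sketch of the arithmetic behind factorization (the sizes and rank below are illustrative only, not taken from any particular paper), replacing a single d-by-d weight matrix with two rank-r factors reduces both the parameter count and the cost of a matrix-vector multiply from d*d down to 2*d*r:

    import numpy as np

    d, r = 4096, 64                  # full dimension versus a chosen small rank
    W = np.random.randn(d, d)        # original full-size weight matrix
    A = np.random.randn(d, r)        # "tall" factor (d x r)
    B = np.random.randn(r, d)        # "wide" factor (r x d)
    W_low = A @ B                    # a rank-r matrix of the same shape as W
                                     # (a fitted A, B pair would approximate W)

    x = np.random.randn(d)
    y_full = W @ x                   # d*d multiply-adds
    y_low = A @ (B @ x)              # only 2*d*r multiply-adds, applied in sequence

    print(W.size, A.size + B.size)   # 16777216 full parameters vs. 524288 in the factors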

One low-rank technique has become especially popular, possibly because it has a friendly name: LoRA, or "Low-Rank Adaptation," which fine-tunes a model by training small low-rank update matrices while keeping the original weight matrices frozen. If the base model has been quantized first, the technique is called QLoRA, for "Quantized LoRA".
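
A minimal sketch of the LoRA idea, assuming the common formulation where the frozen weight matrix W is augmented with a trainable low-rank update scaled by alpha/r (names and sizes here are illustrative): only the two small matrices A and B are trained, so the adapted layer computes W x plus a low-rank correction.

    import numpy as np

    d_out, d_in, r, alpha = 1024, 1024, 8, 16

    W = np.random.randn(d_out, d_in)     # frozen pretrained weights (never updated)
    A = 0.01 * np.random.randn(r, d_in)  # trainable "down" projection (r x d_in)
    B = np.zeros((d_out, r))             # trainable "up" projection, zero-initialized

    def lora_forward(x):
        # Frozen path plus the scaled low-rank update: W x + (alpha/r) * B A x
        return W @ x + (alpha / r) * (B @ (A @ x))

    x = np.random.randn(d_in)
    y = lora_forward(x)                  # equals W @ x until B is trained away from zero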

Singular Value Decomposition (SVD)

SVD is one of the standard methods for factorizing a matrix into smaller sub-matrices, and it underpins several of the low-rank compression papers listed below.
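
A minimal sketch of SVD-based compression with NumPy (the matrix size and the rank r are arbitrary choices for illustration): keeping only the r largest singular values and folding them into the two factors gives the best rank-r approximation of the original matrix in the Frobenius-norm sense.

    import numpy as np

    d, r = 512, 32
    W = np.random.randn(d, d)            # stand-in for a trained weight matrix

    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]                 # d x r factor, singular values folded in
    B = Vt[:r, :]                        # r x d factor
    W_r = A @ B                          # best rank-r approximation of W

    err = np.linalg.norm(W - W_r) / np.linalg.norm(W)
    print(f"relative Frobenius error at rank {r}: {err:.3f}")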

Research on Low-Rank Matrices

  • Li, Y.; Yu, Y.; Zhang, Q.; Liang, C.; He, P.; Chen, W.; and Zhao, T. 2023. LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation. In Krause, A.; Brunskill, E.; Cho, K.; Engelhardt, B.; Sabato, S.; and Scarlett, J., eds., Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, 20336–20350. PMLR. https://arxiv.org/abs/2306.11222
  • Ma, X.; Fang, G.; and Wang, X. 2023. LLM-Pruner: On the Structural Pruning of Large Language Models. arXiv:2305.11627. https://arxiv.org/abs/2305.11627 Code: https://github.com/horseee/LLM-Pruner (Pruning during training and LoRA.)
  • M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. BMVC, 2014, https://arxiv.org/abs/1405.3866, PDF: https://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14b/jaderberg14b.pdf
  • Y.-D. Kim, E. Park, S. Yoo, T. Choi, L. Yang and D. Shin, "Compression of deep convolutional neural networks for fast and low power mobile applications", arXiv:1511.06530, 2015. https://arxiv.org/abs/1511.06530 (Low-rank via Bayesian matrix factorization and Tucker decomposition.)
  • V. Lebedev, Y. Ganin, M. Rakhuba, I. Oseledets and V. Lempitsky, "Speeding-up convolutional neural networks using fine-tuned CP-decomposition", arXiv:1412.6553, 2014. https://arxiv.org/abs/1412.6553
  • Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh, Dec 2022, KronA: Parameter Efficient Tuning with Kronecker Adapter, arXiv preprint arXiv:2212.10650, https://arxiv.org/abs/2212.10650 (Kronecker product for matrix decomposition.)
  • Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, and Ali Ghodsi. 2022. DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation, arXiv preprint arXiv:2210.07558. https://arxiv.org/abs/2210.07558
  • Rabeeh Karimi Mahabadi, James Henderson, and Sebastian Ruder. 2021. Compacter: Efficient low-rank hypercomplex adapter layers. Advances in Neural Information Processing Systems, 34:1022–1035. https://arxiv.org/abs/2106.04647
  • Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia, Sep 2023, LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models, https://arxiv.org/abs/2309.12307 (Low-rank matrix attention allows up to 100k context windows.)
  • R Saha, V Srivastava, M Pilanci, 2023, Matrix Compression via Randomized Low Rank and Low Precision Factorization, 37th Conference on Neural Information Processing Systems (NeurIPS 2023), https://web.stanford.edu/~pilanci/papers/lplr.pdf
  • F Babiloni, T Tanay, J Deng, M Maggioni, S Zafeiriou, 2023, Factorized Dynamic Fully-Connected Layers for Neural Networks, ICCV workshop, https://openaccess.thecvf.com/content/ICCV2023W/RCV/papers/Babiloni_Factorized_Dynamic_Fully-Connected_Layers_for_Neural_Networks_ICCVW_2023_paper.pdf (Tensor decomposition into low-rank factors.)
  • Samuel Carreira, Tomás Marques, José Ribeiro, Carlos Grilo, Sep 2023, Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile, arXiv preprint arXiv:2310.01434, https://browse.arxiv.org/abs/2310.01434 (LoRA on a mobile platform.)
  • Tamara G Kolda and Brett W Bader, 2009, Tensor Decompositions and Applications, SIAM Rev. 51, 3 (2009), 455–500, https://epubs.siam.org/doi/abs/10.1137/07070111X (Analysis of various algorithms for tensor decomposition.)
  • Stephan Rabanser, Oleksandr Shchur, Stephan Günnemann, Nov 2017, Introduction to Tensor Decompositions and their Applications in Machine Learning, https://browse.arxiv.org/pdf/1711.10781.pdf
  • Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. 2016. Compression of deep convolutional neural networks for fast and low power mobile applications. In Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1511.06530 (Uses Tucker decomposition and Bayesian matrix factorization algorithms.)
  • Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao, Oct 2023, LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models, https://arxiv.org/abs/2310.08659 (QLoRA for LLMs.)
  • Chakshu Moar, Michael Pellauer, Hyoukjun Kwon, 10 May 2024, Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models, https://arxiv.org/abs/2405.06626
  • You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying WEI, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor)Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
  • Davis, Andrew and Arel, Itamar. 2013. Low-rank approximations for conditional feedforward computation in deep neural networks. arXiv preprint arXiv:1312.4461, https://arxiv.org/abs/1312.4461
  • Y Hu, J Zhang, C Zhao, C Li, H Chen, 2023, Transformer Compression via Subspace Projection, arXiv preprint arXiv:2308.16475, https://arxiv.org/abs/2308.16475
  • Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. A survey of model compression and acceleration for deep neural networks. CoRR, abs/1710.09282, 2017. https://arxiv.org/abs/1710.09282
  • Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson, 10 Jun 2024, Compute Better Spent: Replacing Dense Layers with Structured Matrices, https://arxiv.org/abs/2406.06248
  • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 15 Mar 2024 (v5), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer (A large survey of a variety of LLM optimizations.)
  • Arnav Chavan, Nahush Lele, Deepak Gupta, Dec 2023, Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models https://arxiv.org/abs/2312.07046 Code: https://github.com/transmuteAI/trailmet/tree/main/trailmet/algorithms/llm-rom
  • S. Wang, B. Z. Li, M. Khabsa, H. Fang, and H. Ma, “Linformer: Self-attention with linear complexity,” CoRR, vol. abs/2006.04768, 2020. https://arxiv.org/abs/2006.04768 (Low-rank approximation of attention.)
  • Idelbayev, Y. and Carreira-Perpinan, M. A. (2020). Low-rank compression of neural nets: Learning the rank of each layer. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8046–8056. URL: https://openaccess.thecvf.com/content_CVPR_2020/html/Idelbayev_Low_Rank_Compression_of_Neural_Nets_Learning_the_Rank_of_Each_CVPR_2020_paper.html
  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861. URL: http://arxiv.org/abs/1704.04861.
  • Zhang, J., Lei, Q., and Dhillon, I. (2018). Stabilizing gradients for deep neural networks via efficient SVD parameterization. In Dy, J. and Krause, A., editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5806–5814. PMLR. URL: http://proceedings.mlr.press/v80/zhang18g.html
  • K Nan, S Liu, J Du, H Liu, 2019, Deep model compression for mobile platforms: A survey, Tsinghua Science and Technology (Volume 24, Issue 6, December 2019), https://ieeexplore.ieee.org/abstract/document/8727762 PDF: https://ieeexplore.ieee.org/iel7/5971803/8727756/08727762.pdf
  • Zheng Qu, Liu Liu, Fengbin Tu, Zhaodong Chen, Yufei Ding, Yuan Xie, 2022, DOTA: Detect and Omit Weak Attentions for Scalable Transformer Acceleration, ASPLOS ’22, February 28 – March 4, 2022, Lausanne, Switzerland, PDF: https://dl.acm.org/doi/pdf/10.1145/3503222.3507738
  • Ivan Markovsky, Aug 3, 2018, Low-Rank Approximation: Algorithms, Implementation, Applications (Communications and Control Engineering) Part of: Communications and Control Engineering (62 books), https://www.amazon.com/Low-Rank-Approximation-Implementation-Applications-Communications/dp/3319896199/
  • Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman, 9 Feb 2024 (v2), SliceGPT: Compress Large Language Models by Deleting Rows and Columns, Microsoft Research, https://arxiv.org/abs/2401.15024 Code: https://github.com/microsoft/TransformerCompression (Pruning of matrices effectively prunes along the width dimension and the "fourth" internal dimension of embeddings using techniques such as low-rank matrix factorization.)
  • Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He, 15 Feb 2024, Model Compression and Efficient Inference for Large Language Models: A Survey, https://arxiv.org/abs/2402.09748
  • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos, MicroNet: Improving Image Recognition with Extremely Low FLOPs, 2021, https://ieeexplore.ieee.org/abstract/document/9857393 PDF: https://openaccess.thecvf.com/content/ICCV2021/papers/Li_MicroNet_Improving_Image_Recognition_With_Extremely_Low_FLOPs_ICCV_2021_paper.pdf
  • Yubin Qin, Yang Wang, Zhiren Zhao, Xiaolong Yang, Yang Zhou, Shaojun Wei, Yang Hu, Shouyi Yin, 2024, MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition, 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), Year: 2024, Pages: 1032-1047, DOI Bookmark: 10.1109/ISCA59077.2024.00079, https://www.computer.org/csdl/proceedings-article/isca/2024/265800b032/1Z3pCEBnapO
  • Jiuxiang Gu, Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Junze Yin, 8 May 2024, Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers, https://arxiv.org/abs/2405.05219 (Attention optimization using multiple low-rank matrices.)
  • Canwen Xu, Julian McAuley, Nov 2022, A Survey on Model Compression and Acceleration for Pretrained Language Models, https://arxiv.org/abs/2202.07105
  • Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre, 24 Jun 2024, Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers, https://arxiv.org/abs/2406.16450 Code: https://github.com/CLAIRE-Labo/StructuredFFN/tree/main
  • Utkarsh Saxena, Gobinda Saha, Sakshi Choudhary, Kaushik Roy, 10 Aug 2024, Eigen Attention: Attention in Low-Rank Space for KV Cache Compression, https://arxiv.org/abs/2408.05646
  • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou, 23 Aug 2024, Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time, https://arxiv.org/abs/2408.13233 (Training using low-rank matrices to approximate attention.)
  • Josh Alman, Zhao Song, 9 May 2023 (v2), Fast Attention Requires Bounded Entries, https://arxiv.org/abs/2302.13214 (Low-rank matrices in attention for fast inference.)
  • Josh Alman, Zhao Song, 6 Oct 2023, How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation, https://arxiv.org/abs/2310.04064
  • Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu, 30 Jul 2024, Palu: Compressing KV-Cache with Low-Rank Projection, https://arxiv.org/abs/2407.21118 https://github.com/shadowpa0327/Palu
  • Sneha Mehta, Huzefa Rangwala, Naren Ramakrishnan, 10 Aug 2020 (v2), Low Rank Factorization for Compact Multi-Head Self-Attention, https://arxiv.org/abs/1912.00835
  • Ignacio Hounie, Charilaos Kanatsoulis, Arnuv Tandon, Alejandro Ribeiro, 5 Oct 2024, LoRTA: Low Rank Tensor Adaptation of Large Language Models, https://arxiv.org/abs/2410.04060
  • Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
  • Zebin Yang, Renze Chen, Taiqiang Wu, Ngai Wong, Yun Liang, Runsheng Wang, Ru Huang, Meng Li, 23 Oct 2024, MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers https://arxiv.org/abs/2410.17957
  • Elias Jääsaari, Ville Hyvönen, Teemu Roos, 24 Oct 2024, LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search, https://arxiv.org/abs/2410.18926
  • Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu, 31 Oct 2024, BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments, https://arxiv.org/abs/2410.23918 https://github.com/xinghaow99/BitStack
  • Liang Mi, Weijun Wang, Wenming Tu, Qingfeng He, Rui Kong, Xinyu Fang, Yazhu Dong, Yikang Zhang, Yunchun Li, Meng Li, Haipeng Dai, Guihai Chen, Yunxin Liu, 1 Nov 2024, V-LoRA: An Efficient and Flexible System Boosts Vision Applications with LoRA LMM, https://arxiv.org/abs/2411.00915
  • Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
  • M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2024, Resource-efficient Algorithms and Systems of Foundation Models: A Survey, https://dl.acm.org/doi/pdf/10.1145/3706418
  • Meyer Scetbon, James Hensman, 10 Dec 2024, Low-Rank Correction for Quantized LLMs, https://arxiv.org/abs/2412.07902
  • Kwangryeol Park, Seulki Lee, 12 Dec 2024, SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization, https://arxiv.org/abs/2412.08894 (Gradient optimizer Adam optimized using low-rank matrix factorization.)
  • Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
  • Jingcheng Hu, Houyi Li, Yinmin Zhang, Zili Wang, Shuigeng Zhou, Xiangyu Zhang, Heung-Yeung Shum, 26 Dec 2024, Multi-matrix Factorization Attention, https://arxiv.org/abs/2412.19255
  • Menglin Yang, Jialin Chen, Yifei Zhang, Jiahong Liu, Jiasheng Zhang, Qiyao Ma, Harshit Verma, Qianru Zhang, Min Zhou, Irwin King, Rex Ying, 31 Dec 2024, Low-Rank Adaptation for Foundation Models: A Comprehensive Review, https://arxiv.org/abs/2501.00365 (Extensive survey of LoRA.)
  • Q Wang, S Shen, Jan 2025, Activation-Guided Low-Rank Parameter Adaptation for Efficient Model Fine-Tuning, IEEE Access, https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10852296 (Modified LoRA algorithm using activations for weighting.)
  • Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang, 16 Mar 2025, SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression, https://arxiv.org/abs/2503.12340 https://github.com/AIoT-MLSys-Lab/SVD-LLM
  • Yiping Ji, Hemanth Saratchandran, Cameron Gordon, Zeyu Zhang, Simon Lucey, 17 Mar 2025 (v5), Efficient Learning With Sine-Activated Low-rank Matrices, ICLR 2025, AIML, https://arxiv.org/abs/2403.19243
  • Ray Zirui Zhang, Christopher E. Miles, Xiaohui Xie, John S. Lowengrub, 22 Jul 2025, BiLO: Bilevel Local Operator Learning for PDE Inverse Problems. Part II: Efficient Uncertainty Quantification with Low-Rank Adaptation, https://arxiv.org/abs/2507.17019
  • Yao Wang, Jiannan Li, Yue Kang, Shanxing Gao, Zhenxin Xiao, 23 Jul 2025, Generalized Low-Rank Matrix Contextual Bandits with Graph Information, https://arxiv.org/abs/2507.17528
  • Gabriel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong, 23 Jul 2025, LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning, https://arxiv.org/abs/2506.15606
  • Etienne Zeudong, Elsa Cardoso-Bihlo and Alex Bihlo, 24 Jul 2025, Low-rank adaptive physics-informed HyperDeepONets for solving differential equations, https://arxiv.org/abs/2507.18346
  • Le-Trung Nguyen, Ael Quelennec, Van-Tam Nguyen, Enzo Tartaglione, 24 Jul 2025, Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning, https://arxiv.org/abs/2505.05086
  • Constantin Philippenko, Kevin Scaman, Laurent Massoulié, 21 Jul 2025, In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting, https://arxiv.org/abs/2409.08771
  • Jinyuan Feng and Zhiqiang Pu and Tianyi Hu and Dongmin Li and Xiaolin Ai and Huimu Wang, 21 Jul 2025, OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning, https://arxiv.org/abs/2501.10062
  • Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, Kun Yuan, 20 Jul 2025, Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees, https://arxiv.org/abs/2507.08784
  • Sachin Garg, Michał Dereziński, 19 Jul 2025, Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method, https://arxiv.org/abs/2506.17556
  • Shiwei Li, Xiandi Luo, Haozhao Wang, Xing Tang, Ziqiang Cui, Dugang Liu, Yuhua Li, Xiuqiang He, Ruixuan Li, 9 Aug 2025, BoRA: Towards More Expressive Low-Rank Adaptation with Block Diversity, https://arxiv.org/abs/2508.06953
  • Nairouz Mrabah, Nicolas Richet, Ismail Ben Ayed and Éric Granger, 11 Aug 2025, Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation, https://arxiv.org/abs/2504.12436
  • Shishir Muralidhara, Didier Stricker, René Schuster, 26 Jul 2025, CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation, https://arxiv.org/abs/2507.19887
  • Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, Yunzhi Zhuge, Shuai Hao, Xu Jia, Lu Zhang, Ying Zhang, Huchuan Lu, 28 Jul 2025, Regularizing Subspace Redundancy of Low-Rank Adaptation, https://arxiv.org/abs/2507.20745
  • Zhan Zhuang, Xiequn Wang, Wei Li, Yulong Zhang, Qiushi Huang, Shuhao Chen, Xuehao Wang, Yanbin Wei, Yuhe Nie, Kede Ma, Yu Zhang, Ying Wei, 27 Jul 2025, Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation, https://arxiv.org/abs/2506.05713
  • Ginés Carreto Picón, Illia Oleksiienko, Lukas Hedegaard, Arian Bakhtiarnia, Alexandros Iosifidis, 28 Jul 2025, Continual Low-Rank Scaled Dot-product Attention, https://arxiv.org/abs/2412.03214
  • Zerui Tao, Yuhta Takida, Naoki Murata, Qibin Zhao, Yuki Mitsufuji, 31 Jul 2025, Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models, https://arxiv.org/abs/2501.08727
  • Zishan Shao, Yixiao Wang, Qinsi Wang, Ting Jiang, Zhixu Du, Hancheng Ye, Danyang Zhuo, Yiran Chen, and Hai Li, 2 Aug 2025, FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models, https://arxiv.org/abs/2508.01506
  • Jiaxi Li, Lu Yin, Li Shen, Jinjin Xu, Liwu Xu, Tianjin Huang, Wenwu Wang, Shiwei Liu, Xilu Wang, 4 Aug 2025, LOST: Low-rank and Sparse Pre-training for Large Language Models, https://arxiv.org/abs/2508.02668
  • Peijia Qin, Ruiyi Zhang, Pengtao Xie, 3 Aug 2025, BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation, https://arxiv.org/abs/2410.09758
  • Ayan Sengupta, Vaibhav Seth, Arinjay Pathak, Aastha Verma, Natraj Raman, Sriram Gopalakrishnan, Niladri Chatterjee, Tanmoy Chakraborty, 3 Aug 2025, Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation, https://arxiv.org/abs/2411.04358
  • Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein, 2 Aug 2025, LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation, https://arxiv.org/abs/2504.07448
  • Wenwu Gong and Lili Yang, 4 Aug 2025, LRTuckerRep: Low-rank Tucker Representation Model for Multi-dimensional Data Completion, https://arxiv.org/abs/2508.03755
  • Igor Sokolov, Abdurakhmon Sadiev, Yury Demidovich, Fawaz S Al-Qahtani, and Peter Richtárik, 5 Aug 2025, Bernoulli-LoRA: A Theoretical Framework for Randomized Low-Rank Adaptation, https://arxiv.org/abs/2508.03820
  • Yang Li, Daniel Agyei Asante, Changsheng Zhao, Ernie Chang, Yangyang Shi, Vikas Chandra, 6 Aug 2025, Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications, https://arxiv.org/abs/2405.15877
  • Sajjad Ghiasvand and Haniyeh Ehsani Oskouie and Mahnoosh Alizadeh and Ramtin Pedarsani, 12 Aug 2025, Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models, https://arxiv.org/abs/2505.15130
  • Jialin Zhao, Yingtao Zhang, Carlo Vittorio Cannistraci, 13 Aug 2025, Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models, https://arxiv.org/abs/2501.19090
  • Mohammad Mozaffari, Amir Yazdanbakhsh, Maryam Mehri Dehnavi, 14 Aug 2025, SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression, https://arxiv.org/abs/2410.09615
  • Vedant Puri, Aditya Joglekar, Kevin Ferguson, Yu-hsuan Chen, Yongjie Jessica Zhang, Levent Burak Kara, 18 Aug 2025, FLARE: Fast Low-rank Attention Routing Engine, https://arxiv.org/abs/2508.12594
  • Shiwei Li, Xiandi Luo, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li, 17 Aug 2025, The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning, https://arxiv.org/abs/2505.23176
  • Liyi Zhang, Jake Snell, Thomas L. Griffiths, 19 Aug 2025, Amortized Bayesian Meta-Learning for Low-Rank Adaptation of Large Language Models, https://arxiv.org/abs/2508.14285
  • Klaudia Bałazy, Mohammadreza Banaei, Karl Aberer, Jacek Tabor, 19 Aug 2025, LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters, https://arxiv.org/abs/2405.17604
  • Ilja Kuzborskij, Yasin Abbasi Yadkori, 20 Aug 2025, Low-rank bias, weight decay, and model merging in neural networks, https://arxiv.org/abs/2502.17340
  • Yajie Zhou and Xiaoyi Pang and Zhibo Wang, 20 Aug 2025, AFLoRA: Adaptive Federated Fine-Tuning of Large Language Models with Resource-Aware Low-Rank Adaption, https://arxiv.org/abs/2505.24773
  • Jacob Aguirre, Diego Cifuentes, Vincent Guigues, Renato D.C. Monteiro, Victor Hugo Nascimento, Arnesh Sujanani, 21 Aug 2025, A User Manual for cuHALLaR: A GPU Accelerated Low-Rank Semidefinite Programming Solver, https://arxiv.org/abs/2508.15951
  • Sajjad Ghiasvand, Mahnoosh Alizadeh, Ramtin Pedarsani, 21 Aug 2025, Decentralized Low-Rank Fine-Tuning of Large Language Models, https://arxiv.org/abs/2501.15361
  • Muchammad Daniyal Kautsar, Afra Majida Hariono, Widyawan, Syukron Abu Ishaq Alfarozi and Kuntpong Wararatpanya, 21 Aug 2025, CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression, https://arxiv.org/abs/2508.16680
  • Haojie Zhang, 24 Aug 2025, DropLoRA: Sparse Low-Rank Adaptation for Parameter-Efficient Fine-Tuning, https://arxiv.org/abs/2508.17337
  • Emanuele Zangrando, Piero Deidda, Simone Brugiapaglia, Nicola Guglielmi, Francesco Tudisco, 23 Aug 2025, Provable Emergence of Deep Neural Collapse and Low-Rank Bias in $L^2$-Regularized Nonlinear Networks, https://arxiv.org/abs/2402.03991
  • Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci, 23 Aug 2025, LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation, https://arxiv.org/abs/2502.20583
