Aussie AI

Low-Rank Matrices

  • Last Updated 27 August 2025
  • by David Spuler, Ph.D.

Low-rank matrices are matrices whose rank is smaller than their dimensions, which means they can be stored as the product of two much smaller matrices (i.e., with fewer rows or columns). One form of model compression is to use matrix techniques to replace the large weight matrices with smaller "low-rank" matrices. This makes the model faster and smaller, but sometimes at the cost of reduced accuracy.

There are various approaches to finding smaller matrices to replace a full-sized matrix. One approach is simply to search for a smaller matrix that closely approximates the original large one. Another approach is "sparsification," which forces many of the weights to zero so that a smaller matrix can more easily stand in for the original. Yet another approach is to use matrix algebra to "factorize" (also called "decompose") the large matrix into two or more smaller matrices (see also AI matrix algebra), as sketched below.
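
As a rough sketch of the arithmetic behind factorization (the sizes and rank below are illustrative only, not taken from any particular paper), replacing a single d-by-d weight matrix with two rank-r factors reduces both the parameter count and the cost of a matrix-vector multiply from d*d down to 2*d*r:

    import numpy as np

    d, r = 4096, 64                  # full dimension versus a chosen small rank
    W = np.random.randn(d, d)        # original full-size weight matrix
    A = np.random.randn(d, r)        # "tall" factor (d x r)
    B = np.random.randn(r, d)        # "wide" factor (r x d)
    W_low = A @ B                    # a rank-r matrix of the same shape as W
                                     # (a fitted A, B pair would approximate W)

    x = np.random.randn(d)
    y_full = W @ x                   # d*d multiply-adds
    y_low = A @ (B @ x)              # only 2*d*r multiply-adds, applied in sequence

    print(W.size, A.size + B.size)   # 16777216 full parameters vs. 524288 in the factors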

One low-rank technique has become especially popular, possibly because it has a friendly name: LoRA, or "Low-Rank Adaptation," which fine-tunes a model by training small low-rank update matrices while keeping the original weight matrices frozen. If the base model has been quantized first, the technique is called QLoRA, for "Quantized LoRA".
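
A minimal sketch of the LoRA idea, assuming the common formulation where the frozen weight matrix W is augmented with a trainable low-rank update scaled by alpha/r (names and sizes here are illustrative): only the two small matrices A and B are trained, so the adapted layer computes W x plus a low-rank correction.

    import numpy as np

    d_out, d_in, r, alpha = 1024, 1024, 8, 16

    W = np.random.randn(d_out, d_in)     # frozen pretrained weights (never updated)
    A = 0.01 * np.random.randn(r, d_in)  # trainable "down" projection (r x d_in)
    B = np.zeros((d_out, r))             # trainable "up" projection, zero-initialized

    def lora_forward(x):
        # Frozen path plus the scaled low-rank update: W x + (alpha/r) * B A x
        return W @ x + (alpha / r) * (B @ (A @ x))

    x = np.random.randn(d_in)
    y = lora_forward(x)                  # equals W @ x until B is trained away from zero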

Singular Value Decomposition (SVD)

SVD is one of the standard methods for factorizing a matrix into smaller sub-matrices, and it underpins several of the low-rank compression papers listed below.
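
A minimal sketch of SVD-based compression with NumPy (the matrix size and the rank r are arbitrary choices for illustration): keeping only the r largest singular values and folding them into the two factors gives the best rank-r approximation of the original matrix in the Frobenius-norm sense.

    import numpy as np

    d, r = 512, 32
    W = np.random.randn(d, d)            # stand-in for a trained weight matrix

    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]                 # d x r factor, singular values folded in
    B = Vt[:r, :]                        # r x d factor
    W_r = A @ B                          # best rank-r approximation of W

    err = np.linalg.norm(W - W_r) / np.linalg.norm(W)
    print(f"relative Frobenius error at rank {r}: {err:.3f}")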

Research on Low-Rank Matrices

  • Li, Y.; Yu, Y.; Zhang, Q.; Liang, C.; He, P.; Chen, W.; and Zhao, T. 2023. LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation. In Krause, A.; Brunskill, E.; Cho, K.; Engelhardt, B.; Sabato, S.; and Scarlett, J., eds., Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, 20336–20350. PMLR. https://arxiv.org/abs/2306.11222
  • Ma, X.; Fang, G.; and Wang, X. 2023. LLM-Pruner: On the Structural Pruning of Large Language Models. arXiv:2305.11627. https://arxiv.org/abs/2305.11627 Code: https://github.com/horseee/LLM-Pruner (Pruning during training and LoRA.)
  • M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. BMVC, 2014, https://arxiv.org/abs/1405.3866, PDF: https://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14b/jaderberg14b.pdf
  • Y.-D. Kim, E. Park, S. Yoo, T. Choi, L. Yang and D. Shin, "Compression of deep convolutional neural networks for fast and low power mobile applications", arXiv:1511.06530, 2015. https://arxiv.org/abs/1511.06530 (Low-rank via Bayesian matrix factorization and Tucker decomposition.)
  • V. Lebedev, Y. Ganin, M. Rakhuba, I. Oseledets and V. Lempitsky, "Speeding-up convolutional neural networks using fine-tuned CP-decomposition", arXiv:1412.6553, 2014. https://arxiv.org/abs/1412.6553
  • Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh, Dec 2022, KronA: Parameter Efficient Tuning with Kronecker Adapter, arXiv preprint arXiv:2212.10650, https://arxiv.org/abs/2212.10650 (Kronecker product for matrix decomposition.)
  • Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, and Ali Ghodsi. 2022. DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation, arXiv preprint arXiv:2210.07558. https://arxiv.org/abs/2210.07558
  • Rabeeh Karimi Mahabadi, James Henderson, and Sebastian Ruder. 2021. Compacter: Efficient low-rank hypercomplex adapter layers. Advances in Neural Information Processing Systems, 34:1022–1035. https://arxiv.org/abs/2106.04647
  • Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia, Sep 2023, LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models, https://arxiv.org/abs/2309.12307 (Low-rank matrix attention allows up to 100k context windows.)
  • R Saha, V Srivastava, M Pilanci, 2023, Matrix Compression via Randomized Low Rank and Low Precision Factorization, 37th Conference on Neural Information Processing Systems (NeurIPS 2023), https://web.stanford.edu/~pilanci/papers/lplr.pdf
  • F Babiloni, T Tanay, J Deng, M Maggioni, S Zafeiriou, 2023, Factorized Dynamic Fully-Connected Layers for Neural Networks, ICCV workshop, https://openaccess.thecvf.com/content/ICCV2023W/RCV/papers/Babiloni_Factorized_Dynamic_Fully-Connected_Layers_for_Neural_Networks_ICCVW_2023_paper.pdf (Tensor decomposition into low-rank factors.)
  • Samuel Carreira, Tomás Marques, José Ribeiro, Carlos Grilo, Sep 2023, Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile, arXiv preprint arXiv:2310.01434, https://browse.arxiv.org/abs/2310.01434 (LoRA on a mobile platform.)
  • Tamara G Kolda and Brett W Bader, 2009, Tensor Decompositions and Applications, SIAM Rev. 51, 3 (2009), 455–500, https://epubs.siam.org/doi/abs/10.1137/07070111X (Analysis of various algorithms for tensor decomposition.)
  • Stephan Rabanser, Oleksandr Shchur, Stephan Günnemann, Nov 2017, Introduction to Tensor Decompositions and their Applications in Machine Learning, https://browse.arxiv.org/pdf/1711.10781.pdf
  • Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. 2016. Compression of deep convolutional neural networks for fast and low power mobile applications. In Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1511.06530 (Uses Tucker decomposition and Bayesian matrix factorization algorithms.)
  • Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao, Oct 2023, LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models, https://arxiv.org/abs/2310.08659 (QLoRA for LLMs.)
  • Chakshu Moar, Michael Pellauer, Hyoukjun Kwon, 10 May 2024, Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models, https://arxiv.org/abs/2405.06626
  • You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying WEI, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor)Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
  • Davis, Andrew and Arel, Itamar. 2013. Low-rank approximations for conditional feedforward computation in deep neural networks. arXiv preprint arXiv:1312.4461, https://arxiv.org/abs/1312.4461
  • Y Hu, J Zhang, C Zhao, C Li, H Chen, 2023, Transformer Compression via Subspace Projection, arXiv preprint arXiv:2308.16475, https://arxiv.org/abs/2308.16475
  • Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. A survey of model compression and acceleration for deep neural networks. CoRR, abs/1710.09282, 2017. https://arxiv.org/abs/1710.09282
  • Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson, 10 Jun 2024, Compute Better Spent: Replacing Dense Layers with Structured Matrices, https://arxiv.org/abs/2406.06248
  • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 15 Mar 2024 (v5), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer (A large survey of a variety of LLM optimizations.)
  • Arnav Chavan, Nahush Lele, Deepak Gupta, Dec 2023, Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models https://arxiv.org/abs/2312.07046 Code: https://github.com/transmuteAI/trailmet/tree/main/trailmet/algorithms/llm-rom
  • S. Wang, B. Z. Li, M. Khabsa, H. Fang, and H. Ma, “Linformer: Self-attention with linear complexity,” CoRR, vol. abs/2006.04768, 2020. https://arxiv.org/abs/2006.04768 (Low-rank approximation of attention.)
  • Idelbayev, Y. and Carreira-Perpinan, M. A. (2020). Low-rank compression of neural nets: Learning the rank of each layer. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8046–8056. URL: https://openaccess.thecvf.com/content_CVPR_2020/html/Idelbayev_Low_Rank_Compression_of_Neural_Nets_Learning_the_Rank_of_Each_CVPR_2020_paper.html
  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861. URL: http://arxiv.org/abs/1704.04861.
  • Zhang, J., Lei, Q., and Dhillon, I. (2018). Stabilizing gradients for deep neural networks via efficient SVD parameterization. In Dy, J. and Krause, A., editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5806–5814. PMLR. URL: http://proceedings.mlr.press/v80/zhang18g.html
  • K Nan, S Liu, J Du, H Liu, 2019, Deep model compression for mobile platforms: A survey, Tsinghua Science and Technology (Volume 24, Issue 6, December 2019), https://ieeexplore.ieee.org/abstract/document/8727762 PDF: https://ieeexplore.ieee.org/iel7/5971803/8727756/08727762.pdf
  • Zheng Qu, Liu Liu, Fengbin Tu, Zhaodong Chen, Yufei Ding, Yuan Xie, 2022, DOTA: Detect and Omit Weak Attentions for Scalable Transformer Acceleration, ASPLOS ’22, February 28 – March 4, 2022, Lausanne, Switzerland, PDF: https://dl.acm.org/doi/pdf/10.1145/3503222.3507738
  • Ivan Markovsky, Aug 3, 2018, Low-Rank Approximation: Algorithms, Implementation, Applications (Communications and Control Engineering) Part of: Communications and Control Engineering (62 books), https://www.amazon.com/Low-Rank-Approximation-Implementation-Applications-Communications/dp/3319896199/
  • Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman, 9 Feb 2024 (v2), SliceGPT: Compress Large Language Models by Deleting Rows and Columns, Microsoft Research, https://arxiv.org/abs/2401.15024 Code: https://github.com/microsoft/TransformerCompression (Pruning of matrices effectively prunes along the width dimension and the "fourth" internal dimension of embeddings using techniques such as low-rank matrix factorization.)
  • Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He, 15 Feb 2024, Model Compression and Efficient Inference for Large Language Models: A Survey, https://arxiv.org/abs/2402.09748
  • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos, MicroNet: Improving Image Recognition with Extremely Low FLOPs, 2021, https://ieeexplore.ieee.org/abstract/document/9857393 PDF: https://openaccess.thecvf.com/content/ICCV2021/papers/Li_MicroNet_Improving_Image_Recognition_With_Extremely_Low_FLOPs_ICCV_2021_paper.pdf
  • Yubin Qin, Yang Wang, Zhiren Zhao, Xiaolong Yang, Yang Zhou, Shaojun Wei, Yang Hu, Shouyi Yin, 2024, MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition, 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), Year: 2024, Pages: 1032-1047, DOI Bookmark: 10.1109/ISCA59077.2024.00079, https://www.computer.org/csdl/proceedings-article/isca/2024/265800b032/1Z3pCEBnapO
  • Jiuxiang Gu, Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Junze Yin, 8 May 2024, Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers, https://arxiv.org/abs/2405.05219 (Attention optimization using multiple low-rank matrices.)
  • Canwen Xu, Julian McAuley, Nov 2022, A Survey on Model Compression and Acceleration for Pretrained Language Models, https://arxiv.org/abs/2202.07105
  • Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre, 24 Jun 2024, Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers, https://arxiv.org/abs/2406.16450 Code: https://github.com/CLAIRE-Labo/StructuredFFN/tree/main
  • Utkarsh Saxena, Gobinda Saha, Sakshi Choudhary, Kaushik Roy, 10 Aug 2024, Eigen Attention: Attention in Low-Rank Space for KV Cache Compression, https://arxiv.org/abs/2408.05646
  • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou, 23 Aug 2024, Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time, https://arxiv.org/abs/2408.13233 (Training using low-rank matrices to approximate attention.)
  • Josh Alman, Zhao Song, 9 May 2023 (v2), Fast Attention Requires Bounded Entries, https://arxiv.org/abs/2302.13214 (Low-rank matrices in attention for fast inference.)
  • Josh Alman, Zhao Song, 6 Oct 2023, How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation, https://arxiv.org/abs/2310.04064
  • Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu, 30 Jul 2024, Palu: Compressing KV-Cache with Low-Rank Projection, https://arxiv.org/abs/2407.21118 https://github.com/shadowpa0327/Palu
  • Sneha Mehta, Huzefa Rangwala, Naren Ramakrishnan, 10 Aug 2020 (v2), Low Rank Factorization for Compact Multi-Head Self-Attention, https://arxiv.org/abs/1912.00835
  • Ignacio Hounie, Charilaos Kanatsoulis, Arnuv Tandon, Alejandro Ribeiro, 5 Oct 2024, LoRTA: Low Rank Tensor Adaptation of Large Language Models, https://arxiv.org/abs/2410.04060
  • Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
  • Zebin Yang, Renze Chen, Taiqiang Wu, Ngai Wong, Yun Liang, Runsheng Wang, Ru Huang, Meng Li, 23 Oct 2024, MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers https://arxiv.org/abs/2410.17957
  • Elias Jääsaari, Ville Hyvönen, Teemu Roos, 24 Oct 2024, LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search, https://arxiv.org/abs/2410.18926
  • Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu, 31 Oct 2024, BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments, https://arxiv.org/abs/2410.23918 https://github.com/xinghaow99/BitStack
  • Liang Mi, Weijun Wang, Wenming Tu, Qingfeng He, Rui Kong, Xinyu Fang, Yazhu Dong, Yikang Zhang, Yunchun Li, Meng Li, Haipeng Dai, Guihai Chen, Yunxin Liu, 1 Nov 2024, V-LoRA: An Efficient and Flexible System Boosts Vision Applications with LoRA LMM, https://arxiv.org/abs/2411.00915
  • Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
  • M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2024, Resource-efficient Algorithms and Systems of Foundation Models: A Survey, https://dl.acm.org/doi/pdf/10.1145/3706418
  • Meyer Scetbon, James Hensman, 10 Dec 2024, Low-Rank Correction for Quantized LLMs, https://arxiv.org/abs/2412.07902
  • Kwangryeol Park, Seulki Lee, 12 Dec 2024, SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization, https://arxiv.org/abs/2412.08894 (Gradient optimizer Adam optimized using low-rank matrix factorization.)
  • Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
  • Jingcheng Hu, Houyi Li, Yinmin Zhang, Zili Wang, Shuigeng Zhou, Xiangyu Zhang, Heung-Yeung Shum, 26 Dec 2024, Multi-matrix Factorization Attention, https://arxiv.org/abs/2412.19255
  • Menglin Yang, Jialin Chen, Yifei Zhang, Jiahong Liu, Jiasheng Zhang, Qiyao Ma, Harshit Verma, Qianru Zhang, Min Zhou, Irwin King, Rex Ying, 31 Dec 2024, Low-Rank Adaptation for Foundation Models: A Comprehensive Review, https://arxiv.org/abs/2501.00365 (Extensive survey of LoRA.)
  • Q Wang, S Shen, Jan 2025, Activation-Guided Low-Rank Parameter Adaptation for Efficient Model Fine-Tuning, IEEE Access, https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10852296 (Modified LoRA algorithm using activations for weighting.)
  • Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang, 16 Mar 2025, SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression, https://arxiv.org/abs/2503.12340 https://github.com/AIoT-MLSys-Lab/SVD-LLM
  • Yiping Ji, Hemanth Saratchandran, Cameron Gordon, Zeyu Zhang, Simon Lucey, 17 Mar 2025 (v5), Efficient Learning With Sine-Activated Low-rank Matrices, ICLR 2025, AIML, https://arxiv.org/abs/2403.19243
  • Ray Zirui Zhang, Christopher E. Miles, Xiaohui Xie, John S. Lowengrub, 22 Jul 2025, BiLO: Bilevel Local Operator Learning for PDE Inverse Problems. Part II: Efficient Uncertainty Quantification with Low-Rank Adaptation, https://arxiv.org/abs/2507.17019
  • Yao Wang, Jiannan Li, Yue Kang, Shanxing Gao, Zhenxin Xiao, 23 Jul 2025, Generalized Low-Rank Matrix Contextual Bandits with Graph Information, https://arxiv.org/abs/2507.17528
  • Gabriel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong, 23 Jul 2025, LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning, https://arxiv.org/abs/2506.15606
  • Etienne Zeudong, Elsa Cardoso-Bihlo and Alex Bihlo, 24 Jul 2025, Low-rank adaptive physics-informed HyperDeepONets for solving differential equations, https://arxiv.org/abs/2507.18346
  • Le-Trung Nguyen, Ael Quelennec, Van-Tam Nguyen, Enzo Tartaglione, 24 Jul 2025, Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning, https://arxiv.org/abs/2505.05086
  • Constantin Philippenko, Kevin Scaman, Laurent Massoulié, 21 Jul 2025, In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting, https://arxiv.org/abs/2409.08771
  • Jinyuan Feng and Zhiqiang Pu and Tianyi Hu and Dongmin Li and Xiaolin Ai and Huimu Wang, 21 Jul 2025, OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning, https://arxiv.org/abs/2501.10062
  • Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, Kun Yuan, 20 Jul 2025, Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees, https://arxiv.org/abs/2507.08784
  • Sachin Garg, Michał Dereziński, 19 Jul 2025, Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method, https://arxiv.org/abs/2506.17556
  • Shiwei Li, Xiandi Luo, Haozhao Wang, Xing Tang, Ziqiang Cui, Dugang Liu, Yuhua Li, Xiuqiang He, Ruixuan Li, 9 Aug 2025, BoRA: Towards More Expressive Low-Rank Adaptation with Block Diversity, https://arxiv.org/abs/2508.06953
  • Nairouz Mrabah, Nicolas Richet, Ismail Ben Ayed and Éric Granger, 11 Aug 2025, Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation, https://arxiv.org/abs/2504.12436
  • Shishir Muralidhara, Didier Stricker, René Schuster, 26 Jul 2025, CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation, https://arxiv.org/abs/2507.19887
  • Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, Yunzhi Zhuge, Shuai Hao, Xu Jia, Lu Zhang, Ying Zhang, Huchuan Lu, 28 Jul 2025, Regularizing Subspace Redundancy of Low-Rank Adaptation, https://arxiv.org/abs/2507.20745
  • Zhan Zhuang, Xiequn Wang, Wei Li, Yulong Zhang, Qiushi Huang, Shuhao Chen, Xuehao Wang, Yanbin Wei, Yuhe Nie, Kede Ma, Yu Zhang, Ying Wei, 27 Jul 2025, Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation, https://arxiv.org/abs/2506.05713
  • Ginés Carreto Picón, Illia Oleksiienko, Lukas Hedegaard, Arian Bakhtiarnia, Alexandros Iosifidis, 28 Jul 2025, Continual Low-Rank Scaled Dot-product Attention, https://arxiv.org/abs/2412.03214
  • Zerui Tao, Yuhta Takida, Naoki Murata, Qibin Zhao, Yuki Mitsufuji, 31 Jul 2025, Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models, https://arxiv.org/abs/2501.08727
  • Zishan Shao, Yixiao Wang, Qinsi Wang, Ting Jiang, Zhixu Du, Hancheng Ye, Danyang Zhuo, Yiran Chen, and Hai Li, 2 Aug 2025, FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models, https://arxiv.org/abs/2508.01506
  • Jiaxi Li, Lu Yin, Li Shen, Jinjin Xu, Liwu Xu, Tianjin Huang, Wenwu Wang, Shiwei Liu, Xilu Wang, 4 Aug 2025, LOST: Low-rank and Sparse Pre-training for Large Language Models, https://arxiv.org/abs/2508.02668
  • Peijia Qin, Ruiyi Zhang, Pengtao Xie, 3 Aug 2025, BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation, https://arxiv.org/abs/2410.09758
  • Ayan Sengupta, Vaibhav Seth, Arinjay Pathak, Aastha Verma, Natraj Raman, Sriram Gopalakrishnan, Niladri Chatterjee, Tanmoy Chakraborty, 3 Aug 2025, Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation, https://arxiv.org/abs/2411.04358
  • Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein, 2 Aug 2025, LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation, https://arxiv.org/abs/2504.07448
  • Wenwu Gong and Lili Yang, 4 Aug 2025, LRTuckerRep: Low-rank Tucker Representation Model for Multi-dimensional Data Completion, https://arxiv.org/abs/2508.03755
  • Igor Sokolov, Abdurakhmon Sadiev, Yury Demidovich, Fawaz S Al-Qahtani, and Peter Richtárik, 5 Aug 2025, Bernoulli-LoRA: A Theoretical Framework for Randomized Low-Rank Adaptation, https://arxiv.org/abs/2508.03820
  • Yang Li, Daniel Agyei Asante, Changsheng Zhao, Ernie Chang, Yangyang Shi, Vikas Chandra, 6 Aug 2025, Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications, https://arxiv.org/abs/2405.15877
  • Sajjad Ghiasvand and Haniyeh Ehsani Oskouie and Mahnoosh Alizadeh and Ramtin Pedarsani, 12 Aug 2025, Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models, https://arxiv.org/abs/2505.15130
  • Jialin Zhao, Yingtao Zhang, Carlo Vittorio Cannistraci, 13 Aug 2025, Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models, https://arxiv.org/abs/2501.19090
  • Mohammad Mozaffari, Amir Yazdanbakhsh, Maryam Mehri Dehnavi, 14 Aug 2025, SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression, https://arxiv.org/abs/2410.09615
  • Vedant Puri, Aditya Joglekar, Kevin Ferguson, Yu-hsuan Chen, Yongjie Jessica Zhang, Levent Burak Kara, 18 Aug 2025, FLARE: Fast Low-rank Attention Routing Engine, https://arxiv.org/abs/2508.12594
  • Shiwei Li, Xiandi Luo, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li, 17 Aug 2025, The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning, https://arxiv.org/abs/2505.23176
  • Liyi Zhang, Jake Snell, Thomas L. Griffiths, 19 Aug 2025, Amortized Bayesian Meta-Learning for Low-Rank Adaptation of Large Language Models, https://arxiv.org/abs/2508.14285
  • Klaudia Bałazy, Mohammadreza Banaei, Karl Aberer, Jacek Tabor, 19 Aug 2025, LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters, https://arxiv.org/abs/2405.17604
  • Ilja Kuzborskij, Yasin Abbasi Yadkori, 20 Aug 2025, Low-rank bias, weight decay, and model merging in neural networks, https://arxiv.org/abs/2502.17340
  • Yajie Zhou and Xiaoyi Pang and Zhibo Wang, 20 Aug 2025, AFLoRA: Adaptive Federated Fine-Tuning of Large Language Models with Resource-Aware Low-Rank Adaption, https://arxiv.org/abs/2505.24773
  • Jacob Aguirre, Diego Cifuentes, Vincent Guigues, Renato D.C. Monteiro, Victor Hugo Nascimento, Arnesh Sujanani, 21 Aug 2025, A User Manual for cuHALLaR: A GPU Accelerated Low-Rank Semidefinite Programming Solver, https://arxiv.org/abs/2508.15951
  • Sajjad Ghiasvand, Mahnoosh Alizadeh, Ramtin Pedarsani, 21 Aug 2025, Decentralized Low-Rank Fine-Tuning of Large Language Models, https://arxiv.org/abs/2501.15361
  • Muchammad Daniyal Kautsar, Afra Majida Hariono, Widyawan, Syukron Abu Ishaq Alfarozi and Kuntpong Wararatpanya, 21 Aug 2025, CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression, https://arxiv.org/abs/2508.16680
  • Haojie Zhang, 24 Aug 2025, DropLoRA: Sparse Low-Rank Adaptation for Parameter-Efficient Fine-Tuning, https://arxiv.org/abs/2508.17337
  • Emanuele Zangrando, Piero Deidda, Simone Brugiapaglia, Nicola Guglielmi, Francesco Tudisco, 23 Aug 2025, Provable Emergence of Deep Neural Collapse and Low-Rank Bias in $L^2$-Regularized Nonlinear Networks, https://arxiv.org/abs/2402.03991
  • Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci, 23 Aug 2025, LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation, https://arxiv.org/abs/2502.20583
