Aussie AI
QLoRA Models
Last Updated 15 August, 2025
by David Spuler, Ph.D.
Research on QLoRA Models
Research papers include:
- Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts, 26 Mar 2024, The Unreasonable Ineffectiveness of the Deeper Layers, https://arxiv.org/abs/2403.17887 (Static layer pruning with some PEFT re-training after removing layers, with quantization and QLoRA.)
- Intel, 2024, Intel Extension for Transformers, https://github.com/intel/intel-extension-for-transformers
- Apple, June 2024, Introducing Apple’s On-Device and Server Foundation Models, https://machinelearning.apple.com/research/introducing-apple-foundation-models (Apple's on-device models feature optimizations including small models, grouped query attention, 2-bit/4-bit quantization including activation quantization, shared embedding/unembedding tensors, a small-ish vocabulary size of 49k, an undisclosed efficient KV cache optimization for neural engines, and layer-specific 16-bit LoRA/QLoRA adapters of size "10s of megabytes" for fine-tuned specialized model versions, also sometimes in 2-bit/4-bit, with claimed speeds of 0.6 ms per prompt token in prefill and 30 tokens per second in decoding.)
- Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 23 May 2023, QLoRA: Efficient Finetuning of Quantized LLMs, https://arxiv.org/abs/2305.14314 Code: https://github.com/artidoro/qlora Code: https://github.com/TimDettmers/bitsandbytes (The original QLoRA paper; a usage sketch follows the list below.)
- Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang, 13 Apr 2024 (v4), EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models, https://arxiv.org/abs/2310.03270 Code: https://github.com/ThisisBillhe/EfficientDM
- Pranav Patel, 2024, In-depth guide to fine-tuning LLMs with LoRA and QLoRA, https://www.mercity.ai/blog-post/guide-to-fine-tuning-llms-with-lora-and-qlora
- Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
- Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dongsheng Li, 24 Jul 2024, Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance, https://arxiv.org/abs/2407.17029 Code: https://github.com/xiaocaigou/qbaraqahira (Combining quantization and LoRA.)
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 1 May 2024 (v6), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer
- Shikhar Bajpai, Sep 27, 2024, Shrinking Elephants: A Funny Guide to 4-bit and 8-bit Quantization for LLMs with LoRA, https://medium.com/@shikharstruck/shrinking-elephants-a-funny-guide-to-4-bit-and-8-bit-quantization-for-llms-with-lora-ddf9f1a62070
- Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Haotong Qin, Jinyang Guo, Michele Magno, Xianglong Liu, 25 Sep 2024, A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms, https://arxiv.org/abs/2409.16694
- Neal Lawton, Aishwarya Padmakumar, Judith Gaspers, Jack FitzGerald, Anoop Kumar, Greg Ver Steeg, Aram Galstyan, 9 Oct 2024, QuAILoRA: Quantization-Aware Initialization for LoRA, https://arxiv.org/abs/2410.14713
- Meta, October 24, 2024, Introducing quantized Llama models with increased speed and a reduced memory footprint, https://ai.meta.com/blog/meta-llama-quantized-lightweight-models/
- Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid, 30 Oct 2024 (v3), The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities, https://arxiv.org/abs/2408.13296
- Towards AI, December 24, 2024, LLM Fine-Tuning Guide: Do You Need It and How to Do It, https://towardsai.net/p/artificial-intelligence/llm-fine-tuning-guide-do-you-need-it-and-how-to-do-it-4
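For context on what these papers build on: QLoRA freezes a base model whose weights are quantized to 4-bit NormalFloat (NF4) and trains only small low-rank adapter matrices on top, so the effective weight is roughly W ≈ dequant(W_q) + (alpha/r)·B·A, with only A and B updated during fine-tuning. Below is a minimal sketch of this setup, assuming the Hugging Face transformers, peft, and bitsandbytes libraries (the bitsandbytes repository is linked from the Dettmers et al. entry above); the checkpoint name and hyperparameter values are illustrative choices, not prescriptions from any one paper.

```python
# Minimal QLoRA fine-tuning sketch (assumed stack:
#   pip install transformers peft bitsandbytes accelerate).
# The model checkpoint and hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # any causal LM checkpoint

# The "Q" in QLoRA: quantize the frozen base weights to 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 from the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)  # set up for k-bit training

# The "LoRA" part: small trainable low-rank adapters on the frozen base.
lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # effective update is (alpha/r) * B @ A
    target_modules=["q_proj", "v_proj"],   # attention projections, a common choice
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

Training then proceeds with any standard trainer over the wrapped model; at adapter ranks like the above, the trainable parameters are a small fraction of the base model, which is what makes fine-tuning a quantized 7B+ model feasible on a single GPU.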
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI, a new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications, a new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research
- Research Home