Aussie AI

Model Merging

  • Last Updated 29 August, 2025
  • by David Spuler, Ph.D.

Research on Model Merging

Research papers include:

  • Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao, 15 Aug 2024 (v2), Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities, https://arxiv.org/abs/2408.07666 Project: https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications (An extensive survey of model merging methods, theories, and applications.)
  • Cameron R. Wolfe, Sep 16, 2024, Model Merging: A Survey: From modern LLM applications to the early days of machine learning research, https://cameronrwolfe.substack.com/p/model-merging
  • Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, Bing Liu, 2 Oct 2024, Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models, https://arxiv.org/abs/2410.01335
  • Yuxuan Zhang, Ruizhe Li, 2 Oct 2024, DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models, https://arxiv.org/abs/2410.01497 https://github.com/MeCuping/DLP-LoRA (Merging multiple LoRA adapters for parallel inference.)
  • Sean Michael Kerner, October 23, 2024, Differentiable Adaptive Merging is accelerating SLMs for enterprises, https://venturebeat.com/ai/differentiable-adaptive-merging-is-accelerating-slms-for-enterprises/
  • Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang, 18 Dec 2024, Channel Merging: Preserving Specialization for Merged Experts, https://arxiv.org/abs/2412.15283
  • Sakana.ai, March 21, 2024, Evolving New Foundation Models: Unleashing the Power of Automating Model Development, https://sakana.ai/evolutionary-model-merge/
  • Sakana.ai, December 03, 2024, Population-based Model Merging via Quality Diversity, https://sakana.ai/cycleqd/
  • Ayoub Ben Chaliah, Hela Dellagi, 31 Dec 2024, Superposition in Transformers: A Novel Way of Building Mixture of Experts, https://arxiv.org/abs/2501.00530 (Effectively model merging to combine a base model and its fine-tuned version, to avoid catastrophic forgetting.)
  • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
  • Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, Mike Lewis, 31 Jan 2025, BTS: Harmonizing Specialized Experts into a Generalist LLM, https://arxiv.org/abs/2502.00075 (Combining multiple fine-tuned expert models via "layer stitching".)
  • Yuhang Zhou, Giannis Karamanolakis, Victor Soto, Anna Rumshisky, Mayank Kulkarni, Furong Huang, Wei Ai, Jianhua Lu, 4 Feb 2025 (v2), MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs, https://arxiv.org/abs/2502.00997
  • Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu, 11 Feb 2025 (v2), Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing, https://arxiv.org/abs/2502.04411
  • Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, Jianfeng Gao, 8 Mar 2025, A Survey on Post-training of Large Language Models, https://arxiv.org/abs/2503.06072
  • Hangyu Zhou, Aaron Gokaslan, Volodymyr Kuleshov, Bharath Hariharan, 16 May 2025, RanDeS: Randomized Delta Superposition for Multi-Model Compression, https://arxiv.org/abs/2505.11204
  • Ryota Miyano, Yuki Arase, 30 May 2025, Adaptive LoRA Merge with Parameter Pruning for Low-Resource Generation, https://arxiv.org/abs/2505.24174
  • Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li, Vikram Appia, Emad Barsoum, 22 May 2025, Zebra-Llama: Towards Extremely Efficient Hybrid Models, https://arxiv.org/abs/2505.17272 (Merging SSM and LLM.)
  • Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates, 2 Jul 2025, Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs, https://arxiv.org/abs/2507.02076
  • H. Jiang, R. Wang, W. Liang, et al., 2025, Slerp-Opt: Merging Large Language Models via Adaptive Strategies, The Journal of Supercomputing 81, 1223, https://doi.org/10.1007/s11227-025-07727-4 https://link.springer.com/article/10.1007/s11227-025-07727-4 (SLERP-based merging; a minimal SLERP sketch appears after this list.)
  • Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao, 14 Aug 2025, LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint, https://arxiv.org/abs/2502.16770
  • Ryo Bertolissi, Jonas Hübotter, Ido Hakimi, Andreas Krause, 30 Jul 2025, Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging, https://arxiv.org/abs/2505.14136
  • Kotaro Yoshida, Yuji Naraki, Takafumi Horie, Ryotaro Shimizu, Hiroki Naganuma, 2 Aug 2025, DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging, https://arxiv.org/abs/2508.01148
  • Xin He, Junxi Shen, Zhenheng Tang, Xiaowen Chu, Bo Li, Ivor W. Tsang, Yew-Soon Ong, 3 Aug 2025, RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging, https://arxiv.org/abs/2508.01784
  • The-Hai Nguyen, Dang Huu-Tien, Takeshi Suzuki, and Le-Minh Nguyen, 5 Aug 2025, RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging, https://arxiv.org/abs/2508.03121
  • Youngeun Kim, Seunghwan Lee, Aecheon Jung, Bogon Ryu, Sungeun Hong, 7 Aug 2025, Task Vector Quantization for Memory-Efficient Model Merging, https://arxiv.org/abs/2503.06921
  • Yingfeng Luo, Dingyang Lin, Junxin Wang, Ziqiang Xu, Kaiyan Chang, Tong Zheng, Bei Li, Anxiang Ma, Tong Xiao, Zhengtao Yu, Jingbo Zhu, 8 Aug 2025, One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging, https://arxiv.org/abs/2508.06163
  • Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Fabrizio Silvestri, Emanuele Rodolà, 8 Aug 2025, ATM: Improving Model Merging by Alternating Tuning and Merging, https://arxiv.org/abs/2411.03055
  • Ilja Kuzborskij, Yasin Abbasi Yadkori, 20 Aug 2025, Low-rank bias, weight decay, and model merging in neural networks, https://arxiv.org/abs/2502.17340
  • Haris Khan, Shumaila Asif, Sadia Asif, 28 Jul 2025, Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition, https://arxiv.org/abs/2507.20997
  • Hu Wang, Congbo Ma, Ibrahim Almakky, Ian Reid, Gustavo Carneiro, Mohammad Yaqub, 19 Aug 2025, Rethinking Weight-Averaged Model-merging, https://arxiv.org/abs/2411.09263
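
Many of the papers above build on two low-level merging primitives: uniform weight averaging of checkpoints ("model soups") and spherical linear interpolation (SLERP) of parameter tensors, as used in Slerp-Opt. The PyTorch sketch below illustrates both under simplifying assumptions: the toy nn.Linear models, the function names average_merge and slerp_merge, and the 50/50 interpolation weight are illustrative stand-ins, not the exact method of any one paper above.

    import torch
    import torch.nn as nn

    def average_merge(state_a, state_b):
        # Uniform weight averaging: theta = (theta_a + theta_b) / 2.
        # Assumes both checkpoints share architecture and parameter names.
        return {k: (state_a[k] + state_b[k]) / 2.0 for k in state_a}

    def slerp_merge(state_a, state_b, t=0.5, eps=1e-8):
        # Spherical linear interpolation of each parameter tensor;
        # falls back to plain interpolation when tensors are nearly parallel.
        merged = {}
        for k in state_a:
            a = state_a[k].flatten().float()
            b = state_b[k].flatten().float()
            cos = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm() + eps), -1.0, 1.0)
            omega = torch.acos(cos)  # angle between the two flattened tensors
            if omega.abs() < 1e-4:   # nearly parallel: LERP is numerically safer
                flat = (1 - t) * a + t * b
            else:
                flat = (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)
            merged[k] = flat.reshape(state_a[k].shape).to(state_a[k].dtype)
        return merged

    # Usage: merge two checkpoints of the same (toy) architecture.
    model_a, model_b = nn.Linear(8, 4), nn.Linear(8, 4)
    merged = nn.Linear(8, 4)
    merged.load_state_dict(average_merge(model_a.state_dict(), model_b.state_dict()))
    merged.load_state_dict(slerp_merge(model_a.state_dict(), model_b.state_dict(), t=0.5))

Practical merging methods layer more machinery on top of these primitives, such as per-layer interpolation weights, task-vector arithmetic, and parameter-conflict resolution (e.g., Mediator above).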

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging
