Aussie AI

Model Merging

  • Last Updated 17 November, 2025
  • by David Spuler, Ph.D.

What is Model Merging?

Model merging is the technique of creating a new LLM by combining two existing models into one. The idea is that the training from each model ends up inside the single merged model, which can then answer questions much as either of the original models would. Surprisingly, two models can often be merged simply by combining their two sets of parameters, via element-wise addition or averaging, although there are exceptions and limitations to this. Merging is also tricky when the models differ in size along any dimension, and the models need a common tokenizer and vocabulary. A similar technique is "adapter merging" of two adapters that are not full models, such as with multi-LoRA adapters. See also: model training, adapter merging, tokenizer, vocabulary.
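
As an illustration of the simplest case, here is a minimal sketch of weight averaging in PyTorch, assuming both checkpoints have identical architectures, parameter shapes, and tokenizers. The helper name merge_state_dicts and the interpolation weight alpha are choices made for this example, not a standard API, and practical merging recipes (e.g., task arithmetic, TIES, SLERP) are more sophisticated.

    # Minimal sketch: weight-averaging merge (linear interpolation of parameters).
    # Assumes both state dicts come from models with identical architectures.
    import torch

    def merge_state_dicts(state_a, state_b, alpha=0.5):
        """Return a new state dict computed as alpha*A + (1-alpha)*B per parameter."""
        merged = {}
        for name, tensor_a in state_a.items():
            tensor_b = state_b[name]
            if tensor_a.shape != tensor_b.shape:
                raise ValueError(f"Shape mismatch for {name}: {tensor_a.shape} vs {tensor_b.shape}")
            if tensor_a.is_floating_point():
                merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
            else:
                # Non-float buffers (e.g., integer position ids) are copied, not averaged.
                merged[name] = tensor_a.clone()
        return merged

    # Hypothetical usage with two checkpoints sharing one architecture and tokenizer:
    # state_a = torch.load("model_a.pt", map_location="cpu")
    # state_b = torch.load("model_b.pt", map_location="cpu")
    # torch.save(merge_state_dicts(state_a, state_b, alpha=0.5), "merged_model.pt")

Setting alpha to 0.5 gives a simple average of the two models; other values interpolate between them, which is one of the knobs that merging methods tune.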

Research on Model Merging

Research papers include:

  • Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao, 15 Aug 2024 (v2), Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities, https://arxiv.org/abs/2408.07666 Project: https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications (An extensive review of merging two models.)
  • Cameron R. Wolfe, Sep 16, 2024, Model Merging: A Survey: From modern LLM applications to the early days of machine learning research, https://cameronrwolfe.substack.com/p/model-merging
  • Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, Bing Liu, 2 Oct 2024, Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models, https://arxiv.org/abs/2410.01335
  • Yuxuan Zhang, Ruizhe Li, 2 Oct 2024, DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models, https://arxiv.org/abs/2410.01497 https://github.com/MeCuping/DLP-LoRA (Merging multiple LoRA adapters for parallel inference.)
  • Sean Michael Kerner, October 23, 2024, Differentiable Adaptive Merging is accelerating SLMs for enterprises, https://venturebeat.com/ai/differentiable-adaptive-merging-is-accelerating-slms-for-enterprises/
  • Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang, 18 Dec 2024, Channel Merging: Preserving Specialization for Merged Experts, https://arxiv.org/abs/2412.15283
  • Sakana.ai, March 21, 2024, Evolving New Foundation Models: Unleashing the Power of Automating Model Development, https://sakana.ai/evolutionary-model-merge/
  • Sakana.ai, December 03, 2024, Population-based Model Merging via Quality Diversity, https://sakana.ai/cycleqd/
  • Ayoub Ben Chaliah, Hela Dellagi, 31 Dec 2024, Superposition in Transformers: A Novel Way of Building Mixture of Experts, https://arxiv.org/abs/2501.00530 (Effectively model merging to combine a base model and its fine-tuned version, to avoid catastrophic forgetting.)
  • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
  • Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, Mike Lewis, 31 Jan 2025, BTS: Harmonizing Specialized Experts into a Generalist LLM, https://arxiv.org/abs/2502.00075 (Combining multiple fine-tuned expert models via "layer stitching").
  • Yuhang Zhou, Giannis Karamanolakis, Victor Soto, Anna Rumshisky, Mayank Kulkarni, Furong Huang, Wei Ai, Jianhua Lu, 4 Feb 2025 (v2), MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs, https://arxiv.org/abs/2502.00997
  • Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu, 11 Feb 2025 (v2), Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing, https://arxiv.org/abs/2502.04411
  • Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, Jianfeng Gao, 8 Mar 2025, A Survey on Post-training of Large Language Models, https://arxiv.org/abs/2503.06072
  • Hangyu Zhou, Aaron Gokaslan, Volodymyr Kuleshov, Bharath Hariharan, 16 May 2025, RanDeS: Randomized Delta Superposition for Multi-Model Compression, https://arxiv.org/abs/2505.11204
  • Ryota Miyano, Yuki Arase, 30 May 2025, Adaptive LoRA Merge with Parameter Pruning for Low-Resource Generation, https://arxiv.org/abs/2505.24174
  • Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li, Vikram Appia, Emad Barsoum, 22 May 2025, Zebra-Llama: Towards Extremely Efficient Hybrid Models, https://arxiv.org/abs/2505.17272 (Merging SSM and LLM.)
  • Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates, 2 Jul 2025, Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs, https://arxiv.org/abs/2507.02076
  • Jiang, H., Wang, R., Liang, W. et al. Slerp-Opt: merging large language models via adaptive strategies. J Supercomput 81, 1223 (2025). https://doi.org/10.1007/s11227-025-07727-4 https://link.springer.com/article/10.1007/s11227-025-07727-4
  • Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao, 14 Aug 2025, LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint, https://arxiv.org/abs/2502.16770
  • Ryo Bertolissi, Jonas Hübotter, Ido Hakimi, Andreas Krause, 30 Jul 2025, Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging, https://arxiv.org/abs/2505.14136
  • Kotaro Yoshida, Yuji Naraki, Takafumi Horie, Ryotaro Shimizu, Hiroki Naganuma, 2 Aug 2025, DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging, https://arxiv.org/abs/2508.01148
  • Xin He, Junxi Shen, Zhenheng Tang, Xiaowen Chu, Bo Li, Ivor W. Tsang, Yew-Soon Ong, 3 Aug 2025, RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging, https://arxiv.org/abs/2508.01784
  • The-Hai Nguyen, Dang Huu-Tien, Takeshi Suzuki, and Le-Minh Nguyen, 5 Aug 2025, RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging, https://arxiv.org/abs/2508.03121
  • Youngeun Kim, Seunghwan Lee, Aecheon Jung, Bogon Ryu, Sungeun Hong, 7 Aug 2025, Task Vector Quantization for Memory-Efficient Model Merging, https://arxiv.org/abs/2503.06921
  • Yingfeng Luo, Dingyang Lin, Junxin Wang, Ziqiang Xu, Kaiyan Chang, Tong Zheng, Bei Li, Anxiang Ma, Tong Xiao, Zhengtao Yu, Jingbo Zhu, 8 Aug 2025, One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging, https://arxiv.org/abs/2508.06163
  • Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Fabrizio Silvestri, Emanuele Rodolà, 8 Aug 2025, ATM: Improving Model Merging by Alternating Tuning and Merging, https://arxiv.org/abs/2411.03055
  • Ilja Kuzborskij, Yasin Abbasi Yadkori, 20 Aug 2025, Low-rank bias, weight decay, and model merging in neural networks, https://arxiv.org/abs/2502.17340
  • Haris Khan, Shumaila Asif, Sadia Asif, 28 Jul 2025, Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition, https://arxiv.org/abs/2507.20997
  • Hu Wang, Congbo Ma, Ibrahim Almakky, Ian Reid, Gustavo Carneiro, Mohammad Yaqub, 19 Aug 2025, Rethinking Weight-Averaged Model-merging, https://arxiv.org/abs/2411.09263
  • Marcin Osial, Bartosz Wójcik, Bartosz Zieliński, Sebastian Cygert, 26 Aug 2025, Efficient Multi-Source Knowledge Transfer by Model Merging, https://arxiv.org/abs/2508.19353
  • Pietro Buzzega, Riccardo Salami, Angelo Porrello and Simone Calderara, 29 Aug 2025, Rethinking Layer-wise Model Merging through Chain of Merges, https://arxiv.org/abs/2508.21421
  • Brahim Touayouch, Loïc Fosse, Géraldine Damnati, Gwénolé Lecorvé, 2 Sep 2025, DivMerge: A divergence-based model merging method for multi-tasking, https://arxiv.org/abs/2509.02108
  • Rio Akizuki, Yuya Kudo, Nozomu Yoshinari, Yoichi Hirose, Toshiyuki Nishimoto, Kento Uchida, Shinichi Shirakawa, 2 Sep 2025, Surrogate Benchmarks for Model Merging Optimization, https://arxiv.org/abs/2509.02555
  • Zitao Fang, Guodong DU, Shuyang Yu, Yifei Guo, Yiwei Zhang, Yiyao Cao, Jing Li, Ho-Kin Tang, Sim Kuan Goh, 31 Aug 2025, To See a World in a Spark of Neuron: Disentangling Multi-task Interference for Training-free Model Merging, https://arxiv.org/abs/2503.05320
  • Chi-Ken Lu, David Alonge, Nicole Richardson, Bruno Richard, 1 Sep 2025, A Log-Linear Analytics Approach to Cost Model Regularization for Inpatient Stays through Diagnostic Code Merging, https://arxiv.org/abs/2507.03843
  • Shilian Chen, Jie Zhou, Tianyu Huai, Yujiang Lu, Junsong Li, Bihao Zhan, Qianjun Pan, Yutao Yang, Xin Li, Qin Chen, Hang Yan, Liang He, 16 Sep 2025, Black-box Model Merging for Language-Model-as-a-Service with Massive Model Repositories, https://arxiv.org/abs/2509.12951
  • Xuefeng Liu, Songhao Jiang, Qinan Huang, Tinson Xu, Ian Foster, Mengdi Wang, Hening Lin, Jinbo Xu, Rick Stevens, 14 Sep 2025, FragmentGPT: A Unified GPT Model for Fragment Growing, Linking, and Merging in Molecular Design, https://arxiv.org/abs/2509.11044
  • Pouria Mahdavinia, Hamed Mahdavi, Niloofar Mireshghallah, and Mehrdad Mahdavi, 14 Sep 2025, Harnessing Optimization Dynamics for Curvature-Informed Model Merging, https://arxiv.org/abs/2509.11167
  • Haiquan Qiu, You Wu, Dong Li, Jianmin Guo, Quanming Yao, 18 Sep 2025, Superpose Task-specific Features for Model Merging, https://arxiv.org/abs/2502.10698
  • Yuanyi Wang, Yanggan Gu, Yiming Zhang, Qi Zhou, Zhaoyi Yan, Congkai Xie, Xinyao Wang, Jianbo Yuan, Hongxia Yang, 1 Oct 2025, Model Merging Scaling Laws in Large Language Models, https://arxiv.org/abs/2509.24244
  • Bowen Wang, Haiyuan Wan, Liwen Shi, Chen Yang, Peng He, Yue Ma, Haochen Han, Wenhao Li, Tiao Tan, Yongjian Li, Fangming Liu, Yifan Gong, Sheng Zhang, 23 Oct 2025, RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging, https://arxiv.org/abs/2510.20479
  • Tiancheng Hu, Benjamin Minixhofer, Nigel Collier, 20 Oct 2025, Navigating the Alignment-Calibration Trade-off: A Pareto-Superior Frontier via Model Merging, https://arxiv.org/abs/2510.17426
  • Aniello Panariello, Daniel Marczak, Simone Magistri, Angelo Porrello, Bartłomiej Twardowski, Andrew D. Bagdanov, Simone Calderara, Joost van de Weijer, 20 Oct 2025, Accurate and Efficient Low-Rank Model Merging in Core Space, https://arxiv.org/abs/2509.17786
  • Yujie Feng, Jian Li, Xiaoyu Dong, Pengfei Xu, Xiaohui Zhou, Yujia Zhang, Zexin LU, Yasha Wang, Alan Zhao, Xu Chu, Xiao-Ming Wu, 22 Sep 2025, AIMMerging: Adaptive Iterative Model Merging Using Training Trajectories for Language Model Continual Learning, https://arxiv.org/abs/2509.17348
  • Wenju Sun, Qingyong Li, Wen Wang, Yang Liu, Yangli-ao Geng, Boyang Li, 26 Oct 2025, Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration, https://arxiv.org/abs/2505.23859
  • Yunfei Liang, 26 Oct 2025, MIN-Merging: Merge the Important Neurons for Model Merging, https://arxiv.org/abs/2510.17890
  • Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche, 27 Oct 2025, SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging, https://arxiv.org/abs/2503.17239
  • Xiaochong Lan, Yu Zheng, Shiteng Cao, Yong Li, 26 Sep 2025, The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging, https://arxiv.org/abs/2509.22034
  • Zihuan Qiu, Lei Wang, Yang Cao, Runtong Zhang, Bing Su, Yi Xu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li, 25 Sep 2025, Null-Space Filtering for Data-Free Continual Model Merging: Preserving Transparency, Promoting Fidelity, https://arxiv.org/abs/2509.21413
  • Lucas Bandarkar, Nanyun Peng, 7 Oct 2025, The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs, https://arxiv.org/abs/2505.18356
  • Hoang Phan, Sungmin Cha, Tung Lam Tran, Qi Lei, 28 Sep 2025, Toward a Holistic Approach to Continual Model Merging, https://arxiv.org/abs/2509.23592
  • Ankit Gangwal, Aaryan Ajay Sharma, 28 Sep 2025, Merge Now, Regret Later: The Hidden Cost of Model Merging is Adversarial Transferability, https://arxiv.org/abs/2509.23689
  • Chanhyuk Lee, Jiho Choi, Chanryeol Lee, Donggyun Kim, and Seunghoon Hong, 29 Sep 2025, AdaRank: Adaptive Rank Pruning for Enhanced Model Merging, https://arxiv.org/abs/2503.22178
  • Zihuan Qiu, Yi Xu, Chiyuan He, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li, 29 Sep 2025, MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging, https://arxiv.org/abs/2505.11883
  • Guofu Xie, Chen Zhang, Xiao Zhang, Yunsheng Shi, Ting Yao and Jun Xu, 4 Oct 2025, Merge and Guide: Unifying Model Merging and Guided Decoding for Controllable Multi-Objective Generation, https://arxiv.org/abs/2510.03782
  • Chenxiang Zhang, Alexander Theus, Damien Teney, Antonio Orvieto, Jun Pang, Sjouke Mauw, 6 Oct 2025, How does the optimizer implicitly bias the model merging loss landscape?, https://arxiv.org/abs/2510.04686
  • Qixiang Yin, Huanjin Yao, Jianghao Chen, Jiaxing Huang, Zhicheng Zhao, Fei Su, 10 Oct 2025, Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging, https://arxiv.org/abs/2510.08987
  • Kexuan Shi and Yandong Wen and Weiyang Liu, 24 Oct 2025, Model Merging with Functional Dual Anchors, https://arxiv.org/abs/2510.21223
  • Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao, 23 Sep 2025, OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging, https://arxiv.org/abs/2505.19892
  • Yedi Hu, Yunzhi Yao, Ningyu Zhang, Huajun Chen, Shumin Deng, 23 Sep 2025, Exploring Model Kinship for Merging Large Language Models, https://arxiv.org/abs/2410.12613
  • Dengming Zhang, Xiaowen Ma, Zhenliang Ni, Zhenkai Wu, Han Shu, Xin Jiang, and Xinghao Chen, 30 Sep 2025, Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking, https://arxiv.org/abs/2509.25712
  • Hao Mark Chen, Shell Xu Hu, Wayne Luk, Timothy Hospedales, Hongxiang Fan, 29 Sep 2025, FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization, https://arxiv.org/abs/2503.12649
  • Bang An, Yibo Yang, Philip Torr, Bernard Ghanem, 16 Oct 2025, Purifying Task Vectors in Knowledge-Aware Subspace for Model Merging, https://arxiv.org/abs/2510.14697
  • Levy Chaves, Eduardo Valle, Sandra Avila, 15 Oct 2025, Weight Weaving: Parameter Pooling for Data-Free Model Merging, https://arxiv.org/abs/2510.13921
  • Mohammadsajad Alipour, Mohammad Mohammadi Amiri, 15 Oct 2025, Towards Reversible Model Merging For Low-rank Weights, https://arxiv.org/abs/2510.14163
  • Ronald Skorobogat and Karsten Roth and Mariana-Iuliana Georgescu, 16 Oct 2025, Subspace-Boosted Model Merging, https://arxiv.org/abs/2506.16506

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research Topics

Read more about: