Aussie AI

Model Merging

  • Last Updated 29 August, 2025
  • by David Spuler, Ph.D.

Research on Model Merging

Research papers include:

  • Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao, 15 Aug 2024 (v2), Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities, https://arxiv.org/abs/2408.07666 Project: https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications (An extensive survey of model merging methods, theories, and applications.)
  • Cameron R. Wolfe, Sep 16, 2024, Model Merging: A Survey: From modern LLM applications to the early days of machine learning research, https://cameronrwolfe.substack.com/p/model-merging
  • Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, Bing Liu, 2 Oct 2024, Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models, https://arxiv.org/abs/2410.01335
  • Yuxuan Zhang, Ruizhe Li, 2 Oct 2024, DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models, https://arxiv.org/abs/2410.01497 https://github.com/MeCuping/DLP-LoRA (Merging multiple LoRA adapters for parallel inference.)
  • Sean Michael Kerner, October 23, 2024, Differentiable Adaptive Merging is accelerating SLMs for enterprises, https://venturebeat.com/ai/differentiable-adaptive-merging-is-accelerating-slms-for-enterprises/
  • Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang, 18 Dec 2024, Channel Merging: Preserving Specialization for Merged Experts, https://arxiv.org/abs/2412.15283
  • Sakana.ai, March 21, 2024, Evolving New Foundation Models: Unleashing the Power of Automating Model Development, https://sakana.ai/evolutionary-model-merge/
  • Sakana.ai, December 03, 2024, Population-based Model Merging via Quality Diversity, https://sakana.ai/cycleqd/
  • Ayoub Ben Chaliah, Hela Dellagi, 31 Dec 2024, Superposition in Transformers: A Novel Way of Building Mixture of Experts, https://arxiv.org/abs/2501.00530 (Effectively model merging to combine a base model and its fine-tuned version, to avoid catastrophic forgetting.)
  • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
  • Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, Mike Lewis, 31 Jan 2025, BTS: Harmonizing Specialized Experts into a Generalist LLM, https://arxiv.org/abs/2502.00075 (Combining multiple fine-tuned expert models via "layer stitching".)
  • Yuhang Zhou, Giannis Karamanolakis, Victor Soto, Anna Rumshisky, Mayank Kulkarni, Furong Huang, Wei Ai, Jianhua Lu, 4 Feb 2025 (v2), MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs, https://arxiv.org/abs/2502.00997
  • Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu, 11 Feb 2025 (v2), Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing, https://arxiv.org/abs/2502.04411
  • Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, Jianfeng Gao, 8 Mar 2025, A Survey on Post-training of Large Language Models, https://arxiv.org/abs/2503.06072
  • Hangyu Zhou, Aaron Gokaslan, Volodymyr Kuleshov, Bharath Hariharan, 16 May 2025, RanDeS: Randomized Delta Superposition for Multi-Model Compression, https://arxiv.org/abs/2505.11204
  • Ryota Miyano, Yuki Arase, 30 May 2025, Adaptive LoRA Merge with Parameter Pruning for Low-Resource Generation, https://arxiv.org/abs/2505.24174
  • Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li, Vikram Appia, Emad Barsoum, 22 May 2025, Zebra-Llama: Towards Extremely Efficient Hybrid Models, https://arxiv.org/abs/2505.17272 (Merging SSM and LLM.)
  • Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates, 2 Jul 2025, Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs, https://arxiv.org/abs/2507.02076
  • H. Jiang, R. Wang, W. Liang, et al., 2025, Slerp-Opt: Merging Large Language Models via Adaptive Strategies, The Journal of Supercomputing 81, 1223, https://doi.org/10.1007/s11227-025-07727-4 https://link.springer.com/article/10.1007/s11227-025-07727-4 (SLERP-based merging; a minimal SLERP sketch appears after this list.)
  • Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao, 14 Aug 2025, LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint, https://arxiv.org/abs/2502.16770
  • Ryo Bertolissi, Jonas Hübotter, Ido Hakimi, Andreas Krause, 30 Jul 2025, Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging, https://arxiv.org/abs/2505.14136
  • Kotaro Yoshida, Yuji Naraki, Takafumi Horie, Ryotaro Shimizu, Hiroki Naganuma, 2 Aug 2025, DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging, https://arxiv.org/abs/2508.01148
  • Xin He, Junxi Shen, Zhenheng Tang, Xiaowen Chu, Bo Li, Ivor W. Tsang, Yew-Soon Ong, 3 Aug 2025, RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging, https://arxiv.org/abs/2508.01784
  • The-Hai Nguyen, Dang Huu-Tien, Takeshi Suzuki, and Le-Minh Nguyen, 5 Aug 2025, RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging, https://arxiv.org/abs/2508.03121
  • Youngeun Kim, Seunghwan Lee, Aecheon Jung, Bogon Ryu, Sungeun Hong, 7 Aug 2025, Task Vector Quantization for Memory-Efficient Model Merging, https://arxiv.org/abs/2503.06921
  • Yingfeng Luo, Dingyang Lin, Junxin Wang, Ziqiang Xu, Kaiyan Chang, Tong Zheng, Bei Li, Anxiang Ma, Tong Xiao, Zhengtao Yu, Jingbo Zhu, 8 Aug 2025, One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging, https://arxiv.org/abs/2508.06163
  • Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Fabrizio Silvestri, Emanuele Rodolà, 8 Aug 2025, ATM: Improving Model Merging by Alternating Tuning and Merging, https://arxiv.org/abs/2411.03055
  • Ilja Kuzborskij, Yasin Abbasi Yadkori, 20 Aug 2025, Low-rank bias, weight decay, and model merging in neural networks, https://arxiv.org/abs/2502.17340
  • Haris Khan, Shumaila Asif, Sadia Asif, 28 Jul 2025, Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition, https://arxiv.org/abs/2507.20997
  • Hu Wang, Congbo Ma, Ibrahim Almakky, Ian Reid, Gustavo Carneiro, Mohammad Yaqub, 19 Aug 2025, Rethinking Weight-Averaged Model-merging, https://arxiv.org/abs/2411.09263
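
Many of the papers above build on two low-level merging primitives: uniform weight averaging of checkpoints ("model soups") and spherical linear interpolation (SLERP) of parameter tensors, as used in Slerp-Opt. The PyTorch sketch below illustrates both under simplifying assumptions: the toy nn.Linear models, the function names average_merge and slerp_merge, and the 50/50 interpolation weight are illustrative stand-ins, not the exact method of any one paper above.

    import torch
    import torch.nn as nn

    def average_merge(state_a, state_b):
        # Uniform weight averaging: theta = (theta_a + theta_b) / 2.
        # Assumes both checkpoints share architecture and parameter names.
        return {k: (state_a[k] + state_b[k]) / 2.0 for k in state_a}

    def slerp_merge(state_a, state_b, t=0.5, eps=1e-8):
        # Spherical linear interpolation of each parameter tensor;
        # falls back to plain interpolation when tensors are nearly parallel.
        merged = {}
        for k in state_a:
            a = state_a[k].flatten().float()
            b = state_b[k].flatten().float()
            cos = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm() + eps), -1.0, 1.0)
            omega = torch.acos(cos)  # angle between the two flattened tensors
            if omega.abs() < 1e-4:   # nearly parallel: LERP is numerically safer
                flat = (1 - t) * a + t * b
            else:
                flat = (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)
            merged[k] = flat.reshape(state_a[k].shape).to(state_a[k].dtype)
        return merged

    # Usage: merge two checkpoints of the same (toy) architecture.
    model_a, model_b = nn.Linear(8, 4), nn.Linear(8, 4)
    merged = nn.Linear(8, 4)
    merged.load_state_dict(average_merge(model_a.state_dict(), model_b.state_dict()))
    merged.load_state_dict(slerp_merge(model_a.state_dict(), model_b.state_dict(), t=0.5))

Practical merging methods layer more machinery on top of these primitives, such as per-layer interpolation weights, task-vector arithmetic, and parameter-conflict resolution (e.g., Mediator above).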

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging
