Aussie AI
Collaborative Inference
Last Updated 27 August, 2025
by David Spuler, Ph.D.
Collaborative inference is a multi-model (ensemble) optimization strategy in which two or more engines cooperate to perform inference calculations. There are two basic architectures:
- Multi-component partial inference
- Multi-component full inference
In multi-component partial inference, multiple sub-components contribute to a single inference computation. For example, parts of the inference computation can be spread out across multiple machines or multiple GPUs, and then combined together to complete the inference result. The output is a single prediction for decoding.
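As a rough illustration of partial inference, the sketch below shards the rows of a single linear layer across several workers and concatenates their partial outputs into one logit vector. The function names, the row-wise sharding scheme, and the sequential loop standing in for separate devices are all assumptions made for this example, not a description of any particular system.

```python
# Hypothetical sketch: one linear layer whose output rows are sharded across workers.
# The "workers" are simulated by a plain loop; a real system would dispatch each
# shard to a separate machine or GPU.

def matvec(rows, x):
    """Multiply a list of weight rows by the input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

def shard_rows(weights, num_workers):
    """Split the weight rows into contiguous shards, at most one per worker."""
    size = -(-len(weights) // num_workers)  # ceiling division
    return [weights[i:i + size] for i in range(0, len(weights), size)]

def partial_inference(weights, x, num_workers=2):
    """Each worker computes its slice of the logits; the slices are concatenated."""
    logits = []
    for shard in shard_rows(weights, num_workers):
        logits.extend(matvec(shard, x))  # stands in for work done on another device
    return logits

def predict(weights, x, num_workers=2):
    """Return the index of the largest logit: a single prediction for decoding."""
    logits = partial_inference(weights, x, num_workers)
    return max(range(len(logits)), key=logits.__getitem__)
```

The combined logits match what a single device would compute, so the sharding changes where the work happens rather than the prediction itself.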
The alternative is multi-component full inference, where multiple components (or entire models) each perform a full inference, with the results combined at the end. All of the inference computations occur independently. Each model or component generates its own separate prediction of output tokens and their probabilities. A decision mechanism then analyzes the outputs of each model and decides which final token to output.
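One possible decision mechanism for full inference is a weighted average of the per-model token probabilities, followed by picking the highest-scoring token. The sketch below assumes each model's output is available as a dictionary from token to probability. The averaging rule, the function names, and the example tokens are illustrative assumptions, and other combination rules are equally plausible.

```python
# Hypothetical sketch: combine independent per-model token distributions.

def combine_predictions(distributions, weights=None):
    """Weighted average of per-model token probability dictionaries."""
    if weights is None:
        weights = [1.0] * len(distributions)  # assumption: equal trust in every model
    total = sum(weights)
    combined = {}
    for dist, w in zip(distributions, weights):
        for token, p in dist.items():
            # A token missing from a model's dictionary contributes zero for that model.
            combined[token] = combined.get(token, 0.0) + w * p / total
    return combined

def choose_token(distributions, weights=None):
    """Decision mechanism: output the token with the highest combined probability."""
    combined = combine_predictions(distributions, weights)
    return max(combined, key=combined.get)

# Usage: two models disagree on the top token; the average decides.
model_a = {"cat": 0.6, "dog": 0.4}
model_b = {"cat": 0.2, "dog": 0.7, "fish": 0.1}
final_token = choose_token([model_a, model_b])
```

With equal weights the second model's stronger preference wins out. Passing `weights=[3, 1]` would favor the first model instead.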
Each of these two approaches has several variations. Specific types of collaborative inference include:
- Speculative decoding
- Consensus-based decoding
- Mutually-guided decoding
- Big-little architectures
- Committee-based inference
- Ensemble decoding
- Swarm inference (swarm decoding)
Research on Collaborative Inference (Generally)
Research papers on collaborative inference include:
- G Xu, Z Hao, Y Luo, H Hu, J An, S Mao, 2023, DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices, arXiv preprint arXiv:2309.05015, https://arxiv.org/abs/2309.05015
- Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Hao Peng, Ximing Lu, Dragomir Radev, Yejin Choi, Noah A. Smith, Oct 2022, Twist Decoding: Diverse Generators Guide Each Other, https://arxiv.org/abs/2205.09273, Code: https://github.com/jungokasai/twist_decoding (Twist decoding is a type of collaborative inference.)
- J Kasai, 2023, Towards Efficient, Customizable, and Communal Natural Language Processing, Ph.D. thesis, Computer Science and Engineering, University of Washington, https://www.proquest.com/openview/604084b574dcd05e41eb6e33682a3537/1 (Impressive thesis includes twist decoding amid other topics.)
- Jinduo Song, Zhicheng Liu, Xiaofei Wang, Chao Qiu, Xu Chen, 2021, "Adaptive and Collaborative Edge Inference in Task Stream with Latency Constraint", ICC 2021, IEEE International Conference on Communications, pp.1-6, https://ieeexplore.ieee.org/document/9500892
- C Luo, J Chen, X Feng, J Zhang, J Li, 2023, Sustainable Collaborative Inference in Intelligent Transportation Systems, IEEE Transactions on Intelligent Transportation, https://ieeexplore.ieee.org/document/10239242
- Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, Lingjia Tang, 2017, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” ACM SIGARCH Comput. Archit. News, vol. 52, no. 4, pp. 615–629, https://dl.acm.org/doi/10.1145/3037697.3037698
- Z. Hao, G. Xu, Y. Luo, H. Hu, J. An, and S. Mao, June 2022, “Multi-agent collaborative inference via dnn decoupling: Intermediate feature compression and edge learning,” IEEE Trans. Mob. Comput., 2022, https://arxiv.org/abs/2205.11854
- J. Kim, Y. Park, G. Kim, and S. J. Hwang, “Splitnet: Learning to semantically split deep networks for parameter reduction and model parallelization,” in Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. Teh, Eds., vol. 70. PMLR, 2017, pp. 1866–1874. http://proceedings.mlr.press/v70/kim17b/kim17b.pdf
- Y. Kim, J. Kim, D. Chae, D. Kim, and J. Kim, “µlayer: Low latency on-device inference using cooperative single-layer acceleration and processor-friendly quantization,” in Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019, G. Candea, R. van Renesse, and C. Fetzer, Eds. ACM, 2019, pp. 45:1–45:15. https://dl.acm.org/doi/10.1145/3302424.3303950
- T. Mohammed, C. Joe-Wong, R. Babbar, and M. D. Francesco, “Distributed inference acceleration with adaptive DNN partitioning and offloading,” in 39th IEEE Conference on Computer Communications, INFOCOM 2020, Toronto, ON, Canada, July 6-9, 2020. IEEE, 2020, pp. 854–863, https://ieeexplore.ieee.org/document/9155237
- S. Yang, Z. Zhang, C. Zhao, X. Song, S. Guo, and H. Li, “CNNPC: end-edge-cloud collaborative CNN inference with joint model partition and compression,” IEEE Trans. Parallel Distributed Syst., vol. 33, no. 10, pp. 4039–4056, 2022. https://ieeexplore.ieee.org/document/9782528
- X Xu, K Yan, S Han, B Wang, X Tao, P Zhang, 2023, Learning-Based Edge-Device Collaborative DNN Inference in IoVT Networks, IEEE Internet of Things Journal, https://ieeexplore.ieee.org/abstract/document/10258387
- Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, Ji-Rong Wen, Dec 2023, Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation, https://arxiv.org/abs/2311.09049 Code: https://github.com/RUCAIBox/LC-Rec/
- Mikolaj Jankowski, Deniz Gunduz, Krystian Mikolajczyk, Nov 2023, Adaptive Early Exiting for Collaborative Inference over Noisy Wireless Channels, https://arxiv.org/abs/2311.18098 (Early exiting combined with collaborative inference.)
- Junho Wohn, February 2024, Optimizing Deep Learning Model Inference using Efficient Model Partitioning on Edge Devices, Thesis for the Master of Science, Graduate School of Hanyang University, https://repository.hanyang.ac.kr/handle/20.500.11754/188388, PDF: https://hanyang.dcollection.net/public_resource/pdf/200000726139_20240331200233.pdf (Compiles models using the TVM deep learning compiler and then partitions them across multiple edge devices for collaborative edge inference.)
- Nir Shlezinger; Erez Farhan; Hai Morgenstern; Yonina C. Eldar, 2021, Collaborative Inference via Ensembles on the Edge, ICASSP 2021, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), https://ieeexplore.ieee.org/abstract/document/9414740
- Nir Shlezinger; Ivan V. Bajić, 2022, Collaborative Inference for AI-Empowered IoT Devices, IEEE Internet of Things Magazine (Volume: 5, Issue: 4, December 2022), https://ieeexplore.ieee.org/abstract/document/10012474
- Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, Liang Zhao, 4 Jan 2024, Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models https://arxiv.org/abs/2401.00625 (A general survey paper with coverage of many techniques including this one.)
- Emre Kilcioglu, March 2024, Collaborative On-device CNN Inference: Design and Optimization of Communication and Computation, Ph.D. thesis, Engineering Sciences and Technology, UCLouvain, PDF: https://dial.uclouvain.be/pr/boreal/object/boreal%3A286224/datastream/PDF_01/view
- David Spuler, March 2024, Chapter 54. Ensemble Multi-Model Architectures, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao, June 2024, Hybrid SLM and LLM for Edge-Cloud Collaborative Inference, EdgeFM ’24, June 3–7, 2024, Minato-ku, Tokyo, Japan, https://dl.acm.org/doi/pdf/10.1145/3662006.3662067 (Small model on edge devices with large model in the cloud, performing collaborative inference.)
- Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou, 18 Jun 2024, Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding, https://arxiv.org/abs/2406.12295 Code: https://github.com/TsinghuaC3I/FS-GEN
- Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King, 25 Jun 2024, Entropy-Based Decoding for Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2406.17519 (Enhanced decoding algorithm for multi-document RAG processing.)
- Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
- Mingjin Zhang, 2024, High-performance scheduling of deep learning tasks in collaborative edge computing, Ph.D. Thesis, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, https://theses.lib.polyu.edu.hk/bitstream/200/13080/3/7528.pdf (Scheduling of inference and training tasks on edge devices with techniques such as model splitting/partitioning.)
- Eric Samikwa, 2024, Resource-Aware Distributed Machine Learning for Artificial Intelligence of Things, Ph.D. thesis, Faculty of Science, University of Bern, Switzerland, https://boristheses.unibe.ch/5378/1/24samikwa_e_1_.pdf https://doi.org/10.48549/5378 (Multi-edge device with early exit, "micro-split" scheduling, split/federated learning, and distributed inference.)
- Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou, 4 Jun 2024 (v2), Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems, https://arxiv.org/abs/2403.02419
- J. Niu, W. Zhang, C. J. Xue and N. Guan, 2024, "RTiL: Real-Time Inference of Large Language Models on Memory-Constrained GPU Devices," 2024 IEEE 30th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Sokcho, Korea, Republic of, 2024, pp. 21-30, doi: 10.1109/RTCSA62462.2024.00013. https://ieeexplore.ieee.org/abstract/document/10695719
- Akrit Mudvari, Yuang Jiang, Leandros Tassiulas, 16 Oct 2024 (v2), SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization, https://arxiv.org/abs/2410.10759
- Do Xuan Long, Duong Ngoc Yen, Anh Tuan Luu, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen, 1 Nov 2024, Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models, https://arxiv.org/abs/2411.00492
- Nan Xue, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Liang Qian, Shuguang Cui, Wenjun Zhang, Ping Zhang, 11 Nov 2024, WDMoE: Wireless Distributed Mixture of Experts for Large Language Models, https://arxiv.org/abs/2411.06681
- Yingxuan Yang, Qiuying Peng, Jun Wang, Weinan Zhang, 21 Nov 2024, Multi-LLM-Agent Systems: Techniques and Business Perspectives, https://arxiv.org/abs/2411.14033
- Yuntian Chen, Zhanyong Tang, Tianpei Lu, Bingsheng Zhang, Zhiying Shi, Zheng Wang, 21 Dec 2024, Accelerating Private Large Transformers Inference through Fine-grained Collaborative Computation. https://arxiv.org/abs/2412.16537
- Sehoon Kim, Oct 2024, Full Stack Approach for Efficient Deep Learning Inference, Doctor of Philosophy, Computer Science, University of California, Berkeley, https://escholarship.org/content/qt4wf834q8/qt4wf834q8.pdf
- X. Zheng, W. Zhang, C. Hu, L. Zhu and C. Zhang, "Cloud-Edge-End Collaborative Inference in Mobile Networks: Challenges and Solutions," in IEEE Network, doi: 10.1109/MNET.2025.3533581. https://ieeexplore.ieee.org/abstract/document/10852347
- Shangbin Feng, Wenxuan Ding, Alisa Liu, Zifeng Wang, Weijia Shi, Yike Wang, Zejiang Shen, Xiaochuang Han, Hunter Lang, Chen-Yu Lee, Tomas Pfister, Yejin Choi, Yulia Tsvetkov, 6 Feb 2025, When One LLM Drools, Multi-LLM Collaboration Rules, https://arxiv.org/abs/2502.04506
- Chan-Jan Hsu, Davide Buffelli, Jamie McGowan, Feng-Ting Liao, Yi-Chang Chen, Sattar Vakili, Da-shan Shiu, 16 May 2025, Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity, https://arxiv.org/abs/2505.11107
- Yang Liu, Bingjie Yan, Tianyuan Zou, Jianqing Zhang, Zixuan Gu, Jianbing Ding, Xidong Wang, Jingyi Li, Xiaozhou Ye, Ye Ouyang, Qiang Yang, Ya-Qin Zhang, 24 Apr 2025, Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks, https://arxiv.org/abs/2504.17421
- J. Pablo Muñoz and Jinjie Yuan, 7 Aug 2025, RTTC: Reward-Guided Collaborative Test-Time Compute, https://arxiv.org/abs/2508.10024
- Alex Clinton, Yiding Chen, Xiaojin Zhu, Kirthevasan Kandasamy, 14 Aug 2025, Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution, https://arxiv.org/abs/2407.15881
- Haoran Jiang, Shaohan Shi, Yunjie Yao, Chang Jiang, Quan Li, 23 Jul 2025, HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery, https://arxiv.org/abs/2507.17209
- Arpan Dasgupta, Mizhaan Maniyar, Awadhesh Srivastava, Sanat Kumar, Amrita Mahale, Aparna Hedge, Arun Suggala, Karthikeyan Shanmugam, Aparna Taneja, Milind Tambe, 22 Jul 2025, Learning to Call: A Field Trial of a Collaborative Bandit Algorithm for Improved Message Delivery in Mobile Maternal Health, https://arxiv.org/abs/2507.16356
- Bo Hou, Xin Tan, Kai Zheng, Fang Liu, Yinghao Zhu, Li Zhang, 22 Jul 2025, LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning, https://arxiv.org/abs/2507.16395
- Sabrina Livanec, Laura Londoño, Michael Gorki, Adrian Röfer, Abhinav Valada, Andrea Kiesel, 22 Jul 2025, Designing for Difference: How Human Characteristics Shape Perceptions of Collaborative Robots, https://arxiv.org/abs/2507.16480
- Hao Tuo, Yan Li, Xuanning Hu, Haishi Zhao, Xueyan Liu, Bo Yang, 22 Jul 2025, A Collaborative Framework Integrating Large Language Model and Chemical Fragment Space: Mutual Inspiration for Lead Design, https://arxiv.org/abs/2507.13580
- Liang Zhang, Xiaoming Zhai, Jionghao Lin, Jennifer Kleiman, Diego Zapata-Rivera, Carol Forsyth, Yang Jiang, Xiangen Hu, Arthur C. Graesser, 2 May 2025, Exploring Communication Strategies for Collaborative LLM Agents in Mathematical Problem-Solving, https://arxiv.org/abs/2507.17753
- Zhangqi Liu, 22 Jul 2025, Human-AI Co-Creation: A Framework for Collaborative Design in Intelligent Systems, https://arxiv.org/abs/2507.17774
- Alex Liu, Lief Esbenshade, Shawon Sarkar, Victor Tian, Zachary Zhang, Kevin He, Min Sun, 23 Jul 2025, Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale, https://arxiv.org/abs/2507.17985
- Donghoon Shin, Daniel Lee, Gary Hsieh, Gromit Yeuk-Yin Chan, 24 Jul 2025, PosterMate: Audience-driven Collaborative Persona Agents for Poster Design, https://arxiv.org/abs/2507.18572
- Kester Wong, Sahan Bulathwela and Mutlu Cukurova, 19 Jul 2025, Explainable Collaborative Problem Solving Diagnosis with BERT using SHAP and its Implications for Teacher Adoption, https://arxiv.org/abs/2507.14584
- Xinheng Lyu, Yuci Liang, Wenting Chen, Meidan Ding, Jiaqi Yang, Guolin Huang, Daokun Zhang, Xiangjian He, and Linlin Shen, 19 Jul 2025, WSI-Agents: A Collaborative Multi-Agent System for Multi-Modal Whole Slide Image Analysis, https://arxiv.org/abs/2507.14680
- Tuo Zhang, Ning Li, Xin Yuan, Wenchao Xu, Quan Chen, Song Guo, Haijun Zhang, 10 Aug 2025, Efficient Edge LLMs Deployment via HessianAware Quantization and CPU GPU Collaborative, https://arxiv.org/abs/2508.07329
- Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Mohit Bansal, Leshem Choshen, Alessandro Sordoni, 9 Aug 2025, A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning, https://arxiv.org/abs/2408.07057
- Simone Bendazzoli, Sanna Persson, Mehdi Astaraki, Sebastian Pettersson, Vitali Grozman, Rodrigo Moreno, 28 May 2025, MAIA: A Collaborative Medical AI Platform for Integrated Healthcare Innovation, https://arxiv.org/abs/2507.19489
- Tolga Dimlioglu, Anna Choromanska, 27 Jul 2025, Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning, https://arxiv.org/abs/2507.20424
- Yizhe Zhang, 28 Jul 2025, Beyond Manual Annotation: A Human-AI Collaborative Framework for Medical Image Segmentation Using Only "Better or Worse" Expert Feedback, https://arxiv.org/abs/2507.05815
- Wenxuan Bao, Ruxi Deng, Ruizhong Qiu, Tianxin Wei, Hanghang Tong, Jingrui He, 29 Jul 2025, Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning, https://arxiv.org/abs/2507.21494
- Yukino Terui, Yuka Inoue, Yohei Hamakawa, Kosuke Tatsumura, Kazue Kudo, 29 Jul 2025, Collaborative filtering based on nonnegative/binary matrix factorization, https://arxiv.org/abs/2410.10381
- Hongyan Cheng, Chengzhang Yu, Yanshu Shi, Chiyue Wang, Cong Liu, and Zhanpeng Jin, 30 Jul 2025, Collaborative Medical Triage under Uncertainty: A Multi-Agent Dynamic Matching Approach, https://arxiv.org/abs/2507.22504
- Peng-Yi Wu, Pei-Cing Huang, Ting-Yu Chen, Chantung Ku, Ming-Yen Lin, Yihuang Kang, 30 Jul 2025, Towards Interpretable Renal Health Decline Forecasting via Multi-LMM Collaborative Reasoning Framework, https://arxiv.org/abs/2507.22464
- Yuzhen Gao, Qianqian Wang, Yongheng Sun, Cui Wang, Yongquan Liang, Mingxia Liu, 30 Jul 2025, Learning from Heterogeneous Structural MRI via Collaborative Domain Adaptation for Late-Life Depression Assessment, https://arxiv.org/abs/2507.22321
- Thanh Hoang-Minh, 30 Jul 2025, Graph Collaborative Attention Network for Link Prediction in Knowledge Graphs, https://arxiv.org/abs/2507.03947
- Evan Rose, Hidde Lycklama, Harsh Chaudhari, Anwar Hithnawi, Alina Oprea, 1 Aug 2025, UTrace: Poisoning Forensics for Private Collaborative Learning, https://arxiv.org/abs/2409.15126
- Shiyang Duan, Yuan Tian, Qi Bing, Xiaowei Shao, 3 Aug 2025, Bayes-Entropy Collaborative Driven Agents for Research Hypotheses Generation and Optimization, https://arxiv.org/abs/2508.01746
- En Yu, Jie Lu, Kun Wang, Xiaoyu Yang, Guangquan Zhang, 3 Aug 2025, Drift-aware Collaborative Assistance Mixture of Experts for Heterogeneous Multistream Learning, https://arxiv.org/abs/2508.01598
- Ziqi Sheng, Junyan Wu, Wei Lu, Jiantao Zhou, 2 Aug 2025, Weakly-Supervised Image Forgery Localization via Vision-Language Collaborative Reasoning Framework, https://arxiv.org/abs/2508.01338
- Yi Jiang, Sendong Zhao, Jianbo Li, Haochun Wang, Lizhe Zhang, Yan Liu, Bin Qin, 3 Aug 2025, Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy, https://arxiv.org/abs/2508.01696
- Yang Zhao, Chengxiao Dai, Wei Zhuo, Tan Chuan Fu, Yue Xiu, Dusit Niyato, Jonathan Z. Low, Eugene Ho Hong Zhuang, Daren Zong Loong Tan, 3 Aug 2025, AGENTICT²S: Robust Text-to-SPARQL via Agentic Collaborative Reasoning over Heterogeneous Knowledge Graphs for the Circular Economy, https://arxiv.org/abs/2508.01815
- Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, Yuheng Cheng, Suyuchen Wang, Xiaoqiang Wang, Yuyu Luo, Haibo Jin, Peiyan Zhang, Ollie Liu, Jiaqi Chen, Huan Zhang, Zhaoyang Yu, Haochen Shi, Boyan Li, Dekun Wu, Fengwei Teng, Xiaojun Jia, Jiawei Xu, Jinyu Xiang, Yizhang Lin, Tianming Liu, Tongliang Liu, Yu Su, Huan Sun, Glen Berseth, Jianyun Nie, Ian Foster, Logan Ward, Qingyun Wu, Yu Gu, Mingchen Zhuge, Xinbing Liang, Xiangru Tang, Haohan Wang, Jiaxuan You, Chi Wang, Jian Pei, Qiang Yang, Xiaoliang Qi, Chenglin Wu, 2 Aug 2025, Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems, https://arxiv.org/abs/2504.01990
- Siyuan Li, Yifan Yu, Yanchen Deng, Zhihao Zhang, Mengjing Chen, Fangzhou Zhu, Tao Zhong, Jianye Hao, Peng Liu, Bo An, 5 Aug 2025, Collab-Solver: Collaborative Solving Policy Learning for Mixed-Integer Linear Programming, https://arxiv.org/abs/2508.03030
- Arthur Cho, 4 Aug 2025, GrandJury: A Collaborative Machine Learning Model Evaluation Protocol for Dynamic Quality Rubrics, https://arxiv.org/abs/2508.02926
- Marta Moscati, Shah Nawaz, Markus Schedl, 5 Aug 2025, Parameter-Efficient Single Collaborative Branch for Recommendation, https://arxiv.org/abs/2508.03518
- Asutosh Hota, Jussi P.P. Jokinen, 7 Aug 2025, NomicLaw: Emergent Trust and Strategic Argumentation in LLMs During Collaborative Law-Making, https://arxiv.org/abs/2508.05344
- Nan Li, Wanting Yang, Marie Siew, Zehui Xiong, Binbin Chen, Shiwen Mao, Kwok-Yan Lam, 6 Aug 2025, Edge-Assisted Collaborative Fine-Tuning for Multi-User Personalized Artificial Intelligence Generated Content (AIGC), https://arxiv.org/abs/2508.04745
- Renmiao Chen, Shiyao Cui, Xuancheng Huang, Chengwei Pan, Victor Shea-Jay Huang, QingLin Zhang, Xuan Ouyang, Zhexin Zhang, Hongning Wang, and Minlie Huang, 7 Aug 2025, JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering, https://arxiv.org/abs/2508.05087
- Albert Yu, Chengshu Li, Luca Macesanu, Arnav Balaji, Ruchira Ray, Raymond Mooney, Roberto Martín-Martín, 7 Aug 2025, Mixed-Initiative Dialog for Human-Robot Collaborative Manipulation, https://arxiv.org/abs/2508.05535
- Nikita Sukhorukov, Danil Gusak, Evgeny Frolov, 8 Aug 2025, Maximum Impact with Fewer Features: Efficient Feature Selection for Cold-Start Recommenders through Collaborative Importance Weighting, https://arxiv.org/abs/2508.06455
- Shibin Su, Guoqiang Liang, De Cheng, Shizhou Zhang, Lingyan Ran, Yanning Zhang, 12 Aug 2025, Multi-level Collaborative Distillation Meets Global Workspace Model: A Unified Framework for OCIL, https://arxiv.org/abs/2508.08677
- Ratun Rahman, 12 Aug 2025, Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence, https://arxiv.org/abs/2504.17703
- Jing Liu, Yao Du, Kun Yang, Jiaqi Wu, Yan Wang, Xiping Hu, Zehua Wang, Yang Liu, Peng Sun, Azzedine Boukerche, Victor C.M. Leung, 12 Aug 2025, Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey, https://arxiv.org/abs/2505.01821
- Yue Yao, Zhen Xu, Youzhu Liu, Kunyuan Ma, Yuxiu Lin, Mohan Jiang, 13 Aug 2025, Integrating Feature Attention and Temporal Modeling for Collaborative Financial Risk Assessment, https://arxiv.org/abs/2508.09399
- Hao Yu, Xin Yang, Boyang Fan, Xuemei Cao, Hanlin Gu, Lixin Fan, Qiang Yang, 13 Aug 2025, Large-Small Model Collaborative Framework for Federated Continual Learning, https://arxiv.org/abs/2508.09489
- Muqing Li, Ning Li, Xin Yuan, Wenchao Xu, Quan Chen, Song Guo, Haijun Zhang, 10 Aug 2025, CoMoE: Collaborative Optimization of Expert Aggregation and Offloading for MoE-based LLMs at Edge, https://arxiv.org/abs/2508.09208
- Lingyu Chen, Yawen Zeng, Yue Wang, Peng Wan, Guo-chen Ning, Hongen Liao, Daoqiang Zhang, Fang Chen, 13 Aug 2025, COME: Dual Structure-Semantic Learning with Collaborative MoE for Universal Lesion Detection Across Heterogeneous Ultrasound Datasets, https://arxiv.org/abs/2508.09886
- Xinyi Li, Sai Wang, Yutian Lin, Yu Wu, Yi Yang, 14 Aug 2025, Retro-Expert: Collaborative Reasoning for Interpretable Retrosynthesis, https://arxiv.org/abs/2508.10967
- Xuran Liu, Nan Xue, Rui Bao, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Shuguang Cui, 15 Aug 2025, CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems, https://arxiv.org/abs/2508.11287
- Xuyang Zhao, Shiwan Zhao, Hualong Yu, Liting Zhang, Qicheng Li, 16 Aug 2025, AgentCDM: Enhancing Multi-Agent Collaborative Decision-Making via ACH-Inspired Structured Reasoning, https://arxiv.org/abs/2508.11995
- Wentao Li, Yonghu He, Kun Gao, Qing Liu and Yali Zheng, 7 Aug 2025, Collaborative Learning-Enhanced Lightweight Models for Predicting Arterial Blood Pressure Waveform in a Large-scale Perioperative Dataset, https://arxiv.org/abs/2508.11669
- Mohammad Ishzaz Asif Rafid, Morsalin Sakib, 16 Aug 2025, Substituting Proof of Work in Blockchain with Training-Verified Collaborative Model Computation, https://arxiv.org/abs/2508.12138
- Chiranjit Mitra, 17 Aug 2025, Synchronization Dynamics of Heterogeneous, Collaborative Multi-Agent AI Systems, https://arxiv.org/abs/2508.12314
- Chen Qian, Xinran Yu, Zewen Huang, Danyang Li, Qiang Ma, Fan Dang, Xuan Ding, Guangyong Shang, Zheng Yang, 18 Aug 2025, SpotVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer, https://arxiv.org/abs/2508.12638
- Xizhan Gao, Wei Hu, 18 Aug 2025, DCSCR: A Class-Specific Collaborative Representation based Network for Image Set Classification, https://arxiv.org/abs/2508.12745
- Saptarshi Nath, Christos Peridis, Eseoghene Benjamin, Xinran Liu, Soheil Kolouri, Peter Kinnell, Zexin Li, Cong Liu, Shirin Dora, and Andrea Soltoggio, 18 Aug 2025, Policy Search, Retrieval, and Composition via Task Similarity in Collaborative Agentic Systems, https://arxiv.org/abs/2506.05577
- Can Jin, Hongwu Peng, Qixin Zhang, Yujin Tang, Dimitris N. Metaxas, Tong Che, 19 Aug 2025, Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning, https://arxiv.org/abs/2504.09772
- João Vitor de Carvalho Silva and Douglas G. Macharet, 20 Aug 2025, Can LLM Agents Solve Collaborative Tasks? A Study on Urgency-Aware Planning and Coordination, https://arxiv.org/abs/2508.14635
- Lixiang Yan, 20 Aug 2025, From Passive Tool to Socio-cognitive Teammate: A Conceptual Framework for Agentic AI in Human-AI Collaborative Learning, https://arxiv.org/abs/2508.14825
- Amir Kermanshahani, Ebrahim Ardeshir-Larijani, Rakesh Saini and Saif Al-Kuwari, 12 Aug 2025, Collaborative Filtering using Variational Quantum Hopfield Associative Memory, https://arxiv.org/abs/2508.14906
- Simon Lepage, Jeremie Mary, David Picard, 12 Aug 2025, Closing the Performance Gap in Generative Recommenders with Collaborative Tokenization and Efficient Modeling, https://arxiv.org/abs/2508.14910
- Sindhuja Penchala, Saketh Reddy Kontham, Prachi Bhattacharjee, Sareh Karami, Mehdi Ghahremani, Noorbakhsh Amiri Golilarz, and Shahram Rahimi, 5 Aug 2025, Learning in Focus: Detecting Behavioral and Collaborative Engagement Using Vision Transformers, https://arxiv.org/abs/2508.15782
- Zirui Li, Stephan Husung, Haoze Wang, 22 Aug 2025, LLM-Assisted Semantic Alignment and Integration in Collaborative Model-Based Systems Engineering Using SysML v2, https://arxiv.org/abs/2508.16181
- Yu Yan, Sheng Sun, Zixiang Tang, Teli Liu, Min Liu, 22 Aug 2025, Collaborative Stance Detection via Small-Large Language Model Consistency Verification, https://arxiv.org/abs/2502.19954
Consensus Decoding
Consensus decoding is a type of collaborative inference where multiple models must form a "consensus" for the predicted output token. The idea is that two or more models perform inference independently, each predicting token probabilities, and then their results are combined to output a "best" token. Note that this differs from approaches such as speculative decoding (or other more generalized types of collaborative inference), where the two models affect each other's inference in progress.
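The consensus idea can be sketched as a vote over each model's independently chosen top token. In this minimal example, the majority quorum, the fallback to summed probabilities when no quorum is reached, and all function names are assumptions chosen for illustration; the papers listed below may define consensus quite differently.

```python
# Hypothetical sketch: consensus over independent per-model predictions.
from collections import Counter

def top_token(dist):
    """The single token a model would output on its own."""
    return max(dist, key=dist.get)

def consensus_token(distributions, quorum=None):
    """Return (token, agreed): the voted token and whether the quorum was met."""
    if quorum is None:
        quorum = len(distributions) // 2 + 1  # assumption: simple majority
    votes = Counter(top_token(d) for d in distributions)
    token, count = votes.most_common(1)[0]
    if count >= quorum:
        return token, True
    # Assumed fallback when the models do not agree: highest summed probability.
    totals = {}
    for dist in distributions:
        for tok, p in dist.items():
            totals[tok] = totals.get(tok, 0.0) + p
    return max(totals, key=totals.get), False

# Usage: two of three models pick "a", so a majority consensus exists.
outputs = [{"a": 0.6, "b": 0.4}, {"a": 0.7, "b": 0.3}, {"a": 0.1, "b": 0.9}]
token, agreed = consensus_token(outputs)
```

Requiring a unanimous vote (`quorum=3`) on the same inputs fails to reach consensus, and the fallback then selects the token with the largest total probability across the three models.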
Research papers on consensus decoding include:
- Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, Ji-Rong Wen, Dec 2023, Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation, https://arxiv.org/abs/2311.09049 Code: https://github.com/RUCAIBox/LC-Rec/
- Mikolaj Jankowski, Deniz Gunduz, Krystian Mikolajczyk, Nov 2023, Adaptive Early Exiting for Collaborative Inference over Noisy Wireless Channels, https://arxiv.org/abs/2311.18098 (Early exiting combined with collaborative inference.)
- Adam Pauls, John DeNero and Dan Klein, 2009, Consensus Training for Consensus Decoding in Machine Translation, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1418–1427, https://aclanthology.org/D09-1147.pdf
- Nir Shlezinger; Erez Farhan; Hai Morgenstern; Yonina C. Eldar, 2021, Collaborative Inference via Ensembles on the Edge, ICASSP 2021, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), https://ieeexplore.ieee.org/abstract/document/9414740
- Nir Shlezinger; Ivan V. Bajić, 2022, Collaborative Inference for AI-Empowered IoT Devices, IEEE Internet of Things Magazine (Volume: 5, Issue: 4, December 2022), https://ieeexplore.ieee.org/abstract/document/10012474
- Caelin Kaplan, Tareq Si Salem, Angelo Rodio, Chuan Xu, Giovanni Neglia, 7 May 2024, Federated Learning for Cooperative Inference Systems: The Case of Early Exit Networks, https://arxiv.org/abs/2405.04249
- David Spuler, March 2024, Chapter 54. Ensemble Multi-Model Architectures, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Gengrui Zhang, Shiquan Zhang, Michail Bachras, Yuqiu Zhang, Hans-Arno Jacobsen, 11 Mar 2025, Cabinet: Dynamically Weighted Consensus Made Fast, https://arxiv.org/abs/2503.08914
- Luyao Tang, Kunze Huang, Chaoqi Chen, Yuxuan Yuan, Chenxin Li, Xiaotong Tu, Xinghao Ding, Yue Huang, 14 Aug 2025, Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction, https://arxiv.org/abs/2508.10731
- Shijun Guo, Haoran Xu, Yaming Yang, Ziyu Guan, Wei Zhao, Xinyi Zhang, Yishan Song, Jiwei Chen, 11 Jul 2025, H-NeiFi: Non-Invasive and Consensus-Efficient Multi-Agent Opinion Guidance, https://arxiv.org/abs/2507.13370
- Myeung Suk Oh, Zhiyao Zhang, FNU Hairi, Alvaro Velasquez, Jia Liu, 9 Aug 2025, Consensus-based Decentralized Multi-agent Reinforcement Learning for Random Access Network Optimization, https://arxiv.org/abs/2508.07001
- Justin Kay, Grant Van Horn, Subhransu Maji, Daniel Sheldon, and Sara Beery, 31 Jul 2025, Consensus-Driven Active Model Selection, https://arxiv.org/abs/2507.23771
- Cathy Speed, Ahmed A. Metwally, 12 Aug 2025, The Human-AI Hybrid Delphi Model: A Structured Framework for Context-Rich, Expert Consensus in Complex Domains, https://arxiv.org/abs/2508.09349
More Research on Decoding Algorithms
- Decoding algorithms (overview)
— Non-autoregressive decoding
— Greedy decoding
— Top-k decoding
— Top-p decoding
— Min-P Sampling
— Flash decoding
— Beam search decoding
— Edit decoding
— Contrastive decoding
— Constrained decoding
- Parallel decoding (overview)
— Blockwise parallel decoding
— n-gram parallel decoding
— Lookahead decoding
— Medusa decoding
— Consensus decoding
- Speculative decoding (overview)
— Generalized speculative decoding
— Aggressive decoding
— Lookup decoding
— Retrieval lookup decoding
— Prompt lookup decoding
— Self speculative decoding
— Tree speculative decoding
— Superposed decoding
— Hierarchical speculative decoding
— Heuristic speculative decoding
— Multi-token speculative decoding
— Sequential speculative decoding
More AI Research
Read more about:
- Ensemble Model Architectures
- Speculative Decoding
- Inference Optimizations
- Loop Optimizations
- Code Optimizations