Aussie AI
Knowledge Distillation Research
-
Last Updated 17 November, 2025
-
by David Spuler, Ph.D.
Knowledge Distillation (KD) is a model optimization technique where a larger pre-trained model (the "teacher") is used to train a smaller, more efficient model (the "student"). When used successfully, the result is a small model with faster inference that closely matches the accuracy of the larger model.
Distillation is not technically an ensemble method, because the larger model is not used during inference. Hence, it is not the same as "big-small" dual inference architectures.
Distillation also differs from "fine-tuning" or "re-training", which apply extra training to the existing (large) model; knowledge distillation instead trains a new, smaller model from scratch.
Recent advances in knowledge distillation include novel ways to transfer the learned knowledge directly, weighting approaches rather than exact probability transfer, and multi-teacher distillation approaches whereby the smaller student model gains information from multiple teachers.
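For readers new to the area, the classic formulation from Hinton et al. (2015), cited below, trains the student on a weighted combination of the ordinary hard-label cross-entropy and a temperature-softened match to the teacher's output probabilities. The following is a minimal PyTorch-style sketch of that combined loss, shown for illustration only; the temperature of 2.0 and the 0.5 weighting are assumed example values rather than settings from any particular paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft-target term: KL divergence between the temperature-scaled
    # teacher and student distributions (Hinton et al., 2015).
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    soft_loss = soft_loss * (temperature ** 2)  # usual gradient rescaling

    # Hard-label term: ordinary cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Weighted combination; alpha balances soft versus hard targets.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Sketch of use in a training loop: the teacher is frozen and runs in
# inference mode only, so it never appears at deployment time.
# with torch.no_grad():
#     teacher_logits = teacher_model(inputs)
# student_logits = student_model(inputs)
# loss = distillation_loss(student_logits, teacher_logits, labels)
# loss.backward()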
Survey Papers on Knowledge Distillation
Review papers with coverage of KD include:
- Y Tian, S Pei, X Zhang, C Zhang, 2023, Knowledge Distillation on Graphs: A Survey, arXiv preprint, https://arxiv.org/abs/2302.00219
- Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. A survey of model compression and acceleration for deep neural networks. CoRR, abs/1710.09282, 2017. https://arxiv.org/abs/1710.09282 (A survey paper from 2017 that includes KD.)
- Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li, A Survey on Green Deep Learning, Nov 2021, https://arxiv.org/abs/2111.05193 (Extensive survey paper on multiple areas, including a section on Knowledge Distillation.)
- Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang, A Survey on Model Compression for Large Language Models, arXiv preprint arXiv:2308.07633, Aug 2023 https://arxiv.org/abs/2308.07633 (Recent 2023 survey paper on various model compression approaches including knowledge distillation.)
- Wang L, Yoon KJ. Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021;44:3048-3068, https://arxiv.org/abs/2004.05937 (Distillation in a vision context.)
- Tom Wallace, Naser Ezzati-Jivan, Beatrice Ombuki-Berman, 16 Jan 2025, Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models, https://arxiv.org/abs/2502.00046
Research on Knowledge Distillation
KD is a longstanding method of optimizing model inference and remains one of the most popular techniques. Research papers on KD include:
- Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015. https://arxiv.org/abs/1503.02531 (The early paper that seems to have coined the name.)
- Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf, Oct 2019 (revised March 2020), DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019), https://arxiv.org/abs/1910.01108
- Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu, TinyBERT: Distilling BERT for Natural Language Understanding, arXiv preprint arXiv:1909.10351, Sep 2019 (updated Oct 2020), https://arxiv.org/abs/1909.10351, Code: https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/TinyBERT
- Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou, MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, arXiv preprint arXiv:2004.02984 (2020), https://arxiv.org/abs/2004.02984
- Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu, Patient Knowledge Distillation for BERT Model Compression, arXiv preprint arXiv:1908.09355 (Aug 2019), https://arxiv.org/abs/1908.09355
- Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers, arXiv preprint arXiv:2012.15828, 2020 (revised June 2021), https://arxiv.org/abs/2012.15828
- Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin, Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, arXiv preprint arXiv:1903.12136 (Mar 2019), https://arxiv.org/abs/1903.12136
- Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, and Qi Ju. FastBERT: a self-distilling BERT with adaptive inference time. arXiv preprint arXiv:2004.02178, 2020. https://arxiv.org/abs/2004.02178
- Jean Senellart, Dakun Zhang, Bo Wang, Guillaume Klein, Jean-Pierre Ramatchandirin, Josep Crego, and Alexander Rush. OpenNMT system description for WNMT 2018: 800 words/sec on a single-core CPU. In Proc. of WNMT, 2018. https://www.aclweb.org/anthology/W18-2715
- Iulia Turc, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Well-read students learn better: The impact of student initialization on knowledge distillation. arXiv preprint arXiv:1908.08962, https://arxiv.org/abs/1908.08962v1
- Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, and Amelia Archer. 2019. Small and Practical BERT Models for Sequence Labeling. arXiv preprint arXiv:1909.00100. https://arxiv.org/abs/1909.00100
- Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D Manning, and Quoc V Le. July 2019. BAM! Born-Again Multi-Task Networks for Natural Language Understanding. arXiv preprint arXiv:1907.04829. https://arxiv.org/abs/1907.04829
- Yoon Kim and Alexander M Rush. Sep 2016. Sequence-level knowledge distillation. arXiv preprint arXiv:1606.07947, https://arxiv.org/abs/1606.07947
- Kaixin Wu, Bojie Hu, and Qi Ju. 2021. TenTrans High-Performance Inference Toolkit for WMT2021 Efficiency Task. In Proceedings of the Sixth Conference on Machine Translation, pages 795–798, Online. Association for Computational Linguistics, https://aclanthology.org/2021.wmt-1.77/, Code: https://github.com/TenTrans/TenTrans
- Guobin Chen, Wongun Choi, Xiang Yu, Tony Han, and Manmohan Chandraker. 2017. Learning efficient object detection models with knowledge distillation. In Advances in Neural Information Processing Systems, pages 742–751. https://dl.acm.org/doi/10.5555/3294771.3294842
- Mao, Y.; Wang, Y.; Wu, C.; Zhang, C.; Wang, Y.; Zhang, Q.; Yang, Y.; Tong, Y.; and Bai, J. 2020. LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression. In COLING, 3225–3234. International Committee on Computational Linguistics. https://arxiv.org/abs/2004.04124 (A combination of weight pruning, matrix factorization and knowledge distillation.)
- Lin, S.C.; Yang, J.H.; Lin, J., Distilling dense representations for ranking using tightly-coupled teachers. arXiv preprint arXiv:2010.11386 2020. https://arxiv.org/abs/2010.11386
- Dawei Li, Xiaolong Wang, and Deguang Kong. 2018. DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices. In AAAI’18, https://arxiv.org/abs/1708.04728 (Includes a distinct type of distillation.)
- Cristian Bucila, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Tina Eliassi-Rad, Lyle H. Ungar, Mark Craven, and Dimitrios Gunopulos (eds.), Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, pp. 535–541. ACM, 2006. https://www.semanticscholar.org/paper/Model-compression-Bucila-Caruana/30c9bb327b7f2b9f1d1e5b69b9d0c97b410948d9, PDF: http://www.cs.cornell.edu/~caruana/compression.kdd06.pdf (Early 2006 paper on teaching models before it became called "distillation" in 2015.)
- Lili Mou, Ran Jia, Yan Xu, Ge Li, Lu Zhang, and Zhi Jin. Distilling word embeddings: An encoding approach. In CIKM, pp. 1977–1980. ACM, 2016. https://arxiv.org/abs/1506.04488 (Distillation of embeddings.)
- Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian J. McAuley, and Furu Wei. Beyond preserved accuracy: Evaluating loyalty and robustness of BERT compression. CoRR, abs/2109.03228, 2021, https://arxiv.org/abs/2109.03228 (Evaluation of the efficiency of distillation.)
- Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, and Andrew Gordon Wilson. Does knowledge distillation really work? CoRR, abs/2106.05945, 2021, https://arxiv.org/abs/2106.05945 (Evaluation of the efficacy of distillation.)
- Dae Young Park, Moon-Hyun Cha, Changwook Jeong, Daesin Kim, and Bohyung Han. Learning student-friendly teacher networks for knowledge distillation. CoRR, abs/2102.07650, 2021. https://arxiv.org/abs/2102.07650
- Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014 https://arxiv.org/abs/1412.6550
- Anil Kag, 2023, Novel neural architectures & algorithms for efficient inference, Ph.D. thesis, College of Engineering, Boston University, https://open.bu.edu/handle/2144/46649, PDF: https://open.bu.edu/bitstream/handle/2144/46649/Kag_bu_0017E_18472.pdf?sequence=8&isAllowed=y Code: https://github.com/anilkagak2/DiSK_Distilling_Scaffolded_Knowledge (See Chapter 13: Distilling Selective/Scaffolded Knowledge)
- Zhang, C.; Yang, Y.; Liu, J.; Wang, J.; Xian, Y.; Wang, B.; and Song, D. 2023. Lifting the Curse of Capacity Gap in Distilling Language Models. arXiv:2305.12129. https://arxiv.org/abs/2305.12129
- Chen X, He B, Hui K, Sun L, Sun Y. Simplified TinyBERT: Knowledge Distillation for Document Retrieval. 2020. arXiv preprint, https://arxiv.org/abs/2009.07531
- Tian Y, Krishnan D, Isola P. Contrastive Representation Distillation. 2019. arXiv preprint, https://arxiv.org/pdf/1910.10699.pdf
- Do T, Tran H, Do T, Tjiputra E, Tran Q. Compact Trilinear Interaction for Visual Question Answering. In: Proceedings of the IEEE International Conference on Computer Vision. 2019:392-401. https://arxiv.org/abs/1909.11874
- T Chen, S Liu, Z Chen, W Hu, D Chen, Y Wang, Q Lyu, 2023, Faster, Stronger, and More Interpretable: Massive Transformer Architectures for Vision-Language Tasks, https://www.oajaiml.com/uploads/archivepdf/27841181.pdf
- B. Heo, M. Lee, S. Yun and J. Y. Choi, "Knowledge transfer via distillation of activation boundaries formed by hidden neurons", Proc. AAAI Conf. Artif. Intell. (AAAI), vol. 33, no. 1, pp. 3779-3787, 2019, https://arxiv.org/abs/1811.03233
- Guorui Zhou, Ying Fan, Runpeng Cui, Weijie Bian, Xiaoqiang Zhu, Kun Gai, "Rocket launching: A universal and efficient framework for training well-performing light net", Proc. AAAI Conf. Artif. Intell., pp. 1-8, 2018. https://arxiv.org/abs/1708.04106 (Combined training of teacher and student models.)
- A. Chaulwar et al., "Extreme compression of sentence-transformer ranker models: Faster inference longer battery life and less storage on edge devices", arXiv:2207.12852, 2022. https://arxiv.org/abs/2207.12852v1 (Distillation from the point of view of embeddings.)
- T Shen, C Lee, V Narayanan, Oct 2023, Multi-Exit Vision Transformer with Custom Fine-Tuning for Fine-Grained Image Recognition, 2023 IEEE International Conference on Image Processing (ICIP), https://ieeexplore.ieee.org/abstract/document/10222298 (Early exit from multiple places, combined with self-distillation.)
- Z Zhao, Q Liu, H Gui, B An, L Hong, H Chi, Oct 2023, Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication, arXiv preprint arXiv:2310.03188, https://arxiv.org/pdf/2310.03188.pdf
- T Udagawa, A Trivedi, M Merler, B Bhattacharjee, Oct 2023, A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models, arXiv preprint arXiv:2310.08797, https://arxiv.org/abs/2310.08797
- Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal, Oct 2023, DistillSpec: Improving Speculative Decoding via Knowledge Distillation, https://arxiv.org/abs/2310.08461
- Jin Wang, Dawei Liao, You Zhang, Dan Xu, Xuejie Zhang, 2024, Layerwised multimodal knowledge distillation for vision-language pretrained model, Neural Networks Available online 26 March 2024, 106272, https://doi.org/10.1016/j.neunet.2024.106272
- Zao Zhang, 23 May 2024, Design Efficient Deep Neural Networks with System Optimization, Ph.D. Thesis, School of Electrical and Information Engineering, Faculty of Engineering, The University of Sydney, Australia, PDF: https://ses.library.usyd.edu.au/bitstream/handle/2123/32642/zhang_z_thesis.pdf?sequence=1&isAllowed=y https://ses.library.usyd.edu.au/handle/2123/32642 https://hdl.handle.net/2123/32642
- Maolin Wang, Yao Zhao, Jiajia Liu, Jingdong Chen, Chenyi Zhuang, Jinjie Gu, Ruocheng Guo, Xiangyu Zhao, May 2024, Large Multimodal Model Compression via Iterative Efficient Pruning and Distillation, WWW '24: Companion Proceedings of the ACM Web Conference 2024, May 2024, Pages 235–244, https://doi.org/10.1145/3589335.3648321
- Maolin Wang, Yao Zhao, Jiajia Liu, Jingdong Chen, Chenyi Zhuang, Jinjie Gu, Ruocheng Guo, Xiangyu Zhao, Dec 2023, Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup, https://arxiv.org/abs/2312.05795
- Canwen Xu, 2024, Efficient Natural Language Processing for Language Models, Ph.D. thesis, Computer Science, UNIVERSITY OF CALIFORNIA SAN DIEGO, PDF: https://escholarship.org/uc/item/9dv1k5xv PDF: https://escholarship.org/content/qt9dv1k5xv/qt9dv1k5xv.pdf?t=sc34ay (Evaluates several acceleration methods including early-exit, PEFT, and distillation.)
- Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Yui Li, Wen-Huang Cheng, 8 Apr 2024, Lightweight Deep Learning for Resource-Constrained Environments: A Survey, https://arxiv.org/abs/2404.07236 (A survey of various optimizations, with a lot of focus on image and vision models, including CNNs, RNNs, and Transformers.)
- Georgy Tyukin, 2 Apr 2024, Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations, Masters Thesis, Data Science and Machine Learning, University College London., https://arxiv.org/abs/2404.05741 (Reviews various model compression and inference optimization techniques, and specifically analyzes layer skipping and sublayer skipping, such as attention head pruning and FFN/MLP pruning.)
- Busayo Awobade, Mardiyyah Oduwole, Steven Kolawole, 6 Apr 2024, What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models, https://arxiv.org/abs/2404.04759 (General article shows that the big three of model compression work not just on compression big LLMs, but also on making small models even smaller.)
- Zuo, G., Zhang, C., Zheng, Z. et al., 2024, Knowledge distillation based on projector integration and classifier sharing. Complex Intell. Syst. (2024). https://doi.org/10.1007/s40747-024-01394-3 https://link.springer.com/article/10.1007/s40747-024-01394-3
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey (Broad survey with many optimizations including this topic.)
- You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying WEI, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor)Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
- Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, Liang Zhao, 4 Jan 2024, Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models https://arxiv.org/abs/2401.00625 (A general survey paper with coverage of many techniques including this one.)
- Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu, Nov 2023, Initializing Models with Larger Ones, https://arxiv.org/abs/2311.18823 Code: https://github.com/OscarXZQ/weight-selection
- Y Liu, Z Lin, F Yuan, 2021, Rosita: Refined bert compression with integrated techniques, The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), https://arxiv.org/abs/2103.11367 Code: https://github.com/llyx97/Rosita (Pruning on multiple dimensions of layer, FFN outputs, and embeddings, also combined with distillation.)
- Canwen Xu, Julian McAuley, Nov 2022, A Survey on Model Compression and Acceleration for Pretrained Language Models, https://arxiv.org/abs/2202.07105
- Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. A survey of model compression and acceleration for deep neural networks. CoRR, abs/1710.09282, 2017. https://arxiv.org/abs/1710.09282
- Sean Farhat, Deming Chen, 4 Apr 2024, On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models, https://arxiv.org/abs/2404.03263
- Rachel Gordon, Publication Date: March 21, 2024, AI generates high-quality images 30 times faster in a single step, MIT News, https://news.mit.edu/2024/ai-generates-high-quality-images-30-times-faster-single-step-0321 (MIT's new image generation framework called "distribution matching distillation" is faster than diffusion models.)
- Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He, 15 Feb 2024, Model Compression and Efficient Inference for Large Language Models: A Survey, https://arxiv.org/abs/2402.09748 (General survey of various model compression and other inference optimizations.)
- Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi, 26 Feb 2024, Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding, https://arxiv.org/abs/2402.16844 (Using a large model to train parallel decoding for a small language model.)
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 15 Mar 2024 (v5), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer (A large survey of a variety of LLM optimizations.)
- Nathan Brown, Ashton Williamson, Tahj Anderson, Logan Lawrence, 22 Nov 2023, Efficient Transformer Knowledge Distillation: A Performance Review, https://arxiv.org/abs/2311.13657
- Chang Liu, Chongyang Tao, Jianxin Liang, Jiazhan Feng, Tao Shen, Quzhe Huang, Dongyan Zhao, 2023, Length-Adaptive Distillation: Customizing Small Language Model for Dynamic Token Pruning, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4452–4463, December 6-10, 2023, https://aclanthology.org/2023.findings-emnlp.294.pdf (Explores combining static model compression via knowledge distillation with dynamic adaptive inference via token pruning. This creates a modified distillation algorithm that prepares the model for token pruning during training.)
- Shira Guskin, Moshe Wasserblat, Chang Wang, Haihao Shen, May 2023, QuaLA-MiniLM: a Quantized Length Adaptive MiniLM, https://arxiv.org/abs/2210.17114 (Intel Labs paper. Low-bit quantization, distillation, and Length-Adaptive Transformer (LAT) technique.)
- Shashank Verma and Neal Vaidya, Mastering LLM Techniques: Inference Optimization, Nov 17, 2023, NVIDIA Technical Blog, https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/
- Erik Pettersson, Sep 2023, Knowledge distillation for anomaly detection, Master's Thesis, Faculty of Science and Technology, Uppsala University, https://www.diva-portal.org/smash/get/diva2:1805667/FULLTEXT01.pdf
- Ba, J. and Caruana, R. (2014). Do deep nets really need to be deep? In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., and Weinberger, K., editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc. URL: https://proceedings.neurips.cc/paper/2014/file/ea8fcd92d59581717e06eb187f10666d-Paper.pdf
- Bucila, C., Caruana, R., and Niculescu-Mizil, A. (2006). Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’06, page 535–541, New York, NY, USA. Association for Computing Machinery. URL: https://doi.org/10.1145/1150402.1150464.
- Zeng, X. and Martinez, T. R. (2000). Using a neural network to approximate an ensemble of classifiers. In Neural Processing Letters, 2000. URL: https://doi.org/10.1023/A:1026530200837
- K Nan, S Liu, J Du, H Liu, 2019, Deep model compression for mobile platforms: A survey, Tsinghua Science and Technology (Volume 24, Issue 6, December 2019), https://ieeexplore.ieee.org/abstract/document/8727762 PDF: https://ieeexplore.ieee.org/iel7/5971803/8727756/08727762.pdf
- David Spuler, March 2024, Chapter 45. Knowledge Distillation, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- A Gudibande, E Wallace, C Snell, X Geng, H Liu 2023, The false promise of imitating proprietary llms, https://arxiv.org/abs/2305.15717
- Y Wang, W Zhong, L Li, F Mi, X Zeng, W Huang 2023, Aligning large language models with human: A survey, https://arxiv.org/abs/2307.12966
- Y Gu, L Dong, F Wei, M Huang, 2023, Knowledge Distillation of Large Language Models, https://arxiv.org/abs/2306.08543
- X Wan, R Sun, H Dai, SO Arik, T Pfister, 2023, Better zero-shot reasoning with self-adaptive prompting, https://arxiv.org/abs/2305.14106
- S Horawalavithana, S Munikoti, I Stewart, 2023, SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions, https://arxiv.org/abs/2307.01139
- X Daull, P Bellot, E Bruno, V Martin, 2023, Complex QA and language models hybrid architectures, Survey, https://arxiv.org/abs/2302.09051
- K Wu, J Zhang, H Peng, M Liu, B Xiao, J Fu, 2022, TinyViT: Fast pretraining distillation for small vision transformers, https://arxiv.org/pdf/2207.10666.pdf
- S Norouzi, R Hosseinzadeh, F Perez, 2023, DiMS: Distilling Multiple Steps of Iterative Non-Autoregressive Transformers for Machine Translation, https://aclanthology.org/2023.findings-acl.542/
- Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-yan Liu, 6 Jul 2023 (v2), A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond, https://arxiv.org/pdf/2204.09269.pdf
- Asit Mishra and Debbie Marr. 2017. Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy. arXiv preprint arXiv:1711.05852, Nov 2017. https://arxiv.org/abs/1711.05852
- Arnav Chavan, Raghav Magazine, Shubham Kushwaha, Mérouane Debbah, Deepak Gupta, 24 Apr 2024 (v2), Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward, https://arxiv.org/abs/2402.01799 Code: https://github.com/nyunAI/Faster-LLM-Survey
- 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, https://arxiv.org/abs/2404.14294
- Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, Volkan Cevher, Yida Wang, George Karypis, 12 Jul 2024, Inference Optimization of Foundation Models on AI Accelerators, KDD’24, August 25–29, 2024, Barcelona, Spain, https://arxiv.org/abs/2407.09111
- 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, https://arxiv.org/abs/2312.00678
- Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
- Leo Donisch, Sigurd Schacht, Carsten Lanquillon, 6 Aug 2024, Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations, https://arxiv.org/abs/2408.03130
- Louie Peters, Aug 27, 2024, Two Paths to Small LMs? Synthetic Data (Phi 3.5) vs Pruning & Distillation (Llama-3.1-Minitron), https://newsletter.towardsai.net/p/114-two-paths-to-small-lms-synthetic
- Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao, 18 Sep 2024, Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview, https://arxiv.org/abs/2409.11650 (Extensive survey of quantization from the basics to SOTA approaches, with also some coverage of knowledge distillation and KV cache compression.)
- Meta, August 14, 2024, How NVIDIA is using structured weight pruning and knowledge distillation to build new Llama models, Meta AI Blog, https://ai.meta.com/blog/nvidia-llama/
- Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov, 19 Jul 2024, Compact Language Models via Pruning and Knowledge Distillation, https://arxiv.org/abs/2407.14679 https://github.com/NVlabs/Minitron (Combination of distillation and structured pruning on the depth and width dimensions.)
- Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov, 26 Aug 2024 (v2), LLM Pruning and Distillation in Practice: The Minitron Approach, https://arxiv.org/abs/2408.11796
- Sharath Sreenivas, Vinh Nguyen, Saurav Muralidharan, Marcin Chochowski and Raviraj Joshi, How to Prune and Distill Llama-3.1 8B to an NVIDIA Llama-3.1-Minitron 4B Model, Aug 14, 2024, https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/
- Siru Ouyang, Shuohang Wang, Minhao Jiang, Ming Zhong, Donghan Yu, Jiawei Han, Yelong Shen, 14 Oct 2024, Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation, https://arxiv.org/abs/2410.10141 https://github.com/ozyyshr/TempSpec
- Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
- Yuzhe Yang, Yipeng Du, Ahmad Farhan, Claudio Angione, Yue Zhao, Harry Yang, Fielding Johnston, James Buban, Patrick Colangelo, 28 Oct 2024, Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments, https://arxiv.org/abs/2410.21340 (Choosing between multiple acceleration techniques).
- Shen, J., Liu, Y., Jiang, Y., Chen, Y., Han, W. (2025). Model-Agnostic Knowledge Distillation Between Heterogeneous Models. In: Wong, D.F., Wei, Z., Yang, M. (eds) Natural Language Processing and Chinese Computing. NLPCC 2024. Lecture Notes in Computer Science(), vol 15359. Springer, Singapore. https://doi.org/10.1007/978-981-97-9431-7_19 https://link.springer.com/chapter/10.1007/978-981-97-9431-7_19
- Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
- M Xu, D Cai, W Yin, S Wang, X Jin, X Liu, 2024, Resource-efficient Algorithms and Systems of Foundation Models: A Survey, ACM Computing Surveys, https://dl.acm.org/doi/pdf/10.1145/3706418
- Thanaphon Suwannaphong, Ferdian Jovan, Ian Craddock, Ryan McConville, 12 Dec 2024, Optimising TinyML with Quantization and Distillation of Transformer and Mamba Models for Indoor Localisation on Edge Devices, https://arxiv.org/abs/2412.09289
- Dongting Hu, Jierun Chen, Xijie Huang, Huseyin Coskun, Arpit Sahni, Aarush Gupta, Anujraaj Goyal, Dishani Lahiri, Rajesh Singh, Yerlan Idelbayev, Junli Cao, Yanyu Li, Kwang-Ting Cheng, S.-H. Gary Chan, Mingming Gong, Sergey Tulyakov, Anil Kag, Yanwu Xu, Jian Ren, 12 Dec 2024, SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training, https://arxiv.org/abs/2412.09619
- Guiyu Li, Shang Zheng, Haitao Zou, Hualong Yu, Shang Gao, 2024, Model compression through distillation with cross-layer integrated guidance at word level, Neurocomputing, 129162, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2024.129162. https://www.sciencedirect.com/science/article/abs/pii/S0925231224019337
- Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, Wenhui Zhu, Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen, 18 Dec 2024, Deploying Foundation Model Powered Agent Services: A Survey, https://arxiv.org/abs/2412.13437 (A survey of not just deployment, but many inference optimization techniques.)
- Giordano d'Aloisio, Luca Traini, Federica Sarro, Antinisca Di Marco, 18 Dec 2024, On the Compression of Language Models for Code: An Empirical Study on CodeBERT, https://arxiv.org/abs/2412.13737 (Quantization, pruning and distillation on code generation models.)
- Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber, 2 Nov 2023, Implicit Chain of Thought Reasoning via Knowledge Distillation, https://arxiv.org/abs/2311.01460 (Knowledge distillation applied to optimizing the interim computations in Chain-of-Thought.)
- Kalle Kujanpää, Harri Valpola, Alexander Ilin, 19 Dec 2024, Knowledge Injection via Prompt Distillation, https://arxiv.org/abs/2412.14964
- Yifan Yu, Yu Gan, Lily Tasi, Nikhil Sarda, Jiaming Shen, Yanqi Zhou, Arvind Krishnamurthy, Fan Lai, Henry M. Levy, David Culler, 22 Jan 2025, EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation, https://arxiv.org/abs/2501.12689 (Using a semantic cache to prepend previously computed answers from similar queries as prompt examples, improving the final results from a smaller LLM.)
- DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z.F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, et al. (100+ additional authors not shown), 22 Jan 2025, DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, https://arxiv.org/abs/2501.12948 (The DeepSeek R1 large reasoning model.)
- Sebastian Raschka, PhD, Feb 05, 2025, Understanding Reasoning LLMs: Methods and Strategies for Building and Refining Reasoning Models https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
- Cristian Leo, Feb 2025, How to Distill a LLM: Step-by-step. The Google Paper that started efficient LLM distillation. Let’s explore how it works, the math behind this technique, and how to implement it with code. https://medium.com/data-science-collective/how-to-distill-a-llm-step-by-step-58f06fcf4bfa
- Jasmine Wu, Deirdre Bosa, Feb 21 2025, How DeepSeek used distillation to train its artificial intelligence model, and what it means for companies such as OpenAI, https://www.cnbc.com/2025/02/21/deepseek-trained-ai-model-using-distillation-now-a-disruptive-force.html
- Kayhan Behdin, Yun Dai, Ata Fatahibaarzi, Aman Gupta, Qingquan Song, Shao Tang, Hejian Sang, Gregory Dexter, Sirou Zhu, Siyu Zhu, Tejas Dharamsi, Maziar Sanjabi, Vignesh Kothapalli, Hamed Firooz, Zhoutong Fu, Yihan Cao, Pin-Lun Hsu, Fedor Borisyuk, Zhipeng Wang, Rahul Mazumder, Natesh Pillai, Luke Simon, 20 Feb 2025, Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications, https://arxiv.org/abs/2502.14305 (Deploying small models for efficiency via distillation and quantization/pruning.)
- Shaibal Saha, Lanyu Xu, 26 Feb 2025, Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies, https://arxiv.org/abs/2503.02891
- Yuichiro Hoshino, Hideyuki Tachibana, Muneyoshi Inahara, Hiroto Takegawa, 28 May 2025, RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding, https://arxiv.org/abs/2505.22135
- Michael List, July 2025, Distillation for Efficient History Compression in Reinforcement Learning, Master’s Thesis, https://epub.jku.at/obvulihs/content/titleinfo/12295461/full.pdf
- Youping Gu, Xiaolong Li, Yuhao Hu, Bohan Zhuang, 14 Aug 2025, Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation, https://arxiv.org/abs/2508.10774
- Juntao Lin, Xianghao Zhan, 22 Jul 2025, Sensor Drift Compensation in Electronic-Nose-Based Gas Recognition Using Knowledge Distillation, https://arxiv.org/abs/2507.17071
- Youneng Bao, Yiping Liu, Zhuo Chen, Yongsheng Liang, Mu Li, Kede Ma, 23 Jul 2025, Dataset Distillation as Data Compression: A Rate-Utility Perspective, https://arxiv.org/abs/2507.17221
- Songming Zhang, Xue Zhang, Tong Zhang, Bojie Hu, Yufeng Chen, Jinan Xu, 23 Jul 2025, AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation, https://arxiv.org/abs/2503.02832
- Cairong Zhao, Yufeng Jin, Zifan Song, Haonan Chen, Duoqian Miao, Guosheng Hu, 22 Jul 2025, Cross-Modal Distillation For Widely Differing Modalities, https://arxiv.org/abs/2507.16296
- Norah Alballa, Ahmed M. Abdelmoniem, Marco Canini, 22 Jul 2025, Practical Insights into Knowledge Distillation for Pre-Trained Models, https://arxiv.org/abs/2402.14922
- Yuki Kadokawa, Hirotaka Tahara, Takamitsu Matsubara, 22 Jul 2025, Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulations for Time-Efficient Fine-Resolution Policy Learning, https://arxiv.org/abs/2412.07477
- Lakshmana Sri Harsha Nemani, P.K. Srijith, Tomasz Ku\'smierczyk, 24 Jul 2025, Efficient Uncertainty in LLMs through Evidential Knowledge Distillation, https://arxiv.org/abs/2507.18366
- Magnus Bengtsson, Kenneth Östberg, 24 Jul 2025, C2G-KD: PCA-Constrained Generator for Data-Free Knowledge Distillation, https://arxiv.org/abs/2507.18533
- Zhen Han, Mattias Teye, Derek Yadgaroff, Judith Bütepage, 24 Jul 2025, Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation, https://arxiv.org/abs/2507.18352
- Anshumann, Mohd Abbas Zaidi, Akhil Kedia, Jinwoo Ahn, Taehwak Kwon, Kangwook Lee, Haejun Lee, Joohyung Lee, 24 Jul 2025, Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs, https://arxiv.org/abs/2503.16870
- Jiequan Cui, Beier Zhu, Qingshan Xu, Xiaogang Xu, Pengguang Chen, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong, 19 Jul 2025, Generative Distribution Distillation, https://arxiv.org/abs/2507.14503
- Zihao Hu (1), Jia Yan (2), Ying-Jun Angela Zhang (1), Jun Zhang (3), Khaled B. Letaief (3) ((1) The Chinese University of Hong Kong, (2) The Hong Kong University of Science and Technology (Guangzhou), (3) The Hong Kong University of Science and Technology), 21 Jul 2025, Optimal Transceiver Design in Over-the-Air Federated Distillation, https://arxiv.org/abs/2507.15256
- Songming Zhang and Yuxiao Luo and Ziyu Lyu and Xiaofeng Chen, 19 Jul 2025, ShiftKD: Benchmarking Knowledge Distillation under Distribution Shift, https://arxiv.org/abs/2312.16242
- Yuqi Li, Chuanguang Yang, Hansheng Zeng, Zeyu Dong, Zhulin An, Yongjun Xu, Yingli Tian, Hao Wu, 20 Jul 2025, Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting, https://arxiv.org/abs/2507.02939
- Ashley Lewis, Michael White, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang, 21 Jul 2025, Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents, https://arxiv.org/abs/2502.19545
- Changli Wang, Fang Yin, Jiafeng Liu, Rui Wu, 20 Jul 2025, HMID-Net: An Exploration of Masked Image Modeling and Knowledge Distillation in Hyperbolic Space, https://arxiv.org/abs/2507.09487
- Hayeon Kim, Ji Ha Jang and Se Young Chun, 21 Jul 2025, Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling, https://arxiv.org/abs/2507.11061
- Aswin RRV, Jacob Dineen, Divij Handa, Md Nayem Uddin, Mihir Parmar, Chitta Baral, Ben Zhou, 11 Aug 2025, ThinkTuning: Instilling Cognitive Reflections without Distillation, https://arxiv.org/abs/2508.07616
- Ziqi Zhang, Ali Shahin Shamsabadi, Hanxiao Lu, Yifeng Cai, Hamed Haddadi, 9 Aug 2025, Membership and Memorization in LLM Knowledge Distillation, https://arxiv.org/abs/2508.07054
- Christos Tsirigotis, Vaibhav Adlakha, Joao Monteiro, Aaron Courville, Perouz Taslakian, 9 Aug 2025, BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation, https://arxiv.org/abs/2508.06781
- Zihao Hu (1), Jia Yan (2), Ying-Jun Angela Zhang (1) ((1) The Chinese University of Hong Kong, (2) The Hong Kong University of Science and Technology (Guangzhou)), 6 Aug 2025, Communication-Learning Co-Design for Differentially Private Over-the-Air Federated Distillation, https://arxiv.org/abs/2508.06557
- Simon Baur, Alexandra Benova, Emilio Dolgener Cant\'u, Jackie Ma, 6 Aug 2025, On the effectiveness of multimodal privileged knowledge distillation in two vision transformer based diagnostic applications, https://arxiv.org/abs/2508.06558
- Deepon Halder, Thanmay Jayakumar, Raj Dabre, 9 Aug 2025, CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation, https://arxiv.org/abs/2506.19952
- Robert Frenken, Sidra Ghayour Bhatti, Hanqin Zhang, Qadeer Ahmed, 25 Jul 2025, KD-GAT: Combining Knowledge Distillation and Graph Attention Transformer for a Controller Area Network Intrusion Detection System, https://arxiv.org/abs/2507.19686
- Yang Zhao, Shusheng Li, Xueshang Feng, 28 Jul 2025, Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit, https://arxiv.org/abs/2507.20623
- Joey Chan, Zhen Chen, Ershun Pan, 27 Jul 2025, Foundation Models Knowledge Distillation For Battery Capacity Degradation Forecast, https://arxiv.org/abs/2505.08151
- Ren Zhuang, Ben Wang, Shuifa Sun, 25 Jul 2025, AGORA: Incentivizing Group Emergence Capability in LLMs via Group Distillation, https://arxiv.org/abs/2507.21166
- Siddhartha Pradhan, Shikshya Shiwakoti, Neha Bathuri, 29 Jul 2025, Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation, https://arxiv.org/abs/2507.21992
- Sheng-Feng Yu, Jia-Jiun Yao, and Wei-Chen Chiu, 29 Jul 2025, Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation, https://arxiv.org/abs/2507.21455
- Giovanni Dispoto, Paolo Bonetti, Marcello Restelli, 29 Jul 2025, "So, Tell Me About Your Policy...": Distillation of interpretable policies from Deep Reinforcement Learning agents, https://arxiv.org/abs/2507.07848
- Guopeng Li, Qiang Wang, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia, 29 Jul 2025, Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation, https://arxiv.org/abs/2410.12342
- Kunyang Li, Jeffrey A Chan Santiago, Sarinda Dhanesh Samarasinghe, Gaowen Liu, Mubarak Shah, 30 Jul 2025, GVD: Guiding Video Diffusion Model for Scalable Video Distillation, https://arxiv.org/abs/2507.22360
- Wenchao Gu and Zongyi Lyu and Yanlin Wang and Hongyu Zhang and Cuiyun Gao and Michael R. Lyu, 1 Aug 2025, SPENCER: Self-Adaptive Model Distillation for Efficient Code Retrieval, https://arxiv.org/abs/2508.00546
- Zhen Wu, Ritam Dutt, Luke M. Breitfeller, Armineh Nourbakhsh, Siddharth Parekh, Carolyn Rosé, 2 Aug 2025, $R^2$-CoD: Understanding Text-Graph Complementarity in Relational Reasoning via Knowledge Co-Distillation, https://arxiv.org/abs/2508.01475
- Kotaro Yoshida, Yuji Naraki, Takafumi Horie, Ryotaro Shimizu, Hiroki Naganuma, 2 Aug 2025, DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging, https://arxiv.org/abs/2508.01148
- Hung-Chieh Fang, Hsuan-Tien Lin, Irwin King, Yifei Zhang, 2 Aug 2025, Soft Separation and Distillation: Toward Global Uniformity in Federated Unsupervised Learning, https://arxiv.org/abs/2508.01251
- Kuiyuan Ding, Caili Guo, Yang Yang, Zhongtian Du, and Walid Saad, 4 Aug 2025, Large-Scale Model Enabled Semantic Communication Based on Robust Knowledge Distillation, https://arxiv.org/abs/2508.02148
- Runkai Zheng, Vishnu Asutosh Dasu, Yinong Oliver Wang, Haohan Wang, Fernando De la Torre, 3 Aug 2025, Improving Noise Efficiency in Privacy-preserving Dataset Distillation, https://arxiv.org/abs/2508.01749
- Tobias Jülg, Wolfram Burgard, Florian Walter, 4 Aug 2025, Refined Policy Distillation: From VLA Generalists to RL Experts, https://arxiv.org/abs/2503.05833
- Chaoyang Gao, Xiang Chen, Jiyu Wang, Jibin Wang, Guang Yang, 30 Jul 2025, Resource-Efficient Automatic Software Vulnerability Assessment via Knowledge Distillation and Particle Swarm Optimization, https://arxiv.org/abs/2508.02840
- Jiahui Bai, Hai Dong, A. K. Qin, 5 Aug 2025, On the Fast Adaptation of Delayed Clients in Decentralized Federated Learning: A Centroid-Aligned Distillation Approach, https://arxiv.org/abs/2508.02993
- Hyung Gun Chi, Florian Pesce, Wonil Chang, Oggi Rudovic, Arturo Argueta, Stefan Braun, Vineet Garg, Ahmed Hussen Abdelaziz, 4 Aug 2025, Adaptive Knowledge Distillation for Device-Directed Speech Detection, https://arxiv.org/abs/2508.02801
- Jisoo Kim, Wooseok Seo, Junwan Kim, Seungho Park, Sooyeon Park, Youngjae Yu, 5 Aug 2025, V.I.P. : Iterative Online Preference Distillation for Efficient Video Diffusion Models, https://arxiv.org/abs/2508.03254
- Seyedhamidreza Mousavi, Seyedali Mousavi and Masoud Daneshtalab, 5 Aug 2025, ProARD: progressive adversarial robustness distillation: provide wide range of robust students, https://arxiv.org/abs/2506.07666
- Ryota Ikeda, 5 Aug 2025, Do GNN-based QEC Decoders Require Classical Knowledge? Evaluating the Efficacy of Knowledge Distillation from MWPM, https://arxiv.org/abs/2508.03782
- Sriram Mandalika, Lalitha V, 6 Aug 2025, CoMAD: A Multiple-Teacher Self-Supervised Distillation Framework, https://arxiv.org/abs/2508.04816
- Haonan Shangguan, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang, and Ge Yu, 7 Aug 2025, Resource-Limited Joint Multimodal Sentiment Reasoning and Classification via Chain-of-Thought Enhancement and Distillation, https://arxiv.org/abs/2508.05234
- Martin Weyssow, Chengran Yang, Junkai Chen, Ratnadira Widyasari, Ting Zhang, Huihui Huang, Huu Hung Nguyen, Yan Naing Tun, Tan Bui, Yikun Li, Ang Han Wei, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, David Lo, 7 Aug 2025, R2Vul: Learning to Reason about Software Vulnerabilities with Reinforcement Learning and Structured Reasoning Distillation, https://arxiv.org/abs/2504.04699
- Lingyuan Liu, Mengxiang Zhang, 8 Aug 2025, Less is More: Selective Reflection for Compatible and Efficient Knowledge Distillation in Large Language Models, https://arxiv.org/abs/2508.06135
- Lucas Caccia, Alan Ansell, Edoardo Ponti, Ivan Vuli\'c, Alessandro Sordoni, 8 Aug 2025, Training Plug-n-Play Knowledge Modules with Deep Context Distillation, https://arxiv.org/abs/2503.08727
- Shibin Su, Guoqiang Liang, De Cheng, Shizhou Zhang, Lingyan Ran, Yanning Zhang, 12 Aug 2025, Multi-level Collaborative Distillation Meets Global Workspace Model: A Unified Framework for OCIL, https://arxiv.org/abs/2508.08677
- Jinlin Xiang, Minho Choi, Yubo Zhang, Zhihao Zhou, Arka Majumdar, Eli Shlizerman, 11 Aug 2025, Neural Tangent Knowledge Distillation for Optical Convolutional Networks, https://arxiv.org/abs/2508.08421
- Kristian Miok, Blaž Škrlj, Daniela Zaharie, and Marko Robnik Šikonja, 30 Jul 2025, TT-XAI: Trustworthy Clinical Text Explanations via Keyword Distillation and LLM Reasoning, https://arxiv.org/abs/2508.08273
- Xiaojun Wu, Xiaoguang Jiang, Huiyang Li, Jucai Zhai, Dengfeng Liu, Qiaobo Hao, Huang Liu, Zhiguo Yang, Ji Xie, Ninglun Gu, Jin Yang, Kailai Zhang, Yelun Bao, Jun Wang, 13 Aug 2025, Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning, https://arxiv.org/abs/2508.09883
- Aman Anand, Elyas Rashno, Amir Eskandari, Farhana Zulkernine, 12 Aug 2025, Depth-Guided Self-Supervised Human Keypoint Detection via Cross-Modal Distillation, https://arxiv.org/abs/2410.14700
- Van Duc Cuong, Ta Dinh Tam, Tran Duc Chinh and Nguyen Thi Hanh, 10 Aug 2025, FLUID: Flow-Latent Unified Integration via Token Distillation for Expert Specialization in Multimodal Learning, https://arxiv.org/abs/2508.07264
- Durgesh Mishra, Rishabh Uikey, 15 Aug 2025, Unified Knowledge Distillation Framework: Fine-Grained Alignment and Geometric Relationship Preservation for Deep Face Recognition, https://arxiv.org/abs/2508.11376
- Siyamalan Manivannan, 15 Aug 2025, Semi-Supervised Learning with Online Knowledge Distillation for Skin Lesion Classification, https://arxiv.org/abs/2508.11511
- Thinh Dao, Khoa D Doan, Kok-Seng Wong, 14 Aug 2025, Clean-Label Physical Backdoor Attacks with Data Distillation, https://arxiv.org/abs/2407.19203
- Manning Zhu, Songtao Guo, Pengzhan Zhou, Yansong Ning, Chang Han, Dewen Qiao, 18 Aug 2025, FedSODA: Federated Fine-tuning of LLMs via Similarity Group Pruning and Orchestrated Distillation Alignment, https://arxiv.org/abs/2508.12727
- Taeheon Kim, San Kim, Minhyuk Seo, Dongjae Jeon, Wonje Jeong, Jonghyun Choi, 18 Aug 2025, Multi-Level Knowledge Distillation and Dynamic Self-Supervised Learning for Continual Learning, https://arxiv.org/abs/2508.12692
- Xinhe Li, Jiajun Liu, Peng Wang, 18 Aug 2025, Can Large Models Teach Student Models to Solve Mathematical Problems Like Human Beings? A Reasoning Distillation Method via Multi-LoRA Interaction, https://arxiv.org/abs/2508.13037
- Anshul Ahluwalia, Payman Behnam, Rohit Das, Alind Khare, Biswadeep Chakraborty, Pan Li, Alexey Tumanov, 16 Aug 2025, STRIDE: Structure and Embedding Distillation with Attention for Graph Neural Networks, https://arxiv.org/abs/2310.15938
- Nikita Gushchin, David Li, Daniil Selikhanovych, Evgeny Burnaev, Dmitry Baranchuk, Alexander Korotin, 18 Aug 2025, Inverse Bridge Matching Distillation, https://arxiv.org/abs/2502.01362
- Jihyun Lim, Junhyuk Jo, Tuo Zhang, Sunwoo Lee, 17 Aug 2025, Enabling Weak Client Participation via On-device Knowledge Distillation in Heterogenous Federated Learning, https://arxiv.org/abs/2503.11151
- Simardeep Singh, 15 Aug 2025, From Teacher to Student: Tracking Memorization Through Model Distillation, https://arxiv.org/abs/2506.16170
- Weizhen Li, Jianbo Lin, Zhuosong Jiang, Jingyi Cao, Xinpeng Liu, Jiayu Zhang, Zhenqiang Huang, Qianben Chen, Weichen Sun, Qiexiang Wang, Hongxuan Lu, Tianrui Qin, Chenghao Zhu, Yi Yao, Shuying Fan, Xiaowan Li, Tiannan Wang, Pai Liu, King Zhu, He Zhu, Dingfeng Shi, Piaohong Wang, Yeyi Guan, Xiangru Tang, Minghao Liu, Yuchen Eleanor Jiang, Jian Yang, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou, 6 Aug 2025, Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL, https://arxiv.org/abs/2508.13167
- Yunxiang Yang, Ningning Xu, Jidong J. Yang, 19 Aug 2025, Structured Prompting and Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference, https://arxiv.org/abs/2508.13439
- Wenhao Li, Xiu Su, Jingyi Wu, Feng Yang, Yang Liu, Yi Chen, Shan You, Chang Xu, 19 Aug 2025, Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation, https://arxiv.org/abs/2507.04680
- Ye Su, Hezhe Qiao, Wei Huang, Lin Chen, 12 Aug 2025, Toward Generalist Semi-supervised Regression via Decoupled Representation Distillation, https://arxiv.org/abs/2508.14082
- Ahmed Mujtaba, Gleb Radchenko, Radu Prodan, Marc Masana, 20 Aug 2025, Federated Distillation on Edge Devices: Efficient Client-Side Filtering for Non-IID Data, https://arxiv.org/abs/2508.14769
- Suleyman Olcay Polat, Poli A. Nemkova, Mark V. Albert, 20 Aug 2025, Synthetic Adaptive Guided Embeddings (SAGE): A Novel Knowledge Distillation Method, https://arxiv.org/abs/2508.14783
- Yichen Li, Xiuying Wang, Wenchao Xu, Haozhao Wang, Yining Qi, Jiahua Dong, Ruixuan Li, 20 Aug 2025, Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning, https://arxiv.org/abs/2507.10348
- Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Guzelant, Aysegul Dundar, 20 Aug 2025, Identity Preserving 3D Head Stylization with Multiview Score Distillation, https://arxiv.org/abs/2411.13536
- Yifan Zhang, Junhui Hou, 20 Aug 2025, Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?, https://arxiv.org/abs/2412.08973
- Jiacheng Xie, Ziyang Zhang, Biplab Poudel, Congyu Guo, Yang Yu, Guanghui An, Xiaoting Tang, Lening Zhao, Chunhui Xu, Dong Xu, 19 Aug 2025, TOM: An Open-Source Tongue Segmentation Method with Multi-Teacher Distillation and Task-Specific Data Augmentation, https://arxiv.org/abs/2508.14932
- Aqib Nazir Mir, Danish Raza Rizvi, 21 Aug 2025, Explainable Knowledge Distillation for Efficient Medical Image Classification, https://arxiv.org/abs/2508.15251
- Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao, 21 Aug 2025, An Empirical Study of Knowledge Distillation for Code Understanding Tasks, https://arxiv.org/abs/2508.15423
- Nicholas Mehlman, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Kelvin Niu, Alexander H. Miller, Shagun Sodhani, 29 Jul 2025, Scaling and Distilling Transformer Models for sEMG, https://arxiv.org/abs/2507.22094
- Chen Zhang, Qiuchi Li, Dawei Song, Zheyu Ye, Yan Gao, Yan Hu, 30 Jul 2025, Towards the Law of Capacity Gap in Distilling Language Models, https://arxiv.org/abs/2311.07052
- Maciej Mozolewski, Szymon Bobek, Grzegorz J. Nalepa, 3 Aug 2025, From SHAP to Rules: Distilling Expert Knowledge from Post-hoc Model Explanations in Time Series Classification, https://arxiv.org/abs/2508.01687
- Sisuo Lyu, Siru Zhong, Weilin Ruan, Qingxiang Liu, Qingsong Wen, Hui Xiong, Yuxuan Liang, 3 Aug 2025, OccamVTS: Distilling Vision Models to 1% Parameters for Time Series Forecasting, https://arxiv.org/abs/2508.01727
- Diana-Nicoleta Grigore, Neelu Madan, Andreas Mogelmose, Thomas B. Moeslund, Radu Tudor Ionescu, 5 Aug 2025, SlotMatch: Distilling Temporally Consistent Object-Centric Representations for Unsupervised Video Segmentation, https://arxiv.org/abs/2508.03411
- Connor Wilhelm, Dan Ventura, 12 Aug 2025, Distilling Reinforcement Learning into Single-Batch Datasets, https://arxiv.org/abs/2508.09283
- Abdul Matin, Tanjim Bin Faruk, Shrideep Pallickara, Sangmi Lee Pallickara, 13 Aug 2025, HyperKD: Distilling Cross-Spectral Knowledge in Masked Autoencoders via Inverse Domain Shift with Spatial-Aware Masking and Specialized Loss, https://arxiv.org/abs/2508.09453
- Duygu Altinok, 18 Aug 2025, Whispering Context: Distilling Syntax and Semantics for Long Speech Transcripts, https://arxiv.org/abs/2508.13376
- Haydn Thomas Jones, Natalie Maus, Josh Magnus Ludan, Maggie Ziyu Huan, Jiaming Liang, Marcelo Der Torossian Torres, Jiatao Liang, Zachary Ives, Yoseph Barash, Cesar de la Fuente-Nunez, Jacob R. Gardner, Mark Yatskar, 14 Aug 2025, A Dataset for Distilling Knowledge Priors from Literature for Therapeutic Design, https://arxiv.org/abs/2508.10899
- Max Rehman Linder, 14 Aug 2025, KL-based self-distillation for large language models, https://arxiv.org/abs/2508.15807
- Stephen Ekaputra Limantoro, 22 Aug 2025, Parameter-Free Logit Distillation via Sorting Mechanism, https://arxiv.org/abs/2508.16544
- Rosni Vasu, Chandrayee Basu, Bhavana Dalvi Mishra, Cristina Sarasua, Peter Clark, Abraham Bernstein, 21 Aug 2025, HypER: Literature-grounded Hypothesis Generation and Distillation with Provenance, https://arxiv.org/abs/2506.12937
- Khoi Do, Binh-Son Hua, 21 Aug 2025, Text-to-3D Generation using Jensen-Shannon Score Distillation, https://arxiv.org/abs/2503.10660
- Gousia Habib, Tausifa Jan Saleem, Ishfaq Ahmad Malik, Brejesh Lall, 21 Aug 2025, LIB-KD: Teaching Inductive Bias for Efficient Vision Transformer Distillation and Compression, https://arxiv.org/abs/2310.00369
- Lei Jiang, Wen Ge, Niels Cariou-Kotlarek, Mingxuan Yi, Po-Yu Chen, Lingyi Yang, Francois Buet-Golfouse, Gaurav Mittal, Hao Ni, 23 Aug 2025, Sig-DEG for Distillation: Making Diffusion Models Faster and Lighter, https://arxiv.org/abs/2508.16939
- Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo, 25 Aug 2025, FasterVoiceGrad: Faster One-step Diffusion-Based Voice Conversion with Adversarial Diffusion Conversion Distillation, https://arxiv.org/abs/2508.17868
- Hariprasath Govindarajan, Maciej K. Wozniak, Marvin Klingner, Camille Maurice, B Ravi Kiran, Senthil Yogamani, 25 Aug 2025, CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation, https://arxiv.org/abs/2503.09878
- Xuhui Fan, Zhangkai Wu and Hongyu Wu, 15 Aug 2025, A Survey on Pre-Trained Diffusion Model Distillations, https://arxiv.org/abs/2502.08364
- Justin Kur, Kaiqi Zhao, 4 Sep 2025, Data-Augmented Quantization-Aware Knowledge Distillation, https://arxiv.org/abs/2509.03850
- Hong Ye Tan, Emma Slade, 3 Sep 2025, Dataset Distillation as Pushforward Optimal Quantization, https://arxiv.org/abs/2501.07681
- Xiaoxiong Zhang, Zhiwei Zeng, Xin Zhou, Zhiqi Shen, 5 Sep 2025, Low-Dimensional Federated Knowledge Graph Embedding via Knowledge Distillation, https://arxiv.org/abs/2408.05748
- Md Anwar Hossen, Fatema Siddika, Wensheng Zhang, Anuj Sharma, and Ali Jannesari, 26 Aug 2025, FedProtoKD: Dual Knowledge Distillation with Adaptive Class-wise Prototype Margin for Heterogeneous Federated Learning, https://arxiv.org/abs/2508.19009
- Viktor N. Zhuravlev and Artur R. Khairullin and Ernest A. Dyagin and Alena N. Sitkina and Nikita I. Kulin, 26 Aug 2025, Automatic Prompt Optimization with Prompt Distillation, https://arxiv.org/abs/2508.18992
- Yihe Tang, Wenlong Huang, Yingke Wang, Chengshu Li, Roy Yuan, Ruohan Zhang, Jiajun Wu, Li Fei-Fei, 25 Aug 2025, UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation, https://arxiv.org/abs/2506.09284
- Wangyang Ying, Jinghan Zhang, Haoyue Bai, Nanxu Gong, Xinyuan Wang, Kunpeng Liu, Chandan K. Reddy, Yanjie Fu, 27 Aug 2025, Data-Efficient Symbolic Regression via Foundation Model Distillation, https://arxiv.org/abs/2508.19487
- Felix Nützel, Mischa Dombrowski, Bernhard Kainz, 27 Aug 2025, Ontology-Based Concept Distillation for Radiology Report Retrieval and Labeling, https://arxiv.org/abs/2508.19915
- Yanfei Li, Teng Yin, Wenyi Shang, Jingyu Liu, Xi Wang, Kaiyang Zhao, 27 Aug 2025, PGAD: Prototype-Guided Adaptive Distillation for Multi-Modal Learning in AD Diagnosis, https://arxiv.org/abs/2503.04836
- Suyoung Kim, Seonguk Park, Junhoo Lee, Nojun Kwak, 27 Aug 2025, The Role of Teacher Calibration in Knowledge Distillation, https://arxiv.org/abs/2508.20224
- Leyang Wang, Mingtian Zhang, Zijing Ou and David Barber, 28 Aug 2025, VarDiU: A Variational Diffusive Upper Bound for One-Step Diffusion Distillation, https://arxiv.org/abs/2508.20646
- Ayaka Tsutsumi, Guang Li, Ren Togo, Takahiro Ogawa, Satoshi Kondo, Miki Haseyama, 28 Aug 2025, Dual-Model Weight Selection and Self-Knowledge Distillation for Medical Image Classification, https://arxiv.org/abs/2508.20461
- Jiahao Xiao, Jiangming Liu, 28 Aug 2025, Adaptive Federated Distillation for Multi-Domain Non-IID Textual Data, https://arxiv.org/abs/2508.20557
- Kang-Hyun Lee, Faez Ahmed, 27 Aug 2025, MicroLad: 2D-to-3D Microstructure Reconstruction and Generation via Latent Diffusion and Score Distillation, https://arxiv.org/abs/2508.20138
- Holger Severin Bovbjerg (1), Jan Østergaard (1), Jesper Jensen (1, 2), Shinji Watanabe (3), Zheng-Hua Tan ((1) Aalborg University (2) Eriksholm Research Centre, (3) Carnegie Mellon University), 28 Aug 2025, Learning Robust Spatial Representations from Binaural Audio through Feature Distillation, https://arxiv.org/abs/2508.20914
- Yifei Yuan, Jiatong Li, Weijia Zhang, Mohammad Aliannejadi, Evangelos Kanoulas, Renjun Hu, 29 Aug 2025, Summarize-Exemplify-Reflect: Data-driven Insight Distillation Empowers LLMs for Few-shot Tabular Classification, https://arxiv.org/abs/2508.21561
- Can Cui, Zilong Fu, Penghe Huang, Yuanyuan Li, Wu Deng, Dongyan Li, 30 Aug 2025, An Efficient GNNs-to-KANs Distillation via Self-Attention Dynamic Sampling with Potential for Consumer Electronics Edge Deployment, https://arxiv.org/abs/2509.00560
- Armin Hadžić, Milan Papez, Tomáš Pevný, 1 Sep 2025, Distillation of a tractable model from the VQ-VAE, https://arxiv.org/abs/2509.01400
- Xinlu Zhang, Na Yan, Yang Su, Yansha Deng, Toktam Mahmoodi, 1 Sep 2025, Communication-Aware Knowledge Distillation for Federated LLM Fine-Tuning over Wireless Networks, https://arxiv.org/abs/2509.01750
- Long Jiang, Yang Yang, Ting Fong May Chui, Morgan Thornwell, Hoshin Vijai Gupta, 2 Sep 2025, Knowledge distillation as a pathway toward next-generation intelligent ecohydrological modeling systems, https://arxiv.org/abs/2509.01972
- Hainan Wang, Mehdi Hosseinzadeh, Reza Rawassizadeh, 31 Aug 2025, TinyMusician: On-Device Music Generation with Knowledge Distillation and Mixed Precision Quantization, https://arxiv.org/abs/2509.00914
- Kanchon Gharami, Hansaka Aluvihare, Shafika Showkat Moni, Berker Peköz, 31 Aug 2025, Clone What You Can't Steal: Black-Box LLM Replication via Logit Leakage and Distillation, https://arxiv.org/abs/2509.00973
- Xingyu Su, Xiner Li, Masatoshi Uehara, Sunwoo Kim, Yulai Zhao, Gabriele Scalia, Ehsan Hajiramezanali, Tommaso Biancalani, Degui Zhi, Shuiwang Ji, 30 Aug 2025, Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design, https://arxiv.org/abs/2507.00445
- Shiva Raj Pokhrel, Deol Satish, Jonathan Kua and Anwar Walid, 2 Sep 2025, Distilling Large Language Models for Network Active Queue Management, https://arxiv.org/abs/2501.16734
- Mingfeng Lin, 3 Sep 2025, Deep Self-knowledge Distillation: A hierarchical supervised learning for coronary artery segmentation, https://arxiv.org/abs/2509.03173
- Zongheng Guo, Tao Chen, Manuela Ferrario, 8 Sep 2025, QualityFM: a Multimodal Physiological Signal Foundation Model with Self-Distillation for Signal Quality Challenges in Critically Ill Patients, https://arxiv.org/abs/2509.06516
- Saghar Ganji, Mohammad Naisipour, Alireza Hassani, Arash Adib, 7 Sep 2025, Distillation of CNN Ensemble Results for Enhanced Long-Term Prediction of the ENSO Phenomenon, https://arxiv.org/abs/2509.06227
- Eduardo Fernandes Montesuma, 8 Sep 2025, KD²M: A unifying framework for feature knowledge distillation, https://arxiv.org/abs/2504.01757
- Devansh, Sep 2025, The Chocolate Milk Cult’s Guide to Inference Scaling for AI Models: How to Reduce the costs of Running LLMs, https://machine-learning-made-simple.medium.com/the-chocolate-milk-cults-guide-to-inference-scaling-for-ai-models-50aa2290eb50 (Deep analysis of applying many progressive optimizations to real-life LLM inference.)
- Xinyu Zhang and Changzhi Zhou and Linmei Hu and Luhao Zhang and Xiancai Chen and Haomin Fu and Yang Yang and Mengdi Zhang, 9 Sep 2025, SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs, https://arxiv.org/abs/2509.07858
- Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, 9 Sep 2025, Involution and BSConv Multi-Depth Distillation Network for Lightweight Image Super-Resolution, https://arxiv.org/abs/2503.14779
- Xin Jin, Bohan Li, BAAO Xie, Wenyao Zhang, Jinming Liu, Ziqiang Li, Tao Yang, Wenjun Zeng, 9 Sep 2025, Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback, https://arxiv.org/abs/2402.02346
- Yang Chen, Shuai Fu, Yu Zhang, 12 Sep 2025, MoPD: Mixture-of-Prompts Distillation for Vision-Language Models, https://arxiv.org/abs/2412.19087
- Haipeng Liu, Ting Long, Jing Fu, 11 Sep 2025, Constructing a Question-Answering Simulator through the Distillation of LLMs, https://arxiv.org/abs/2509.09226
- Seung Gyu Jeong, Seong Eun Kim, 11 Sep 2025, Adaptive Knowledge Distillation using a Device-Aware Teacher for Low-Complexity Acoustic Scene Classification, https://arxiv.org/abs/2509.09262
- Zhanming Shen, Zeyu Qin, Zenan Huang, Hao Chen, Jiaqi Hu, Yihong Zhuang, Guoshan Lu, Gang Chen, Junbo Zhao, 11 Sep 2025, Merge-of-Thought Distillation, https://arxiv.org/abs/2509.08814
- Anas Anwarul Haq Khan, Utkarsh Verma, Ganesh Ramakrishnan, 11 Sep 2025, Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization, https://arxiv.org/abs/2504.21831
- Davide Ettori, Nastaran Darabi, Sureshkumar Senthilkumar, and Amit Ranjan Trivedi, 19 Sep 2025, RMT-KD: Random Matrix Theoretic Causal Knowledge Distillation, https://arxiv.org/abs/2509.15724
- Qiaolin Wang, Xilin Jiang, Linyang He, Junkai Wu, Nima Mesgarani, 19 Sep 2025, SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models, https://arxiv.org/abs/2509.15661
- Luca Della Libera, Cem Subakan, Mirco Ravanelli, 19 Sep 2025, FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation, https://arxiv.org/abs/2509.16195
- Zhangkai Wu, Xuhui Fan, Hongyu Wu, Longbing Cao, 18 Sep 2025, SCoT: Straight Consistent Trajectory for Pre-Trained Diffusion Model Distillations, https://arxiv.org/abs/2502.16972
- Xiang Xue, Yatu Ji, Qing-dao-er-ji Ren, Bao Shi, Min Lu, Nier Wu, Xufei Zhuang, Haiteng Xu, Gan-qi-qi-ge Cha, 16 Sep 2025, iCD: A Implicit Clustering Distillation Mathod for Structural Information Mining, https://arxiv.org/abs/2509.12553
- Florian Zager and Hamza A. A. Gardi, 15 Sep 2025, GhostNetV3-Small: A Tailored Architecture and Comparative Study of Distillation Strategies for Tiny Images, https://arxiv.org/abs/2509.12380
- Liming Lu, Shuchao Pang, Xu Zheng, Xiang Gu, Anan Du, Yunhuai Liu, Yongbin Zhou, 16 Sep 2025, CIARD: Cyclic Iterative Adversarial Robustness Distillation, https://arxiv.org/abs/2509.12633
- Robin Vujanic, Thomas Rueckstiess, 16 Sep 2025, LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations, https://arxiv.org/abs/2509.12539
- Lin Luo, Xin Wang, Bojia Zi, Shihao Zhao, Xingjun Ma, Yu-Gang Jiang, 16 Sep 2025, Adversarial Prompt Distillation for Vision-Language Models, https://arxiv.org/abs/2411.15244
- Jing Zou, Shungeng Zhang, Meikang Qiu, Chong Li, 15 Sep 2025, DARD: Dice Adversarial Robustness Distillation against Adversarial Attacks, https://arxiv.org/abs/2509.11525
- Chenghan Li and Garnet Kin-Lic Chan, 13 Sep 2025, Predictive Free Energy Simulations Through Hierarchical Distillation of Quantum Hamiltonians, https://arxiv.org/abs/2509.10967
- Tong Wang, K. Sudhir and Dat Hong, 13 Sep 2025, Can Advanced LLMs Coach Smaller LLMs? Knowledge Distillation for Goal-Oriented Dialogs, https://arxiv.org/abs/2408.07238
- Amirhossein Abaskohi, Spandana Gella, Giuseppe Carenini, Issam H. Laradji, 13 Sep 2025, FM2DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering, https://arxiv.org/abs/2412.07030
- Yuanjie Lyu, Chengyu Wang, Jun Huang, Tong Xu, 12 Sep 2025, From Correction to Mastery: Reinforced Distillation of Large Language Model Agents, https://arxiv.org/abs/2509.14257
- Yihan Cao, Yanbin Kang, Zhengming Xing, Ruijie Jiang, 18 Sep 2025, Delta Knowledge Distillation for Large Language Models, https://arxiv.org/abs/2509.14526
- Enzhi Wang, Qicheng Li, Zhiyuan Tang, Yuhang Jia, 18 Sep 2025, Cross-Modal Knowledge Distillation for Speech Large Language Models, https://arxiv.org/abs/2509.14930
- Botao Zhu, Jeslyn Wang, Dusit Niyato, Xianbin Wang, 9 Sep 2025, Trust Semantics Distillation for Collaborator Selection via Memory-Augmented Agentic AI, https://arxiv.org/abs/2509.08151
- Matthew Nolan and Lina Yao and Robert Davidson, 10 Sep 2025, Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition, https://arxiv.org/abs/2509.08225
- Chuanqi Cheng, Jian Guan, Wei Wu, Rui Yan, 10 Sep 2025, Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation, https://arxiv.org/abs/2504.02438
- Qin Shi, Amber Yijia Zheng, Qifan Song, Raymond A. Yeh, 2 Oct 2025, Knowledge Distillation Detection for Open-weights Models, https://arxiv.org/abs/2510.02302
- Anna Kuzina, Maciej Pioro, Paul N. Whatmough, Babak Ehteshami Bejnordi, 2 Oct 2025, KaVa: Latent Reasoning via Compressed KV-Cache Distillation, https://arxiv.org/abs/2510.02312
- Weijia Dou, Xu Zhang, Yi Bin, Jian Liu, Bo Peng, Guoqing Wang, Yang Yang, and Heng Tao Shen, 2 Oct 2025, GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation, https://arxiv.org/abs/2510.02186
- Dhaathri Vijay and Anandaswarup Vadapalli, 2 Oct 2025, The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact, https://arxiv.org/abs/2509.23990
- Israel Mason-Williams, Gabryel Mason-Williams and Helen Yannakoudakis, 14 Oct 2025, Rethinking Knowledge Distillation: A Data Dependent Regulariser With a Negative Asymmetric Payoff, https://arxiv.org/abs/2510.12615
- Jiwan Kim, Kibum Kim, Sangwoo Seo, Chanyoung Park, 14 Oct 2025, CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs, https://arxiv.org/abs/2510.12184
- Eric He, Akash Gupta, Adian Liusie, Vatsal Raina, Piotr Molenda, Shirom Chabra, Vyas Raina, 13 Oct 2025, Embedding the Teacher: Distilling vLLM Preferences for Scalable Image Retrieval, https://arxiv.org/abs/2510.12014
- Chengyu Li and Debo Cheng and Guixian Zhang and Yi Li and Shichao Zhang, 14 Oct 2025, Toward Fair Graph Neural Networks Via Dual-Teacher Knowledge Distillation, https://arxiv.org/abs/2412.00382
- Huitao Yang, Guanting Chen, 30 Sep 2025, In-Context Curiosity: Distilling Exploration for Decision-Pretrained Transformers on Bandit Tasks, https://arxiv.org/abs/2510.00347
- Xiangyu Wen, Junhua Huang, Zeju Li, Min Li, Jianyuan Zhong, Zhijian Xu, Mingxuan Yuan, Yongxiang Huang, Qiang Xu, 1 Oct 2025, Reasoning Scaffolding: Distilling the Flow of Thought from LLMs, https://arxiv.org/abs/2509.23619
- Jiayi Huang, Sangwoo Park, Nicola Paoletti, and Osvaldo Simeone, 1 Oct 2025, Distilling Calibration via Conformalized Credal Inference, https://arxiv.org/abs/2501.06066
- Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Dongseop Kim, Sung Ju Hwang, 1 Oct 2025, PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models, https://arxiv.org/abs/2506.00910
- Feiyang Fu, Tongxian Guo, Zhaoqiang Liu, 24 Sep 2025, Learnable Sampler Distillation for Discrete Diffusion Models, https://arxiv.org/abs/2509.19962
- Zhengpeng Xie, Jiahang Cao, Changwei Wang, Fan Yang, Marco Hutter, Qiang Zhang, Jianxiong Zhang, Renjing Xu, 24 Sep 2025, Representation Convergence: Mutual Distillation is Secretly a Form of Regularization, https://arxiv.org/abs/2501.02481
- Yuqi Jin, Zhenhao Shuai, Zihan Hu, Weiteng Zhang, Weihao Xie, Jianwei Shuai, Xian Shen, Zhen Feng, 24 Sep 2025, CANDLE: A Cross-Modal Agentic Knowledge Distillation Framework for Interpretable Sarcopenia Diagnosis, https://arxiv.org/abs/2507.21179
- Haiduo Huang, Jiangcheng Song, Yadong Zhang, Pengju Ren, 28 Oct 2025, SpecKD: Speculative Decoding for Effective Knowledge Distillation of LLMs, https://arxiv.org/abs/2510.24021
- Quan Li, Wenchao Yu, Suhang Wang, Minhua Lin, Lingwei Chen, Wei Cheng, Haifeng Chen, 23 Oct 2025, xTime: Extreme Event Prediction with Hierarchical Knowledge Distillation and Expert Fusion, https://arxiv.org/abs/2510.20651
- Saif Ur Rehman Khan, Muhammad Nabeel Asim, Sebastian Vollmer, Andreas Dengel, 23 Oct 2025, Dynamic Weight Adjustment for Knowledge Distillation: Leveraging Vision Transformer for High-Accuracy Lung Cancer Detection and Real-Time Deployment, https://arxiv.org/abs/2510.20438
- Nimrod Berman, Ilan Naiman, Moshe Eliasof, Hedi Zisling, Omri Azencot, 23 Oct 2025, One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling, https://arxiv.org/abs/2505.13358
- Penghao Wang, Yuhao Zhou, Mengxuan Wu, Panpan Zhang, Zhangyang Wang, Kai Wang, 23 Oct 2025, Data Efficient Any Transformer-to-Mamba Distillation via Attention Bridge, https://arxiv.org/abs/2510.19266
- Amir Jalilifard, Anderson de Rezende Rocha, Marcos Medeiros Raimundo, 20 Oct 2025, Reasoning Distillation and Structural Alignment for Improved Code Generation, https://arxiv.org/abs/2510.17598
- Donghyeok Shin, Yeongmin Kim, Suhyeon Jo, Byeonghu Na, Il-Chul Moon, 13 Oct 2025, AMiD: Knowledge Distillation for LLMs with α-mixture Assistant Distribution, https://arxiv.org/abs/2510.15982
- Ziming Dai, Tuo Zhang, Fei Gao, Xingyi Cai, Xiaofei Wang, Cheng Zhang, Wenyu Wang, Chengjie Zang, 14 Oct 2025, Stratos: An End-to-End Distillation Pipeline for Customized LLMs under Distributed Cloud Environments, https://arxiv.org/abs/2510.15992
- SeongKu Kang, Jianxun Lian, Dongha Lee, Wonbin Kweon, Sanghwan Jang, Jaehyun Lee, Jindong Wang, Xing Xie, Hwanjo Yu, 17 Oct 2025, BPL: Bias-adaptive Preference Distillation Learning for Recommender System, https://arxiv.org/abs/2510.16076
- Pingzhi Li, Morris Yu-Chao Huang, Zhen Tan, Qingquan Song, Jie Peng, Kai Zou, Yu Cheng, Kaidi Xu, Tianlong Chen, 19 Oct 2025, Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures, https://arxiv.org/abs/2510.16968
- Duo Su, Huyu Wu, Huanran Chen, Yiming Shi, Yuzhu Wang, Xi Ye, Jun Zhu, 20 Oct 2025, Diffusion Models as Dataset Distillation Priors, https://arxiv.org/abs/2510.17421
- Asmita Mohanty, Gezheng Kang, Lei Gao, Murali Annavaram, 19 Oct 2025, DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge, https://arxiv.org/abs/2510.16716
- Guiquan Sun, Xikun Zhang, Jingchao Ni, Dongjin Song, 19 Oct 2025, HERO: Heterogeneous Continual Graph Learning via Meta-Knowledge Distillation, https://arxiv.org/abs/2505.17458
- Pingzhi Li, Zhen Tan, Mohan Zhang, Huaizhi Qu, Huan Liu, Tianlong Chen, 19 Oct 2025, DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation, https://arxiv.org/abs/2505.19504
- Sharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy, Raveendran Paramesran, Raphael C.-W. Phan, 19 Oct 2025, Seeing in the Dark: A Teacher-Student Framework for Dark Video Action Recognition via Knowledge Distillation and Contrastive Learning, https://arxiv.org/abs/2502.03724
- Kai Yu and Binbin Cai and Song Lin, 20 Sep 2025, Knowledge Distillation for Variational Quantum Convolutional Neural Networks on Heterogeneous Data, https://arxiv.org/abs/2509.16699
- David Anton, Henning Wessels, Ulrich Römer, Alexander Henkes, Jorge-Humberto Urrea-Quintero, 25 Oct 2025, Uncertainty quantification in model discovery by distilling interpretable material constitutive models from Gaussian process posteriors, https://arxiv.org/abs/2510.22345
- Gurpreet Singh, Keshav Sood, P. Rajalakshmi, and Yong Xiang, 27 Oct 2025, Sentinel: Dynamic Knowledge Distillation for Personalized Federated Intrusion Detection in Heterogeneous IoT Networks, https://arxiv.org/abs/2510.23019
- Seonghoon Yu, Dongjun Nam, Dina Katabi, Jeany Son, 26 Oct 2025, Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity, https://arxiv.org/abs/2510.22480
- Rupasree Dey, Abdul Matin, Everett Lewark, Tanjim Bin Faruk, Andrei Bachinin, Sam Leuthold, M. Francesca Cotrufo, Shrideep Pallickara, Sangmi Lee Pallickara, 27 Oct 2025, DeepSalt: Bridging Laboratory and Satellite Spectra through Domain Adaptation and Knowledge Distillation for Large-Scale Soil Salinity Estimation, https://arxiv.org/abs/2510.23124
- Rongrong Xie, Yizhou Xu, Guido Sanguinetti, 15 Oct 2025, Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning, https://arxiv.org/abs/2510.13182
- Zexin Wang, Lin Shi, Haoyu Wu, Junru Luo, Xiangzeng Kong, Jun Qi, 15 Oct 2025, DistilCLIP-EEG: Enhancing Epileptic Seizure Detection Through Multi-modal Learning and Knowledge Distillation, https://arxiv.org/abs/2510.13497
- Hengxiang Zhang, Hyeong Kyu Choi, Sharon Li, Hongxin Wei, 15 Oct 2025, Detecting Distillation Data from Reasoning Models, https://arxiv.org/abs/2510.04850
- Shehtab Zaman, Chengyan Liu, Kenneth Chiu, 25 Sep 2025, Score-based Idempotent Distillation of Diffusion Models, https://arxiv.org/abs/2509.21470
- Hua Yuan, Ning Xu, Xin Geng, Yong Rui, 26 Sep 2025, Enriching Knowledge Distillation with Intra-Class Contrastive Learning, https://arxiv.org/abs/2509.22053
- Zahid Iqbal, 26 Sep 2025, Adaptive Dual-Mode Distillation with Incentive Schemes for Scalable, Heterogeneous Federated Learning on Non-IID Data, https://arxiv.org/abs/2509.22507
- Jillian Xu, Dylan Zhou, Vinay Shukla, Yang Yang, Junrui Ruan, Shuhuai Lin, Wenfei Zou, Yinxiao Liu, Karthik Lakshmanan, 25 Sep 2025, Dual-Head Reasoning Distillation: Improving Classifier Accuracy with Train-Time-Only Reasoning, https://arxiv.org/abs/2509.21487
- Yicheng Jiang, Jin Yuan, Hua Yuan, Yao Zhang, Yong Rui, 26 Sep 2025, REFINE-CONTROL: A Semi-supervised Distillation Method For Conditional Image Generation, https://arxiv.org/abs/2509.22139
- Nikita Kornilov, David Li, Tikhon Mavrin, Aleksei Leonov, Nikita Gushchin, Evgeny Burnaev, Iaroslav Koshelev, Alexander Korotin, 26 Sep 2025, Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs), https://arxiv.org/abs/2509.22459
- Zhengxiao Li, Liming Lu, Xu Zheng, Siyuan Liang, Zhenghan Chen, Yongbin Zhou, Shuchao Pang, 26 Sep 2025, FERD: Fairness-Enhanced Data-Free Robustness Distillation, https://arxiv.org/abs/2509.20793
- Jingzhi Hu, Geoffrey Ye Li, 26 Sep 2025, Distillation-Enabled Knowledge Alignment Protocol for Semantic Communication in AI Agent Networks, https://arxiv.org/abs/2505.17030
- Khoa Trinh, Gaurav Menghani, Erik Vee, 7 Oct 2025, GUIDE: Guided Initialization and Distillation of Embeddings, https://arxiv.org/abs/2510.06502
- Jiqun Pan, Zhenke Duan, Jiani Tu, Anzhi Cheng, Yanqing Wang, 3 Oct 2025, Knowledge Graph-Guided Multi-Agent Distillation for Reliable Industrial Question Answering with Datasets, https://arxiv.org/abs/2510.06240
- Zhiyuan Wei, Xiaoxuan Yang, Jing Sun, Zijian Zhang, 8 Oct 2025, Distilling Lightweight Language Models for C/C++ Vulnerabilities, https://arxiv.org/abs/2510.06645
- Didrik Bergstr\"om and Deniz G\"und\"uz and Onur G\"unl\"u, 8 Oct 2025, Multi-hop Deep Joint Source-Channel Coding with Deep Hash Distillation for Semantically Aligned Image Retrieval, https://arxiv.org/abs/2510.06868
- Arjun Krishnakumar, Rhea Sanjay Sukthanker, Hannan Javed Mahadik, Gabriela Kadlecová, Vladyslav Moroshan, Timur Carstensen, Frank Hutter, Aaron Klein, 8 Oct 2025, Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation, https://arxiv.org/abs/2510.07227
- Xinyi Gao, Jingxi Zhang, Lijian Chen, Tong Chen, Lizhen Cui, Hongzhi Yin, 8 Oct 2025, Relational Database Distillation: From Structured Tables to Condensed Graph Data, https://arxiv.org/abs/2510.06980
- Brian B. Moser, Federico Raue, Sebastian Palacio, Stanislav Frolov and Andreas Dengel, 8 Oct 2025, Unlocking Dataset Distillation with Diffusion Models, https://arxiv.org/abs/2403.03881
- Flavio Giorgi, Matteo Silvestri, Cesare Campagnano, Fabrizio Silvestri, Gabriele Tolomei, 3 Oct 2025, Enhancing XAI Narratives through Multi-Narrative Refinement and Knowledge Distillation, https://arxiv.org/abs/2510.03134
- Zilai Li, 30 Sep 2025, Hyperparameters are all you need: Using five-step inference for an original diffusion model to generate images comparable to the latest distillation model, https://arxiv.org/abs/2510.02390
- Justus Arweiler and Indra Jungjohann and Aparna Muraleedharan and Heike Leitte and Jakob Burger and Kerstin Münnemann and Fabian Jirasek and Hans Hasse, 20 Oct 2025, Batch Distillation Data for Developing Machine Learning Anomaly Detection Methods, https://arxiv.org/abs/2510.18075
- Giovanni De Muri, Mark Vero, Robin Staab, Martin Vechev, 21 Oct 2025, Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation, https://arxiv.org/abs/2510.18541
- Gilles Audemard, Sylvie Coste-Marquis, Pierre Marquis, Mehdi Sabiri, Nicolas Szczepanski, 21 Oct 2025, A Rectification-Based Approach for Distilling Boosted Trees into Decision Trees, https://arxiv.org/abs/2510.18615
- Philippe Formont, Maxime Darrin, Banafsheh Karimian, Jackie CK Cheung, Eric Granger, Ismail Ben Ayed, Mohammadhadi Shateri, Pablo Piantanida, 21 Oct 2025, Learning Task-Agnostic Representations through Multi-Teacher Distillation, https://arxiv.org/abs/2510.18680
- Yongmin Lee, Hye Won Chung, 21 Oct 2025, CovMatch: Cross-Covariance Guided Multimodal Dataset Distillation with Trainable Text Encoder, https://arxiv.org/abs/2510.18583
- Yijun Quan, Zushu Li, Giovanni Montana, 21 Oct 2025, Efficient Verified Machine Unlearning For Distillation, https://arxiv.org/abs/2503.22539
- Xing Wei and Chunchun Chen and Rui Fan and Xiaofeng Cao and Sourav Medya and Wei Ye, 21 Oct 2025, Preference-driven Knowledge Distillation for Few-shot Node Classification, https://arxiv.org/abs/2510.10116
- Zhangchi Zhu, Wei Zhang, 25 Sep 2025, Rejuvenating Cross-Entropy Loss in Knowledge Distillation for Recommender Systems, https://arxiv.org/abs/2509.20989
- Hmrishav Bandyopadhyay, Rahim Entezari, Jim Scott, Reshinth Adithyan, Yi-Zhe Song, Varun Jampani, 25 Sep 2025, SD3.5-Flash: Distribution-Guided Distillation of Generative Flows, https://arxiv.org/abs/2509.21318
- Matthieu Zimmer, Xiaotong Ji, Tu Nguyen, Haitham Bou Ammar, 26 Sep 2025, Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective, https://arxiv.org/abs/2509.22921
- Raviteja Anantha, Soheil Hor, Teodor Nicola Antoniu, Layne C. Price, 27 Sep 2025, NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning, https://arxiv.org/abs/2509.23252
- Sungmin Cha, Kyunghyun Cho, 28 Sep 2025, Why Alignment Must Precede Distillation: A Minimal Working Explanation, https://arxiv.org/abs/2509.23667
- Aasheesh Singh, Vishal Vaddina, Dagnachew Birru, 29 Sep 2025, ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation, https://arxiv.org/abs/2509.25100
- Jianzhi Yan, Le Liu, Youcheng Pan, Shiwei Chen, Yang Xiang, Buzhou Tang, 28 Sep 2025, Towards Efficient CoT Distillation: Self-Guided Rationale Selector for Better Performance with Fewer Rationales, https://arxiv.org/abs/2509.23574
- Yukun Chen, Boheng Li, Yu Yuan, Leyi Qi, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren, 28 Sep 2025, Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack, https://arxiv.org/abs/2509.23871
- Mingyuan Zhou, Yi Gu, Huangjie Zheng, Liangchen Song, Guande He, Yizhe Zhang, Wenze Hu, Yinfei Yang, 29 Sep 2025, Score Distillation of Flow Matching Models, https://arxiv.org/abs/2509.25127
- Md Mubtasim Ahasan, Md Fahim, Tasnim Mohiuddin, A K M Mahbubur Rahman, Aman Chadha, Tariq Iqbal, M Ashraful Amin, Md Mofijul Islam, Amin Ahsan Ali, 29 Sep 2025, DM-Codec: Distilling Multimodal Representations for Speech Tokenization, https://arxiv.org/abs/2410.15017
- Berkcan Kapusuzoglu, Supriyo Chakraborty, Chia-Hsuan Lee, Sambit Sahu, 26 Sep 2025, Critique-Guided Distillation for Efficient and Robust Language Model Reasoning, https://arxiv.org/abs/2505.11628
- Giulia Lanzillotta, Felix Sarnthein, Gil Kur, Thomas Hofmann, Bobby He, 17 Oct 2025, Revisiting Knowledge Distillation: The Hidden Role of Dataset Size, https://arxiv.org/abs/2510.15516
- Wenyuan Li, Guang Li, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama, 17 Oct 2025, Hyperbolic Dataset Distillation, https://arxiv.org/abs/2505.24623
- Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Seong-Whan Lee, 17 Oct 2025, DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech, https://arxiv.org/abs/2505.19687
- Chenhao Ye, Ming Tang, 27 Sep 2025, Learning without Global Backpropagation via Synergistic Information Distillation, https://arxiv.org/abs/2510.03273
- Renrong Shao and Wei Zhang and Jun wang, 3 Oct 2025, Conditional Pseudo-Supervised Contrast for Data-Free Knowledge Distillation, https://arxiv.org/abs/2510.03375
- Wei-Lung Mao, Chun-Chi Wang, Po-Heng Chou, Kai-Chun Liu, Yu Tsao, 4 Oct 2025, MECKD: Deep Learning-Based Fall Detection in Multilayer Mobile Edge Computing With Knowledge Distillation, https://arxiv.org/abs/2510.03601
- Hoang Anh Just, Myeongseob Ko, Ruoxi Jia, 5 Oct 2025, Distilling Reasoning into Student LLMs: Local Naturalness for Selecting Teacher Data, https://arxiv.org/abs/2510.03988
- Seong Jin Ahn and Myoung-Ho Kim, 5 Oct 2025, Diffusion-Assisted Distillation for Self-Supervised Graph Representation Learning with MLPs, https://arxiv.org/abs/2510.04241
- Sara Kangaslahti, Nihal V. Nayak, Jonathan Geuter, Marco Fumero, Francesco Locatello, David Alvarez-Melis, 6 Oct 2025, Boomerang Distillation Enables Zero-Shot Model Size Interpolation, https://arxiv.org/abs/2510.05064
- Xiaoyu Yang, Jie Lu, En Yu, 5 Oct 2025, Learning from All: Concept Alignment for Autonomous Distillation from Multiple Drifting MLLMs, https://arxiv.org/abs/2510.04142
- Mattia Scardecchia, 4 Oct 2025, Unsupervised Transformer Pre-Training for Images: Self-Distillation, Mean Teachers, and Random Crops, https://arxiv.org/abs/2510.03606
- Kuang Yuan, Yang Gao, Xilin Li, Xinhao Mei, Syavosh Zadissa, Tarun Pruthi, Saeed Bagheri Sereshki, 4 Oct 2025, Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation, https://arxiv.org/abs/2510.03728
- Muquan Li, Hang Gou, Dongyang Zhang, Shuang Liang, Xiurui Xie, Deqiang Ouyang, and Ke Qin, 6 Oct 2025, Beyond Random: Automatic Inner-loop Optimization in Dataset Distillation, https://arxiv.org/abs/2510.04838
- Martial Guidez, Stefan Duffner, Yannick Alpou, Oscar Röth, Christophe Garcia, 6 Oct 2025, ERDE: Entropy-Regularized Distillation for Early-exit, https://arxiv.org/abs/2510.04856
- Nicholas M. Boffi, Michael S. Albergo, and Eric Vanden-Eijnden, 5 Oct 2025, How to build a consistency model: Learning flow maps via self-distillation, https://arxiv.org/abs/2505.18825
- Harsh Chaudhari, Jamie Hayes, Matthew Jagielski, Ilia Shumailov, Milad Nasr, Alina Oprea, 5 Oct 2025, Cascading Adversarial Bias from Injection to Distillation in Language Models, https://arxiv.org/abs/2505.24842
- Margarita Belova, Jiaxin Xiao, Shikhar Tuli, Niraj K. Jha, 10 Oct 2025, GraphMERT: Efficient and Scalable Distillation of Reliable Knowledge Graphs from Unstructured Data, https://arxiv.org/abs/2510.09580
- Jionghao Lou, Jian Zhang, Zhongmei Li, Lanlan Chen and Enbo Feng, 10 Oct 2025, FedL2T: Personalized Federated Learning with Two-Teacher Distillation for Seizure Prediction, https://arxiv.org/abs/2510.08984
- Muhammad Ali Shafique, Kanwal Mehreen, Muhammad Arham, Maaz Amjad, Sabur Butt, Hamza Farooq, 10 Oct 2025, Alif: Advancing Urdu Large Language Models via Multilingual Synthetic Data Distillation, https://arxiv.org/abs/2510.09051
- Enshu Liu, Qian Chen, Xuefei Ning, Shengen Yan, Guohao Dai, Zinan Lin, Yu Wang, 23 Oct 2025, Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation, https://arxiv.org/abs/2510.21003
- Faisal Hamman, Pasan Dissanayake, Yanjun Fu, Sanghamitra Dutta, 24 Oct 2025, Few-Shot Knowledge Distillation of LLMs With Counterfactual Explanations, https://arxiv.org/abs/2510.21631
- Han Yang, Guangjun Qin, 24 Oct 2025, A Dynamic Knowledge Distillation Method Based on the Gompertz Curve, https://arxiv.org/abs/2510.21649
- Bruce W. Lee, Addie Foote, Alex Infanger, Leni Shor, Harish Kamath, Jacob Goldman-Wetzler, Bryce Woodworth, Alex Cloud, Alexander Matt Turner, 24 Oct 2025, Distillation Robustifies Unlearning, https://arxiv.org/abs/2506.06278
- Ejafa Bassam, Dawei Zhu, Kaigui Bian, 23 Oct 2025, PLD: A Choice-Theoretic List-Wise Knowledge Distillation, https://arxiv.org/abs/2506.12542
- Sehyun Park, Jongjin Lee, Yunseop Shin, Ilsang Ohn and Yongdai Kim, 24 Oct 2025, Knowledge Distillation of Uncertainty using Deep Latent Factor Model, https://arxiv.org/abs/2510.19290
- Jiaxing Xu, Mengcheng Lan, Xia Dong, Kai He, Wei Zhang, Qingtian Bian, Yiping Ke, 24 Oct 2025, Multi-Atlas Brain Network Classification through Consistency Distillation and Complementary Information Fusion, https://arxiv.org/abs/2410.08228
- Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin, 23 Oct 2025, Self-Refining Language Model Anonymizers via Adversarial Distillation, https://arxiv.org/abs/2506.01420
- Changchang Sun and Vickie Chen and Yan Yan, 7 Oct 2025, Semantic-Cohesive Knowledge Distillation for Deep Cross-modal Hashing, https://arxiv.org/abs/2510.09664
- Hengyuan Zhang, Shiping Yang, Xiao Liang, Chenming Shang, Yuxuan Jiang, Chaofan Tao, Jing Xiong, Hayden Kwok-Hay So, Ruobing Xie, Angel X. Chang, Ngai Wong, 13 Oct 2025, Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation, https://arxiv.org/abs/2510.10925
- Hyeseon Ahn, Shinwoo Park, Yo-Sub Han, 13 Oct 2025, DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation, https://arxiv.org/abs/2510.10987
- Runze Xia, Yupeng Ji, Yuxi Zhou, Haodong Liu, Teng Zhang, Piji Li, 13 Oct 2025, From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance, https://arxiv.org/abs/2510.11056
- Yesung Cho, Sungmin Lee, Geongyu Lee, Minkyung Lee, Jongbae Park, Dongmyung Shin, 13 Oct 2025, G2L:From Giga-Scale to Cancer-Specific Large-Scale Pathology Foundation Models via Knowledge Distillation, https://arxiv.org/abs/2510.11176
- Xurong Xie, Zhucun Xue, Jiafu Wu, Jian Li, Yabiao Wang, Xiaobin Hu, Yong Liu, Jiangning Zhang, 13 Oct 2025, LLM-Oriented Token-Adaptive Knowledge Distillation, https://arxiv.org/abs/2510.11615
- Qihang Zhou, Shenhao Fang, Shibo He, Wenchao Meng, Jiming Chen, 12 Oct 2025, FairDD: Fair Dataset Distillation, https://arxiv.org/abs/2411.19623
- Cuipeng Wang, Haipeng Wang, 13 Oct 2025, Contrastive Representation Distillation via Multi-Scale Feature Decoupling, https://arxiv.org/abs/2502.05835
- Jitai Hao, Qiang Huang, Hao Liu, Xinyan Xiao, Zhaochun Ren, Jun Yu, 11 Oct 2025, A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone, https://arxiv.org/abs/2505.12781
- Feng Hong, Yu Huang, Zihua Zhao, Zhihan Zhou, Jiangchao Yao, Dongsheng Li, Ya Zhang, Yanfeng Wang, 9 Oct 2025, Dual-granularity Sinkhorn Distillation for Enhanced Learning from Long-tailed Noisy Data, https://arxiv.org/abs/2510.08179
- Jingyu Peng, Maolin Wang, Hengyi Cai, Yuchen Li, Kai Zhang, Shuaiqiang Wang, Dawei Yin, Xiangyu Zhao, 9 Oct 2025, AdaSwitch: Adaptive Switching Generation for Knowledge Distillation, https://arxiv.org/abs/2510.07842
- Kyumin Lee, Minjin Jeon, Sanghwan Jang, Hwanjo Yu, 9 Oct 2025, STEPER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models, https://arxiv.org/abs/2510.07923
- Yifang Yin, Shengkai Chen, Yiyao Li, Lu Wang, Ruibing Jin, Wei Cui, Shili Xiang, 9 Oct 2025, SimCast: Enhancing Precipitation Nowcasting with Short-to-Long Term Knowledge Distillation, https://arxiv.org/abs/2510.07953
- Kaiwen Zheng, Yuji Wang, Qianli Ma, Huayu Chen, Jintao Zhang, Yogesh Balaji, Jianfei Chen, Ming-Yu Liu, Jun Zhu, Qinsheng Zhang, 9 Oct 2025, Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency, https://arxiv.org/abs/2510.08431
- Pardis Taghavi, Tian Liu, Renjie Li, Reza Langari, Zhengzhong Tu, 8 Oct 2025, CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation, https://arxiv.org/abs/2505.21904
- Hengran Zhang and Keping Bi and Jiafeng Guo and Jiaming Zhang and Shuaiqiang Wang and Dawei Yin and Xueqi Cheng, 9 Oct 2025, Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation, https://arxiv.org/abs/2507.19102
- Rui Liu, Zikang Wang, Peng Gao, Yu Shen, Pratap Tokekar, Ming Lin, 19 Sep 2025, MMCD: Multi-Modal Collaborative Decision-Making for Connected Autonomy with Knowledge Distillation, https://arxiv.org/abs/2509.18198
- Yang Li, Chenyu Wang, Tingrui Wang, Yongwei Wang, Haonan Li, Zhunga Liu, Quan Pan, 23 Sep 2025, Latent Danger Zone: Distilling Unified Attention for Cross-Architecture Black-box Attacks, https://arxiv.org/abs/2509.19044
- Juntong Ni, Saurabh Kataria, Shengpu Tang, Carl Yang, Xiao Hu, Wei Jin, 23 Sep 2025, PPG-Distill: Efficient Photoplethysmography Signals Analysis via Foundation Model Distillation, https://arxiv.org/abs/2509.19215
- Zeyu Liu, Souvik Kundu, Lianghao Jiang, Anni Li, Srikanth Ronanki, Sravan Bodapati, Gourav Datta, Peter A. Beerel, 22 Sep 2025, LAWCAT: Efficient Distillation from Quadratic to Linear Attention with Convolution across Tokens for Long Context Modeling, https://arxiv.org/abs/2509.18467
- Ziyuan Liu, Ruifei Zhu, Long Gao, Yuanxiu Zhou, Jingyu Ma, and Yuantao Gu, 23 Sep 2025, JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework, https://arxiv.org/abs/2502.13407
- Yifei Sun, 21 Oct 2025, Towards Universal Solvers: Using PGD Attack in Active Learning to Increase Generalizability of Neural Operators as Knowledge Distillation from Numerical PDE Solvers, https://arxiv.org/abs/2510.18989
- Yuezhou Hu, Jiaxin Guo, Xinyu Feng, Tuo Zhao, 22 Oct 2025, AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders, https://arxiv.org/abs/2510.19779
- Prajjwal Bhattarai, Mohammad Amjad, Dmytro Zhylko, Tuka Alhanai, 27 Sep 2025, Knowledge distillation through geometry-aware representational alignment, https://arxiv.org/abs/2509.25253
- Yeongmin Kim, Donghyeok Shin, Mina Kang, Byeonghu Na, Il-Chul Moon, 30 Sep 2025, Distillation of Large Language Models via Concrete Score Matching, https://arxiv.org/abs/2509.25837
- Chenyang Jiang, Zhengcen Li, Hang Zhao, Qiben Shan, Shaocong Wu, Jingyong Su, 30 Sep 2025, Beyond Pixels: Efficient Dataset Distillation via Sparse Gaussian Representation, https://arxiv.org/abs/2509.26219
- Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Sung Ju Hwang, 30 Sep 2025, Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization, https://arxiv.org/abs/2505.07675
- Jun Liu, Zhenglun Kong, Peiyan Dong, Changdi Yang, Tianqi Li, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Pu Zhao, Xue Lin, Dong Huang and Yanzhi Wang, 30 Sep 2025, Structured Agent Distillation for Large Language Model, https://arxiv.org/abs/2505.13820
- Harry Dong, Bilge Acun, Beidi Chen, Yuejie Chi, 30 Sep 2025, Scalable LLM Math Reasoning Acceleration with Low-rank Distillation, https://arxiv.org/abs/2505.07861
- Yixiao Wang, Mingxiao Huo, Zhixuan Liang, Yushi Du, Lingfeng Sun, Haotian Lin, Jinghuan Shang, Chensheng Peng, Mohit Bansal, Mingyu Ding, Masayoshi Tomizuka, 6 Oct 2025, VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing, https://arxiv.org/abs/2510.05213
- Zhoutong Fu, Yihan Cao, Yi-Lin Chen, Aman Lunia, Liming Dong, Neha Saraf, Ruijie Jiang, Yun Dai, Qingquan Song, Tan Wang, Guoyao Li, Derek Koh, Haichao Wei, Zhipeng Wang, Aman Gupta, Chengming Jiang, Jianqiang Shen, Liangjie Hong, Wenjing Zhang, 7 Oct 2025, LANTERN: Scalable Distillation of Large Language Models for Job-Person Fit and Explanation, https://arxiv.org/abs/2510.05490
- Zachary Ravichandran, Ignacio Hounie, Fernando Cladera, Alejandro Ribeiro, George J. Pappas, Vijay Kumar, 6 Oct 2025, Distilling On-device Language Models for Robot Planning with Minimal Human Intervention, https://arxiv.org/abs/2506.17486
- Xun Wu, Shaohan Huang, Wenhui Wang, Ting Song, Li Dong, Yan Xia, Furu Wei, 15 Oct 2025, BitNet Distillation, https://arxiv.org/abs/2510.13998
- Hansheng Chen, Kai Zhang, Hao Tan, Leonidas Guibas, Gordon Wetzstein, Sai Bi, 16 Oct 2025, pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation, https://arxiv.org/abs/2510.14974
Ensemble Knowledge Distillation (Multi-Model)
Rather than a single teacher-student pair of models, research suggests that it can be even more effective for the student to learn from multiple teacher models, or to use other ensemble distillation techniques (a minimal multi-teacher sketch follows the reference list below).
- Wenxian Shi, Yuxuan Song, Hao Zhou, Bohan Li, and Lei Li. Learning from deep model via exploring local targets, 2021. https://openreview.net/forum?id=5slGDu_bVc6 (Distillation with multiple teachers)
- Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, and Hassan Ghasemzadeh. Improved knowledge distillation via teacher assistant. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pp. 5191– 5198. AAAI Press, 2020. https://arxiv.org/abs/1902.03393 (multiple teachers)
- Jangho Kim, Minsung Hyun, Inseop Chung, and Nojun Kwak. Feature fusion for online mutual knowledge distillation. In 25th International Conference on Pattern Recognition, ICPR 2020, Virtual Event / Milan, Italy, January 10-15, 2021, pp. 4619–4625. IEEE, 2020. https://arxiv.org/abs/1904.09058 (Ensemble methods for distillation.)
- Inseop Chung, Seonguk Park, Jangho Kim, and Nojun Kwak. Feature-map-level online adversarial knowledge distillation. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pp. 2006–2015. PMLR, 2020. https://arxiv.org/abs/2002.01775 (Multiple teacher models.)
- Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng, and Chun Chen. Online knowledge distillation with diverse peers. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pp. 3430–3437. AAAI Press, 2020a https://arxiv.org/abs/1912.00350 (Ensemble distillation with multiple "peer" teachers.)
- Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormándi, George E. Dahl, and Geoffrey E. Hinton. Large scale distributed neural network training through online distillation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. https://arxiv.org/abs/1804.03235
- Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi, Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher, arXiv preprint arXiv:2110.08532, 2021. https://arxiv.org/abs/2110.08532
- Y. Zhang, T. Xiang, T. M. Hospedales and H. Lu, "Deep mutual learning", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 4320-4328, Jun. 2018. https://arxiv.org/abs/1706.00384
- L. Yuan, F. E. Tay, G. Li, T. Wang and J. Feng, "Revisiting knowledge distillation via label smoothing regularization", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 3903-3911, Jun. 2020. https://arxiv.org/abs/1909.11723 (Improved learning, and also looks at reverse student-to-teacher learning.)
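As a concrete illustration of the multi-teacher idea, the sketch below averages the temperature-scaled soft targets of several teacher models and trains the student against the combined distribution. This is a minimal PyTorch sketch with toy stand-in models and random data, not an implementation of any specific paper above; the model sizes, temperature, and equal weighting are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def multi_teacher_targets(teachers, x, T=2.0, weights=None):
        # Combine soft targets from several teachers by (weighted) averaging
        # their temperature-scaled output distributions.
        with torch.no_grad():
            probs = [F.softmax(t(x) / T, dim=-1) for t in teachers]
        if weights is None:
            weights = [1.0 / len(teachers)] * len(teachers)
        return sum(w * p for w, p in zip(weights, probs))

    dim, n_cls, T = 64, 10, 2.0
    teachers = [nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, n_cls))
                for _ in range(3)]                       # toy stand-in teachers
    student = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, n_cls))

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for step in range(500):
        x = torch.randn(128, dim)                        # toy unlabeled batch
        target = multi_teacher_targets(teachers, x, T)   # ensemble soft targets
        log_p = F.log_softmax(student(x) / T, dim=-1)
        loss = F.kl_div(log_p, target, reduction="batchmean") * (T * T)
        opt.zero_grad(); loss.backward(); opt.step()

Weighting the teachers unequally, or selecting a different teacher per batch, gives simple variants of the ensemble and peer-teaching schemes studied in the papers above.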
Unnatural Data Set Creation
Training one model on the output of another is not exactly distillation, but it is a widespread practice. Research papers on "unnatural instructions," where the training data is created without human curation, include the following (a small data-generation sketch follows the list):
- Or Honovich, Thomas Scialom, Omer Levy, Timo Schick, Dec 2022, Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor, https://arxiv.org/abs/2212.09689, https://github.com/orhonovich/unnatural-instructions (Using a model to automatically create a training data set, including automatically creating both instructions and responses.)
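As a rough sketch of this kind of pipeline, the code below asks a teacher model to invent new instructions and then answer them, writing the resulting pairs to a JSONL file. The teacher is passed in as a plain text-in/text-out function; the fake_teacher used here is a trivial stand-in so the sketch runs end-to-end, and a real pipeline would substitute a call to a large model. The prompts and file format are illustrative assumptions, not those of the Unnatural Instructions paper.

    import json
    from typing import Callable

    def build_unnatural_dataset(teacher_complete: Callable[[str], str],
                                seed_tasks: list[str],
                                n_examples: int,
                                path: str) -> None:
        # Ask the teacher model to invent new instructions, then ask it to
        # answer each one, writing (instruction, response) pairs to a JSONL
        # file: training data created with no human-written labels.
        with open(path, "w") as f:
            for i in range(n_examples):
                seed = seed_tasks[i % len(seed_tasks)]
                instruction = teacher_complete(
                    "Invent one new task instruction similar to: " + seed)
                response = teacher_complete(instruction)
                f.write(json.dumps({"instruction": instruction,
                                    "output": response}) + "\n")

    if __name__ == "__main__":
        # Trivial stand-in teacher so the sketch runs; swap in a call to a
        # real large model (e.g., a hosted LLM API client) here.
        fake_teacher = lambda prompt: "Example output for: " + prompt
        build_unnatural_dataset(fake_teacher,
                                ["Summarize a news article.",
                                 "Classify a movie review as positive or negative."],
                                n_examples=4, path="unnatural_data.jsonl")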
Dataset Distillation
The technique of "dataset distillation" borrows the same terminology, but is a different technique to knowledge distillation. This term refers to methods to reduce a training dataset to a derived set of training data, such as to avoid privacy or copyright concerns. The dataset is smaller and theoretically can be used to train a similarly capable model.
Papers on dataset distillation:
- T. Wang, J.-Y. Zhu, A. Torralba and A. A. Efros, "Dataset distillation", arXiv:1811.10959, 2018. https://arxiv.org/abs/1811.10959
- Yu R, Liu S, Wang X, 2023, Dataset Distillation: A Comprehensive Review. https://arxiv.org/abs/2301.07014
- Mami Nagoya, Keiichi Shiohara, Xing Chen, 2017, "A method for reducing the amounts of training samples for developing AI systems", 2017 International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC), pp.13-20, 2017. https://ieeexplore.ieee.org/document/8228448
- David Spuler, March 2024, Chapter 45. Knowledge Distillation, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
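As a minimal illustration of the bi-level optimization behind dataset distillation (in the spirit of Wang et al., 2018, but heavily simplified), the sketch below learns a tiny synthetic dataset such that a randomly initialized linear model, after a single gradient step on the synthetic data, performs well on the real data. All shapes, learning rates, and the linear model itself are toy assumptions.

    import torch
    import torch.nn.functional as F

    n_syn, n_real, dim, n_cls = 10, 256, 32, 4

    # Real data (stand-in for the full training set).
    x_real = torch.randn(n_real, dim)
    y_real = torch.randint(0, n_cls, (n_real,))

    # Learnable synthetic dataset, much smaller than the real one.
    x_syn = torch.randn(n_syn, dim, requires_grad=True)
    y_syn = torch.arange(n_syn) % n_cls          # fixed synthetic labels

    opt = torch.optim.Adam([x_syn], lr=0.01)
    inner_lr = 0.1

    for step in range(200):
        # Fresh random linear model (functional form: logits = x @ W).
        W = (torch.randn(dim, n_cls) * 0.01).requires_grad_(True)

        # Inner step: one SGD update of the model on the synthetic data.
        inner_loss = F.cross_entropy(x_syn @ W, y_syn)
        (gW,) = torch.autograd.grad(inner_loss, W, create_graph=True)
        W_updated = W - inner_lr * gW

        # Outer step: the updated model should do well on the *real* data;
        # gradients flow back through W_updated into the synthetic examples.
        outer_loss = F.cross_entropy(x_real @ W_updated, y_real)
        opt.zero_grad()
        outer_loss.backward()
        opt.step()

Real methods replace the single inner step with longer unrolls, gradient matching, or trajectory matching, as covered in the comprehensive review by Yu et al. listed above.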
Black Box Knowledge Distillation
Black box knowledge distillation is any type of KD in which the teacher model is treated as a black box: only the teacher's outputs (such as generated text or final predictions) are accessible, not its weights, logits, or hidden states. Such architectures use the teacher's outputs to create training data or targets for the student, without reading anything inside the teacher (a minimal sketch follows the reference list below).
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
- Leo Donisch, Sigurd Schacht, Carsten Lanquillon, 6 Aug 2024, Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations, https://arxiv.org/abs/2408.03130
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 1 May 2024 (v6), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer
- Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
- M. Xu, D. Cai, W. Yin, S. Wang, X. Jin, X. Liu, 2024, Resource-efficient Algorithms and Systems of Foundation Models: A Survey, ACM Computing Surveys, https://dl.acm.org/doi/pdf/10.1145/3706418
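The sketch below shows the black-box setting in its simplest sequence-level form: a stand-in teacher is queried only through text generation, and a smaller student is fine-tuned with the ordinary language-modeling loss on the teacher's outputs. The choice of "gpt2" as the teacher and "distilgpt2" as the student, the single example prompt, and the generation settings are all illustrative assumptions; any text-generation API could play the teacher's role.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    teacher = AutoModelForCausalLM.from_pretrained("gpt2")        # stand-in teacher
    student = AutoModelForCausalLM.from_pretrained("distilgpt2")  # smaller student

    def teacher_generate(prompt: str) -> str:
        # Black-box access: only the generated text is used; the teacher's
        # logits, hidden states, and weights are never read.
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            out = teacher.generate(ids, max_new_tokens=48, do_sample=False,
                                   pad_token_id=tok.eos_token_id)
        return tok.decode(out[0], skip_special_tokens=True)

    opt = torch.optim.AdamW(student.parameters(), lr=5e-5)
    prompts = ["Knowledge distillation is"]   # tiny transfer set for illustration
    student.train()
    for prompt in prompts:
        text = teacher_generate(prompt)
        enc = tok(text, return_tensors="pt", truncation=True, max_length=256)
        # Standard causal-LM loss on the teacher-written continuation
        # (prompt tokens are not masked out in this simplified sketch).
        loss = student(input_ids=enc.input_ids, labels=enc.input_ids).loss
        opt.zero_grad(); loss.backward(); opt.step()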
White Box Knowledge Distillation
White box knowledge distillation is any use of KD in which the internals of the teacher model are accessible to the distillation process. For example, such architectures may train the student to match the teacher's full output logit distribution, or to align with the teacher's intermediate hidden states, rather than learning only from the teacher's generated outputs (a minimal logit-matching sketch follows the reference list below).
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
- Leo Donisch, Sigurd Schacht, Carsten Lanquillon, 6 Aug 2024, Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations, https://arxiv.org/abs/2408.03130
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 1 May 2024 (v6), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer
- Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
- M. Xu, D. Cai, W. Yin, S. Wang, X. Jin, X. Liu, 2024, Resource-efficient Algorithms and Systems of Foundation Models: A Survey, ACM Computing Surveys, https://dl.acm.org/doi/pdf/10.1145/3706418
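In the white-box setting, the classic objective is the temperature-scaled soft-target loss of Hinton et al. (2015), which requires access to the teacher's full logit distribution. The sketch below is a minimal PyTorch version with toy stand-in models and random data; the temperature, blend weight alpha, and model sizes are illustrative assumptions, and many white-box methods additionally align intermediate hidden states.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def white_box_kd_loss(student_logits, teacher_logits, labels,
                          T: float = 2.0, alpha: float = 0.5):
        # Temperature-scaled KL divergence against the teacher's full logit
        # distribution, blended with cross-entropy on the true labels.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # Toy models standing in for a large teacher and a small student.
    dim, n_cls = 64, 10
    teacher = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, n_cls))
    student = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, n_cls))

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for step in range(200):
        x = torch.randn(128, dim)
        labels = torch.randint(0, n_cls, (128,))   # toy ground-truth labels
        with torch.no_grad():
            t_logits = teacher(x)                  # full teacher logits are visible
        loss = white_box_kd_loss(student(x), t_logits, labels)
        opt.zero_grad(); loss.backward(); opt.step()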
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory (get your copy from Amazon).
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures (get your copy from Amazon).
- Generative AI Applications: generative AI applications book (get your copy from Amazon).
- Generative AI in C++: generative AI programming book (get your copy from Amazon).
- CUDA C++ Optimization: CUDA C++ optimization book (get your copy from Amazon).
- CUDA C++ Debugging: CUDA C++ debugging book (get your copy from Amazon).
More AI Research
Read more about: