Aussie AI
Early Exit Knowledge Distillation
Last Updated 24 April, 2026
by David Spuler, Ph.D.
Research on Early Exit Knowledge Distillation
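Early exit knowledge distillation combines two optimization techniques: early-exit inference, where extra classifier heads are attached at intermediate layers so the model can stop computing as soon as an intermediate prediction is confident enough, and knowledge distillation, where those intermediate heads are trained to mimic the output distribution of the final layer or of a larger teacher model. As a rough illustration of the inference side, here is a minimal PyTorch-style sketch; the `layers` and `exit_heads` containers and the max-softmax confidence threshold are hypothetical placeholders, not taken from any of the papers below.

```python
import torch
import torch.nn.functional as F

def early_exit_forward(x, layers, exit_heads, threshold=0.9):
    """Run layers in order, exiting once an intermediate head is confident.

    layers     -- list of model blocks (hypothetical container)
    exit_heads -- one classifier head per layer, trained via distillation
    threshold  -- max-softmax confidence needed to exit (illustrative value)
    Assumes batch size 1 for simplicity.
    """
    hidden = x
    for depth, (layer, head) in enumerate(zip(layers, exit_heads)):
        hidden = layer(hidden)
        logits = head(hidden)
        probs = F.softmax(logits, dim=-1)
        # Exit early if the most likely class is confident enough,
        # skipping all remaining layers.
        if probs.max(dim=-1).values.item() >= threshold:
            return logits, depth
    return logits, len(layers) - 1  # fell through to the final layer
```

In the literature the exit criterion varies (entropy, patience counters, learned gates), and the threshold trades accuracy against average depth; the max-softmax test above is just one simple choice.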
Research papers include:
- Boyi Liu, Zimu Zhou, Yongxin Tong, 15 Jan 2026, CAFEDistill: Learning Personalized and Dynamic Models through Federated Early-Exit Network Distillation, https://arxiv.org/abs/2601.10015
- Salim Khazem, 3 Feb 2026, SAFE-KD: Risk-Controlled Early-Exit Distillation for Vision Backbones, https://arxiv.org/abs/2602.03043
- Shiwen Ni, Min Yang, Ruifeng Xu, Chengming Li, Xiping Hu, 26 Feb 2024, Layer-wise Regularized Dropout for Neural Language Models, https://arxiv.org/abs/2402.16361
- Anas Anwarul Haq Khan, Utkarsh Verma, Ganesh Ramakrishnan, 11 Sep 2025 (v2), Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization, https://arxiv.org/abs/2504.21831
- Lehao Qu, Shuyuan Li, Zimu Zhou, Boyi Liu, Yi Xu, and Yongxin Tong. 2025. DarkDistill: Difficulty-Aligned Federated Early-Exit Network Training on Heterogeneous Devices. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '25). Association for Computing Machinery, New York, NY, USA, 2374–2385. https://doi.org/10.1145/3711896.3736902 https://dl.acm.org/doi/10.1145/3711896.3736902
- Dong, Y., He, Q., Rui, P., Zheng, Z., Li, Z., Chen, F., Jin, H., & Yang, Y. (2026). EnViT: Enhancing the Performance of Early-Exit Vision Transformers via Exit-Aware Structured Dropout-Enabled Self-Distillation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(25), 20852–20860. https://doi.org/10.1609/aaai.v40i25.39225 https://ojs.aaai.org/index.php/AAAI/article/view/39225 https://ojs.aaai.org/index.php/AAAI/article/view/39225/43186
- Haseena Rahmath P, Vishal Srivastava, Kuldeep Chaurasia, Roberto G. Pacheco, and Rodrigo S. Couto. 2024. Early-Exit Deep Neural Network - A Comprehensive Survey. ACM Comput. Surv. 57, 3, Article 75 (March 2025), 37 pages. https://doi.org/10.1145/3698767 https://dl.acm.org/doi/full/10.1145/3698767 https://dl.acm.org/doi/pdf/10.1145/3698767
- Shiting Xu, DEEP-CWS: Distilling Efficient pre-trained models with Early exit and Pruning for scalable Chinese Word Segmentation, Information Sciences, Volume 719, 2025, 122470, ISSN 0020-0255, https://doi.org/10.1016/j.ins.2025.122470 https://www.sciencedirect.com/science/article/abs/pii/S0020025525006024
- Meng, L., Zhang, R., Shan, W. (2026). Robust and Efficient Early Exit for Large Language Models: Mitigating KV Cache Loss and Enhancing Exit Stability. In: Jin, L., Wang, L. (eds) Advances in Neural Networks – ISNN 2025. ISNN 2025. Lecture Notes in Computer Science, vol 15951. Springer, Singapore. https://doi.org/10.1007/978-981-95-1233-1_7 https://link.springer.com/chapter/10.1007/978-981-95-1233-1_7
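A common training recipe in the self-distillation line of work above is to treat the network's own final exit as the teacher for the earlier exits: each early head pays a hard-label cross-entropy term plus a temperature-scaled KL term against the detached final-exit distribution. The sketch below assumes the same hypothetical multi-exit model as the earlier example; the temperature and alpha weighting are illustrative defaults, not values from any cited paper.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(exit_logits, labels, temperature=2.0, alpha=0.5):
    """Distill the final exit into earlier exits (minimal sketch).

    exit_logits -- list of logits tensors, one per exit; last is deepest
    labels      -- ground-truth class labels
    alpha       -- balance between hard-label CE and distillation KL
    """
    # Teacher: the model's own final exit, detached so no gradient
    # flows back through the teacher branch.
    teacher_probs = F.softmax(exit_logits[-1].detach() / temperature, dim=-1)
    total = F.cross_entropy(exit_logits[-1], labels)  # final exit: labels only
    for logits in exit_logits[:-1]:
        ce = F.cross_entropy(logits, labels)
        kl = F.kl_div(
            F.log_softmax(logits / temperature, dim=-1),
            teacher_probs,
            reduction="batchmean",
        ) * (temperature ** 2)  # standard temperature scaling for distillation
        total = total + alpha * ce + (1.0 - alpha) * kl
    return total
```

Federated variants such as CAFEDistill and DarkDistill apply broadly similar ideas across heterogeneous devices, roughly matching shallower exits to weaker hardware, though the per-paper details differ.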
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
- C++ AVX Optimization: CPU SIMD Vectorization. Get your copy from Amazon: C++ AVX Optimization
- C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations. Get your copy from Amazon: C++ Ultra-Low Latency
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research
- « Research Home