Aussie AI

Token Dropping

  • Last Updated 18 September, 2025
  • by David Spuler, Ph.D.

What is Token Dropping?

Token dropping is an LLM optimization that reduces computation by skipping ("dropping") some tokens instead of processing every token in the sequence. It is similar to "token pruning", but the term often refers to dropping tokens during the training phase, whereas token pruning is mostly an inference optimization.
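The basic idea can be illustrated with a minimal Python sketch. It is not taken from any of the papers listed on this page; it assumes that a per-token importance score is already available, and the names `select_kept_indices`, `run_with_token_dropping`, and `keep_ratio` are hypothetical. Only the highest-scoring tokens pass through a stack of layers, while the dropped tokens skip those layers and keep their earlier hidden states.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def select_kept_indices(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the indices of the highest-scoring tokens, in sequence order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def run_with_token_dropping(
    hidden: List[Vector],
    scores: Sequence[float],
    layers: Sequence[Layer],
    keep_ratio: float,
) -> List[Vector]:
    """Apply `layers` to the kept tokens only; dropped tokens bypass them."""
    kept = select_kept_indices(scores, keep_ratio)

    # Gather the kept tokens into a shorter sequence.
    subset = [hidden[i] for i in kept]

    # Each layer now processes fewer tokens, which is where the saving comes from.
    for layer in layers:
        subset = layer(subset)

    # Scatter the updated tokens back; dropped tokens keep their input state.
    output = [list(vec) for vec in hidden]
    for position, vec in zip(kept, subset):
        output[position] = vec
    return output


def add_one(xs: List[Vector]) -> List[Vector]:
    """Toy stand-in for a Transformer layer (an assumption for illustration)."""
    return [[v + 1.0 for v in x] for x in xs]


hidden_states = [[0.0], [10.0], [20.0], [30.0]]
importance = [0.9, 0.1, 0.8, 0.2]
result = run_with_token_dropping(hidden_states, importance, [add_one, add_one], 0.5)
print(result)
```

In this toy run, tokens 0 and 2 have the highest scores, so only they pass through the two layers. How the scores are computed (for example from attention weights, a loss signal, or random selection) and which layers see the reduced sequence are design choices that differ between methods.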

Related research areas include:

Research on Token Dropping

Research papers include:

  • Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane, 15 Dec 2023, Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference, https://arxiv.org/abs/2312.10193 (Modifies its computation depending on the difficulty of each input token.)
  • M Salehi, S Mehta, A Kusupati, A Farhadi, H Hajishirzi, 2023, SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks, https://arxiv.org/pdf/2310.12126.pdf (Direct queries to subnetworks with different widths.)
  • Kazi Hasan Ibn Arif, JinYi Yoon, Dimitrios S. Nikolopoulos, Hans Vandierendonck, Deepu John, Bo Ji, 20 Aug 2024, HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments, https://arxiv.org/abs/2408.10945
  • Simone Scardapane, Alessandro Baiocchi, Alessio Devoto, Valerio Marsocci, Pasquale Minervini, Jary Pomponi, 8 Jul 2024 (v2), Conditional computation in neural networks: principles and research trends, https://arxiv.org/abs/2403.07965
  • Le Hou, Richard Yuanzhe Pang, Tianyi Zhou, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou, 24 Mar 2022, Token Dropping for Efficient BERT Pretraining, https://arxiv.org/abs/2203.13240
  • Huaao Zhang, Shigui Qiu, Xiangyu Duan, Min Zhang, 21 Oct 2020, Token Drop mechanism for Neural Machine Translation, https://arxiv.org/abs/2010.11018
  • Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, Dacheng Tao, 24 May 2023, Revisiting Token Dropping Strategy in Efficient BERT Pretraining, https://arxiv.org/abs/2305.15273
  • Zhewei Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng Li, Yuxiong He, 17 Nov 2022, Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers, https://arxiv.org/abs/2211.11586
  • Ting Liu, Liangtao Shi, Richang Hong, Yue Hu, Quanjun Yin, Linfeng Zhang, 16 Nov 2024, Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model, https://arxiv.org/abs/2411.10803
  • Fabio Montello, Ronja Güldenring, Simone Scardapane, Lazaros Nalpantidis, 13 Jan 2025, A Survey on Dynamic Neural Networks: from Computer Vision to Multi-modal Sensor Fusion, https://arxiv.org/abs/2501.07451 (Survey of adaptive inference optimizations: early exit, dynamic routing, token skimming.)
  • Tuowei Wang, Xingyu Chen, Kun Li, Ting Cao, Ju Ren, Yaoxue Zhang, 15 Jan 2025, LeMo: Enabling LEss Token Involvement for MOre Context Fine-tuning, https://arxiv.org/abs/2501.09767
  • Difan Deng, Marius Lindauer, 20 Feb 2025 (v2), Neural Attention Search, https://arxiv.org/abs/2502.13251 (Deciding whether a token deserves global attention, local attention, or sliding window attention, reducing KV caches.)
  • Ammar Ahmed, Sheng Di, Franck Cappello, Zirui Liu, Jingoo Han, Ali Anwar, 1 Aug 2025, Systematic Evaluation of Optimization Techniques for Long-Context Language Models, https://arxiv.org/abs/2508.00305
  • Guoxin Wang, Qingyuan Wang, Binhua Huang, Shaowu Chen, Deepu John, 3 Sep 2025, TinyDrop: Tiny Model Guided Token Dropping for Vision Transformers, https://arxiv.org/abs/2509.03379

