Aussie AI
Token Skipping
-
Last Updated 17 November, 2025
-
by David Spuler, Ph.D.
What is Token Skipping?
Token skipping is an LLM inference optimization methods that "skips" the processing of some input tokens. Fewer total tokens to process means less GPU compute cost and faster inference. Variants of this technique include token pruning, token dropping, and token merging.
Releated research areas include:
- Token reduction
- Prompt compression
- Context compression
- Token pruning
- Reasoning token reduction
- Token merging
- Token dropping
Research on Token Skipping
Research papers include:
- Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou, 7 May 2024, Switchable Decision: Dynamic Neural Generation Networks, https://arxiv.org/abs/2405.04513 (Switching and skipping sub-layer components such as attention heads, FFNs, or input token skipping, using decisions made based on allocating computation resources.)
- Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You, 18 Mar 2024, Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation, https://arxiv.org/abs/2403.11808 (PEFT and adaptive inference and token pruning in Vision Transformers.)
- Foozhan Ataiefard, Walid Ahmed, Habib Hajimolahoseini, Saina Asani, Farnoosh Javadi, Mohammad Hassanpour, Omar Mohamed Awad, Austin Wen, Kangling Liu, Yang Liu, 27 Jan 2024, SkipViT: Speeding Up Vision Transformers with a Token-Level Skip Connection, https://arxiv.org/abs/2401.15293
- Fabio Montello, Ronja Güldenring, Simone Scardapane, Lazaros Nalpantidis, 13 Jan 2025, A Survey on Dynamic Neural Networks: from Computer Vision to Multi-modal Sensor Fusion, https://arxiv.org/abs/2501.07451 (Survey of adaptive inference optimizations: early exit, dynamic routing, token skimming.)
- Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li, 17 Feb 2025, TokenSkip: Controllable Chain-of-Thought Compression in LLMs, https://arxiv.org/abs/2502.12067
- Rohan Choudhury, Shanchuan Lin, Jianyi Wang, Hao Chen, Qi Zhao, Feng Cheng, Lu Jiang, Kris Kitani, Laszlo A. Jeni, 9 Oct 2025, SkipSR: Faster Super Resolution with Token Skipping, https://arxiv.org/abs/2510.08799
AI Books from Aussie AI
|
The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
Get your copy from Amazon: The Sweetest Lesson |
|
RAG Optimization: Accurate and Efficient LLM Applications:
new book on RAG architectures:
Get your copy from Amazon: RAG Optimization |
|
Generative AI Applications book:
Get your copy from Amazon: Generative AI Applications |
|
Generative AI programming book:
Get your copy from Amazon: Generative AI in C++ |
|
CUDA C++ Optimization book:
Get your copy from Amazon: CUDA C++ Optimization |
|
CUDA C++ Debugging book:
Get your copy from Amazon: CUDA C++ Debugging |
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research
- « Research Home