Aussie AI

Reasoning Tokens

  • Last Updated 18 September, 2025
  • by David Spuler, Ph.D.

Reasoning tokens are special non-word meta-tokens that aid reasoning. Simple examples include "pause tokens" (pause and think more) or "start thought" and "stop thought" tokens (which mark a segment of text as a "thought"). At the extreme, the interim steps of reasoning are performed entirely in non-language tokens, or the whole reasoning process is done with "concept tokens," as in Large Concept Models (LCMs). The goals of using reasoning tokens can be twofold:

  • Greater accuracy, by reasoning over concepts rather than words (avoiding the ambiguity of language), and/or
  • Faster, more cost-effective reasoning by using fewer tokens (i.e., token reduction optimizations).
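
As a rough illustration (not from any particular paper), the sketch below shows one way reasoning meta-tokens might be wired up with the Hugging Face transformers library: the token names <|pause|>, <|startofthought|> and <|endofthought|> are hypothetical placeholders, and GPT-2 is used only as a small stand-in model.

    # Sketch: registering hypothetical reasoning meta-tokens and inserting pause tokens
    # into a prompt. Assumptions: Hugging Face transformers; GPT-2 as a stand-in model;
    # the token names are illustrative, not any vendor's actual identifiers.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Register the reasoning meta-tokens as special tokens with their own embeddings.
    reasoning_tokens = ["<|pause|>", "<|startofthought|>", "<|endofthought|>"]
    tokenizer.add_special_tokens({"additional_special_tokens": reasoning_tokens})
    model.resize_token_embeddings(len(tokenizer))  # new embedding rows; trained later

    # Insert pause tokens after the question, giving the model extra "thinking"
    # positions before it must commit to an answer token.
    prompt = "Q: What is 17 * 24? " + "<|pause|>" * 8 + " A:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    print(tokenizer.convert_ids_to_tokens(input_ids[0].tolist()))

Untrained, these new tokens do nothing useful; the point of the papers below is how to train the model so that the extra positions actually improve multi-step reasoning.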

See also other reasoning topics:

Research on Reasoning Tokens

Research papers on reasoning tokens include:

  • Ignacio de Gregorio Noblejas, September 15, 2024, OpenAI Launches o1. Here’s All You Need to Know, https://thetechoasis.beehiiv.com/p/openai-launches-o1-heres-need-know
  • Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, 9 Dec 2024, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769 (Performing reasoning in a model trained to operate in the embedding vector space, rather than more directly in the token space.)
  • Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
  • Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan, 21 Apr 2024 (v3), Think before you speak: Training Language Models With Pause Tokens, https://arxiv.org/abs/2310.02226 (Inserting extra "pause tokens" that trigger the LLM to perform extra reasoning during the decoding phase.)
  • Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
  • Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman, 18 Mar 2024 (v2), Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, https://arxiv.org/abs/2403.09629 (Generates intermediate rationales between start-of-thought and end-of-thought meta-tokens to aid reasoning.)
  • Lance Eliot, Dec 18, 2024, Chain Of Continuous Thought Promises Mighty Boost For LLMs And Generative AI By Blowing Up The Fixation On Tokens, https://www.forbes.com/sites/lanceeliot/2024/12/18/chain-of-continuous-thought-promises-mighty-boost-for-llms-and-generative-ai-by-blowing-up-the-fixation-on-tokens/
  • Xuan Shen, Yizhou Wang, Xiangxi Shi, Yanzhi Wang, Pu Zhao, Jiuxiang Gu, 31 Jan 2025, Efficient Reasoning with Hidden Thinking, https://arxiv.org/abs/2501.19201
  • DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng, 5 Feb 2025. Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning, https://arxiv.org/abs/2502.03275
  • Jim the AI Whisperer, Feb 2025, I hacked Perplexity AI’s full system prompt when I shared my own cognitive vulnerabilities with it. How I used my own scrambled brain to outwit Perplexity AI. https://medium.com/the-generator/prompt-hacking-perplexity-ai-system-instructions-7aa6ee923060
  • Kongcheng Zhang, Qi Yao, Baisheng Lai, Jiaxing Huang, Wenkai Fang, Dacheng Tao, Mingli Song, Shunyu Liu, 19 Feb 2025, Reasoning with Reinforced Functional Token Tuning, https://arxiv.org/abs/2502.13389
  • Ziang Ye, Zhenru Zhang, Yang Zhang, Jianxin Ma, Junyang Lin, Fuli Feng, 19 Dec 2024, Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning, https://arxiv.org/abs/2412.14780
  • Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He, 28 Feb 2025, CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation, https://arxiv.org/abs/2502.21074
  • Guanghao Li, Wenhao Jiang, Mingfeng Chen, Yan Li, Hao Yu, Shuting Dong, Tao Ren, Ming Tang, Chun Yuan, 30 May 2025, SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought, https://arxiv.org/abs/2505.24181
  • Bowen Ding, Yuhan Chen, Futing Wang, Lingfeng Ming, Tao Lin, 30 Jun 2025, Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model, https://arxiv.org/abs/2506.23840
  • Jiaao Li, Kaiyuan Li, Chen Gao, Yong Li, Xinlei Chen, 21 Jul 2025, EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent, https://arxiv.org/abs/2507.15428
  • Ziyao Wang, Guoheng Sun, Yexiao He, Zheyu Shen, Bowei Tian, Ang Li, 29 Jul 2025, Predictive Auditing of Hidden Tokens in LLM APIs via Reasoning Length Estimation, https://arxiv.org/abs/2508.00912
  • Huihan Li, You Chen, Siyuan Wang, Yixin He, Ninareh Mehrabi, Rahul Gupta, Xiang Ren, 4 Aug 2025, Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time, https://arxiv.org/abs/2508.02037
  • Wenhao Zeng, Yaoning Wang, Chao Hu, Yuling Shi, Chengcheng Wan, Hongyu Zhang, Xiaodong Gu, 8 Aug 2025, Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal, https://arxiv.org/abs/2508.05988
  • Shubhra Mishra, Gabriel Poesia, Noah D. Goodman, 8 Aug 2025, From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models, https://arxiv.org/abs/2407.00900
  • Quan Nguyen and Thanh Nguyen-Tang, 20 Aug 2025, One-Layer Transformers are Provably Optimal for In-context Reasoning and Distributional Association Learning in Next-Token Prediction Tasks, https://arxiv.org/abs/2505.15009
  • Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Ya-Qin Zhang, Yuanchun Li, 24 Aug 2025, BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens, https://arxiv.org/abs/2508.17196
  • Tunyu Zhang, Haizhou Shi, Yibin Wang, Hengyi Wang, Xiaoxiao He, Zhuowei Li, Haoxian Chen, Ligong Han, Kai Xu, Huan Zhang, Dimitris Metaxas, Hao Wang, 4 Sep 2025, TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning, https://arxiv.org/abs/2505.11737

Pause Tokens

Pause tokens are a type of meta-token that helps an LLM perform better reasoning in a multi-step inference algorithm such as Chain-of-Thought. The idea is to train extra meta-tokens that instruct the LLM to "pause" and think more at the current point; a brief training-side sketch follows the reference list below. Research papers on pause tokens include:

  • Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
  • Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan, 21 Apr 2024 (v3), Think before you speak: Training Language Models With Pause Tokens, https://arxiv.org/abs/2310.02226 (Inserting extra "pause tokens" that trigger the LLM to perform extra reasoning during the decoding phase.)
  • Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
  • Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman, 18 Mar 2024 (v2), Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, https://arxiv.org/abs/2403.09629 (Generates intermediate rationales between start-of-thought and end-of-thought meta-tokens to aid reasoning.)
  • Zeyu Tang, Zhenhao Chen, Loka Li, Xiangchen Song, Yunlong Deng, Yifan Shen, Guangyi Chen, Peter Spirtes, Kun Zhang, 5 Feb 2025, Reflection-Window Decoding: Text Generation with Selective Refinement, https://arxiv.org/abs/2502.03678 (Combination of sliding window attention with pausing.)
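
As a rough training-side sketch (assuming PyTorch, a Hugging Face tokenizer in which a hypothetical <|pause|> token has already been registered as a special token, and a loss-masking scheme that captures the general idea rather than any paper's exact recipe), the code below builds a training example with pause tokens between the question and the answer, and masks the loss on the question and pause positions:

    # Sketch: pause tokens in a training example, with loss masked on pause positions.
    # Assumptions: PyTorch + Hugging Face transformers; "<|pause|>" was added to the
    # tokenizer as a special token (as in the earlier sketch); illustrative only.
    import torch

    PAUSE_TOKEN = "<|pause|>"

    def build_pause_example(tokenizer, question, answer, num_pauses=4):
        """Return (input_ids, labels) with pause tokens inserted before the answer."""
        pause_id = tokenizer.convert_tokens_to_ids(PAUSE_TOKEN)
        q_ids = tokenizer(question, add_special_tokens=False).input_ids
        a_ids = tokenizer(answer, add_special_tokens=False).input_ids

        input_ids = q_ids + [pause_id] * num_pauses + a_ids
        # -100 is the default ignore_index of the cross-entropy loss in Hugging Face
        # causal LMs: no loss on the question or the pause tokens, so the model is
        # only trained on the answer it gives *after* its extra "thinking" slots.
        labels = [-100] * (len(q_ids) + num_pauses) + a_ids
        return torch.tensor([input_ids]), torch.tensor([labels])

At inference time, the same pause tokens are appended to the prompt, so the model gets the same extra computation slots it saw during training.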

Concept Tokens

Concept tokens are meta-tokens that represent concepts rather than words. They can improve the accuracy of reasoning (less language ambiguity) and/or the efficiency of token processing (fewer tokens). Using concept tokens throughout the entire LLM gives a "concept model" or "Large Concept Model" (LCM); it is also possible to use concept tokens only in the interim steps of Chain-of-Thought reasoning.
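
As a loose illustration of the latent-reasoning idea (in the spirit of the continuous-latent-space paper cited above, but not its implementation), the sketch below feeds the model's final hidden state back in as the next input embedding for a few "thought" steps, instead of sampling and re-embedding a discrete word token. GPT-2 is only a stand-in, and an off-the-shelf model will not actually reason this way without special training.

    # Sketch: a few "concept token" steps in continuous latent space, then one step of
    # ordinary token decoding. Assumptions: Hugging Face transformers; GPT-2, whose
    # hidden size equals its embedding size; purely illustrative (no KV caching, and
    # the model has not been trained to use latent thoughts).
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    prompt_ids = tokenizer("Q: What is 17 * 24? A:", return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(prompt_ids)

    with torch.no_grad():
        # Latent "thought" steps: append the last hidden state as the next input
        # embedding, never decoding it into a word.
        for _ in range(4):
            out = model(inputs_embeds=embeds, output_hidden_states=True)
            latent = out.hidden_states[-1][:, -1:, :]      # the "concept token"
            embeds = torch.cat([embeds, latent], dim=1)

        # Switch back to discrete tokens for the visible answer.
        out = model(inputs_embeds=embeds)
        next_id = out.logits[:, -1, :].argmax(dim=-1)
        print(tokenizer.decode(next_id))

The latent vectors play the role of concept tokens: they carry intermediate reasoning state without ever being decoded into words, which is where the token-reduction saving comes from.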

Research papers on concept tokens and Large Concept Models:

Reasoning and CoT Efficiency Topics

Blog articles on reasoning efficiency:

More research information on general efficiency optimization techniques for reasoning models:

Efficiency optimizations to Chain-of-Thought include:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++ programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: