Aussie AI

Reasoning Tokens

  • Last Updated 18 September, 2025
  • by David Spuler, Ph.D.

Reasoning tokens are special non-word meta-tokens that aid reasoning. Simple examples include "pause tokens" (pause and think more) or "start thought" and "stop thought" tokens (which mark a segment of text as a "thought"). At the extreme, the interim steps of reasoning are performed entirely in non-language tokens, or the whole reasoning process is done with "concept tokens," as in Large Concept Models (LCMs). The goals of using reasoning tokens can be twofold:

  • Greater accuracy, by reasoning over concepts rather than words (avoiding the ambiguity of language), and/or
  • Faster, more cost-effective reasoning by using fewer tokens (i.e., token reduction optimizations).
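
As a rough illustration (not from any particular paper), the sketch below shows one way reasoning meta-tokens might be wired up with the Hugging Face transformers library: the token names <|pause|>, <|startofthought|> and <|endofthought|> are hypothetical placeholders, and GPT-2 is used only as a small stand-in model.

    # Sketch: registering hypothetical reasoning meta-tokens and inserting pause tokens
    # into a prompt. Assumptions: Hugging Face transformers; GPT-2 as a stand-in model;
    # the token names are illustrative, not any vendor's actual identifiers.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Register the reasoning meta-tokens as special tokens with their own embeddings.
    reasoning_tokens = ["<|pause|>", "<|startofthought|>", "<|endofthought|>"]
    tokenizer.add_special_tokens({"additional_special_tokens": reasoning_tokens})
    model.resize_token_embeddings(len(tokenizer))  # new embedding rows; trained later

    # Insert pause tokens after the question, giving the model extra "thinking"
    # positions before it must commit to an answer token.
    prompt = "Q: What is 17 * 24? " + "<|pause|>" * 8 + " A:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    print(tokenizer.convert_ids_to_tokens(input_ids[0].tolist()))

Untrained, these new tokens do nothing useful; the point of the papers below is how to train the model so that the extra positions actually improve multi-step reasoning.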

See also other reasoning topics:

Research on Reasoning Tokens

Research papers on reasoning tokens include:

  • Ignacio de Gregorio Noblejas, September 15, 2024, OpenAI Launches o1. Here’s All You Need to Know, https://thetechoasis.beehiiv.com/p/openai-launches-o1-heres-need-know
  • Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, 9 Dec 2024, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769 (Performing reasoning in a model trained to operate in the embedding vector space, rather than more directly in the token space.)
  • Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
  • Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan, 21 Apr 2024 (v3), Think before you speak: Training Language Models With Pause Tokens, https://arxiv.org/abs/2310.02226 (Inserting extra "pause tokens" that trigger the LLM to perform extra reasoning during the decoding phase.)
  • Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
  • Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman, 18 Mar 2024 (v2), Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, https://arxiv.org/abs/2403.09629 (Generates intermediate rationales between start-of-thought and end-of-thought meta-tokens to aid reasoning.)
  • Lance Eliot, Dec 18, 2024, Chain Of Continuous Thought Promises Mighty Boost For LLMs And Generative AI By Blowing Up The Fixation On Tokens, https://www.forbes.com/sites/lanceeliot/2024/12/18/chain-of-continuous-thought-promises-mighty-boost-for-llms-and-generative-ai-by-blowing-up-the-fixation-on-tokens/
  • Xuan Shen, Yizhou Wang, Xiangxi Shi, Yanzhi Wang, Pu Zhao, Jiuxiang Gu, 31 Jan 2025, Efficient Reasoning with Hidden Thinking, https://arxiv.org/abs/2501.19201
  • DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng, 5 Feb 2025. Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning, https://arxiv.org/abs/2502.03275
  • Jim the AI Whisperer, Feb 2025, I hacked Perplexity AI’s full system prompt when I shared my own cognitive vulnerabilities with it. How I used my own scrambled brain to outwit Perplexity AI. https://medium.com/the-generator/prompt-hacking-perplexity-ai-system-instructions-7aa6ee923060
  • Kongcheng Zhang, Qi Yao, Baisheng Lai, Jiaxing Huang, Wenkai Fang, Dacheng Tao, Mingli Song, Shunyu Liu, 19 Feb 2025, Reasoning with Reinforced Functional Token Tuning, https://arxiv.org/abs/2502.13389
  • Ziang Ye, Zhenru Zhang, Yang Zhang, Jianxin Ma, Junyang Lin, Fuli Feng, 19 Dec 2024, Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning, https://arxiv.org/abs/2412.14780
  • Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He, 28 Feb 2025, CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation, https://arxiv.org/abs/2502.21074
  • Guanghao Li, Wenhao Jiang, Mingfeng Chen, Yan Li, Hao Yu, Shuting Dong, Tao Ren, Ming Tang, Chun Yuan, 30 May 2025, SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought, https://arxiv.org/abs/2505.24181
  • Bowen Ding, Yuhan Chen, Futing Wang, Lingfeng Ming, Tao Lin, 30 Jun 2025, Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model, https://arxiv.org/abs/2506.23840
  • Jiaao Li, Kaiyuan Li, Chen Gao, Yong Li, Xinlei Chen, 21 Jul 2025, EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent, https://arxiv.org/abs/2507.15428
  • Ziyao Wang, Guoheng Sun, Yexiao He, Zheyu Shen, Bowei Tian, Ang Li, 29 Jul 2025, Predictive Auditing of Hidden Tokens in LLM APIs via Reasoning Length Estimation, https://arxiv.org/abs/2508.00912
  • Huihan Li, You Chen, Siyuan Wang, Yixin He, Ninareh Mehrabi, Rahul Gupta, Xiang Ren, 4 Aug 2025, Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time, https://arxiv.org/abs/2508.02037
  • Wenhao Zeng, Yaoning Wang, Chao Hu, Yuling Shi, Chengcheng Wan, Hongyu Zhang, Xiaodong Gu, 8 Aug 2025, Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal, https://arxiv.org/abs/2508.05988
  • Shubhra Mishra, Gabriel Poesia, Noah D. Goodman, 8 Aug 2025, From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models, https://arxiv.org/abs/2407.00900
  • Quan Nguyen and Thanh Nguyen-Tang, 20 Aug 2025, One-Layer Transformers are Provably Optimal for In-context Reasoning and Distributional Association Learning in Next-Token Prediction Tasks, https://arxiv.org/abs/2505.15009
  • Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Ya-Qin Zhang, Yuanchun Li, 24 Aug 2025, BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens, https://arxiv.org/abs/2508.17196
  • Tunyu Zhang, Haizhou Shi, Yibin Wang, Hengyi Wang, Xiaoxiao He, Zhuowei Li, Haoxian Chen, Ligong Han, Kai Xu, Huan Zhang, Dimitris Metaxas, Hao Wang, 4 Sep 2025, TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning, https://arxiv.org/abs/2505.11737

Pause Tokens

Pause tokens are a type of meta-token that helps an LLM perform better reasoning in a multi-step inference algorithm such as Chain-of-Thought. The idea is to train extra meta-tokens that instruct the LLM to "pause" and think more at the current point; a brief training-side sketch follows the reference list below. Research papers on pause tokens include:

  • Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
  • Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan, 21 Apr 2024 (v3), Think before you speak: Training Language Models With Pause Tokens, https://arxiv.org/abs/2310.02226 (Inserting extra "pause tokens" that trigger the LLM to perform extra reasoning during the decoding phase.)
  • Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
  • Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman, 18 Mar 2024 (v2), Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, https://arxiv.org/abs/2403.09629 (Generates intermediate rationales between start-of-thought and end-of-thought meta-tokens to aid reasoning.)
  • Zeyu Tang, Zhenhao Chen, Loka Li, Xiangchen Song, Yunlong Deng, Yifan Shen, Guangyi Chen, Peter Spirtes, Kun Zhang, 5 Feb 2025, Reflection-Window Decoding: Text Generation with Selective Refinement, https://arxiv.org/abs/2502.03678 (Combination of sliding window attention with pausing.)
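
As a rough training-side sketch (assuming PyTorch, a Hugging Face tokenizer in which a hypothetical <|pause|> token has already been registered as a special token, and a loss-masking scheme that captures the general idea rather than any paper's exact recipe), the code below builds a training example with pause tokens between the question and the answer, and masks the loss on the question and pause positions:

    # Sketch: pause tokens in a training example, with loss masked on pause positions.
    # Assumptions: PyTorch + Hugging Face transformers; "<|pause|>" was added to the
    # tokenizer as a special token (as in the earlier sketch); illustrative only.
    import torch

    PAUSE_TOKEN = "<|pause|>"

    def build_pause_example(tokenizer, question, answer, num_pauses=4):
        """Return (input_ids, labels) with pause tokens inserted before the answer."""
        pause_id = tokenizer.convert_tokens_to_ids(PAUSE_TOKEN)
        q_ids = tokenizer(question, add_special_tokens=False).input_ids
        a_ids = tokenizer(answer, add_special_tokens=False).input_ids

        input_ids = q_ids + [pause_id] * num_pauses + a_ids
        # -100 is the default ignore_index of the cross-entropy loss in Hugging Face
        # causal LMs: no loss on the question or the pause tokens, so the model is
        # only trained on the answer it gives *after* its extra "thinking" slots.
        labels = [-100] * (len(q_ids) + num_pauses) + a_ids
        return torch.tensor([input_ids]), torch.tensor([labels])

At inference time, the same pause tokens are appended to the prompt, so the model gets the same extra computation slots it saw during training.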

Concept Tokens

Concept tokens are meta-tokens that represent concepts rather than words. They can improve the accuracy of reasoning (less language ambiguity) and/or the efficiency of token processing (fewer tokens). Using concept tokens throughout the entire LLM gives a "concept model" or "Large Concept Model" (LCM); it is also possible to use concept tokens only in the interim steps of Chain-of-Thought reasoning.
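
As a loose illustration of the latent-reasoning idea (in the spirit of the continuous-latent-space paper cited above, but not its implementation), the sketch below feeds the model's final hidden state back in as the next input embedding for a few "thought" steps, instead of sampling and re-embedding a discrete word token. GPT-2 is only a stand-in, and an off-the-shelf model will not actually reason this way without special training.

    # Sketch: a few "concept token" steps in continuous latent space, then one step of
    # ordinary token decoding. Assumptions: Hugging Face transformers; GPT-2, whose
    # hidden size equals its embedding size; purely illustrative (no KV caching, and
    # the model has not been trained to use latent thoughts).
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    prompt_ids = tokenizer("Q: What is 17 * 24? A:", return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(prompt_ids)

    with torch.no_grad():
        # Latent "thought" steps: append the last hidden state as the next input
        # embedding, never decoding it into a word.
        for _ in range(4):
            out = model(inputs_embeds=embeds, output_hidden_states=True)
            latent = out.hidden_states[-1][:, -1:, :]      # the "concept token"
            embeds = torch.cat([embeds, latent], dim=1)

        # Switch back to discrete tokens for the visible answer.
        out = model(inputs_embeds=embeds)
        next_id = out.logits[:, -1, :].argmax(dim=-1)
        print(tokenizer.decode(next_id))

The latent vectors play the role of concept tokens: they carry intermediate reasoning state without ever being decoded into words, which is where the token-reduction saving comes from.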

Research papers on concept tokens and Large Concept Models:

Reasoning and CoT Efficiency Topics

Blog articles on reasoning efficiency:

More research information on general efficiency optimization techniques for reasoning models:

Efficiency optimizations to Chain-of-Thought include:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++ programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: