Aussie AI
Reasoning Token Reduction
Last Updated 30 March, 2026
by David Spuler, Ph.D.
Research on Reasoning Token Reduction
Research papers include:
- OpenAI, Dec 2024, OpenAI o1 and new tools for developers, https://openai.com/index/o1-and-new-tools-for-developers/ ("Lower latency: o1 uses on average 60% fewer reasoning tokens than o1-preview for a given request.")
- Tianqiao Liu, Zui Chen, Zitao Liu, Mi Tian, Weiqi Luo, 13 Sep 2024, Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding, https://arxiv.org/abs/2409.08561 (Compressing the interim token sequences in Chain-of-Thought.)
- Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber, 2 Nov 2023, Implicit Chain of Thought Reasoning via Knowledge Distillation, https://arxiv.org/abs/2311.01460 (Knowledge distillation applied to optimizing the interim computations in Chain-of-Thought.)
- Yu Kang, Xianghui Sun, Liangyu Chen, Wei Zou, 16 Dec 2024, C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness, https://arxiv.org/abs/2412.11664 (Token pruning and prompt compression for Chain-of-Thought.)
- Jeffrey Cheng, Benjamin Van Durme, 17 Dec 2024, Compressed Chain of Thought: Efficient Reasoning Through Dense Representations, https://arxiv.org/abs/2412.13171 (Context compression applied to interim CoT reasoning steps.)
- Devmallya Karar, Oct 4, 2024, Chain-of-Thought (CoT) in Large Language Models prompting and Concise CoT with Code implementation using Python and PyTorch, https://medium.com/@devmallyakarar/chain-of-thought-cot-in-large-language-models-prompting-and-concise-cot-with-code-82821f9a832d
- Cobus Greyling, Jan 24, 2024, Concise Chain-of-Thought (CCoT) Prompting, https://cobusgreyling.substack.com/p/concise-chain-of-thought-ccot-prompting (CCoT is a prompt-engineering technique aimed at reducing LLM response verbosity and inference time, since traditional CoT increases output token usage.)
- Matthew Renze, Erhan Guven 19 Oct 2024 (v3), The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models, https://arxiv.org/abs/2401.05618 https://github.com/matthewrenze/jhu-concise-cot (The original paper on Concise CoT.)
- Tyler McDonald, Anthony Colosimo, Yifeng Li, Ali Emami, 2 Dec 2024, Can We Afford The Perfect Prompt? Balancing Cost and Accuracy with the Economical Prompting Index, https://arxiv.org/abs/2412.01690
- Waleed Kadous, May 17, 2023, Numbers every LLM Developer should know, https://www.anyscale.com/blog/num-every-llm-developer-should-know (Includes discussion of "be concise" prompting.)
- Sachin Kumar, Sep 17, 2024, Hidden Chain-of-Thought decoding: faster and efficient CoT decoding to improve reasoning of LLMs, https://medium.com/@techsachin/hidden-chain-of-thought-decoding-faster-and-efficient-cot-decoding-to-improve-reasoning-of-llms-d95584bc9346 (Token reduction in CoT by compressing language tokens into an internal "hidden" concise token representation.)
- Tingxu Han, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen, Zhenting Wang, 30 Dec 2024 (v2), Token-Budget-Aware LLM Reasoning, https://arxiv.org/abs/2412.18547 https://github.com/GeniusHTX/TALE
- Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi, 14 Aug 2024 (v2), CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference, https://arxiv.org/abs/2310.10845
- Joonwon Jang, Jaehee Kim, Wonbin Kweon, Hwanjo Yu, 31 Dec 2024 (v2), Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria, https://arxiv.org/abs/2412.21006 (Remove redundant sentences in the reasoning steps for token efficiency.)
- Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Dec 2024, Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, https://arxiv.org/abs/2412.21187
- Zirui Zhao, Hanze Dong, Amrita Saha, Caiming Xiong, Doyen Sahoo, 10 Oct 2024, Automatic Curriculum Expert Iteration for Reliable LLM Reasoning, https://arxiv.org/abs/2410.07627 (Efficiency of bailing out with "I don't know" or refusals versus continuing reasoning steps.)
- Mehul Damani, Idan Shenfeld, Andi Peng, Andreea Bobu, Jacob Andreas, 7 Oct 2024, Learning How Hard to Think: Input-Adaptive Allocation of LM Computation, https://arxiv.org/abs/2410.04707
- Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li, 24 Aug 2024, Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning, https://arxiv.org/abs/2408.13457
- Rohin Manvi, Anikait Singh, Stefano Ermon, 3 Oct 2024, Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation, https://arxiv.org/abs/2410.02725
- Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, and Kan Li, 19 Jan 2024, Escape sky-high cost: Early-stopping self-consistency for multi-step reasoning. The Twelfth International Conference on Learning Representations, 2024, https://arxiv.org/abs/2401.10480 https://github.com/Yiwei98/ESC (Uses "early stopping" idea to improve CoT efficiency during inference.)
- Jiace Zhu, Yingtao Shen, Jie Zhao, An Zou, 25 Aug 2024, Path-Consistency: Prefix Enhancement for Efficient Inference in LLM, https://arxiv.org/abs/2409.01281 (Uses the confidence calculations from earlier branches of the reasoning to improve efficiency.)
- Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam, 16 Nov 2023 (v2), Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs, EMNLP 2023, https://arxiv.org/abs/2305.11860 https://www.sample-step-by-step.info/
- Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
- Sania Nayab, Giulio Rossolini, Giorgio Buttazzo, Nicolamaria Manes, Fabrizio Giacomelli, 29 Jul 2024, Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost, https://arxiv.org/abs/2407.19825
- Wenqing Chen, Weicheng Wang, Zhixuan Chu, Kui Ren, Zibin Zheng, and Zhichao Lu. 2024. Self-Para-Consistency: Improving Reasoning Tasks at Low Cost for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 14162–14167, Bangkok, Thailand. Association for Computational Linguistics. https://aclanthology.org/2024.findings-acl.842/ (Generate multiple paraphrased answers, which can reduce tokens as fewer are needed.)
- Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, 22 Jan 2025, O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning, https://arxiv.org/abs/2501.12570 https://github.com/StarDewXXX/O1-Pruner
- Asif Razzaq, January 24, 2025, Berkeley Sky Computing Lab Introduces Sky-T1-32B-Flash: A New Reasoning Language Model that Significantly Reduces Overthinking, Slashing Inference Costs on Challenging Questions by up to 57%, https://www.marktechpost.com/2025/01/24/berkeley-sky-computing-lab-introduces-sky-t1-32b-flash-a-new-reasoning-language-model-that-significantly-reduces-overthinking-slashing-inference-costs-on-challenging-questions-by-up-to-57/
- Liang Wang, Haonan Chen, Nan Yang, Xiaolong Huang, Zhicheng Dou, Furu Wei, 24 Jan 2025, Chain-of-Retrieval Augmented Generation, https://arxiv.org/abs/2501.14342 (Combines RAG with multi-step reasoning such as Chain-of-Thought, with a method to control token cost.)
- Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky, 6 Oct 2024, Inference Scaling for Long-Context Retrieval Augmented Generation, https://arxiv.org/abs/2410.04343 (Combine RAG and multi-step inference, controlling token cost via budgeting allocations.)
- Zishun Yu, Tengyu Xu, Di Jin, Karthik Abinav Sankararaman, Yun He, Wenxuan Zhou, Zhouhao Zeng, Eryk Helenowski, Chen Zhu, Sinong Wang, Hao Ma, Han Fang, 29 Jan 2025, Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization, https://arxiv.org/abs/2501.17974 (CoT optimization using an inference budget.)
- Mayi Xu, Yongqi Li, Ke Sun, and Tieyun Qian, 2024, Adaption-of-thought: Learning question difficulty improves large language models for reasoning, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5468–5495, 2024, https://aclanthology.org/2024.emnlp-main.313/ https://aclanthology.org/2024.emnlp-main.313.pdf
- Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Jan 2025, Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs, https://arxiv.org/abs/2501.18585
- Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto, 3 Feb 2025 (v2), s1: Simple test-time scaling, https://arxiv.org/abs/2501.19393 https://github.com/simplescaling/s1 (Method of "budget forcing" that allows either shortening or lengthening multi-step reasoning sequences.)
- Mohammed Karimkhan Pathan, February 3, 2025, Open-source revolution: How DeepSeek-R1 challenges OpenAI’s o1 with superior processing, cost efficiency, https://venturebeat.com/ai/open-source-revolution-how-deepseek-r1-challenges-openais-o1-with-superior-processing-cost-efficiency/
- Ben Dickson, February 5, 2025, Not every AI prompt deserves multiple seconds of thinking: how Meta is teaching models to prioritize, https://venturebeat.com/ai/not-every-ai-prompt-deserves-multiple-seconds-of-thinking-how-meta-is-teaching-models-to-prioritize/
- Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang, 13 Feb 2025, CoT-Valve: Length-Compressible Chain-of-Thought Tuning, https://arxiv.org/abs/2502.09601 https://github.com/horseee/CoT-Valve
- Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, (authors omitted), 22 Jan 2025, Kimi k1.5: Scaling Reinforcement Learning with LLMs, https://arxiv.org/abs/2501.12599 (Includes a "length penalty" to address token reduction.)
- Jin, M., Yu, Q., Shu, D., Zhao, H., Hua, W., Meng, Y., Zhang, Y., and Du, M., The Impact of Reasoning Step Length on Large Language Models, In Ku, L.-W., Martins, A., and Srikumar, V. (eds.), Findings of the Association for Computational Linguistics: ACL 2024, pp. 1830–1842, Bangkok, Thailand, August 2024, Association for Computational Linguistics, doi: 10.18653/v1/2024.findings-acl.108, https://aclanthology.org/2024.findings-acl.108/ (Shows that token reduction does reduce accuracy in reasoning.)
- Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li, 17 Feb 2025, TokenSkip: Controllable Chain-of-Thought Compression in LLMs, https://arxiv.org/abs/2502.12067
- Marthe Ballon, Andres Algaba, Vincent Ginis, 21 Feb 2025, The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer, https://arxiv.org/abs/2502.15631
- Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang, 21 Feb 2025, LightThinker: Thinking Step-by-Step Compression, https://arxiv.org/abs/2502.15589 https://github.com/zjunlp/LightThinker (Faster CoT by compressing the text of intermediate reasoning steps with gist tokens.)
- Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei, 25 Feb 2025, Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning, https://arxiv.org/abs/2502.18080 (Trying to generate the "shortest correct response" by examining the lengths needed for CoT.)
- Silei Xu, Wenhao Xie, Lingxiao Zhao, Pengcheng He, 25 Feb 2025, Chain of Draft: Thinking Faster by Writing Less, https://arxiv.org/abs/2502.18600 (Concise CoT method using a per-step inference budget.)
- Tergel Munkhbat, Namgyu Ho, Seohyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun, 27 Feb 2025, Self-Training Elicits Concise Reasoning in Large Language Models, https://arxiv.org/abs/2502.20122 https://github.com/TergelMunkhbat/concise-reasoning
- Dr. Ashish Bamania, March 3rd, 2025, Chain-of-Draft (CoD) Is The New King Of Prompting Techniques: A deep dive into the novel Chain-of-Draft (CoD) Prompting that outperforms Chain-of-Thought (CoT) Prompting while reducing LLM inference cost and latency like never before, https://levelup.gitconnected.com/chain-of-draft-cod-is-the-new-king-of-prompting-techniques-d9dc17f12051
- Michael Nuñez, March 3, 2025, Less is more: How ‘chain of draft’ could cut AI costs by 90% while improving performance, https://venturebeat.com/ai/less-is-more-how-chain-of-draft-could-cut-ai-costs-by-90-while-improving-performance/
- Reddit, March 01, 2025, Chain of Draft: Streamlining LLM Reasoning with Minimal Token Generation, https://www.reddit.com/r/artificial/comments/1j04ezf/chain_of_draft_streamlining_llm_reasoning_with/
- Sulbha Jain, March 02, 2025, Chain of Draft: Thinking Faster by Writing Less — Paper Review, https://medium.com/@sulbha.jindal/chain-of-draft-thinking-faster-by-writing-less-paper-review-20e57bfc867a
- Ajith Vallath Prabhakar, March 2, 2025, Chain of Draft: The Breakthrough Prompting Technique That Makes LLMs Think Faster With Less, https://ajithp.com/2025/03/02/chain-of-draft-llm-prompting/
- The Decoder, Mar 2, 2025, Chain of Draft Prompts lets LLMs think cheaper with fewer words, https://the-decoder.com/chain-of-draft-prompts-lets-llms-think-cheaper-with-fewer-words/
- Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He, 28 Feb 2025, CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation, https://arxiv.org/abs/2502.21074
- Ayeong Lee, Ethan Che, Tianyi Peng, 3 Mar 2025, How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach, https://arxiv.org/abs/2503.01141
- Anonymous authors, Mar 2025, Pencil: Long Thoughts with Short Memory, ICLR 2025 review, https://openreview.net/pdf?id=KRI2Fmffqr (Using a "memory" during reasoning steps to rewrite intermediate CoT steps in shorter form.)
- Pranjal Aggarwal, Sean Welleck, 6 Mar 2025, L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning, https://arxiv.org/abs/2503.04697 https://www.cmu-l3.github.io/l1 (Efficient CoT by controlling the length of intermediate step outputs.)
- Simon A. Aytes, Jinheon Baek, Sung Ju Hwang, 7 Mar 2025, Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching, https://arxiv.org/abs/2503.05179 https://www.github.com/SimonAytes/SoT (Compressing intermediate steps in CoT.)
- Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 13 Mar 2025 (v2), Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567 (Massive and broad survey of all types of reasoning.)
- Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen, Xia Hu, 23 Mar 2025 (v2), Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models, https://arxiv.org/abs/2503.16419
- Zheng Li, Qingxiu Dong, Jingyuan Ma, Di Zhang, Zhifang Sui, 16 May 2025, SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning, https://arxiv.org/abs/2505.11274
- Chengyu Huang, Zhengxin Zhang, Claire Cardie, 16 May 2025, HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization, https://arxiv.org/abs/2505.11225
- Ren Zhuang, Ben Wang, Shuifa Sun, 17 May 2025 (v2), Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping, https://arxiv.org/abs/2505.08392
- Xuechen Zhang, Zijian Huang, Chenshun Ni, Ziyang Xiong, Jiasi Chen, Samet Oymak, 14 May 2025 (v2), Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement, https://arxiv.org/abs/2505.07961
- Muzhi Dai, Chenxu Yang, Qingyi Si, 17 May 2025 (v2), S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models, https://arxiv.org/abs/2505.07686
- Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong, 21 May 2025 (v2), Scalable Chain of Thoughts via Elastic Reasoning, https://arxiv.org/abs/2505.05315 https://github.com/SalesforceAIResearch/Elastic-Reasoning
- Ziqing Qiao, Yongheng Deng, Jiali Zeng, Dong Wang, Lai Wei, Fandong Meng, Jie Zhou, Ju Ren, Yaoxue Zhang, 8 May 2025, ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning, https://arxiv.org/abs/2505.04881
- Bin Yu, Hang Yuan, Haotian Li, Xueyin Xu, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen, 21 May 2025 (v2), Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models, https://arxiv.org/abs/2505.03469
- Jingyang Yi, Jiazheng Wang, Sida Li, 16 May 2025 (v2), ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning, https://arxiv.org/abs/2504.21370
- Guanghao Li, Wenhao Jiang, Mingfeng Chen, Yan Li, Hao Yu, Shuting Dong, Tao Ren, Ming Tang, Chun Yuan, 30 May 2025, SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought, https://arxiv.org/abs/2505.24181
- Feng Luo, Yu-Neng Chuang, Guanchu Wang, Hoang Anh Duy Le, Shaochen Zhong, Hongyi Liu, Jiayi Yuan, Yang Sui, Vladimir Braverman, Vipin Chaudhary, Xia Hu, 28 May 2025, AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models, https://arxiv.org/abs/2505.22662
- Siqi Fan, Peng Han, Shuo Shang, Yequan Wang, Aixin Sun, 28 May 2025, CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models, https://arxiv.org/abs/2505.22017
- Jinyan Su, Claire Cardie, 23 May 2025, Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards, https://arxiv.org/abs/2505.18298
- Ruihan Gong, Yue Liu, Wenjie Qu, Mingzhe Du, Yufei He, Yingwei Ma, Yulin Chen, Xiang Liu, Yi Wen, Xinfeng Li, Ruidong Wang, Xinzhong Zhu, Bryan Hooi, Jiaheng Zhang, 26 May 2025, Efficient Reasoning via Chain of Unconscious Thought, https://arxiv.org/abs/2505.19756
- Weize Chen, Jiarui Yuan, Tailin Jin, Ning Ding, Huimin Chen, Zhiyuan Liu, Maosong Sun, 25 May 2025, The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training, https://arxiv.org/abs/2505.19217
- Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty, 23 May 2025, First Finish Search: Efficient Test-Time Scaling in Large Language Models, https://arxiv.org/abs/2505.18149 (Running multiple parallel decoding steps but stopping when the fastest and usually shortest one completes.)
- Joonwon Jang, Jaehee Kim, Wonbin Kweon, Seonghyeon Lee, Hwanjo Yu, July 2025, Verbosity-Aware Rationale Reduction: Sentence-Level Rationale Reduction for Efficient and Effective Reasoning, Findings of the Association for Computational Linguistics: ACL 2025, pages 20769–20784 July 27- August 1, 2025, https://aclanthology.org/2025.findings-acl.1068.pdf
- Jason Zhu, Hongyu Li, 13 Jul 2025, Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey, https://arxiv.org/abs/2507.09662
- Shu Yang, Junchao Wu, Xuansheng Wu, Derek Wong, Ninghao Liu, Di Wang, 24 Jun 2025, Is Long-to-Short a Free Lunch? Investigating Inconsistency and Reasoning Efficiency in LRMs, https://arxiv.org/abs/2506.19492
- Zehui Ling, Deshu Chen, Hongwei Zhang, Yifeng Jiao, Xin Guo, Yuan Cheng, 12 Jun 2025, Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty, https://arxiv.org/abs/2506.10446
- Wei Huang, Yizhe Xiong, Xin Ye, Zhijie Deng, Hui Chen, Zijia Lin, Guiguang Ding, 23 May 2025, Fast Quiet-STaR: Thinking Without Thought Tokens, https://arxiv.org/abs/2505.17746 https://github.com/huangwei200012/Fast-Quiet-STaR
- Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang, 23 May 2025, VeriThinker: Learning to Verify Makes Reasoning Model Efficient, https://arxiv.org/abs/2505.17941 https://github.com/czg1225/VeriThinker
- Jiwon Song, Dongwon Jo, Yulhwa Kim, Jae-Joon Kim, 20 May 2025, Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning, https://arxiv.org/abs/2505.13866 https://github.com/jiwonsong-dev/ReasoningPathCompression
- Yi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Yizhen Yuan, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu, 21 May 2025 (v3), An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint, https://arxiv.org/abs/2504.14350
- Sicheng Feng, Gongfan Fang, Xinyin Ma, Xinchao Wang, 15 Apr 2025, Efficient Reasoning Models: A Survey, https://arxiv.org/abs/2504.10903
- Zeju Li, Jianyuan Zhong, Ziyang Zheng, Xiangyu Wen, Zhijian Xu, Yingying Cheng, Fan Zhang, Qiang Xu, 5 Aug 2025, Compressing Chain-of-Thought in LLMs via Step Entropy, https://arxiv.org/abs/2508.03346
- Tim, Nous Research, Aug 14, 2025, Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark, https://nousresearch.com/measuring-thinking-efficiency-in-reasoning-models-the-missing-benchmark/
- Ruiyang Zhang, Dongzhan Zhou, Zhedong Zheng, 6 Jan 2026, SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models, https://arxiv.org/abs/2601.02825
- Janvijay Singh, Dilek Hakkani-Tür, 6 Jan 2026, Do LLMs Encode Functional Importance of Reasoning Tokens? https://arxiv.org/abs/2601.03066
- Weixin Guan, Liang Li, Jiapeng Liu, Bing Li, Peng Fu, Chengyang Fang, Xiaoshuai Hao, Can Ma, Weiping Wang, 15 Mar 2026, Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring, https://arxiv.org/abs/2603.14251
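Several of the papers above cut reasoning tokens by halting generation once answers converge, notably the early-stopping self-consistency line of work (Li et al., ESC; Aggarwal et al., Adaptive-Consistency). The following is a minimal toy sketch of that idea, with a stubbed random sampler standing in for real LLM chain-of-thought calls; the answer strings, window size, and sample cap are illustrative assumptions, not values from any of the cited papers.

```python
import random
from collections import Counter

def sample_answer(rng):
    # Stand-in for one LLM call: sample a chain-of-thought
    # and extract its final answer. Here the "model" is
    # usually right, occasionally wrong.
    return rng.choice(["42", "42", "42", "41"])

def early_stopping_self_consistency(sample_fn, rng, max_samples=20, window=3):
    """Stop drawing reasoning chains once the last `window`
    answers agree, instead of always drawing `max_samples`."""
    answers = []
    for _ in range(max_samples):
        answers.append(sample_fn(rng))
        if len(answers) >= window and len(set(answers[-window:])) == 1:
            break  # consensus reached early: skip remaining chains
    # Majority vote over the samples actually drawn.
    best = Counter(answers).most_common(1)[0][0]
    return best, len(answers)

rng = random.Random(0)
answer, n_used = early_stopping_self_consistency(sample_answer, rng)
print(answer, n_used)
```

In practice `sample_fn` would issue a CoT-prompted model call and parse out the final answer; the early exit saves the token cost of the remaining chains whenever the first few samples already agree, which is the common case on easy questions.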
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI (new book on AI intelligence theory). Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications (new book on RAG architectures). Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research