Aussie AI

Reasoning Inference Optimization

  • Last Updated 17 November, 2025
  • by David Spuler, Ph.D.

What is Reasoning Inference Optimization?

Reasoning Inference Optimization is the application of LLM inference optimization techniques to speed up reasoning models and multi-step methods such as Chain-of-Thought (CoT). Although all of the standard LLM inference optimization techniques (more than 500 exist) can be applied to the individual inference steps of these multi-step reasoning methods, there are also additional optimizations that are specific to multi-step inference.

The main special feature of CoT and other reasoning algorithms is that they work on a sequence of texts, iteratively refining the answer until a best answer is chosen. Generally speaking, more steps are better than fewer, which gives rise to the "inference scaling law," whereby additional inference computation increases the intelligence of the model. Hence, there is a trade-off between speed and capability.

The question, then, is whether there are cross-step optimizations that make use of prior texts in the sequence, whether from continued paths or abandoned ones. Inference optimization techniques that may have particular applicability to multi-step reasoning algorithms include:

  • High-level reasoning algorithm changes (e.g., fewer steps, cutting short reasoning paths earlier, etc.)
  • Concise Chain-of-Thought (CCoT) — a particular enhancement to CoT whereby the model is prompted to be "concise" in its thoughts (see the prompt sketch after this list).
  • Token reduction methods — many of these techniques seem particularly applicable to Chain-of-Thought, which emits long token sequences in its reasoning steps.
  • Prompt lookup decoding — use token sequences from prior reasoning steps for faster and more accurate drafting in speculative decoding (see the lookup sketch below).
  • Early exit enhancements — use text sequences from prior steps as part of the exit decision, allowing more precise exiting with reduced accuracy loss (see the early-stopping sketch below).
  • Activation sparsification — CoT algorithms carry a lot of context, effectively including early drafts of the final output, which strongly indicates which signals are relevant to the final answer, so dynamic sparsification of activations may be effective.
  • Fused substring KV caching — where a subsequence has already been computed by inference in a previous step, the KV cache for that substring can be reused, a generalization of prefix KV caching (see the cache sketch below).
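
As a concrete illustration of the CCoT item above, the entire change is in the prompt: the model is instructed to keep each reasoning step brief. Here is a minimal Python sketch; the prompt wording is illustrative only, not the specific phrasing evaluated in the Renze and Guven paper cited below.

```python
# Minimal sketch of Concise Chain-of-Thought (CCoT) prompting:
# the model is asked to reason step by step, but tersely.
# The system prompt wording here is illustrative only.

def build_ccot_messages(question: str) -> list[dict]:
    """Build a chat request asking for brief step-by-step reasoning."""
    return [
        {"role": "system",
         "content": ("Think step by step, but keep each reasoning step "
                     "as concise as possible, then state the final answer.")},
        {"role": "user", "content": question},
    ]

msgs = build_ccot_messages("A train covers 120 km in 1.5 hours. Average speed?")
```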
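
Prompt lookup decoding is similarly easy to sketch in the multi-step setting: the trailing n-gram of the text generated so far is searched for in the texts of prior reasoning steps, and the tokens that followed the match are proposed as a speculative draft for the main model to verify. The sketch below uses plain integer token ids; the n-gram and draft lengths are arbitrary choices here, and a real system would tokenize with the model's own tokenizer and verify the draft in one batched forward pass.

```python
# Minimal sketch of prompt lookup decoding over prior reasoning steps:
# match the trailing n-gram of the current output in earlier step texts,
# and propose the tokens that followed it as a speculative draft.

def lookup_draft(generated: list[int], prior_steps: list[list[int]],
                 ngram: int = 3, max_draft: int = 8) -> list[int]:
    """Propose draft tokens by matching the trailing n-gram in prior steps."""
    if len(generated) < ngram:
        return []
    key = generated[-ngram:]
    for step in reversed(prior_steps):       # prefer the most recent step
        for i in range(len(step) - ngram):
            if step[i:i + ngram] == key:
                return step[i + ngram:i + ngram + max_draft]
    return []                                # no match: fall back to normal decoding

# Example: the span (7, 8, 9) occurred in an earlier reasoning step,
# so the tokens that followed it become the draft.
prior = [[1, 2, 3, 7, 8, 9, 10, 11, 12]]
print(lookup_draft([5, 6, 7, 8, 9], prior))  # -> [10, 11, 12]
```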
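
For cross-step early exiting, one simple rule, in the spirit of the early-stopping self-consistency paper cited below (Li et al., 2024), is to stop sampling additional reasoning paths once the answers from a small window of recent paths all agree. This sketch assumes a sample_path() callable that runs one reasoning path and returns its extracted final answer.

```python
# Minimal sketch of early stopping across reasoning paths: stop sampling
# once the last `window` answers agree, then return the majority answer.

from collections import Counter
from typing import Callable

def early_stop_answers(sample_path: Callable[[], str],
                       max_paths: int = 16, window: int = 3) -> str:
    """Sample reasoning paths, exiting early once recent answers agree."""
    answers: list[str] = []
    for _ in range(max_paths):
        answers.append(sample_path())
        recent = answers[-window:]
        if len(recent) == window and len(set(recent)) == 1:
            break                      # the last `window` paths agree: exit early
    return Counter(answers).most_common(1)[0][0]   # majority vote

# Example with a stand-in sampler: stops after 3 paths instead of 16.
print(early_stop_answers(lambda: "42"))  # -> "42"
```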
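
Finally, a minimal sketch of fused substring KV caching: KV segments computed for a token span in one step are keyed by the span itself, so a later step that repeats the same span can splice in the cached segment instead of recomputing it. A real implementation must also fix up position encodings for the new offset (e.g., re-rotating RoPE keys), which this sketch omits.

```python
# Minimal sketch of substring-level KV cache reuse, a generalization of
# prefix KV caching: cache entries are keyed by the exact token span.

class SubstringKVCache:
    def __init__(self, min_len: int = 8):
        self.min_len = min_len                 # skip spans too short to pay off
        self.segments: dict[tuple[int, ...], object] = {}

    def put(self, tokens: list[int], kv_segment: object) -> None:
        """Store the KV tensors computed for this exact token span."""
        if len(tokens) >= self.min_len:
            self.segments[tuple(tokens)] = kv_segment

    def get(self, tokens: list[int]):
        """Return a cached KV segment for this span, or None on a miss."""
        return self.segments.get(tuple(tokens))
```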

Another factor is that each step of a multi-step inference algorithm takes an input text and produces an output text. What does this sound like? Well, it's editing and revising! Hence, research on LLM-based editing and revision of text may also be applicable.

Research on Reasoning Inference Optimization

Research papers on speeding up reasoning algorithms:

  • OpenAI, Dec 2024, OpenAI o1 and new tools for developers, https://openai.com/index/o1-and-new-tools-for-developers/ ("Lower latency: o1 uses on average 60% fewer reasoning tokens than o1-preview for a given request.")
  • 13 Sep 2024, Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding, https://arxiv.org/abs/2409.08561 (Compressing the interim token sequences in Chain-of-Thought.)
  • Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber, 2 Nov 2023, Implicit Chain of Thought Reasoning via Knowledge Distillation, https://arxiv.org/abs/2311.01460 (Knowledge distillation applied to optimizing the interim computations in Chain-of-Thought.)
  • Yu Kang, Xianghui Sun, Liangyu Chen, Wei Zou, 16 Dec 2024, C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness, https://arxiv.org/abs/2412.11664 (Token pruning and prompt compression for Chain-of-Thought.)
  • Hongxuan Zhang, Zhining Liu, Yao Zhao, Jiaqi Zheng, Chenyi Zhuang, Jinjie Gu, Guihai Chen, 4 Jun 2024 (v2), Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster, https://arxiv.org/abs/2311.08263
  • Jeffrey Cheng, Benjamin Van Durme, 17 Dec 2024, Compressed Chain of Thought: Efficient Reasoning Through Dense Representations, https://arxiv.org/abs/2412.13171 (Context compression applied to interim CoT reasoning steps.)
  • Libo Wang, 11 Dec 2024 (v4), Reducing Reasoning Costs -- The Path of Optimization for Chain of Thought via Sparse Attention Mechanism, https://arxiv.org/abs/2411.09111 https://github.com/brucewang123456789/GeniusTrail.git
  • Sachin Kumar, Sep 17, 2024, Hidden Chain-of-Thought decoding: faster and efficient CoT decoding to improve reasoning of LLMs, https://medium.com/@techsachin/hidden-chain-of-thought-decoding-faster-and-efficient-cot-decoding-to-improve-reasoning-of-llms-d95584bc9346
  • Devmallya Karar, Oct 4, 2024, Chain-Of-Thought (CoT) in Large Language Models prompting and Concise CoT with Code implementation using Python and PyTorch, https://medium.com/@devmallyakarar/chain-of-thought-cot-in-large-language-models-prompting-and-concise-cot-with-code-82821f9a832d
  • Cobus Greyling, Jan 24, 2024, Concise Chain-of-Thought (CCoT) Prompting, https://cobusgreyling.substack.com/p/concise-chain-of-thought-ccot-prompting (Traditional CoT comes at a cost of increased output token usage; CCoT prompting is a prompt-engineering technique aimed at reducing LLM response verbosity and inference time.)
  • Matthew Renze, Erhan Guven, 19 Oct 2024 (v3), The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models, https://arxiv.org/abs/2401.05618 https://github.com/matthewrenze/jhu-concise-cot (The original paper on Concise CoT.)
  • Tyler McDonald, Anthony Colosimo, Yifeng Li, Ali Emami, 2 Dec 2024, Can We Afford The Perfect Prompt? Balancing Cost and Accuracy with the Economical Prompting Index, https://arxiv.org/abs/2412.01690
  • David Spuler, Dec 21st, 2024, Multi-Step Reasoning Inference Optimization, Aussie AI Blog, https://www.aussieai.com/blog/reasoning-inference-optimization
  • Dian Yu, Yuheng Zhang, Jiahao Xu, Tian Liang, Linfeng Song, Zhaopeng Tu, Haitao Mi, Dong Yu, 22 Dec 2024, Teaching LLMs to Refine with Tools, https://arxiv.org/abs/2412.16871
  • Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin, 31 Oct 2024 (v2), Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs, https://arxiv.org/abs/2406.09136 https://github.com/sail-sg/CPO
  • Tingxu Han, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen, Zhenting Wang, 30 Dec 2024 (v2), Token-Budget-Aware LLM Reasoning, https://arxiv.org/abs/2412.18547 https://github.com/GeniusHTX/TALE
  • Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi, 14 Aug 2024 (v2), CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference, https://arxiv.org/abs/2310.10845
  • Shiv Sakhuja, 25 Sep 2024, Chain-of-Thought (CoT) Prompting Explained: 7 Techniques for Optimizing AI Performance, https://hub.athina.ai/athina-originals/guides-chain-of-thought-cot-prompting-explained-7-techniques-for-optimizing-ai-performance/
  • Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Cheng Jiayang, Yue Zhang, Xipeng Qiu, and Zheng Zhang. 2024. Can language models learn to skip steps? In The Thirty-eighth Annual Conference on Neural Information Processing Systems. https://arxiv.org/abs/2411.01855
  • Mayi Xu, Yunfeng Ning, Yongqi Li, Jianhao Chen, Jintao Wen, Yao Xiao, Shen Zhou, Birong Pan, Zepeng Bao, Xin Miao, Hankun Kang, Ke Sun, Tieyun Qian, 2 Jan 2025, Reasoning based on symbolic and parametric knowledge bases: a survey, https://arxiv.org/abs/2501.01030 (Extensive survey of reasoning from CoT to knowledge graphs to table-based reasoning.)
  • Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Dec 2024, Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, https://arxiv.org/abs/2412.21187
  • Zirui Zhao, Hanze Dong, Amrita Saha, Caiming Xiong, Doyen Sahoo, 10 Oct 2024, Automatic Curriculum Expert Iteration for Reliable LLM Reasoning, https://arxiv.org/abs/2410.07627 (Efficiency of bailing out with "I don't know" or refusals versus continuing reasoning steps.)
  • Mehul Damani, Idan Shenfeld, Andi Peng, Andreea Bobu, Jacob Andreas, 7 Oct 2024, Learning How Hard to Think: Input-Adaptive Allocation of LM Computation, https://arxiv.org/abs/2410.04707
  • Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li, 24 Aug 2024, Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning, https://arxiv.org/abs/2408.13457
  • Rohin Manvi, Anikait Singh, Stefano Ermon, 3 Oct 2024, Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation, https://arxiv.org/abs/2410.02725
  • Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, and Kan Li, 19 Jan 2024, Escape sky-high cost: Early-stopping self-consistency for multi-step reasoning. The Twelfth International Conference on Learning Representations, 2024, https://arxiv.org/abs/2401.10480 https://github.com/Yiwei98/ESC (Uses "early stopping" idea to improve CoT efficiency during inference.)
  • Jiace Zhu, Yingtao Shen, Jie Zhao, An Zou, 25 Aug 2024, Path-Consistency: Prefix Enhancement for Efficient Inference in LLM, https://arxiv.org/abs/2409.01281 (Uses the confidence calculations from earlier branches of the reasoning to improve efficiency.)
  • Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam, 16 Nov 2023 (v2), Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs, EMNLP 2023, https://arxiv.org/abs/2305.11860 https://www.sample-step-by-step.info/
  • Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
  • Sania Nayab, Giulio Rossolini, Giorgio Buttazzo, Nicolamaria Manes, Fabrizio Giacomelli, 29 Jul 2024, Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost, https://arxiv.org/abs/2407.19825
  • Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao, 8 Feb 2024 (v3), Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning, https://arxiv.org/abs/2310.03094 (Efficient CoT using smaller models.)
  • Wenqing Chen, Weicheng Wang, Zhixuan Chu, Kui Ren, Zibin Zheng, and Zhichao Lu. 2024. Self-Para-Consistency: Improving Reasoning Tasks at Low Cost for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 14162–14167, Bangkok, Thailand. Association for Computational Linguistics. https://aclanthology.org/2024.findings-acl.842/ (Generate multiple paraphrased answers, which can reduce tokens as fewer are needed.)
  • Zhen Li, Yupeng Su, Runming Yang, Zhongwei Xie, Ngai Wong, Hongxia Yang, 6 Jan 2025, Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning, https://arxiv.org/abs/2501.03035 (Analysis of quantization's effect on CoT and math reasoning.)
  • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
  • Janelle Teng, Dec 24, 2024, Unwrapping OpenAI’s o3, https://nextbigteng.substack.com/p/unwrapping-openai-o3-reasoning-model ("...it costs a whopping $17-$20 per task to run o3 in low-compute mode...o3 and other CoT models are currently expensive at inference")
  • Sungjae Lee, Hyejin Park, Jaechang Kim, Jungseul Ok, 10 Jan 2025, Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models, https://arxiv.org/abs/2501.05752 (CoT optimization by avoiding redundant paths that have identical semantics.)
  • Zheqi Lv, Wenkai Wang, Jiawei Wang, Shengyu Zhang, Fei Wu, 10 Jan 2025, Cascaded Self-Evaluation Augmented Training for Efficient Multimodal Large Language Models, https://arxiv.org/abs/2501.05662 (Optimize multimodal CoT by breaking down prompts into smaller sub-goals.)
  • Siddharth Narayanan, James D. Braza, Ryan-Rhys Griffiths, Manu Ponnapati, Albert Bou, Jon Laurent, Ori Kabeli, Geemi Wellawatte, Sam Cox, Samuel G. Rodriques, Andrew D. White, 30 Dec 2024, Aviary: training language agents on challenging scientific tasks, https://arxiv.org/abs/2412.21154 (Using smaller models combined with multi-step reasoning to compete with big models with 100x less inference cost.)
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, 22 Jan 2025, O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning, https://arxiv.org/abs/2501.12570 https://github.com/StarDewXXX/O1-Pruner
  • Zunhai Su, Zhe Chen, Wang Shen, Hanyu Wei, Linge Li, Huangqi Yu, Kehong Yuan, 25 Jan 2025, RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations, https://arxiv.org/abs/2501.16383 (INT2 KV caching with special handling of outliers, RoPE, and attention sinks, and the resulting architecture works in Chain-of-Thought.)
  • Jianing Sun, Zhichao Zhang, Xiaopu Wang, Xinyuan Ji, Yizhi Zhang, October 17, 2024; revised December 23, 2024, Fallback Prompting Guides Large Language Models for Accurate Responses in Complex Reasoning, https://iecscience.org/uploads/jpapers/202501/cGU5HbpaB6LvuhKhZHeAPUzYGglrSN2xxASwBlPH.pdf
  • Zishun Yu, Tengyu Xu, Di Jin, Karthik Abinav Sankararaman, Yun He, Wenxuan Zhou, Zhouhao Zeng, Eryk Helenowski, Chen Zhu, Sinong Wang, Hao Ma, Han Fang, 29 Jan 2025, Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization, https://arxiv.org/abs/2501.17974 (CoT optimization using an inference budget.)
  • Mayi Xu, Yongqi Li, Ke Sun, and Tieyun Qian, 2024, Adaption-of-thought: Learning question difficulty improves large language models for reasoning, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5468–5495, 2024, https://aclanthology.org/2024.emnlp-main.313/ https://aclanthology.org/2024.emnlp-main.313.pdf
  • G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
  • Joonwon Jang, Jaehee Kim, Wonbin Kweon, Hwanjo Yu, 31 Dec 2024 (v2), Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria, https://arxiv.org/abs/2412.21006
  • Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Jan 2025, Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs, https://arxiv.org/abs/2501.18585
  • Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto, 3 Feb 2025 (v2), s1: Simple test-time scaling, https://arxiv.org/abs/2501.19393 https://github.com/simplescaling/s1 (Method of "budget forcing" that allows either shortening or lengthening multi-step reasoning sequences.)
  • Mohammed Karimkhan Pathan, February 3, 2025, Open-source revolution: How DeepSeek-R1 challenges OpenAI’s o1 with superior processing, cost efficiency, https://venturebeat.com/ai/open-source-revolution-how-deepseek-r1-challenges-openais-o1-with-superior-processing-cost-efficiency/
  • Ben Dickson, February 5, 2025, Not every AI prompt deserves multiple seconds of thinking: how Meta is teaching models to prioritize, https://venturebeat.com/ai/not-every-ai-prompt-deserves-multiple-seconds-of-thinking-how-meta-is-teaching-models-to-prioritize/
  • Zhi Zhou, Tan Yuhao, Zenan Li, Yuan Yao, Lan-Zhe Guo, Xiaoxing Ma, Yu-Feng Li, 1 Feb 2025, Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning, https://arxiv.org/abs/2502.00511
  • Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Shishir G. Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica, 11 Feb 2025, LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! https://arxiv.org/abs/2502.07374 https://github.com/NovaSky-AI/SkyThought (Learning to reason with SFT and LoRA.)
  • Daman Arora, Andrea Zanette, 11 Feb 2025 (v2), Training Language Models to Reason Efficiently, https://arxiv.org/abs/2502.04463 https://github.com/Zanette-Labs/efficient-reasoning
  • Jin, M., Yu, Q., Shu, D., Zhao, H., Hua, W., Meng, Y., Zhang, Y., and Du, M. The impact of reasoning step length on large language models. In Ku, L.-W., Martins, A., and Srikumar, V. (eds.), Findings of the Association for Computational Linguistics: ACL 2024, pp. 1830–1842, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-acl.108, https://aclanthology.org/2024.findings-acl.108/ (Shows that token reduction does reduce accuracy in reasoning.)
  • Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li, 17 Feb 2025, TokenSkip: Controllable Chain-of-Thought Compression in LLMs, https://arxiv.org/abs/2502.12067
  • Yingqian Cui, Pengfei He, Jingying Zeng, Hui Liu, Xianfeng Tang, Zhenwei Dai, Yan Han, Chen Luo, Jing Huang, Zhen Li, Suhang Wang, Yue Xing, Jiliang Tang, Qi He, 18 Feb 2025, Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models, https://arxiv.org/abs/2502.13260
  • XYZ Labs, Feb 23, 2025, Open Reasoner Zero: A Breakthrough in AI Training Efficiency Matches DeepSeek with Just 1/30th of Training Steps. Major AI Figures Including Kai-Fu Lee, Harry Shum, and Xiangyu Zhang Unveil Revolutionary Open-Source Training Method. https://xyzlabs.substack.com/p/open-reasoner-zero-a-breakthrough
  • Marthe Ballon, Andres Algaba, Vincent Ginis, 21 Feb 2025, The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer, https://arxiv.org/abs/2502.15631
  • Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang, 21 Feb 2025, LightThinker: Thinking Step-by-Step Compression, https://arxiv.org/abs/2502.15589 https://github.com/zjunlp/LightThinker (Faster CoT by compressing the text of intermediate reasoning steps with gist tokens.)
  • Silei Xu, Wenhao Xie, Lingxiao Zhao, Pengcheng He, 25 Feb 2025, Chain of Draft: Thinking Faster by Writing Less, https://arxiv.org/abs/2502.18600 (Concise CoT method using a per-step inference budget.)
  • Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
  • Ayeong Lee, Ethan Che, Tianyi Peng, 3 Mar 2025, How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach, https://arxiv.org/abs/2503.01141
  • Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang, 25 Feb 2025, Efficient Test-Time Scaling via Self-Calibration, https://arxiv.org/abs/2503.00031
  • Anonymous authors, Mar 2025, Pencil: Long Thoughts with Short Memory, ICLR 2025 review, https://openreview.net/pdf?id=KRI2Fmffqr (Using a "memory" during reasoning steps to rewrite intermediate CoT steps in shorter form.)
  • Y Fu, J Chen, Y Zhuang, Z Fu, I Stoica, H Zhang, Mar 2025, Reasoning Without Self-Doubt: More Efficient Chain-of-Thought Through Certainty Probing, ICLR 2025 review, https://openreview.net/pdf?id=wpK4IMJfdX (Shortening CoT reasoning paths by "probing" the LLM in the middle of the sequence to examine if it already has the final answer, thereby avoiding unnecessary extra reasoning steps.)
  • Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 13 Mar 2025 (v2), Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567 (Massive and broad survey of all types of reasoning.)
  • Ashraf Eassa, Anjali Shah, Huizi Mao, Hao Lu, Erin Ho, Justin Xin and Omri Almog, Mar 18, 2025, NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance, https://developer.nvidia.com/blog/nvidia-blackwell-delivers-world-record-deepseek-r1-inference-performance/
  • Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng, 27 Mar 2025, A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond, https://arxiv.org/abs/2503.21614
  • Yijiong Yu, 26 Mar 2025, Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence, https://arxiv.org/abs/2503.20533 https://github.com/yuyijiong/parallel-decoding-in-one-sequence
  • Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen, Xia Hu, 23 Mar 2025 (v2), Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models, https://arxiv.org/abs/2503.16419
  • Zheng Li, Qingxiu Dong, Jingyuan Ma, Di Zhang, Zhifang Sui, 16 May 2025, SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning, https://arxiv.org/abs/2505.11274
  • Chengyu Huang, Zhengxin Zhang, Claire Cardie, 16 May 2025, HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization, https://arxiv.org/abs/2505.11225
  • Songjun Tu, Jiahao Lin, Qichao Zhang, Xiangyu Tian, Linjing Li, Xiangyuan Lan, Dongbin Zhao, 16 May 2025, Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL, https://arxiv.org/abs/2505.10832
  • Zemin Huang, Zhiyang Chen, Zijun Wang, Tiancheng Li, Guo-Jun Qi, 21 May 2025 (v2), Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models, https://arxiv.org/abs/2505.10446
  • Xuechen Zhang, Zijian Huang, Chenshun Ni, Ziyang Xiong, Jiasi Chen, Samet Oymak, 14 May 2025 (v2), Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement, https://arxiv.org/abs/2505.07961
  • Muzhi Dai, Chenxu Yang, Qingyi Si, 17 May 2025 (v2), S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models, https://arxiv.org/abs/2505.07686
  • Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong, 21 May 2025 (v2), Scalable Chain of Thoughts via Elastic Reasoning, https://arxiv.org/abs/2505.05315 https://github.com/SalesforceAIResearch/Elastic-Reasoning
  • Bin Yu, Hang Yuan, Haotian Li, Xueyin Xu, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen, 21 May 2025 (v2), Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models, https://arxiv.org/abs/2505.03469
  • Jingyang Yi, Jiazheng Wang, Sida Li, 16 May 2025 (v2), ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning, https://arxiv.org/abs/2504.21370
  • Jiin Kim, Byeongjun Shin, Jinha Chung, Minsoo Rhu, 4 Jun 2025, The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective, https://arxiv.org/abs/2506.04301
  • Guanghao Li, Wenhao Jiang, Mingfeng Chen, Yan Li, Hao Yu, Shuting Dong, Tao Ren, Ming Tang, Chun Yuan, 30 May 2025, SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought, https://arxiv.org/abs/2505.24181
  • Feng Luo, Yu-Neng Chuang, Guanchu Wang, Hoang Anh Duy Le, Shaochen Zhong, Hongyi Liu, Jiayi Yuan, Yang Sui, Vladimir Braverman, Vipin Chaudhary, Xia Hu, 28 May 2025, AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models, https://arxiv.org/abs/2505.22662
  • Siqi Fan, Peng Han, Shuo Shang, Yequan Wang, Aixin Sun, 28 May 2025, CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models, https://arxiv.org/abs/2505.22017
  • Sohyun An, Ruochen Wang, Tianyi Zhou, Cho-Jui Hsieh, 27 May 2025, Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models, https://arxiv.org/abs/2505.21765
  • Xixian Yong, Xiao Zhou, Yingying Zhang, Jinlin Li, Yefeng Zheng, Xian Wu, 23 May 2025, Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens, https://arxiv.org/abs/2505.18237
  • Jinyan Su, Claire Cardie, 23 May 2025, Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards, https://arxiv.org/abs/2505.18298
  • Ruihan Gong, Yue Liu, Wenjie Qu, Mingzhe Du, Yufei He, Yingwei Ma, Yulin Chen, Xiang Liu, Yi Wen, Xinfeng Li, Ruidong Wang, Xinzhong Zhu, Bryan Hooi, Jiaheng Zhang, 26 May 2025, Efficient Reasoning via Chain of Unconscious Thought, https://arxiv.org/abs/2505.19756
  • Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong, 23 May 2025, Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning, https://arxiv.org/abs/2505.17829
  • Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu, 8 Jun 2025, How Far Are We from Optimal Reasoning Efficiency? https://arxiv.org/abs/2506.07104
  • Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty, 23 May 2025, First Finish Search: Efficient Test-Time Scaling in Large Language Models, https://arxiv.org/abs/2505.18149 (Running multiple parallel decoding steps but stopping when the fastest and usually shortest one completes.)
  • Sebastian Raschka, Mar 8, 2025, Inference-Time Compute Scaling Methods to Improve Reasoning Models: Part 1: Inference-Time Compute Scaling Methods, https://sebastianraschka.com/blog/2025/state-of-llm-reasoning-and-inference-scaling.html
  • Joonwon Jang, Jaehee Kim, Wonbin Kweon, Seonghyeon Lee, Hwanjo Yu, July 2025, Verbosity-Aware Rationale Reduction: Sentence-Level Rationale Reduction for Efficient and Effective Reasoning, Findings of the Association for Computational Linguistics: ACL 2025, pages 20769–20784 July 27- August 1, 2025, https://aclanthology.org/2025.findings-acl.1068.pdf
  • Xingyu Wu, Yuchen Yan, Shangke Lyu, Linjuan Wu, Yiwen Qiu, Yongliang Shen, Weiming Lu, Jian Shao, Jun Xiao, Yueting Zhuang, 21 Jul 2025, LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization, https://arxiv.org/abs/2507.15758 https://github.com/zju-real/lapo https://zju-real.github.io/lapo
  • Jason Zhu, Hongyu Li, 13 Jul 2025, Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey, https://arxiv.org/abs/2507.09662
  • Hiroshi Yoshihara, Taiki Yamaguchi, Yuichi Inoue, 11 Jul 2025, A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning, https://arxiv.org/abs/2507.08267 https://github.com/analokmaus/kaggle-aimo2-fast-math-r1
  • Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates, 2 Jul 2025, Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs, https://arxiv.org/abs/2507.02076
  • Yifei Xu, Tusher Chakraborty, Srinagesh Sharma, Leonardo Nunes, Emre Kıcıman, Songwu Lu, Ranveer Chandra, 16 Jun 2025, Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks, https://arxiv.org/abs/2506.13351
  • Bowen Ding, Yuhan Chen, Futing Wang, Lingfeng Ming, Tao Lin, 30 Jun 2025, Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model, https://arxiv.org/abs/2506.23840
  • Shu Yang, Junchao Wu, Xuansheng Wu, Derek Wong, Ninghao Liu, Di Wang, 24 Jun 2025, Is Long-to-Short a Free Lunch? Investigating Inconsistency and Reasoning Efficiency in LRMs, https://arxiv.org/abs/2506.19492
  • Zhensheng Jin, Xinze Li, Yifan Ji, Chunyi Peng, Zhenghao Liu, Qi Shi, Yukun Yan, Shuo Wang, Furong Peng, Ge Yu, 12 Jun 2025, ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization, https://arxiv.org/abs/2506.10822 https://github.com/NEUIR/ReCUT
  • Ye Yu, Yaoning Yu, Haohan Wang, 12 Jun 2025, PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models, https://arxiv.org/abs/2506.10716
  • Zehui Ling, Deshu Chen, Hongwei Zhang, Yifeng Jiao, Xin Guo, Yuan Cheng, 12 Jun 2025, Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty, https://arxiv.org/abs/2506.10446
  • Guochao Jiang, Guofeng Quan, Zepeng Ding, Ziqin Luo, Dixuan Wang, Zheng Hu, 20 May 2025, FlashThink: An Early Exit Method For Efficient Reasoning, https://arxiv.org/abs/2505.13949
  • Jiwon Song, Dongwon Jo, Yulhwa Kim, Jae-Joon Kim, 20 May 2025, Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning, https://arxiv.org/abs/2505.13866 https://github.com/jiwonsong-dev/ReasoningPathCompression
  • Yuhang Wang, Youhe Jiang, Bin Cui, Fangcheng Fu, 19 May 2025, Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately, https://arxiv.org/abs/2505.13326
  • Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, (many more authors), 30 Jun 2025 (v4), Llama-Nemotron: Efficient Reasoning Models, https://arxiv.org/abs/2505.00949
  • Jikai Wang, Juntao Li, Jianye Hou, Bowen Yan, Lijun Wu, Min Zhang, 25 May 2025 (v2), Efficient Reasoning for LLMs through Speculative Chain-of-Thought, https://arxiv.org/abs/2504.19095 https://github.com/Jikai0Wang/Speculative_CoT
  • Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr, 21 Apr 2025, Learning Adaptive Parallel Reasoning with Language Models, https://arxiv.org/abs/2504.15466
  • Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Qiaowei Li, Zheng Lin, Li Cao, Weiping Wang, 17 May 2025 (v2), Dynamic Early Exit in Reasoning Models, https://arxiv.org/abs/2504.15895
  • Yule Liu, Jingyi Zheng, Zhen Sun, Zifan Peng, Wenhan Dong, Zeyang Sha, Shiwen Cui, Weiqiang Wang, Xinlei He, 4 Aug 2025 (v2), Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models, https://arxiv.org/abs/2504.13626 (Using smaller models to generate the interim reasoning steps for multi-step reasoning.)
  • Sicheng Feng, Gongfan Fang, Xinyin Ma, Xinchao Wang, 15 Apr 2025, Efficient Reasoning Models: A Survey, https://arxiv.org/abs/2504.10903
  • Tim, Nous Research, Aug 14, 2025, Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark, https://nousresearch.com/measuring-thinking-efficiency-in-reasoning-models-the-missing-benchmark/
  • Bin Hong, Jiayu Liu, Zhenya Huang, Kai Zhang, Mengdi Zhang, 13 Aug 2025, Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization, https://arxiv.org/abs/2508.10164
  • Qiguang Chen, Dengyun Peng, Jinhao Liu, HuiKang Su, Jiannan Guan, Libo Qin, Wanxiang Che, 15 Aug 2025, Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models, https://arxiv.org/abs/2508.11582
  • Yufeng Zhao, Junnan Liu, Hongwei Liu, Dongsheng Zhu, Yuan Shen, Songyang Zhang, Kai Chen, 21 Aug 2025, Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis, https://arxiv.org/abs/2508.15754
  • Lijie Yang, Zhihao Zhang, Arti Jain, Shijie Cao, Baihong Yuan, Yiwei Chen, Zhihao Jia, Ravi Netravali, 9 Aug 2025, Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning, https://arxiv.org/abs/2508.07101
  • Yeonjun In, Wonjoong Kim, Sangwu Park, Chanyoung Park, 1 Aug 2025, R1-ACT: Efficient Reasoning Model Safety Alignment by Activating Safety Knowledge, https://arxiv.org/abs/2508.00324
  • Jiameng Huang, Baijiong Lin, Guhao Feng, Jierun Chen, Di He, and Lu Hou, 7 Aug 2025, Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression, https://arxiv.org/abs/2508.05337
  • Hasan Abed Al Kader Hammoud, Kumail Alhamoud, Abed Hammoud, Elie Bou-Zeid, Marzyeh Ghassemi, Bernard Ghanem, 12 Aug 2025, Train Long, Think Short: Curriculum Learning for Efficient Reasoning, https://arxiv.org/abs/2508.08940
  • Jianshu She, Zhuohao Li, Zhemin Huang, Qi Li, Peiran Xu, Haonan Li, Qirong Ho, 19 Aug 2025, Hawkeye: Efficient Reasoning with Model Collaboration, https://arxiv.org/abs/2504.00424
  • Chuhuai Yue, Chengqi Dong, Yinan Gao, Hang He, Jiajun Chai, Guojun Yin and Wei Lin, 14 Aug 2025, Promoting Efficient Reasoning with Verifiable Stepwise Reward, https://arxiv.org/abs/2508.10293
  • Zhuokun Chen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, and Bohan Zhuang, 23 Jul 2025, R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning, https://arxiv.org/abs/2507.17307
  • Wei Sun, Qianlong Du, Fuwei Cui, Jiajun Zhang, 23 Jul 2025, An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning, https://arxiv.org/abs/2503.02382
  • Yanjun Zheng, Xiyang Du, Longfei Liao, Xiaoke Zhao, Zhaowen Zhou, Bo Zhang, Jiawei Liu, Xiang Qi, Zhe Li, Zhiqiang Zhang, Wei Wang and Peng Zhang, 23 Jul 2025, Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning, https://arxiv.org/abs/2507.16802
  • Shangke Lyu, Linjuan Wu, Yuchen Yan, Xingyu Wu, Hao Li, Yongliang Shen, Peisheng Jiang, Weiming Lu, Jun Xiao, Yueting Zhuang, 22 Jul 2025, Hierarchical Budget Policy Optimization for Adaptive Reasoning, https://arxiv.org/abs/2507.15844
  • Datta Nimmaturi, Vaishnavi Bhargava, Rajat Ghosh, Johnu George, Debojyoti Dutta, 24 Jul 2025, Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models, https://arxiv.org/abs/2507.18014
  • Bowen Zhang, Pengcheng Luo, 24 Jul 2025, OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problems with Reasoning LLM, https://arxiv.org/abs/2503.10009
  • Jiaao Li, Kaiyuan Li, Chen Gao, Yong Li, Xinlei Chen, 21 Jul 2025, EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent, https://arxiv.org/abs/2507.15428
  • Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou, 21 Jul 2025, Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning, https://arxiv.org/abs/2505.16122
  • Bo-Cheng Chiu, Jen-Jee Chen, Yu-Chee Tseng and Feng-Chi Chen, 21 Jul 2025, DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs, https://arxiv.org/abs/2506.11558
  • Zhenpeng Su, Leiyu Pan, Xue Bai, Dening Liu, Guanting Dong, Jiaming Huang, Wenping Hu, Guorui Zhou, 11 Aug 2025, Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization, https://arxiv.org/abs/2508.07629
  • Kaiwen Chen, Xin Tan, Minchen Yu, Hong Xu, 29 Jul 2025, MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse, https://arxiv.org/abs/2507.21433
  • Tao He, Rongchuan Mu, Lizi Liao, Yixin Cao, Ming Liu, and Bing Qin, 31 Jul 2025, Good Learners Think Their Thinking: Generative PRM Makes Large Reasoning Model More Efficient Math Learner, https://arxiv.org/abs/2507.23317
  • Haoran Sun, Shaoning Zeng, 23 Jul 2025, Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents, https://arxiv.org/abs/2507.22925
  • Chuan Li, Qianyi Zhao, Fengran Mo, Cen Chen, 7 Aug 2025, FedCoT: Communication-Efficient Federated Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2508.10020
  • Kai Zhao, Yanjun Zhao, Jiaming Song, Shien He, Lusheng Zhang, Qiang Zhang, Tianjiao Li, 8 Aug 2025, SABER: Switchable and Balanced Training for Efficient LLM Reasoning, https://arxiv.org/abs/2508.10026
  • Xiaotao Feng, Xiaogang Zhu, Kun Hu, Jincheng Wang, Yingjie Cao, Guang Gong, Jianfeng Pan, 30 Jun 2025, Fuzzing: Randomness? Reasoning! Efficient Directed Fuzzing via Large Language Models, https://arxiv.org/abs/2507.22065
  • Yingxu Wang, Shiqi Fan, Mengzhu Wang, Siwei Liu, 1 Aug 2025, Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA, https://arxiv.org/abs/2508.00719
  • Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Daxin Jiang, Binxing Jiao, Chen Hu, Huacan Wang, 4 Aug 2025, SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents, https://arxiv.org/abs/2508.02085
  • Linan Yue, Yichao Du, Yizhi Wang, Weibo Gao, Fangzhou Yao, Li Wang, Ye Liu, Ziyu Xu, Qi Liu, Shimin Di, Min-Ling Zhang, 4 Aug 2025, Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models, https://arxiv.org/abs/2508.02120
  • Newman Cheng, Gordon Broadbent, William Chappell, 4 Aug 2025, Cognitive Loop via In-Situ Optimization: Self-Adaptive Reasoning for Science, https://arxiv.org/abs/2508.02789
  • Shuo Cai, Su Lu, Qi Zhou, Kejing Yang, Zhijie Sang, Congkai Xie, Hongxia Yang, 7 Aug 2025, InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities, https://arxiv.org/abs/2508.05496
  • Wenhao Zeng, Yaoning Wang, Chao Hu, Yuling Shi, Chengcheng Wan, Hongyu Zhang, Xiaodong Gu, 8 Aug 2025, Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal, https://arxiv.org/abs/2508.05988
  • Chen Li, Han Zhang, Zhantao Yang, Fangyi Chen, Zihan Wang, Anudeepsekhar Bolimera, Marios Savvides, 12 Aug 2025, STELAR-VISION: Self-Topology-Aware Efficient Learning for Aligned Reasoning in Vision, https://arxiv.org/abs/2508.08688
  • Xiaojun Wu, Xiaoguang Jiang, Huiyang Li, Jucai Zhai, Dengfeng Liu, Qiaobo Hao, Huang Liu, Zhiguo Yang, Ji Xie, Ninglun Gu, Jin Yang, Kailai Zhang, Yelun Bao, Jun Wang, 13 Aug 2025, Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning, https://arxiv.org/abs/2508.09883
  • Vaishnavi Shrivastava, Ahmed Awadallah, Vidhisha Balachandran, Shivam Garg, Harkirat Behl, Dimitris Papailiopoulos, 13 Aug 2025, Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning, https://arxiv.org/abs/2508.09726
  • Yuyang Xu, Yi Cheng, Haochao Ying, Zhuoyun Du, Renjun Hu, Xing Shi, Wei Lin, Jian Wu, 18 Aug 2025, SSPO: Self-traced Step-wise Preference Optimization for Process Supervision and Reasoning Compression, https://arxiv.org/abs/2508.12604
  • Beinuo Yang, Qishen Zhou, Junyi Li, Xingchen Su, Simon Hu, 20 Aug 2025, Automated Optimization Modeling through Expert-Guided Large Language Model Reasoning, https://arxiv.org/abs/2508.14410
  • NVIDIA: Aarti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, Adi Renduchintala, Adithya Renduchintala, Aditya Malte, Akhiad Bercovich, Akshay Hazare, Alejandra Rico, Aleksander Ficek, Alex Kondratenko, Alex Shaposhnikov, Ali Taghibakhshi, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amy Shen, Andrew Tao, Ann Guan, Anna Shors, Anubhav Mandarwal, Arham Mehta, Arun Venkatesan, Ashton Sharabiani, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Banghua Zhu, Barnaby Simkin, Bilal Kartal, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Brian Yu, Bryan Catanzaro, Charles Wang, Charlie Truong, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Christian Munley, Christopher Parisien, Dan Su, Daniel Afrimi, Daniel Korzekwa, Daniel Rohrer, Daria Gitman, et al. (161 additional authors not shown), 20 Aug 2025, NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model, https://arxiv.org/abs/2508.14444
  • Danlong Yuan, Tian Xie, Shaohan Huang, Zhuocheng Gong, Huishuai Zhang, Chong Luo, Furu Wei, Dongyan Zhao, 22 Aug 2025, Efficient RL Training for Reasoning Models via Length-Aware Optimization, https://arxiv.org/abs/2505.12284
  • Dulhan Jayalath, James Bradley Wendt, Nicholas Monath, Sandeep Tata, Beliz Gunel, 24 Aug 2025, PRISM: Efficient Long-Range Reasoning With Short-Context LLMs, https://arxiv.org/abs/2412.18914
  • Sunguk Choi, Yonghoon Kwon, Heondeuk Lee, 26 Aug 2025, CAC-CoT: Connector-Aware Compact Chain-of-Thought for Efficient Reasoning Data Synthesis Across Dual-System Cognitive Tasks, https://arxiv.org/abs/2508.18743
  • Chenghao Wu, Ruiyang Ren, Junjie Zhang, Ruirui Wang, Zhongrui Ma, Qi Ye, Wayne Xin Zhao, 26 Aug 2025, STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning, https://arxiv.org/abs/2508.18812
  • Wenfeng Feng, Penghong Zhao, Guochao Jiang, Chuzhan Hao, Yuewei Zhang, Hao Wang, 28 Aug 2025, PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning, https://arxiv.org/abs/2508.21104
  • Juhyeon Lee, Wonduk Seo, Hyunjin An, Seunghyun Lee, Yi Bu, 2 Sep 2025, Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization, https://arxiv.org/abs/2509.02093
  • Salah Eddine Bekhouche, Abdellah Zakaria Sellam, Hichem Telli, Cosimo Distante, Abdenour Hadid, 30 Aug 2025, CVPD at QIAS 2025 Shared Task: An Efficient Encoder-Based Approach for Islamic Inheritance Reasoning, https://arxiv.org/abs/2509.00457
  • Sadegh Jafari, Aishwarya Sarkar, Mohiuddin Bilwal, Ali Jannesari, 6 Sep 2025, ProfilingAgent: Profiling-Guided Agentic Reasoning for Adaptive Model Optimization, https://arxiv.org/abs/2509.05584
  • Pranav Pawar, Dhwaj Jain, Varun Gupta, Kaustav Dedhia, Dashrath Kale, Sudhir Dhekane, 8 Sep 2025, Systematic Optimization of Open Source Large Language Models for Mathematical Reasoning, https://arxiv.org/abs/2509.07238
  • Zhoujun Cheng, Richard Fan, Shibo Hao, Taylor W. Killian, Haonan Li, Suqi Sun, Hector Ren, Alexander Moreno, Daqian Zhang, Tianjun Zhong, Yuxin Xiong, Yuanzhe Hu, Yutao Xie, Xudong Han, Yuqi Wang, Varad Pimpalkhute, Yonghao Zhuang, Aaryamonvikram Singh, Xuezhi Liang, Anze Xie, Jianshu She, Desai Fan, Chengqian Gao, Liqun Ma, Mikhail Yurochkin, John Maggs, Xuezhe Ma, Guowei He, Zhiting Hu, Zhengzhong Liu, Eric P. Xing, 9 Sep 2025, K2-Think: A Parameter-Efficient Reasoning System, https://arxiv.org/abs/2509.07604
  • Brennen Hill, 11 Sep 2025, HEFT: A Coarse-to-Fine Hierarchy for Enhancing the Efficiency and Accuracy of Language Model Reasoning, https://arxiv.org/abs/2509.09801
  • Bingning Huang, Tu Nguyen, Matthieu Zimmer, 11 Sep 2025, Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning, https://arxiv.org/abs/2509.09284
  • Xianzhi Li, Ethan Callanan, Abdellah Ghassel, Xiaodan Zhu, 11 Sep 2025, Entropy-Gated Branching for Efficient Test-Time Reasoning, https://arxiv.org/abs/2503.21961
  • Abdarahmane Traore, Éric Hervet, Andy Couturier, 18 Sep 2025, SmolRGPT: Efficient Spatial Reasoning for Warehouse Environments with 600M Parameters, https://arxiv.org/abs/2509.15490
  • Ziqing Qiao, Yongheng Deng, Jiali Zeng, Dong Wang, Lai Wei, Guanbo Wang, Fandong Meng, Jie Zhou, Ju Ren, Yaoxue Zhang, 19 Sep 2025, ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning, https://arxiv.org/abs/2505.04881
  • Simon A. Aytes, Jinheon Baek, Sung Ju Hwang, 16 Sep 2025, Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching, https://arxiv.org/abs/2503.05179
  • Zhaohui Yang, Yuxiao Ye, Shilei Jiang, Chen Hu, Linjing Li, Shihong Deng, Daxin Jiang, 15 Sep 2025, Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning, https://arxiv.org/abs/2505.14403
  • Pratik Jayarao, Himanshu Gupta, Neeraj Varshney, Chaitanya Dwivedi, 9 Sep 2025, Explicit Reasoning Makes Better Judges: A Systematic Study on Accuracy, Efficiency, and Robustness, https://arxiv.org/abs/2509.13332
  • Qikai Chang, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Yicheng Pan, Jianshu Zhang, Jun Du, Quan Liu, Jianqing Gao, 17 Sep 2025, THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning, https://arxiv.org/abs/2509.13761
  • Zhenqi Wu, Abhinav Modi, Angelos Mavrogiannis, Kaustubh Joshi, Nikhil Chopra, Yiannis Aloimonos, Nare Karapetyan, Ioannis Rekleitis, Xiaomin Lin, 17 Sep 2025, DREAM: Domain-aware Reasoning for Efficient Autonomous Underwater Monitoring, https://arxiv.org/abs/2509.13666
  • Giovanni Monea, Yair Feldman, Shankar Padmanabhan, Kianté Brantley, Yoav Artzi, 15 Oct 2025, Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons, https://arxiv.org/abs/2510.13797
  • Yujian Zhang, Keyu Chen, Zhifeng Shen, Ruizhi Qiao, Xing Sun, 14 Oct 2025 (v2), Adaptive Dual Reasoner: Large Reasoning Models Can Think Efficiently by Hybrid Reasoning, https://arxiv.org/abs/2510.10207
  • Canhui Wu, Qiong Cao, Chang Li, Zhenfang Wang, Chao Xue, Yuwei Fan, Wei Xi, Xiaodong He, 4 Oct 2025, Beyond Token Length: Step Pruner for Efficient and Accurate Reasoning in Large Language Models, https://arxiv.org/abs/2510.03805
  • Shaoan Xie, Lingjing Kong, Xiangchen Song, Xinshuai Dong, Guangyi Chen, Eric P. Xing, and Kun Zhang, 2 Oct 2025, Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models, https://arxiv.org/abs/2510.01544
  • Akshat Ramachandran, Marina Neseem, Charbel Sakr, Rangharajan Venkatesan, Brucek Khailany, Tushar Krishna, 1 Oct 2025, ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models, https://arxiv.org/abs/2510.01290
  • Weizhe Chen, Sven Koenig, Bistra Dilkina, 1 Oct 2025, LSPO: Length-aware Dynamic Sampling for Policy Optimization in LLM Reasoning, https://arxiv.org/abs/2510.01459
  • Jiashun Liu, Johan Obando-Ceron, Han Lu, Yancheng He, Weixun Wang, Wenbo Su, Bo Zheng, Pablo Samuel Castro, Aaron Courville, Ling Pan, 2 Oct 2025, Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning, https://arxiv.org/abs/2510.01656
  • Sunzhu Li, Zhiyu Lin, Shuling Yang, Jiale Zhao, Wei Chen, 14 Oct 2025, ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization, https://arxiv.org/abs/2510.12063
  • Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, Zhiqiang Xu, 14 Oct 2025, Concise Reasoning in the Lens of Lagrangian Optimization, https://arxiv.org/abs/2510.10168
  • Haoyue Zhang, Hualei Zhang, Xiaosong Ma, Jie Zhang, Song Guo, 14 Oct 2025, LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning, https://arxiv.org/abs/2506.15969
  • Dongqi Zheng, 29 Sep 2025, ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models, https://arxiv.org/abs/2510.00071
  • Yunhao Wang, Ziting Li, Shuai Chen, Tao Liu, Chao Song, Junjie Jiang, Jian Zhu, Peng Gao, Bin Qin, 1 Oct 2025, ACPO: Adaptive Curriculum Policy Optimization for Aligning Vision-Language Models in Complex Reasoning, https://arxiv.org/abs/2510.00690
  • Luckeciano C. Melo, Alessandro Abate, Yarin Gal, 1 Oct 2025, Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning, https://arxiv.org/abs/2510.00819
  • Yongcheng Zeng, Zexu Sun, Bokai Ji, Erxue Min, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Haifeng Zhang, Xu Chen, Jun Wang, 1 Oct 2025, CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs, https://arxiv.org/abs/2510.01037
  • Oussama Gabouj, Kamel Charaf, Ivan Zakazov, Nicolas Baldwin, Robert West, 1 Oct 2025, GRAD: Generative Retrieval-Aligned Demonstration Sampler for Efficient Few-Shot Reasoning, https://arxiv.org/abs/2510.01165
  • Siao Tang, Xinyin Ma, Gongfan Fang, Xinchao Wang, 1 Oct 2025, ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation, https://arxiv.org/abs/2506.18810
  • Peter Chen, Xiaopeng Li, Ziniu Li, Xi Chen, Tianyi Lin, 1 Oct 2025, Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO, https://arxiv.org/abs/2505.11595
  • Anisha Garg, Engin Tekin, Yash More, David Bick, Nishit Neema, Ganesh Venkatesh, 24 Sep 2025, Calibrated Reasoning: An Explanatory Verifier for Dynamic and Efficient Problem-Solving, https://arxiv.org/abs/2509.19681
  • Sujun Tang, Christopher Priebe, Rohan Mahapatra, Lianhui Qin, Hadi Esmaeilzadeh, 27 Oct 2025, REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving, https://arxiv.org/abs/2506.01374
  • Shiqi He, Yue Cui, Xinyu Ma, Yaliang Li, Bolin Ding, Mosharaf Chowdhury, 18 Oct 2025, Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory, https://arxiv.org/abs/2510.19838
  • Junfeng Gong, Zhiyi Wei, Junying Chen, Cheng Liu, Huawei Li, 22 Oct 2025, From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph, https://arxiv.org/abs/2510.19873
  • Ling Team, Bin Han, Caizhi Tang, Chen Liang, Donghao Zhang, Fan Yuan, Feng Zhu, Jie Gao, Jingyu Hu, Longfei Li, Meng Li, Mingyang Zhang, Peijie Jiang, Peng Jiao, Qian Zhao, Qingyuan Yang, Wenbo Shen, Xinxing Yang, Yalin Zhang, Yankun Ren, Yao Zhao, Yibo Cao, Yixuan Sun, Yue Zhang, Yuchen Fang, Zibin Lin, Zixuan Cheng, Jun Zhou, 23 Oct 2025, Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning, https://arxiv.org/abs/2510.19338
  • Xiaozhe Li, Xinyu Fang, Shengyuan Ding, Linyang Li, Haodong Duan, Qingwen Liu, Kai Chen, 18 Oct 2025, NP-Engine: Empowering Optimization Reasoning in Large Language Models with Verifiable Synthetic NP Problems, https://arxiv.org/abs/2510.16476
  • Wonduk Seo, Juhyeon Lee, Junseo Koh, Hyunjin An, Jian Park, Seunghyun Lee, Haihua Chen, Yi Bu, 18 Oct 2025, Prompt Optimization via Retrieved Reasoning Assets and Multi-Agent Analysis, https://arxiv.org/abs/2510.16635
  • Deuksin Kwon, Jiwon Hae, Emma Clift, Daniel Shamsoddini, Jonathan Gratch, Gale M. Lucas, 19 Sep 2025, ASTRA: A Negotiation Agent with Adaptive and Strategic Reasoning via Tool-integrated Action for Dynamic Offer Optimization, https://arxiv.org/abs/2503.07129
  • Yiting Wang, Wanghao Ye, Ping Guo, Yexiao He, Ziyao Wang, Bowei Tian, Shwai He, Guoheng Sun, Zheyu Shen, Sihan Chen, Ankur Srivastava, Qingfu Zhang, Gang Qu, Ang Li, 22 Sep 2025, SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning, https://arxiv.org/abs/2504.10369
  • Yuyang Ding, Chi Zhang, Juntao Li, Haibin Lin, Xin Liu, Min Zhang, 26 Oct 2025, FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning, https://arxiv.org/abs/2510.22543
  • Yiwen Tang, Qiuyu Zhao, Zenghui Sun, Jinsong Lan, Xiaoyong Zhu, Bo Zheng, Kaifu Zhang, 26 Oct 2025, REVISION: Reflective Intent Mining and Online Reasoning Auxiliary for E-commerce Visual Search System Optimization, https://arxiv.org/abs/2510.22739
  • Liliang Ren, Congcong Chen, Haoran Xu, Young Jin Kim, Adam Atkinson, Zheng Zhan, Jiankai Sun, Baolin Peng, Liyuan Liu, Shuohang Wang, Hao Cheng, Jianfeng Gao, Weizhu Chen, Yelong Shen, 25 Oct 2025, Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation, https://arxiv.org/abs/2507.06607
  • Zehui Ling, Deshu Chen, Yichi Zhang, Yuchen Liu, Xigui Li, Xin Guo, Yuan Cheng, 15 Oct 2025, Adaptive Reasoning Executor: A Collaborative Agent System for Efficient Reasoning, https://arxiv.org/abs/2510.13214
  • Yang Li, Zhichen Dong, Yuhan Sun, Weixun Wang, Shaopan Xiong, Yijia Luo, Jiashun Liu, Han Lu, Jiamang Wang, Wenbo Su, Bo Zheng, Junchi Yan, 15 Oct 2025, Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization, https://arxiv.org/abs/2510.13554
  • Ammar Ahmed, Azal Ahmad Khan, Ayaan Ahmad, Sheng Di, Zirui Liu, Ali Anwar, 26 Sep 2025, Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts, https://arxiv.org/abs/2509.21743
  • Boyang Liu, Yifan Hu, Senjie Jin, Shihan Dou, Gonglei Shi, Jie Shao, Tao Gui, Xuanjing Huang, 26 Sep 2025, Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization, https://arxiv.org/abs/2509.21871
  • Hongyu Shan, Mingyang Song, Chang Dai, Di Liang, Han Chen, 26 Sep 2025, R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning, https://arxiv.org/abs/2509.22131
  • Zhuo Yang, Daolang Wang, Lingli Ge, Beilun Wang, Tianfan Fu, Yuqiang Li, 26 Sep 2025, Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs, https://arxiv.org/abs/2505.12833
  • Wenxun Wu, Yuanyang Li, Guhan Chen, Linyue Wang, Hongyang Chen, 8 Oct 2025, Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning, https://arxiv.org/abs/2510.07038
  • Ziyan Wang, Zheng Wang, Jie Fu, Xingwei Qu, Qi Cheng, Shengpu Tang, Minjia Zhang, Xiaoming Huo, 8 Oct 2025, Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning, https://arxiv.org/abs/2510.04072
  • Aleksei Arzhantsev, Otmane Sakhi, Flavian Vasile, 3 Oct 2025, RoiRL: Efficient, Self-Supervised Reasoning with Offline Iterative Reinforcement Learning, https://arxiv.org/abs/2510.02892
  • Yang Xiao, Jiashuo Wang, Ruifeng Yuan, Chunpu Xu, Kaishuai Xu, Wenjie Li, Pengfei Liu, 21 Oct 2025, LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling, https://arxiv.org/abs/2505.19187
  • Qihang Ai, Haiyun Jiang, 25 Sep 2025, Parallel Thinking, Sequential Answering: Bridging NAR and AR for Efficient Reasoning, https://arxiv.org/abs/2509.20744
  • Xiao Wang, Jia Wang, Yijie Wang, Pengtao Dang, Sha Cao, Chi Zhang, 24 Sep 2025, MARS: toward more efficient multi-agent collaboration for LLM reasoning, https://arxiv.org/abs/2509.20502
  • Shijie Zhang, Guohao Sun, Kevin Zhang, Xiang Guo, Rujun Guo, 29 Sep 2025, CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning, https://arxiv.org/abs/2509.25004
  • Kaisen Yang, Lixuan He, Rushi Shah, Kaicheng Yang, Qinwei Ma, Dianbo Liu, Alex Lamb, 28 Sep 2025, Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm, https://arxiv.org/abs/2509.23946
  • Weifan Jiang, Rana Shahout, Yilun Du, Michael Mitzenmacher, Minlan Yu, 29 Sep 2025, Intra-request branch orchestration for efficient LLM reasoning, https://arxiv.org/abs/2509.24957
  • Berkcan Kapusuzoglu, Supriyo Chakraborty, Chia-Hsuan Lee, Sambit Sahu, 26 Sep 2025, Critique-Guided Distillation for Efficient and Robust Language Model Reasoning, https://arxiv.org/abs/2505.11628
  • Siddharth Chaudhary, Dev Patel, Maheep Chaudhary, Bennett Browning, 16 Oct 2025, Hydra: A Modular Architecture for Efficient Long-Context Reasoning, https://arxiv.org/abs/2508.15099
  • Gang Li, Yan Chen, Ming Lin, Tianbao Yang, 6 Oct 2025, DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization, https://arxiv.org/abs/2510.04474
  • Mingkang Zhu, Xi Chen, Bei Yu, Hengshuang Zhao, Jiaya Jia, 6 Oct 2025, From Noisy Traces to Stable Gradients: Bias-Variance Optimized Preference Optimization for Aligning Large Reasoning Models, https://arxiv.org/abs/2510.05095
  • Wengao Ye, Yan Liang, Lianlei Shan, 5 Oct 2025, Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization, https://arxiv.org/abs/2510.04182
  • Zhengyang Tang, Zihan Ye, Chenyu Huang, Xuhan Huang, Chengpeng Li, Sihang Li, Guanhua Chen, Ming Yan, Zizhuo Wang, Hongyuan Zha, Dayiheng Liu, Benyou Wang, 5 Oct 2025, CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling, https://arxiv.org/abs/2510.04204
  • Imran Mansha, 6 Oct 2025, Resource-Efficient Fine-Tuning of LLaMA-3.2-3B for Medical Chain-of-Thought Reasoning, https://arxiv.org/abs/2510.05003
  • Hao Zeng, Jianguo Huang, Bingyi Jing, Hongxin Wei, Bo An, 10 Oct 2025, PAC Reasoning: Controlling the Performance Loss for Efficient Reasoning, https://arxiv.org/abs/2510.09133
  • Suming Qiu, Jing Li, Zhicheng Zhou, Junjie Huang, Linyuan Qiu, Zhijie Sun, 10 Oct 2025, HES-SQL: Hybrid Reasoning for Efficient Text-to-SQL with Structural Skeleton Guidance, https://arxiv.org/abs/2510.08896
  • Chen Huang, Wei Lu and Wenxuan Zhang, 10 Oct 2025, PEAR: Phase Entropy Aware Reward for Efficient Reasoning, https://arxiv.org/abs/2510.08026
  • Siyong Chen, Jinbo Wen, Jiawen Kang, Tenghui Huang, Xumin Huang, Yuanjia Su, Hudan Pan, Zishao Zhong, Dusit Niyato, Shengli Xie, Dong In Kim, 24 Oct 2025, MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning, https://arxiv.org/abs/2510.21093
  • Ravindra Aribowo Tarunokusumo, Rafael Fernandes Cunha, 24 Oct 2025, Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning, https://arxiv.org/abs/2510.21398
  • Ziyang Wang, Jaehong Yoon, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal, 24 Oct 2025, Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning, https://arxiv.org/abs/2507.06485
  • Martina G. Vilas, Safoora Yousefi, Besmira Nushi, Eric Horvitz, Vidhisha Balachandran, 12 Oct 2025, Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning, https://arxiv.org/abs/2510.10494
  • Taiqiang Wu, Runming Yang, Tao Liu, Jiahao Wang, Ngai Wong, 13 Oct 2025, Revisiting Model Interpolation for Efficient Reasoning, https://arxiv.org/abs/2510.10977
  • Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu, 13 Oct 2025, From to : Multidimensional Supervision of Reasoning Process for LLM Optimization, https://arxiv.org/abs/2510.11457
  • Junhyuck Kim, Ethan Ewer, Taehong Moon, Jongho Park, Dimitris Papailiopoulos, 13 Oct 2025, Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models, https://arxiv.org/abs/2510.10964
  • Hongwei Chen, Yishu Lei, Dan Zhang, Bo Ke, Danxiang Zhu, Xuyi Chen, Yuxiang Lu, Zhengjie Huang, Shikun Feng, Jingzhou He, Yu Sun, Hua Wu, Haifeng Wang, 11 Oct 2025, MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning, https://arxiv.org/abs/2510.10293
  • Junjie Lu, Yuliang Liu, Chaofeng Qu, Wei Shen, Zhouhan Lin, Min Xu, 13 Oct 2025, Enhancing LLM Reasoning via Non-Human-Like Reasoning Path Preference Optimization, https://arxiv.org/abs/2510.11104
  • Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavaram, 10 Oct 2025, DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning, https://arxiv.org/abs/2510.09883
  • Heming Xia, Cunxiao Du, Rui Li, Chak Tou Leong, Yongqi Li, Wenjie Li, 12 Oct 2025, Merlin's Whisper: Enabling Efficient Reasoning in LLMs via Black-box Adversarial Prompting, https://arxiv.org/abs/2510.10528
  • Jiaqi Wei, Hao Zhou, Xiang Zhang, Di Zhang, Zijie Qiu, Wei Wei, Jinzhe Li, Wanli Ouyang, Siqi Sun, 11 Oct 2025, Retrieval is Not Enough: Enhancing RAG Reasoning through Test-Time Critique and Optimization, https://arxiv.org/abs/2504.14858
  • Menglan Chen, Xianghe Pang, Jingjing Dong, WenHao Wang, Yaxin Du, Siheng Chen, 13 Oct 2025, VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization, https://arxiv.org/abs/2504.12661
  • Yuhan Sun, Zhiwei Huang, Wanqing Cui, Shaopan Xiong, Yazhi Guo, Meiguang Jin, Junfeng Ma, 9 Oct 2025, LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning, https://arxiv.org/abs/2510.07685
  • Yuchen Zhu, Wei Guo, Jaemoo Choi, Petr Molodyk, Bo Yuan, Molei Tao, Yongxin Chen, 9 Oct 2025, Enhancing Reasoning for Diffusion LLMs via Distribution Matching Policy Optimization, https://arxiv.org/abs/2510.08233
  • Kevin Rojas, Jiahe Lin, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, Molei Tao, Wei Deng, 9 Oct 2025, Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization, https://arxiv.org/abs/2510.08554
  • Yandu Chen, Kefan Gu, Yuqing Wen, Yucheng Zhao, Tiancai Wang, Liqiang Nie, 9 Oct 2025, IntentionVLA: Generalizable and Efficient Embodied Intention Reasoning for Human-Robot Interaction, https://arxiv.org/abs/2510.07778
  • Shuai Wang, Zhenhua Liu, Jiaheng Wei, Xuanwu Yin, Dong Li, Emad Barsoum, 23 Sep 2025, Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models, https://arxiv.org/abs/2506.09532
  • Wonje Choi, Jooyoung Kim, Honguk Woo, 22 Oct 2025, NeSyPr: Neurosymbolic Proceduralization For Efficient Embodied Reasoning, https://arxiv.org/abs/2510.19429
  • Xichen Zhang, Sitong Wu, Yinghao Zhu, Haoru Tan, Shaozuo Yu, Ziyi He, Jiaya Jia, 22 Oct 2025, Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning, https://arxiv.org/abs/2510.19807
  • Gang Li, Yulei Qin, Xiaoyu Tan, Dingkang Yang, Yuchen Shi, Zihan Xu, Xiang Li, Xing Sun, Ke Li, 30 Sep 2025, RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning, https://arxiv.org/abs/2509.25958
  • Xin Xu, Cliveb AI, Kai Yang, Tianhao Chen, Yang Wang, Saiyong Yang, Can Yang, 30 Sep 2025, Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners, https://arxiv.org/abs/2509.26226
  • Runze Liu, Jiakang Wang, Yuling Shi, Zhihui Xie, Chenxin An, Kaiyan Zhang, Jian Zhao, Xiaodong Gu, Lei Lin, Wenping Hu, Xiu Li, Fuzheng Zhang, Guorui Zhou, Kun Gai, 30 Sep 2025, Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models, https://arxiv.org/abs/2509.26628
  • Gang Li, Ming Lin, Tomer Galanti, Zhengzhong Tu, Tianbao Yang, 30 Sep 2025, DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization, https://arxiv.org/abs/2505.12366
  • Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun, 30 Sep 2025, Value-Guided Search for Efficient Chain-of-Thought Reasoning, https://arxiv.org/abs/2505.17373
  • Prateek Humane, Paolo Cudrano, Daniel Z. Kaplan, Matteo Matteucci, Supriyo Chakraborty, Irina Rish, 7 Oct 2025, Influence Functions for Efficient Data Selection in Reasoning, https://arxiv.org/abs/2510.06108

Reasoning and CoT Efficiency Topics

More information on reasoning efficiency is available across this site, including blog articles on reasoning efficiency, research on general efficiency optimization techniques for reasoning models, and specific efficiency optimizations to Chain-of-Thought; a short illustrative sketch of one such technique follows.
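
Several of the papers above reduce reasoning cost by capping or truncating the chain of thought, for example budget forcing and concise-CoT prompting. The Python sketch below illustrates the general idea: cap the number of reasoning tokens, then force a final answer conditioned on the (possibly truncated) trace. This is a minimal sketch under stated assumptions, not any specific paper's method; llm_generate is a hypothetical placeholder for whatever LLM completion API you use.

    # Minimal sketch of a reasoning token budget ("budget forcing"),
    # assuming a generic text-completion API. Illustrative only.

    def llm_generate(prompt: str, max_tokens: int, stop=None) -> str:
        """Hypothetical placeholder for an LLM completion call;
        replace with a real provider API (this signature is assumed)."""
        raise NotImplementedError("plug in a real model call here")

    def budget_forced_cot(question: str, think_budget: int = 256) -> str:
        # Step 1: let the model reason, but cap the reasoning trace at
        # `think_budget` tokens, and stop early if it starts answering.
        thoughts = llm_generate(
            prompt=f"Question: {question}\nThink step by step.\nThoughts:",
            max_tokens=think_budget,
            stop=["Final answer:"],
        )
        # Step 2: force a final answer conditioned on the capped reasoning.
        # Fewer reasoning tokens means lower latency, with some risk of
        # accuracy loss on harder questions.
        answer = llm_generate(
            prompt=f"Question: {question}\nThoughts: {thoughts}\nFinal answer:",
            max_tokens=64,
        )
        return answer.strip()

The trade-off mirrors the inference scaling law: a larger think_budget generally buys accuracy at the cost of latency, so the budget can be tuned to the difficulty of each request.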

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++ programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about related AI research topics across the Aussie AI site.