Aussie AI
Long List of LLM Reasoning Papers
Last Updated 26 August, 2025
by David Spuler, Ph.D.
Reasoning Research Papers
This page is a long list of paper citations in the area of LLM reasoning. For a more useful discussion and categorization of this research, see our recent blog articles related to reasoning:
- Reasoning inference optimization
- Reasoning is the New AI Middleware
- Reasoning Decoding Algorithms
- 500 LLM inference optimization techniques
Multi-Step Inference for Reasoning
A general list of papers on multi-step reasoning, also known as "test-time compute":
- Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, 3 Dec 2023 (v2), Tree of Thoughts: Deliberate Problem Solving with Large Language Models, https://arxiv.org/abs/2305.10601 Code: https://github.com/princeton-nlp/tree-of-thought-llm
- Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao, 29 Jul 2024, MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, https://arxiv.org/abs/2407.20183 Code: https://github.com/InternLM/MindSearch Project: https://mindsearch.netlify.app
- Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini, 31 Jul 2024, Large Language Monkeys: Scaling Inference Compute with Repeated Sampling, https://arxiv.org/abs/2407.21787 (Generating multiple answers by repeated inference queries, and then using a verifier to choose the best one, which is shown to greatly increase overall accuracy.)
- David Gu, July 18, 2024, Text Compression for Efficient Language Generation, Master’s Thesis, Distributed Computing Group, Computer Engineering and Networks Laboratory, ETH Zürich, https://pub.tik.ee.ethz.ch/students/2023-HS/MA-2023-19.pdf (Training and inference at the sentence level, including caching of embeddings per sentence, which also has the side-effect of compressing the input prompts and reducing computation analogously to token pruning.)
- Ignacio de Gregorio, Aug 2024, Grokking, a New Form of Reasoning, https://medium.com/@ignacio.de.gregorio.noblejas/grokking-a-new-form-of-reasoning-6785ea89d2ec
- Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou, 4 Jun 2024 (v2), Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems, https://arxiv.org/abs/2403.02419
- Asankhaya Sharma (codelion), Sep 2024, Optillm: Optimizing inference proxy for LLMs, https://github.com/codelion/optillm
- Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal, 18 Sep 2024, MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning, https://arxiv.org/abs/2409.12147 https://github.com/dinobby/MAgICoRe
- Xiaohan Xu, Chongyang Tao, Tao Shen, Can Xu, Hongbo Xu, Guodong Long, Jian-guang Lou, 29 Feb 2024 (v2), Re-Reading Improves Reasoning in Large Language Models, https://arxiv.org/abs/2309.06275
- Artem Shelamanov, Sep 2024, Why OpenAI’s o1 Model Is A Scam, https://pub.towardsai.net/why-openais-o1-model-is-a-scam-eb3356c3d70e
- Rinon Gal, Adi Haviv, Yuval Alaluf, Amit H. Bermano, Daniel Cohen-Or, Gal Chechik, 2 Oct 2024, ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation, https://arxiv.org/abs/2410.01731 https://comfygen-paper.github.io/
- Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong, Integrative Decoding: Improve Factuality via Implicit Self-consistency, 3 Oct 2024 (v2), https://arxiv.org/abs/2410.01556 (Prepends a previous response to improve decoding accuracy.)
- Zhenwen Liang, Ye Liu, Tong Niu, Xiangliang Zhang, Yingbo Zhou, Semih Yavuz, 5 Oct 2024, Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification, https://arxiv.org/abs/2410.05318
- Sonya Huang, Pat Grady, and o1, Sequoia, October 9, 2024, Generative AI’s Act o1, https://www.sequoiacap.com/article/generative-ais-act-o1/
- Yingqian Cui, Pengfei He, Xianfeng Tang, Qi He, Chen Luo, Jiliang Tang, Yue Xing, 21 Oct 2024, A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration, https://arxiv.org/abs/2410.16540
- Jiangming Liu, Matt Gardner, Shay B. Cohen, Mirella Lapata, 7 Jun 2021 (v2), Multi-Step Inference for Reasoning Over Paragraphs, https://arxiv.org/abs/2004.02995
- Aditya Kalyanpur, Kailash Karthik Saravanakumar, Victor Barres, CJ McFate, Lori Moon, Nati Seifu, Maksim Eremeev, Jose Barrera, Abraham Bautista-Castillo, Eric Brown, David Ferrucci, 24 Jul 2024 (v4), Multi-step Inference over Unstructured Data, https://arxiv.org/abs/2406.17987
- Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, Shengxin Zhu, 5 Sep 2024 (v5), Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review, https://arxiv.org/abs/2310.14735
- Xiaodong Liu, Kevin Duh, Jianfeng Gao, 30 Mar 2019 (v2), Stochastic Answer Networks for Natural Language Inference, https://arxiv.org/abs/1804.07888
- TED, Oct 2024, Multi-Step Reasoning Agents, https://tedai-sanfrancisco.ted.com/glossary/multi-step-reasoning-agents/
- Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot, 30 Jan 2023 (v2), Complexity-Based Prompting for Multi-Step Reasoning, https://arxiv.org/abs/2210.00720
- Junting Lu, Oct 2024 (accessed), Awesome-LLM-Reasoning-Techniques, https://github.com/Junting-Lu/Awesome-LLM-Reasoning-Techniques
- Cameron R. Wolfe, Dec 23, 2023, Tree of Thoughts Prompting. Solving multi-step problems with LLMs via deliberate planning and exploration, https://towardsdatascience.com/tree-of-thoughts-prompting-65a3e51f9ac4
- Data Camp, Jul 10, 2024, Chain-of-Thought Prompting: Step-by-Step Reasoning with LLMs, https://www.datacamp.com/tutorial/chain-of-thought-prompting
- Pankaj, Dec 21, 2023, Chain of Thought Prompting: Guiding LLMs Step-by-Step, https://medium.com/@pankaj_pandey/chain-of-thought-prompting-guiding-llms-step-by-step-e6eac32d02d8
- Cobus Greyling, Aug 2, 2023, 12 Prompt Engineering Techniques, https://cobusgreyling.medium.com/12-prompt-engineering-techniques-644481c857aa
- Cameron R. Wolfe, Aug 21, 2023, Tree of Thoughts Prompting. Solving multi-step problems with LLMs via deliberate planning and exploration, https://cameronrwolfe.substack.com/p/tree-of-thoughts-prompting
- Cameron R. Wolfe, Jan 3, 2024, Graph-Based Prompting and Reasoning with Language Models. Understanding graph of thoughts prompting and several variants… https://towardsdatascience.com/graph-based-prompting-and-reasoning-with-language-models-d6acbcd6b3d8
- Jason Wei and Denny Zhou, May 11, 2022, Language Models Perform Reasoning via Chain of Thought, https://research.google/blog/language-models-perform-reasoning-via-chain-of-thought/
- Cameron R. Wolfe, Jul 24, 2023, Chain of Thought Prompting for LLMs: A practical and simple approach for “reasoning” with LLMs, https://towardsdatascience.com/chain-of-thought-prompting-for-llms-33c963eead38
- Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
- Arun Shankar, Oct 2024, Designing Cognitive Architectures: Agentic Workflow Patterns from Scratch, https://medium.com/google-cloud/designing-cognitive-architectures-agentic-workflow-patterns-from-scratch-63baa74c54bc
- Tanay Jaipuria, Oct 29, 2024, OpenAI's o-1 and inference-time scaling laws, https://www.tanayj.com/p/openais-o-1-and-inference-time-scaling
- Jinlin Wang, Suyuchen Wang, Ziwen Xia, Sirui Hong, Yun Zhu, Bang Liu, Chenglin Wu, 28 Oct 2024, FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval, https://arxiv.org/abs/2410.21012
- Latent Space, Nov 05, 2024, Inference, Fast and Slow. When System 1/System 2 analogies are not enough: The 6 types of LLM inference, https://www.latent.space/p/inference-fast-and-slow
- Shanghaoran Quan, Tianyi Tang, Bowen Yu, An Yang, Dayiheng Liu, Bofei Gao, Jianhong Tu, Yichang Zhang, Jingren Zhou, Junyang Lin, 31 Oct 2024, Language Models can Self-Lengthen to Generate Long Texts, https://arxiv.org/abs/2410.23933
- LangChain, Nov 7, 2024. SCIPE - Systematic Chain Improvement and Problem Evaluation, https://blog.langchain.dev/scipe-systematic-chain-improvement-and-problem-evaluation/ https://github.com/garg-ankush/scipe/tree/main
- X Wang, L Mu, J Zhang, H Xu, 2024, Multi-pass Decoding for Grammatical Error Correction, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9904–9916, November 12-16, 2024, https://aclanthology.org/2024.emnlp-main.553.pdf
- Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
- Guowei Xu, Peng Jin, Li Hao, Yibing Song, Lichao Sun, Li Yuan, 15 Nov 2024, LLaVA-o1: Let Vision Language Models Reason Step-by-Step, https://arxiv.org/abs/2411.10440
- Carl Franzen, November 20, 2024, DeepSeek’s first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance, https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/
- mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
- Eric Horvitz, Harsha Nori, Naoto Usuyama, November 27, 2024, Advances in run-time strategies for next-generation foundation models, Microsoft Research Blog, https://www.microsoft.com/en-us/research/blog/advances-in-run-time-strategies-for-next-generation-foundation-models/
- Harsha Nori, Naoto Usuyama, Nicholas King, Scott Mayer McKinney, Xavier Fernandes, Sheng Zhang, Eric Horvitz, 6 Nov 2024, From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond, https://arxiv.org/abs/2411.03590
- Hieu Tran, Zonghai Yao, Junda Wang, Yifan Zhang, Zhichao Yang, Hong Yu, 5 Dec 2024 (v2), RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2412.02830
- Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, 6 Dec 2024, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
- Arda Sevinc, Abdurrahman Gumus, 9 Dec 2024, AutoReason: Automatic Few-Shot Reasoning Decomposition, https://arxiv.org/abs/2412.06975 https://github.com/miralab-ai/autoreason
- Mingchen Zhuge, Changsheng Zhao, Dylan Ashley, Wenyi Wang, Dmitrii Khizbullin, Yunyang Xiong, Zechun Liu, Ernie Chang, Raghuraman Krishnamoorthi, Yuandong Tian, Yangyang Shi, Vikas Chandra, Jürgen Schmidhuber, 16 Oct 2024 (v2), Agent-as-a-Judge: Evaluate Agents with Agents, https://arxiv.org/abs/2410.10934
- Kyle Wiggers, December 14, 2024, ‘Reasoning’ AI models have become a trend, for better or worse, https://techcrunch.com/2024/12/14/reasoning-ai-models-have-become-a-trend-for-better-or-worse/
- Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas, 11 Nov 2024, The Surprising Effectiveness of Test-Time Training for Abstract Reasoning, https://arxiv.org/abs/2411.07279
- Noam Brown, Tuomas Sandholm, 16 Nov 2017 (v3), Safe and Nested Subgame Solving for Imperfect-Information Games, https://arxiv.org/abs/1705.02955 (An early pre-LLM paper on reasoning in multiple steps.)
- Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
- Agnostiq, Dec 2024, multi-agent-llm: LLM based Multi-Agent methods: Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT), https://github.com/AgnostiqHQ/multi-agent-llm
- Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, Charles Sutton, Augustus Odena, 30 Nov 2021, Show Your Work: Scratchpads for Intermediate Computation with Language Models, https://arxiv.org/abs/2112.00114
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Dec 2024, Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, https://arxiv.org/abs/2412.21187
- Rohin Manvi, Anikait Singh, Stefano Ermon, 3 Oct 2024, Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation, https://arxiv.org/abs/2410.02725
- Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, and Kan Li, 19 Jan 2024, Escape sky-high cost: Early-stopping self-consistency for multi-step reasoning. The Twelfth International Conference on Learning Representations, 2024, https://arxiv.org/abs/2401.10480 https://github.com/Yiwei98/ESC (Uses "early stopping" idea to improve CoT efficiency during inference.)
- Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
- NovaSky, Jan 2025, Sky-T1: Train your own O1 preview model within $450, https://novasky-ai.github.io/posts/sky-t1/
- Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230-page paper on many topics such as training, prompting, alignment, and long context.)
- Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu, Runnan Fang, Ningyu Zhang, Jiang Yong, Pengjun Xie, Fei Huang, Huajun Chen, 16 Jan 2025, OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking, https://arxiv.org/abs/2501.09751 (Iteratively going deeper into a topic while generating.)
- Siddharth Narayanan, James D. Braza, Ryan-Rhys Griffiths, Manu Ponnapati, Albert Bou, Jon Laurent, Ori Kabeli, Geemi Wellawatte, Sam Cox, Samuel G. Rodriques, Andrew D. White, 30 Dec 2024, Aviary: training language agents on challenging scientific tasks, https://arxiv.org/abs/2412.21154 (Using smaller models combined with multi-step reasoning to compete with big models with 100x less inference cost.)
- Kuang-Huei Lee, Ian Fischer, Yueh-Hua Wu, Dave Marwood, Shumeet Baluja, Dale Schuurmans, Xinyun Chen, 17 Jan 2025, Evolving Deeper LLM Thinking, https://arxiv.org/abs/2501.09891 (An alternative search strategy broad/deep, compared to CoT and reflection.)
- Edward Beeching, Lewis Tunstall, Sasha Rush Dec 16, 2024, Scaling Test Time Compute with Open Source Models, https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
- Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
- Enze Xie, Junsong Chen, Yuyang Zhao, Jincheng Yu, Ligeng Zhu, Yujun Lin, Zhekai Zhang, Muyang Li, Junyu Chen, Han Cai, Bingchen Liu, Daquan Zhou, Song Han, 30 Jan 2025, SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer, https://arxiv.org/abs/2501.18427 (Diffusion model optimization using block-level depth pruning and inference-time scaling.)
- S Wang, X Zhang, J Ma, A Hwang, Z Yu, Jan 2025, JumpStarter: Getting Started on Personal Goals with Adaptive Personal Context Curation, https://sitong-wang.github.io/data/JumpStarter.pdf (Long-term planning of goal-oriented long multi-step projects.)
- Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto, 3 Feb 2025 (v2), s1: Simple test-time scaling, https://arxiv.org/abs/2501.19393 https://github.com/simplescaling/s1 (Method of "budget forcing" that allows either shortening or lengthening multi-step reasoning sequences.)
- Manish Sanwal, 3 Feb 2025 (v2), Layered Chain-of-Thought Prompting for Multi-Agent LLM Systems: A Comprehensive Approach to Explainable Large Language Models, https://arxiv.org/abs/2501.18645
- Sebastian Raschka, PhD, Feb 05, 2025, Understanding Reasoning LLMs: Methods and Strategies for Building and Refining Reasoning Models, https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
- Ling Yang, Zhaochen Yu, Bin Cui, Mengdi Wang, 10 Feb 2025, ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates, https://arxiv.org/abs/2502.06772 https://github.com/Gen-Verse/ReasonFlux (RALM-like retrieval of reasoning prompt templates at inference time.)
- Hanmeng Liu, Zhizhang Fu, Mengru Ding, Ruoxi Ning, Chaoli Zhang, Xiaozhang Liu, Yue Zhang, 13 Feb 2025, Logical Reasoning in Large Language Models: A Survey, https://arxiv.org/abs/2502.09100
- Zeping Yu, Yonatan Belinkov, Sophia Ananiadou, 15 Feb 2025, Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models, https://arxiv.org/abs/2502.10835
- Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, 20 Feb 2025, S*: Test Time Scaling for Code Generation, https://arxiv.org/abs/2502.14382 https://github.com/NovaSky-AI/SkyThought
- Ben Dickson, February 20, 2025, How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs), https://venturebeat.com/ai/how-test-time-scaling-unlocks-hidden-reasoning-abilities-in-small-language-models-and-allows-them-to-outperform-llms/
- Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Yunhua Zhou, Xipeng Qiu, 17 Feb 2025, Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? https://arxiv.org/abs/2502.12215
- Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji, 18 Feb 2025, Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights, https://arxiv.org/abs/2502.12521
- Zihao Zeng, Xuyao Huang, Boxiu Li, Zhijie Deng, 19 Feb 2025, SIFT: Grounding LLM Reasoning in Contexts via Stickers, https://arxiv.org/abs/2502.14922 https://github.com/zhijie-group/SIFT (Multi-step reasoning where the LLM first generates a modified prompt that summarizes the key points, and then does inference for both the original and modified prompts, then comparing results and adjusting forwards and backwards.)
- Marthe Ballon, Andres Algaba, Vincent Ginis, 21 Feb 2025, The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer, https://arxiv.org/abs/2502.15631
- Maxwell Zeff, February 24, 2025, Anthropic launches a new AI model that ‘thinks’ as long as you want, https://techcrunch.com/2025/02/24/anthropic-launches-a-new-ai-model-that-thinks-as-long-as-you-want/
- Kif Leswing, Feb 26 2025, Nvidia CEO Huang says AI has to do ’100 times more’ computation now than when ChatGPT was released, https://www.cnbc.com/2025/02/26/nvidia-ceo-huang-says-next-generation-ai-will-need-more-compute.html (The thesis that AI reasoning will need 100 times more compute, regardless of whether it is a single-step "long answers" model thinking out loud, or a multi-step test time compute model.)
- Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu, 25 Feb 2025 (v2), From System 1 to System 2: A Survey of Reasoning Large Language Models, https://arxiv.org/abs/2502.17419
- Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei, 25 Feb 2025, Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning, https://arxiv.org/abs/2502.18080 (Trying to generate the "shortest correct response" by examining the lengths needed for CoT.)
- Yuchen Yan, Yongliang Shen, Yang Liu, Jin Jiang, Mengdi Zhang, Jian Shao, Yueting Zhuang, 13 Mar 2025 (v2), InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models, https://arxiv.org/abs/2503.06692
- Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 13 Mar 2025 (v2), Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567 (Massive and broad survey of all types of reasoning.)
- Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi, 20 Feb 2025 (v2), Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification https://arxiv.org/abs/2502.01839 (Wrapping a single model with a Best-of-N approach that self-selects the best answer can significantly improve reasoning rates.)
- Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li, Shilian Chen, Junyi Wang, Jie Zhou, Qin Chen, Min Zhang, Yulan Wu, Liang He, 8 May 2025 (v2), A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law, https://arxiv.org/abs/2505.02665
- Michael Nuñez, July 15, 2025, OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’, https://venturebeat.com/ai/openai-google-deepmind-and-anthropic-sound-alarm-we-may-be-losing-the-ability-to-understand-ai/ (Monitoring the text-based interim "thinking-out-loud" reasoning of models in CoT.)
- Tomek Korbak, Mikita Balesni, (and many more authors) July 2025, Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety, https://tomekkorbak.com/cot-monitorability-is-a-fragile-opportunity/cot_monitoring.pdf
- Sebastian Raschka, Mar 8, 2025, Inference-Time Compute Scaling Methods to Improve Reasoning Models: Part 1: Inference-Time Compute Scaling Methods, https://sebastianraschka.com/blog/2025/state-of-llm-reasoning-and-inference-scaling.html
- Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou, 10 Feb 2025, Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling, https://arxiv.org/abs/2502.06703
- Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang, 23 Feb 2025 (v2), Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking, https://arxiv.org/abs/2502.13842
- Brown Ebouky, Andrea Bartezzaghi, Mattia Rigotti, 13 Jun 2025, Eliciting Reasoning in Language Models with Cognitive Tools, https://arxiv.org/abs/2506.12115
- Tao Xu, Dung-Yang Lee and Momiao Xiong, 21 Jul 2025, Reinforcement Learning in hyperbolic space for multi-step reasoning, https://arxiv.org/abs/2507.16864
- Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama, 23 Jul 2025, Cross-domain Multi-step Thinking: Zero-shot Fine-grained Traffic Sign Recognition in the Wild, https://arxiv.org/abs/2409.01534
- Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi, 11 Aug 2025, Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent, https://arxiv.org/abs/2508.08222
- Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Daxin Jiang, Binxing Jiao, Chen Hu, Huacan Wang, 4 Aug 2025, SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents, https://arxiv.org/abs/2508.02085
- Jiashuo Yu, Yue Wu, Meng Chu, Zhifei Ren, Zizheng Huang, Pei Chu, Ruijie Zhang, Yinan He, Qirui Li, Songze Li, Zhenxiang Li, Zhongying Tu, Conghui He, Yu Qiao, Yali Wang, Yi Wang, Limin Wang, 4 Aug 2025, VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos, https://arxiv.org/abs/2506.10857
- Shaofeng Yin, Ting Lei, Yang Liu, 5 Aug 2025, ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools, https://arxiv.org/abs/2508.03284
- Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back, 13 Aug 2025, Multi-Step Reasoning with Large Language Models, a Survey, https://arxiv.org/abs/2407.11511
- Ayoub Ben Chaliah and Hela Dellagi, 18 Aug 2025, Datarus-R1: An Adaptive Multi-Step Reasoning LLM for Automated Data Analysis, https://arxiv.org/abs/2508.13382
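Several of the papers above (e.g. Large Language Monkeys, and the collaborative-verification work) share the same Best-of-N skeleton: sample many candidate answers from the model, then let a verifier pick the best one. A minimal sketch of that pattern is below; `noisy_model` and `checker` are hypothetical toy stand-ins for a real LLM call and a real verifier or reward model, not anything from the cited papers.

```python
import random

def best_of_n(generate, verify, prompt, n=8):
    """Best-of-N sampling: draw n candidate answers and return the one
    the verifier scores highest (first occurrence wins ties)."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=verify)

# Toy stand-ins: a "model" that answers 2+3 correctly only ~30% of the
# time, and a verifier that scores an answer with an arithmetic check.
def noisy_model(prompt):
    return 5 if random.random() < 0.3 else random.choice([4, 6, 7])

def checker(answer):
    return 1.0 if answer == 2 + 3 else 0.0

random.seed(0)
single = sum(noisy_model("2+3=?") == 5 for _ in range(1000)) / 1000
random.seed(0)
bon = sum(best_of_n(noisy_model, checker, "2+3=?", n=8) == 5
          for _ in range(1000)) / 1000
print(single, bon)  # Best-of-N accuracy is well above single-shot accuracy
```

With a 30% per-sample accuracy, the chance that at least one of 8 samples is correct is roughly 94%, which is the accuracy gain these repeated-sampling papers exploit, at the cost of N times the inference compute.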
General Reasoning Papers
- Junlin Wang, Siddhartha Jain, Dejiao Zhang, Baishakhi Ray, Varun Kumar, Ben Athiwaratkun, 11 Jun 2024 (v2), Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies, https://arxiv.org/abs/2406.06461
- Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin, 13 Jun 2024, Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs, https://arxiv.org/abs/2406.09136 Code: https://github.com/sail-sg/CPO
- Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, 3 Dec 2023 (v2), Tree of Thoughts: Deliberate Problem Solving with Large Language Models, https://arxiv.org/abs/2305.10601 Code: https://github.com/princeton-nlp/tree-of-thought-llm
- Hayden Field, June 20, 2024, OpenAI competitor Anthropic announces its most powerful AI yet, CNBC, https://www.cnbc.com/2024/06/20/anthropic-claude-3point5-sonnet-ai-announced.html
- Wai-Chung Kwan, Xingshan Zeng, Yuxin Jiang, Yufei Wang, Liangyou Li, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong, 30 Jan 2024, MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models, https://arxiv.org/abs/2401.16745 Code: https://github.com/KwanWaiChung/MT-Eval
- M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk et al., “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17682–17690. https://arxiv.org/abs/2308.09687
- Q. Sun, Z. Yin, X. Li, Z. Wu, X. Qiu, and L. Kong, “Corex: Pushing the boundaries of complex reasoning through multi-model collaboration,” arXiv preprint arXiv:2310.00280, 2023. https://arxiv.org/abs/2310.00280
- Tianle Li, Wei-Lin Chiang, Lisa Dunlap, May 20, 2024, Introducing Hard Prompts Category in Chatbot Arena, https://lmsys.org/blog/2024-05-17-category-hard/
- Myeonghwa Lee, Seonho An, Min-Soo Kim, 18 Jun 2024, PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, https://arxiv.org/abs/2406.12430 Code: https://github.com/myeon9h/PlanRAG
- Rahul Verma, June 21, 2024, OpenAI's GPT-5 Pushed Back To Late 2025, But Promises Ph.D.-Level Abilities, https://in.mashable.com/tech/77593/openais-gpt-5-pushed-back-to-late-2025-but-promises-phd-level-abilities
- Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, Omar Khattab, 17 Jun 2024, Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs, https://arxiv.org/abs/2406.11695
- Sachit Menon, Richard Zemel, Carl Vondrick, 20 Jun 2024, Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities, https://arxiv.org/abs/2406.14562
- Lina M. Rojas-Barahona, 2024, Talking to Machines: do you read me? Computation and Language, Université de Lorraine, https://hal.science/tel-04620199/document
- Rafe Brena, May 24, 2024, 3 Key Differences Between Human and Machine Intelligence You Need to Know: AI is an alien intelligence https://pub.towardsai.net/3-key-differences-between-human-and-machine-intelligence-you-need-to-know-7a34dcee2cd3 (Good article about how LLMs don't have "emotions" or "intelligence" and they don't "pause".)
- Vishal Rajput, Apr 11, 2024, What’s next for AI: AI agentic workflows? https://medium.com/aiguys/next-for-llms-and-rag-ai-agentic-workflows-1869ba0a6796
- Rachel Metz, July 12, 2024, OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving: The company believes its technology is approaching the second level of five on the path to artificial general intelligence, Bloomberg, https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai?sref=P6Q0mxvj
- Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu, 1 May 2024, Causal Evaluation of Language Models, https://arxiv.org/abs/2405.00622 Project: https://opencausalab.github.io/CaLM
- Anna Tong and Katie Paul July 16, 2024, Exclusive: OpenAI working on new reasoning technology under code name ‘Strawberry’, https://www.reuters.com/technology/artificial-intelligence/openai-working-new-reasoning-technology-under-code-name-strawberry-2024-07-12/
- Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao, 29 Jul 2024, MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, https://arxiv.org/abs/2407.20183 Code: https://github.com/InternLM/MindSearch Project: https://mindsearch.netlify.app
- Ethan Mollick, May 12, 2024, Superhuman? What does it mean for AI to be better than a human? And how can we tell? https://www.oneusefulthing.org/p/superhuman
- Ignacio de Gregorio, Aug 2024, Grokking, a New Form of Reasoning, https://medium.com/@ignacio.de.gregorio.noblejas/grokking-a-new-form-of-reasoning-6785ea89d2ec
- Zarif Bin Akhtar, Mapping Generative Artificial Intelligence (GAI's) Exciting Future: From Gemini to Q* and Beyond, https://publications.eai.eu/index.php/airo/article/view/5962 https://doi.org/10.4108/airo.5962 PDF: https://publications.eai.eu/index.php/airo/article/view/5962/3329
- Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Yongfeng Zhang, Dongfang Liu, 16 Aug 2024, Visual Agents as Fast and Slow Thinkers, https://arxiv.org/abs/2408.08862
- Adam Zewe, June 14, 2024, Technique improves the reasoning capabilities of large language models: Combining natural language and programming, the method enables LLMs to solve numerical, analytical, and language-based tasks transparently, MIT News, https://news.mit.edu/2024/technique-improves-reasoning-capabilities-large-language-models-0614
- Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng, 6 Feb 2024, Self-Discover: Large Language Models Self-Compose Reasoning Structures, https://arxiv.org/abs/2402.03620
- Tinghui Zhu, Kai Zhang, Jian Xie, Yu Su, 4 Feb 2024 (v2), Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning, https://arxiv.org/abs/2401.17686
- Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu, Julian McAuley, Wei Ai, Furong Huang, 14 Mar 2024, Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey, https://arxiv.org/abs/2403.09606
- Jiace Zhu, Yingtao Shen, Jie Zhao, An Zou, 25 Aug 2024, Path-Consistency: Prefix Enhancement for Efficient Inference in LLM, https://arxiv.org/abs/2409.01281
- Cogni Down Under, Sep 2024, Reflection 70B: The AI That Thinks Before It Speaks, https://medium.com/@cognidownunder/reflection-70b-the-ai-that-thinks-before-it-speaks-8a70d3a0e38a
- Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, 9 Mar 2024, Algorithmic progress in language models, https://arxiv.org/abs/2403.05812
- Alberto Romero, Sep 10, 2024, Big News: OpenAI to Launch AI Model That Can Reason in 2 Weeks, https://www.thealgorithmicbridge.com/p/big-news-openai-to-launch-ai-model
- Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Bo Tang, Feiyu Xiong, Zhiyu Li, 5 Sep 2024, Attention Heads of Large Language Models: A Survey, https://arxiv.org/abs/2409.03752 https://github.com/IAAR-Shanghai/Awesome-Attention-Heads (This survey is about making attention mechanisms more performant, accurate and intelligent, rather than improving efficiency.)
- Asankhaya Sharma (codelion), Sep 2024, Optillm: Optimizing inference proxy for LLMs, https://github.com/codelion/optillm
- Louis Bouchard, Sep 12, 2024, OpenAI's o1 Model: The Future of Reasoning AI? What Sets It Apart, How OpenAI's o1 Model Thinks Through Problems (And Why It's Slower), https://www.louisbouchard.ai/openai-o1/
- OpenAI, September 12, 2024, Introducing OpenAI o1-preview, A new series of reasoning models for solving hard problems. https://openai.com/index/introducing-openai-o1-preview/
- OpenAI, September 12, 2024, Learning to Reason with LLMs, https://openai.com/index/learning-to-reason-with-llms/
- Nathan Lambert, Sep 05, 2024, OpenAI’s Strawberry, LM self-talk, inference scaling laws, and spending more on inference, Whether or not scaling works, we should spend more on inference, https://www.interconnects.ai/p/openai-strawberry-and-inference-scaling-laws
- Ignacio de Gregorio Noblejas, September 15, 2024, OpenAI Launches o1. Here’s All You Need to Know, https://thetechoasis.beehiiv.com/p/openai-launches-o1-heres-need-know
- Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li, 27 Jun 2024 (v2), ReFT: Reasoning with Reinforced Fine-Tuning, https://arxiv.org/abs/2401.08967
- Tianqiao Liu, Zui Chen, Zitao Liu, Mi Tian, Weiqi Luo, 13 Sep 2024, Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding, https://arxiv.org/abs/2409.08561
- Michael Nuñez, September 16, 2024, SambaNova challenges OpenAI’s o1 model with Llama 3.1-powered demo on HuggingFace, https://venturebeat.com/ai/sambanova-challenges-openais-o1-model-with-llama-3-1-powered-demo-on-huggingface/
- Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett, 18 Sep 2024, To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning, https://arxiv.org/abs/2409.12183
- Santosh Kumar Radha, Yasamin Nouri Jelyani, Ara Ghukasyan, Oktay Goktas, 19 Sep 2024, Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning, https://arxiv.org/abs/2409.12618
- Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal, 18 Sep 2024, MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning, https://arxiv.org/abs/2409.12147 https://github.com/dinobby/MAgICoRe
- Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
- Artem Shelamanov, Sep 2024, Why OpenAI’s o1 Model Is A Scam, https://pub.towardsai.net/why-openais-o1-model-is-a-scam-eb3356c3d70e
- Chloe Berger, October 2, 2024, Mark Cuban says his puppy is ‘smarter than AI is today’, https://fortune.com/2024/10/01/mark-cuban-dog-puppy-smarter-than-ai/
- Julia Love and Rachel Metz, October 2, 2024, Google Is Working on Reasoning AI, Chasing OpenAI’s Efforts, https://www.bloomberg.com/news/articles/2024-10-02/google-is-working-on-reasoning-ai-chasing-openai-s-efforts
- Zhenwen Liang, Ye Liu, Tong Niu, Xiangliang Zhang, Yingbo Zhou, Semih Yavuz, 5 Oct 2024, Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification, https://arxiv.org/abs/2410.05318
- Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar, 7 Oct 2024, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, https://arxiv.org/abs/2410.05229
- Sonya Huang, Pat Grady, and o1, Sequoia, October 9, 2024, Generative AI’s Act o1, https://www.sequoiacap.com/article/generative-ais-act-o1/
- Ignacio de Gregorio Noblejas, October 20, 2024, The Anti-LLM Revolution Begins, https://thetechoasis.beehiiv.com/p/the-anti-llm-revolution-begins
- Asif Razzaq, October 13, 2024, OpenR: An Open-Source AI Framework Enhancing Reasoning in Large Language Models, https://www.marktechpost.com/2024/10/13/openr-an-open-source-ai-framework-enhancing-reasoning-in-large-language-models/
- Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
- Latent Space, Nov 05, 2024, Inference, Fast and Slow. When System 1/System 2 analogies are not enough: The 6 types of LLM inference https://www.latent.space/p/inference-fast-and-slow
- Will Lockett, Nov 2024, Apple Calls BS On The AI Revolution, They aren’t late to the AI game; they are just the only sceptical big tech company. https://medium.com/predict/apple-calls-bullshit-on-the-ai-revolution-ae38fdf83392
- Anthony Ha, Nov 2024, OpenAI reportedly developing new strategies to deal with AI improvement slowdown, https://techcrunch.com/2024/11/09/openai-reportedly-developing-new-strategies-to-deal-with-ai-improvement-slowdown/
- Michael Nuñez, November 11, 2024, AI’s math problem: FrontierMath benchmark shows how far technology still has to go, https://venturebeat.com/ai/ais-math-problem-frontiermath-benchmark-shows-how-far-technology-still-has-to-go/
- Kyle Orland, 13 Nov 2024, What if AI doesn’t just keep getting better forever? New reports highlight fears of diminishing returns for traditional LLM training. https://arstechnica.com/ai/2024/11/what-if-ai-doesnt-just-keep-getting-better-forever/
- Carl Franzen, November 20, 2024, DeepSeek’s first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance, https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/
- Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen, 14 Oct 2024 (v3), Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models, https://arxiv.org/abs/2408.02442
- Janelle Teng, Nov 26, 2024, AI's reasoning quandary, https://nextbigteng.substack.com/p/ais-reasoning-quandary
- Qwen Team, November 28, 2024, QwQ: Reflect Deeply on the Boundaries of the Unknown, https://qwenlm.github.io/blog/qwq-32b-preview/
- mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
- Tom Schaul, 25 Nov 2024, Boundless Socratic Learning with Language Games, https://arxiv.org/abs/2411.16905
- Alberto Romero, Dec 06, 2024, OpenAI Announces o1 Model And ChatGPT Pro ($200/Mo). OpenAI Christmas event: Day 1 of 12, https://www.thealgorithmicbridge.com/p/openai-announces-o1-model-and-chatgpt
- Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister, 29 Nov 2024, Reverse Thinking Makes LLMs Stronger Reasoners, https://arxiv.org/abs/2411.19865
- Tiernan Ray, Dec. 10, 2024, How Cerebras boosted Meta's Llama to 'frontier model' performance. The company also demonstrates initial training of a one-trillion-parameter AI model on a single machine using conventional DDR5 memory chips. https://www.zdnet.com/article/how-cerebras-boosted-metas-llama-to-frontier-model-performance/
- Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, 9 Dec 2024, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769 (Performing reasoning in a model trained to operate in the embedding vector space, rather than more directly in the token space.)
- Arda Sevinc, Abdurrahman Gumus, 9 Dec 2024, AutoReason: Automatic Few-Shot Reasoning Decomposition, https://arxiv.org/abs/2412.06975 https://github.com/miralab-ai/autoreason
- Kyle Wiggers, December 14, 2024, ‘Reasoning’ AI models have become a trend, for better or worse, https://techcrunch.com/2024/12/14/reasoning-ai-models-have-become-a-trend-for-better-or-worse/
- Vincent-Pierre Berges, Barlas Oguz, December 12, 2024, Memory Layers at Scale, Meta, https://ai.meta.com/research/publications/memory-layers-at-scale/ https://github.com/facebookresearch/memory (Augmentation of an LLM with an additional key-value associative memory, by replacing some FFNs with a "memory layer".)
- Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas, 11 Nov 2024, The Surprising Effectiveness of Test-Time Training for Abstract Reasoning, https://arxiv.org/abs/2411.07279
- Noam Brown, Tuomas Sandholm, 16 Nov 2017 (v3), Safe and Nested Subgame Solving for Imperfect-Information Games, https://arxiv.org/abs/1705.02955 (An early pre-LLM paper on reasoning in multiple steps.)
- Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
- Agnostiq, Dec 2024, multi-agent-llm: LLM based Multi-Agent methods: Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT), https://github.com/AgnostiqHQ/multi-agent-llm
- Denise Holt, Dec 18, 2024, VERSES AI Crushes OpenAI o1 in Head to Head Competition: VERSES AI's New Genius™ Platform Delivers Far More Performance than OpenAI's Most Advanced Model at a Fraction of the Cost. https://deniseholt.substack.com/p/verses-ai-crushes-openai-o1-in-head
- Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen, 18 Dec 2024 (v2), Are Your LLMs Capable of Stable Reasoning? https://arxiv.org/abs/2412.13147 https://github.com/open-compass/GPassK
- Alberto Romero, Dec 21, 2024, OpenAI o3 Model Is a Message From the Future: Update All You Think You Know About AI. Incredible, a miracle, more than just a better state-of-the-art AI model. https://www.thealgorithmicbridge.com/p/openai-o3-model-is-a-message-from
- Sabrina Ortiz, Dec. 20, 2024, OpenAI unveils its most advanced o3 reasoning model on its last day of 'shipmas', https://www.zdnet.com/article/openai-unveils-its-most-advanced-o3-reasoning-model-on-its-last-day-of-shipmas/
- Jie He, Nan Hu, Wanqiu Long, Jiaoyan Chen, Jeff Z. Pan, 22 Dec 2024, MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge, https://arxiv.org/abs/2412.17032 https://github.com/probe2/multi-hop/ (Model evaluation of reasoning abilities.)
- Huanjin Yao, Jiaxing Huang, Wenhao Wu, Jingyi Zhang, Yibo Wang, Shunyu Liu, Yingjie Wang, Yuxin Song, Haocheng Feng, Li Shen, Dacheng Tao, 24 Dec 2024, Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search, https://arxiv.org/abs/2412.18319 https://github.com/HJYao00/Mulberry (Multimodal multi-step reasoning like CoT.)
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, Benyou Wang, 25 Dec 2024, HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs, https://arxiv.org/abs/2412.18925
- Lori Dajose, December 17, 2024, Thinking Slowly: The Paradoxical Slowness of Human Behavior, https://www.caltech.edu/about/news/thinking-slowly-the-paradoxical-slowness-of-human-behavior
- Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back, 16 Jul 2024, Reasoning with Large Language Models, a Survey, https://arxiv.org/abs/2407.11511
- Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang, 21 Nov 2024 (v2), Disentangling Memory and Reasoning Ability in Large Language Models, https://arxiv.org/abs/2411.13504 https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning
- Allen Nie, Yi Su, Bo Chang, Jonathan N. Lee, Ed H. Chi, Quoc V. Le, Minmin Chen, 8 Oct 2024, EVOLvE: Evaluating and Optimizing LLMs For Exploration, https://arxiv.org/abs/2410.06238
- Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, Sainbayar Sukhbaatar, 14 Oct 2024, Thinking LLMs: General Instruction Following with Thought Generation, https://arxiv.org/abs/2410.10630 (Training an LLM to reason by generating additional "thoughts" during training.)
- Xiang Huang, Jiayu Shen, Shanshan Huang, Sitao Cheng, Xiaxia Wang, Yuzhong Qu, 27 Dec 2024, TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data, https://arxiv.org/abs/2412.19544
- Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen, 2 Jan 2025, Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking, https://arxiv.org/abs/2501.01306
- Mayi Xu, Yunfeng Ning, Yongqi Li, Jianhao Chen, Jintao Wen, Yao Xiao, Shen Zhou, Birong Pan, Zepeng Bao, Xin Miao, Hankun Kang, Ke Sun, Tieyun Qian, 2 Jan 2025, Reasoning based on symbolic and parametric knowledge bases: a survey, https://arxiv.org/abs/2501.01030 (Extensive survey of reasoning from CoT to knowledge graphs to table-based reasoning.)
- Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
- Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou, 9 Jan 2025, Search-o1: Agentic Search-Enhanced Large Reasoning Models, https://arxiv.org/abs/2501.05366 https://github.com/sunnynexus/Search-o1 (RAG retrieval and agentic methods applied to Large Reasoning Models.)
- Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang, Xiaotian Han, Hongxia Yang, Fei Wu, 8 Jan 2025, InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection, https://arxiv.org/abs/2501.04575
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- Ben Hylak and swyx & Alessio, Jan 12, 2025, o1 isn’t a chat model (and that’s the point): How Ben Hylak turned from o1 pro skeptic to fan by overcoming his skill issue. https://www.latent.space/p/o1-skill-issue (Prompting reasoning models like "o1" is different from previous model generations.)
- NovaSky, Jan 2025, Sky-T1: Train your own O1 preview model within $450, https://novasky-ai.github.io/posts/sky-t1/
- Omkar Thawakar, Dinura Dissanayake, Ketan More, Ritesh Thawkar, Ahmed Heakl, Noor Ahsan, Yuhao Li, Mohammed Zumri, Jean Lahoud, Rao Muhammad Anwer, Hisham Cholakkal, Ivan Laptev, Mubarak Shah, Fahad Shahbaz Khan, Salman Khan, 10 Jan 2025, LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs, https://arxiv.org/abs/2501.06186
- François Chollet, 25 Nov 2019 (v2), On the Measure of Intelligence, https://arxiv.org/abs/1911.01547
- Siddharth Narayanan, James D. Braza, Ryan-Rhys Griffiths, Manu Ponnapati, Albert Bou, Jon Laurent, Ori Kabeli, Geemi Wellawatte, Sam Cox, Samuel G. Rodriques, Andrew D. White, 30 Dec 2024, Aviary: training language agents on challenging scientific tasks, https://arxiv.org/abs/2412.21154 (Using smaller models combined with multi-step reasoning to compete with big models with 100x less inference cost.)
- Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, Wanli Ouyang, Dongzhan Zhou, 21 Nov 2024 (v2), LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning, https://arxiv.org/abs/2410.02884
- Dan Zhang, Sining Zhoubian, Ziniu Hu, Yisong Yue, Yuxiao Dong, Jie Tang, 18 Nov 2024 (v3), ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search, https://arxiv.org/abs/2406.03816 https://github.com/THUDM/ReST-MCTS
- Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Lionel M. Ni, Linyi Yang, Ying Wen, Weinan Zhang, 12 Oct 2024, OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models, https://arxiv.org/abs/2410.09671 https://openreasoner.github.io/
- Yiwei Qin, Xuefeng Li, Haoyang Zou, Yixiu Liu, Shijie Xia, Zhen Huang, Yixin Ye, Weizhe Yuan, Hector Liu, Yuanzhi Li, Pengfei Liu, 8 Oct 2024, O1 Replication Journey: A Strategic Progress Report -- Part 1. https://arxiv.org/abs/2410.18982
- Matthias Bastian, Oct 6, 2024, Study reveals major reasoning flaws in smaller AI language models, https://the-decoder.com/study-reveals-major-reasoning-flaws-in-smaller-ai-language-models/
- Paul Sawers, January 23, 2025, Meta’s Yann LeCun predicts a ‘new AI architectures paradigm’ within 5 years and ‘decade of robotics’, https://techcrunch.com/2025/01/23/metas-yann-lecun-predicts-a-new-ai-architectures-paradigm-within-5-years-and-decade-of-robotics/
- Latent Space, Jan 25, 2025, Why o3-mini *had* to be free: the coming DeepSeek R1, 2.0 Flash, and Sky-T1 Price War: 2025's biggest surprise so far: Reasoning is less of a moat than anyone thought. https://www.latent.space/p/reasoning-price-war
- Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
- Akash Bajwa Jan 27, 2025, The Post-R1 World: AI Economics Have Irreversibly Changed, https://akashbajwa.substack.com/p/the-post-r1-world
- G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
- Lan Pan, Hanbo Xie, Robert C. Wilson, 29 Jan 2025, Large Language Models Think Too Fast To Explore Effectively, https://arxiv.org/abs/2501.18009
- Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Jan 2025, Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs, https://arxiv.org/abs/2501.18585
- Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang, 31 Jan 2025, BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning, https://arxiv.org/abs/2501.18858
- Sebastian Raschka, PhD, Feb 05, 2025, Understanding Reasoning LLMs: Methods and Strategies for Building and Refining Reasoning Models https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
- Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou, 3 Feb 2025, Competitive Programming with Large Reasoning Models, https://arxiv.org/abs/2502.06807 (OpenAI's paper on o3 that has similar conclusions to what DeepSeek showed about Reinforcement Learning for reasoning models, namely that "scaling general-purpose reinforcement learning" still works.)
- Hieu Minh "Jord" Nguyen, 10 Feb 2025, A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks, https://arxiv.org/abs/2502.06470
- Daniel Fleischer, Moshe Berchansky, Gad Markovits, Moshe Wasserblat, 13 Feb 2025, SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models, https://arxiv.org/abs/2502.09390 https://github.com/IntelLabs/RAG-FiT/tree/square
- Salvatore Raieli, Feb 2025, The LLMs’ Dilemma: Thinking Too Much OR Too Little? Exploring the fine line between deep reasoning and computational overkill in large language models, https://levelup.gitconnected.com/the-llms-dilemma-thinking-too-much-or-too-little-619a7532a47e
- Ben Dickson, February 20, 2025, How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs), https://venturebeat.com/ai/how-test-time-scaling-unlocks-hidden-reasoning-abilities-in-small-language-models-and-allows-them-to-outperform-llms/
- Ali Razghandi, Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah, 20 Feb 2025, CER: Confidence Enhanced Reasoning in LLMs, https://arxiv.org/abs/2502.14634 (Using model confidence metrics, i.e., logits, to evaluate reasoning pathways.)
- Zhipeng Chen, Yingqian Min, Beichen Zhang, Jie Chen, Jinhao Jiang, Daixuan Cheng, Wayne Xin Zhao, Zheng Liu, Xu Miao, Yang Lu, Lei Fang, Zhongyuan Wang, Ji-Rong Wen, 6 Mar 2025, An Empirical Study on Eliciting and Improving R1-like Reasoning Models, https://arxiv.org/abs/2503.04548 https://github.com/RUCAIBox/Slow_Thinking_with_LLMs
- Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng, 27 Mar 2025, A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond, https://arxiv.org/abs/2503.21614
- Kenneth Payne, Baptiste Alloui-Cros, 3 Jul 2025, Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory, https://arxiv.org/abs/2507.02618
- Bin Hong, Jiayu Liu, Zhenya Huang, Kai Zhang, Mengdi Zhang, 13 Aug 2025, Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization, https://arxiv.org/abs/2508.10164
- Jingde Cheng, 14 Aug 2025, Why Cannot Large Language Models Ever Make True Correct Reasoning?, https://arxiv.org/abs/2508.10265
- Chuhuai Yue, Chengqi Dong, Yinan Gao, Hang He, Jiajun Chai, Guojun Yin and Wei Lin, 14 Aug 2025, Promoting Efficient Reasoning with Verifiable Stepwise Reward, https://arxiv.org/abs/2508.10293
- Mengtao Zhou, Sifan Wu, Huan Zhang, Qi Sima, Bang Liu, 14 Aug 2025, What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles, https://arxiv.org/abs/2508.10358
- Runqi Qiao, Qiuna Tan, Peiqing Yang, Yanzi Wang, Xiaowan Wang, Enhui Wan, Sitong Zhou, Guanting Dong, Yuchen Zeng, Yida Xu, Jie Wang, Chong Sun, Chen Li, Honggang Zhang, 14 Aug 2025, We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning, https://arxiv.org/abs/2508.10433
- Yushi Feng, Junye Du, Yingying Hong, Qifan Wang, Lequan Yu, 14 Aug 2025, PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning, https://arxiv.org/abs/2508.10501
- Maël Jullien, Marco Valentino, and André Freitas, 14 Aug 2025, The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference, https://arxiv.org/abs/2508.10777
- Zhipeng Chen, Xiaobo Qin, Youbin Wu, Yue Ling, Qinghao Ye, Wayne Xin Zhao, Guang Shi, 14 Aug 2025, Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models, https://arxiv.org/abs/2508.10751
- Li Wang, Changhao Zhang, Zengqi Xiu, Kai Lu, Xin Yu, Kui Zhang, Wenjun Wu, 7 Aug 2025, Decoupling Understanding from Reasoning via Problem Space Mapping for Small-scale Model Reasoning, https://arxiv.org/abs/2508.10019
- Chuan Li, Qianyi Zhao, Fengran Mo, Cen Chen, 7 Aug 2025, FedCoT: Communication-Efficient Federated Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2508.10020
- Kai Zhao, Yanjun Zhao, Jiaming Song, Shien He, Lusheng Zhang, Qiang Zhang, Tianjiao Li, 8 Aug 2025, SABER: Switchable and Balanced Training for Efficient LLM Reasoning, https://arxiv.org/abs/2508.10026
- Christopher Pinier, Sonia Acuña Vargas, Mariia Steeghs-Turchina, Dora Matzke, Claire E. Stevenson, Michael D. Nunez, 12 Aug 2025, Large Language Models Show Signs of Alignment with Human Neurocognition During Abstract Reasoning, https://arxiv.org/abs/2508.10057
- Juyuan Wang, Rongchen Zhao, Wei Wei, Yufeng Wang, Mo Yu, Jie Zhou, Jin Xu, Liyan Xu, 14 Aug 2025, ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning, https://arxiv.org/abs/2508.10419
- Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, and Xiaofeng Yang, 14 Aug 2025, Performance of GPT-5 in Brain Tumor MRI Reasoning, https://arxiv.org/abs/2508.10865
- Xingyu Wu, Yuchen Yan, Shangke Lyu, Linjuan Wu, Yiwen Qiu, Yongliang Shen, Weiming Lu, Jian Shao, Jun Xiao, Yueting Zhuang, 14 Aug 2025, LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization, https://arxiv.org/abs/2507.15758
- Liang Zhang, Edith Aurora Graf, 14 Aug 2025, Mathematical Computation and Reasoning Errors by Large Language Models, https://arxiv.org/abs/2508.09932
- Atin Pothiraj, Elias Stengel-Eskin, Jaemin Cho, Mohit Bansal, 13 Aug 2025, CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting, https://arxiv.org/abs/2504.15485
- GLM-V Team: Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Bin Chen, Boyan Shi, Changyu Pang, Chenhui Zhang, Da Yin, Fan Yang, Guoqing Chen, Jiazheng Xu, Jiale Zhu, Jiali Chen, Jing Chen, Jinhao Chen, Jinghao Lin, Jinjiang Wang, Junjie Chen, Leqi Lei, Letian Gong, Leyi Pan, Mingdao Liu, Mingde Xu, Mingzhi Zhang, Qinkai Zheng, Sheng Yang, Shi Zhong, Shiyu Huang, Shuyuan Zhao, Siyan Xue, Shangqin Tu, Shengbiao Meng, Tianshu Zhang, Tianwei Luo, Tianxiang Hao, Tianyu Tong, Wenkai Li, Wei Jia, Xiao Liu, Xiaohan Zhang, Xin Lyu, Xinyue Fan, Xuancheng Huang, Yanling Wang, Yadong Xue, Yanfeng Wang, Yanzi Wang, Yifan An, et al. (22 additional authors not shown), 14 Aug 2025, GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning, https://arxiv.org/abs/2507.01006
- Zhangquan Chen, Ruihui Zhao, Chuwei Luo, Mingze Sun, Xinlei Yu, Yangyang Kang, Ruqi Huang, 14 Aug 2025, SIFThinker: Spatially-Aware Image Focus for Visual Reasoning, https://arxiv.org/abs/2508.06259
- Yanhui Li, Yunkang Cao, Chengliang Liu, Yuan Xiong, Xinghui Dong, Chao Huang, 14 Aug 2025, IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection, https://arxiv.org/abs/2508.09178
- Mo Yu, Tsz Ting Chung, Chulun Zhou, Tong Li, Rui Lu, Jiangnan Li, Liyan Xu, Haoshu Lu, Ning Zhang, Jing Li, Jie Zhou, 14 Aug 2025, PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts, https://arxiv.org/abs/2508.09848
- Yihao Xue, Baharan Mirzasoleiman, 22 Jul 2025, LoRA is All You Need for Safety Alignment of Reasoning LLMs, https://arxiv.org/abs/2507.17075
- Qifan Zhang, Nuo Chen, Zehua Li, Miao Peng, Jing Tang, Jia Li, 23 Jul 2025, Improving LLMs' Generalized Reasoning Abilities by Graph Problems, https://arxiv.org/abs/2507.17168
- Luca Salvatore Lorello, Nikolaos Manginas, Marco Lippi, Stefano Melacci, 23 Jul 2025, LTLZinc: a Benchmarking Framework for Continual Learning and Neuro-Symbolic Temporal Reasoning, https://arxiv.org/abs/2507.17482
- Yu Li, Zhuoshi Pan, Honglin Lin, Mengyuan Sun, Conghui He, Lijun Wu, 23 Jul 2025, Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning, https://arxiv.org/abs/2507.17512
- Xinyao Liu, Diping Song, 23 Jul 2025, Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning, https://arxiv.org/abs/2507.17539
- Zhao Song, Song Yue, Jiahao Zhang, 23 Jul 2025, Thinking Isn't an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations, https://arxiv.org/abs/2507.17699
- Tao Xu, Dung-Yang Lee and Momiao Xiong, 21 Jul 2025, Reinforcement Learning in hyperbolic space for multi-step reasoning, https://arxiv.org/abs/2507.16864
- Rishi Parekh, Saisubramaniam Gopalakrishnan, Zishan Ahmad, Anirudh Deodhar, 23 Jul 2025, Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance, https://arxiv.org/abs/2507.17273
- Zhuokun Chen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, and Bohan Zhuang, 23 Jul 2025, R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning, https://arxiv.org/abs/2507.17307
- Xuchen Li, Xuzhao Li, Shiyu Hu, Kaiqi Huang, Wentao Zhang, 22 Jul 2025, CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos, https://arxiv.org/abs/2507.16878
- Aleksandr Perevalov, Andreas Both, 22 Jul 2025, Text-to-SPARQL Goes Beyond English: Multilingual Question Answering Over Knowledge Graphs through Human-Inspired Reasoning, https://arxiv.org/abs/2507.16971
- Zhe Xu, Ziyi Liu, Junlin Hou, Jiabo Ma, Cheng Jin, Yihui Wang, Zhixuan Chen, Zhengyu Zhang, Zhengrui Guo, Fengtao Zhou, Yingxue Xu, Xi Wang, Ronald Cheong Kin Chan, Li Liang, Hao Chen, 23 Jul 2025, A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model, https://arxiv.org/abs/2507.17303
- Situo Zhang, Hanqi Li, Lu Chen, Zihan Zhao, Xuanze Lin, Zichen Zhu, Bo Chen, Xin Chen, Kai Yu, 23 Jul 2025, Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning, https://arxiv.org/abs/2507.17448
- Alexander R. Fabbri, Diego Mares, Jorge Flores, Meher Mankikar, Ernesto Hernandez, Dean Lee, Bing Liu, Chen Xing, 23 Jul 2025, MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs, https://arxiv.org/abs/2507.17476
- Nima Fathi, Amar Kumar, Tal Arbel, 22 Jul 2025, AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation, https://arxiv.org/abs/2507.16940
- Adrian Kaiser and Claudiu Leoveanu-Condrei and Ryan Gold and Marius-Constantin Dinu and Markus Hofmarcher, 23 Jul 2025, HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs, https://arxiv.org/abs/2507.15917
- Wei Sun, Qianlong Du, Fuwei Cui, Jiajun Zhang, 23 Jul 2025, An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning, https://arxiv.org/abs/2503.02382
- Lai Wei, Yuting Li, Kaipeng Zheng, Chen Wang, Yue Wang, Linghe Kong, Lichao Sun, Weiran Huang, 23 Jul 2025, Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start, https://arxiv.org/abs/2505.22334
- Yanjun Zheng, Xiyang Du, Longfei Liao, Xiaoke Zhao, Zhaowen Zhou, Bo Zhang, Jiawei Liu, Xiang Qi, Zhe Li, Zhiqiang Zhang, Wei Wang and Peng Zhang, 23 Jul 2025, Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning, https://arxiv.org/abs/2507.16802
- Shai Shalev-Shwartz and Amnon Shashua, 13 Jul 2025, From Reasoning to Super-Intelligence: A Search-Theoretic Perspective, https://arxiv.org/abs/2507.15865
- Yin Wu, Daniel Slieter, Vivek Subramanian, Ahmed Abouelazm, Robin Bohn, and J. Marius Zöllner, 17 Jul 2025, Why Braking? Scenario Extraction and Reasoning Utilizing LLM, https://arxiv.org/abs/2507.15874
- Andy E. Williams, 18 Jul 2025, The Recursive Coherence Principle: A Formal Constraint on Scalable Intelligence, Alignment, and Reasoning Architecture, https://arxiv.org/abs/2507.15880
- Lisa Dargasz, 20 Jul 2025, Integrating Reason-Based Moral Decision-Making in the Reinforcement Learning Architecture, https://arxiv.org/abs/2507.15895
- Shengchao Liu, Hannan Xu, Yan Ai, Huanxin Li, Yoshua Bengio, Harry Guo, 21 Jul 2025, Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization, https://arxiv.org/abs/2507.16110
- Fred Mutisya (1 and 2), Shikoh Gitau (1), Christine Syovata (2), Diana Oigara (2), Ibrahim Matende (2), Muna Aden (2), Munira Ali (2), Ryan Nyotu (2), Diana Marion (2), Job Nyangena (2), Nasubo Ongoma (1), Keith Mbae (1), Elizabeth Wamicha (1), Eric Mibuari (1), Jean Philbert Nsengemana (3), Talkmore Chidede (4) ((1) Qhala (Nairobi, Kenya), (2) Kenya Medical Association (Nairobi, Kenya), (3) Africa CDC (Addis Ababa, Ethiopia), (4) AfCFTA (Accra, Ghana)), 22 Jul 2025, Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens, https://arxiv.org/abs/2507.16322
- Lucas de Lara (IECL), 22 Jul 2025, Canonical Representations of Markovian Structural Causal Models: A Framework for Counterfactual Reasoning, https://arxiv.org/abs/2507.16370
- Bo Hou and Xin Tan and Kai Zheng and Fang Liu and Yinghao Zhu and Li Zhang, 22 Jul 2025, LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning, https://arxiv.org/abs/2507.16395
- Jean Lelong, Adnane Errazine and Annabelle Blangero, 22 Jul 2025, Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications, https://arxiv.org/abs/2507.16507
- Xu Yang, Qi Zhang, Shuming Jiang, Yaowen Xu, Zhaofan Zou, Hao Sun, Xuelong Li, 22 Jul 2025, METER: Multi-modal Evidence-based Thinking and Explainable Reasoning -- Algorithm and Benchmark, https://arxiv.org/abs/2507.16206
- Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Leshem Choshen, Yoon Kim, Jacob Andreas, 22 Jul 2025, Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty, https://arxiv.org/abs/2507.16806
- Junhao Shen, Haiteng Zhao, Yuzhe Gu, Songyang Gao, Kuikun Liu, Haian Huang, Jianfei Gao, Dahua Lin, Wenwei Zhang, Kai Chen, 22 Jul 2025, Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning, https://arxiv.org/abs/2507.16814
- Isaac Shi and Zeyuan Li and Fan Liu and Wenli Wang and Lewei He and Yang Yang and Tianyu Shi, 13 Jul 2025, eSapiens's DEREK Module: Deep Extraction & Reasoning Engine for Knowledge with LLMs, https://arxiv.org/abs/2507.15863
- Run-Ze Fan and Zengzhi Wang and Pengfei Liu, 22 Jul 2025, MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning, https://arxiv.org/abs/2507.16812
- Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen, Yu-Chiang Frank Wang, Fu-En Yang, 22 Jul 2025, ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning, https://arxiv.org/abs/2507.16815
- Xiuwei Chen, Wentao Hu, Hanhui Li, Jun Zhou, Zisheng Chen, Meng Cao, Yihan Zeng, Kui Zhang, Yu-Jie Yuan, Jianhua Han, Hang Xu, Xiaodan Liang, 22 Jul 2025, C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning, https://arxiv.org/abs/2507.16518
- Ang Li, Charles Wang, Kaiyu Yue, Zikui Cai, Ollie Liu, Deqing Fu, Peng Guo, Wang Bill Zhu, Vatsal Sharan, Robin Jia, Willie Neiswanger, Furong Huang, Tom Goldstein, Micah Goldblum, 22 Jul 2025, Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning, https://arxiv.org/abs/2507.16746
- Edward Y. Chang, Zeyneb N. Kaya, Ethan Chang, 22 Jul 2025, The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning, https://arxiv.org/abs/2506.02139
- Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori, 22 Jul 2025, Hierarchical Reasoning Model, https://arxiv.org/abs/2506.21734
- Yitong Lin, Jiaying He, Jiahe Chen, Xinnan Zhu, Jianwei Zheng, Tao Bo, 22 Jul 2025, BioGraphFusion: Graph Knowledge Embedding for Biological Completion and Reasoning, https://arxiv.org/abs/2507.14468
- Shangke Lyu, Linjuan Wu, Yuchen Yan, Xingyu Wu, Hao Li, Yongliang Shen, Peisheng Jiang, Weiming Lu, Jun Xiao, Yueting Zhuang, 22 Jul 2025, Hierarchical Budget Policy Optimization for Adaptive Reasoning, https://arxiv.org/abs/2507.15844
- Haiteng Zhao, Chang Ma, Fangzhi Xu, Lingpeng Kong, Zhi-Hong Deng, 22 Jul 2025, BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning, https://arxiv.org/abs/2502.16660
- Xiachong Feng, Longxu Dou, Lingpeng Kong, 22 Jul 2025, Reasoning Does Not Necessarily Improve Role-Playing Ability, https://arxiv.org/abs/2502.16940
- Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, Toby Boyd, Brad Hekman, Aaron Parisi, Chaoyi Zhang, Kornraphop Kawintiranon, Tania Bedrax-Weiss, Oliver Wang, Ya Xu, Ollie Purkiss, Uri Mendlovic, Ilaï Deutel, Nam Nguyen, Adam Langley, Flip Korn, Lucia Rossazza, Alexandre Ramé, Sagar Waghmare, Helen Miller, Nathan Byrd, Ashrith Sheshan, Raia Hadsell, Sangnie Bhardwaj, Pawel Janus, Tero Rissa, Dan Horgan, Sharon Silver, Ayzaan Wahid, Sergey Brin, Yves Raimond, Klemen Kloboves, et al. (3255 additional authors not shown), 22 Jul 2025, Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities, https://arxiv.org/abs/2507.06261
- SaiBarath Sundar, Pranav Satheesan, Udayaadithya Avadhanam, 23 Jul 2025, I2I-STRADA -- Information to Insights via Structured Reasoning Agent for Data Analysis, https://arxiv.org/abs/2507.17874
- Mutian Yang, Jiandong Gao, and Ji Wu, 24 Jul 2025, Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory, https://arxiv.org/abs/2507.18178
- Zhuang Qiang Bok and Watson Wei Khong Chua, 24 Jul 2025, Reasoning Beyond the Obvious: Evaluating Divergent and Convergent Thinking in LLMs for Financial Scenarios, https://arxiv.org/abs/2507.18368
- Shiye Lei, Zhihao Cheng, Kai Jia, Dacheng Tao, 24 Jul 2025, Revisiting LLM Reasoning via Information Bottleneck, https://arxiv.org/abs/2507.18391
- Xiaoxu Guo, Siyan Liang, Yachao Cui, Juxiang Zhou, Lei Wang, Han Cao, 21 Jul 2025, Multimodal Fine-grained Reasoning for Post Quality Evaluation, https://arxiv.org/abs/2507.17934
- Datta Nimmaturi, Vaishnavi Bhargava, Rajat Ghosh, Johnu George, Debojyoti Dutta, 24 Jul 2025, Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models, https://arxiv.org/abs/2507.18014
- Matthias Otth, Jonas Hübotter, Ido Hakimi, Andreas Krause, 24 Jul 2025, Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning, https://arxiv.org/abs/2507.18122
- Dongyang Guo, Yasmeen Abdrabou, Enkeleda Thaqi, Enkelejda Kasneci, 24 Jul 2025, Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning, https://arxiv.org/abs/2507.18252
- Rana Alshaikh, Israa Alghanmi, Shelan Jeawak, 24 Jul 2025, AraTable: Benchmarking LLMs' Reasoning and Understanding of Arabic Tabular Data, https://arxiv.org/abs/2507.18442
- Busra Icoz, Goksel Biricik, 24 Jul 2025, Automated Code Review Using Large Language Models with Symbolic Reasoning, https://arxiv.org/abs/2507.18476
- Andres M Bran, Theo A Neukomm, Daniel P Armstrong, Zlatko Jončev, Philippe Schwaller, 23 Jul 2025, Chemical reasoning in LLMs unlocks strategy-aware synthesis planning and reaction mechanism elucidation, https://arxiv.org/abs/2503.08537
- Bowen Zhang, Pengcheng Luo, 24 Jul 2025, OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problems with Reasoning LLM, https://arxiv.org/abs/2503.10009
- David Guzman Piedrahita, Yongjin Yang, Mrinmaya Sachan, Giorgia Ramponi, Bernhard Schölkopf, Zhijing Jin, 24 Jul 2025, Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games, https://arxiv.org/abs/2506.23276
- Martina Miliani, Serena Auriemma, Alessandro Bondielli, Emmanuele Chersoni, Lucia Passaro, Irene Sucameli, Alessandro Lenci, 24 Jul 2025, ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models, https://arxiv.org/abs/2502.15487
- Junming Liu, Siyuan Meng, Yanting Gao, Song Mao, Pinlong Cai, Guohang Yan, Yirong Chen, Zilin Bian, Ding Wang, Botian Shi, 24 Jul 2025, Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning, https://arxiv.org/abs/2503.12972
- Renato Ghisellini and Remo Pareschi and Marco Pedroni and Giovanni Battista Raggi, 18 Jul 2025, From Extraction to Synthesis: Entangled Heuristics for Agent-Augmented Strategic Reasoning, https://arxiv.org/abs/2507.13768
- Shmuel Berman, Jia Deng, 4 Jul 2025, VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs, https://arxiv.org/abs/2507.13361
- Binbin Ji, Siddharth Agrawal, Qiance Tang, and Yvonne Wu, 6 Jul 2025, Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning, https://arxiv.org/abs/2507.13362
- Peican Zhu, Yubo Jing, Le Cheng, Bin Chen, Xiaodong Cui, Lianwei Wu, Keke Tang, 17 Jul 2025, SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection, https://arxiv.org/abs/2507.13415
- Ishant Chintapatla, Kazuma Choji, Naaisha Agarwal, Andrew Lin, Hannah You, Charles Duong, Kevin Zhu, Sean O'Brien, Vasu Sharma, 17 Jul 2025, COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark, https://arxiv.org/abs/2507.13405
- Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 18 Jul 2025, Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567
- Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar, 18 Jul 2025, The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, https://arxiv.org/abs/2506.06941
- Zhiting Mei, Christina Zhang, Tenny Yin, Justin Lidard, Ola Shorinwa, Anirudha Majumdar, 18 Jul 2025, Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?, https://arxiv.org/abs/2506.18183
- Ahmed Bahloul, Simon Malberg, 18 Jul 2025, From Roots to Rewards: Dynamic Tree Reasoning with RL, https://arxiv.org/abs/2507.13142
- Thomas Foster, Anya Sims, Johannes Forkel, Mattie Fellows, Jakob Foerster, 18 Jul 2025, Learning to Reason at the Frontier of Learnability, https://arxiv.org/abs/2502.12272
- Constantin Venhoff, Iván Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda, 17 Jul 2025, Understanding Reasoning in Thinking Language Models via Steering Vectors, https://arxiv.org/abs/2506.18167
- Jiayu Song, Mahmud Elahi Akhter, Dana Atzil Slonim, Maria Liakata, 18 Jul 2025, Temporal reasoning for timeline summarisation in social media, https://arxiv.org/abs/2501.00152
- Z.Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z.F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, Chong Ruan, 18 Jul 2025, DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition, https://arxiv.org/abs/2504.21801
- Bingqian Lin, Yunshuang Nie, Khun Loun Zai, Ziming Wei, Mingfei Han, Rongtao Xu, Minzhe Niu, Jianhua Han, Liang Lin, Cewu Lu, Xiaodan Liang, 18 Jul 2025, EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation, https://arxiv.org/abs/2506.01551
- Humza Sami, Mubashir ul Islam, Pierre-Emmanuel Gaillardon, Valerio Tenace, 18 Jul 2025, Adaptive Multi-Agent Reasoning via Automated Workflow Generation, https://arxiv.org/abs/2507.14393
- Michael J. Zellinger and Matt Thomson, 18 Jul 2025, Fail Fast, or Ask: Mitigating the Deficiencies of Reasoning LLMs with Human-in-the-Loop Systems Engineering, https://arxiv.org/abs/2507.14406
- Cole Robertson and Philip Wolff, 21 Jul 2025, LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning, https://arxiv.org/abs/2507.15521
- Licheng Liu, Zihan Wang, Linjie Li, Chenwei Xu, Yiping Lu, Han Liu, Avirup Sil, Manling Li, 18 Jul 2025, A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning, https://arxiv.org/abs/2507.14295
- Ole-Christoffer Granmo and Youmna Abdelwahab and Per-Arne Andersen and Paul F. A. Clarke and Kunal Dumbre and Ylva Grønninsæter and Vojtech Halenka and Runar Helin and Lei Jiao and Ahmed Khalid and Rebekka Omslandseter and Rupsa Saha and Mayur Shende and Xuan Zhang, 20 Jul 2025, The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs, https://arxiv.org/abs/2507.14874
- Yiyuan Yang, Zichuan Liu, Lei Song, Kai Ying, Zhiguang Wang, Tom Bamford, Svitlana Vyetrenko, Jiang Bian, Qingsong Wen, 20 Jul 2025, Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback, https://arxiv.org/abs/2507.15066
- Jiaao Li, Kaiyuan Li, Chen Gao, Yong Li, Xinlei Chen, 21 Jul 2025, EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent, https://arxiv.org/abs/2507.15428
- Seok Hwan Song, Mohna Chakraborty, Qi Li, Wallapak Tavanapong, 21 Jul 2025, Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?, https://arxiv.org/abs/2507.15707
- Sahana Srinivasan, Xuguang Ai, Thaddaeus Wai Soon Lo, Aidan Gilson, Minjie Zou, Ke Zou, Hyunjae Kim, Mingjia Yang, Krithi Pushpanathan, Samantha Yew, Wan Ting Loke, Jocelyn Goh, Yibing Chen, Yiming Kong, Emily Yuelei Fu, Michelle Ongyong Hui, Kristen Nwanyanwu, Amisha Dave, Kelvin Zhenghao Li, Chen-Hsin Sun, Mark Chia, Gabriel Dawei Yang, Wendy Meihua Wong, David Ziyou Chen, Dianbo Liu, Maxwell Singer, Fares Antaki, Lucian V Del Priore, Jost Jonas, Ron Adelman, Qingyu Chen, Yih-Chung Tham, 21 Jul 2025, BEnchmarking LLMs for Ophthalmology (BELO) for Ophthalmological Knowledge and Reasoning, https://arxiv.org/abs/2507.15717
- Yihao Li, Jiayi Xin, Miranda Muqing Miao, Qi Long, Lyle Ungar, 21 Jul 2025, The Impact of Language Mixing on Bilingual LLM Reasoning, https://arxiv.org/abs/2507.15849
- Fengxiang Cheng, Haoxuan Li, Fenrong Liu, Robert van Rooij, Kun Zhang, Zhouchen Lin, 21 Jul 2025, Empowering LLMs with Logical Reasoning: A Comprehensive Survey, https://arxiv.org/abs/2502.15652
- Shaohang Wei, Wei Li, Feifan Song, Wen Luo, Tianyi Zhuang, Haochen Tan, Zhijiang Guo, Houfeng Wang, 19 Jul 2025, TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios, https://arxiv.org/abs/2505.12891
- Kun Xiang and Heng Li and Terry Jingchen Zhang and Yinya Huang and Zirong Liu and Peixin Qu and Jixi He and Jiaqi Chen and Yu-Jie Yuan and Jianhua Han and Hang Xu and Hanhui Li and Mrinmaya Sachan and Xiaodan Liang, 21 Jul 2025, SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning, https://arxiv.org/abs/2505.19099
- Xin Wang, Jiyao Liu, Yulong Xiao, Junzhi Ning, Lihao Liu, Junjun He, Botian Shi, Kaicheng Yu, 21 Jul 2025, THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?, https://arxiv.org/abs/2506.21763
- Konstantinos I. Roumeliotis, Ranjan Sapkota, Manoj Karkee, Nikolaos D. Tselikas, 18 Jul 2025, Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning, https://arxiv.org/abs/2507.10571
- Michal Spiegel, Michal Štefánik, Marek Kadlčík, Josef Kuchař, 21 Jul 2025, Attend or Perish: Benchmarking Attention in Algorithmic Reasoning, https://arxiv.org/abs/2503.01909
- Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou, 21 Jul 2025, Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning, https://arxiv.org/abs/2505.16122
- Justin Chih-Yao Chen, Sukwon Yun, Elias Stengel-Eskin, Tianlong Chen, Mohit Bansal, 18 Jul 2025, Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning, https://arxiv.org/abs/2503.05641
- Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, Jiawei Han, 21 Jul 2025, Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, https://arxiv.org/abs/2503.09516
- Bo-Cheng Chiu, Jen-Jee Chen, Yu-Chee Tseng and Feng-Chi Chen, 21 Jul 2025, DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs, https://arxiv.org/abs/2506.11558
- Yiming Yang, Yueru Luo, Bingkun He, Hongbin Lin, Suzhong Fu, Chao Zheng, Zhipeng Cao, Erlong Li, Chao Yan, Shuguang Cui, Zhen Li, 20 Jul 2025, TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving, https://arxiv.org/abs/2507.00709
- Anirudh Choudhary, Mosbah Aouad, Krishnakant Saboo, Angelina Hwang, Jacob Kechter, Blake Bordeaux, Puneet Bhullar, David DiCaudo, Steven Nelson, Nneka Comfere, Emma Johnson, Olayemi Sokumbi, Jason Sluzevich, Leah Swanson, Dennis Murphree, Aaron Mangold, Ravishankar Iyer, 19 Jul 2025, RACR-MIL: Rank-aware contextual reasoning for weakly supervised grading of squamous cell carcinoma using whole slide images, https://arxiv.org/abs/2308.15618
- Blair Johnson, Clayton Kerce, Faramarz Fekri, 8 Aug 2025, GLIDR: Graph-Like Inductive Logic Programming with Differentiable Reasoning, https://arxiv.org/abs/2508.06716
- Amit Dhanda, 10 Aug 2025, Multi-Dimensional Summarization Agents with Context-Aware Reasoning over Enterprise Tables, https://arxiv.org/abs/2508.07186
- Yi Tang, Kaini Wang, Yang Chen, Guangquan Zhou, 10 Aug 2025, EndoAgent: A Memory-Guided Reflective Agent for Intelligent Endoscopic Vision-to-Decision Reasoning, https://arxiv.org/abs/2508.07292
- He Kong, Die Hu, Jingguo Ge, Liangxiong Li, Hui Li and Tong Li, 10 Aug 2025, Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning, https://arxiv.org/abs/2508.07382
- Wenkai Li, Liwen Sun, Zhenxiang Guan, Xuhui Zhou, Maarten Sap, 11 Aug 2025, 1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning, https://arxiv.org/abs/2508.07667
- Zhenpeng Su, Leiyu Pan, Xue Bai, Dening Liu, Guanting Dong, Jiaming Huang, Wenping Hu, Guorui Zhou, 11 Aug 2025, Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization, https://arxiv.org/abs/2508.07629
- Zihe Liu, Jiashun Liu, Yancheng He, Weixun Wang, Jiaheng Liu, Ling Pan, Xinyu Hu, Shaopan Xiong, Ju Huang, Jian Hu, Shengyi Huang, Siran Yang, Jiamang Wang, Wenbo Su, Bo Zheng, 11 Aug 2025, Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning, https://arxiv.org/abs/2508.08221
- Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi, 11 Aug 2025, Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent, https://arxiv.org/abs/2508.08222
- Logan Cross, Erik Brockbank, Tobias Gerstenberg, Judith E. Fan, Daniel L. K. Yamins, Nick Haber, 25 Jul 2025, Understanding Human Limits in Pattern Recognition: A Computational Model of Sequential Reasoning in Rock, Paper, Scissors, https://arxiv.org/abs/2508.06503
- Wenhan Liu, Xinyu Ma, Weiwei Sun, Yutao Zhu, Yuchen Li, Dawei Yin, Zhicheng Dou, 9 Aug 2025, ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability, https://arxiv.org/abs/2508.07050
- Lijie Yang, Zhihao Zhang, Arti Jain, Shijie Cao, Baihong Yuan, Yiwei Chen, Zhihao Jia, Ravi Netravali, 9 Aug 2025, Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning, https://arxiv.org/abs/2508.07101
- Fabio Vitali, 10 Aug 2025, From Knowledge to Conjectures: A Modal Framework for Reasoning about Hypotheses, https://arxiv.org/abs/2508.07304
- Anirudh Iyengar Kaniyar Narayana Iyengar, Srija Mukhopadhyay, Adnan Qidwai, Shubhankar Singh, Dan Roth, Vivek Gupta, 11 Aug 2025, InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information, https://arxiv.org/abs/2508.07630
- Chaohong Guo, Xun Mo, Yongwei Nie, Xuemiao Xu, Chao Xu, Fei Yu, and Chengjiang Long, 11 Aug 2025, TAR-TVG: Enhancing VLMs with Timestamp Anchor-Constrained Reasoning for Temporal Video Grounding, https://arxiv.org/abs/2508.07683
- Shoaib Ahmmad, Zubayer Ahmed Aditto, Md Mehrab Hossain, Noushin Yeasmin, Shorower Hossain, 11 Aug 2025, Autonomous Navigation of Cloud-Controlled Quadcopters in Confined Spaces Using Multi-Modal Perception and LLM-Driven High Semantic Reasoning, https://arxiv.org/abs/2508.07885
- Meixiu Long, Duolin Sun, Dan Yang, Junjie Wang, Yue Shen, Jian Wang, Peng Wei, Jinjie Gu, Jiahai Wang, 11 Aug 2025, DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval, https://arxiv.org/abs/2508.07995
- Zhonghao Yan, Muxi Diao, Yuxuan Yang, Jiayuan Xu, Kaizhou Zhang, Ruoyan Jing, Lele Yang, Yanxi Liu, Kongming Liang and Zhanyu Ma, 11 Aug 2025, MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision, https://arxiv.org/abs/2508.08177
- Shansong Wang and Mingzhe Hu and Qiang Li and Mojtaba Safari and Xiaofeng Yang, 11 Aug 2025, Capabilities of GPT-5 on Multimodal Medical Reasoning, https://arxiv.org/abs/2508.08224
- Yuan Sui, Yufei He, Tri Cao, Simeng Han, Yulin Chen, Bryan Hooi, 8 Aug 2025, Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models, https://arxiv.org/abs/2502.19918
- Alberto Pozanco, Marianela Morales, Daniel Borrajo, Manuela Veloso, 11 Aug 2025, A Planning Compilation to Reason about Goal Achievement at Planning Time, https://arxiv.org/abs/2503.09545
- Annie Wong, Thomas Bäck, Aske Plaat, Niki van Stein and Anna V. Kononova, 10 Aug 2025, Reasoning Capabilities of Large Language Models on Dynamic Tasks, https://arxiv.org/abs/2505.10543
- Lucia Cipolina-Kun and Marianna Nezhurina and Jenia Jitsev, 10 Aug 2025, Game Reasoning Arena: A Framework and Benchmark for Assessing Reasoning Capabilities of Large Language Models via Game Play, https://arxiv.org/abs/2508.03368
- Yiye Chen, Harpreet Sawhney, Nicholas Gydé, Yanan Jian, Jack Saunders, Patricio Vela, Ben Lundell, 8 Aug 2025, Schema-Guided Scene-Graph Reasoning based on Multi-Agent Large Language Model System, https://arxiv.org/abs/2502.03450
- Rong Cheng, Jinyi Liu, Yan Zheng, Fei Ni, Jiazhen Du, Hangyu Mao, Fuzheng Zhang, Bo Wang, Jianye Hao, 9 Aug 2025, DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering, https://arxiv.org/abs/2504.18243
- Xuandong Zhao, Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song, 11 Aug 2025, Learning to Reason without External Rewards, https://arxiv.org/abs/2505.19590
- Yunpeng Gao, Zhigang Wang, Pengfei Han, Linglin Jing, Dong Wang, Bin Zhao, 11 Aug 2025, Exploring Spatial Representation to Enhance LLM Reasoning in Aerial Vision-Language Navigation, https://arxiv.org/abs/2410.08500
- Seyed Pouyan Mousavi Davoudi, Amin Gholami Davodi, Alireza Amiri-Margavi, Alireza Shafiee Fard, Mahdi Jafari, 9 Aug 2025, Collective Reasoning Among LLMs: A Framework for Answer Validation Without Ground Truth, https://arxiv.org/abs/2502.20758
- Lo Pang-Yun Ting, Chengshuai Zhao, Yu-Hua Zeng, Yuan Jee Lim, Kun-Ta Chuang, Huan Liu, 9 Aug 2025, Leaps Beyond the Seen: Reinforced Reasoning Augmented Generation for Clinical Notes, https://arxiv.org/abs/2506.05386
- Mihir Godbole, Xiangbo Gao, Zhengzhong Tu, 9 Aug 2025, DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving, https://arxiv.org/abs/2506.17590
- Minghao Guo, Xi Zhu, Jingyuan Huang, Kai Mei, Yongfeng Zhang, 11 Aug 2025, ReaGAN: Node-as-Agent-Reasoning Graph Agentic Network, https://arxiv.org/abs/2508.00429
- Andrew Kiruluta, 18 Jul 2025, Wavelet Logic Machines: Learning and Reasoning in the Spectral Domain Without Neural Networks, https://arxiv.org/abs/2507.19514
- Aditya Sharma, Linh Nguyen, Ananya Gupta, Chengyu Wang, Chiamaka Adebayo, and Jakub Kowalski, 26 Jul 2025, Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning, https://arxiv.org/abs/2507.19855
- Enjun Du, Siyi Liu, Yongqi Zhang, 28 Jul 2025, Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning, https://arxiv.org/abs/2507.20498
- Ansh Poonia, Maeghal Jain, 28 Jul 2025, Dissecting Persona-Driven Reasoning in Language Models via Activation Patching, https://arxiv.org/abs/2507.20936
- Dong Du, Shulin Liu, Tao Yang, Shaohua Chen, Yang Li, 26 Jul 2025, UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities, https://arxiv.org/abs/2507.19766
- Eunkyu Park, Wesley Hanwen Deng, Gunhee Kim, Motahhare Eslami, Maarten Sap, 27 Jul 2025, Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations, https://arxiv.org/abs/2507.20409
- Xun Liang, Xin Guo, Zhongming Jin, Weihang Pan, Penghui Shang, Deng Cai, Binbin Lin, Jieping Ye, 28 Jul 2025, Enhancing Spatial Reasoning through Visual and Textual Thinking, https://arxiv.org/abs/2507.20529
- Adrien Bazoge, 28 Jul 2025, MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation, https://arxiv.org/abs/2507.20917
- Aleksandar Pavlovic, Emanuel Sallinger, Steven Schockaert, 26 Jul 2025, Faithful Differentiable Reasoning with Reshuffled Region-based Embeddings, https://arxiv.org/abs/2406.09529
- Zheng Zhang, Nuoqian Xiao, Qi Chai, Deheng Ye, Hao Wang, 28 Jul 2025, MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind, https://arxiv.org/abs/2504.18039
- Kun Li, Zhennan Wu, Shoupeng Wang, Jia Wu, Shirui Pan and Wenbin Hu, 28 Jul 2025, DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery, https://arxiv.org/abs/2505.13940
- Yun Qu, Qi Wang, Yixiu Mao, Vincent Tao Hu, Björn Ommer, Xiangyang Ji, 28 Jul 2025, Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?, https://arxiv.org/abs/2507.04632
- Xuzhao Li and Xuchen Li and Shiyu Hu and Yongzhen Guo and Wentao Zhang, 26 Jul 2025, VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains, https://arxiv.org/abs/2507.09884
- Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Yue Zhao, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji, 26 Jul 2025, MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning, https://arxiv.org/abs/2409.12059
- Shamus Sim and Tyrone Chen, 28 Jul 2025, Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models, https://arxiv.org/abs/2412.15748
- Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, Pengfei Liu, 28 Jul 2025, LIMO: Less is More for Reasoning, https://arxiv.org/abs/2502.03387
- Runyu Jiao, Alice Fasoli, Francesco Giuliari, Matteo Bortolon, Sergio Povoli, Guofeng Mei, Yiming Wang, Fabio Poiesi, 28 Jul 2025, Free-form language-based robotic reasoning and grasping, https://arxiv.org/abs/2503.13082
- Xiangning Yu, Zhuohan Wang, Linyi Yang, Haoxuan Li, Anjie Liu, Xiao Xue, Jun Wang, Mengyue Yang, 26 Jul 2025, Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning, https://arxiv.org/abs/2506.09853
- Yifu Han and Geo Zhang, 27 Jul 2025, Reinforcement learning fine-tuning of language model for instruction following and math reasoning, https://arxiv.org/abs/2506.21560
- Shenghe Zheng, Qianjia Cheng, Junchi Yao, Mengsong Wu, Haonan He, Ning Ding, Yu Cheng, Shuyue Hu, Lei Bai, Dongzhan Zhou, Ganqu Cui, Peng Ye, 28 Jul 2025, Scaling Physical Reasoning with the PHYSICS Dataset, https://arxiv.org/abs/2506.00022
- Khanh Son Pham, Christian Witte, Jens Behley, Johannes Betz, Cyrill Stachniss, 28 Jul 2025, Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps, https://arxiv.org/abs/2507.01397
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research
Read more about: