Aussie AI

Long List of LLM Reasoning Papers

  • Last Updated 26 August, 2025
  • by David Spuler, Ph.D.

Reasoning Research Papers

This page is a long list of paper citations in the area of LLM reasoning. For a more useful discussion and categorization of research papers, see:

Blog articles: You may also be interested in our recent blog articles related to reasoning:

Multi-Step Inference for Reasoning

A general list of multi-step reasoning or "test-time compute" reasoning papers:
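As background, one of the simplest test-time compute strategies appearing in these papers is self-consistency: sample several independent reasoning chains from the model and take a majority vote over the final answers. A minimal sketch (the sampled answer strings here are illustrative stand-ins for extracted LLM outputs, not a real model call):

```python
from collections import Counter

def majority_vote(answers):
    """Self-consistency: return the most common final answer
    across several independently sampled reasoning chains."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Each string is the final answer extracted from one sampled chain.
sampled = ["42", "41", "42", "42", "40"]
print(majority_vote(sampled))  # → 42
```

Spending more samples (more test-time compute) generally makes the vote more reliable, which is the basic trade-off many of the papers below analyze.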

General Reasoning Papers

  • Junlin Wang, Siddhartha Jain, Dejiao Zhang, Baishakhi Ray, Varun Kumar, Ben Athiwaratkun, 11 Jun 2024 (v2), Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies, https://arxiv.org/abs/2406.06461
  • Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin, 13 Jun 2024, Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs, https://arxiv.org/abs/2406.09136 Code: https://github.com/sail-sg/CPO
  • Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, 3 Dec 2023 (v2), Tree of Thoughts: Deliberate Problem Solving with Large Language Models, https://arxiv.org/abs/2305.10601 Code: https://github.com/princeton-nlp/tree-of-thought-llm
  • Hayden Field, June 20, 2024, OpenAI competitor Anthropic announces its most powerful AI yet, CNBC, https://www.cnbc.com/2024/06/20/anthropic-claude-3point5-sonnet-ai-announced.html
  • Wai-Chung Kwan, Xingshan Zeng, Yuxin Jiang, Yufei Wang, Liangyou Li, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong, 30 Jan 2024, MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models, https://arxiv.org/abs/2401.16745 Code: https://github.com/KwanWaiChung/MT-Eval
  • M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk et al., “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17682–17690. https://arxiv.org/abs/2308.09687
  • Q. Sun, Z. Yin, X. Li, Z. Wu, X. Qiu, and L. Kong, “Corex: Pushing the boundaries of complex reasoning through multi-model collaboration,” arXiv preprint arXiv:2310.00280, 2023. https://arxiv.org/abs/2310.00280
  • Tianle Li, Wei-Lin Chiang, Lisa Dunlap, May 20, 2024, Introducing Hard Prompts Category in Chatbot Arena, https://lmsys.org/blog/2024-05-17-category-hard/
  • Myeonghwa Lee, Seonho An, Min-Soo Kim, 18 Jun 2024, PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, https://arxiv.org/abs/2406.12430 Code: https://github.com/myeon9h/PlanRAG
  • Rahul Verma, June 21, 2024, OpenAI's GPT-5 Pushed Back To Late 2025, But Promises Ph.D.-Level Abilities, https://in.mashable.com/tech/77593/openais-gpt-5-pushed-back-to-late-2025-but-promises-phd-level-abilities
  • Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, Omar Khattab, 17 Jun 2024, Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs, https://arxiv.org/abs/2406.11695
  • Sachit Menon, Richard Zemel, Carl Vondrick, 20 Jun 2024, Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities, https://arxiv.org/abs/2406.14562
  • Lina M. Rojas-Barahona, 2024, Talking to Machines: do you read me?. Computation and Language, Universite de Lorraine, https://hal.science/tel-04620199/document
  • Rafe Brena, May 24, 2024, 3 Key Differences Between Human and Machine Intelligence You Need to Know: AI is an alien intelligence https://pub.towardsai.net/3-key-differences-between-human-and-machine-intelligence-you-need-to-know-7a34dcee2cd3 (Good article about how LLMs don't have "emotions" or "intelligence" and they don't "pause".)
  • Vishal Rajput, Apr 11, 2024, What’s next for AI: AI agentic workflows? https://medium.com/aiguys/next-for-llms-and-rag-ai-agentic-workflows-1869ba0a6796
  • Rachel Metz, July 12, 2024, OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving: The company believes its technology is approaching the second level of five on the path to artificial general intelligence, Bloomberg, https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai?sref=P6Q0mxvj
  • Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu, 1 May 2024, Causal Evaluation of Language Models, https://arxiv.org/abs/2405.00622 Project: https://opencausalab.github.io/CaLM
  • Anna Tong and Katie Paul July 16, 2024, Exclusive: OpenAI working on new reasoning technology under code name ‘Strawberry’, https://www.reuters.com/technology/artificial-intelligence/openai-working-new-reasoning-technology-under-code-name-strawberry-2024-07-12/
  • Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao, 29 Jul 2024, MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, https://arxiv.org/abs/2407.20183 Code: https://github.com/InternLM/MindSearch Project: https://mindsearch.netlify.app
  • Ethan Mollick, May 12, 2024, Superhuman? What does it mean for AI to be better than a human? And how can we tell? https://www.oneusefulthing.org/p/superhuman
  • Ignacio de Gregorio, Aug 2024, Grokking, a New Form of Reasoning, https://medium.com/@ignacio.de.gregorio.noblejas/grokking-a-new-form-of-reasoning-6785ea89d2ec
  • Zarif Bin Akhtar, Mapping Generative Artificial Intelligence (GAI's) Exciting Future: From Gemini to Q* and Beyond, https://publications.eai.eu/index.php/airo/article/view/5962 https://doi.org/10.4108/airo.5962 PDF: https://publications.eai.eu/index.php/airo/article/view/5962/3329
  • Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Yongfeng Zhang, Dongfang Liu, 16 Aug 2024, Visual Agents as Fast and Slow Thinkers, https://arxiv.org/abs/2408.08862
  • Adam Zewe, June 14, 2024, Technique improves the reasoning capabilities of large language models: Combining natural language and programming, the method enables LLMs to solve numerical, analytical, and language-based tasks transparently, MIT News, https://news.mit.edu/2024/technique-improves-reasoning-capabilities-large-language-models-0614
  • Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng, 6 Feb 2024, Self-Discover: Large Language Models Self-Compose Reasoning Structures, https://arxiv.org/abs/2402.03620
  • Tinghui Zhu, Kai Zhang, Jian Xie, Yu Su, 4 Feb 2024 (v2), Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning, https://arxiv.org/abs/2401.17686
  • Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu, Julian McAuley, Wei Ai, Furong Huang, 14 Mar 2024, Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey, https://arxiv.org/abs/2403.09606
  • Jiace Zhu, Yingtao Shen, Jie Zhao, An Zou, 25 Aug 2024, Path-Consistency: Prefix Enhancement for Efficient Inference in LLM, https://arxiv.org/abs/2409.01281
  • Cogni Down Under, Sep 2024, Reflection 70B: The AI That Thinks Before It Speaks, https://medium.com/@cognidownunder/reflection-70b-the-ai-that-thinks-before-it-speaks-8a70d3a0e38a
  • Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, 9 Mar 2024, Algorithmic progress in language models, https://arxiv.org/abs/2403.05812
  • Alberto Romero. Sep 10, 2024, Big News: OpenAI to Launch AI Model That Can Reason in 2 Weeks, https://www.thealgorithmicbridge.com/p/big-news-openai-to-launch-ai-model
  • Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Bo Tang, Feiyu Xiong, Zhiyu Li, 5 Sep 2024, Attention Heads of Large Language Models: A Survey, https://arxiv.org/abs/2409.03752 https://github.com/IAAR-Shanghai/Awesome-Attention-Heads (This survey is about making attention mechanisms more performant, accurate and intelligent, rather than improving efficiency.)
  • Asankhaya Sharma (codelion), Sep 2024, Optillm: Optimizing inference proxy for LLMs, https://github.com/codelion/optillm
  • Louis Bouchard, Sep 12, 2024, OpenAI's o1 Model: The Future of Reasoning AI? What Sets It Apart, How OpenAI's o1 Model Thinks Through Problems (And Why It's Slower), https://www.louisbouchard.ai/openai-o1/
  • OpenAI, September 12, 2024, Introducing OpenAI o1-preview, A new series of reasoning models for solving hard problems. https://openai.com/index/introducing-openai-o1-preview/
  • OpenAI, September 12, 2024, Learning to Reason with LLMs, https://openai.com/index/learning-to-reason-with-llms/
  • Nathan Lambert, Sep 05, 2024, OpenAI’s Strawberry, LM self-talk, inference scaling laws, and spending more on inference, Whether or not scaling works, we should spend more on inference, https://www.interconnects.ai/p/openai-strawberry-and-inference-scaling-laws
  • Ignacio de Gregorio Noblejas, September 15, 2024, OpenAI Launches o1. Here’s All You Need to Know, https://thetechoasis.beehiiv.com/p/openai-launches-o1-heres-need-know
  • Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li, 27 Jun 2024 (v2), ReFT: Reasoning with Reinforced Fine-Tuning, https://arxiv.org/abs/2401.08967
  • Tianqiao Liu, Zui Chen, Zitao Liu, Mi Tian, Weiqi Luo, 13 Sep 2024, Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding, https://arxiv.org/abs/2409.08561
  • Michael Nuñez, September 16, 2024, SambaNova challenges OpenAI’s o1 model with Llama 3.1-powered demo on HuggingFace, https://venturebeat.com/ai/sambanova-challenges-openais-o1-model-with-llama-3-1-powered-demo-on-huggingface/
  • Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett, 18 Sep 2024, To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning, https://arxiv.org/abs/2409.12183
  • Santosh Kumar Radha, Yasamin Nouri Jelyani, Ara Ghukasyan, Oktay Goktas, 19 Sep 2024, Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning, https://arxiv.org/abs/2409.12618
  • Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal, 18 Sep 2024, MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning, https://arxiv.org/abs/2409.12147 https://github.com/dinobby/MAgICoRe
  • Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
  • Artem Shelamanov, Sep 2024, Why OpenAI’s o1 Model Is A Scam, https://pub.towardsai.net/why-openais-o1-model-is-a-scam-eb3356c3d70e
  • Chloe Berger, October 2, 2024, Mark Cuban says his puppy is ‘smarter than AI is today’, https://fortune.com/2024/10/01/mark-cuban-dog-puppy-smarter-than-ai/
  • Julia Love and Rachel Metz, October 2, 2024, Google Is Working on Reasoning AI, Chasing OpenAI’s Efforts, https://www.bloomberg.com/news/articles/2024-10-02/google-is-working-on-reasoning-ai-chasing-openai-s-efforts
  • Zhenwen Liang, Ye Liu, Tong Niu, Xiangliang Zhang, Yingbo Zhou, Semih Yavuz, 5 Oct 2024, Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification, https://arxiv.org/abs/2410.05318
  • Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar, 7 Oct 2024, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, https://arxiv.org/abs/2410.05229
  • Sonya Huang, Pat Grady, and o1, Sequoia, October 9, 2024, Generative AI’s Act o1, https://www.sequoiacap.com/article/generative-ais-act-o1/
  • Ignacio de Gregorio Noblejas, October 20, 2024, The Anti-LLM Revolution Begins, https://thetechoasis.beehiiv.com/p/the-anti-llm-revolution-begins
  • Asif Razzaq, October 13, 2024, OpenR: An Open-Source AI Framework Enhancing Reasoning in Large Language Models, https://www.marktechpost.com/2024/10/13/openr-an-open-source-ai-framework-enhancing-reasoning-in-large-language-models/
  • Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
  • Latent Space, Nov 05, 2024, Inference, Fast and Slow. When System 1/System 2 analogies are not enough: The 6 types of LLM inference, https://www.latent.space/p/inference-fast-and-slow
  • Will Lockett Nov 2024, Apple Calls BS On The AI Revolution, They aren’t late to the AI game; they are just the only sceptical big tech company. https://medium.com/predict/apple-calls-bullshit-on-the-ai-revolution-ae38fdf83392
  • Anthony Ha, Nov 2024, OpenAI reportedly developing new strategies to deal with AI improvement slowdown, https://techcrunch.com/2024/11/09/openai-reportedly-developing-new-strategies-to-deal-with-ai-improvement-slowdown/
  • Michael Nuñez, November 11, 2024, AI’s math problem: FrontierMath benchmark shows how far technology still has to go, https://venturebeat.com/ai/ais-math-problem-frontiermath-benchmark-shows-how-far-technology-still-has-to-go/
  • Kyle Orland, 13 Nov 2024, What if AI doesn’t just keep getting better forever? New reports highlight fears of diminishing returns for traditional LLM training. https://arstechnica.com/ai/2024/11/what-if-ai-doesnt-just-keep-getting-better-forever/
  • Carl Franzen, November 20, 2024, DeepSeek’s first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance, https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/
  • Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen, 14 Oct 2024 (v3), Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models, https://arxiv.org/abs/2408.02442
  • Janelle Teng, Nov 26, 2024, AI's reasoning quandary, https://nextbigteng.substack.com/p/ais-reasoning-quandary
  • Qwen Team, November 28, 2024, QwQ: Reflect Deeply on the Boundaries of the Unknown, https://qwenlm.github.io/blog/qwq-32b-preview/
  • mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
  • Tom Schaul, 25 Nov 2024, Boundless Socratic Learning with Language Games, https://arxiv.org/abs/2411.16905
  • Alberto Romero, Dec 06, 2024, OpenAI Announces o1 Model And ChatGPT Pro ($200/Mo). OpenAI Christmas event: Day 1 of 12, https://www.thealgorithmicbridge.com/p/openai-announces-o1-model-and-chatgpt
  • Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister, 29 Nov 2024, Reverse Thinking Makes LLMs Stronger Reasoners, https://arxiv.org/abs/2411.19865
  • Tiernan Ray, Dec. 10, 2024, How Cerebras boosted Meta's Llama to 'frontier model' performance. The company also demonstrates initial training of a one-trillion-parameter AI model on a single machine using conventional DDR5 memory chips. https://www.zdnet.com/article/how-cerebras-boosted-metas-llama-to-frontier-model-performance/
  • Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, 9 Dec 2024, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769 (Performing reasoning in a model trained to operate in the embedding vector space, rather than more directly in the token space.)
  • Arda Sevinc, Abdurrahman Gumus, 9 Dec 2024, AutoReason: Automatic Few-Shot Reasoning Decomposition, https://arxiv.org/abs/2412.06975 https://github.com/miralab-ai/autoreason
  • Kyle Wiggers, December 14, 2024, ‘Reasoning’ AI models have become a trend, for better or worse, https://techcrunch.com/2024/12/14/reasoning-ai-models-have-become-a-trend-for-better-or-worse/
  • Vincent-Pierre Berges, Barlas Oguz, December 12, 2024, Memory Layers at Scale, Meta, https://ai.meta.com/research/publications/memory-layers-at-scale/ https://github.com/facebookresearch/memory (Augmentation of an LLM with an additional key-value associative memory, by replacing some FFNs with a "memory layer".)
  • Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas, 11 Nov 2024, The Surprising Effectiveness of Test-Time Training for Abstract Reasoning, https://arxiv.org/abs/2411.07279
  • Noam Brown, Tuomas Sandholm, 16 Nov 2017 (v3), Safe and Nested Subgame Solving for Imperfect-Information Games, https://arxiv.org/abs/1705.02955 (An early pre-AI paper on reasoning in multiple steps.)
  • Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
  • Agnostiq, Dec 2024, multi-agent-llm: LLM based Multi-Agent methods: Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT), https://github.com/AgnostiqHQ/multi-agent-llm
  • Denise Holt, Dec 18, 2024, VERSES AI Crushes OpenAI o1 in Head to Head Competition: VERSES AI's New Genius™ Platform Delivers Far More Performance than OpenAI's Most Advanced Model at a Fraction of the Cost. https://deniseholt.substack.com/p/verses-ai-crushes-openai-o1-in-head
  • Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen, 18 Dec 2024 (v2), Are Your LLMs Capable of Stable Reasoning? https://arxiv.org/abs/2412.13147 https://github.com/open-compass/GPassK
  • Alberto Romero, Dec 21, 2024, OpenAI o3 Model Is a Message From the Future: Update All You Think You Know About AI. Incredible, a miracle, more than just a better state-of-the-art AI model. https://www.thealgorithmicbridge.com/p/openai-o3-model-is-a-message-from
  • Sabrina Ortiz, Dec. 20, 2024, OpenAI unveils its most advanced o3 reasoning model on its last day of 'shipmas', https://www.zdnet.com/article/openai-unveils-its-most-advanced-o3-reasoning-model-on-its-last-day-of-shipmas/
  • Jie He, Nan Hu, Wanqiu Long, Jiaoyan Chen, Jeff Z. Pan, 22 Dec 2024, MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge, https://arxiv.org/abs/2412.17032 https://github.com/probe2/multi-hop/ (Model evaluation of reasoning abilities.)
  • Huanjin Yao, Jiaxing Huang, Wenhao Wu, Jingyi Zhang, Yibo Wang, Shunyu Liu, Yingjie Wang, Yuxin Song, Haocheng Feng, Li Shen, Dacheng Tao, 24 Dec 2024, Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search, https://arxiv.org/abs/2412.18319 https://github.com/HJYao00/Mulberry (Multimodal multi-step reasoning like CoT.)
  • Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
  • Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, Benyou Wang, 25 Dec 2024, HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs, https://arxiv.org/abs/2412.18925
  • Lori Dajose, December 17, 2024, Thinking Slowly: The Paradoxical Slowness of Human Behavior, https://www.caltech.edu/about/news/thinking-slowly-the-paradoxical-slowness-of-human-behavior
  • Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back, 16 Jul 2024, Reasoning with Large Language Models, a Survey, https://arxiv.org/abs/2407.11511
  • Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang, 21 Nov 2024 (v2), Disentangling Memory and Reasoning Ability in Large Language Models, https://arxiv.org/abs/2411.13504 https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning
  • Allen Nie, Yi Su, Bo Chang, Jonathan N. Lee, Ed H. Chi, Quoc V. Le, Minmin Chen, 8 Oct 2024, EVOLvE: Evaluating and Optimizing LLMs For Exploration, https://arxiv.org/abs/2410.06238
  • Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, Sainbayar Sukhbaatar, 14 Oct 2024, Thinking LLMs: General Instruction Following with Thought Generation, https://arxiv.org/abs/2410.10630 (Training an LLM to reason by generating additional "thoughts" during training.)
  • Xiang Huang, Jiayu Shen, Shanshan Huang, Sitao Cheng, Xiaxia Wang, Yuzhong Qu, 27 Dec 2024, TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data, https://arxiv.org/abs/2412.19544
  • Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen, 2 Jan 2025, Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking, https://arxiv.org/abs/2501.01306
  • Mayi Xu, Yunfeng Ning, Yongqi Li, Jianhao Chen, Jintao Wen, Yao Xiao, Shen Zhou, Birong Pan, Zepeng Bao, Xin Miao, Hankun Kang, Ke Sun, Tieyun Qian, 2 Jan 2025, Reasoning based on symbolic and parametric knowledge bases: a survey, https://arxiv.org/abs/2501.01030 (Extensive survey of reasoning from CoT to knowledge graphs to table-based reasoning.)
  • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
  • Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou, 9 Jan 2025, Search-o1: Agentic Search-Enhanced Large Reasoning Models, https://arxiv.org/abs/2501.05366 https://github.com/sunnynexus/Search-o1 (RAG retrieval and agentic methods applied to Large Reasoning Models.)
  • Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang, Xiaotian Han, Hongxia Yang, Fei Wu, 8 Jan 2025, InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection, https://arxiv.org/abs/2501.04575
  • Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
  • Ben Hylak and swyx & Alessio, Jan 12, 2025, o1 isn’t a chat model (and that’s the point): How Ben Hylak turned from o1 pro skeptic to fan by overcoming his skill issue. https://www.latent.space/p/o1-skill-issue (Prompting reasoning models like "o1" is different from previous model generations.)
  • NovaSky, Jan 2025, Sky-T1: Train your own O1 preview model within $450, https://novasky-ai.github.io/posts/sky-t1/
  • Omkar Thawakar, Dinura Dissanayake, Ketan More, Ritesh Thawkar, Ahmed Heakl, Noor Ahsan, Yuhao Li, Mohammed Zumri, Jean Lahoud, Rao Muhammad Anwer, Hisham Cholakkal, Ivan Laptev, Mubarak Shah, Fahad Shahbaz Khan, Salman Khan, 10 Jan 2025, LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs, https://arxiv.org/abs/2501.06186
  • François Chollet, 25 Nov 2019 (v2), On the Measure of Intelligence, https://arxiv.org/abs/1911.01547
  • Siddharth Narayanan, James D. Braza, Ryan-Rhys Griffiths, Manu Ponnapati, Albert Bou, Jon Laurent, Ori Kabeli, Geemi Wellawatte, Sam Cox, Samuel G. Rodriques, Andrew D. White, 30 Dec 2024, Aviary: training language agents on challenging scientific tasks, https://arxiv.org/abs/2412.21154 (Using smaller models combined with multi-step reasoning to compete with big models with 100x less inference cost.)
  • Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, Wanli Ouyang, Dongzhan Zhou, 21 Nov 2024 (v2), LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning, https://arxiv.org/abs/2410.02884
  • Dan Zhang, Sining Zhoubian, Ziniu Hu, Yisong Yue, Yuxiao Dong, Jie Tang, 18 Nov 2024 (v3), ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search, https://arxiv.org/abs/2406.03816 https://github.com/THUDM/ReST-MCTS
  • Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Lionel M. Ni, Linyi Yang, Ying Wen, Weinan Zhang, 12 Oct 2024, OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models, https://arxiv.org/abs/2410.09671 https://openreasoner.github.io/
  • Yiwei Qin, Xuefeng Li, Haoyang Zou, Yixiu Liu, Shijie Xia, Zhen Huang, Yixin Ye, Weizhe Yuan, Hector Liu, Yuanzhi Li, Pengfei Liu, 8 Oct 2024, O1 Replication Journey: A Strategic Progress Report -- Part 1. https://arxiv.org/abs/2410.18982
  • Matthias Bastian, Oct 6, 2024, Study reveals major reasoning flaws in smaller AI language models, https://the-decoder.com/study-reveals-major-reasoning-flaws-in-smaller-ai-language-models/
  • Paul Sawers, January 23, 2025, Meta’s Yann LeCun predicts a ‘new AI architectures paradigm’ within 5 years and ‘decade of robotics’, https://techcrunch.com/2025/01/23/metas-yann-lecun-predicts-a-new-ai-architectures-paradigm-within-5-years-and-decade-of-robotics/
  • Latent Space, Jan 25, 2025, Why o3-mini *had* to be free: the coming DeepSeek R1, 2.0 Flash, and Sky-T1 Price War: 2025's biggest surprise so far: Reasoning is less of a moat than anyone thought. https://www.latent.space/p/reasoning-price-war
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Akash Bajwa Jan 27, 2025, The Post-R1 World: AI Economics Have Irreversibly Changed, https://akashbajwa.substack.com/p/the-post-r1-world
  • G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
  • Lan Pan, Hanbo Xie, Robert C. Wilson, 29 Jan 2025, Large Language Models Think Too Fast To Explore Effectively, https://arxiv.org/abs/2501.18009
  • Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Jan 2025, Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs, https://arxiv.org/abs/2501.18585
  • Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang, 31 Jan 2025, BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning, https://arxiv.org/abs/2501.18858
  • Sebastian Raschka, PhD, Feb 05, 2025, Understanding Reasoning LLMs: Methods and Strategies for Building and Refining Reasoning Models https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
  • Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou, 3 Feb 2025, Competitive Programming with Large Reasoning Models, https://arxiv.org/abs/2502.06807 (OpenAI's paper on o3 that has similar conclusions to what DeepSeek showed about Reinforcement Learning for reasoning models, namely that "scaling general-purpose reinforcement learning" still works.)
  • Hieu Minh "Jord" Nguyen, 10 Feb 2025, A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks, https://arxiv.org/abs/2502.06470
  • Daniel Fleischer, Moshe Berchansky, Gad Markovits, Moshe Wasserblat, 13 Feb 2025, SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models, https://arxiv.org/abs/2502.09390 https://github.com/IntelLabs/RAG-FiT/tree/square
  • Salvatore Raieli, Feb 2025, The LLMs’ Dilemma: Thinking Too Much OR Too Little? Exploring the fine line between deep reasoning and computational overkill in large language models. https://levelup.gitconnected.com/the-llms-dilemma-thinking-too-much-or-too-little-619a7532a47e
  • Ben Dickson, February 20, 2025, How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs), https://venturebeat.com/ai/how-test-time-scaling-unlocks-hidden-reasoning-abilities-in-small-language-models-and-allows-them-to-outperform-llms/
  • Ali Razghandi, Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah, 20 Feb 2025, CER: Confidence Enhanced Reasoning in LLMs, https://arxiv.org/abs/2502.14634 (Using model confidence metrics, i.e., logits, to evaluate reasoning pathways.)
  • Zhipeng Chen, Yingqian Min, Beichen Zhang, Jie Chen, Jinhao Jiang, Daixuan Cheng, Wayne Xin Zhao, Zheng Liu, Xu Miao, Yang Lu, Lei Fang, Zhongyuan Wang, Ji-Rong Wen, 6 Mar 2025, An Empirical Study on Eliciting and Improving R1-like Reasoning Models, https://arxiv.org/abs/2503.04548 https://github.com/RUCAIBox/Slow_Thinking_with_LLMs
  • Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng, 27 Mar 2025, A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond, https://arxiv.org/abs/2503.21614
  • Kenneth Payne, Baptiste Alloui-Cros, 3 Jul 2025, Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory, https://arxiv.org/abs/2507.02618
  • Bin Hong, Jiayu Liu, Zhenya Huang, Kai Zhang, Mengdi Zhang, 13 Aug 2025, Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization, https://arxiv.org/abs/2508.10164
  • Jingde Cheng, 14 Aug 2025, Why Cannot Large Language Models Ever Make True Correct Reasoning?, https://arxiv.org/abs/2508.10265
  • Chuhuai Yue, Chengqi Dong, Yinan Gao, Hang He, Jiajun Chai, Guojun Yin and Wei Lin, 14 Aug 2025, Promoting Efficient Reasoning with Verifiable Stepwise Reward, https://arxiv.org/abs/2508.10293
  • Mengtao Zhou, Sifan Wu, Huan Zhang, Qi Sima, Bang Liu, 14 Aug 2025, What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles, https://arxiv.org/abs/2508.10358
  • Runqi Qiao, Qiuna Tan, Peiqing Yang, Yanzi Wang, Xiaowan Wang, Enhui Wan, Sitong Zhou, Guanting Dong, Yuchen Zeng, Yida Xu, Jie Wang, Chong Sun, Chen Li, Honggang Zhang, 14 Aug 2025, We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning, https://arxiv.org/abs/2508.10433
  • Yushi Feng, Junye Du, Yingying Hong, Qifan Wang, Lequan Yu, 14 Aug 2025, PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning, https://arxiv.org/abs/2508.10501
  • Maël Jullien, Marco Valentino, and André Freitas, 14 Aug 2025, The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference, https://arxiv.org/abs/2508.10777
  • Zhipeng Chen, Xiaobo Qin, Youbin Wu, Yue Ling, Qinghao Ye, Wayne Xin Zhao, Guang Shi, 14 Aug 2025, Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models, https://arxiv.org/abs/2508.10751
  • Li Wang, Changhao Zhang, Zengqi Xiu, Kai Lu, Xin Yu, Kui Zhang, Wenjun Wu, 7 Aug 2025, Decoupling Understanding from Reasoning via Problem Space Mapping for Small-scale Model Reasoning, https://arxiv.org/abs/2508.10019
  • Chuan Li, Qianyi Zhao, Fengran Mo, Cen Chen, 7 Aug 2025, FedCoT: Communication-Efficient Federated Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2508.10020
  • Kai Zhao, Yanjun Zhao, Jiaming Song, Shien He, Lusheng Zhang, Qiang Zhang, Tianjiao Li, 8 Aug 2025, SABER: Switchable and Balanced Training for Efficient LLM Reasoning, https://arxiv.org/abs/2508.10026
  • Christopher Pinier, Sonia Acuña Vargas, Mariia Steeghs-Turchina, Dora Matzke, Claire E. Stevenson, Michael D. Nunez, 12 Aug 2025, Large Language Models Show Signs of Alignment with Human Neurocognition During Abstract Reasoning, https://arxiv.org/abs/2508.10057
  • Juyuan Wang, Rongchen Zhao, Wei Wei, Yufeng Wang, Mo Yu, Jie Zhou, Jin Xu, Liyan Xu, 14 Aug 2025, ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning, https://arxiv.org/abs/2508.10419
  • Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, and Xiaofeng Yang, 14 Aug 2025, Performance of GPT-5 in Brain Tumor MRI Reasoning, https://arxiv.org/abs/2508.10865
  • Xingyu Wu, Yuchen Yan, Shangke Lyu, Linjuan Wu, Yiwen Qiu, Yongliang Shen, Weiming Lu, Jian Shao, Jun Xiao, Yueting Zhuang, 14 Aug 2025, LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization, https://arxiv.org/abs/2507.15758
  • Liang Zhang, Edith Aurora Graf, 14 Aug 2025, Mathematical Computation and Reasoning Errors by Large Language Models, https://arxiv.org/abs/2508.09932
  • Atin Pothiraj, Elias Stengel-Eskin, Jaemin Cho, Mohit Bansal, 13 Aug 2025, CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting, https://arxiv.org/abs/2504.15485
  • GLM-V Team: Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Bin Chen, Boyan Shi, Changyu Pang, Chenhui Zhang, Da Yin, Fan Yang, Guoqing Chen, Jiazheng Xu, Jiale Zhu, Jiali Chen, Jing Chen, Jinhao Chen, Jinghao Lin, Jinjiang Wang, Junjie Chen, Leqi Lei, Letian Gong, Leyi Pan, Mingdao Liu, Mingde Xu, Mingzhi Zhang, Qinkai Zheng, Sheng Yang, Shi Zhong, Shiyu Huang, Shuyuan Zhao, Siyan Xue, Shangqin Tu, Shengbiao Meng, Tianshu Zhang, Tianwei Luo, Tianxiang Hao, Tianyu Tong, Wenkai Li, Wei Jia, Xiao Liu, Xiaohan Zhang, Xin Lyu, Xinyue Fan, Xuancheng Huang, Yanling Wang, Yadong Xue, Yanfeng Wang, Yanzi Wang, Yifan An, et al. (22 additional authors not shown), 14 Aug 2025, GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning, https://arxiv.org/abs/2507.01006
  • Zhangquan Chen, Ruihui Zhao, Chuwei Luo, Mingze Sun, Xinlei Yu, Yangyang Kang, Ruqi Huang, 14 Aug 2025, SIFThinker: Spatially-Aware Image Focus for Visual Reasoning, https://arxiv.org/abs/2508.06259
  • Yanhui Li, Yunkang Cao, Chengliang Liu, Yuan Xiong, Xinghui Dong, Chao Huang, 14 Aug 2025, IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection, https://arxiv.org/abs/2508.09178
  • Mo Yu, Tsz Ting Chung, Chulun Zhou, Tong Li, Rui Lu, Jiangnan Li, Liyan Xu, Haoshu Lu, Ning Zhang, Jing Li, Jie Zhou, 14 Aug 2025, PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts, https://arxiv.org/abs/2508.09848
  • Yihao Xue, Baharan Mirzasoleiman, 22 Jul 2025, LoRA is All You Need for Safety Alignment of Reasoning LLMs, https://arxiv.org/abs/2507.17075
  • Qifan Zhang, Nuo Chen, Zehua Li, Miao Peng, Jing Tang, Jia Li, 23 Jul 2025, Improving LLMs' Generalized Reasoning Abilities by Graph Problems, https://arxiv.org/abs/2507.17168
  • Luca Salvatore Lorello, Nikolaos Manginas, Marco Lippi, Stefano Melacci, 23 Jul 2025, LTLZinc: a Benchmarking Framework for Continual Learning and Neuro-Symbolic Temporal Reasoning, https://arxiv.org/abs/2507.17482
  • Yu Li, Zhuoshi Pan, Honglin Lin, Mengyuan Sun, Conghui He, Lijun Wu, 23 Jul 2025, Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning, https://arxiv.org/abs/2507.17512
  • Xinyao Liu, Diping Song, 23 Jul 2025, Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning, https://arxiv.org/abs/2507.17539
  • Zhao Song, Song Yue, Jiahao Zhang, 23 Jul 2025, Thinking Isn't an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations, https://arxiv.org/abs/2507.17699
  • Tao Xu, Dung-Yang Lee and Momiao Xiong, 21 Jul 2025, Reinforcement Learning in hyperbolic space for multi-step reasoning, https://arxiv.org/abs/2507.16864
  • Rishi Parekh, Saisubramaniam Gopalakrishnan, Zishan Ahmad, Anirudh Deodhar, 23 Jul 2025, Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance, https://arxiv.org/abs/2507.17273
  • Zhuokun Chen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, and Bohan Zhuang, 23 Jul 2025, R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning, https://arxiv.org/abs/2507.17307
  • Xuchen Li, Xuzhao Li, Shiyu Hu, Kaiqi Huang, Wentao Zhang, 22 Jul 2025, CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos, https://arxiv.org/abs/2507.16878
  • Aleksandr Perevalov, Andreas Both, 22 Jul 2025, Text-to-SPARQL Goes Beyond English: Multilingual Question Answering Over Knowledge Graphs through Human-Inspired Reasoning, https://arxiv.org/abs/2507.16971
  • Zhe Xu, Ziyi Liu, Junlin Hou, Jiabo Ma, Cheng Jin, Yihui Wang, Zhixuan Chen, Zhengyu Zhang, Zhengrui Guo, Fengtao Zhou, Yingxue Xu, Xi Wang, Ronald Cheong Kin Chan, Li Liang, Hao Chen, 23 Jul 2025, A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model, https://arxiv.org/abs/2507.17303
  • Situo Zhang, Hanqi Li, Lu Chen, Zihan Zhao, Xuanze Lin, Zichen Zhu, Bo Chen, Xin Chen, Kai Yu, 23 Jul 2025, Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning, https://arxiv.org/abs/2507.17448
  • Alexander R. Fabbri, Diego Mares, Jorge Flores, Meher Mankikar, Ernesto Hernandez, Dean Lee, Bing Liu, Chen Xing, 23 Jul 2025, MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs, https://arxiv.org/abs/2507.17476
  • Nima Fathi, Amar Kumar, Tal Arbel, 22 Jul 2025, AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation, https://arxiv.org/abs/2507.16940
  • Adrian Kaiser, Claudiu Leoveanu-Condrei, Ryan Gold, Marius-Constantin Dinu, Markus Hofmarcher, 23 Jul 2025, HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs, https://arxiv.org/abs/2507.15917
  • Wei Sun, Qianlong Du, Fuwei Cui, Jiajun Zhang, 23 Jul 2025, An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning, https://arxiv.org/abs/2503.02382
  • Lai Wei, Yuting Li, Kaipeng Zheng, Chen Wang, Yue Wang, Linghe Kong, Lichao Sun, Weiran Huang, 23 Jul 2025, Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start, https://arxiv.org/abs/2505.22334
  • Yanjun Zheng, Xiyang Du, Longfei Liao, Xiaoke Zhao, Zhaowen Zhou, Bo Zhang, Jiawei Liu, Xiang Qi, Zhe Li, Zhiqiang Zhang, Wei Wang and Peng Zhang, 23 Jul 2025, Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning, https://arxiv.org/abs/2507.16802
  • Shai Shalev-Shwartz and Amnon Shashua, 13 Jul 2025, From Reasoning to Super-Intelligence: A Search-Theoretic Perspective, https://arxiv.org/abs/2507.15865
  • Yin Wu, Daniel Slieter, Vivek Subramanian, Ahmed Abouelazm, Robin Bohn, and J. Marius Zöllner, 17 Jul 2025, Why Braking? Scenario Extraction and Reasoning Utilizing LLM, https://arxiv.org/abs/2507.15874
  • Andy E. Williams, 18 Jul 2025, The Recursive Coherence Principle: A Formal Constraint on Scalable Intelligence, Alignment, and Reasoning Architecture, https://arxiv.org/abs/2507.15880
  • Lisa Dargasz, 20 Jul 2025, Integrating Reason-Based Moral Decision-Making in the Reinforcement Learning Architecture, https://arxiv.org/abs/2507.15895
  • Shengchao Liu, Hannan Xu, Yan Ai, Huanxin Li, Yoshua Bengio, Harry Guo, 21 Jul 2025, Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization, https://arxiv.org/abs/2507.16110
  • Fred Mutisya (1 and 2), Shikoh Gitau (1), Christine Syovata (2), Diana Oigara (2), Ibrahim Matende (2), Muna Aden (2), Munira Ali (2), Ryan Nyotu (2), Diana Marion (2), Job Nyangena (2), Nasubo Ongoma (1), Keith Mbae (1), Elizabeth Wamicha (1), Eric Mibuari (1), Jean Philbert Nsengemana (3), Talkmore Chidede (4) ((1) Qhala (Nairobi, Kenya), (2) Kenya Medical Association (Nairobi, Kenya), (3) Africa CDC (Addis Ababa, Ethiopia), (4) AfCFTA (Accra, Ghana)), 22 Jul 2025, Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens, https://arxiv.org/abs/2507.16322
  • Lucas de Lara (IECL), 22 Jul 2025, Canonical Representations of Markovian Structural Causal Models: A Framework for Counterfactual Reasoning, https://arxiv.org/abs/2507.16370
  • Bo Hou, Xin Tan, Kai Zheng, Fang Liu, Yinghao Zhu, Li Zhang, 22 Jul 2025, LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning, https://arxiv.org/abs/2507.16395
  • Jean Lelong, Adnane Errazine and Annabelle Blangero, 22 Jul 2025, Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications, https://arxiv.org/abs/2507.16507
  • Xu Yang, Qi Zhang, Shuming Jiang, Yaowen Xu, Zhaofan Zou, Hao Sun, Xuelong Li, 22 Jul 2025, METER: Multi-modal Evidence-based Thinking and Explainable Reasoning -- Algorithm and Benchmark, https://arxiv.org/abs/2507.16206
  • Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Leshem Choshen, Yoon Kim, Jacob Andreas, 22 Jul 2025, Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty, https://arxiv.org/abs/2507.16806
  • Junhao Shen, Haiteng Zhao, Yuzhe Gu, Songyang Gao, Kuikun Liu, Haian Huang, Jianfei Gao, Dahua Lin, Wenwei Zhang, Kai Chen, 22 Jul 2025, Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning, https://arxiv.org/abs/2507.16814
  • Isaac Shi, Zeyuan Li, Fan Liu, Wenli Wang, Lewei He, Yang Yang, Tianyu Shi, 13 Jul 2025, eSapiens's DEREK Module: Deep Extraction & Reasoning Engine for Knowledge with LLMs, https://arxiv.org/abs/2507.15863
  • Run-Ze Fan, Zengzhi Wang, Pengfei Liu, 22 Jul 2025, MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning, https://arxiv.org/abs/2507.16812
  • Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen, Yu-Chiang Frank Wang, Fu-En Yang, 22 Jul 2025, ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning, https://arxiv.org/abs/2507.16815
  • Xiuwei Chen, Wentao Hu, Hanhui Li, Jun Zhou, Zisheng Chen, Meng Cao, Yihan Zeng, Kui Zhang, Yu-Jie Yuan, Jianhua Han, Hang Xu, Xiaodan Liang, 22 Jul 2025, C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning, https://arxiv.org/abs/2507.16518
  • Ang Li, Charles Wang, Kaiyu Yue, Zikui Cai, Ollie Liu, Deqing Fu, Peng Guo, Wang Bill Zhu, Vatsal Sharan, Robin Jia, Willie Neiswanger, Furong Huang, Tom Goldstein, Micah Goldblum, 22 Jul 2025, Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning, https://arxiv.org/abs/2507.16746
  • Edward Y. Chang, Zeyneb N. Kaya, Ethan Chang, 22 Jul 2025, The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning, https://arxiv.org/abs/2506.02139
  • Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori, 22 Jul 2025, Hierarchical Reasoning Model, https://arxiv.org/abs/2506.21734
  • Yitong Lin, Jiaying He, Jiahe Chen, Xinnan Zhu, Jianwei Zheng, Tao Bo, 22 Jul 2025, BioGraphFusion: Graph Knowledge Embedding for Biological Completion and Reasoning, https://arxiv.org/abs/2507.14468
  • Shangke Lyu, Linjuan Wu, Yuchen Yan, Xingyu Wu, Hao Li, Yongliang Shen, Peisheng Jiang, Weiming Lu, Jun Xiao, Yueting Zhuang, 22 Jul 2025, Hierarchical Budget Policy Optimization for Adaptive Reasoning, https://arxiv.org/abs/2507.15844
  • Haiteng Zhao, Chang Ma, Fangzhi Xu, Lingpeng Kong, Zhi-Hong Deng, 22 Jul 2025, BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning, https://arxiv.org/abs/2502.16660
  • Xiachong Feng, Longxu Dou, Lingpeng Kong, 22 Jul 2025, Reasoning Does Not Necessarily Improve Role-Playing Ability, https://arxiv.org/abs/2502.16940
  • Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, Toby Boyd, Brad Hekman, Aaron Parisi, Chaoyi Zhang, Kornraphop Kawintiranon, Tania Bedrax-Weiss, Oliver Wang, Ya Xu, Ollie Purkiss, Uri Mendlovic, Ilaï Deutel, Nam Nguyen, Adam Langley, Flip Korn, Lucia Rossazza, Alexandre Ramé, Sagar Waghmare, Helen Miller, Nathan Byrd, Ashrith Sheshan, Raia Hadsell, Sangnie Bhardwaj, Pawel Janus, Tero Rissa, Dan Horgan, Sharon Silver, Ayzaan Wahid, Sergey Brin, Yves Raimond, Klemen Kloboves, et al. (3255 additional authors not shown), 22 Jul 2025, Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities, https://arxiv.org/abs/2507.06261
  • SaiBarath Sundar, Pranav Satheesan, Udayaadithya Avadhanam, 23 Jul 2025, I2I-STRADA -- Information to Insights via Structured Reasoning Agent for Data Analysis, https://arxiv.org/abs/2507.17874
  • Mutian Yang, Jiandong Gao, and Ji Wu, 24 Jul 2025, Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory, https://arxiv.org/abs/2507.18178
  • Zhuang Qiang Bok and Watson Wei Khong Chua, 24 Jul 2025, Reasoning Beyond the Obvious: Evaluating Divergent and Convergent Thinking in LLMs for Financial Scenarios, https://arxiv.org/abs/2507.18368
  • Shiye Lei, Zhihao Cheng, Kai Jia, Dacheng Tao, 24 Jul 2025, Revisiting LLM Reasoning via Information Bottleneck, https://arxiv.org/abs/2507.18391
  • Xiaoxu Guo, Siyan Liang, Yachao Cui, Juxiang Zhou, Lei Wang, Han Cao, 21 Jul 2025, Multimodal Fine-grained Reasoning for Post Quality Evaluation, https://arxiv.org/abs/2507.17934
  • Datta Nimmaturi, Vaishnavi Bhargava, Rajat Ghosh, Johnu George, Debojyoti Dutta, 24 Jul 2025, Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models, https://arxiv.org/abs/2507.18014
  • Matthias Otth, Jonas Hübotter, Ido Hakimi, Andreas Krause, 24 Jul 2025, Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning, https://arxiv.org/abs/2507.18122
  • Dongyang Guo, Yasmeen Abdrabou, Enkeleda Thaqi, Enkelejda Kasneci, 24 Jul 2025, Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning, https://arxiv.org/abs/2507.18252
  • Rana Alshaikh, Israa Alghanmi, Shelan Jeawak, 24 Jul 2025, AraTable: Benchmarking LLMs' Reasoning and Understanding of Arabic Tabular Data, https://arxiv.org/abs/2507.18442
  • Busra Icoz, Goksel Biricik, 24 Jul 2025, Automated Code Review Using Large Language Models with Symbolic Reasoning, https://arxiv.org/abs/2507.18476
  • Andres M Bran, Theo A Neukomm, Daniel P Armstrong, Zlatko Jončev, Philippe Schwaller, 23 Jul 2025, Chemical reasoning in LLMs unlocks strategy-aware synthesis planning and reaction mechanism elucidation, https://arxiv.org/abs/2503.08537
  • Bowen Zhang, Pengcheng Luo, 24 Jul 2025, OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problems with Reasoning LLM, https://arxiv.org/abs/2503.10009
  • David Guzman Piedrahita, Yongjin Yang, Mrinmaya Sachan, Giorgia Ramponi, Bernhard Schölkopf, Zhijing Jin, 24 Jul 2025, Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games, https://arxiv.org/abs/2506.23276
  • Martina Miliani, Serena Auriemma, Alessandro Bondielli, Emmanuele Chersoni, Lucia Passaro, Irene Sucameli, Alessandro Lenci, 24 Jul 2025, ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models, https://arxiv.org/abs/2502.15487
  • Junming Liu, Siyuan Meng, Yanting Gao, Song Mao, Pinlong Cai, Guohang Yan, Yirong Chen, Zilin Bian, Ding Wang, Botian Shi, 24 Jul 2025, Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning, https://arxiv.org/abs/2503.12972
  • Renato Ghisellini, Remo Pareschi, Marco Pedroni, Giovanni Battista Raggi, 18 Jul 2025, From Extraction to Synthesis: Entangled Heuristics for Agent-Augmented Strategic Reasoning, https://arxiv.org/abs/2507.13768
  • Shmuel Berman, Jia Deng, 4 Jul 2025, VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs, https://arxiv.org/abs/2507.13361
  • Binbin Ji, Siddharth Agrawal, Qiance Tang, and Yvonne Wu, 6 Jul 2025, Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning, https://arxiv.org/abs/2507.13362
  • Peican Zhu, Yubo Jing, Le Cheng, Bin Chen, Xiaodong Cui, Lianwei Wu, Keke Tang, 17 Jul 2025, SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection, https://arxiv.org/abs/2507.13415
  • Ishant Chintapatla, Kazuma Choji, Naaisha Agarwal, Andrew Lin, Hannah You, Charles Duong, Kevin Zhu, Sean O'Brien, Vasu Sharma, 17 Jul 2025, COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark, https://arxiv.org/abs/2507.13405
  • Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 18 Jul 2025, Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567
  • Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar, 18 Jul 2025, The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, https://arxiv.org/abs/2506.06941
  • Zhiting Mei, Christina Zhang, Tenny Yin, Justin Lidard, Ola Shorinwa, Anirudha Majumdar, 18 Jul 2025, Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?, https://arxiv.org/abs/2506.18183
  • Ahmed Bahloul, Simon Malberg, 18 Jul 2025, From Roots to Rewards: Dynamic Tree Reasoning with RL, https://arxiv.org/abs/2507.13142
  • Thomas Foster, Anya Sims, Johannes Forkel, Mattie Fellows, Jakob Foerster, 18 Jul 2025, Learning to Reason at the Frontier of Learnability, https://arxiv.org/abs/2502.12272
  • Constantin Venhoff, Iván Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda, 17 Jul 2025, Understanding Reasoning in Thinking Language Models via Steering Vectors, https://arxiv.org/abs/2506.18167
  • Jiayu Song, Mahmud Elahi Akhter, Dana Atzil Slonim, Maria Liakata, 18 Jul 2025, Temporal reasoning for timeline summarisation in social media, https://arxiv.org/abs/2501.00152
  • Z.Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z.F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, Chong Ruan, 18 Jul 2025, DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition, https://arxiv.org/abs/2504.21801
  • Bingqian Lin, Yunshuang Nie, Khun Loun Zai, Ziming Wei, Mingfei Han, Rongtao Xu, Minzhe Niu, Jianhua Han, Liang Lin, Cewu Lu, Xiaodan Liang, 18 Jul 2025, EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation, https://arxiv.org/abs/2506.01551
  • Humza Sami, Mubashir ul Islam, Pierre-Emmanuel Gaillardon, Valerio Tenace, 18 Jul 2025, Adaptive Multi-Agent Reasoning via Automated Workflow Generation, https://arxiv.org/abs/2507.14393
  • Michael J. Zellinger and Matt Thomson, 18 Jul 2025, Fail Fast, or Ask: Mitigating the Deficiencies of Reasoning LLMs with Human-in-the-Loop Systems Engineering, https://arxiv.org/abs/2507.14406
  • Cole Robertson and Philip Wolff, 21 Jul 2025, LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning, https://arxiv.org/abs/2507.15521
  • Licheng Liu, Zihan Wang, Linjie Li, Chenwei Xu, Yiping Lu, Han Liu, Avirup Sil, Manling Li, 18 Jul 2025, A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning, https://arxiv.org/abs/2507.14295
  • Ole-Christoffer Granmo, Youmna Abdelwahab, Per-Arne Andersen, Paul F. A. Clarke, Kunal Dumbre, Ylva Grønninsæter, Vojtech Halenka, Runar Helin, Lei Jiao, Ahmed Khalid, Rebekka Omslandseter, Rupsa Saha, Mayur Shende, Xuan Zhang, 20 Jul 2025, The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs, https://arxiv.org/abs/2507.14874
  • Yiyuan Yang, Zichuan Liu, Lei Song, Kai Ying, Zhiguang Wang, Tom Bamford, Svitlana Vyetrenko, Jiang Bian, Qingsong Wen, 20 Jul 2025, Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback, https://arxiv.org/abs/2507.15066
  • Jiaao Li, Kaiyuan Li, Chen Gao, Yong Li, Xinlei Chen, 21 Jul 2025, EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent, https://arxiv.org/abs/2507.15428
  • Seok Hwan Song, Mohna Chakraborty, Qi Li, Wallapak Tavanapong, 21 Jul 2025, Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?, https://arxiv.org/abs/2507.15707
  • Sahana Srinivasan, Xuguang Ai, Thaddaeus Wai Soon Lo, Aidan Gilson, Minjie Zou, Ke Zou, Hyunjae Kim, Mingjia Yang, Krithi Pushpanathan, Samantha Yew, Wan Ting Loke, Jocelyn Goh, Yibing Chen, Yiming Kong, Emily Yuelei Fu, Michelle Ongyong Hui, Kristen Nwanyanwu, Amisha Dave, Kelvin Zhenghao Li, Chen-Hsin Sun, Mark Chia, Gabriel Dawei Yang, Wendy Meihua Wong, David Ziyou Chen, Dianbo Liu, Maxwell Singer, Fares Antaki, Lucian V Del Priore, Jost Jonas, Ron Adelman, Qingyu Chen, Yih-Chung Tham, 21 Jul 2025, BEnchmarking LLMs for Ophthalmology (BELO) for Ophthalmological Knowledge and Reasoning, https://arxiv.org/abs/2507.15717
  • Yihao Li, Jiayi Xin, Miranda Muqing Miao, Qi Long, Lyle Ungar, 21 Jul 2025, The Impact of Language Mixing on Bilingual LLM Reasoning, https://arxiv.org/abs/2507.15849
  • Fengxiang Cheng, Haoxuan Li, Fenrong Liu, Robert van Rooij, Kun Zhang, Zhouchen Lin, 21 Jul 2025, Empowering LLMs with Logical Reasoning: A Comprehensive Survey, https://arxiv.org/abs/2502.15652
  • Shaohang Wei, Wei Li, Feifan Song, Wen Luo, Tianyi Zhuang, Haochen Tan, Zhijiang Guo, Houfeng Wang, 19 Jul 2025, TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios, https://arxiv.org/abs/2505.12891
  • Kun Xiang, Heng Li, Terry Jingchen Zhang, Yinya Huang, Zirong Liu, Peixin Qu, Jixi He, Jiaqi Chen, Yu-Jie Yuan, Jianhua Han, Hang Xu, Hanhui Li, Mrinmaya Sachan, Xiaodan Liang, 21 Jul 2025, SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning, https://arxiv.org/abs/2505.19099
  • Xin Wang, Jiyao Liu, Yulong Xiao, Junzhi Ning, Lihao Liu, Junjun He, Botian Shi, Kaicheng Yu, 21 Jul 2025, THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?, https://arxiv.org/abs/2506.21763
  • Konstantinos I. Roumeliotis, Ranjan Sapkota, Manoj Karkee, Nikolaos D. Tselikas, 18 Jul 2025, Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning, https://arxiv.org/abs/2507.10571
  • Michal Spiegel, Michal Štefánik, Marek Kadlčík, Josef Kuchař, 21 Jul 2025, Attend or Perish: Benchmarking Attention in Algorithmic Reasoning, https://arxiv.org/abs/2503.01909
  • Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou, 21 Jul 2025, Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning, https://arxiv.org/abs/2505.16122
  • Justin Chih-Yao Chen, Sukwon Yun, Elias Stengel-Eskin, Tianlong Chen, Mohit Bansal, 18 Jul 2025, Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning, https://arxiv.org/abs/2503.05641
  • Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, Jiawei Han, 21 Jul 2025, Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, https://arxiv.org/abs/2503.09516
  • Bo-Cheng Chiu, Jen-Jee Chen, Yu-Chee Tseng and Feng-Chi Chen, 21 Jul 2025, DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs, https://arxiv.org/abs/2506.11558
  • Yiming Yang, Yueru Luo, Bingkun He, Hongbin Lin, Suzhong Fu, Chao Zheng, Zhipeng Cao, Erlong Li, Chao Yan, Shuguang Cui, Zhen Li, 20 Jul 2025, TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving, https://arxiv.org/abs/2507.00709
  • Anirudh Choudhary, Mosbah Aouad, Krishnakant Saboo, Angelina Hwang, Jacob Kechter, Blake Bordeaux, Puneet Bhullar, David DiCaudo, Steven Nelson, Nneka Comfere, Emma Johnson, Olayemi Sokumbi, Jason Sluzevich, Leah Swanson, Dennis Murphree, Aaron Mangold, Ravishankar Iyer, 19 Jul 2025, RACR-MIL: Rank-aware contextual reasoning for weakly supervised grading of squamous cell carcinoma using whole slide images, https://arxiv.org/abs/2308.15618
  • Blair Johnson, Clayton Kerce, Faramarz Fekri, 8 Aug 2025, GLIDR: Graph-Like Inductive Logic Programming with Differentiable Reasoning, https://arxiv.org/abs/2508.06716
  • Amit Dhanda, 10 Aug 2025, Multi-Dimensional Summarization Agents with Context-Aware Reasoning over Enterprise Tables, https://arxiv.org/abs/2508.07186
  • Yi Tang, Kaini Wang, Yang Chen, Guangquan Zhou, 10 Aug 2025, EndoAgent: A Memory-Guided Reflective Agent for Intelligent Endoscopic Vision-to-Decision Reasoning, https://arxiv.org/abs/2508.07292
  • He Kong, Die Hu, Jingguo Ge, Liangxiong Li, Hui Li and Tong Li, 10 Aug 2025, Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning, https://arxiv.org/abs/2508.07382
  • Wenkai Li, Liwen Sun, Zhenxiang Guan, Xuhui Zhou, Maarten Sap, 11 Aug 2025, 1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning, https://arxiv.org/abs/2508.07667
  • Zhenpeng Su, Leiyu Pan, Xue Bai, Dening Liu, Guanting Dong, Jiaming Huang, Wenping Hu, Guorui Zhou, 11 Aug 2025, Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization, https://arxiv.org/abs/2508.07629
  • Zihe Liu, Jiashun Liu, Yancheng He, Weixun Wang, Jiaheng Liu, Ling Pan, Xinyu Hu, Shaopan Xiong, Ju Huang, Jian Hu, Shengyi Huang, Siran Yang, Jiamang Wang, Wenbo Su, Bo Zheng, 11 Aug 2025, Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning, https://arxiv.org/abs/2508.08221
  • Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi, 11 Aug 2025, Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent, https://arxiv.org/abs/2508.08222
  • Logan Cross, Erik Brockbank, Tobias Gerstenberg, Judith E. Fan, Daniel L. K. Yamins, Nick Haber, 25 Jul 2025, Understanding Human Limits in Pattern Recognition: A Computational Model of Sequential Reasoning in Rock, Paper, Scissors, https://arxiv.org/abs/2508.06503
  • Wenhan Liu, Xinyu Ma, Weiwei Sun, Yutao Zhu, Yuchen Li, Dawei Yin, Zhicheng Dou, 9 Aug 2025, ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability, https://arxiv.org/abs/2508.07050
  • Lijie Yang, Zhihao Zhang, Arti Jain, Shijie Cao, Baihong Yuan, Yiwei Chen, Zhihao Jia, Ravi Netravali, 9 Aug 2025, Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning, https://arxiv.org/abs/2508.07101
  • Fabio Vitali, 10 Aug 2025, From Knowledge to Conjectures: A Modal Framework for Reasoning about Hypotheses, https://arxiv.org/abs/2508.07304
  • Anirudh Iyengar Kaniyar Narayana Iyengar, Srija Mukhopadhyay, Adnan Qidwai, Shubhankar Singh, Dan Roth, Vivek Gupta, 11 Aug 2025, InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information, https://arxiv.org/abs/2508.07630
  • Chaohong Guo, Xun Mo, Yongwei Nie, Xuemiao Xu, Chao Xu, Fei Yu, and Chengjiang Long, 11 Aug 2025, TAR-TVG: Enhancing VLMs with Timestamp Anchor-Constrained Reasoning for Temporal Video Grounding, https://arxiv.org/abs/2508.07683
  • Shoaib Ahmmad, Zubayer Ahmed Aditto, Md Mehrab Hossain, Noushin Yeasmin, Shorower Hossain, 11 Aug 2025, Autonomous Navigation of Cloud-Controlled Quadcopters in Confined Spaces Using Multi-Modal Perception and LLM-Driven High Semantic Reasoning, https://arxiv.org/abs/2508.07885
  • Meixiu Long, Duolin Sun, Dan Yang, Junjie Wang, Yue Shen, Jian Wang, Peng Wei, Jinjie Gu, Jiahai Wang, 11 Aug 2025, DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval, https://arxiv.org/abs/2508.07995
  • Zhonghao Yan, Muxi Diao, Yuxuan Yang, Jiayuan Xu, Kaizhou Zhang, Ruoyan Jing, Lele Yang, Yanxi Liu, Kongming Liang and Zhanyu Ma, 11 Aug 2025, MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision, https://arxiv.org/abs/2508.08177
  • Shansong Wang and Mingzhe Hu and Qiang Li and Mojtaba Safari and Xiaofeng Yang, 11 Aug 2025, Capabilities of GPT-5 on Multimodal Medical Reasoning, https://arxiv.org/abs/2508.08224
  • Yuan Sui, Yufei He, Tri Cao, Simeng Han, Yulin Chen, Bryan Hooi, 8 Aug 2025, Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models, https://arxiv.org/abs/2502.19918
  • Alberto Pozanco, Marianela Morales, Daniel Borrajo, Manuela Veloso, 11 Aug 2025, A Planning Compilation to Reason about Goal Achievement at Planning Time, https://arxiv.org/abs/2503.09545
  • Annie Wong, Thomas Bäck, Aske Plaat, Niki van Stein and Anna V. Kononova, 10 Aug 2025, Reasoning Capabilities of Large Language Models on Dynamic Tasks, https://arxiv.org/abs/2505.10543
  • Lucia Cipolina-Kun, Marianna Nezhurina, Jenia Jitsev, 10 Aug 2025, Game Reasoning Arena: A Framework and Benchmark for Assessing Reasoning Capabilities of Large Language Models via Game Play, https://arxiv.org/abs/2508.03368
  • Yiye Chen, Harpreet Sawhney, Nicholas Gydé, Yanan Jian, Jack Saunders, Patricio Vela, Ben Lundell, 8 Aug 2025, Schema-Guided Scene-Graph Reasoning based on Multi-Agent Large Language Model System, https://arxiv.org/abs/2502.03450
  • Rong Cheng, Jinyi Liu, Yan Zheng, Fei Ni, Jiazhen Du, Hangyu Mao, Fuzheng Zhang, Bo Wang, Jianye Hao, 9 Aug 2025, DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering, https://arxiv.org/abs/2504.18243
  • Xuandong Zhao, Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song, 11 Aug 2025, Learning to Reason without External Rewards, https://arxiv.org/abs/2505.19590
  • Yunpeng Gao, Zhigang Wang, Pengfei Han, Linglin Jing, Dong Wang, Bin Zhao, 11 Aug 2025, Exploring Spatial Representation to Enhance LLM Reasoning in Aerial Vision-Language Navigation, https://arxiv.org/abs/2410.08500
  • Seyed Pouyan Mousavi Davoudi, Amin Gholami Davodi, Alireza Amiri-Margavi, Alireza Shafiee Fard, Mahdi Jafari, 9 Aug 2025, Collective Reasoning Among LLMs: A Framework for Answer Validation Without Ground Truth, https://arxiv.org/abs/2502.20758
  • Lo Pang-Yun Ting, Chengshuai Zhao, Yu-Hua Zeng, Yuan Jee Lim, Kun-Ta Chuang, Huan Liu, 9 Aug 2025, Leaps Beyond the Seen: Reinforced Reasoning Augmented Generation for Clinical Notes, https://arxiv.org/abs/2506.05386
  • Mihir Godbole, Xiangbo Gao, Zhengzhong Tu, 9 Aug 2025, DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving, https://arxiv.org/abs/2506.17590
  • Minghao Guo, Xi Zhu, Jingyuan Huang, Kai Mei, Yongfeng Zhang, 11 Aug 2025, ReaGAN: Node-as-Agent-Reasoning Graph Agentic Network, https://arxiv.org/abs/2508.00429
  • Andrew Kiruluta, 18 Jul 2025, Wavelet Logic Machines: Learning and Reasoning in the Spectral Domain Without Neural Networks, https://arxiv.org/abs/2507.19514
  • Aditya Sharma, Linh Nguyen, Ananya Gupta, Chengyu Wang, Chiamaka Adebayo, and Jakub Kowalski, 26 Jul 2025, Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning, https://arxiv.org/abs/2507.19855
  • Enjun Du, Siyi Liu, Yongqi Zhang, 28 Jul 2025, Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning, https://arxiv.org/abs/2507.20498
  • Ansh Poonia, Maeghal Jain, 28 Jul 2025, Dissecting Persona-Driven Reasoning in Language Models via Activation Patching, https://arxiv.org/abs/2507.20936
  • Dong Du, Shulin Liu, Tao Yang, Shaohua Chen, Yang Li, 26 Jul 2025, UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities, https://arxiv.org/abs/2507.19766
  • Eunkyu Park, Wesley Hanwen Deng, Gunhee Kim, Motahhare Eslami, Maarten Sap, 27 Jul 2025, Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations, https://arxiv.org/abs/2507.20409
  • Xun Liang, Xin Guo, Zhongming Jin, Weihang Pan, Penghui Shang, Deng Cai, Binbin Lin, Jieping Ye, 28 Jul 2025, Enhancing Spatial Reasoning through Visual and Textual Thinking, https://arxiv.org/abs/2507.20529
  • Adrien Bazoge, 28 Jul 2025, MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation, https://arxiv.org/abs/2507.20917
  • Aleksandar Pavlovic, Emanuel Sallinger, Steven Schockaert, 26 Jul 2025, Faithful Differentiable Reasoning with Reshuffled Region-based Embeddings, https://arxiv.org/abs/2406.09529
  • Zheng Zhang, Nuoqian Xiao, Qi Chai, Deheng Ye, Hao Wang, 28 Jul 2025, MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind, https://arxiv.org/abs/2504.18039
  • Kun Li, Zhennan Wu, Shoupeng Wang, Jia Wu, Shirui Pan and Wenbin Hu, 28 Jul 2025, DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery, https://arxiv.org/abs/2505.13940
  • Yun Qu, Qi Wang, Yixiu Mao, Vincent Tao Hu, Björn Ommer, Xiangyang Ji, 28 Jul 2025, Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?, https://arxiv.org/abs/2507.04632
  • Xuzhao Li and Xuchen Li and Shiyu Hu and Yongzhen Guo and Wentao Zhang, 26 Jul 2025, VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains, https://arxiv.org/abs/2507.09884
  • Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Yue Zhao, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji, 26 Jul 2025, MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning, https://arxiv.org/abs/2409.12059
  • Shamus Sim and Tyrone Chen, 28 Jul 2025, Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models, https://arxiv.org/abs/2412.15748
  • Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, Pengfei Liu, 28 Jul 2025, LIMO: Less is More for Reasoning, https://arxiv.org/abs/2502.03387
  • Runyu Jiao, Alice Fasoli, Francesco Giuliari, Matteo Bortolon, Sergio Povoli, Guofeng Mei, Yiming Wang, Fabio Poiesi, 28 Jul 2025, Free-form language-based robotic reasoning and grasping, https://arxiv.org/abs/2503.13082
  • Xiangning Yu, Zhuohan Wang, Linyi Yang, Haoxuan Li, Anjie Liu, Xiao Xue, Jun Wang, Mengyue Yang, 26 Jul 2025, Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning, https://arxiv.org/abs/2506.09853
  • Yifu Han and Geo Zhang, 27 Jul 2025, Reinforcement learning fine-tuning of language model for instruction following and math reasoning, https://arxiv.org/abs/2506.21560
  • Shenghe Zheng, Qianjia Cheng, Junchi Yao, Mengsong Wu, Haonan He, Ning Ding, Yu Cheng, Shuyue Hu, Lei Bai, Dongzhan Zhou, Ganqu Cui, Peng Ye, 28 Jul 2025, Scaling Physical Reasoning with the PHYSICS Dataset, https://arxiv.org/abs/2506.00022
  • Khanh Son Pham, Christian Witte, Jens Behley, Johannes Betz, Cyrill Stachniss, 28 Jul 2025, Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps, https://arxiv.org/abs/2507.01397

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications: new book on planning and building AI projects:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: