Aussie AI

Hybrid Reasoning Models

  • Last Updated 29 August, 2025
  • by David Spuler, Ph.D.

What are Hybrid Reasoning Models?

Hybrid reasoning models are LLMs that combine fast single-step reasoning with slower, inference-based multi-step reasoning. For example, a Large Reasoning Model (LRM) may be trained to answer simpler queries in only a single step. Conversely, a smaller, less powerful reasoning model may be improved by applying multi-step inference-based reasoning, known as "test-time compute," to harder queries. Read more about reasoning model techniques.
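
As a rough illustration of the routing idea, the sketch below shows how a system might choose between a fast single-step path and a slow multi-step path based on an estimate of query difficulty. This is a minimal Python sketch under stated assumptions: generate() and estimate_difficulty() are hypothetical placeholders for an LLM call and a difficulty heuristic, not any particular model's or vendor's API.

    # Hypothetical sketch of hybrid fast/slow reasoning routing.

    def estimate_difficulty(query: str) -> float:
        # Hypothetical heuristic: longer or proof-style queries score higher.
        score = min(len(query.split()) / 100.0, 1.0)
        if any(w in query.lower() for w in ("prove", "step by step", "why")):
            score += 0.5
        return score

    def generate(prompt: str, max_tokens: int = 256) -> str:
        # Placeholder for a single LLM inference call (not a real API).
        raise NotImplementedError("wire this up to your model or API")

    def hybrid_answer(query: str, threshold: float = 0.5) -> str:
        if estimate_difficulty(query) < threshold:
            # Fast path: one-shot answer with no explicit reasoning trace.
            return generate(query)
        # Slow path ("test-time compute"): draft intermediate reasoning
        # steps first, then answer conditioned on those steps.
        steps = generate("Think step by step about: " + query, max_tokens=1024)
        return generate(query + "\n\nReasoning:\n" + steps + "\n\nFinal answer:")

Note that a true hybrid reasoning model makes this fast-versus-slow decision internally, via training or decoding strategy, rather than through an external heuristic router like the one sketched here.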

Research on Hybrid Reasoning Models

Research papers and industry articles include:

  • Maxwell Zeff, February 24, 2025, Anthropic launches a new AI model that ‘thinks’ as long as you want, https://techcrunch.com/2025/02/24/anthropic-launches-a-new-ai-model-that-thinks-as-long-as-you-want/
  • Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui, 24 Nov 2023 (v4), DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking, https://arxiv.org/abs/2310.18075
  • Daniele Paliotta, Junxiong Wang, Matteo Pagliardini, Kevin Y. Li, Aviv Bick, J. Zico Kolter, Albert Gu, François Fleuret, Tri Dao, 27 Feb 2025, Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners, https://arxiv.org/abs/2502.20339
  • Jianyuan Zhong, Zeju Li, Zhijian Xu, Xiangyu Wen, Qiang Xu, 16 Feb 2025, Dyve: Thinking Fast and Slow for Dynamic Process Verification, https://arxiv.org/abs/2502.11157
  • Kangan Qian, Zhikun Ma, Yangfan He, Ziang Luo, Tianyu Shi, Tianze Zhu, Jiayin Li, Jianhui Wang, Ziyu Chen, Xiao He, Yining Shi, Zheng Fu, Xinyu Jiao, Kun Jiang, Diange Yang, Takafumi Matsumaru, 27 Nov 2024, FASIONAD : FAst and Slow FusION Thinking Systems for Human-Like Autonomous Driving with Adaptive Feedback, https://arxiv.org/abs/2411.18013
  • DiJia Su, Sainbayar Sukhbaatar, Michael Rabbat, Yuandong Tian, Qinqing Zheng, 13 Oct 2024, Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces, https://arxiv.org/abs/2410.09918
  • Konstantina Christakopoulou, Shibl Mourad, Maja Matarić, 10 Oct 2024, Agents Thinking Fast and Slow: A Talker-Reasoner Architecture, https://arxiv.org/abs/2410.08328
  • Pengbo Hu, Ji Qi, Xingyu Li, Hong Li, Xinqi Wang, Bing Quan, Ruiyu Wang, Yi Zhou, 21 Aug 2023 (v2), Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning, https://arxiv.org/abs/2308.09658
  • Thilo Hagendorff, Sarah Fabi, Michal Kosinski, 2 Aug 2023 (v2), Thinking Fast and Slow in Large Language Models, https://arxiv.org/abs/2212.05206
  • Wenlin Yao, Haitao Mi, Dong Yu, 25 Sep 2024, HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows, https://arxiv.org/abs/2409.17433
  • Kyle Wiggers, March 4, 2025, Amazon is reportedly developing its own AI ‘reasoning’ model: Amazon reportedly wants to get in on the AI “reasoning” model game, https://techcrunch.com/2025/03/04/amazon-is-reportedly-developing-its-own-ai-reasoning-model/
  • X Zhang, F Zhang, C Du, C Du, T Pang, W Gao, M Lin, Mar 2025, LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation, https://openreview.net/pdf?id=DfgfGTfObm
  • Supreeth Koundinya, March 10, 2025, Manus is a Wrapper of Anthropic’s Claude, and It’s Okay, https://analyticsindiamag.com/ai-features/manus-is-a-wrapper-of-anthropics-claude-and-its-okay/ (“Manus didn’t just slap an API on a model. They built an autonomous system that can execute deep research, deep thinking, and multi-step tasks in a way that no other AI have.”)
  • Sean Michael Kerner, March 18, 2025, Nvidia debuts Llama Nemotron open reasoning models in a bid to advance agentic AI, https://venturebeat.com/ai/nvidia-debuts-llama-nemotron-open-reasoning-models-in-a-bid-to-advance-agentic-ai/
  • Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng, 27 Mar 2025, A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond, https://arxiv.org/abs/2503.21614
  • Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates, 2 Jul 2025, Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs, https://arxiv.org/abs/2507.02076
  • Michael Nuñez, August 4, 2025, ChatGPT rockets to 700M weekly users ahead of GPT-5 launch with reasoning superpowers, https://venturebeat.com/ai/chatgpt-rockets-to-700m-weekly-users-ahead-of-gpt-5-launch-with-reasoning-superpowers/
  • Adrian Kaiser and Claudiu Leoveanu-Condrei and Ryan Gold and Marius-Constantin Dinu and Markus Hofmarcher, 23 Jul 2025, HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs, https://arxiv.org/abs/2507.15917
  • Varun Bharti, Shashwat Jha, Dhruv Kumar, Pankaj Jalote, 1 Aug 2025, Loop Invariant Generation: A Hybrid Framework of Reasoning optimised LLMs and SMT Solvers, https://arxiv.org/abs/2508.00419
  • NVIDIA: Aarti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, et al. (200+ authors), 20 Aug 2025, NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model, https://arxiv.org/abs/2508.14444

Reasoning and CoT Efficiency Topics

Blog articles on reasoning efficiency, further research information on general efficiency optimization techniques for reasoning models, and specific efficiency optimizations to Chain-of-Thought are covered elsewhere on this site. A sketch of one such optimization appears below.
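
As one simple example of controllable test-time compute, the sketch below caps the number of tokens spent on the chain-of-thought phase before forcing a short final-answer pass. This is a hypothetical Python sketch: generate() is a placeholder for an LLM call that honors a max_tokens limit and a stop sequence, and the budget policy shown is an illustrative assumption, not a specific published method.

    # Hypothetical sketch: capping Chain-of-Thought length with a token budget.

    def generate(prompt: str, max_tokens: int, stop: str = "") -> str:
        # Placeholder for an LLM call honoring a token cap and stop string.
        raise NotImplementedError("wire this up to your model or API")

    def budgeted_cot(query: str, budget_tokens: int = 512) -> str:
        # Spend at most budget_tokens on intermediate reasoning...
        reasoning = generate(
            "Reason concisely, then stop.\nQuestion: " + query + "\nReasoning:",
            max_tokens=budget_tokens,
            stop="Final answer:",
        )
        # ...then produce the answer itself in a short final pass.
        return generate(
            "Question: " + query + "\nReasoning: " + reasoning + "\nFinal answer:",
            max_tokens=128,
        )

Tuning budget_tokens trades answer quality against latency and cost, which is the central theme of the "reasoning on a budget" survey cited above.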

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about related AI research topics on the Aussie AI site.