Aussie AI
Small Reasoning Models
Last Updated 23 May, 2025
by David Spuler, Ph.D.
What are Small Reasoning Models?
Small reasoning models combine reasoning techniques with small language models. Large reasoning models are expensive to run, and the goal is to reduce that cost by using a smaller model, accepting some loss of accuracy. Small models can be used with two types of reasoning methods: single-step reasoning or multi-step inference-based reasoning.
There are two basic approaches to creating a Small Reasoning Model (SRM):
- Start with a Large Reasoning Model (LRM) and reduce its size, or
- Start with a small model and increase its reasoning capabilities.
Cutting down a Large Reasoning Model to a smaller one may involve:
- Model compression (e.g., quantization); see the sketch below.
- Distillation focused on reasoning knowledge.
In the case of open-source Large Reasoning Models (e.g., DeepSeek R1), smaller versions have already been released, especially quantized ones.
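As an illustration of the model compression path, below is a minimal sketch of symmetric 8-bit post-training quantization of a single weight matrix. It assumes only NumPy; the quantize_weights and dequantize_weights helper names are hypothetical, and a real small reasoning model would normally be produced with an established quantization toolkit using per-channel or per-group scales rather than one scale per tensor.

    # Minimal sketch: symmetric int8 post-training quantization of one weight matrix.
    # Hypothetical helper functions for illustration; real SRM pipelines use
    # dedicated quantization toolkits with per-channel or per-group scaling.
    import numpy as np

    def quantize_weights(w: np.ndarray) -> tuple[np.ndarray, float]:
        """Map float weights to int8 using a single symmetric scale factor."""
        scale = np.max(np.abs(w)) / 127.0  # largest magnitude maps to +/-127
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize_weights(q: np.ndarray, scale: float) -> np.ndarray:
        """Recover approximate float weights for use at inference time."""
        return q.astype(np.float32) * scale

    # Example: quantize one random "layer" and measure the reconstruction error.
    w = np.random.randn(512, 512).astype(np.float32)
    q, scale = quantize_weights(w)
    w_hat = dequantize_weights(q, scale)
    print("mean absolute error:", np.mean(np.abs(w - w_hat)))

The accuracy loss from a single per-tensor scale like this is larger than with finer-grained scaling, which is one reason released quantized reasoning models typically use per-channel or per-group schemes.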
Adding reasoning capabilities to a small model is particularly interesting to the open-source model community. There are many very capable small models of different sizes, but few are specifically focused on reasoning. Some ways to go about it include:
- Multi-step CoT algorithms wrapped around a smaller base model (see the sketch after this list).
- Improved training and fine-tuning of single-step reasoning techniques to enhance a small model.
- A combination of both approaches.
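One common way to wrap a multi-step CoT algorithm around a smaller base model is self-consistency: sample several chains of thought and take a majority vote over the final answers. The sketch below shows only the wrapper logic; generate() is a hypothetical stand-in for a call to whatever small open-source model is being used, not a specific library API.

    # Sketch: self-consistency voting wrapped around a small base model.
    # generate() is a hypothetical stand-in for a small-model inference call
    # (e.g., a request to a local inference server); replace it with a real one.
    from collections import Counter

    def generate(prompt: str, temperature: float = 0.8) -> str:
        raise NotImplementedError("plug in a small LLM here")

    def extract_answer(completion: str) -> str:
        """Naive final-answer extraction: take the text after the last 'Answer:'."""
        return completion.rsplit("Answer:", 1)[-1].strip()

    def self_consistency(question: str, num_samples: int = 5) -> str:
        """Sample several chains of thought and majority-vote the final answers."""
        prompt = ("Solve the problem step by step, then give the final result "
                  "on a new line starting with 'Answer:'.\n\nProblem: " + question)
        answers = [extract_answer(generate(prompt, temperature=0.8))
                   for _ in range(num_samples)]
        return Counter(answers).most_common(1)[0][0]

Sampling multiple chains multiplies inference cost, which is why the CoT efficiency optimizations listed further below matter for keeping small reasoning models cheap to run.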
Research on Small Reasoning Models
Research papers include:
- Matthias Bastian, Oct 6, 2024, Study reveals major reasoning flaws in smaller AI language models, https://the-decoder.com/study-reveals-major-reasoning-flaws-in-smaller-ai-language-models/
- Shuyang Jiang, Yusheng Liao, Zhe Chen, Ya Zhang, Yanfeng Wang, Yu Wang, 21 Jan 2025, MedS3: Towards Medical Small Language Models with Self-Evolved Slow Thinking, https://arxiv.org/abs/2501.12051 https://github.com/pixas/medsss
- Maxwell Zeff, February 5, 2025, Researchers created an open rival to OpenAI’s o1 ‘reasoning’ model for under $50, https://techcrunch.com/2025/02/05/researchers-created-an-open-rival-to-openais-o1-reasoning-model-for-under-50/
- Kyle Wiggers, January 11, 2025, Researchers open source Sky-T1, a ‘reasoning’ AI model that can be trained for less than $450, https://techcrunch.com/2025/01/11/researchers-open-source-sky-t1-a-reasoning-ai-model-that-can-be-trained-for-less-than-450/
- Ben Dickson, February 20, 2025, How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs), https://venturebeat.com/ai/how-test-time-scaling-unlocks-hidden-reasoning-abilities-in-small-language-models-and-allows-them-to-outperform-llms/
- Asif Razzaq, March 5, 2025, Qwen Releases QwQ-32B: A 32B Reasoning Model that Achieves Significantly Enhanced Performance in Downstream Task, https://www.marktechpost.com/2025/03/05/qwen-releases-qwq-32b-a-32b-reasoning-model-that-achieves-significantly-enhanced-performance-in-downstream-task/ (Features 32B parameters, 32K context length, 64 layers, RoPE, SwiGLU, RMSNorm, and attention enhancements.)
- Carl Franzen, March 5, 2025, New open-source math model Light-R1-32B surpasses equivalent DeepSeek performance with only $1000 in training costs, https://venturebeat.com/ai/new-open-source-math-model-light-r1-32b-surpasses-equivalent-deepseek-performance-with-only-1000-in-training-costs/
- X Zhang, F Zhang, C Du, C Du, T Pang, W Gao, M Lin, Mar 2025, LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation, https://openreview.net/pdf?id=DfgfGTfObm
- Xuechen Zhang, Zijian Huang, Chenshun Ni, Ziyang Xiong, Jiasi Chen, Samet Oymak, 14 May 2025 (v2), Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement, https://arxiv.org/abs/2505.07961
- Xiaomi LLM-Core Team: Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, Chenhong He, Dong Zhang, Duo Zhang, Guoan Wang, Hao Tian, Haochen Zhao, Heng Qu, Hongshen Xu, Jun Shi, Kainan Bao, QingKai Fang, Kang Zhou, Kangyang Zhou, Lei Li, Menghang Zhu, Nuo Chen, Qiantong Wang, Shaohui Liu, Shicheng Li, Shuhao Gu, Shuhuai Ren, Shuo Liu, Sirui Deng, Weiji Zhuang, Weiwei Lv, Wenyu Yang, Xin Zhang, Xing Yong, Xing Zhang, Xingchen Song, Xinzhe Xu, Xu Wang, Yihan Yan, Yu Tu, Yuanyuan Tian, Yudong Wang, Yue Yu, Zhenru Lin, Zhichao Song, Zihao Yue, Xiaomi, 12 May 2025, MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining, https://arxiv.org/abs/2505.07608
- Haoran Xu, Baolin Peng, Hany Awadalla, Dongdong Chen, Yen-Chun Chen, Mei Gao, Young Jin Kim, Yunsheng Li, Liliang Ren, Yelong Shen, Shuohang Wang, Weijian Xu, Jianfeng Gao, Weizhu Chen, 30 Apr 2025, Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math, https://arxiv.org/abs/2504.21233
Reasoning and CoT Efficiency Topics
Blog articles and more research information on general efficiency optimization techniques for reasoning models:
- Reasoning inference optimization (RIO)
- Chain-of-Thought (CoT) optimization
- Small Reasoning Models (SRMs)
- Adaptive Inference Time Compute
- Hybrid Reasoning Models
- Reasoning Tokens
Efficiency optimizations to Chain-of-Thought include:
- Hidden Token Chain-of-Thought (HCoT)
- Continuous Chain-of-Thought (Coconut)
- Chain of Draft (CoD)
- CoT Reasoning Decoding
- Concise Chain-of-Thought
- CoT Token Reduction
- CoT Step Skipping
- CoT Early Stopping
- CoT Path Reduction
- Constrained Chain-of-Thought
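As one concrete example of these optimizations, concise Chain-of-Thought and Chain of Draft both reduce reasoning tokens by instructing the model to keep each reasoning step very short. The prompt wording below is an illustrative sketch, not the exact prompts from the original papers, and generate() is again a hypothetical stand-in for a small-model inference call.

    # Sketch: concise-CoT / Chain-of-Draft style prompting to cut reasoning tokens.
    # Prompt wording is illustrative only; generate() is a hypothetical stand-in
    # for a small-model inference call.

    def generate(prompt: str, temperature: float = 0.0) -> str:
        raise NotImplementedError("plug in a small LLM here")

    VERBOSE_COT = ("Think step by step and explain your reasoning in full, "
                   "then state the final answer.\nProblem: {q}")

    CONCISE_COT = ("Think step by step, but keep each step to at most five words, "
                   "then state only the final answer.\nProblem: {q}")

    def solve_concise(question: str) -> str:
        """Shorter reasoning steps mean fewer output tokens, hence lower cost and latency."""
        return generate(CONCISE_COT.format(q=question))

The saving comes entirely from the shorter output, since decoding cost scales with the number of generated tokens.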
More AI Research
Read more about: