Aussie AI

Small Reasoning Models

  • Last Updated 29 August, 2025
  • by David Spuler, Ph.D.

What are Small Reasoning Models?

Small reasoning models combine reasoning techniques with small language models. Large reasoning models are expensive to run, and the goal is to reduce that cost by using a smaller model, at the price of some loss of accuracy. Small models can be used with two types of reasoning methods: single-step reasoning or multi-step inference-based reasoning.

There are two basic approaches to create a Small Reasoning Model (SRM):

  • Start with a Large Reasoning Model (LRM) and reduce its size, or
  • Start with a small model and increase its reasoning capabilities.

Cutting down a Large Reasoning Model to a smaller one may involve:

  • Model compression (e.g., quantization).
  • Distillation focused on reasoning knowledge.
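As a minimal sketch of the model compression idea, not tied to any particular model or library, symmetric per-tensor int8 quantization maps floating-point weights onto a small integer range, trading a little accuracy for a 4x reduction in weight storage:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float weights."""
    return q.astype(np.float32) * scale

# Toy weight tensor standing in for one layer of a model.
w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Per-weight reconstruction error is bounded by half a quantization step (scale / 2).
```

Real quantization schemes (per-channel scales, activation quantization, 4-bit formats) are more involved, but this is the core float-to-integer mapping that shrinks a Large Reasoning Model's memory footprint.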

In the case of open-source Large Reasoning Models (e.g., DeepSeek R1), smaller versions have already been released, especially quantized ones.

Adding reasoning capabilities to a small model is particularly interesting for the open-source model world. There are many very capable small models of various sizes, but few are specifically focused on reasoning. Some ways to go about it include:

  • Multi-step CoT algorithms wrapped around a smaller base model.
  • Improved training and fine-tuning of single-step reasoning techniques to enhance a small model.
  • A combination of both approaches.
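The first approach can be sketched as a prompting loop wrapped around any small model. Everything below is illustrative: `toy_model` is a hypothetical stand-in for a real model call, and the step prompts are placeholders for real chain-of-thought prompt engineering.

```python
from typing import Callable

def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a small model's text-generation call.
    Any prompt-in, text-out LLM API could be slotted in instead."""
    if prompt.rstrip().endswith("final answer."):
        return "final answer"
    return "intermediate reasoning"

def multi_step_cot(question: str, model: Callable[[str], str], steps: int = 3) -> str:
    """Simple multi-step chain-of-thought wrapper: each step feeds the
    accumulated reasoning back into the model as extra context."""
    context = f"Question: {question}\n"
    for i in range(1, steps + 1):
        thought = model(context + f"Step {i}: think about the next sub-problem.")
        context += f"Step {i}: {thought}\n"
    # Final call asks the model to conclude from its accumulated reasoning.
    return model(context + "Now give the final answer.")

print(multi_step_cot("What is 7 * 6?", toy_model))  # prints "final answer" with this toy stub
```

The point is that the reasoning loop lives outside the model, so the same wrapper can upgrade any small base model without retraining it; production versions add step verification, early stopping, and answer extraction.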

Research on Small Reasoning Models

Research papers include:

Reasoning and CoT Efficiency Topics

Blog articles on reasoning efficiency:

More research information on general efficiency optimization techniques for reasoning models:

Efficiency optimizations to Chain-of-Thought include:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: