Aussie AI

DeepSeek Training and Inference Optimization Research

  • Last Updated 26 August, 2025
  • by David Spuler, Ph.D.

What is DeepSeek?

DeepSeek is a China-based AI startup with innovative research in training and inference optimizations. In January 2025, the release of the DeepSeek R1 reasoning model rocked the US AI industry, dropping a number of US tech stocks, including NVIDIA. R1 was so impactful because it was:

  • Smart — better than OpenAI's o1 reasoning model on several metrics (but not all), and
  • Cheap — both training and inference were cheaper and faster.

The papers and LLM models released by DeepSeek include:

  • DeepSeek V1 (Jan 2024) — 7B/67B model trained on 2T tokens (V1 paper).
  • DeepSeek V2 (May 2024) — introduced MLA attention (V2 paper).
  • DeepSeek V3 (Dec 2024) — powerful standard foundation model (V3 paper).
  • DeepSeek R1 (Jan 2025) — a powerful reasoning model (R1 paper).

DeepSeek also has an impressive line of (non-reasoning) text-to-image models.

Training optimizations. Some of the advances in the training of reasoning models included:

  • Single-step reasoning (the main focus of training)
  • Supervised fine-tuning (initial phase)
  • Multi-stage training process (including reinforcement learning and distillation phases)
  • Mixed-precision training (with much of the arithmetic in FP8)
  • Optimized Mixture-of-Experts (MoE) architecture
  • Synthetic data in the reasoning dataset
  • Human-curated reasoning dataset
  • Reinforcement learning with a reasoning focus
  • Knowledge distillation techniques
  • Follow-up training for output readability and other issues

Overall, DeepSeek R1 used about 800,000 individual reasoning sequences for training. This led to the main advance: the model can reason successfully through complex problems in a single inference step.
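The mixed-precision point above can be sketched numerically. The following toy numpy simulation mimics e4m3-style FP8 quantization (scale into the representable range, keep roughly 3 mantissa bits, rescale back); it is only an illustration of the precision trade-off, not DeepSeek's actual FP8 training kernels, which require hardware and library support:

```python
import numpy as np

# Toy simulation of FP8 (e4m3-style) quantization, the low-precision format
# used heavily in mixed-precision training. We mimic the precision loss in
# numpy: scale into range, keep ~3 mantissa bits, then rescale back.

FP8_E4M3_MAX = 448.0   # largest finite value in the e4m3 format

def fp8_roundtrip(x: np.ndarray) -> np.ndarray:
    """Quantize x to an e4m3-like grid and dequantize back (simulation only)."""
    scale = FP8_E4M3_MAX / max(float(np.abs(x).max()), 1e-12)
    xs = np.clip(x * scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    m, e = np.frexp(xs)               # xs = m * 2**e, with |m| in [0.5, 1)
    m = np.round(m * 16.0) / 16.0     # keep 3 mantissa bits (plus implicit bit)
    return np.ldexp(m, e) / scale

rng = np.random.default_rng(0)
x = rng.standard_normal(1000).astype(np.float32)
x_rt = fp8_roundtrip(x)

rel_err = np.abs(x - x_rt) / np.maximum(np.abs(x), 1e-6)
print(f"max relative error: {rel_err.max():.3f}")  # bounded by ~1/16 for 3 mantissa bits
```

The per-element relative error stays below about 6.25% (one part in 16), which is the cost of dropping mantissa bits; training in FP8 works because this error is tolerable for many matrix multiplications when accumulation is done in higher precision.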

DeepSeek R1 Inference Optimizations. The R1 model was also fast at inference, using some of the optimizations introduced in prior DeepSeek models. The main inference optimization techniques included:

  • Single-step reasoning — the model reasons through complex problems by outputting a single "long answer" rather than multiple separate reasoning steps, completely removing the extra inference steps used by multi-step reasoning models such as the o1/o3 models.
  • Multi-head Latent Attention (MLA) — this optimization to the attention module, a major bottleneck in inference, was actually introduced by their V2 model, but has been improved in R1 (see MLA Attention).
  • Multi-token decoding — one of the major bottlenecks in LLM inference is the "autoregressive" decoding algorithm, which produces one token at a time, and multi-token decoding parallelizes this process as a type of parallel decoding algorithm.
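As a rough illustration of the MLA idea above, the per-head KV cache can be replaced by a much smaller per-token latent vector from which keys and values are reconstructed via up-projections. The dimensions and random projections below are invented for this sketch, not DeepSeek's actual configuration:

```python
import numpy as np

# Toy illustration of the core idea in Multi-head Latent Attention (MLA):
# instead of caching full keys and values for every head, cache one small
# shared latent vector per token and reconstruct K/V from it on the fly.

rng = np.random.default_rng(0)

d_model = 64      # hidden size
n_heads = 4
d_head = 16       # per-head dimension (n_heads * d_head == d_model)
d_latent = 8      # compressed latent dimension, much smaller than d_model
seq_len = 10

# Down-projection to the latent, and up-projections back to K and V.
W_down = rng.standard_normal((d_model, d_latent)) * 0.1
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.1
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.1

x = rng.standard_normal((seq_len, d_model))

# Standard KV cache: full keys and values -> 2 * seq_len * d_model floats.
# MLA-style cache: only the latent -> seq_len * d_latent floats.
latent_cache = x @ W_down          # (seq_len, d_latent) is all that is stored
k = latent_cache @ W_up_k          # keys reconstructed at attention time
v = latent_cache @ W_up_v          # values reconstructed at attention time

standard_cache = 2 * seq_len * d_model
mla_cache = seq_len * d_latent
print(f"standard KV cache floats: {standard_cache}")            # 1280
print(f"MLA latent cache floats:  {mla_cache}")                 # 80
print(f"compression ratio: {standard_cache / mla_cache:.0f}x")  # 16x
```

The compression ratio scales with sequence length for free, which is why MLA attacks the attention memory bottleneck so effectively for long contexts.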
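A toy count of decode-loop iterations shows why the multi-token decoding described above helps: autoregressive decoding needs one sequential forward pass per token, while emitting k tokens per pass divides the number of sequential model calls by k. The "model" here is a stand-in counter, not a real LLM:

```python
# Compare the number of sequential forward passes needed to generate
# n_tokens under one-token-per-step vs k-tokens-per-step decoding.

def autoregressive_decode(n_tokens: int) -> int:
    """Sequential forward passes when each pass emits exactly one token."""
    steps = 0
    generated = 0
    while generated < n_tokens:
        generated += 1     # one new token per forward pass
        steps += 1
    return steps

def multi_token_decode(n_tokens: int, k: int) -> int:
    """Sequential forward passes when each pass emits up to k tokens
    (the final pass may overshoot and be truncated)."""
    steps = 0
    generated = 0
    while generated < n_tokens:
        generated += k
        steps += 1
    return steps

print(autoregressive_decode(100))    # 100 sequential passes
print(multi_token_decode(100, 4))    # 25 sequential passes
```

Since each forward pass is latency-bound by memory bandwidth, cutting the sequential pass count by k directly cuts decoding latency, provided the extra tokens per pass are accurate enough to keep.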

Here are some of our Aussie AI blog articles on DeepSeek's innovations:

Research on DeepSeek Models

Research papers and articles about DeepSeek's innovations and impacts:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++ programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: