Aussie AI

DeepSeek Training and Inference Optimization Research

  • Last Updated 26 August, 2025
  • by David Spuler, Ph.D.

What is DeepSeek?

DeepSeek is a China-based AI startup with innovative research in training and inference optimizations. In January 2025, the release of the DeepSeek R1 reasoning model rocked the US AI industry, dropping a number of US tech stocks, including NVIDIA. R1 was so impactful because it was:

  • Smart — better than OpenAI's o1 reasoning model on several metrics (but not all), and
  • Cheap — both training and inference were cheaper and faster.

The papers and LLM models released by DeepSeek include:

  • DeepSeek V1 (Jan 2024) — 7B/67B model trained on 2T tokens (V1 paper).
  • DeepSeek V2 (May 2024) — introduced MLA attention (V2 paper).
  • DeepSeek V3 (Dec 2024) — powerful standard foundation model (V3 paper).
  • DeepSeek R1 (Jan 2025) — a powerful reasoning model (R1 paper).

DeepSeek also has an impressive line of (non-reasoning) text-to-image models.

Training optimizations. Some of the advances in the training of reasoning models included:

  • Single-step reasoning (the main focus of training)
  • Supervised fine-tuning (initial phase)
  • Multi-stage training process (including reinforcement learning and distillation phases)
  • Mixed-precision training (with much of the arithmetic in FP8)
  • Optimized Mixture-of-Experts (MoE) architecture
  • Synthetic data in the reasoning dataset
  • Human-curated reasoning dataset
  • Reinforcement learning with a reasoning focus
  • Knowledge distillation techniques
  • Follow-up training for output readability and other issues

Overall, DeepSeek R1 used about 800,000 individual reasoning sequences for training. This led to the main advance: the model can reason successfully through complex problems in a single inference step.
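The mixed-precision point above can be sketched numerically. The following toy numpy simulation mimics e4m3-style FP8 quantization (scale into the representable range, keep roughly 3 mantissa bits, rescale back); it is only an illustration of the precision trade-off, not DeepSeek's actual FP8 training kernels, which require hardware and library support:

```python
import numpy as np

# Toy simulation of FP8 (e4m3-style) quantization, the low-precision format
# used heavily in mixed-precision training. We mimic the precision loss in
# numpy: scale into range, keep ~3 mantissa bits, then rescale back.

FP8_E4M3_MAX = 448.0   # largest finite value in the e4m3 format

def fp8_roundtrip(x: np.ndarray) -> np.ndarray:
    """Quantize x to an e4m3-like grid and dequantize back (simulation only)."""
    scale = FP8_E4M3_MAX / max(float(np.abs(x).max()), 1e-12)
    xs = np.clip(x * scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    m, e = np.frexp(xs)               # xs = m * 2**e, with |m| in [0.5, 1)
    m = np.round(m * 16.0) / 16.0     # keep 3 mantissa bits (plus implicit bit)
    return np.ldexp(m, e) / scale

rng = np.random.default_rng(0)
x = rng.standard_normal(1000).astype(np.float32)
x_rt = fp8_roundtrip(x)

rel_err = np.abs(x - x_rt) / np.maximum(np.abs(x), 1e-6)
print(f"max relative error: {rel_err.max():.3f}")  # bounded by ~1/16 for 3 mantissa bits
```

The per-element relative error stays below about 6.25% (one part in 16), which is the cost of dropping mantissa bits; training in FP8 works because this error is tolerable for many matrix multiplications when accumulation is done in higher precision.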

DeepSeek R1 Inference Optimizations. The R1 model was also fast at inference, using some of the optimizations introduced in prior DeepSeek models. The main inference optimization techniques included:

  • Single-step reasoning — the model reasons through complex problems by outputting a single "long answer" rather than multiple separate reasoning steps, completely removing the extra inference steps used by multi-step reasoning models such as the o1/o3 models.
  • Multi-head Latent Attention (MLA) — this optimization to the attention module, a major bottleneck in inference, was actually introduced by their V2 model, but has been improved in R1 (see MLA Attention).
  • Multi-token decoding — one of the major bottlenecks in LLM inference is the "autoregressive" decoding algorithm, which produces one token at a time, and multi-token decoding parallelizes this process as a type of parallel decoding algorithm.
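As a rough illustration of the MLA idea above, the per-head KV cache can be replaced by a much smaller per-token latent vector from which keys and values are reconstructed via up-projections. The dimensions and random projections below are invented for this sketch, not DeepSeek's actual configuration:

```python
import numpy as np

# Toy illustration of the core idea in Multi-head Latent Attention (MLA):
# instead of caching full keys and values for every head, cache one small
# shared latent vector per token and reconstruct K/V from it on the fly.

rng = np.random.default_rng(0)

d_model = 64      # hidden size
n_heads = 4
d_head = 16       # per-head dimension (n_heads * d_head == d_model)
d_latent = 8      # compressed latent dimension, much smaller than d_model
seq_len = 10

# Down-projection to the latent, and up-projections back to K and V.
W_down = rng.standard_normal((d_model, d_latent)) * 0.1
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.1
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.1

x = rng.standard_normal((seq_len, d_model))

# Standard KV cache: full keys and values -> 2 * seq_len * d_model floats.
# MLA-style cache: only the latent -> seq_len * d_latent floats.
latent_cache = x @ W_down          # (seq_len, d_latent) is all that is stored
k = latent_cache @ W_up_k          # keys reconstructed at attention time
v = latent_cache @ W_up_v          # values reconstructed at attention time

standard_cache = 2 * seq_len * d_model
mla_cache = seq_len * d_latent
print(f"standard KV cache floats: {standard_cache}")            # 1280
print(f"MLA latent cache floats:  {mla_cache}")                 # 80
print(f"compression ratio: {standard_cache / mla_cache:.0f}x")  # 16x
```

The compression ratio scales with sequence length for free, which is why MLA attacks the attention memory bottleneck so effectively for long contexts.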
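A toy count of decode-loop iterations shows why the multi-token decoding described above helps: autoregressive decoding needs one sequential forward pass per token, while emitting k tokens per pass divides the number of sequential model calls by k. The "model" here is a stand-in counter, not a real LLM:

```python
# Compare the number of sequential forward passes needed to generate
# n_tokens under one-token-per-step vs k-tokens-per-step decoding.

def autoregressive_decode(n_tokens: int) -> int:
    """Sequential forward passes when each pass emits exactly one token."""
    steps = 0
    generated = 0
    while generated < n_tokens:
        generated += 1     # one new token per forward pass
        steps += 1
    return steps

def multi_token_decode(n_tokens: int, k: int) -> int:
    """Sequential forward passes when each pass emits up to k tokens
    (the final pass may overshoot and be truncated)."""
    steps = 0
    generated = 0
    while generated < n_tokens:
        generated += k
        steps += 1
    return steps

print(autoregressive_decode(100))    # 100 sequential passes
print(multi_token_decode(100, 4))    # 25 sequential passes
```

Since each forward pass is latency-bound by memory bandwidth, cutting the sequential pass count by k directly cuts decoding latency, provided the extra tokens per pass are accurate enough to keep.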

Here are some of our Aussie AI blog articles on DeepSeek's innovations:

Research on DeepSeek Models

Research papers and articles about DeepSeek's innovations and impacts:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++ programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: