Aussie AI

Scaling Laws in Generative AI

  • Last Updated 22 October, 2025
  • by David Spuler, Ph.D.

What are AI Scaling Laws?

Scaling laws are the contention that AI models become smarter as you scale up the model size, in terms of parameter count, and/or the total number of input tokens used in model training. Recently, diminishing returns from ever-larger training runs have thrown some of these scaling laws into doubt, giving rise to a new scaling law, the "inference scaling law," which holds that scaling the amount of inference computation can also increase model intelligence.
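
For intuition, the well-known Chinchilla paper (Hoffmann et al., 2022) fitted training loss to a parametric form combining both scaling dimensions: L(N, D) = E + A/N^alpha + B/D^beta, where N is the parameter count and D is the number of training tokens. Below is a minimal Python sketch of that fitted curve; the constants are the published Chinchilla fits and should be treated as illustrative rather than authoritative.

    # Minimal sketch of a Chinchilla-style scaling law (Hoffmann et al., 2022).
    # Constants are the published fits; treat them as illustrative only.
    def chinchilla_loss(n_params: float, n_tokens: float) -> float:
        """Predicted loss L(N, D) = E + A / N**alpha + B / D**beta."""
        E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fitted coefficients
        alpha, beta = 0.34, 0.28       # fitted exponents for parameters and tokens
        return E + A / n_params ** alpha + B / n_tokens ** beta

    # Example: roughly Chinchilla itself (70B parameters, 1.4T training tokens).
    print(chinchilla_loss(70e9, 1.4e12))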

What are Inference Scaling Laws?

Inference scaling laws are the contention that smarter LLMs can be created by using additional inference computation, such as repeated LLM queries at runtime, rather than by more extensive training. The success of the OpenAI "o1" model has supported this trend, as it is based on a multi-step inference technique called "Chain-of-Thought."
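
As a rough illustration of multi-step inference, the sketch below wraps a hypothetical llm(prompt) completion function (an assumed placeholder, not any specific API) in a Chain-of-Thought-style loop that spends extra inference compute on intermediate reasoning steps before producing an answer.

    # Minimal sketch of Chain-of-Thought-style multi-step inference.
    # Assumes a hypothetical llm(prompt: str) -> str completion function;
    # substitute any real LLM client.
    def chain_of_thought(llm, question: str, max_steps: int = 5) -> str:
        trace = f"Question: {question}\nLet's think step by step.\n"
        for step in range(1, max_steps + 1):
            thought = llm(trace + f"Step {step}:")   # one extra LLM call per step
            trace += f"Step {step}: {thought}\n"
            if "final answer" in thought.lower():    # crude stopping heuristic
                break
        return llm(trace + "Therefore, the final answer is:")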

Research on Inference Scaling Laws

Research papers on scaling laws as they relate to multi-step inference:

What is Test Time Compute?

Test time compute means using additional computation at the LLM inference stage, rather than in pre-training or fine-tuning. The model weights stay constant during inference, but reasoning can still be improved through advanced prompting strategies and multi-step inference algorithms, as in the sketch below.
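
One common test-time compute strategy is Best-of-N sampling with a self-consistency majority vote: sample several candidate answers from the frozen model and return the most frequent one. The sketch below assumes a hypothetical llm(prompt, temperature) sampler; it illustrates the general idea, not any particular paper's method.

    # Minimal sketch of Best-of-N test-time scaling via self-consistency.
    # Assumes a hypothetical llm(prompt, temperature) -> str sampler.
    from collections import Counter

    def best_of_n(llm, prompt: str, n: int = 8, temperature: float = 0.8) -> str:
        # The model weights are frozen; only runtime compute grows with n.
        answers = [llm(prompt, temperature=temperature) for _ in range(n)]
        winner, _count = Counter(answers).most_common(1)[0]
        return winner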

Research papers on test time compute:

  • Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
  • Maxwell Zeff, November 20, 2024, Nvidia’s CEO defends his moat as AI labs change how they improve their AI models, https://techcrunch.com/2024/11/20/nvidias-ceo-defends-his-moat-as-ai-labs-change-how-they-improve-their-ai-models/
  • mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
  • Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas, 11 Nov 2024, The Surprising Effectiveness of Test-Time Training for Abstract Reasoning, https://arxiv.org/abs/2411.07279
  • Noam Brown, Tuomas Sandholm, 16 Nov 2017 (v3), Safe and Nested Subgame Solving for Imperfect-Information Games, https://arxiv.org/abs/1705.02955 (An early pre-LLM paper on reasoning in multiple steps.)
  • Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
  • Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
  • Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Dec 2024, Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, https://arxiv.org/abs/2412.21187
  • Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
  • Cameron R. Wolfe, Ph.D., Jan 06, 2025, Scaling Laws for LLMs: From GPT-3 to o3, Understanding the current state of LLM scaling and the future of AI research, https://cameronrwolfe.substack.com/p/scaling-laws-for-llms-from-gpt-3
  • Sunil Manghani, Dec 21, 2024, Train Less, Think More: Advancing LLMs Through Test-Time Compute, https://medium.com/electronic-life/train-less-think-more-advancing-llms-through-test-time-compute-a46832e973e9
  • Duncan Anderson, Jan 2025, The wall that wasn’t: Benchmark results for the latest AI models suggest that any “scaling wall” has already been breached and we’re on the path to AGI. https://medium.com/barnacle-labs/the-wall-that-wasnt-62c617f66ad4
  • Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar, 6 Aug 2024, Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, https://arxiv.org/abs/2408.03314 (Original test time compute paper.)
  • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
  • Edward Beeching, Lewis Tunstall, Sasha Rush Dec 16, 2024, Scaling Test Time Compute with Open Source Models, https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Ziyu Guo, Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Peng Gao, Hongsheng Li, Pheng-Ann Heng, 23 Jan 2025, Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, https://arxiv.org/abs/2501.13926 https://github.com/ZiyuGuo99/Image-Generation-CoT
  • G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
  • Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto, 3 Feb 2025 (v2), s1: Simple test-time scaling, https://arxiv.org/abs/2501.19393 https://github.com/simplescaling/s1 (Method of "budget forcing" that allows either shortening or lengthening multi-step reasoning sequences.)
  • Sebastian Raschka, PhD, Feb 05, 2025, Understanding Reasoning LLMs: Methods and Strategies for Building and Refining Reasoning Models https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
  • Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein, 7 Feb 2025, Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach, https://arxiv.org/abs/2502.05171
  • Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, 20 Feb 2025, S*: Test Time Scaling for Code Generation, https://arxiv.org/abs/2502.14382 https://github.com/NovaSky-AI/SkyThought
  • Ben Dickson, February 20, 2025, How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs), https://venturebeat.com/ai/how-test-time-scaling-unlocks-hidden-reasoning-abilities-in-small-language-models-and-allows-them-to-outperform-llms/
  • Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji, 18 Feb 2025, Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights, https://arxiv.org/abs/2502.12521
  • Marthe Ballon, Andres Algaba, Vincent Ginis, 21 Feb 2025, The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer, https://arxiv.org/abs/2502.15631
  • Maxwell Zeff, February 24, 2025, Anthropic launches a new AI model that ‘thinks’ as long as you want, https://techcrunch.com/2025/02/24/anthropic-launches-a-new-ai-model-that-thinks-as-long-as-you-want/
  • Kif Leswing, Feb 26 2025, Nvidia CEO Huang says AI has to do ’100 times more’ computation now than when ChatGPT was released, https://www.cnbc.com/2025/02/26/nvidia-ceo-huang-says-next-generation-ai-will-need-more-compute.html (The thesis that AI reasoning will need 100 times more compute, regardless of whether it is a single-step "long answers" model thinking out loud, or a multi-step test time compute model.)
  • Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei, 25 Feb 2025, Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning, https://arxiv.org/abs/2502.18080 (Trying to generate the "shortest correct response" by examining the lengths needed for CoT.)
  • Juntai Cao, Xiang Zhang, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini, 27 Feb 2025, Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing, https://arxiv.org/abs/2502.20592 (Test time compute applied to the multi-document summarization use case.)
  • Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
  • Supreeth Koundinya, March 10, 2025, Manus is a Wrapper of Anthropic’s Claude, and It’s Okay, https://analyticsindiamag.com/ai-features/manus-is-a-wrapper-of-anthropics-claude-and-its-okay/ (“Manus didn’t just slap an API on a model. They built an autonomous system that can execute deep research, deep thinking, and multi-step tasks in a way that no other AI have.”)
  • Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi, 20 Feb 2025 (v2), Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification https://arxiv.org/abs/2502.01839 (Wrapping a single model with a Best-of-N approach that self-selects the best answer can significantly improve reasoning rates.)
  • Dibyanayan Bandyopadhyay, Soham Bhattacharjee, Asif Ekbal, 13 Mar 2025, Thinking Machines: A Survey of LLM based Reasoning Strategies, https://arxiv.org/abs/2503.10814
  • Michael Nuñez, July 22, 2025, Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber, https://venturebeat.com/ai/anthropic-researchers-discover-the-weird-ai-problem-why-thinking-longer-makes-models-dumber/
  • Sebastian Raschka, Mar 8, 2025, Inference-Time Compute Scaling Methods to Improve Reasoning Models: Part 1: Inference-Time Compute Scaling Methods, https://sebastianraschka.com/blog/2025/state-of-llm-reasoning-and-inference-scaling.html
  • Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng, 22 Jan 2025, Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback, https://arxiv.org/abs/2501.12895 https://github.com/yafuly/TPO
  • Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou, 10 Feb 2025, Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling, https://arxiv.org/abs/2502.06703
  • Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang, 23 Feb 2025 (v2), Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking, https://arxiv.org/abs/2502.13842
  • Aryo Pradipta Gema, Alexander Hägele, Runjin Chen, Andy Arditi, Jacob Goldman-Wetzler, Kit Fraser-Taliente, Henry Sleight, Linda Petrini, Julian Michael, Beatrice Alex, Pasquale Minervini, Yanda Chen, Joe Benton, Ethan Perez, 19 Jul 2025, Inverse Scaling in Test-Time Compute, https://arxiv.org/abs/2507.14417
  • Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, 6 Dec 2024, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
  • Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Yunhua Zhou, Xipeng Qiu, 17 Feb 2025, Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? https://arxiv.org/abs/2502.12215
  • Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu, 25 Feb 2025 (v2), From System 1 to System 2: A Survey of Reasoning Large Language Models, https://arxiv.org/abs/2502.17419
  • Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 13 Mar 2025 (v2), Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567 (Massive and broad survey of all types of reasoning.)
  • Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan, 16 May 2025, Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory, https://arxiv.org/abs/2505.10981
  • Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li, Shilian Chen, Junyi Wang, Jie Zhou, Qin Chen, Min Zhang, Yulan Wu, Liang He, 8 May 2025 (v2), A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law, https://arxiv.org/abs/2505.02665
  • Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen, 6 Jun 2025 (v2), Kinetics: Rethinking Test-Time Scaling Laws, https://arxiv.org/abs/2506.05333
  • Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty, 23 May 2025, First Finish Search: Efficient Test-Time Scaling in Large Language Models, https://arxiv.org/abs/2505.18149 (Running multiple parallel decoding steps but stopping when the fastest and usually shortest one completes.)
  • Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates, 2 Jul 2025, Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs, https://arxiv.org/abs/2507.02076
  • Peter Wildeford, Aug 08, 2025, GPT-5: a small step for intelligence, a giant leap for normal people: GPT-5 focuses on where the money is - everyday users, not AI elites, https://peterwildeford.substack.com/p/gpt-5-a-small-step-for-intelligence
  • J. Pablo Muñoz, Jinjie Yuan, 7 Aug 2025, RTTC: Reward-Guided Collaborative Test-Time Compute, https://arxiv.org/abs/2508.10024
  • Guojun Wu, 19 Jul 2025, It's Not That Simple. An Analysis of Simple Test-Time Scaling, https://arxiv.org/abs/2507.14419
  • Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou, 21 Jul 2025, Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning, https://arxiv.org/abs/2505.16122
  • Mrinal Mathur, Mike Doan, Barak Pearlmutter, Sergey Plis, 17 Jul 2025, Change of Thought: Adaptive Test-Time Computation, https://arxiv.org/abs/2507.13569
  • Weichuang Shao, Iman Yi Liao, Tomas Henrique Bode Maul, and Tissa Chandesa, 21 Jul 2025, An Investigation of Test-time Adaptation for Audio Classification under Background Noise, https://arxiv.org/abs/2507.15523
  • Wooseong Jeong, Jegyeong Cho, Youngho Yoon, Kuk-Jin Yoon, 21 Jul 2025, Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training, https://arxiv.org/abs/2507.07778
  • Simon Ouellette, 17 Jul 2025, Out-of-Distribution Generalization in the ARC-AGI Domain: Comparing Execution-Guided Neural Program Synthesis and Test-Time Fine-Tuning, https://arxiv.org/abs/2507.15877
  • Tong Wu, Chong Xiang, Jiachen T. Wang, Weichen Yu, Chawin Sitawarin, Vikash Sehwag, Prateek Mittal, 21 Jul 2025, Does More Inference-Time Compute Really Help Robustness?, https://arxiv.org/abs/2507.15974
  • Matthias Otth, Jonas Hübotter, Ido Hakimi, Andreas Krause, 24 Jul 2025, Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning, https://arxiv.org/abs/2507.18122
  • Hao Luan, Yi Xian Goh, See-Kiong Ng, Chun Kai Ling, 14 Aug 2025, Projected Coupled Diffusion for Test-Time Constrained Joint Generation, https://arxiv.org/abs/2508.10531
  • Xingwu Chen, Miao Lu, Beining Wu, Difan Zou, 11 Aug 2025, Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression, https://arxiv.org/abs/2508.07571
  • Changze Lv, Jiang Zhou, Siyu Long, Lihao Wang, Jiangtao Feng, Dongyu Xue, Yu Pei, Hao Wang, Zherui Zhang, Yuchen Cai, Zhiqiang Gao, Ziyuan Ma, Jiakai Hu, Chaochen Gao, Jingjing Gong, Yuxuan Song, Shuyi Zhang, Xiaoqing Zheng, Deyi Xiong, Lei Bai, Wanli Ouyang, Ya-Qin Zhang, Wei-Ying Ma, Bowen Zhou, Hao Zhou, 8 Aug 2025, AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model, https://arxiv.org/abs/2507.08920
  • Yufei Zhang, Yicheng Xu, Hongxin Wei, Zhiping Lin, Xiaofeng Zou, Cen Chen, Huiping Zhuang, 27 Jul 2025, Analytic Continual Test-Time Adaptation for Multi-Modality Corruption, https://arxiv.org/abs/2410.22373
  • Wenxuan Bao, Ruxi Deng, Ruizhong Qiu, Tianxin Wei, Hanghang Tong, Jingrui He, 29 Jul 2025, Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning, https://arxiv.org/abs/2507.21494
  • Trae Research Team: Pengfei Gao, Zhao Tian, Xiangxin Meng, Xinchen Wang, Ruida Hu, Yuanan Xiao, Yizhou Liu, Zhao Zhang, Junjie Chen, Cuiyun Gao, Yun Lin, Yingfei Xiong, Chao Peng, Xia Liu, 31 Jul 2025, Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling, https://arxiv.org/abs/2507.23370
  • Mohammad Abdul Hafeez Khan, Yash Jain, Siddhartha Bhattacharyya and Vibhav Vineet, 22 Jul 2025, Test-time Prompt Refinement for Text-to-Image Models, https://arxiv.org/abs/2507.22076
  • Ryo Bertolissi, Jonas Hübotter, Ido Hakimi, Andreas Krause, 30 Jul 2025, Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging, https://arxiv.org/abs/2505.14136
  • Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag, 29 Jul 2025, Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation, https://arxiv.org/abs/2403.19776
  • Jiale Zhou, Wenhan Wang, Shikun Li, Xiaolei Qu, Xin Guo, Yizhong Liu, Wenzhong Tang, Xun Lin, Yefeng Zheng, 1 Aug 2025, TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation, https://arxiv.org/abs/2508.00442
  • Irene Iele, Francesco Di Feola, Valerio Guarrasi, Paolo Soda, 1 Aug 2025, Sample-Aware Test-Time Adaptation for Medical Image-to-Image Translation, https://arxiv.org/abs/2508.00766
  • Zixian Su, Jingwei Guo, Xi Yang, Qiufeng Wang, Kaizhu Huang, 1 Aug 2025, Un-mixing Test-time Adaptation under Heterogeneous Data Streams, https://arxiv.org/abs/2411.15173
  • Fali Wang, Hui Liu, Zhenwei Dai, Jingying Zeng, Zhiwei Zhang, Zongyu Wu, Chen Luo, Zhen Li, Xianfeng Tang, Qi He, Suhang Wang, 26 Jul 2025, AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks, https://arxiv.org/abs/2508.00890
  • Chenxu Yang, Qingyi Si, Mz Dai, Dingyu Yao, Mingyu Zheng, Minghui Chen, Zheng Lin, Weiping Wang, 4 Aug 2025, Test-time Prompt Intervention, https://arxiv.org/abs/2508.02511
  • Xinyu Chen, Haotian Zhai, Can Zhang, Xiupeng Shi, Ruirui Li, 2 Aug 2025, Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models, https://arxiv.org/abs/2508.01225
  • Avishkar Behera, Riya Ann Easow, Venkatesh Parvathala, K. Sri Rama Murty, 3 Aug 2025, Test-Time Training for Speech Enhancement, https://arxiv.org/abs/2508.01847
  • Zhonghao Shi, Xuan Shi, Anfeng Xu, Tiantian Feng, Harshvardhan Srivastava, Shrikanth Narayanan, Maja J. Matarić, 3 Aug 2025, Examining Test-Time Adaptation for Personalized Child Speech Recognition, https://arxiv.org/abs/2409.13095
  • Zhende Song, Shengji Tang, Peng Ye, Jiayuan Fan, Tao Chen, 5 Aug 2025, CTTS: Collective Test-Time Scaling, https://arxiv.org/abs/2508.03333
  • Xinlei Yu, Zhangquan Chen, Yudong Zhang, Shilin Lu, Ruolin Shen, Jiangning Zhang, Xiaobin Hu, Yanwei Fu, Shuicheng Yan, 5 Aug 2025, Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling, https://arxiv.org/abs/2508.03404
  • Yong Du, Yuchen Yan, Fei Tang, Zhengxi Lu, Chang Zong, Weiming Lu, Shengpei Jiang, Yongliang Shen, 7 Aug 2025, Test-Time Reinforcement Learning for GUI Grounding via Region Consistency, https://arxiv.org/abs/2508.05615
  • Younjoon Chung, Hyoungseob Park, Patrick Rim, Xiaoran Zhang, Jihe He, Ziyao Zeng, Safa Cicek, Byung-Woo Hong, James S. Duncan, Alex Wong, 8 Aug 2025, ETA: Energy-based Test-time Adaptation for Depth Completion, https://arxiv.org/abs/2508.05989
  • Orion Weller, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, Benjamin Van Durme, 8 Aug 2025, Rank1: Test-Time Compute for Reranking in Information Retrieval, https://arxiv.org/abs/2502.18418
  • Peter Phan, Dhruv Agarwal, Kavitha Srinivas, Horst Samulowitz, Pavan Kapanipathi, Andrew McCallum, 12 Aug 2025, MiGrATe: Mixed-Policy GRPO for Adaptation at Test-Time, https://arxiv.org/abs/2508.08641
  • Sameer Ambekar, Daniel M. Lang, Julia A. Schnabel, 11 Aug 2025, Hierarchical Adaptive networks with Task vectors for Test-Time Adaptation, https://arxiv.org/abs/2508.09223
  • Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, Zeynep Akata, 13 Aug 2025, Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models, https://arxiv.org/abs/2508.09968
  • Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu, 12 Aug 2025, AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation, https://arxiv.org/abs/2504.07532
  • Jia Liu, ChangYi He, YingQiao Lin, MingMin Yang, FeiYang Shen, ShaoGuo Liu, TingTing Gao, 15 Aug 2025, ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism, https://arxiv.org/abs/2508.11356
  • Dongjae Jeon, Taeheon Kim, Seongwon Cho, Minhyuk Seo, Jonghyun Choi, 18 Aug 2025, TTA-DAME: Test-Time Adaptation with Domain Augmentation and Model Ensemble for Dynamic Driving Conditions, https://arxiv.org/abs/2508.12690
  • Can Jin, Hongwu Peng, Qixin Zhang, Yujin Tang, Dimitris N. Metaxas, Tong Che, 19 Aug 2025, Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning, https://arxiv.org/abs/2504.09772
  • Amirmohsen Sattarifard, Sepehr Lavasani, Ehsan Imani, Kunlin Zhang, Hanlin Xu, Fengyu Sun, Negar Hassanpour, Chao Gao, 19 Aug 2025, GLASS: Test-Time Acceleration for LLMs via Global-Local Neural Importance Aggregation, https://arxiv.org/abs/2508.14302
  • Shansong Wang, Mojtaba Safari, Mingzhe Hu, Qiang Li, Chih-Wei Chang, Richard LJ Qiu, Xiaofeng Yang, 20 Aug 2025, DINOv3 with Test-Time Training for Medical Image Registration, https://arxiv.org/abs/2508.14809
  • Mandeep Rathee, Venktesh V, Sean MacAvaney, Avishek Anand, 21 Aug 2025, Test-time Corpus Feedback: From Retrieval to RAG, https://arxiv.org/abs/2508.15437
  • Youjia Zhang, Youngeun Kim, Young-Geun Choi, Hongyeob Kim, Huiling Liu, Sungeun Hong, 21 Aug 2025, Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment, https://arxiv.org/abs/2508.15568
  • Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li, 17 Jan 2025 (v2), Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities, https://arxiv.org/abs/2501.09686
  • Ivan Rodkin, Daniil Orel, Konstantin Smirnov, Arman Bolatov, Bilal Elbouardi, Besher Hassan, Yuri Kuratov, Aydar Bulatov, Preslav Nakov, Timothy Baldwin, Artem Shelmanov and Mikhail Burtsev, 22 Aug 2025, Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling, https://arxiv.org/abs/2508.16745
  • V. Venktesh, Mandeep Rathee, Avishek Anand, 20 Aug 2025, Trust but Verify! A Survey on Verification Design for Test-time Scaling, https://arxiv.org/abs/2508.16665
  • Hung-Chun Hsu, Yuan-Ching Kuo, Chao-Han Huck Yang, Szu-Wei Fu, Hanrong Ye, Hongxu Yin, Yu-Chiang Frank Wang, Ming-Feng Tsai, Chuan-Ju Wang, 25 Aug 2025, Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations, https://arxiv.org/abs/2508.18132
  • Jeremy Berman, Sep 17, 2025, How I got the highest score on ARC-AGI again swapping Python for English: Using Multi-Agent Collaboration with Evolutionary Test-Time Compute, https://jeremyberman.substack.com/p/how-i-got-the-highest-score-on-arc-agi-again (Generates multiple solutions then prunes them with "evolution" and iterates in multi-step inference.)
  • Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, Ulyana Piterbarg, Jens Tuyls, Edward Grefenstette, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel, 3 Sep 2025, Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents, https://arxiv.org/abs/2509.03581
  • Minjong Yoo, Jinwoo Jang, Sihyung Yoon, Honguk Woo, 4 Sep 2025, World Model Implanting for Test-time Adaptation of Embodied Agents, https://arxiv.org/abs/2509.03956
  • Isidoro Tamassia, Wendelin Böhmer, 4 Sep 2025, Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes, https://arxiv.org/abs/2509.04317
  • Yuchen Jiao, Yuxin Chen, Gen Li, 4 Sep 2025, Connections between reinforcement learning with feedback,test-time scaling, and diffusion guidance: An anthology, https://arxiv.org/abs/2509.04372
  • Jie Chen, Jinhao Jiang, Yingqian Min, Zican Dong, Shijie Wang, Wayne Xin Zhao, Ji-Rong Wen, 5 Sep 2025, Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework, https://arxiv.org/abs/2509.05007
  • Shuaicheng Niu, Guohao Chen, Deyu Chen, Yifan Zhang, Jiaxiang Wu, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Chunyan Miao, Mingkui Tan, 5 Sep 2025, Adapt in the Wild: Test-Time Entropy Minimization with Sharpness and Feature Regularization, https://arxiv.org/abs/2509.04977
  • Shengyin Sun and Yiming Li and Xing Li and Yingzhao Lian and Weizhe Lin and Hui-Ling Zhen and Zhiyuan Yang and Chen Chen and Xianzhi Yu and Mingxuan Yuan and Chen Ma, 30 Aug 2025, Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling, https://arxiv.org/abs/2509.04474
  • Hao Wen, Yifan Su, Feifei Zhang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li, 30 Aug 2025, ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute, https://arxiv.org/abs/2509.04475
  • Hang Wu, Hongkai Chen, Yujun Cai, Chang Liu, Qingwen Ye, Ming-Hsuan Yang, Yiwei Wang, 5 Sep 2025, DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning, https://arxiv.org/abs/2507.00008
  • Byung-Joon Lee, Jin-Seop Lee, Jee-Hyong Lee, 26 Aug 2025, Stabilizing Open-Set Test-Time Adaptation via Primary-Auxiliary Filtering and Knowledge-Integrated Prediction, https://arxiv.org/abs/2508.18751
  • Mingkui Tan, Guohao Chen, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Peilin Zhao, Shuaicheng Niu, 26 Aug 2025, Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting, https://arxiv.org/abs/2403.11491
  • Ramya Keerthy Thatikonda, Wray Buntine, Ehsan Shareghi, 27 Aug 2025, Logical Reasoning with Outcome Reward Models for Test-Time Scaling, https://arxiv.org/abs/2508.19903
  • Lijun Sheng, Jian Liang, Zilei Wang, Ran He, 27 Aug 2025, R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning, https://arxiv.org/abs/2504.11195
  • Hao Mark Chen, Zhiwen Mo, Guanxi Lu, Shuang Liang, Lingxiao Ma, Wayne Luk, Hongxiang Fan, 29 Aug 2025, Democratizing Agentic AI with Fast Test-Time Scaling on the Edge, https://arxiv.org/abs/2509.00195
  • Sachin Goyal, David Lopez-Paz, Kartik Ahuja, 1 Sep 2025, Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling, https://arxiv.org/abs/2509.01649
  • Jintao Cheng, Weibin Li, Jiehao Luo, Xiaoyu Tang, Zhijian He, Jin Wu, Yao Zou, Wei Zhang, 2 Sep 2025, Scale, Don't Fine-tune: Guiding Multimodal LLMs for Efficient Visual Place Recognition at Test-Time, https://arxiv.org/abs/2509.02129
  • Jiefeng Chen, Jie Ren, Xinyun Chen, Chengrun Yang, Ruoxi Sun, Jinsung Yoon, Sercan Ö. Arık, 30 Aug 2025, SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling, https://arxiv.org/abs/2501.19306
  • Hritik Arasu, Faisal R Jahangiri, 3 Sep 2025, StableSleep: Source-Free Test-Time Adaptation for Sleep Staging with Lightweight Safety Rails, https://arxiv.org/abs/2509.02982
  • James Xu Zhao, Bryan Hooi, See-Kiong Ng, 8 Sep 2025, Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet, https://arxiv.org/abs/2509.06861
  • Fin Amin and Jung-Eun Kim, 7 Sep 2025, The Over-Certainty Phenomenon in Modern Test-Time Adaptation Algorithms, https://arxiv.org/abs/2404.16168
  • Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao, 6 Sep 2025, M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models, https://arxiv.org/abs/2504.10449
  • Vignav Ramesh, Morteza Mardani, 6 Sep 2025, Test-Time Scaling of Diffusion Models via Noise Trajectory Search, https://arxiv.org/abs/2506.03164
  • Jenny Y. Huang, Mehul Damani, Yousef El-Kurdi, Ramon Astudillo, Wei Sun, 11 Sep 2025, Latency and Token-Aware Test-Time Compute, https://arxiv.org/abs/2509.09864
  • Taylor Archibald and Tony Martinez, 10 Sep 2025, Improving MLLM Historical Record Extraction with Test-Time Image, https://arxiv.org/abs/2509.09722
  • Xianzhi Li, Ethan Callanan, Abdellah Ghassel, Xiaodan Zhu, 11 Sep 2025, Entropy-Gated Branching for Efficient Test-Time Reasoning, https://arxiv.org/abs/2503.21961
  • Xinyu Luo, Kecheng Chen, Pao-Sheng Vincent Sun, Chris Xing Tian, Arindam Basu, Haoliang Li, 19 Sep 2025, SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks, https://arxiv.org/abs/2504.02298
  • Mukai Li, Linfeng Song, Zhenwen Liang, Jiahao Xu, Shansan Gong, Qi Liu, Haitao Mi, Dong Yu, 16 Sep 2025, EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving, https://arxiv.org/abs/2509.12603
  • Utkarsh Singhal, Ryan Feng, Stella X. Yu, Atul Prakash, 15 Sep 2025, Test-Time Canonicalization by Foundation Models for Robust Perception, https://arxiv.org/abs/2507.10375
  • Nikita Rajaneesh, Thomas Zollo, Richard Zemel, 12 Sep 2025, Test-Time Warmup for Multimodal Large Language Models, https://arxiv.org/abs/2509.10641
  • Muhammad H. Ashiq, Peter Triantafillou, Hung Yun Tseng, Grigoris G. Chrysos, 15 Sep 2025, Inducing Uncertainty for Test-Time Privacy, https://arxiv.org/abs/2509.11625
  • Zhicheng Lin, Xiaolin Wu, Xi Zhang, 17 Sep 2025, Class-invariant Test-Time Augmentation for Domain Generalization, https://arxiv.org/abs/2509.14420
  • Bingxuan Li, Yiwei Wang, Jiuxiang Gu, Kai-Wei Chang, Nanyun Peng, 17 Sep 2025, METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling, https://arxiv.org/abs/2502.17651
  • Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Estelle Aflalo, Shao-Yen Tseng, Vasudev Lal, Phillip Howard, 18 Sep 2025, Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering, https://arxiv.org/abs/2411.12590
  • Xingshuai Huang, Di Wu, Benoit Boulet, 17 Sep 2025, DRDT3: Diffusion-Refined Decision Test-Time Training Model, https://arxiv.org/abs/2501.06718

Research on Scaling Laws

Research on the traditional scaling laws of model size and training data:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: