Aussie AI

Scaling Laws in Generative AI

  • Last Updated 22 October, 2025
  • by David Spuler, Ph.D.

What are AI Scaling Laws?

Scaling laws are the contention that AI models become smarter as you scale up the model size, in terms of parameter count, and/or the total number of input tokens used in model training. Recently, diminishing returns from ever-larger training runs have thrown some of these scaling laws into doubt, giving rise to a new scaling law, the "inference scaling law," which holds that scaling the amount of inference computation can also increase model intelligence.
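
For intuition, the well-known Chinchilla paper (Hoffmann et al., 2022) fitted training loss to a parametric form combining both scaling dimensions: L(N, D) = E + A/N^alpha + B/D^beta, where N is the parameter count and D is the number of training tokens. Below is a minimal Python sketch of that fitted curve; the constants are the published Chinchilla fits and should be treated as illustrative rather than authoritative.

    # Minimal sketch of a Chinchilla-style scaling law (Hoffmann et al., 2022).
    # Constants are the published fits; treat them as illustrative only.
    def chinchilla_loss(n_params: float, n_tokens: float) -> float:
        """Predicted loss L(N, D) = E + A / N**alpha + B / D**beta."""
        E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fitted coefficients
        alpha, beta = 0.34, 0.28       # fitted exponents for parameters and tokens
        return E + A / n_params ** alpha + B / n_tokens ** beta

    # Example: roughly Chinchilla itself (70B parameters, 1.4T training tokens).
    print(chinchilla_loss(70e9, 1.4e12))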

What are Inference Scaling Laws?

Inference scaling laws are the contention that smarter LLMs can be created by using additional inference computation, such as repeated LLM queries at runtime, rather than by more extensive training. The success of the OpenAI "o1" model has supported this trend, as it is based on a multi-step inference technique called "Chain-of-Thought."
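
As a rough illustration of multi-step inference, the sketch below wraps a hypothetical llm(prompt) completion function (an assumed placeholder, not any specific API) in a Chain-of-Thought-style loop that spends extra inference compute on intermediate reasoning steps before producing an answer.

    # Minimal sketch of Chain-of-Thought-style multi-step inference.
    # Assumes a hypothetical llm(prompt: str) -> str completion function;
    # substitute any real LLM client.
    def chain_of_thought(llm, question: str, max_steps: int = 5) -> str:
        trace = f"Question: {question}\nLet's think step by step.\n"
        for step in range(1, max_steps + 1):
            thought = llm(trace + f"Step {step}:")   # one extra LLM call per step
            trace += f"Step {step}: {thought}\n"
            if "final answer" in thought.lower():    # crude stopping heuristic
                break
        return llm(trace + "Therefore, the final answer is:")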

Research on Inference Scaling Laws

Research papers on scaling laws as they relate to multi-step inference:

What is Test Time Compute?

Test time compute means using additional computation at the LLM inference stage, rather than in pre-training or fine-tuning. The model weights stay constant during inference, but reasoning can still be improved through advanced prompting strategies and multi-step inference algorithms, as in the sketch below.
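
One common test-time compute strategy is Best-of-N sampling with a self-consistency majority vote: sample several candidate answers from the frozen model and return the most frequent one. The sketch below assumes a hypothetical llm(prompt, temperature) sampler; it illustrates the general idea, not any particular paper's method.

    # Minimal sketch of Best-of-N test-time scaling via self-consistency.
    # Assumes a hypothetical llm(prompt, temperature) -> str sampler.
    from collections import Counter

    def best_of_n(llm, prompt: str, n: int = 8, temperature: float = 0.8) -> str:
        # The model weights are frozen; only runtime compute grows with n.
        answers = [llm(prompt, temperature=temperature) for _ in range(n)]
        winner, _count = Counter(answers).most_common(1)[0]
        return winner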

Research papers on test time compute:

  • Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
  • Maxwell Zeff, November 20, 2024, Nvidia’s CEO defends his moat as AI labs change how they improve their AI models, https://techcrunch.com/2024/11/20/nvidias-ceo-defends-his-moat-as-ai-labs-change-how-they-improve-their-ai-models/
  • mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
  • Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas, 11 Nov 2024, The Surprising Effectiveness of Test-Time Training for Abstract Reasoning, https://arxiv.org/abs/2411.07279
  • Noam Brown, Tuomas Sandholm, 16 Nov 2017 (v3), Safe and Nested Subgame Solving for Imperfect-Information Games, https://arxiv.org/abs/1705.02955 (An early pre-LLM paper on reasoning in multiple steps.)
  • Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
  • Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
  • Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Dec 2024, Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, https://arxiv.org/abs/2412.21187
  • Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
  • Cameron R. Wolfe, Ph.D., Jan 06, 2025, Scaling Laws for LLMs: From GPT-3 to o3, Understanding the current state of LLM scaling and the future of AI research, https://cameronrwolfe.substack.com/p/scaling-laws-for-llms-from-gpt-3
  • Sunil Manghani, Dec 21, 2024, Train Less, Think More: Advancing LLMs Through Test-Time Compute, https://medium.com/electronic-life/train-less-think-more-advancing-llms-through-test-time-compute-a46832e973e9
  • Duncan Anderson, Jan 2025, The wall that wasn’t: Benchmark results for the latest AI models suggest that any “scaling wall” has already been breached and we’re on the path to AGI. https://medium.com/barnacle-labs/the-wall-that-wasnt-62c617f66ad4
  • Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar, 6 Aug 2024, Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, https://arxiv.org/abs/2408.03314 (Original test time compute paper.)
  • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
  • Edward Beeching, Lewis Tunstall, Sasha Rush Dec 16, 2024, Scaling Test Time Compute with Open Source Models, https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Ziyu Guo, Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Peng Gao, Hongsheng Li, Pheng-Ann Heng, 23 Jan 2025, Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, https://arxiv.org/abs/2501.13926 https://github.com/ZiyuGuo99/Image-Generation-CoT
  • G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
  • Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto, 3 Feb 2025 (v2), s1: Simple test-time scaling, https://arxiv.org/abs/2501.19393 https://github.com/simplescaling/s1 (Method of "budget forcing" that allows either shortening or lengthening multi-step reasoning sequences.)
  • Sebastian Raschka, PhD, Feb 05, 2025, Understanding Reasoning LLMs: Methods and Strategies for Building and Refining Reasoning Models https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
  • Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein, 7 Feb 2025, Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach, https://arxiv.org/abs/2502.05171
  • Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, 20 Feb 2025, S*: Test Time Scaling for Code Generation, https://arxiv.org/abs/2502.14382 https://github.com/NovaSky-AI/SkyThought
  • Ben Dickson, February 20, 2025, How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs), https://venturebeat.com/ai/how-test-time-scaling-unlocks-hidden-reasoning-abilities-in-small-language-models-and-allows-them-to-outperform-llms/
  • Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji, 18 Feb 2025, Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights, https://arxiv.org/abs/2502.12521
  • Marthe Ballon, Andres Algaba, Vincent Ginis, 21 Feb 2025, The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer, https://arxiv.org/abs/2502.15631
  • Maxwell Zeff, February 24, 2025, Anthropic launches a new AI model that ‘thinks’ as long as you want, https://techcrunch.com/2025/02/24/anthropic-launches-a-new-ai-model-that-thinks-as-long-as-you-want/
  • Kif Leswing, Feb 26 2025, Nvidia CEO Huang says AI has to do ’100 times more’ computation now than when ChatGPT was released, https://www.cnbc.com/2025/02/26/nvidia-ceo-huang-says-next-generation-ai-will-need-more-compute.html (The thesis that AI reasoning will need 100 times more compute, regardless of whether it is a single-step "long answers" model thinking out loud, or a multi-step test time compute model.)
  • Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei, 25 Feb 2025, Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning, https://arxiv.org/abs/2502.18080 (Trying to generate the "shortest correct response" by examining the lengths needed for CoT.)
  • Juntai Cao, Xiang Zhang, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini, 27 Feb 2025, Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing, https://arxiv.org/abs/2502.20592 (Test time compute applied to the multi-document summarization use case.)
  • Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
  • Supreeth Koundinya, March 10, 2025, Manus is a Wrapper of Anthropic’s Claude, and It’s Okay, https://analyticsindiamag.com/ai-features/manus-is-a-wrapper-of-anthropics-claude-and-its-okay/ (“Manus didn’t just slap an API on a model. They built an autonomous system that can execute deep research, deep thinking, and multi-step tasks in a way that no other AI have.”)
  • Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi, 20 Feb 2025 (v2), Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification https://arxiv.org/abs/2502.01839 (Wrapping a single model with a Best-of-N approach that self-selects the best answer can significantly improve reasoning rates.)
  • Dibyanayan Bandyopadhyay, Soham Bhattacharjee, Asif Ekbal, 13 Mar 2025, Thinking Machines: A Survey of LLM based Reasoning Strategies, https://arxiv.org/abs/2503.10814
  • Michael Nuñez, July 22, 2025, Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber, https://venturebeat.com/ai/anthropic-researchers-discover-the-weird-ai-problem-why-thinking-longer-makes-models-dumber/
  • Sebastian Raschka, Mar 8, 2025, Inference-Time Compute Scaling Methods to Improve Reasoning Models: Part 1: Inference-Time Compute Scaling Methods, https://sebastianraschka.com/blog/2025/state-of-llm-reasoning-and-inference-scaling.html
  • Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng, 22 Jan 2025, Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback, https://arxiv.org/abs/2501.12895 https://github.com/yafuly/TPO
  • Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou, 10 Feb 2025, Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling, https://arxiv.org/abs/2502.06703
  • Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang, 23 Feb 2025 (v2), Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking, https://arxiv.org/abs/2502.13842
  • Aryo Pradipta Gema, Alexander Hägele, Runjin Chen, Andy Arditi, Jacob Goldman-Wetzler, Kit Fraser-Taliente, Henry Sleight, Linda Petrini, Julian Michael, Beatrice Alex, Pasquale Minervini, Yanda Chen, Joe Benton, Ethan Perez, 19 Jul 2025, Inverse Scaling in Test-Time Compute, https://arxiv.org/abs/2507.14417
  • Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, 6 Dec 2024, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
  • Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Yunhua Zhou, Xipeng Qiu, 17 Feb 2025, Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? https://arxiv.org/abs/2502.12215
  • Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu, 25 Feb 2025 (v2), From System 1 to System 2: A Survey of Reasoning Large Language Models, https://arxiv.org/abs/2502.17419
  • Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 13 Mar 2025 (v2), Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567 (Massive and broad survey of all types of reasoning.)
  • Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan, 16 May 2025, Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory, https://arxiv.org/abs/2505.10981
  • Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li, Shilian Chen, Junyi Wang, Jie Zhou, Qin Chen, Min Zhang, Yulan Wu, Liang He, 8 May 2025 (v2), A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law, https://arxiv.org/abs/2505.02665
  • Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen, 6 Jun 2025 (v2), Kinetics: Rethinking Test-Time Scaling Laws, https://arxiv.org/abs/2506.05333
  • Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty, 23 May 2025, First Finish Search: Efficient Test-Time Scaling in Large Language Models, https://arxiv.org/abs/2505.18149 (Running multiple parallel decoding steps but stopping when the fastest and usually shortest one completes.)
  • Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates, 2 Jul 2025, Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs, https://arxiv.org/abs/2507.02076
  • Peter Wildeford, Aug 08, 2025, GPT-5: a small step for intelligence, a giant leap for normal people: GPT-5 focuses on where the money is - everyday users, not AI elites, https://peterwildeford.substack.com/p/gpt-5-a-small-step-for-intelligence
  • J. Pablo Muñoz, Jinjie Yuan, 7 Aug 2025, RTTC: Reward-Guided Collaborative Test-Time Compute, https://arxiv.org/abs/2508.10024
  • Guojun Wu, 19 Jul 2025, It's Not That Simple. An Analysis of Simple Test-Time Scaling, https://arxiv.org/abs/2507.14419
  • Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou, 21 Jul 2025, Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning, https://arxiv.org/abs/2505.16122
  • Mrinal Mathur, Mike Doan, Barak Pearlmutter, Sergey Plis, 17 Jul 2025, Change of Thought: Adaptive Test-Time Computation, https://arxiv.org/abs/2507.13569
  • Weichuang Shao, Iman Yi Liao, Tomas Henrique Bode Maul, and Tissa Chandesa, 21 Jul 2025, An Investigation of Test-time Adaptation for Audio Classification under Background Noise, https://arxiv.org/abs/2507.15523
  • Wooseong Jeong, Jegyeong Cho, Youngho Yoon, Kuk-Jin Yoon, 21 Jul 2025, Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training, https://arxiv.org/abs/2507.07778
  • Simon Ouellette, 17 Jul 2025, Out-of-Distribution Generalization in the ARC-AGI Domain: Comparing Execution-Guided Neural Program Synthesis and Test-Time Fine-Tuning, https://arxiv.org/abs/2507.15877
  • Tong Wu, Chong Xiang, Jiachen T. Wang, Weichen Yu, Chawin Sitawarin, Vikash Sehwag, Prateek Mittal, 21 Jul 2025, Does More Inference-Time Compute Really Help Robustness?, https://arxiv.org/abs/2507.15974
  • Matthias Otth, Jonas Hübotter, Ido Hakimi, Andreas Krause, 24 Jul 2025, Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning, https://arxiv.org/abs/2507.18122
  • Hao Luan, Yi Xian Goh, See-Kiong Ng, Chun Kai Ling, 14 Aug 2025, Projected Coupled Diffusion for Test-Time Constrained Joint Generation, https://arxiv.org/abs/2508.10531
  • Xingwu Chen, Miao Lu, Beining Wu, Difan Zou, 11 Aug 2025, Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression, https://arxiv.org/abs/2508.07571
  • Changze Lv, Jiang Zhou, Siyu Long, Lihao Wang, Jiangtao Feng, Dongyu Xue, Yu Pei, Hao Wang, Zherui Zhang, Yuchen Cai, Zhiqiang Gao, Ziyuan Ma, Jiakai Hu, Chaochen Gao, Jingjing Gong, Yuxuan Song, Shuyi Zhang, Xiaoqing Zheng, Deyi Xiong, Lei Bai, Wanli Ouyang, Ya-Qin Zhang, Wei-Ying Ma, Bowen Zhou, Hao Zhou, 8 Aug 2025, AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model, https://arxiv.org/abs/2507.08920
  • Yufei Zhang, Yicheng Xu, Hongxin Wei, Zhiping Lin, Xiaofeng Zou, Cen Chen, Huiping Zhuang, 27 Jul 2025, Analytic Continual Test-Time Adaptation for Multi-Modality Corruption, https://arxiv.org/abs/2410.22373
  • Wenxuan Bao, Ruxi Deng, Ruizhong Qiu, Tianxin Wei, Hanghang Tong, Jingrui He, 29 Jul 2025, Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning, https://arxiv.org/abs/2507.21494
  • Trae Research Team: Pengfei Gao, Zhao Tian, Xiangxin Meng, Xinchen Wang, Ruida Hu, Yuanan Xiao, Yizhou Liu, Zhao Zhang, Junjie Chen, Cuiyun Gao, Yun Lin, Yingfei Xiong, Chao Peng, Xia Liu, 31 Jul 2025, Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling, https://arxiv.org/abs/2507.23370
  • Mohammad Abdul Hafeez Khan, Yash Jain, Siddhartha Bhattacharyya and Vibhav Vineet, 22 Jul 2025, Test-time Prompt Refinement for Text-to-Image Models, https://arxiv.org/abs/2507.22076
  • Ryo Bertolissi, Jonas Hübotter, Ido Hakimi, Andreas Krause, 30 Jul 2025, Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging, https://arxiv.org/abs/2505.14136
  • Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag, 29 Jul 2025, Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation, https://arxiv.org/abs/2403.19776
  • Jiale Zhou, Wenhan Wang, Shikun Li, Xiaolei Qu, Xin Guo, Yizhong Liu, Wenzhong Tang, Xun Lin, Yefeng Zheng, 1 Aug 2025, TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation, https://arxiv.org/abs/2508.00442
  • Irene Iele, Francesco Di Feola, Valerio Guarrasi, Paolo Soda, 1 Aug 2025, Sample-Aware Test-Time Adaptation for Medical Image-to-Image Translation, https://arxiv.org/abs/2508.00766
  • Zixian Su, Jingwei Guo, Xi Yang, Qiufeng Wang, Kaizhu Huang, 1 Aug 2025, Un-mixing Test-time Adaptation under Heterogeneous Data Streams, https://arxiv.org/abs/2411.15173
  • Fali Wang, Hui Liu, Zhenwei Dai, Jingying Zeng, Zhiwei Zhang, Zongyu Wu, Chen Luo, Zhen Li, Xianfeng Tang, Qi He, Suhang Wang, 26 Jul 2025, AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks, https://arxiv.org/abs/2508.00890
  • Chenxu Yang, Qingyi Si, Mz Dai, Dingyu Yao, Mingyu Zheng, Minghui Chen, Zheng Lin, Weiping Wang, 4 Aug 2025, Test-time Prompt Intervention, https://arxiv.org/abs/2508.02511
  • Xinyu Chen, Haotian Zhai, Can Zhang, Xiupeng Shi, Ruirui Li, 2 Aug 2025, Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models, https://arxiv.org/abs/2508.01225
  • Avishkar Behera, Riya Ann Easow, Venkatesh Parvathala, K. Sri Rama Murty, 3 Aug 2025, Test-Time Training for Speech Enhancement, https://arxiv.org/abs/2508.01847
  • Zhonghao Shi, Xuan Shi, Anfeng Xu, Tiantian Feng, Harshvardhan Srivastava, Shrikanth Narayanan, Maja J. Matarić, 3 Aug 2025, Examining Test-Time Adaptation for Personalized Child Speech Recognition, https://arxiv.org/abs/2409.13095
  • Zhende Song, Shengji Tang, Peng Ye, Jiayuan Fan, Tao Chen, 5 Aug 2025, CTTS: Collective Test-Time Scaling, https://arxiv.org/abs/2508.03333
  • Xinlei Yu, Zhangquan Chen, Yudong Zhang, Shilin Lu, Ruolin Shen, Jiangning Zhang, Xiaobin Hu, Yanwei Fu, Shuicheng Yan, 5 Aug 2025, Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling, https://arxiv.org/abs/2508.03404
  • Yong Du, Yuchen Yan, Fei Tang, Zhengxi Lu, Chang Zong, Weiming Lu, Shengpei Jiang, Yongliang Shen, 7 Aug 2025, Test-Time Reinforcement Learning for GUI Grounding via Region Consistency, https://arxiv.org/abs/2508.05615
  • Younjoon Chung, Hyoungseob Park, Patrick Rim, Xiaoran Zhang, Jihe He, Ziyao Zeng, Safa Cicek, Byung-Woo Hong, James S. Duncan, Alex Wong, 8 Aug 2025, ETA: Energy-based Test-time Adaptation for Depth Completion, https://arxiv.org/abs/2508.05989
  • Orion Weller, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, Benjamin Van Durme, 8 Aug 2025, Rank1: Test-Time Compute for Reranking in Information Retrieval, https://arxiv.org/abs/2502.18418
  • Peter Phan, Dhruv Agarwal, Kavitha Srinivas, Horst Samulowitz, Pavan Kapanipathi, Andrew McCallum, 12 Aug 2025, MiGrATe: Mixed-Policy GRPO for Adaptation at Test-Time, https://arxiv.org/abs/2508.08641
  • Sameer Ambekar, Daniel M. Lang, Julia A. Schnabel, 11 Aug 2025, Hierarchical Adaptive networks with Task vectors for Test-Time Adaptation, https://arxiv.org/abs/2508.09223
  • Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, Zeynep Akata, 13 Aug 2025, Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models, https://arxiv.org/abs/2508.09968
  • Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu, 12 Aug 2025, AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation, https://arxiv.org/abs/2504.07532
  • Jia Liu, ChangYi He, YingQiao Lin, MingMin Yang, FeiYang Shen, ShaoGuo Liu, TingTing Gao, 15 Aug 2025, ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism, https://arxiv.org/abs/2508.11356
  • Dongjae Jeon, Taeheon Kim, Seongwon Cho, Minhyuk Seo, Jonghyun Choi, 18 Aug 2025, TTA-DAME: Test-Time Adaptation with Domain Augmentation and Model Ensemble for Dynamic Driving Conditions, https://arxiv.org/abs/2508.12690
  • Can Jin, Hongwu Peng, Qixin Zhang, Yujin Tang, Dimitris N. Metaxas, Tong Che, 19 Aug 2025, Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning, https://arxiv.org/abs/2504.09772
  • Amirmohsen Sattarifard, Sepehr Lavasani, Ehsan Imani, Kunlin Zhang, Hanlin Xu, Fengyu Sun, Negar Hassanpour, Chao Gao, 19 Aug 2025, GLASS: Test-Time Acceleration for LLMs via Global-Local Neural Importance Aggregation, https://arxiv.org/abs/2508.14302
  • Shansong Wang, Mojtaba Safari, Mingzhe Hu, Qiang Li, Chih-Wei Chang, Richard LJ Qiu, Xiaofeng Yang, 20 Aug 2025, DINOv3 with Test-Time Training for Medical Image Registration, https://arxiv.org/abs/2508.14809
  • Mandeep Rathee, Venktesh V, Sean MacAvaney, Avishek Anand, 21 Aug 2025, Test-time Corpus Feedback: From Retrieval to RAG, https://arxiv.org/abs/2508.15437
  • Youjia Zhang, Youngeun Kim, Young-Geun Choi, Hongyeob Kim, Huiling Liu, Sungeun Hong, 21 Aug 2025, Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment, https://arxiv.org/abs/2508.15568
  • Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li, 17 Jan 2025 (v2), Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities, https://arxiv.org/abs/2501.09686
  • Ivan Rodkin, Daniil Orel, Konstantin Smirnov, Arman Bolatov, Bilal Elbouardi, Besher Hassan, Yuri Kuratov, Aydar Bulatov, Preslav Nakov, Timothy Baldwin, Artem Shelmanov and Mikhail Burtsev, 22 Aug 2025, Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling, https://arxiv.org/abs/2508.16745
  • V. Venktesh, Mandeep Rathee, Avishek Anand, 20 Aug 2025, Trust but Verify! A Survey on Verification Design for Test-time Scaling, https://arxiv.org/abs/2508.16665
  • Hung-Chun Hsu, Yuan-Ching Kuo, Chao-Han Huck Yang, Szu-Wei Fu, Hanrong Ye, Hongxu Yin, Yu-Chiang Frank Wang, Ming-Feng Tsai, Chuan-Ju Wang, 25 Aug 2025, Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations, https://arxiv.org/abs/2508.18132
  • Jeremy Berman, Sep 17, 2025, How I got the highest score on ARC-AGI again swapping Python for English: Using Multi-Agent Collaboration with Evolutionary Test-Time Compute, https://jeremyberman.substack.com/p/how-i-got-the-highest-score-on-arc-agi-again (Generates multiple solutions then prunes them with "evolution" and iterates in multi-step inference.)
  • Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, Ulyana Piterbarg, Jens Tuyls, Edward Grefenstette, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel, 3 Sep 2025, Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents, https://arxiv.org/abs/2509.03581
  • Minjong Yoo, Jinwoo Jang, Sihyung Yoon, Honguk Woo, 4 Sep 2025, World Model Implanting for Test-time Adaptation of Embodied Agents, https://arxiv.org/abs/2509.03956
  • Isidoro Tamassia, Wendelin Böhmer, 4 Sep 2025, Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes, https://arxiv.org/abs/2509.04317
  • Yuchen Jiao, Yuxin Chen, Gen Li, 4 Sep 2025, Connections between reinforcement learning with feedback,test-time scaling, and diffusion guidance: An anthology, https://arxiv.org/abs/2509.04372
  • Jie Chen, Jinhao Jiang, Yingqian Min, Zican Dong, Shijie Wang, Wayne Xin Zhao, Ji-Rong Wen, 5 Sep 2025, Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework, https://arxiv.org/abs/2509.05007
  • Shuaicheng Niu, Guohao Chen, Deyu Chen, Yifan Zhang, Jiaxiang Wu, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Chunyan Miao, Mingkui Tan, 5 Sep 2025, Adapt in the Wild: Test-Time Entropy Minimization with Sharpness and Feature Regularization, https://arxiv.org/abs/2509.04977
  • Shengyin Sun and Yiming Li and Xing Li and Yingzhao Lian and Weizhe Lin and Hui-Ling Zhen and Zhiyuan Yang and Chen Chen and Xianzhi Yu and Mingxuan Yuan and Chen Ma, 30 Aug 2025, Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling, https://arxiv.org/abs/2509.04474
  • Hao Wen, Yifan Su, Feifei Zhang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li, 30 Aug 2025, ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute, https://arxiv.org/abs/2509.04475
  • Hang Wu, Hongkai Chen, Yujun Cai, Chang Liu, Qingwen Ye, Ming-Hsuan Yang, Yiwei Wang, 5 Sep 2025, DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning, https://arxiv.org/abs/2507.00008
  • Byung-Joon Lee, Jin-Seop Lee, Jee-Hyong Lee, 26 Aug 2025, Stabilizing Open-Set Test-Time Adaptation via Primary-Auxiliary Filtering and Knowledge-Integrated Prediction, https://arxiv.org/abs/2508.18751
  • Mingkui Tan, Guohao Chen, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Peilin Zhao, Shuaicheng Niu, 26 Aug 2025, Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting, https://arxiv.org/abs/2403.11491
  • Ramya Keerthy Thatikonda, Wray Buntine, Ehsan Shareghi, 27 Aug 2025, Logical Reasoning with Outcome Reward Models for Test-Time Scaling, https://arxiv.org/abs/2508.19903
  • Lijun Sheng, Jian Liang, Zilei Wang, Ran He, 27 Aug 2025, R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning, https://arxiv.org/abs/2504.11195
  • Hao Mark Chen, Zhiwen Mo, Guanxi Lu, Shuang Liang, Lingxiao Ma, Wayne Luk, Hongxiang Fan, 29 Aug 2025, Democratizing Agentic AI with Fast Test-Time Scaling on the Edge, https://arxiv.org/abs/2509.00195
  • Sachin Goyal, David Lopez-Paz, Kartik Ahuja, 1 Sep 2025, Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling, https://arxiv.org/abs/2509.01649
  • Jintao Cheng, Weibin Li, Jiehao Luo, Xiaoyu Tang, Zhijian He, Jin Wu, Yao Zou, Wei Zhang, 2 Sep 2025, Scale, Don't Fine-tune: Guiding Multimodal LLMs for Efficient Visual Place Recognition at Test-Time, https://arxiv.org/abs/2509.02129
  • Jiefeng Chen, Jie Ren, Xinyun Chen, Chengrun Yang, Ruoxi Sun, Jinsung Yoon, Sercan Ö. Arık, 30 Aug 2025, SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling, https://arxiv.org/abs/2501.19306
  • Hritik Arasu, Faisal R Jahangiri, 3 Sep 2025, StableSleep: Source-Free Test-Time Adaptation for Sleep Staging with Lightweight Safety Rails, https://arxiv.org/abs/2509.02982
  • James Xu Zhao, Bryan Hooi, See-Kiong Ng, 8 Sep 2025, Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet, https://arxiv.org/abs/2509.06861
  • Fin Amin and Jung-Eun Kim, 7 Sep 2025, The Over-Certainty Phenomenon in Modern Test-Time Adaptation Algorithms, https://arxiv.org/abs/2404.16168
  • Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao, 6 Sep 2025, M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models, https://arxiv.org/abs/2504.10449
  • Vignav Ramesh, Morteza Mardani, 6 Sep 2025, Test-Time Scaling of Diffusion Models via Noise Trajectory Search, https://arxiv.org/abs/2506.03164
  • Jenny Y. Huang, Mehul Damani, Yousef El-Kurdi, Ramon Astudillo, Wei Sun, 11 Sep 2025, Latency and Token-Aware Test-Time Compute, https://arxiv.org/abs/2509.09864
  • Taylor Archibald and Tony Martinez, 10 Sep 2025, Improving MLLM Historical Record Extraction with Test-Time Image, https://arxiv.org/abs/2509.09722
  • Xianzhi Li, Ethan Callanan, Abdellah Ghassel, Xiaodan Zhu, 11 Sep 2025, Entropy-Gated Branching for Efficient Test-Time Reasoning, https://arxiv.org/abs/2503.21961
  • Xinyu Luo, Kecheng Chen, Pao-Sheng Vincent Sun, Chris Xing Tian, Arindam Basu, Haoliang Li, 19 Sep 2025, SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks, https://arxiv.org/abs/2504.02298
  • Mukai Li, Linfeng Song, Zhenwen Liang, Jiahao Xu, Shansan Gong, Qi Liu, Haitao Mi, Dong Yu, 16 Sep 2025, EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving, https://arxiv.org/abs/2509.12603
  • Utkarsh Singhal, Ryan Feng, Stella X. Yu, Atul Prakash, 15 Sep 2025, Test-Time Canonicalization by Foundation Models for Robust Perception, https://arxiv.org/abs/2507.10375
  • Nikita Rajaneesh, Thomas Zollo, Richard Zemel, 12 Sep 2025, Test-Time Warmup for Multimodal Large Language Models, https://arxiv.org/abs/2509.10641
  • Muhammad H. Ashiq, Peter Triantafillou, Hung Yun Tseng, Grigoris G. Chrysos, 15 Sep 2025, Inducing Uncertainty for Test-Time Privacy, https://arxiv.org/abs/2509.11625
  • Zhicheng Lin, Xiaolin Wu, Xi Zhang, 17 Sep 2025, Class-invariant Test-Time Augmentation for Domain Generalization, https://arxiv.org/abs/2509.14420
  • Bingxuan Li, Yiwei Wang, Jiuxiang Gu, Kai-Wei Chang, Nanyun Peng, 17 Sep 2025, METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling, https://arxiv.org/abs/2502.17651
  • Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Estelle Aflalo, Shao-Yen Tseng, Vasudev Lal, Phillip Howard, 18 Sep 2025, Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering, https://arxiv.org/abs/2411.12590
  • Xingshuai Huang, Di Wu, Benoit Boulet, 17 Sep 2025, DRDT3: Diffusion-Refined Decision Test-Time Training Model, https://arxiv.org/abs/2501.06718

Research on Scaling Laws

Research on the traditional scaling laws of model size and training data:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: