Aussie AI

Test Time Compute

  • Last Updated 17 November, 2025
  • by David Spuler, Ph.D.

What is Test Time Compute?

Test time compute is the idea of making an LLM smarter by letting it spend more compute time thinking at inference time. It is generally used to improve the reasoning abilities of LLMs by having them perform multiple steps of inference. However, the general idea also applies to single-step reasoning that generates longer outputs, thereby spending more time on inference, which is called "implicit reasoning".
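As a minimal sketch of the multi-step idea, the Python below spends extra inference compute by sampling several answers to the same question and keeping the majority answer (an approach often called self-consistency). The `sample_answer` function is a hypothetical stand-in for a stochastic LLM call with an assumed 60% per-call accuracy; no real model or API is involved, and the numbers are illustrative only.

```python
import random
from collections import Counter
from typing import Callable


def sample_answer(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one stochastic LLM call (no real model).

    Assumes the "model" returns the right answer ("42") 60% of the time
    and otherwise one of three wrong answers.
    """
    if rng.random() < 0.6:
        return "42"
    return rng.choice(["41", "43", "24"])


def majority_vote(
    question: str,
    n_samples: int,
    sampler: Callable[[str, random.Random], str],
    seed: int = 0,
) -> str:
    """Spend more inference compute: draw n_samples answers, return the most common."""
    rng = random.Random(seed)
    answers = [sampler(question, rng) for _ in range(n_samples)]
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 200) -> float:
    """Fraction of seeded trials in which the vote returns the right answer."""
    hits = sum(
        majority_vote("What is 6 x 7?", n_samples, sample_answer, seed=s) == "42"
        for s in range(trials)
    )
    return hits / trials


print(f"1 sample:   {accuracy(1):.2f}")
print(f"25 samples: {accuracy(25):.2f}")
```

In this simulation, accuracy rises with the number of samples because the wrong answers are spread across several options and rarely outvote the right one. The cost is one model call per sample, which is exactly the extra inference compute being traded for accuracy.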

Research on Test Time Compute

Research papers include:

  • Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
  • Maxwell Zeff, November 20, 2024, Nvidia’s CEO defends his moat as AI labs change how they improve their AI models, https://techcrunch.com/2024/11/20/nvidias-ceo-defends-his-moat-as-ai-labs-change-how-they-improve-their-ai-models/
  • mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
  • Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, 6 Dec 2024, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
  • Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas, 11 Nov 2024, The Surprising Effectiveness of Test-Time Training for Abstract Reasoning, https://arxiv.org/abs/2411.07279
  • Noam Brown, Tuomas Sandholm, 16 Nov 2017 (v3), Safe and Nested Subgame Solving for Imperfect-Information Games, https://arxiv.org/abs/1705.02955 (An early pre-LLM paper on reasoning in multiple steps.)
  • Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
  • Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
  • Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Dec 2024, Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, https://arxiv.org/abs/2412.21187
  • Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
  • Cameron R. Wolfe, Ph.D., Jan 06, 2025, Scaling Laws for LLMs: From GPT-3 to o3, Understanding the current state of LLM scaling and the future of AI research, https://cameronrwolfe.substack.com/p/scaling-laws-for-llms-from-gpt-3
  • Sunil Manghani, Dec 21, 2024, Train Less, Think More: Advancing LLMs Through Test-Time Compute, https://medium.com/electronic-life/train-less-think-more-advancing-llms-through-test-time-compute-a46832e973e9
  • Duncan Anderson, Jan 2025, The wall that wasn’t: Benchmark results for the latest AI models suggest that any “scaling wall” has already been breached and we’re on the path to AGI. https://medium.com/barnacle-labs/the-wall-that-wasnt-62c617f66ad4
  • Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar, 6 Aug 2024, Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, https://arxiv.org/abs/2408.03314 (Original test time compute paper.)
  • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
  • Edward Beeching, Lewis Tunstall, Sasha Rush, Dec 16, 2024, Scaling Test Time Compute with Open Source Models, https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Ziyu Guo, Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Peng Gao, Hongsheng Li, Pheng-Ann Heng, 23 Jan 2025, Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, https://arxiv.org/abs/2501.13926 https://github.com/ZiyuGuo99/Image-Generation-CoT
  • G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
  • Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto, 3 Feb 2025 (v2), s1: Simple test-time scaling, https://arxiv.org/abs/2501.19393 https://github.com/simplescaling/s1 (Method of "budget forcing" that allows either shortening or lengthening multi-step reasoning sequences.)
  • Sebastian Raschka, PhD, Feb 05, 2025, Understanding Reasoning LLMs: Methods and Strategies for Building and Refining Reasoning Models https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
  • Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein, 7 Feb 2025, Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach, https://arxiv.org/abs/2502.05171
  • Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, 20 Feb 2025, S*: Test Time Scaling for Code Generation, https://arxiv.org/abs/2502.14382 https://github.com/NovaSky-AI/SkyThought
  • Ben Dickson, February 20, 2025, How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs), https://venturebeat.com/ai/how-test-time-scaling-unlocks-hidden-reasoning-abilities-in-small-language-models-and-allows-them-to-outperform-llms/
  • Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Yunhua Zhou, Xipeng Qiu, 17 Feb 2025, Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? https://arxiv.org/abs/2502.12215
  • Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji, 18 Feb 2025, Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights, https://arxiv.org/abs/2502.12521
  • Marthe Ballon, Andres Algaba, Vincent Ginis, 21 Feb 2025, The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer, https://arxiv.org/abs/2502.15631
  • Maxwell Zeff, February 24, 2025, Anthropic launches a new AI model that ‘thinks’ as long as you want, https://techcrunch.com/2025/02/24/anthropic-launches-a-new-ai-model-that-thinks-as-long-as-you-want/
  • Kif Leswing, Feb 26, 2025, Nvidia CEO Huang says AI has to do ‘100 times more’ computation now than when ChatGPT was released, https://www.cnbc.com/2025/02/26/nvidia-ceo-huang-says-next-generation-ai-will-need-more-compute.html (The thesis that AI reasoning will need 100 times more compute, regardless of whether it is a single-step "long answers" model thinking out loud, or a multi-step test time compute model.)
  • Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu, 25 Feb 2025 (v2), From System 1 to System 2: A Survey of Reasoning Large Language Models, https://arxiv.org/abs/2502.17419
  • Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei, 25 Feb 2025, Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning, https://arxiv.org/abs/2502.18080 (Trying to generate the "shortest correct response" by examining the lengths needed for CoT.)
  • Juntai Cao, Xiang Zhang, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini, 27 Feb 2025, Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing, https://arxiv.org/abs/2502.20592 (Test time compute applied to the multi-document summarization use case.)
  • Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
  • Supreeth Koundinya, March 10, 2025, Manus is a Wrapper of Anthropic’s Claude, and It’s Okay, https://analyticsindiamag.com/ai-features/manus-is-a-wrapper-of-anthropics-claude-and-its-okay/ (“Manus didn’t just slap an API on a model. They built an autonomous system that can execute deep research, deep thinking, and multi-step tasks in a way that no other AI have.”)
  • Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 13 Mar 2025 (v2), Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567 (Massive and broad survey of all types of reasoning.)
  • Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi, 20 Feb 2025 (v2), Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification https://arxiv.org/abs/2502.01839 (Wrapping a single model with a Best-of-N approach that self-selects the best answer can significantly improve reasoning accuracy.)
  • Dibyanayan Bandyopadhyay, Soham Bhattacharjee, Asif Ekbal, 13 Mar 2025, Thinking Machines: A Survey of LLM based Reasoning Strategies, https://arxiv.org/abs/2503.10814
  • Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan, 16 May 2025, Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory, https://arxiv.org/abs/2505.10981
  • Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li, Shilian Chen, Junyi Wang, Jie Zhou, Qin Chen, Min Zhang, Yulan Wu, Liang He, 8 May 2025 (v2), A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law, https://arxiv.org/abs/2505.02665
  • Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen, 6 Jun 2025 (v2), Kinetics: Rethinking Test-Time Scaling Laws, https://arxiv.org/abs/2506.05333
  • Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty, 23 May 2025, First Finish Search: Efficient Test-Time Scaling in Large Language Models, https://arxiv.org/abs/2505.18149 (Runs multiple decoding sequences in parallel and stops when the first one, usually the shortest, completes.)
  • Michael Nuñez, July 22, 2025, Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber, https://venturebeat.com/ai/anthropic-researchers-discover-the-weird-ai-problem-why-thinking-longer-makes-models-dumber/
  • Sebastian Raschka, Mar 8, 2025, Inference-Time Compute Scaling Methods to Improve Reasoning Models: Part 1: Inference-Time Compute Scaling Methods, https://sebastianraschka.com/blog/2025/state-of-llm-reasoning-and-inference-scaling.html
  • Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng, 22 Jan 2025, Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback, https://arxiv.org/abs/2501.12895 https://github.com/yafuly/TPO
  • Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou, 10 Feb 2025, Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling, https://arxiv.org/abs/2502.06703
  • Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang, 23 Feb 2025 (v2), Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking, https://arxiv.org/abs/2502.13842
  • Aryo Pradipta Gema, Alexander Hägele, Runjin Chen, Andy Arditi, Jacob Goldman-Wetzler, Kit Fraser-Taliente, Henry Sleight, Linda Petrini, Julian Michael, Beatrice Alex, Pasquale Minervini, Yanda Chen, Joe Benton, Ethan Perez, 19 Jul 2025, Inverse Scaling in Test-Time Compute, https://arxiv.org/abs/2507.14417
  • Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates, 2 Jul 2025, Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs, https://arxiv.org/abs/2507.02076
  • Peter Wildeford, Aug 08, 2025, GPT-5: a small step for intelligence, a giant leap for normal people: GPT-5 focuses on where the money is - everyday users, not AI elites, https://peterwildeford.substack.com/p/gpt-5-a-small-step-for-intelligence
  • J. Pablo Muñoz and Jinjie Yuan, 7 Aug 2025, RTTC: Reward-Guided Collaborative Test-Time Compute, https://arxiv.org/abs/2508.10024
  • Guojun Wu, 19 Jul 2025, It's Not That Simple. An Analysis of Simple Test-Time Scaling, https://arxiv.org/abs/2507.14419
  • Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou, 21 Jul 2025, Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning, https://arxiv.org/abs/2505.16122
  • Mrinal Mathur, Mike Doan, Barak Pearlmutter, Sergey Plis, 17 Jul 2025, Change of Thought: Adaptive Test-Time Computation, https://arxiv.org/abs/2507.13569
  • Weichuang Shao, Iman Yi Liao, Tomas Henrique Bode Maul, and Tissa Chandesa, 21 Jul 2025, An Investigation of Test-time Adaptation for Audio Classification under Background Noise, https://arxiv.org/abs/2507.15523
  • Wooseong Jeong, Jegyeong Cho, Youngho Yoon, Kuk-Jin Yoon, 21 Jul 2025, Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training, https://arxiv.org/abs/2507.07778
  • Simon Ouellette, 17 Jul 2025, Out-of-Distribution Generalization in the ARC-AGI Domain: Comparing Execution-Guided Neural Program Synthesis and Test-Time Fine-Tuning, https://arxiv.org/abs/2507.15877
  • Tong Wu, Chong Xiang, Jiachen T. Wang, Weichen Yu, Chawin Sitawarin, Vikash Sehwag, Prateek Mittal, 21 Jul 2025, Does More Inference-Time Compute Really Help Robustness?, https://arxiv.org/abs/2507.15974
  • Matthias Otth, Jonas Hübotter, Ido Hakimi, Andreas Krause, 24 Jul 2025, Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning, https://arxiv.org/abs/2507.18122
  • Hao Luan, Yi Xian Goh, See-Kiong Ng, Chun Kai Ling, 14 Aug 2025, Projected Coupled Diffusion for Test-Time Constrained Joint Generation, https://arxiv.org/abs/2508.10531
  • Xingwu Chen, Miao Lu, Beining Wu, Difan Zou, 11 Aug 2025, Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression, https://arxiv.org/abs/2508.07571
  • Changze Lv, Jiang Zhou, Siyu Long, Lihao Wang, Jiangtao Feng, Dongyu Xue, Yu Pei, Hao Wang, Zherui Zhang, Yuchen Cai, Zhiqiang Gao, Ziyuan Ma, Jiakai Hu, Chaochen Gao, Jingjing Gong, Yuxuan Song, Shuyi Zhang, Xiaoqing Zheng, Deyi Xiong, Lei Bai, Wanli Ouyang, Ya-Qin Zhang, Wei-Ying Ma, Bowen Zhou, Hao Zhou, 8 Aug 2025, AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model, https://arxiv.org/abs/2507.08920
  • Yufei Zhang, Yicheng Xu, Hongxin Wei, Zhiping Lin, Xiaofeng Zou, Cen Chen, Huiping Zhuang, 27 Jul 2025, Analytic Continual Test-Time Adaptation for Multi-Modality Corruption, https://arxiv.org/abs/2410.22373
  • Wenxuan Bao, Ruxi Deng, Ruizhong Qiu, Tianxin Wei, Hanghang Tong, Jingrui He, 29 Jul 2025, Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning, https://arxiv.org/abs/2507.21494
  • Trae Research Team: Pengfei Gao, Zhao Tian, Xiangxin Meng, Xinchen Wang, Ruida Hu, Yuanan Xiao, Yizhou Liu, Zhao Zhang, Junjie Chen, Cuiyun Gao, Yun Lin, Yingfei Xiong, Chao Peng, Xia Liu, 31 Jul 2025, Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling, https://arxiv.org/abs/2507.23370
  • Mohammad Abdul Hafeez Khan, Yash Jain, Siddhartha Bhattacharyya and Vibhav Vineet, 22 Jul 2025, Test-time Prompt Refinement for Text-to-Image Models, https://arxiv.org/abs/2507.22076
  • Ryo Bertolissi, Jonas Hübotter, Ido Hakimi, Andreas Krause, 30 Jul 2025, Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging, https://arxiv.org/abs/2505.14136
  • Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag, 29 Jul 2025, Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation, https://arxiv.org/abs/2403.19776
  • Jiale Zhou, Wenhan Wang, Shikun Li, Xiaolei Qu, Xin Guo, Yizhong Liu, Wenzhong Tang, Xun Lin, Yefeng Zheng, 1 Aug 2025, TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation, https://arxiv.org/abs/2508.00442
  • Irene Iele, Francesco Di Feola, Valerio Guarrasi, Paolo Soda, 1 Aug 2025, Sample-Aware Test-Time Adaptation for Medical Image-to-Image Translation, https://arxiv.org/abs/2508.00766
  • Zixian Su, Jingwei Guo, Xi Yang, Qiufeng Wang, Kaizhu Huang, 1 Aug 2025, Un-mixing Test-time Adaptation under Heterogeneous Data Streams, https://arxiv.org/abs/2411.15173
  • Fali Wang, Hui Liu, Zhenwei Dai, Jingying Zeng, Zhiwei Zhang, Zongyu Wu, Chen Luo, Zhen Li, Xianfeng Tang, Qi He, Suhang Wang, 26 Jul 2025, AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks, https://arxiv.org/abs/2508.00890
  • Chenxu Yang, Qingyi Si, Mz Dai, Dingyu Yao, Mingyu Zheng, Minghui Chen, Zheng Lin, Weiping Wang, 4 Aug 2025, Test-time Prompt Intervention, https://arxiv.org/abs/2508.02511
  • Xinyu Chen, Haotian Zhai, Can Zhang, Xiupeng Shi, Ruirui Li, 2 Aug 2025, Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models, https://arxiv.org/abs/2508.01225
  • Avishkar Behera, Riya Ann Easow, Venkatesh Parvathala, K. Sri Rama Murty, 3 Aug 2025, Test-Time Training for Speech Enhancement, https://arxiv.org/abs/2508.01847
  • Zhonghao Shi, Xuan Shi, Anfeng Xu, Tiantian Feng, Harshvardhan Srivastava, Shrikanth Narayanan, Maja J. Matarić, 3 Aug 2025, Examining Test-Time Adaptation for Personalized Child Speech Recognition, https://arxiv.org/abs/2409.13095
  • Zhende Song, Shengji Tang, Peng Ye, Jiayuan Fan, Tao Chen, 5 Aug 2025, CTTS: Collective Test-Time Scaling, https://arxiv.org/abs/2508.03333
  • Xinlei Yu, Zhangquan Chen, Yudong Zhang, Shilin Lu, Ruolin Shen, Jiangning Zhang, Xiaobin Hu, Yanwei Fu, Shuicheng Yan, 5 Aug 2025, Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling, https://arxiv.org/abs/2508.03404
  • Yong Du, Yuchen Yan, Fei Tang, Zhengxi Lu, Chang Zong, Weiming Lu, Shengpei Jiang, Yongliang Shen, 7 Aug 2025, Test-Time Reinforcement Learning for GUI Grounding via Region Consistency, https://arxiv.org/abs/2508.05615
  • Younjoon Chung, Hyoungseob Park, Patrick Rim, Xiaoran Zhang, Jihe He, Ziyao Zeng, Safa Cicek, Byung-Woo Hong, James S. Duncan, Alex Wong, 8 Aug 2025, ETA: Energy-based Test-time Adaptation for Depth Completion, https://arxiv.org/abs/2508.05989
  • Orion Weller, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, Benjamin Van Durme, 8 Aug 2025, Rank1: Test-Time Compute for Reranking in Information Retrieval, https://arxiv.org/abs/2502.18418
  • Peter Phan, Dhruv Agarwal, Kavitha Srinivas, Horst Samulowitz, Pavan Kapanipathi, Andrew McCallum, 12 Aug 2025, MiGrATe: Mixed-Policy GRPO for Adaptation at Test-Time, https://arxiv.org/abs/2508.08641
  • Sameer Ambekar, Daniel M. Lang, Julia A. Schnabel, 11 Aug 2025, Hierarchical Adaptive networks with Task vectors for Test-Time Adaptation, https://arxiv.org/abs/2508.09223
  • Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, Zeynep Akata, 13 Aug 2025, Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models, https://arxiv.org/abs/2508.09968
  • Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu, 12 Aug 2025, AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation, https://arxiv.org/abs/2504.07532
  • Jia Liu, ChangYi He, YingQiao Lin, MingMin Yang, FeiYang Shen, ShaoGuo Liu, TingTing Gao, 15 Aug 2025, ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism, https://arxiv.org/abs/2508.11356
  • Dongjae Jeon, Taeheon Kim, Seongwon Cho, Minhyuk Seo, Jonghyun Choi, 18 Aug 2025, TTA-DAME: Test-Time Adaptation with Domain Augmentation and Model Ensemble for Dynamic Driving Conditions, https://arxiv.org/abs/2508.12690
  • Can Jin, Hongwu Peng, Qixin Zhang, Yujin Tang, Dimitris N. Metaxas, Tong Che, 19 Aug 2025, Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning, https://arxiv.org/abs/2504.09772
  • Amirmohsen Sattarifard, Sepehr Lavasani, Ehsan Imani, Kunlin Zhang, Hanlin Xu, Fengyu Sun, Negar Hassanpour, Chao Gao, 19 Aug 2025, GLASS: Test-Time Acceleration for LLMs via Global-Local Neural Importance Aggregation, https://arxiv.org/abs/2508.14302
  • Shansong Wang, Mojtaba Safari, Mingzhe Hu, Qiang Li, Chih-Wei Chang, Richard LJ Qiu, Xiaofeng Yang, 20 Aug 2025, DINOv3 with Test-Time Training for Medical Image Registration, https://arxiv.org/abs/2508.14809
  • Mandeep Rathee, Venktesh V, Sean MacAvaney, Avishek Anand, 21 Aug 2025, Test-time Corpus Feedback: From Retrieval to RAG, https://arxiv.org/abs/2508.15437
  • Youjia Zhang, Youngeun Kim, Young-Geun Choi, Hongyeob Kim, Huiling Liu, Sungeun Hong, 21 Aug 2025, Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment, https://arxiv.org/abs/2508.15568
  • Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li, 17 Jan 2025 (v2), Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities, https://arxiv.org/abs/2501.09686
  • Ivan Rodkin, Daniil Orel, Konstantin Smirnov, Arman Bolatov, Bilal Elbouardi, Besher Hassan, Yuri Kuratov, Aydar Bulatov, Preslav Nakov, Timothy Baldwin, Artem Shelmanov and Mikhail Burtsev, 22 Aug 2025, Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling, https://arxiv.org/abs/2508.16745
  • V Venktesh, Mandeep Rathee and Avishek Anand, 20 Aug 2025, Trust but Verify! A Survey on Verification Design for Test-time Scaling, https://arxiv.org/abs/2508.16665
  • Hung-Chun Hsu, Yuan-Ching Kuo, Chao-Han Huck Yang, Szu-Wei Fu, Hanrong Ye, Hongxu Yin, Yu-Chiang Frank Wang, Ming-Feng Tsai, Chuan-Ju Wang, 25 Aug 2025, Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations, https://arxiv.org/abs/2508.18132
  • Jeremy Berman, Sep 17, 2025, How I got the highest score on ARC-AGI again swapping Python for English: Using Multi-Agent Collaboration with Evolutionary Test-Time Compute, https://jeremyberman.substack.com/p/how-i-got-the-highest-score-on-arc-agi-again (Generates multiple solutions then prunes them with "evolution" and iterates in multi-step inference.)
  • Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, Ulyana Piterbarg, Jens Tuyls, Edward Grefenstette, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel, 3 Sep 2025, Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents, https://arxiv.org/abs/2509.03581
  • Minjong Yoo, Jinwoo Jang, Sihyung Yoon, Honguk Woo, 4 Sep 2025, World Model Implanting for Test-time Adaptation of Embodied Agents, https://arxiv.org/abs/2509.03956
  • Isidoro Tamassia and Wendelin Böhmer, 4 Sep 2025, Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes, https://arxiv.org/abs/2509.04317
  • Yuchen Jiao, Yuxin Chen, Gen Li, 4 Sep 2025, Connections between reinforcement learning with feedback, test-time scaling, and diffusion guidance: An anthology, https://arxiv.org/abs/2509.04372
  • Jie Chen, Jinhao Jiang, Yingqian Min, Zican Dong, Shijie Wang, Wayne Xin Zhao, Ji-Rong Wen, 5 Sep 2025, Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework, https://arxiv.org/abs/2509.05007
  • Shuaicheng Niu, Guohao Chen, Deyu Chen, Yifan Zhang, Jiaxiang Wu, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Chunyan Miao, Mingkui Tan, 5 Sep 2025, Adapt in the Wild: Test-Time Entropy Minimization with Sharpness and Feature Regularization, https://arxiv.org/abs/2509.04977
  • Shengyin Sun, Yiming Li, Xing Li, Yingzhao Lian, Weizhe Lin, Hui-Ling Zhen, Zhiyuan Yang, Chen Chen, Xianzhi Yu, Mingxuan Yuan, Chen Ma, 30 Aug 2025, Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling, https://arxiv.org/abs/2509.04474
  • Hao Wen, Yifan Su, Feifei Zhang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li, 30 Aug 2025, ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute, https://arxiv.org/abs/2509.04475
  • Hang Wu, Hongkai Chen, Yujun Cai, Chang Liu, Qingwen Ye, Ming-Hsuan Yang, Yiwei Wang, 5 Sep 2025, DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning, https://arxiv.org/abs/2507.00008
  • Byung-Joon Lee, Jin-Seop Lee, Jee-Hyong Lee, 26 Aug 2025, Stabilizing Open-Set Test-Time Adaptation via Primary-Auxiliary Filtering and Knowledge-Integrated Prediction, https://arxiv.org/abs/2508.18751
  • Mingkui Tan, Guohao Chen, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Peilin Zhao, Shuaicheng Niu, 26 Aug 2025, Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting, https://arxiv.org/abs/2403.11491
  • Ramya Keerthy Thatikonda, Wray Buntine, Ehsan Shareghi, 27 Aug 2025, Logical Reasoning with Outcome Reward Models for Test-Time Scaling, https://arxiv.org/abs/2508.19903
  • Lijun Sheng, Jian Liang, Zilei Wang, Ran He, 27 Aug 2025, R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning, https://arxiv.org/abs/2504.11195
  • Hao Mark Chen, Zhiwen Mo, Guanxi Lu, Shuang Liang, Lingxiao Ma, Wayne Luk, Hongxiang Fan, 29 Aug 2025, Democratizing Agentic AI with Fast Test-Time Scaling on the Edge, https://arxiv.org/abs/2509.00195
  • Sachin Goyal, David Lopez-Paz, Kartik Ahuja, 1 Sep 2025, Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling, https://arxiv.org/abs/2509.01649
  • Jintao Cheng, Weibin Li, Jiehao Luo, Xiaoyu Tang, Zhijian He, Jin Wu, Yao Zou, Wei Zhang, 2 Sep 2025, Scale, Don't Fine-tune: Guiding Multimodal LLMs for Efficient Visual Place Recognition at Test-Time, https://arxiv.org/abs/2509.02129
  • Jiefeng Chen, Jie Ren, Xinyun Chen, Chengrun Yang, Ruoxi Sun, Jinsung Yoon, Sercan Ö Arık, 30 Aug 2025, SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling, https://arxiv.org/abs/2501.19306
  • Hritik Arasu, Faisal R Jahangiri, 3 Sep 2025, StableSleep: Source-Free Test-Time Adaptation for Sleep Staging with Lightweight Safety Rails, https://arxiv.org/abs/2509.02982
  • James Xu Zhao, Bryan Hooi, See-Kiong Ng, 8 Sep 2025, Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet, https://arxiv.org/abs/2509.06861
  • Fin Amin and Jung-Eun Kim, 7 Sep 2025, The Over-Certainty Phenomenon in Modern Test-Time Adaptation Algorithms, https://arxiv.org/abs/2404.16168
  • Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao, 6 Sep 2025, M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models, https://arxiv.org/abs/2504.10449
  • Vignav Ramesh, Morteza Mardani, 6 Sep 2025, Test-Time Scaling of Diffusion Models via Noise Trajectory Search, https://arxiv.org/abs/2506.03164
  • Jenny Y. Huang, Mehul Damani, Yousef El-Kurdi, Ramon Astudillo, Wei Sun, 11 Sep 2025, Latency and Token-Aware Test-Time Compute, https://arxiv.org/abs/2509.09864
  • Taylor Archibald and Tony Martinez, 10 Sep 2025, Improving MLLM Historical Record Extraction with Test-Time Image, https://arxiv.org/abs/2509.09722
  • Xianzhi Li, Ethan Callanan, Abdellah Ghassel, Xiaodan Zhu, 11 Sep 2025, Entropy-Gated Branching for Efficient Test-Time Reasoning, https://arxiv.org/abs/2503.21961
  • Xinyu Luo, Kecheng Chen, Pao-Sheng Vincent Sun, Chris Xing Tian, Arindam Basu, Haoliang Li, 19 Sep 2025, SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks, https://arxiv.org/abs/2504.02298
  • Mukai Li, Linfeng Song, Zhenwen Liang, Jiahao Xu, Shansan Gong, Qi Liu, Haitao Mi, Dong Yu, 16 Sep 2025, EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving, https://arxiv.org/abs/2509.12603
  • Utkarsh Singhal, Ryan Feng, Stella X. Yu, Atul Prakash, 15 Sep 2025, Test-Time Canonicalization by Foundation Models for Robust Perception, https://arxiv.org/abs/2507.10375
  • Nikita Rajaneesh, Thomas Zollo, Richard Zemel, 12 Sep 2025, Test-Time Warmup for Multimodal Large Language Models, https://arxiv.org/abs/2509.10641
  • Muhammad H. Ashiq, Peter Triantafillou, Hung Yun Tseng, Grigoris G. Chrysos, 15 Sep 2025, Inducing Uncertainty for Test-Time Privacy, https://arxiv.org/abs/2509.11625
  • Zhicheng Lin, Xiaolin Wu, Xi Zhang, 17 Sep 2025, Class-invariant Test-Time Augmentation for Domain Generalization, https://arxiv.org/abs/2509.14420
  • Bingxuan Li, Yiwei Wang, Jiuxiang Gu, Kai-Wei Chang, Nanyun Peng, 17 Sep 2025, METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling, https://arxiv.org/abs/2502.17651
  • Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Estelle Aflalo, Shao-Yen Tseng, Vasudev Lal, Phillip Howard, 18 Sep 2025, Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering, https://arxiv.org/abs/2411.12590
  • Xingshuai Huang, Di Wu, Benoit Boulet, 17 Sep 2025, DRDT3: Diffusion-Refined Decision Test-Time Training Model, https://arxiv.org/abs/2501.06718
  • Yifei Zuo, Yutong Yin, Zhichen Zeng, Ang Li, Banghua Zhu, Zhaoran Wang, 1 Oct 2025, Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression, https://arxiv.org/abs/2510.01450
  • Litu Rout, Andreas Lugmayr, Yasamin Jafarian, Srivatsan Varadharajan, Constantine Caramanis, Sanjay Shakkottai, Ira Kemelmacher-Shlizerman, 2 Oct 2025, Test-Time Anchoring for Discrete Diffusion Posterior Sampling, https://arxiv.org/abs/2510.02291
  • Yongchao Chen, Jiefeng Chen, Rui Meng, Ji Yin, Na Li, Chuchu Fan, Chi Wang, Tomas Pfister, Jinsung Yoon, 30 Sep 2025, TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture, https://arxiv.org/abs/2510.01279
  • Dong Bok Lee, Seanie Lee, Sangwoo Park, Minki Kang, Jinheon Baek, Dongki Kim, Dominik Wagner, Jiongdao Jin, Heejun Lee, Tobias Bocklet, Jinyu Wang, Jingjing Fu, Sung Ju Hwang, Jiang Bian, Lei Song, 2 Oct 2025, Rethinking Reward Models for Multi-Domain Test-Time Scaling, https://arxiv.org/abs/2510.00492
  • Suli Wang, Yangshen Deng, Zhenghua Bao, Xinyu Zhan, Yiqun Duan, 1 Oct 2025, NeuroTTT: Bridging Pretraining-Downstream Task Misalignment in EEG Foundation Models via Test-Time Training, https://arxiv.org/abs/2509.26301
  • Junsoo Oh, Wei Huang, Taiji Suzuki, 14 Oct 2025, Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning, https://arxiv.org/abs/2510.12026
  • Tomas Ruiz, Siyao Peng, Barbara Plank, Carsten Schwemmer, 14 Oct 2025, BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet), https://arxiv.org/abs/2510.12516
  • Dadi Guo, Tianyi Zhou, Dongrui Liu, Chen Qian, Qihan Ren, Shuai Shao, Zhiyuan Fan, Yi R. Fung, Kun Wang, Linfeng Zhang, Jing Shao, 1 Oct 2025, Towards Self-Evolving Benchmarks: Synthesizing Agent Trajectories via Test-Time Exploration under Validate-by-Reproduce Paradigm, https://arxiv.org/abs/2510.00415
  • Yoonju Sim, Hyeonah Kim, Changhyun Kwon, 1 Oct 2025, Test-Time Search in Neural Graph Coarsening Procedures for the Capacitated Vehicle Routing Problem, https://arxiv.org/abs/2510.00958
  • Jiahang Cao, Yize Huang, Hanzhong Guo, Rui Zhang, Mu Nan, Weijian Mai, Jiaxu Wang, Hao Cheng, Jingkai Sun, Gang Han, Wen Zhao, Qiang Zhang, Yijie Guo, Qihao Zheng, Chunfeng Song, Xiao Li, Ping Luo, Andrew F. Luo, 1 Oct 2025, Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition, https://arxiv.org/abs/2510.01068
  • Brandon Ong, Tej Deep Pala, Vernon Toh, William Chandra Tjhi, Soujanya Poria, 1 Oct 2025, Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned, https://arxiv.org/abs/2509.23250
  • Jonathan Geuter, Youssef Mroueh, David Alvarez-Melis, 30 Sep 2025, Guided Speculative Inference for Efficient Test-Time Alignment of LLMs, https://arxiv.org/abs/2506.04118
  • Tong Nie, Yuewen Mei, Yihong Tang, Junlin He, Jie Sun, Haotian Shi, Wei Ma, Jian Sun, 24 Sep 2025, Steerable Adversarial Scenario Generation through Test-Time Preference Alignment, https://arxiv.org/abs/2509.20102
  • Felipe Oviedo, Fiodar Kazhamiaka, Esha Choukse, Allen Kim, Amy Luers, Melanie Nakagawa, Ricardo Bianchini, Juan M. Lavista Ferres, 24 Sep 2025, Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute, https://arxiv.org/abs/2509.20241
  • Youpeng Zhao, Jinpeng LV, Di Wu, Jun Wang, Christopher Gooley, 23 Sep 2025, Are We Scaling the Right Thing? A System Perspective on Test-Time Scaling, https://arxiv.org/abs/2509.19645
  • Prateek Verma, Mert Pilanci, 24 Sep 2025, Thinking While Listening: Simple Test Time Scaling For Audio Classification, https://arxiv.org/abs/2509.19676
  • Laura Mismetti, Marvin Alberts, Andreas Krause, Mara Graziani, 27 Oct 2025, Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra, https://arxiv.org/abs/2510.23746
  • Nevan Wichers, Aram Ebtekar, Ariana Azarbal, Victor Gillioz, Christine Ye, Emil Ryd, Neil Rathi, Henry Sleight, Alex Mallen, Fabien Roger, Samuel Marks, 27 Oct 2025, Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment, https://arxiv.org/abs/2510.05024
  • Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou, 28 Oct 2025, Provable Scaling Laws for the Test-Time Compute of Large Language Models, https://arxiv.org/abs/2411.19477
  • Litu Ou, Kuan Li, Huifeng Yin, Liwen Zhang, Zhongwang Zhang, Xixi Wu, Rui Ye, Zile Qiao, Pengjun Xie, Jingren Zhou, Yong Jiang, 28 Oct 2025, BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents, https://arxiv.org/abs/2510.23458
  • Yunjiang Jiang, Ayush Agarwal, Yang Liu, Bi Xue, 27 Oct 2025, LIME: Link-based user-item Interaction Modeling with decoupled xor attention for Efficient test time scaling, https://arxiv.org/abs/2510.18239
  • Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Yifu Lu, Mengdi Wang, Dinesh Manocha, Furong Huang, Mohammad Ghavamzadeh, Amrit Singh Bedi, 23 Oct 2025, Does Thinking More always Help? Mirage of Test-Time Scaling in Reasoning Models, https://arxiv.org/abs/2506.04210
  • Paula Cordero-Encinar, Andrew B. Duncan, 23 Oct 2025, Certified Self-Consistency: Statistical Guarantees and Test-Time Training for Reliable Reasoning in LLMs, https://arxiv.org/abs/2510.17472
  • Avrim Blum, Daniel Hsu, Cyrus Rashtchian, Donya Saless, 18 Oct 2025, Prior Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods, https://arxiv.org/abs/2510.16609
  • Dong Li, Xujiang Zhao, Linlin Yu, Yanchi Liu, Wei Cheng, Zhengzhang Chen, Zhong Chen, Feng Chen, Chen Zhao, Haifeng Chen, 19 Oct 2025, SolverLLM: Leveraging Test-Time Scaling for Optimization Problem via LLM-Guided Search, https://arxiv.org/abs/2510.16916
  • Adam Stecklov, Noah El Rimawi-Fine, Mathieu Blanchette, 20 Oct 2025, Inference-Time Compute Scaling For Flow Matching, https://arxiv.org/abs/2510.17786
  • Delaram Pirhayati, Arlei Silva, 19 Oct 2025, Cross-Domain Graph Anomaly Detection via Test-Time Training with Homophily-Guided Self-Supervision, https://arxiv.org/abs/2502.14293
  • Xinglin Wang, Yiwei Li, Shaoxiong Feng, Peiwen Yuan, Yueqi Zhang, Jiayi Shi, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li, 20 Oct 2025, Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling, https://arxiv.org/abs/2506.15707
  • Kang-il Lee, Jahyun Koo, Seunghyun Yoon, Minbeom Kim, Hyukhun Koh, Dongryeol Lee, Kyomin Jung, 22 Sep 2025, Program Synthesis via Test-Time Transduction, https://arxiv.org/abs/2509.17393
  • Zongqian Wu, Baoduo Xu, Tianyu Li, Zhu Sun, Xiaofeng Zhu, Lei Feng, 22 Sep 2025, Mitigating Strategy-Selection Bias in Reasoning for More Effective Test-Time Scaling, https://arxiv.org/abs/2509.17905
  • Chung-En (Johnny) Yu, Brian Jalaian, Nathaniel D. Bastian, 19 Sep 2025, Agentic Reasoning for Robust Vision Systems via Increased Test-Time Compute, https://arxiv.org/abs/2509.16343
  • Sanjay Basu, Sadiq Y. Patel, Parth Sheth, Bhairavi Muralidharan, Namrata Elamaran, Aakriti Kinra, Rajaie Batniji, 19 Sep 2025, Test-Time Learning and Inference-Time Deliberation for Efficiency-First Offline Reinforcement Learning in Care Coordination and Population Health Management, https://arxiv.org/abs/2509.16291
  • Morgan Thomas, Albert Bou, Gianni De Fabritiis, 22 Sep 2025, Test-Time Training Scaling Laws for Chemical Exploration in Drug Design, https://arxiv.org/abs/2501.19153
  • Yuwei Niu, Shuo He, Qi Wei, Zongyu Wu, Feng Liu, Lei Feng, 22 Sep 2025, Test-Time Multimodal Backdoor Detection by Contrastive Prompting, https://arxiv.org/abs/2405.15269
  • Keyu Wang, Tian Lyu, Guinan Su, Jonas Geiping, Lu Yin, Marco Canini, Shiwei Liu, 25 Oct 2025, When Fewer Layers Break More Chains: Layer Pruning Harms Test-Time Scaling in LLMs, https://arxiv.org/abs/2510.22228
  • Yan Jiang, Ruihong Qiu, Zi Huang, 25 Oct 2025, Does Homophily Help in Robust Test-time Node Classification?, https://arxiv.org/abs/2510.22289
  • Wenxuan Bao, Ruxi Deng, Jingrui He, 25 Oct 2025, Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions, https://arxiv.org/abs/2510.22127
  • Tianyi Ma, Tengyao Wang, Richard J. Samworth, 27 Oct 2025, Provable test-time adaptivity and distributional robustness of in-context learning, https://arxiv.org/abs/2510.23254
  • Sarthak Kumar Maharana, Saksham Singh Kushwaha, Baoming Zhang, Adrian Rodriguez, Songtao Wei, Yapeng Tian, Yunhui Guo, 24 Oct 2025, AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time, https://arxiv.org/abs/2506.00358
  • Yufei He, Juncheng Liu, Yue Liu, Yibo Li, Tri Cao, Zhiyuan Hu, Xinxing Xu, Bryan Hooi, 15 Oct 2025, EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems, https://arxiv.org/abs/2510.13220
  • Yang Yang, Severi Rissanen, Paul E. Chang, Nasrulloh Loka, Daolang Huang, Arno Solin, Markus Heinonen, Luigi Acerbi, 15 Oct 2025, PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference, https://arxiv.org/abs/2510.13763
  • Yiming Wang, Pei Zhang, Siyuan Huang, Baosong Yang, Zhuosheng Zhang, Fei Huang, Rui Wang, 15 Oct 2025, Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding, https://arxiv.org/abs/2503.01422
  • Ziyun Liang, Xiaoqing Guo, Wentian Xu, Yasin Ibrahim, Natalie Voets, Pieter M Pretorius, J. Alison Noble, Konstantinos Kamnitsas, 15 Oct 2025, IterMask3D: Unsupervised Anomaly Detection and Segmentation with Test-Time Iterative Mask Refinement in 3D Brain MR, https://arxiv.org/abs/2504.04911
  • Zongbo Han, Jialong Yang, Guangyu Wang, Junfan Li, Qianli Xu, Mike Zheng Shou, Changqing Zhang, 26 Sep 2025, DOTA: Distributional Test-Time Adaptation of Vision-Language Models, https://arxiv.org/abs/2409.19375
  • Mert Kayaalp, Caner Turkmen, Oleksandr Shchur, Pedro Mercado, Abdul Fatir Ansari, Michael Bohlke-Schneider, Bernie Wang, 7 Oct 2025, Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting, https://arxiv.org/abs/2510.06419
  • Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski, 8 Oct 2025, Test-Time Graph Search for Goal-Conditioned Reinforcement Learning, https://arxiv.org/abs/2510.07257
  • Wen Ye, Zhaocheng Liu, Yuwei Gui, Tingyu Yuan, Yunyue Su, Bowen Fang, Chaoyang Zhao, Qiang Liu, Liang Wang, 8 Oct 2025, GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation, https://arxiv.org/abs/2510.07217
  • Xiaogeng Liu, Chaowei Xiao, 8 Oct 2025, AutoDAN-Reasoning: Enhancing Strategies Exploration based Jailbreak Attacks with Test-Time Scaling, https://arxiv.org/abs/2510.05379
  • Yuheng Wu, Azalia Mirhoseini, Thierry Tambe, 2 Oct 2025, On the Role of Temperature Sampling in Test-Time Scaling, https://arxiv.org/abs/2510.02611
  • Dong Lao, Yuxiang Zhang, Haniyeh Ehsani Oskouie, Yangchao Wu, Alex Wong, Stefano Soatto, 3 Oct 2025, Test-Time Defense Against Adversarial Attacks via Stochastic Resonance of Latent Ensembles, https://arxiv.org/abs/2510.03224
  • Chenwei Tang, Jingyu Xing, Xinyu Liu, Wei Ju, Jiancheng Lv, Deng Xiong, Ziyue Qiao, 20 Oct 2025, Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning, https://arxiv.org/abs/2510.17923
  • Yang Xiao, Jiashuo Wang, Ruifeng Yuan, Chunpu Xu, Kaishuai Xu, Wenjie Li, Pengfei Liu, 21 Oct 2025, LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling, https://arxiv.org/abs/2505.19187
  • Duke Nguyen, Aditya Joshi, Flora Salim, 21 Oct 2025, Harnessing Test-time Adaptation for NLU tasks Involving Dialects of English, https://arxiv.org/abs/2503.12858
  • Theo Uscidda, Matthew Trager, Michael Kleinman, Aditya Chattopadhyay, Wei Xia, Stefano Soatto, 16 Sep 2025, LATTS: Locally Adaptive Test-Time Scaling, https://arxiv.org/abs/2509.20368
  • Junpei Komiyama, Daisuke Oba, Masafumi Oyamada, 25 Sep 2025, Best-of-∞ – Asymptotic Performance of Test-Time Compute, https://arxiv.org/abs/2509.21091
  • Rajat Modi, Yogesh Singh Rawat, 25 Sep 2025, Asynchronous Perception Machine For Efficient Test-Time-Training, https://arxiv.org/abs/2410.20535
  • Haotian Zhang, Liu Liu, Baosheng Yu, Jiayan Qiu, Likang Xiao, Yanwei Ren, Quan Chen, Xianglong Liu, 29 Sep 2025, ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling, https://arxiv.org/abs/2509.24460
  • Zhicheng Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Yiwei Wang, Xiaodan Liang, Jing Tang, 27 Sep 2025, Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers, https://arxiv.org/abs/2509.23152
  • Guohao Chen, Shuaicheng Niu, Deyu Chen, Jiahao Yang, Zitian Zhang, Mingkui Tan, Pengcheng Wu, Zhiqi Shen, 27 Sep 2025, ZeroSiam: An Efficient Siamese for Test-Time Entropy Optimization without Collapse, https://arxiv.org/abs/2509.23183
  • Bo Li, Xin Zheng, Ming Jin, Can Wang, Shirui Pan, 28 Sep 2025, Test-time GNN Model Evaluation on Dynamic Graphs, https://arxiv.org/abs/2509.23816
  • Jonas Hübotter, Patrik Wolf, Alexander Shevchenko, Dennis Jüni, Andreas Krause, Gil Kur, 29 Sep 2025, Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models, https://arxiv.org/abs/2509.24510
  • Zikun Qu, Min Zhang, Mingze Kong, Xiang Li, Zhiwei Shang, Zhiyong Wang, Yikun Ban, Shuang Qiu, Yao Shu, Zhongxiang Dai, 29 Sep 2025, T-POP: Test-Time Personalization with Online Preference Feedback, https://arxiv.org/abs/2509.24696
  • Yapeng Mi, Hengli Li, Yanpeng Zhao, Chenxi Li, Huimin Wu, Xiaojian Ma, Song-Chun Zhu, Ying Nian Wu, Qing Li, 26 Sep 2025, MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning, https://arxiv.org/abs/2509.22761
  • Zixu Hao, Jianyu Wei, Tuowei Wang, Minxing Huang, Huiqiang Jiang, Shiqi Jiang, Ting Cao, Ju Ren, 27 Sep 2025, Scaling LLM Test-Time Compute with Mobile NPU on Smartphones, https://arxiv.org/abs/2509.23324
  • Gabriela Pinto, Palash Goyal, Yiwen Song, Souradip Chakraborty, Zifeng Wang, Tomas Pfister, Hamid Palangi, 26 Sep 2025, HEART: Emotionally-driven test-time scaling of Language Models, https://arxiv.org/abs/2509.22876
  • Yan Yang, Dongxu Li, Yutong Dai, Yuhao Yang, Ziyang Luo, Zirui Zhao, Zhiyuan Hu, Junzhe Huang, Amrita Saha, Zeyuan Chen, Ran Xu, Liyuan Pan, Silvio Savarese, Caiming Xiong, Junnan Li, 29 Sep 2025, GTA1: GUI Test-time Scaling Agent, https://arxiv.org/abs/2507.05791
  • Zihuan Qiu, Yi Xu, Chiyuan He, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li, 29 Sep 2025, MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging, https://arxiv.org/abs/2505.11883
  • Anh Bui, Trang Vu, Trung Le, Junae Kim, Tamas Abraham, Rollin Omari, Amar Kaur, Dinh Phung, 26 Sep 2025, Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment, https://arxiv.org/abs/2506.22685
  • Zilin Xiao, Jaywon Koo, Siru Ouyang, Jefferson Hernandez, Yu Meng, Vicente Ordonez, 27 Sep 2025, ProxyThinker: Test-Time Guidance through Small Visual Reasoners, https://arxiv.org/abs/2505.24872
  • Yung-Chen Tang, Pin-Yu Chen, Andrea Cavallaro, 17 Oct 2025, CarBoN: Calibrated Best-of-N Sampling Improves Test-time Reasoning, https://arxiv.org/abs/2510.15674
  • Jisu Han, Wonjun Hwang, 17 Oct 2025, D-TPT: Dimensional Entropy Maximization for Calibrating Test-Time Prompt Tuning in Vision-Language Models, https://arxiv.org/abs/2510.09473
  • Adel Javanmard, Baharan Mirzasoleiman, Vahab Mirrokni, 4 Oct 2025, Understanding the Role of Training Data in Test-Time Scaling, https://arxiv.org/abs/2510.03605
  • Chang'an Yi, Xiaohui Deng, Shuaicheng Niu, Yan Zhou, 26 Sep 2025, POEM: Explore Unexplored Reliable Samples to Enhance Test-Time Adaptation, https://arxiv.org/abs/2510.03258
  • Mehmet Onurcan Kaya, Desmond Elliott, Dim P. Papadopoulos, 3 Oct 2025, Efficient Test-Time Scaling for Small Vision-Language Models, https://arxiv.org/abs/2510.03574
  • Jialin Liu, Lisang Ding, Stanley Osher, Wotao Yin, 4 Oct 2025, Implicit Models: Expressive Power Scales with Test-Time Compute, https://arxiv.org/abs/2510.03638
  • Behraj Khan, Tahir Qasim Syed, 4 Oct 2025, Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting, https://arxiv.org/abs/2510.03839
  • Chenlu Ding, Jiancan Wu, Leheng Sheng, Fan Zhang, Yancheng Yuan, Xiang Wang, Xiangnan He, 5 Oct 2025, MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering, https://arxiv.org/abs/2510.04217
  • Jonas Hübotter, Leander Diaz-Bone, Ido Hakimi, Andreas Krause, Moritz Hardt, 6 Oct 2025, Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning, https://arxiv.org/abs/2510.04786
  • Jihoon Lee, Hoyeon Moon, Kevin Zhai, Arun Kumar Chithanar, Anit Kumar Sahu, Soummya Kar, Chul Lee, Souradip Chakraborty, Amrit Singh Bedi, 6 Oct 2025, Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts, https://arxiv.org/abs/2510.05040
  • Wengao Ye, Yan Liang, Lianlei Shan, 5 Oct 2025, Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization, https://arxiv.org/abs/2510.04182
  • Daniel Tan, Anders Woodruff, Niels Warncke, Arun Jose, Maxime Riché, David Demitri Africa, Mia Taylor, 5 Oct 2025, Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time, https://arxiv.org/abs/2510.04340
  • Runchu Tian, Junxia Cui, Xueqiang Xu, Feng Yao, Jingbo Shang, 6 Oct 2025, Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models, https://arxiv.org/abs/2510.05090
  • Xingjian Li, Qifeng Wu, Adithya S. Ubaradka, Yiran Ding, Colleen Que, Runmin Jiang, Jianhua Xing, Tianyang Wang, Min Xu, 5 Oct 2025, AutoMiSeg: Automatic Medical Image Segmentation via Test-Time Adaptation of Foundation Models, https://arxiv.org/abs/2505.17931
  • Gavriel Di Nepi, Federico Siciliano, Fabrizio Silvestri, 10 Oct 2025, Titans Revisited: A Lightweight Reimplementation and Critical Analysis of a Test-Time Memory Model, https://arxiv.org/abs/2510.09551
  • Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu, Yu Zhang, Xiaoyi Zeng, 10 Oct 2025, MATT-CTR: Unleashing a Model-Agnostic Test-Time Paradigm for CTR Prediction with Confidence-Guided Inference Paths, https://arxiv.org/abs/2510.08932
  • Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, Anshuman Chhabra, 4 Oct 2025, Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models, https://arxiv.org/abs/2510.08592
  • Sondos Mahmoud Bsharat, Zhiqiang Shen, 10 Oct 2025, Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation, https://arxiv.org/abs/2510.09599
  • Florian E. Dorner, Yatong Chen, André F. Cruz, Fanny Yang, 10 Oct 2025, ROC-n-reroll: How verifier imperfection affects test-time scaling, https://arxiv.org/abs/2507.12399
  • Raul Cavalcante Dinardi, Bruno Yamamoto, Anna Helena Reali Costa, Artur Jordao, 24 Oct 2025, The Virtues of Brevity: Avoid Overthinking in Parallel Test-Time Reasoning, https://arxiv.org/abs/2510.21067
  • Hyeongyu Kim, Geonhui Han, Dosik Hwang, 24 Oct 2025, Buffer layers for Test-Time Adaptation, https://arxiv.org/abs/2510.21271
  • Yuichi Inoue, Kou Misaki, Yuki Imajuku, So Kuroki, Taishi Nakamura, Takuya Akiba, 24 Oct 2025, Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search, https://arxiv.org/abs/2503.04412
  • Ziyang Wang, Jaehong Yoon, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal, 24 Oct 2025, Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning, https://arxiv.org/abs/2507.06485
  • Xinyu Luo, Jie Liu, Kecheng Chen, Junyi Yang, Bo Ding, Arindam Basu, Haoliang Li, 13 Oct 2025, Efficient Edge Test-Time Adaptation via Latent Feature Coordinate Correction, https://arxiv.org/abs/2510.11068
  • Yingnan Liu, Rui Qiao, Mong Li Lee, Wynne Hsu, 13 Oct 2025, Test-Time Adaptation by Causal Trimming, https://arxiv.org/abs/2510.11133
  • Jens Tuyls, Dylan J. Foster, Akshay Krishnamurthy, Jordan T. Ash, 13 Oct 2025, Representation-Based Exploration for Language Models: From Test-Time to Post-Training, https://arxiv.org/abs/2510.11686
  • Yijie Xu, Huizai Yao, Zhiyu Guo, Weiyu Guo, Pengteng Li, Aiwei Liu, Xuming Hu, Hui Xiong, 11 Oct 2025, You only need 4 extra tokens: Synergistic Test-time Adaptation for LLMs, https://arxiv.org/abs/2510.10223
  • Hongwei Chen, Yishu Lei, Dan Zhang, Bo Ke, Danxiang Zhu, Xuyi Chen, Yuxiang Lu, Zhengjie Huang, Shikun Feng, Jingzhou He, Yu Sun, Hua Wu, Haifeng Wang, 11 Oct 2025, MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning, https://arxiv.org/abs/2510.10293
  • Jiaqi Wei, Hao Zhou, Xiang Zhang, Di Zhang, Zijie Qiu, Wei Wei, Jinzhe Li, Wanli Ouyang, Siqi Sun, 11 Oct 2025, Retrieval is Not Enough: Enhancing RAG Reasoning through Test-Time Critique and Optimization, https://arxiv.org/abs/2504.14858
  • Lijun Sheng, Jian Liang, Ran He, Zilei Wang, Tieniu Tan, 13 Oct 2025, The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models, https://arxiv.org/abs/2506.24000
  • Christos Ziakas, Alessandra Russo, 12 Oct 2025, VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision-Language Models, https://arxiv.org/abs/2506.10085
  • Yinglun Zhu, Jiancheng Zhang, Fuzhi Tang, 9 Oct 2025, Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models, https://arxiv.org/abs/2510.07632
  • Xiangwei Lv, JinLuan Yang, Wang Lin, Jingyuan Chen, Beishui Liao, 9 Oct 2025, From Noisy to Native: LLM-driven Graph Restoration for Test-Time Graph Domain Adaptation, https://arxiv.org/abs/2510.07762
  • Yeskendir Koishekenov, Aldo Lipani, Nicola Cancedda, 8 Oct 2025, Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts, https://arxiv.org/abs/2510.07358
  • Emre Can Acikgoz, Cheng Qian, Heng Ji, Dilek Hakkani-Tür, Gokhan Tur, 9 Oct 2025, Self-Improving LLM Agents at Test-Time, https://arxiv.org/abs/2510.07841
  • Runyang You, Yongqi Li, Meng Liu, Wenjie Wang, Liqiang Nie, Wenjie Li, 9 Oct 2025, Parallel Test-Time Scaling for Latent Reasoning Models, https://arxiv.org/abs/2510.07745
  • Leigang Qu, Ziyang Wang, Na Zheng, Wenjie Wang, Liqiang Nie, Tat-Seng Chua, 9 Oct 2025, TTOM: Test-Time Optimization and Memorization for Compositional Video Generation, https://arxiv.org/abs/2510.07940
  • Nathan Egbuna, Saatvik Gaur, Sunishchal Dev, Ashwinee Panda, Maheep Chaudhary, 10 Sep 2025, Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization, https://arxiv.org/abs/2509.18116
  • Arpan Mukherjee, Marcello Bullo, Debabrota Basu, Deniz Gündüz, 21 Oct 2025, Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality, https://arxiv.org/abs/2510.18982
  • Yingqian Cui, Zhenwei Dai, Pengfei He, Bing He, Hui Liu, Xianfeng Tang, Jingying Zeng, Suhang Wang, Yue Xing, Jiliang Tang, Benoit Dumoulin, 29 Sep 2025, Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search, https://arxiv.org/abs/2509.25420
  • Tingyu Shi, Fan Lyu, Shaoliang Peng, 30 Sep 2025, Annotation-Efficient Active Test-Time Adaptation with Conformal Prediction, https://arxiv.org/abs/2509.25692
  • Jiacheng Shi, Hongfei Du, Y. Alicia Hong, Ye Gao, 29 Sep 2025, EMO-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition, https://arxiv.org/abs/2509.25495
  • Tianlang Chen, Minkai Xu, Jure Leskovec, Stefano Ermon, 29 Sep 2025, RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance, https://arxiv.org/abs/2509.25604
  • Zhendong Tan, Xingjun Zhang, Chaoyi Hu, Yancheng Pan, Shaoxun Wang, 30 Sep 2025, Adaptive Rectification Sampling for Test-Time Compute Scaling, https://arxiv.org/abs/2504.01317
  • Zhangyue Yin, Qiushi Sun, Zhiyuan Zeng, Zhiyuan Yu, Qipeng Guo, Xuanjing Huang, Xipeng Qiu, 7 Oct 2025, ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models, https://arxiv.org/abs/2510.06014
  • Weihao Zeng, Keqing He, Chuqiao Kuang, Xiaoguang Li, Junxian He, 7 Oct 2025, Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification, https://arxiv.org/abs/2510.06135
  • Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He, 7 Oct 2025, TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning, https://arxiv.org/abs/2510.06217
  • Harshil Vejendla, 7 Oct 2025, LATTA: Langevin-Anchored Test-Time Adaptation for Enhanced Robustness and Stability, https://arxiv.org/abs/2510.05530
  • Alexander Murphy, Michal Danilowski, Soumyajit Chatterjee, Abhirup Ghosh, 7 Oct 2025, NEO: No-Optimization Test-Time Adaptation through Latent Re-Centering, https://arxiv.org/abs/2510.05635
  • Suhyeok Jang, Dongyoung Kim, Changyeon Kim, Youngsuk Kim, Jinwoo Shin, 7 Oct 2025, Verifier-free Test-Time Sampling for Vision Language Action Models, https://arxiv.org/abs/2510.05681
  • Wen-Kwang Tsao, Yao-Ching Yu, Chien-Ming Huang, 16 Oct 2025, Mapping Smarter, Not Harder: A Test-Time Reinforcement Learning Agent That Improves Without Labels or Model Updates, https://arxiv.org/abs/2510.14900
  • Kyle Montgomery, Sijun Tan, Yuqi Chen, Siyuan Zhuang, Tianjun Zhang, Raluca Ada Popa, Chenguang Wang, 16 Oct 2025, Budget-aware Test-time Scaling via Discriminative Verification, https://arxiv.org/abs/2510.14913
  • Mehrzad Samadi, Aleksander Ficek, Sean Narenthiran, Siddhartha Jain, Wasi Uddin Ahmad, Somshubra Majumdar, Vahid Noroozi, Boris Ginsburg, 16 Oct 2025, Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models, https://arxiv.org/abs/2510.14232
  • Yue Hou, He Zhu, Ruomei Liu, Yingke Su, Junran Wu, Ke Xu, 16 Oct 2025, Redundancy-Aware Test-Time Graph Out-of-Distribution Detection, https://arxiv.org/abs/2510.14562
  • Zhichen Zeng, Qi Yu, Xiao Lin, Ruizhong Qiu, Xuying Ning, Tianxin Wei, Yuchen Yan, Jingrui He, Hanghang Tong, 12 Oct 2025, Harnessing Consistency for Robust Test-Time LLM Ensemble, https://arxiv.org/abs/2510.13855
  • Zhen Yang, Mingyang Zhang, Feng Chen, Ganggui Ding, Liang Hou, Xin Tao, Pengfei Wan, Ying-Cong Chen, 15 Oct 2025, Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention, https://arxiv.org/abs/2510.13940

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research Topics

Read more about: