Aussie AI
Training Optimization
Last Updated 30 August 2025
by David Spuler, Ph.D.
Training is very expensive, which has led to a steady rise in papers on optimizing model training methods. Training cost is typically many multiples of inference cost, although the total inference cost can overshadow training cost given enough users. Nevertheless, the industry-wide cost of training is likely to remain high, since almost all use cases require not only initial training, but also ongoing fine-tuning and re-training.
Research on training algorithms in general:
- Unsupervised learning
- Reinforcement Learning from Human Feedback (RLHF)
- In-context learning (ICL)
- Direct Preference Optimization (DPO) (a loss sketch follows this list)
- Self-supervised learning (automated AI feedback)
- Human-In-The-Loop (HITL)
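As a concrete illustration of one item above, here is a minimal sketch of the DPO preference loss, assuming PyTorch and precomputed per-sequence log-probabilities under the trained policy and a frozen reference model; the function and tensor names are illustrative, not a definitive implementation.

    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Log-probability ratios of the policy against the frozen reference.
        chosen_ratio = policy_chosen_logps - ref_chosen_logps
        rejected_ratio = policy_rejected_logps - ref_rejected_logps
        # Widen the margin between preferred and rejected responses.
        margin = beta * (chosen_ratio - rejected_ratio)
        return -F.logsigmoid(margin).mean()

Unlike RLHF, this optimizes preferences directly with a simple classification-style loss, with no separate reward model or sampling loop during training.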
General concepts in LLM reasoning and model capabilities:
- AGI (human-level intelligence)
- Scaling laws
- Inference scaling laws
- Test time compute
- The Wall
Information on improving the accuracy and/or speed of training algorithms (a mixed-precision training sketch follows this list):
- Training speed optimizations
- Distributed training
- Federated Learning
- Loss functions
- Gradient optimizers
- Early dropout
- Network optimizations
- Training costs
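To make one of these techniques concrete, below is a minimal sketch of mixed-precision training combined with gradient accumulation, two common training speed and memory optimizations; the toy model, random data, and hyperparameters are assumptions for illustration only.

    import torch
    import torch.nn as nn

    # Toy stand-ins for a real model and dataloader (assumed for illustration).
    model = nn.Linear(128, 10).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    scaler = torch.cuda.amp.GradScaler()   # loss scaling for FP16 stability
    accum_steps = 4                        # gradient accumulation factor

    for step in range(100):
        x = torch.randn(32, 128, device="cuda")
        y = torch.randint(0, 10, (32,), device="cuda")
        with torch.cuda.amp.autocast():    # half-precision forward pass
            loss = loss_fn(model(x), y) / accum_steps
        scaler.scale(loss).backward()      # scaled backward avoids underflow
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)         # unscales gradients, then updates
            scaler.update()
            optimizer.zero_grad(set_to_none=True)

Accumulation simulates a larger batch size without extra memory, while autocast roughly halves activation memory and speeds up matrix multiplications on tensor cores.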
Improvements in resiliency of the training infrastructure for multi-GPU clusters in data centers (an asynchronous checkpointing sketch follows this list):
- Resiliency improvements (overview)
- Stragglers
- Hangs
- Silent Data Corruption (SDC)
- GPU failures
- GPU overheating
- GPU aging
- Transient Soft Errors
- High network latency
- Floating-point computation errors
- Floating-point checkers
- Checkpointing
- In-memory checkpointing
- Asynchronous checkpointing
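As a sketch of the last two items, here is one simple way to combine in-memory and asynchronous checkpointing, assuming PyTorch; the function is illustrative and omits real-world details such as optimizer state, distributed coordination, and atomic file renames.

    import threading
    import torch

    def async_checkpoint(model, step, path_prefix="ckpt"):
        # Snapshot weights into CPU memory first (fast; training pauses only
        # briefly), giving an in-memory checkpoint.
        cpu_state = {k: v.detach().to("cpu", copy=True)
                     for k, v in model.state_dict().items()}
        # Then write to disk on a background thread, keeping slow I/O off
        # the training critical path so the GPUs stay busy.
        def _write():
            torch.save(cpu_state, f"{path_prefix}_{step}.pt")
        thread = threading.Thread(target=_write, daemon=True)
        thread.start()
        return thread  # caller can join() before shutdown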
Research on the data used in pre-training:
Modified types of pre-training for models:
Fine-tuning methods include:
- Fine-tuning (traditional full-parameter)
- Parameter-Efficient Fine-Tuning (PEFT)
- LoRA (low-rank adapters) (a layer sketch follows this list)
- QLoRA (quantized LoRA)
- Multi-LoRA
- Post-Optimization Fine-Tuning (POFT) (e.g., after quantization, pruning)
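To illustrate LoRA concretely, here is a minimal sketch of a LoRA-wrapped linear layer, assuming PyTorch; the rank and scaling defaults are typical values, not a definitive implementation.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen pretrained linear layer plus a trainable low-rank
        update: y = W x + (alpha/r) * B A x, with only A and B trained."""
        def __init__(self, base: nn.Linear, r=8, alpha=16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # freeze pretrained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r               # B starts at zero, so the
                                                 # adapter is a no-op initially
        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

Only the small A and B matrices receive gradients, which is why LoRA fine-tuning needs a fraction of the memory of full-parameter fine-tuning.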
Lesser-known alternatives to fine-tuning for improving model capabilities; these require only a single inference step, though some also need a short training-like phase:
- Prompt tuning (extended-vocabulary PEFT, typically with extra trainable soft tokens prepended to the prompt; a sketch follows this list)
- Decoding-based reasoning in single inference step (e.g., tree decoding)
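Here is a minimal sketch of the soft-token idea behind prompt tuning, assuming PyTorch; the token count and embedding width are arbitrary illustrative values.

    import torch
    import torch.nn as nn

    class SoftPrompt(nn.Module):
        """Trainable 'soft token' embeddings prepended to the input
        embeddings; the backbone model itself stays frozen."""
        def __init__(self, n_tokens=20, dim=768):
            super().__init__()
            self.soft = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)

        def forward(self, input_embeds):         # [batch, seq, dim]
            batch = input_embeds.size(0)
            prefix = self.soft.unsqueeze(0).expand(batch, -1, -1)
            return torch.cat([prefix, input_embeds], dim=1)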
Retrieval-based alternatives to fine-tuning that add LLM capabilities and intelligence/accuracy without requiring any extra training (a minimal RAG sketch follows this list):
- Plug-ins (data source integrations)
- RAG
- RALM (generalized retrieval)
- TAG (database table data)
- Agent architectures (read/write capabilities)
- Agentic RAG
- Agentic workflow (multi-agent, multi-step)
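Below is a minimal sketch of the basic RAG flow, assuming NumPy and placeholder embed() and generate() functions standing in for an embedding model and an LLM; real systems add vector databases, chunking, and reranking.

    import numpy as np

    def rag_answer(query, docs, embed, generate, top_k=2):
        # Embed the query and candidate document chunks.
        q = embed(query)
        vecs = [embed(d) for d in docs]
        sims = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
                for v in vecs]
        # Retrieve the top-k most similar chunks and prepend them as context.
        best = [docs[i] for i in np.argsort(sims)[-top_k:]]
        prompt = "Context:\n" + "\n".join(best) + f"\n\nQuestion: {query}\nAnswer:"
        return generate(prompt)   # single LLM inference call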
Non-retrieval methods of giving LLMs additional context information for their queries, using only a single inference query (and without traditional RAG-type data retrieval):
- Tool usage (a sketch follows this list)
- TALM
- Inference "hooks"
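Here is a toy sketch of the tool-usage pattern: the model either answers directly or emits a structured tool call that the application executes and feeds back. The JSON protocol, tool registry, and generate() function are assumptions for illustration; note this toy version issues a second generation call when a tool fires, whereas inference "hooks" splice results in during decoding.

    import json

    # Toy tool registry; eval() here is unsafe and for illustration only.
    TOOLS = {"calculator": lambda expr: str(eval(expr))}

    def run_with_tools(prompt, generate):
        reply = generate(prompt)
        try:
            call = json.loads(reply)   # e.g. {"tool": "calculator", "arg": "6*7"}
        except json.JSONDecodeError:
            return reply               # plain answer, no tool needed
        result = TOOLS[call["tool"]](call["arg"])
        # Follow-up inference call with the tool result injected as context.
        return generate(f"{prompt}\nTool result: {result}\nFinal answer:")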
Prompt engineering enhancements to LLM capabilities (single-step; a prompt-construction sketch follows this list):
- Basic prompting methods (e.g., examples, formatting)
- Step-by-step prompting
- Emotional prompting
- Least-to-Most
- Self-ask prompting
- Concise prompting
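These techniques mostly reduce to careful prompt construction. Below is a small sketch combining few-shot examples with the classic step-by-step trigger phrase; the formatting conventions are illustrative, not prescriptive.

    def step_by_step_prompt(question, examples=()):
        # Few-shot examples (basic prompting) followed by a step-by-step
        # instruction to elicit intermediate reasoning.
        parts = [f"Q: {q}\nA: {a}" for q, a in examples]
        parts.append(f"Q: {question}\nA: Let's think step by step.")
        return "\n\n".join(parts)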
Advanced topics in prompt engineering (single-shot):
- Prompt optimization techniques
- Programmatic prompt optimization (auto-prompting)
- Advanced prompt engineering (overview)
Multi-step inference-based reasoning algorithms that combine prompt engineering with additional inference processing of queries:
- Chain-of-Thought (CoT)
- Tree-of-Thought
- Skeleton of Thought
- ReAct (Reason-and-Act)
- Self-reflection (often just called "reflection")
- LLM as Judge
- Best of N (BoN) (a sketch follows this list)
- Multi-step inference for reasoning (overview)
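As a sketch of the Best-of-N pattern: sample several candidate answers and keep the one ranked highest by an external scorer, such as an "LLM as Judge" query or a reward model. The generate() and score() functions are assumed placeholders.

    def best_of_n(prompt, generate, score, n=5):
        # N independent samples from a non-deterministic LLM call, then
        # pick the candidate with the highest judge/reward score.
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=score)

This trades roughly N times the inference compute for accuracy, one of the simplest forms of test-time compute scaling.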
Addressing limitations of model intelligence:
- LLM Safety (overview)
- Hallucinations
- Toxicity
- Refusal modules
- Jailbreak prevention
- Prompt shield
- Attribution and citation management
- Explainability
- Bias
- Fairness
- Ethics
- Alignment
- Testing safety
- Factuality
- Data Leakage
- Security
- Guardrails
- Privacy
- Safety monitoring
- Safe C++ programming (for AI engines)
Other directions for model intelligence:
- Planning
- Followup questions
- Interactive prompting
- Program execution models (e.g., LLM generates Python code to run)
- Symbolic reasoning
- Concept models ("large concept models" or LCMs)
Survey Papers on Training Optimizations
Survey papers on speeding up training:
- Yarally T, Cruz L, Feitosa D, et al (2023), Uncovering energy-efficient practices in deep learning training: Preliminary steps towards green AI. International Conference on AI Engineering - Software Engineering for AI (CAIN), https://arxiv.org/abs/2303.13972
- A. Apicella, F. Donnarumma, F. Isgrò, and R. Prevete, A survey on modern trainable activation functions, Neural Networks, vol. 138, pp. 14–32, 2021, https://arxiv.org/abs/2005.00817 (Extensive survey of trainable activation functions, e.g., ReLU, Swish, Maxout, leaky ReLU.)
- R. Immonen, T. Hämäläinen et al., Tiny machine learning for resource-constrained microcontrollers, Journal of Sensors, vol. 2022, 2022, https://www.hindawi.com/journals/js/2022/7437023/ (Survey of on-device training for TinyML/edge computing.)
- P Freire, E Manuylovich, JE Prilepsky, SK Turitsyn, 2023, Artificial neural networks for photonic applications—from algorithms to implementation: tutorial, Advances in Optics and Photonics, Sep 2023, https://opg.optica.org/directpdfaccess/f0ae8746-2f89-4ac4-bb598eda29c7977c_539680/aop-15-3-739.pdf?da=1&id=539680&seq=0&mobile=no (Large survey covering many aspects of the future of training optimization.)
- Marcos Treviso, Tianchu Ji, Ji-Ung Lee, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Pedro H. Martins, André F. T. Martins, Péter Milder, Colin Raffel, Edwin Simpson, Noam Slonim, Niranjan Balasubramanian, Leon Derczynski, Roy Schwartz, Aug 2022, Efficient Methods for Natural Language Processing: A Survey. arXiv:2209.00099 [cs], August 2022. http://arxiv.org/abs/2209.00099
- MM Yapici, N Topaloğlu, 2021, Performance comparison of deep learning frameworks, Computers and Informatics, https://dergipark.org.tr/en/pub/ci/issue/60236/769457, PDF: https://dergipark.org.tr/en/download/article-file/1201877 (Examines Torch, Theano, Caffe, Caffe2, MXNet, Keras, TensorFlow, and CNTK frameworks in terms of training speed.)
- H. Jahangir, S. K. Goel and S. Khurana, "Scaling Up the Transformers: A Survey of Training and Inference Optimization Techniques," 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT), Greater Noida, India, 2024, pp. 1-6, doi: 10.1109/ICEECT61758.2024.10739061. https://ieeexplore.ieee.org/abstract/document/10739061
- Jiahang Zhou, Yanyu Chen, Zicong Hong, Wuhui Chen, Yue Yu, Tao Zhang, Hui Wang, Chuanfu Zhang, Zibin Zheng, 5 Jan 2024, Training and Serving System of Foundation Models: A Comprehensive Survey, https://arxiv.org/abs/2401.02643
- Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao, 20 Feb 2024 (v2), Large Language Models: A Survey, https://arxiv.org/abs/2402.06196
- R Abdulkadirov, P Lyakhov, N Nagornov, 2023, Survey of Optimization Algorithms in Modern Neural Networks https://www.mdpi.com/2227-7390/11/11/2466 https://www.mdpi.com/2227-7390/11/11/2466/pdf
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying WEI, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor)Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
- Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun, 29 Jul 2024, Efficient Training of Large Language Models on Distributed Infrastructures: A Survey, https://arxiv.org/abs/2407.20018
- Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, Liang Zhao, 4 Jan 2024, Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models https://arxiv.org/abs/2401.00625 (A general survey paper with coverage of many techniques including this one.)
- Zehao Xiao, Cees G. M. Snoek, 6 Nov 2024, Beyond Model Adaptation at Test Time: A Survey. https://arxiv.org/abs/2411.03687
- Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
- Dan Zhang, Tao Feng, Lilong Xue, Yuandong Wang, Yuxiao Dong, Jie Tang, 23 Jan 2025, Parameter-Efficient Fine-Tuning for Foundation Models, https://arxiv.org/abs/2501.13787
Training Speed Optimizations
Papers with specific techniques for optimization of training in terms of throughput, latency or processing speed, rather than accuracy or perplexity of results (chosen out of literally thousands):
- Campos, V., Jou, B., i Nieto, X. G., Torres, J., and Chang, S.-F. (2018). Skip RNN: Learning to skip state updates in recurrent neural networks. In International Conference on Learning Representations. https://openreview.net/forum?id=HkwVAXyCW
- Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Marzieh Tahaei, Boxing Chen, Ali Ghodsi, 2023, SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks, https://arxiv.org/abs/2309.00255 (Generalization of multi-dimensional pruning, by training a large neural network with many sub-networks across different width and depth dimensions.)
- W. Jung, D. Jung, B. Kim, S. Lee, W. Rhee, and J. Ahn, “Restructuring Batch Normalization to Accelerate CNN Training,” in The Conference on Systems and Machine Learning, 2019, https://arxiv.org/abs/1807.01702
- O Gordon, HV Habi, A Netzer, 2023, EPTQ: Enhanced Post-Training Quantization via Label-Free Hessian, arXiv preprint arXiv:2309.11531, https://arxiv.org/pdf/2309.11531.pdf Code: https://github.com/sony/model_optimization
- Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, and Bill Dolan. Pointer: Constrained text generation via insertion-based generative pre-training. arXiv preprint arXiv:2005.00558, 2020. https://arxiv.org/abs/2005.00558
- S Tuli, NK Jha, 2023, TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference, IEEE Transactions on Computer-Aided Design, https://ieeexplore.ieee.org/abstract/document/10144614/, https://arxiv.org/pdf/2303.14882
- M. Mathieu, M. Henaff, and Y. LeCun, 2014, “Fast training of convolutional networks through FFTs,” in 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., https://arxiv.org/abs/1312.5851
- D Zhu, N Yang, L Wang, Y Song, W Wu, F Wei, 2023, PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training https://arxiv.org/abs/2309.10400
- Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. Gpipe: Efficient training of giant neural networks using pipeline parallelism, http://arxiv.org/abs/1811.06965
- Jonas Geiping, Tom Goldstein, Dec 2022, Cramming: Training a Language Model on a Single GPU in One Day, https://arxiv.org/abs/2212.14034 Code: https://github.com/JonasGeiping/cramming (Note: uses Pytorch nvFuser deep learning compiler, which seems to be deprecated now.)
- Shabnam Daghaghi, Nicholas Meisburger, Mengnan Zhao, Yong Wu, Sameh Gobriel, Charlie Tai, Anshumali Shrivastava, Mar 2021, Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More, https://arxiv.org/abs/2103.10891, Code: https://github.com/RUSH-LAB/SLIDE (Fast training on CPUs using AVX-512 and locality-sensitive hashing of vectors.)
- GY Lee, T Dam, MM Ferdaus, DP Poenar, VN Duong, Oct 2023, Unlocking the capabilities of explainable few-shot learning in remote sensing, https://arxiv.org/pdf/2310.08619.pdf
- Kai Lv, Yuqing Yang, Tengxiao Liu, Qinghui Gao, Qipeng Guo, and Xipeng Qiu, June 2023, Full parameter fine-tuning for large language models with limited resources, arXiv preprint arXiv:2306.09782, https://arxiv.org/abs/2306.09782 (Fused gradient computation and parameter update saves memory in training kernel by not saving the gradient tensor in memory.)
- Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari, 22 Apr 2024, OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework, Apple Research, https://arxiv.org/abs/2404.14619 Code: https://huggingface.co/apple/OpenELM
- Benjue Weng, 13 Apr 2024, Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies, https://arxiv.org/abs/2404.09022 (Reviewing fine-tuning of large models.)
- Qi Zhang, Tianqi Du, Haotian Huang, Yifei Wang, Yisen Wang, 2024, Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining, https://openreview.net/pdf?id=2rPoTgEmjV Code: https://github.com/PKU-ML/LookAheadLookAround (Evaluates autoregressive and masked methods in training.)
- Haikuo Shao; Jinming Lu; Meiqi Wang; Zhongfeng Wang, 2023, An Efficient Training Accelerator for Transformers With Hardware-Algorithm Co-Optimization, IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Early Access), https://ieeexplore.ieee.org/document/10251161
- Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge, Jan 2024, Understanding LLMs: A Comprehensive Overview from Training to Inference https://arxiv.org/abs/2401.02038
- Jiahang Zhou, Yanyu Chen, Zicong Hong, Wuhui Chen, Yue Yu, Tao Zhang, Hui Wang, Chuanfu Zhang, Zibin Zheng, 5 Jan 2024, Training and Serving System of Foundation Models: A Comprehensive Survey, https://arxiv.org/abs/2401.02643
- Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu, Nov 2023, Initializing Models with Larger Ones, https://arxiv.org/abs/2311.18823 Code: https://github.com/OscarXZQ/weight-selection
- Noam Shazeer, Mitchell Stern, Apr 2018, Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, https://arxiv.org/abs/1804.04235
- Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, Feb 2018, Mixed Precision Training, https://arxiv.org/abs/1710.03740
- M. Shoeybi, M. Patwary, R. Puri, P. LeGresley, J. Casper, and B. Catanzaro, “Megatron-LM: Training multi-billion parameter language models using model parallelism,” arXiv preprint arXiv:1909.08053, 2019, https://arxiv.org/abs/1909.08053
- Ruixiang Tang, Dehan Kong, Longtao Huang, Hui Xue May 2023 Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning, https://arxiv.org/abs/2305.17256
- Diana Hu, 29 Mar 2024, Building AI Models is faster and cheaper than you probably think, Y Combinator, https://www.ycombinator.com/blog/building-ai-models
- Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao, 20 Feb 2024 (v2), Large Language Models: A Survey, https://arxiv.org/abs/2402.06196
- Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu, 23 Feb 2024, MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs, https://arxiv.org/abs/2402.15627
- Carlo Nicolini, Jacopo Staiano, Bruno Lepri, Raffaele Marino, 13 Mar 2024, The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models, https://arxiv.org/abs/2403.08739 (Understanding how LLM parameters change over time during training.)
- Truong Giang Do, Le Huy Khiem, Quang Pham, TrungTin Nguyen, Thanh-Nam Doan, Binh T. Nguyen, Chenghao Liu, Savitha Ramasamy, Xiaoli Li, Steven HOI, Oct 2023, HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts, EMNLP 2023 Conference, https://openreview.net/forum?id=fL8AKDvELp Code: https://github.com/giangdip2410/hyperrouter
- S Guo, J Xu, LL Zhang, M Yang, Oct 2023, Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models, arXiv preprint arXiv:2310.05015, https://arxiv.org/pdf/2310.05015.pdf Code: https://github.com/microsoft/Moonlit/tree/main/Compresso
- H Woisetschläger, A Isenko, S Wang, R Mayer, 2023, Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly, https://arxiv.org/abs/2310.03150
- Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, and Cho-Jui Hsieh. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes. In International Conference on Learning Representations, September 2019. https://openreview.net/forum?id=Syx4wnEtvH
- Shar Narasimhan. NVIDIA Clocks World’s Fastest BERT Training Time and Largest Transformer Based Model, Paving Path For Advanced Conversational AI, August 2019. https://developer.nvidia.com/blog/training-bert-with-gpus/
- R. Immonen, T. Hämäläinen et al., Tiny machine learning for resource-constrained microcontrollers, Journal of Sensors, vol. 2022, 2022, https://www.hindawi.com/journals/js/2022/7437023/
- R Abdulkadirov, P Lyakhov, N Nagornov, 2023, Survey of Optimization Algorithms in Modern Neural Networks https://www.mdpi.com/2227-7390/11/11/2466 https://www.mdpi.com/2227-7390/11/11/2466/pdf
- David Spuler, March 2024, Chapter 6. Training, Fine-Tuning & RAG, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Kirill Kolodiazhnyi, May 15, 2020, Hands-On Machine Learning with C++: Build, train, and deploy end-to-end machine learning and deep learning pipelines, https://www.amazon.com/Hands-Machine-Learning-end-end/dp/1789955335/
- Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-yan Liu, 6 Jul 2023 (v2), A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond, https://arxiv.org/pdf/2204.09269.pdf
- Adi Gangidi, KR Kishore, Jenya Lee, June 12, 2024, How Meta trains large language models at scale, Meta Research, https://engineering.fb.com/2024/06/12/data-infrastructure/training-large-language-models-at-scale-meta/
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- NVIDIA, June 2024, Nemotron-4 340B Technical Report, https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_4_340B_8T_0.pdf (Architecture is decoder-only with GQA, SentencePiece tokenizer, causal attention masks, RoPE, 96 layers, 96 heads, 8 KV heads, 256,000 vocabulary, 18432 internal dimension, context window 4096, and uses squared RELU.)
- Yi Zhou, Dec 16, 2023, Optimizing GenAI: Comparing Model Training, Fine-Tuning, RAG, and Prompt Engineering, https://medium.com/generative-ai-revolution-ai-native-transformation/optimizing-genai-comparing-model-training-fine-tuning-rag-and-prompt-engineering-7a7c6c65e0f0
- Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun, 29 Jul 2024, Efficient Training of Large Language Models on Distributed Infrastructures: A Survey, https://arxiv.org/abs/2407.20018
- Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng, 25 Jan 2024, Not All Tasks Are Equally Difficult: Multi-Task Deep Reinforcement Learning with Dynamic Depth Routing, https://arxiv.org/abs/2312.14472 (Dynamic routing based on easy vs hard queries to optimize training.)
- You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying WEI, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor)Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
- Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, Liang Zhao, 4 Jan 2024, Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models https://arxiv.org/abs/2401.00625 (A general survey paper with coverage of many techniques including this one.)
- Lorenzo Sani, Alex Iacob, Zeyu Cao, Bill Marino, Yan Gao, Tomas Paulik, Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane, 19 Jul 2024 (v2), The Future of Large Language Model Pre-training is Federated, https://arxiv.org/abs/2405.10853
- Kaizhao Liang, Bo Liu, Lizhang Chen, Qiang Liu, 23 Aug 2024, Memory-Efficient LLM Training with Online Subspace Descent, https://arxiv.org/abs/2408.12857 https://github.com/kyleliang919/Online-Subspace-Descent
- Sophia R. Cunningham, Dominique Archambault, Austin Kung, May 2024, Efficient Training and Inference: Techniques for Large Language Models Using Llama, https://www.techrxiv.org/doi/full/10.36227/techrxiv.171651876.65094225/v1
- Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou, 23 Aug 2024, Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time, https://arxiv.org/abs/2408.13233 (Training using low-rank matrices to approximate attention.)
- Agarwal, Saurabh, Aug 2024, Minimizing Data Movement in Machine Learning Systems, Ph.D. Thesis, Computer Sciences, University of Wisconsin--Madison, https://digital.library.wisc.edu/1711.dl/MKLIYRPB24A5R9D https://search.library.wisc.edu/digital/AMKLIYRPB24A5R9D PDF: https://asset.library.wisc.edu/1711.dl/QXSTVAIXECHQA8L/R/file-62b54.pdf?dl https://www.proquest.com/openview/c1ae2a92106d7ec681a7296cd163e0c1/1 (Dataflow optimization in training and also "clustered head attention" for memory-efficient inference, an extension of multi-head attention similar to layer-wise head fusion/pruning.)
- Jaime Sevilla Edu Roldán, May 28, 2024, Training Compute of Frontier AI Models Grows by 4-5x per Year, Epoch AI blog, https://epochai.org/blog/training-compute-of-frontier-ai-models-grows-by-4-5x-per-year
- Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu, Dec 2023, Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models, https://arxiv.org/abs/2311.03687 (Benchmarks model speed for training, fine-tuning and inference with various optimizations such as ZeRO, quantization, offloading/recomputation, and Flash Attention.)
- Ari Lotter, Jeffrey Quesnelle, Umer H. Adil, Dillon Rolnick, Esteban La Rocca, 2024, A Preliminary Report on DisTrO, https://github.com/NousResearch/DisTrO/blob/main/A_Preliminary_Report_on_DisTrO.pdf https://venturebeat.com/wp-content/uploads/2024/08/A_Preliminary_Report_on_DisTrO.pdf (Reducing the inter-GPU networking bandwidth cost during training.)
- WenZheng Zhang, Yang Hu, Jing Shi, Xiaoying Bai, 22 Aug 2024, Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters, https://arxiv.org/abs/2408.12596
- Douglas C. Youvan, September 27, 2024, Building and Running Large-Scale Language Models: The Infrastructure and Techniques Behind GPT-4, https://www.researchgate.net/profile/Douglas-Youvan/publication/384398902_Building_and_Running_Large-Scale_Language_Models_The_Infrastructure_and_Techniques_Behind_GPT-4/links/66f6f4d3906bca2ac3d20e68/Building-and-Running-Large-Scale-Language-Models-The-Infrastructure-and-Techniques-Behind-GPT-4.pdf
- Wanchao Liang, Tianyu Liu, Less Wright, Will Constable, Andrew Gu, Chien-Chin Huang, Iris Zhang, Wei Feng, Howard Huang, Junjie Wang, Sanket Purandare, Gokul Nadathur, Stratos Idreos, 9 Oct 2024, TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training, https://arxiv.org/abs/2410.06511
- Byron (Pin-Lun) Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning, Yanning Chen, 14 Oct 2024, Liger Kernel: Efficient Triton Kernels for LLM Training, https://arxiv.org/abs/2410.10989 http://github.com/linkedin/Liger-Kernel
- Ankit Singh Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, Vladimir Feinberg, Seungyeon Kim, Hrayr Harutyunyan, Nikunj Saunshi, Zachary Nado, Rakesh Shivanna, Sashank J. Reddi, Aditya Krishna Menon, Rohan Anil, Sanjiv Kumar, 24 Oct 2024, A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs, https://arxiv.org/abs/2410.18779
- Saleh Ashkboos, Iman Mirzadeh, Keivan Alizadeh, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar, Fartash Faghri, 25 Oct 2024, Computational Bottlenecks of Training Small-scale Large Language Models, https://arxiv.org/abs/2410.19456
- Wasim Rajput, Oct 30, 2024, Developing Large Language Models (LLMs): A Step-by-Step Guide from Concept to Deployment. How LLMs like ChatGPT, Gemini, and Others are Developed, https://medium.com/the-generator/from-concept-to-deployment-a-practical-guide-to-developing-large-language-models-llms-d60b5841cade
- Zehao Xiao, Cees G. M. Snoek, 6 Nov 2024, Beyond Model Adaptation at Test Time: A Survey. https://arxiv.org/abs/2411.03687
- Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
- Sebastian Raschka, October 29, 2024, Build a Large Language Model (From Scratch), Manning, https://github.com/rasbt/LLMs-from-scratch https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167
- Hao Ge, Fangcheng Fu, Haoyang Li, Xuanyu Wang, Sheng Lin, Yujie Wang, Xiaonan Nie, Hailin Zhang, Xupeng Miao, and Bin Cui. 2024. Enabling Parallelism Hot Switching for Efficient Training of Large Language Models. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles (SOSP '24). Association for Computing Machinery, New York, NY, USA, 178–194. https://doi.org/10.1145/3694715.3695969 https://dl.acm.org/doi/abs/10.1145/3694715.3695969
- Erik Wijmans, Brody Huval, Alexander Hertzberg, Vladlen Koltun, Philipp Krähenbühl, 13 Nov 2024, Cut Your Losses in Large-Vocabulary Language Models, https://arxiv.org/abs/2411.09009 https://github.com/apple/ml-cross-entropy (Memory-efficient computation of cross-entropy in training.)
- R. Li, D. Fu, C. Shi, Z. Huang and G. Lu, "Efficient LLMs Training and Inference: An Introduction," in IEEE Access, doi: 10.1109/ACCESS.2024.3501358. https://ieeexplore.ieee.org/abstract/document/10756602 https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10756602
- Nir Barazida, Mar 9, 2022, Distributed training of deep learning models: handling stragglers and latency in synchronous training. A review of the challenges in synchronous distributed training and the best solutions for stragglers and high latency. https://towardsdatascience.com/stragglers-and-latency-in-synchronous-distributed-training-of-deep-learning-models-43783b0266d9
- Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, Rafal Jozefowicz, 21 Mar 2017 (v3), Revisiting Distributed Synchronous SGD, https://arxiv.org/abs/1604.00981
- Palak (Microsoft Research India), Rohan Gandhi (Microsoft Research India), Karan Tandon (Microsoft Research India), Debopam Bhattacherjee (Microsoft Research India), Venkata N. Padmanabhan (Microsoft Research India), 16 Nov 2024, Improving training time and GPU utilization in geo-distributed language model training, https://arxiv.org/abs/2411.14458
- Chenghao Hu and Baochun Li. 2024. Menos: Split Fine-Tuning Large Language Models with Efficient GPU Memory Sharing. In Proceedings of the 25th International Middleware Conference (MIDDLEWARE '24). Association for Computing Machinery, New York, NY, USA, 185–198. https://doi.org/10.1145/3652892.3700758 https://dlnext.acm.org/doi/10.1145/3652892.3700758 https://iqua.ece.toronto.edu/papers/chenghao-middleware24.pdf
- Carl Franzen, August 27, 2024, ‘This could change everything!’ Nous Research unveils new tool to train powerful AI models with 10,000x efficiency, https://venturebeat.com/ai/this-could-change-everything-nous-research-unveils-new-tool-to-train-powerful-ai-models-with-10000x-efficiency/
- Carl Franzen, December 2, 2024, Nous Research is training an AI model using machines distributed across the internet, https://venturebeat.com/ai/nous-research-is-training-an-ai-model-using-machines-distributed-across-the-internet/
- Haoyang Li, Fangcheng Fu, Sheng Lin, Hao Ge, Xuanyu Wang, Jiawen Niu, Jie Jiang, Bin Cui, 10 Dec 2024, Demystifying Workload Imbalances in Large Transformer Model Training over Variable-length Sequences, https://arxiv.org/abs/2412.07894
- Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, Taira Anderson, David Atkinson, Faeze Brahman, Christopher Clark, Pradeep Dasigi, Nouha Dziri, Michal Guerquin, Hamish Ivison, Pang Wei Koh, Jiacheng Liu, Saumya Malik, William Merrill, Lester James V. Miranda, Jacob Morrison, Tyler Murray, Crystal Nam, Valentina Pyatkin, Aman Rangapur, Michael Schmitz, Sam Skjonsberg, David Wadden, Christopher Wilhelm, Michael Wilson, Luke Zettlemoyer, Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi, 31 Dec 2024, 2 OLMo 2 Furious, https://arxiv.org/abs/2501.00656
- Zongbiao Li, Xiezhao Li, Yinghao Cui, Yijun Chen, Zhixuan Gu, Yuxuan Liu, Wenbo Zhu, Fei Jia, Ke Liu, Qifeng Li, Junyao Zhan, Jiangtao Zhou, Chenxi Zhang, Qike Liu, 31 Dec 2024, Automatically Planning Optimal Parallel Strategy for Large Language Models, https://arxiv.org/abs/2501.00254
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- NovaSky, Jan 2025, Sky-T1: Train your own O1 preview model within $450, https://novasky-ai.github.io/posts/sky-t1/
- Weichen Fan, Chenyang Si, Junhao Song, Zhenyu Yang, Yinan He, Long Zhuo, Ziqi Huang, Ziyue Dong, Jingwen He, Dongwei Pan, Yi Wang, Yuming Jiang, Yaohui Wang, Peng Gao, Xinyuan Chen, Hengjie Li, Dahua Lin, Yu Qiao, Ziwei Liu, 14 Jan 2025, Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models, https://arxiv.org/abs/2501.08453 (Efficient training of text-to-video models.)
- Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
- Xunyi Zhao, 2024, Optimizing Memory Usage when Training Deep Neural Networks, Ph.D. Thesis, Computer Science [cs], Université de Bordeaux, France, NNT: 2024BORD0411, tel-04890912, https://theses.hal.science/tel-04890912/file/ZHAO_XUNYI_2024.pdf
- Kaiyuan Tian, Linbo Qiao, Baihui Liu, Gongqingjian Jiang, Dongsheng Li, 21 Jan 2025, A Survey on Memory-Efficient Large-Scale Model Training in AI for Science, https://arxiv.org/abs/2501.11847
- Tanya Rodchenko, Natasha Noy, Nino Scherrer, Jennifer Prendki, 23 Jan 2025, Not Every AI Problem is a Data Problem: We Should Be Intentional About Data Scaling, https://arxiv.org/abs/2501.13779
- Tech Fund, Feb 03, 2025, The Winners from DeepSeek, Nvidia, and The Outlook in AI: A tour of the space & AI-exposed stocks, https://www.techinvestments.io/p/the-winners-from-deepseek-nvidia
- Thor Olavsrud, How DeepSeek changes the gen AI equation for CIOs, 30 Jan 2025, https://www.cio.com/article/3813555/what-cios-should-learn-now-that-deepseek-is-here.html (" the future of gen AI lies in innovative, cost-efficient approaches")
- Maxwell Zeff, February 5, 2025, Researchers created an open rival to OpenAI’s o1 ‘reasoning’ model for under $50, https://techcrunch.com/2025/02/05/researchers-created-an-open-rival-to-openais-o1-reasoning-model-for-under-50/
- Kyle Wiggers, January 11, 2025, Researchers open source Sky-T1, a ‘reasoning’ AI model that can be trained for less than $450, https://techcrunch.com/2025/01/11/researchers-open-source-sky-t1-a-reasoning-ai-model-that-can-be-trained-for-less-than-450/
- Di Chai, Pengbo Li, Feiyuan Zhang, Yilun Jin, Han Tian, Junxue Zhang, Kai Chen, 1 Feb 2025, Enhancing Token Filtering Efficiency in Large Language Model Training with Collider, https://arxiv.org/abs/2502.00340 (Token reduction in training.)
- XYZ Labs, Feb 23, 2025, Open Reasoner Zero: A Breakthrough in AI Training Efficiency Matches DeepSeek with Just 1/30th of Training Steps. Major AI Figures Including Kai-Fu Lee, Harry Shum, and Xiangyu Zhang Unveil Revolutionary Open-Source Training Method. https://xyzlabs.substack.com/p/open-reasoner-zero-a-breakthrough
- Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu, 24 Feb 2025, Thus Spake Long-Context Large Language Model, https://arxiv.org/abs/2502.17129 (Impressive survey of many techniques to improve efficiency and accuracy of long context processing in both inference and training, covering text, video and multimodal models.)
- J Lin, Z Liu, Y You, J Wang, W Zhang, R Zhao, 2025, WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training, PPoPP ’25, March 1–5, 2025, Las Vegas, NV, USA, https://dl.acm.org/doi/pdf/10.1145/3710848.3710869 https://doi.org/10.1145/3710848.3710869
- Eli Verwimp, Guy Hacohen, Tinne Tuytelaars, 28 Feb 2025, Same accuracy, twice as fast: continuous training surpasses retraining from scratch, https://arxiv.org/abs/2502.21147
- Hao Ge, Junda Feng, Qi Huang, Fangcheng Fu, Xiaonan Nie, Lei Zuo, Haibin Lin, Bin Cui, Xin Liu, 28 Feb 2025, ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs, https://arxiv.org/abs/2502.21231 (Addressing training inefficiencies when training data ranges from short to very long queries, including via hybrid data parallelism and communications optimizations.)
- Zhang, ZX., Wen, YB., Lyu, HQ. et al. AI Computing Systems for Large Language Models Training. J. Comput. Sci. Technol. 40, 6–41 (2025). https://doi.org/10.1007/s11390-024-4178-1 https://link.springer.com/article/10.1007/s11390-024-4178-1
- Dr. Ashish Bamania, Apr 26, 2025, You Don’t Need Backpropagation To Train Neural Networks Anymore: A deep dive into the ‘NoProp’ algorithm that eliminates the need for Forward pass and Backpropagation to train neural networks, and learning to code it from scratch, https://ai.gopubby.com/you-dont-need-backpropagation-to-train-neural-networks-anymore-e989d75564cb
- Chao Jin, Ziheng Jiang, Zhihao Bai, Zheng Zhong, Juncai Liu, Xiang Li, Ningxin Zheng, Xi Wang, Cong Xie, Qi Huang, Wen Heng, Yiyuan Ma, Wenlei Bao, Size Zheng, Yanghua Peng, Haibin Lin, Xuanzhe Liu, Xin Jin, Xin Liu, 19 May 2025 (v2), MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production, https://arxiv.org/abs/2505.11432
- MiniMax: Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, (and many more authors), 16 Jun 2025, MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention, https://arxiv.org/abs/2506.13585 https://github.com/MiniMax-AI/MiniMax-M1 (A 456B MoE reasoning model trained with RL and has various optimizations in training efficiency and attention kernel.)
- Michael Nuñez, July 11, 2025, Moonshot AI’s Kimi K2 outperforms GPT-4 in key benchmarks — and it’s free, https://venturebeat.com/ai/moonshot-ais-kimi-k2-outperforms-gpt-4-in-key-benchmarks-and-its-free/ (One trillion parameters with 32B experts activated each time. Examines new training optimizer MuonClip as more efficient and more stable than variants of AdamW for training.)
- Carles Gelada, Jacob Buckman, Sean Zhang, Txus Bach, 6 Jul 2025, Scaling Context Requires Rethinking Attention, https://arxiv.org/abs/2507.04239
- John Edwards, Jul 22, 2025 7 things you need to know about AI and the data center, https://www.cio.com/article/222623/7-things-to-know-about-ai-in-the-data-center.html
- Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro von Werra, Thomas Wolf, Feb 19, 2025, The Ultra-Scale Playbook: Training LLMs on GPU Clusters, Hugging Face, https://huggingface.co/spaces/nanotron/ultrascale-playbook https://huggingface.co/spaces/nanotron/ultrascale-playbook/resolve/main/The_Ultra-Scale_Playbook_Training_LLMs_on_GPU_Clusters.pdf
- Penghui Qi, Xinyi Wan, Guangxing Huang, Min Lin, 30 Nov 2023, Zero Bubble Pipeline Parallelism, https://arxiv.org/abs/2401.10241 https://github.com/sail-sg/zero-bubble-pipeline-parallelism
- Joel Lamy-Poirier, 6 Jul 2023 (v2), Breadth-First Pipeline Parallelism, https://arxiv.org/abs/2211.05953
- MiniMax, 2025, MiniMax-01: Scaling Foundation Models with Lightning Attention, https://filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf
- Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team, 14 Dec 2018, An Empirical Model of Large-Batch Training, OpenAI, https://arxiv.org/abs/1812.06162
- Cameron R. Wolfe, Ph.D., Apr 28, 2025, Llama 4: The Challenges of Creating a Frontier-Level LLM: The full story behind Llama 4 and Meta's huge pivot in research strategy, https://cameronrwolfe.substack.com/p/llama-4
- Yichun Yin, Wenyong Huang, Kaikai Song, Yehui Tang, Xueyu Wu, Wei Guo, Peng Guo, Yaoyuan Wang, Xiaojun Meng, Yasheng Wang, (many more authors), 11 Apr 2025 (v2), Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs, https://arxiv.org/abs/2504.07866 (135B model trained on 13.2T tokens using Ascend NPUs.)
- Sean Goedecke, Aug 2025, What's the strongest AI model you can train on a laptop in five minutes? https://www.seangoedecke.com/model-on-a-mbp/
- Ben Dickson, August 18, 2025, GEPA optimizes LLMs without costly reinforcement learning, https://venturebeat.com/ai/gepa-optimizes-llms-without-costly-reinforcement-learning/
- Yanjun Zheng, Xiyang Du, Longfei Liao, Xiaoke Zhao, Zhaowen Zhou, Bo Zhang, Jiawei Liu, Xiang Qi, Zhe Li, Zhiqiang Zhang, Wei Wang and Peng Zhang, 23 Jul 2025, Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning, https://arxiv.org/abs/2507.16802
- Fabian Schaipp, Alexander Hägele, Adrien Taylor, Umut Simsekli, Francis Bach, 23 Jul 2025, The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training, https://arxiv.org/abs/2501.18965
- Andrew Or, Apurva Jain, Daniel Vega-Myhre, Jesse Cai, Charles David Hernandez, Zhenrui Zheng, Driss Guessous, Vasiliy Kuznetsov, Christian Puhrsch, Mark Saroufim, Supriya Rao, Thien Tran, Aleksandar Samard\v{z}i\'c, 21 Jul 2025, TorchAO: PyTorch-Native Training-to-Serving Model Optimization, https://arxiv.org/abs/2507.16099
- Philip Zmushko, Aleksandr Beznosikov, Martin Takáč, Samuel Horváth, 14 Aug 2025, FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training, https://arxiv.org/abs/2411.07837
- Yue Hu, Zanxia Cao, Yingchao Liu, 26 Jul 2025, Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training, https://arxiv.org/abs/2507.19968
- Mayumi Nakano, Yuya Seki, Shuta Kikuchi, Shu Tanaka, 28 Jul 2025, Optimization Performance of Factorization Machine with Annealing under Limited Training Data, https://arxiv.org/abs/2507.21024
- Jiayi Tian, Jinming Lu, Hai Li, Xiangwei Wang, Cong Hao, Ian Young, Zheng Zhang, 6 Aug 2025, Ultra Memory-Efficient On-FPGA Training of Transformers via Tensor-Compressed Optimization, https://arxiv.org/abs/2501.06663
- Jiaqi Zhao, Weili Guan, Ming Li, Miao Zhang, 6 Aug 2025, Boost Post-Training Quantization via Null Space Optimization for Large Language Models, https://arxiv.org/abs/2506.11044
- Ziyin Gu, Jingyao Wang, Ran Zuo, Chuxiong Sun, Zeen Song, Changwen Zheng, Wenwen Qiang, 7 Aug 2025, Group Causal Policy Optimization for Post-Training Large Language Models, https://arxiv.org/abs/2508.05428
- Maxime Heuillet, Rishika Bhagwatkar, Jonas Ngnawé, Yann Pequignot, Alexandre Larouche, Christian Gagné, Irina Rish, Ola Ahmad, Audrey Durand, 12 Aug 2025, A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy, https://arxiv.org/abs/2508.14079
- Samiul Basir Bhuiyan, Md. Sazzad Hossain Adib, Mohammed Aman Bhuiyan, Muhammad Rafsan Kabir, Moshiur Farazi, Shafin Rahman, Nabeel Mohammed, 18 Aug 2025, Z-Pruner: Post-Training Pruning of Large Language Models for Efficiency without Retraining, https://arxiv.org/abs/2508.15828
- Danlong Yuan, Tian Xie, Shaohan Huang, Zhuocheng Gong, Huishuai Zhang, Chong Luo, Furu Wei, Dongyan Zhao, 22 Aug 2025, Efficient RL Training for Reasoning Models via Length-Aware Optimization, https://arxiv.org/abs/2505.12284
Fine-Tuning
Papers on fine-tuning optimizations:
- Libo Qin, Qiguang Chen, Xiachong Feng, Yang Wu, Yongheng Zhang, Yinghui Li, Min Li, Wanxiang Che, Philip S. Yu, 21 May 2024, Large Language Models Meet NLP: A Survey, https://arxiv.org/abs/2405.12819 (A survey of research into how LLMs, with and without fine-tuning, perform in various NLP use cases, such as mathematical reasoning, dialogue understanding, translation, and more.)
- Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu, 7 May 2024, FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference, https://arxiv.org/abs/2405.04065 (Optimize RAG by appending rather than prepending documents, and modifying the attention for improvements in KV caching, by shimming or replacing some of the CUDA GPU low-level memory management APIs to avoid the need to rewrite kernels with extra higher-level memory management code.)
- Benjue Weng, 13 Apr 2024, Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies, https://arxiv.org/abs/2404.09022 (Reviewing fine-tuning of large models.)
- Tal Peretz, 15 NOV 2023, The Developer's Guide to Production-Grade LLM Apps: Advanced Techniques for Maximizing LLM Performance, https://buildingaistuff.com/p/the-developers-guide-to-production
- Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim, 18 Jan 2024, Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation, https://arxiv.org/abs/2401.08417
- David Spuler, March 2024, Chapter 6. Training, Fine-Tuning & RAG, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- kipply's blog, 2023-03-30, Transformer Taxonomy (the last lit review), https://kipp.ly/transformer-taxonomy/ (Papers for all the Transformer architectures and milestone papers for the major optimization improvements on them.)
- Pranav Patel, 2024, In-depth guide to fine-tuning LLMs with LoRA and QLoRA, https://www.mercity.ai/blog-post/guide-to-fine-tuning-llms-with-lora-and-qlora
- Kai Lv, Yuqing Yang, Tengxiao Liu, Qinghui Gao, Qipeng Guo, Xipeng Qiu, 6 Jun 2024 (v2), Full Parameter Fine-tuning for Large Language Models with Limited Resources, https://arxiv.org/abs/2306.09782 Code: https://github.com/OpenLMLab/LOMO (Low-memory usage for full-parameter fine-tuning.)
- Louis-François Bouchard, Louie Peters, May 2024, Chapter 10: Fine-Tuning, Building LLMs for Production: Enhancing LLM Abilities and Reliability with Prompting, Fine-Tuning, and RAG, https://www.amazon.com/Building-LLMs-Production-Reliability-Fine-Tuning/dp/B0D4FFPFW8/
- Valentina Alto, 2024, Chapter 11: Fine-Tuning Large Language Models, Building LLM-Powered Applications: Create intelligent apps and agents with large language models, Packt Publishing, https://www.amazon.com/Building-LLM-Apps-Intelligent-Language/dp/1835462316/
- Aarushi Kansal, Chapter 5: Fine-Tuning: The Theory, and Chapter 6: Fine-Tuning: Hands-On, Building Generative AI-Powered Apps: A Hands-on Guide for Developers, Apress, https://www.amazon.com/Building-Generative-AI-Powered-Apps-Hands-ebook/dp/B0CTXXP1S4/
- Xinji Mai, Zeng Tao, Junxiong Lin, Haoran Wang, Yang Chang, Yanlan Kang, Yan Wang, Wenqiang Zhang, 27 Jun 2024, From Efficient Multimodal Models to World Models: A Survey, https://arxiv.org/abs/2407.00118 (A survey of multimodal models with coverage of many optimization techniques.)
- Yi Zhou, Dec 16, 2023, Optimizing GenAI: Comparing Model Training, Fine-Tuning, RAG, and Prompt Engineering, https://medium.com/generative-ai-revolution-ai-native-transformation/optimizing-genai-comparing-model-training-fine-tuning-rag-and-prompt-engineering-7a7c6c65e0f0
- Dan Peng, Zhihui Fu, Jun Wang, 1 Jul 2024, PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs, https://arxiv.org/abs/2407.01031 (Running fine-tuning on a smartphone via a low-memory optimization using a "derivative-free" "zeroth-order" technique called MeZo, with advantages such as privacy.)
- OpenAI, August 20, 2024, Fine-tuning now available for GPT-4o, https://openai.com/index/gpt-4o-fine-tuning/
- Judy Hanwen Shen, Inioluwa Deborah Raji, Irene Y. Chen, 8 Aug 2024, The Data Addition Dilemma, https://arxiv.org/abs/2408.04154
- Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen, 28 May 2024 (v3) Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark, https://arxiv.org/abs/2402.11592 Code: https://github.com/ZO-Bench/ZO-LLM
- Junjie Ye, Yuming Yang, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan, 24 Sep 2024, Empirical Insights on Fine-Tuning Large Language Models for Question-Answering, https://arxiv.org/abs/2409.15825
- Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
- Foundry AI, Oct 2024, When Should You Move Beyond Prompting and Start Fine-Tuning? https://thefoundryai.com/blog/fine-tuning
- Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid, 30 Oct 2024 (v3), The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities, https://arxiv.org/abs/2408.13296
- Angels Balaguer, Vinamra Benara, Renato Luiz de Freitas Cunha, Roberto de M. Estevão Filho, Todd Hendry, Daniel Holstein, Jennifer Marsman, Nick Mecklenburg, Sara Malvar, Leonardo O. Nunes, Rafael Padilha, Morris Sharp, Bruno Silva, Swati Sharma, Vijay Aski, Ranveer Chandra, 30 Jan 2024 (v3), RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture, https://arxiv.org/abs/2401.08406
- Towards AI, December 24, 2024, Llm Fine Tuning Guide: Do You Need It and How to Do It https://towardsai.net/p/artificial-intelligence/llm-fine-tuning-guide-do-you-need-it-and-how-to-do-it-4
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
- Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, Jianfeng Gao, 8 Mar 2025, A Survey on Post-training of Large Language Models, https://arxiv.org/abs/2503.06072
- Maxime Heuillet, Yufei Cui, Boxing Chen, Audrey Durand, Prasanna Parthasarathi, 13 Aug 2025, Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts, https://arxiv.org/abs/2508.10123
- Tianjun Yuan, Jiaxiang Geng, Pengchao Han, Xianhao Chen, Bing Luo, 14 Aug 2025, Flexible Personalized Split Federated Learning for On-Device Fine-Tuning of Foundation Models, https://arxiv.org/abs/2508.10349
- Dongyue Li and Hongyang R. Zhang, 13 Aug 2025, Improved Regularization and Robustness for Fine-tuning in Neural Networks, https://arxiv.org/abs/2111.04578
- Yanxia Deng, Aozhong Zhang, Selcuk Gurses, Naigang Wang, Zi Yang and Penghang Yin, 14 Aug 2025, CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization, https://arxiv.org/abs/2501.18475
- Suhas G Hegde, Shilpy Kaur, Aruna Tiwari, 14 Aug 2025, VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models, https://arxiv.org/abs/2503.19530
- Andrew P. Berg, Qian Zhang, Mia Y. Wang, 14 Aug 2025, 15,500 Seconds: Lean UAV Classification Using EfficientNet and Lightweight Fine-Tuning, https://arxiv.org/abs/2506.11049
- Solène Debuysère, Nicolas Trouvé, Nathan Letheule, Olivier Lévêque, Elise Colin, 14 Aug 2025, Quantitative Comparison of Fine-Tuning Techniques for Pretrained Latent Diffusion Models in the Generation of Unseen SAR Images, https://arxiv.org/abs/2506.13307
- Gabriel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong, 23 Jul 2025, LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning, https://arxiv.org/abs/2506.15606
- Simon Ouellette, 17 Jul 2025, Out-of-Distribution Generalization in the ARC-AGI Domain: Comparing Execution-Guided Neural Program Synthesis and Test-Time Fine-Tuning, https://arxiv.org/abs/2507.15877
- Boheng Li, Renjie Gu, Junjie Wang, Leyi Qi, Yiming Li, Run Wang, Zhan Qin, Tianwei Zhang, 22 Jul 2025, Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning, https://arxiv.org/abs/2507.16302
- Helena Casademunt, Caden Juang, Adam Karvonen, Samuel Marks, Senthooran Rajamanoharan, Neel Nanda, 22 Jul 2025, Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning, https://arxiv.org/abs/2507.16795
- Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dongsheng Li, 22 Jul 2025, Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance, https://arxiv.org/abs/2407.17029
- Furong Peng, Jinzhen Gao, Xuan Lu, Kang Liu, Yifan Huo, Sheng Wang, 22 Jul 2025, Towards a deeper GCN: Alleviate over-smoothing with iterative training and fine-tuning, https://arxiv.org/abs/2506.17576
- Binghua Li, Ziqing Chang, Tong Liang, Chao Li, Toshihisa Tanaka, Shigeki Aoki, Qibin Zhao, Zhe Sun, 24 Jul 2025, Parameter-Efficient Fine-Tuning of 3D DDPM for MRI Image Generation Using Tensor Networks, https://arxiv.org/abs/2507.18112
- Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Mi Tian, Hua Huang, 24 Jul 2025, Zeroth-Order Fine-Tuning of LLMs in Random Subspaces, https://arxiv.org/abs/2410.08989
- Tim Rensmeyer, Denis Kramer, Oliver Niggemann, 18 Jul 2025, On-the-Fly Fine-Tuning of Foundational Neural Network Potentials: A Bayesian Neural Network Approach, https://arxiv.org/abs/2507.13805
- Amro Abdalla, Ismail Shaheen, Dan DeGenaro, Rupayan Mallick, Bogdan Raita, Sarah Adel Bargal, 18 Jul 2025, GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention, https://arxiv.org/abs/2507.13598
- Rafiq Kamel, Filippo Guerranti, Simon Geisler, Stephan Günnemann, 15 Jul 2025, SAFT: Structure-Aware Fine-Tuning of LLMs for AMR-to-Text Generation, https://arxiv.org/abs/2507.13381
- Qitao Tan, Jun Liu, Zheng Zhan, Caiwei Ding, Yanzhi Wang, Xiaolong Ma, Jaewoo Lee, Jin Lu, Geng Yuan, 18 Jul 2025, Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning, https://arxiv.org/abs/2502.03304
- Harsh Nilesh Pathak and Randy Paffenroth, 18 Jul 2025, Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers, https://arxiv.org/abs/2507.14353
- Fufang Wen and Shichang Zhang, 14 Jul 2025, Retention analysis of edited knowledge after fine-tuning, https://arxiv.org/abs/2507.14198
- Yujia Tong, Jingling Yuan, Tian Zhang, Jianquan Liu, Chuang Hu, 19 Jul 2025, DFQ-ViT: Data-Free Quantization for Vision Transformers without Fine-tuning, https://arxiv.org/abs/2507.14481
- Wooseok Ha, Yuansi Chen, 19 Jul 2025, When few labeled target data suffice: a theory of semi-supervised domain adaptation via fine-tuning from multiple adaptive starts, https://arxiv.org/abs/2507.14661
- Roy H. Jennings, Genady Paikin, Roy Shaul, Evgeny Soloveichik, 20 Jul 2025, Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression, https://arxiv.org/abs/2507.14997
- Hanyang Zhao, Haoxian Chen, Yucheng Guo, Genta Indra Winata, Tingting Ou, Ziyu Huang, David D. Yao, Wenpin Tang, 19 Jul 2025, Fine-Tuning Diffusion Generative Models via Rich Preference Optimization, https://arxiv.org/abs/2503.11720
- Xingke Yang, Liang Li, Sicong Li, Liwei Guan, Hao Wang, Xiaoqi Qi, Jiang Liu, Xin Fu, Miao Pan, 9 Aug 2025, Fed MobiLLM: Efficient Federated LLM Fine-Tuning over Heterogeneous Mobile Devices via Server Assisted Side-Tuning, https://arxiv.org/abs/2508.06765
- Brendan R. Hogan, Will Brown, Adel Boyarsky, Anderson Schneider, Yuriy Nevmyvaka, 9 Aug 2025, Technical Report: Full-Stack Fine-Tuning for the Q Programming Language, https://arxiv.org/abs/2508.06813
- Amal Saadallah, Abdulaziz Al-Ademi, 11 Aug 2025, Adaptive Fine-Tuning via Pattern Specialization for Deep Time Series Forecasting, https://arxiv.org/abs/2508.07927
- Bujar Raufi, 10 Aug 2025, Fine-Tuning Large Language Models Using EEG Microstate Features for Mental Workload Assessment, https://arxiv.org/abs/2508.07283
- Zhaorui Tan, Tan Pan, Kaizhu Huang, Weimiao Yu, Kai Yao, Chen Jiang, Qiufeng Wang, Anh Nguyen, Xin Guo, Yuan Cheng, Xi Yang, 11 Aug 2025, Exploiting Layer Normalization Fine-tuning in Visual Transformer Foundation Models for Classification, https://arxiv.org/abs/2508.07577
- Vishwas M. Shetty, Jiusi Zheng, Abeer Alwan, 11 Aug 2025, G-IFT: A Gated Linear Unit adapter with Iterative Fine-Tuning for Low-Resource Children's Speaker Verification, https://arxiv.org/abs/2508.07836
- Xingke Yang, Liang Li, Zhiyi Wan, Sicong Li, Xiaoqi Qi, Jiang Liu, Tomoaki Ohtsuki, Xin Fu, Miao Pan, 9 Aug 2025, PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning, https://arxiv.org/abs/2507.01216
- Mohammad Mehdi Rastikerdar, Jin Huang, Hui Guan, Deepak Ganesan, 11 Aug 2025, In-Situ Fine-Tuning of Wildlife Models in IoT-Enabled Camera Traps for Efficient Adaptation, https://arxiv.org/abs/2409.07796
- Qingguo Wang, 10 Aug 2025, Accurate Measles Rash Detection via Vision Transformer Fine-Tuning, https://arxiv.org/abs/2005.09112
- Atharva Nijasure, Tanya Chowdhury, James Allan, 10 Aug 2025, How Relevance Emerges: Interpreting LoRA Fine-Tuning in Reranking LLMs, https://arxiv.org/abs/2504.08780
- Yining Huang, Bin Li, Keke Tang, Meilian Chen, 28 Jul 2025, LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning, https://arxiv.org/abs/2507.20999
- Roman Macháček, Anastasiia Grishina, Max Hort, Leon Moonen, 26 Jul 2025, The Impact of Fine-tuning Large Language Models on Automated Program Repair, https://arxiv.org/abs/2507.19909
- Fabrizio Nunnari, Alakshendra Jyotsnaditya Ramkrishna Singh, Patrick Gebhard, 27 Jul 2025, Color histogram equalization and fine-tuning to improve expression recognition of (partially occluded) faces on sign language datasets, https://arxiv.org/abs/2507.20197
- Wei Lu, Daniel L. Chen, Christian B. Hansen, 28 Jul 2025, Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach, https://arxiv.org/abs/2507.20796
- Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin, 28 Jul 2025, Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards, https://arxiv.org/abs/2505.16789
- Yifu Han and Geo Zhang, 27 Jul 2025, Reinforcement learning fine-tuning of language model for instruction following and math reasoning, https://arxiv.org/abs/2506.21560
- Zixuan Chen, Weikai Lu, Xin Lin, Ziqian Zeng, 27 Jul 2025, SDD: Self-Degraded Defense against Malicious Fine-tuning, https://arxiv.org/abs/2507.21182
- Zengyang Li, Yimeng Li, Binbin Huang, Peng Liang, Ran Mo, Hui Liu, Yutao Ma, 29 Jul 2025, Fine-Tuning Code Language Models to Detect Cross-Language Bugs, https://arxiv.org/abs/2507.21954
- Aly M. Kassem, Zhuan Shi, Negar Rostamzadeh, Golnoosh Farnadi, 19 Jun 2025, Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing, https://arxiv.org/abs/2507.21084
- Georg Slamanig, Francesco Corti, Olga Saukh, 31 Jul 2025, From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices, https://arxiv.org/abs/2507.23536
- Sirine Arfa, Bernhard Vogginger, Christian Mayr, 31 Jul 2025, Hardware-Aware Fine-Tuning of Spiking Q-Networks on the SpiNNaker2 Neuromorphic Platform, https://arxiv.org/abs/2507.23562
- Yan Zhu, Jingyang Zhu, Ting Wang, Yuanming Shi, Chunxiao Jiang and Khaled Ben Letaief, 31 Jul 2025, Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks, https://arxiv.org/abs/2504.10403
- Wei Guo, Siyuan Lu, Yiqi Tong, Zhaojun Hu, Fuzhen Zhuang, Xiao Zhang, Tao Fan, Jin Dong, 31 Jul 2025, H2Tune: Federated Foundation Model Fine-Tuning with Hybrid Heterogeneity, https://arxiv.org/abs/2507.22633
- Vishwesh Ramanathan, Tony Xu, Pushpak Pati, Faruk Ahmed, Maged Goubran, Anne L. Martel, 30 Jul 2025, ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology, https://arxiv.org/abs/2503.17564
- Afshin Khadangi, Amir Sartipi, Igor Tchappi, Ramin Bahmani, Gilbert Fridgen, 30 Jul 2025, Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning, https://arxiv.org/abs/2507.22565
- Yebo Wu, Jingguang Li, Zhijiang Guo and Li Li, 31 Jul 2025, Learning Like Humans: Resource-Efficient Federated Fine-Tuning through Cognitive Developmental Stages, https://arxiv.org/abs/2508.00041
- Paul Albert, Frederic Z. Zhang, Hemanth Saratchandran, Anton van den Hengel, Ehsan Abbasnejad, 1 Aug 2025, Towards Higher Effective Rank in Parameter-efficient Fine-tuning using Khatri-Rao Product, https://arxiv.org/abs/2508.00230
- Shayan Jalilian, Abdul Bais, 31 Jul 2025, SAM-PTx: Text-Guided Fine-Tuning of SAM with Parameter-Efficient, Parallel-Text Adapters, https://arxiv.org/abs/2508.00213
- Prerana Ramkumar, 1 Aug 2025, SU-ESRGAN: Semantic and Uncertainty-Aware ESRGAN for Super-Resolution of Satellite and Drone Imagery with Fine-Tuning for Cross Domain Evaluation, https://arxiv.org/abs/2508.00750
- Julian Lemmel, Manuel Kranzl, Adam Lamine, Philipp Neubauer, Radu Grosu, Sophie Neubauer, 1 Aug 2025, Online Fine-Tuning of Carbon Emission Predictions using Real-Time Recurrent Learning for State Space Models, https://arxiv.org/abs/2508.00804
- Derin Cayir, Renjie Tao, Rashi Rungta, Kai Sun, Sean Chen, Haidar Khan, Minseok Kim, Julia Reinspach, Yue Liu, 3 Aug 2025, Refine-n-Judge: Curating High-Quality Preference Chains for LLM-Fine-Tuning, https://arxiv.org/abs/2508.01543
- Yixin Shen, 4 Aug 2025, Kronecker-LoRA: hybrid Kronecker-LoRA adapters for scalable, sustainable fine-tuning, https://arxiv.org/abs/2508.01961
- Amitava Das, Abhilekh Borah, Vinija Jain, Aman Chadha, 4 Aug 2025, AlignGuard-LoRA: Alignment-Preserving Fine-Tuning via Fisher-Guided Decomposition and Riemannian-Geodesic Collision Regularization, https://arxiv.org/abs/2508.02079
- Yilun Liu, Yunpu Ma, Yuetian Lu, Shuo Chen, Zifeng Ding, Volker Tresp, 4 Aug 2025, Parameter-Efficient Routed Fine-Tuning: Mixture-of-Experts Demands Mixture of Adaptation Modules, https://arxiv.org/abs/2508.02587
- Dongchi Huang, Zhirui Fang, Tianle Zhang, Yihang Li, Lin Zhao, Chunhe Xia, 4 Aug 2025, CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning, https://arxiv.org/abs/2508.02219
- Ayan Sengupta, Vaibhav Seth, Arinjay Pathak, Aastha Verma, Natraj Raman, Sriram Gopalakrishnan, Niladri Chatterjee, Tanmoy Chakraborty, 3 Aug 2025, Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation, https://arxiv.org/abs/2411.04358
- Yinbin Han, Meisam Razaviyayn, Renyuan Xu, 3 Aug 2025, Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence, https://arxiv.org/abs/2412.18164
- Jack Chen, Fazhong Liu, Naruto Liu, Yuhan Luo, Erqu Qin, Harry Zheng, Tian Dong, Haojin Zhu, Yan Meng, Xiao Wang, 4 Aug 2025, Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs, https://arxiv.org/abs/2505.13026
- Yidong Chai, Yang Liu, Yonghang Zhou, Jiaheng Xie, Daniel Dajun Zeng, 31 Jul 2025, A Bayesian Hybrid Parameter-Efficient Fine-Tuning Method for Large Language Models, https://arxiv.org/abs/2508.02711
- Jingyi Chen, Ju Seung Byun, Micha Elsner, Pichao Wang, Andrew Perrault, 5 Aug 2025, Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback, https://arxiv.org/abs/2508.03123
- Yutong Chen, Jiandong Gao, Ji Wu, 5 Aug 2025, Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning, https://arxiv.org/abs/2505.17988
- Joel Walsh, Siddarth Mamidanna, Benjamin Nye, Mark Core, and Daniel Auerbach, 6 Aug 2025, Fine-tuning for Better Few Shot Prompting: An Empirical Comparison for Short Answer Grading, https://arxiv.org/abs/2508.04063
- Ali Taheri Ghahrizjani, Alireza Taban, Qizhou Wang, Shanshan Ye, Abdolreza Mirzaei, Tongliang Liu, Bo Han, 6 Aug 2025, Forgetting: A New Mechanism Towards Better Large Language Model Fine-tuning, https://arxiv.org/abs/2508.04329
- Liujian Tang, Shaokang Dong, Yijia Huang, Minqi Xiang, Hongtao Ruan, Bin Wang, Shuo Li, Zhihui Cao, Hailiang Pang, Heng Kong, He Yang, Mingxu Chai, Zhilin Gao, Xingyu Liu, Yingnan Fu, Jiaming Liu, Tao Gui, Xuanjing Huang, Yu-Gang Jiang, Qi Zhang, Kang Wang, Yunke Zhang, Yuran Wang, 19 Jul 2025, MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning, https://arxiv.org/abs/2508.03700
- Yanjie Dong, Haijun Zhang, Chengming Li, Song Guo, Victor C. M. Leung, Xiping Hu, 6 Aug 2025, Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches, https://arxiv.org/abs/2408.10691
- Bohao Wu, Qingyun Wang, Yue Guo, 6 Aug 2025, Explain Less, Understand More: Jargon Detection via Personalized Parameter-Efficient Fine-tuning, https://arxiv.org/abs/2505.16227
- Mahdi Nazari Ashani, Ali Asghar Alesheikh, Saba Kazemi, Kimya Kheirkhah, Yasin Mohammadi, Fatemeh Rezaie, Amir Mahdi Manafi, Hedieh Zarkesh, 6 Aug 2025, Fine-Tuning Small Language Models (SLMs) for Autonomous Web-based Geographical Information Systems (AWebGIS), https://arxiv.org/abs/2508.04846
- Chang Tian, Matthew B. Blaschko, Mingzhe Xing, Xiuxing Li, Yinliang Yue, Marie-Francine Moens, 6 Aug 2025, Large Language Models Reasoning Abilities Under Non-Ideal Conditions After RL-Fine-Tuning, https://arxiv.org/abs/2508.04848
- Nan Li, Wanting Yang, Marie Siew, Zehui Xiong, Binbin Chen, Shiwen Mao, Kwok-Yan Lam, 6 Aug 2025, Edge-Assisted Collaborative Fine-Tuning for Multi-User Personalized Artificial Intelligence Generated Content (AIGC), https://arxiv.org/abs/2508.04745
- Dai Do, Manh Nguyen, Svetha Venkatesh, Hung Le, 7 Aug 2025, SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models, https://arxiv.org/abs/2508.05015
- Zhongheng Yang, Aijia Sun, Yushang Zhao, Yinuo Yang, Dannier Li, Chengrui Zhou, 7 Aug 2025, RLHF Fine-Tuning of LLMs for Alignment with Implicit User Feedback in Conversational Recommenders, https://arxiv.org/abs/2508.05289
- Younwoo Choi, Muhammad Adil Asif, Ziwen Han, John Willes, Rahul G. Krishnan, 7 Aug 2025, Teaching LLMs How to Learn with Contextual Fine-Tuning, https://arxiv.org/abs/2503.09032
- Jin Khye Tan, En Jun Choong, Ethan Jeremiah Chitty, Yan Pheng Choo, John Hsin Yang Wong, Chern Eu Cheah, 4 Aug 2025, Fine-Tuning Vision-Language Models for Markdown Conversion of Financial Tables in Malaysian Audited Financial Reports, https://arxiv.org/abs/2508.05669
- Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Guanggang Geng, Zhiying Li, Jian Weng, 6 Aug 2025, DMFI: Dual-Modality Fine-Tuning and Inference Framework for LLM-Based Insider Threat Detection, https://arxiv.org/abs/2508.05694
- Han Gao, Timo Hartmann, Botao Zhong, Kai Lia, Hanbin Luo, 5 Aug 2025, Domain-Specific Fine-Tuning and Prompt-Based Learning: A Comparative Study for developing Natural Language-Based BIM Information Retrieval Systems, https://arxiv.org/abs/2508.05676
- Jucheng Hu, Surong Yang, Lijun Wu, Dongzhan Zhou, 8 Aug 2025, DONOD: Efficient and Generalizable Instruction Fine-Tuning for LLMs via Model-Intrinsic Dataset Pruning, https://arxiv.org/abs/2504.14810
- Mahmoud Salhab, Shameed Sait, Mohammad Abusheikh, Hasan Abusheikh, 12 Aug 2025, Munsit at NADI 2025 Shared Task 2: Pushing the Boundaries of Multidialectal Arabic ASR with Weakly Supervised Pretraining and Continual Supervised Fine-tuning, https://arxiv.org/abs/2508.08912
- Dong Wang, Haris Šikić, Lothar Thiele, Olga Saukh, 12 Aug 2025, Forget the Data and Fine-Tuning! Just Fold the Network to Compress, https://arxiv.org/abs/2502.10216
- Sajjad Ghiasvand, Haniyeh Ehsani Oskouie, Mahnoosh Alizadeh, Ramtin Pedarsani, 12 Aug 2025, Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models, https://arxiv.org/abs/2505.15130
- Liang Chen, Xueting Han, Li Shen, Jing Bai, Kam-Fai Wong, 12 Aug 2025, Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning, https://arxiv.org/abs/2506.03850
- Jan Tauberschmidt, Sophie Fellenz, Sebastian J. Vollmer, Andrew B. Duncan, 5 Aug 2025, Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems, https://arxiv.org/abs/2508.09156
- Bokeng Zheng, Jianqiang Zhong, Jiayi Liu, Xiaoxi Zhang, 13 Aug 2025, Decentralized Rank Scheduling for Energy-Constrained Multi-Task Federated Fine-Tuning in Edge-Assisted IoV Networks, https://arxiv.org/abs/2508.09532
- Zainab Khan, Ahmed Hussain, Mukesh Thakur, Arto Hellas, and Panos Papadimitratos, 12 Aug 2025, NEFMind: Parameter-Efficient Fine-Tuning of Open-Source LLMs for Telecom APIs Automation, https://arxiv.org/abs/2508.09240
- Basile Lewandowski, Robert Birke, Lydia Y. Chen, 14 Aug 2025, Match & Choose: Model Selection Framework for Fine-tuning Text-to-Image Diffusion Models, https://arxiv.org/abs/2508.10993
- Wenhao Zhang, Yuexiang Xie, Yuchang Sun, Yanxi Chen, Guoyin Wang, Yaliang Li, Bolin Ding, Jingren Zhou, 15 Aug 2025, On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting, https://arxiv.org/abs/2508.11408
- Baihong Qian, Haotian Fan, Wenjie Liao, Yunqiu Wang, Tao Li, and Junhui Cui, 15 Aug 2025, Better Supervised Fine-tuning for VQA: Integer-Only Loss, https://arxiv.org/abs/2508.11170
- Axel Delaval, Shujian Yang, Haicheng Wang, Han Qiu, Jialiang Lu, 15 Aug 2025, ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection, https://arxiv.org/abs/2508.11281
- Yuan Li, Zhengzhong Liu, and Eric Xing, 16 Aug 2025, Data Mixing Optimization for Supervised Fine-Tuning of Large Language Models, https://arxiv.org/abs/2508.11953
- Daria Diatlova, Nikita Balagansky, Alexander Varlamov, Egor Spirin, 16 Aug 2025, VARAN: Variational Inference for Self-Supervised Speech Models Fine-Tuning on Downstream Tasks, https://arxiv.org/abs/2508.12061
- Minseon Kim, Jin Myung Kwak, Lama Alssum, Bernard Ghanem, Philip Torr, David Krueger, Fazl Barez, Adel Bibi, 17 Aug 2025, Rethinking Safety in LLM Fine-tuning: An Optimization Perspective, https://arxiv.org/abs/2508.12531
- Yuhao Zhou, Jindi Lv, Yuxin Tian, Dan Si, Qing Ye, Jiancheng Lv, 18 Aug 2025, Deploying Models to Non-participating Clients in Federated Learning without Fine-tuning: A Hypernetwork-based Approach, https://arxiv.org/abs/2508.12673
- Manning Zhu, Songtao Guo, Pengzhan Zhou, Yansong Ning, Chang Han, Dewen Qiao, 18 Aug 2025, FedSODA: Federated Fine-tuning of LLMs via Similarity Group Pruning and Orchestrated Distillation Alignment, https://arxiv.org/abs/2508.12727
- Julia Sammartino, Libby Barak, Jing Peng, Anna Feldman, 15 Aug 2025, When Does Language Transfer Help? Sequential Fine-Tuning for Cross-Lingual Euphemism Detection, https://arxiv.org/abs/2508.11831
- Shiwei Li, Xiandi Luo, Xing Tang, Haozhao Wang, Hao Chen, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li, 17 Aug 2025, Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics, https://arxiv.org/abs/2505.23194
- Rafi Ibn Sultan, Chengyin Li, Hui Zhu, Prashant Khanduri, Marco Brocanelli, Dongxiao Zhu, 15 Aug 2025, GeoSAM: Fine-tuning SAM with Multi-Modal Prompts for Mobility Infrastructure Segmentation, https://arxiv.org/abs/2311.11319
- Keyu Chen, Wenchao Sun, Hao Cheng, Sifa Zheng, 18 Aug 2025, RIFT: Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation, https://arxiv.org/abs/2505.03344
- Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee, 19 Aug 2025, Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation, https://arxiv.org/abs/2508.14031
- Hassan Barmandah, 19 Aug 2025, Saudi-Dialect-ALLaM: LoRA Fine-Tuning for Dialectal Arabic Generation, https://arxiv.org/abs/2508.13525
- Eric Nuertey Coleman, Luigi Quarantiello, Ziyue Liu, Qinwen Yang, Samrat Mukherjee, Julio Hurtado and Vincenzo Lomonaco, 19 Aug 2025, Parameter-Efficient Continual Fine-Tuning: A Survey, https://arxiv.org/abs/2504.13822
- Yajie Zhou, Xiaoyi Pang, Zhibo Wang, 20 Aug 2025, AFLoRA: Adaptive Federated Fine-Tuning of Large Language Models with Resource-Aware Low-Rank Adaption, https://arxiv.org/abs/2505.24773
- Xujia Wang, Yunjia Qi, Bin Xu, 20 Aug 2025, LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization, https://arxiv.org/abs/2507.04487
- Mayla R. Boguslav, Adam Kiehl, David Kott, G. Joseph Strecker, Tracy Webb, Nadia Saklou, Terri Ward, Michael Kirby, 20 Aug 2025, Fine-tuning foundational models to code diagnoses from veterinary health records, https://arxiv.org/abs/2410.15186
- Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, Jun Wang, 22 Aug 2025, AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs, https://arxiv.org/abs/2508.16153
- Sungmin Kang, Jisoo Kim, Salman Avestimehr, Sunwoo Lee, 22 Aug 2025, GEM: A Scale-Aware and Distribution-Sensitive Sparse Fine-Tuning Framework for Effective Downstream Adaptation, https://arxiv.org/abs/2508.16191
- Hangzhan Jin, Sicheng Lv, Sifan Wu, Mohammad Hamdaqa, 22 Aug 2025, RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs, https://arxiv.org/abs/2508.16546
- Wenqiao Zhu, Ji Liu, Rongjuncheng Zhang, Haipang Wu, Yulun Zhang, 21 Aug 2025, CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning, https://arxiv.org/abs/2508.15868
- Sajjad Ghiasvand, Mahnoosh Alizadeh, Ramtin Pedarsani, 21 Aug 2025, Decentralized Low-Rank Fine-Tuning of Large Language Models, https://arxiv.org/abs/2501.15361
- Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao and Wenpin Tang, 21 Aug 2025, Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning, https://arxiv.org/abs/2502.01819
- Jack Youstra, Mohammed Mahfoud, Yang Yan, Henry Sleight, Ethan Perez, Mrinank Sharma, 23 Aug 2025, Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks, https://arxiv.org/abs/2508.17158
- Wenhong Zhu, Ruobing Xie, Rui Wang, Xingwu Sun, Di Wang, Pengfei Liu, 25 Aug 2025, Proximal Supervised Fine-Tuning, https://arxiv.org/abs/2508.17784
- Bin Pan, Shiyu Shen, Zongbin Wang, Zhenwei Shi and Xia Xu, 23 Aug 2025, Preserving Domain Generalization in Fine-Tuning via Joint Parameter Selection, https://arxiv.org/abs/2508.16976
- Haojie Zhang, 24 Aug 2025, DropLoRA: Sparse Low-Rank Adaptation for Parameter-Efficient Fine-Tuning, https://arxiv.org/abs/2508.17337
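Many of the fine-tuning papers above are variants of LoRA-style Parameter-Efficient Fine-Tuning, where the pretrained weights stay frozen and only a small low-rank update is trained. As a rough illustration of the core idea, here is a minimal PyTorch sketch; the LoRALinear wrapper and the rank/alpha defaults are illustrative assumptions, not the implementation of any paper listed above:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wrap a frozen linear layer with a trainable low-rank update:
    # y = W x + (alpha / r) * B A x, with A of shape (r, d_in) and B of shape (d_out, r).
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: the update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# Only the adapter matrices A and B receive gradients:
layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12,288 trainable adapter weights versus roughly 590,000 frozen base weights

The same pattern underlies many of the variants above: QLoRA quantizes the frozen base weights, multi-LoRA serves several adapters over one shared base model, and federated LoRA schemes exchange only the small A and B matrices between clients.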
Data Sets
Research papers on datasets used for training:
- Sean Williams, James Huckle, 30 May 2024, Easy Problems That LLMs Get Wrong, https://arxiv.org/abs/2405.19616 Code: https://github.com/autogenai/easy-problems-that-llms-get-wrong
- Raghav Jain, Daivik Sojitra, Arkadeep Acharya, Sriparna Saha, Adam Jatowt, Sandipan Dandapat, December 2023, Do Language Models Have a Common Sense regarding Time? Revisiting Temporal Commonsense Reasoning in the Era of Large Language Models, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing https://aclanthology.org/2023.emnlp-main.418/ PDF: https://aclanthology.org/2023.emnlp-main.418.pdf
- Gayathri Saranathan, Mahammad Parwez Alam, James Lim, Suparna Bhattacharya, Soon Yee Wong, Foltin Martin & Cong Xu, 2024, DELE: Data Efficient LLM Evaluation, Hewlett Packard Labs, Navigating and Addressing Data Problems for Foundation Models (DPFM) Workshop, ICLR 2024, https://openreview.net/pdf?id=I8bsxPWLNF
- You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying Wei, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor) Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
- Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge, Jan 2024, Understanding LLMs: A Comprehensive Overview from Training to Inference https://arxiv.org/abs/2401.02038
- Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen, Nov 2023, A Survey of Large Language Models, https://arxiv.org/abs/2303.18223
- Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao M. Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan, 26 Feb 2024, MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT, https://arxiv.org/abs/2402.16840 Code: https://github.com/mbzuai-oryx/MobiLlama
- Pratyush Maini, Skyler Seto, He Bai, David Grangier, Yizhe Zhang, Navdeep Jaitly, 29 Jan 2024, Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling, https://arxiv.org/abs/2401.16380
- Cobus Greyling, Dec 2023, A Comprehensive Survey of Large Language Models (LLMs), https://cobusgreyling.medium.com/a-comprehensive-survey-of-large-language-models-llms-946a30d9288e
- Ankit Patel, June 14, 2024, NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models, https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/
- Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Ludwig Schmidt, 17 Jun 2024, MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens, https://arxiv.org/abs/2406.11271
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- NVIDIA, June 2024, Nemotron-4 340B Technical Report, https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_4_340B_8T_0.pdf (Architecture is decoder-only with GQA, SentencePiece tokenizer, causal attention masks, RoPE, 96 layers, 96 heads, 8 KV heads, 256,000 vocabulary, 18432 internal dimension, context window 4096, and uses squared RELU.)
- Piotr Skalski, June 20, 2024, Florence-2: Open Source Vision Foundation Model by Microsoft, https://blog.roboflow.com/florence-2/
- Sharon Goldman, August 24, 2024, The hidden reason AI costs are soaring—and it’s not because Nvidia chips are more expensive, https://fortune.com/2024/08/23/data-labeling-ai-scaleai-snorkel-costs/ (The high cost of data labeling.)
- Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao, 5 Feb 2024, A Survey on Transformer Compression, https://arxiv.org/abs/2402.05964 (Model compression survey paper with focus on pruning, quantization, knowledge distillation, and efficient architecture design.)
- Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao, 20 Feb 2024 (v2), Large Language Models: A Survey, https://arxiv.org/abs/2402.06196
- Bloomberg, 16 Feb 2024, Reddit Signs AI Content Licensing Deal Ahead of IPO, https://www.bloomberg.com/news/articles/2024-02-16/reddit-is-said-to-sign-ai-content-licensing-deal-ahead-of-ipo?srnd=undefined&sref=b0SdE1lu&tpcc=NL_Marketing
- Pablo Villalobos, Anson Ho, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Marius Hobbhahn, Jun 06, 2024, Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data, Epoch AI, https://epochai.org/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data
- Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, 9 Mar 2024, Algorithmic progress in language models, https://arxiv.org/abs/2403.05812
- Georgia Argyro, Angeliki Dimitriou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou, 10 Sep 2024, Prompt2Fashion: An automatically generated fashion dataset, https://arxiv.org/abs/2409.06442
- Qinzhuo Wu, Weikai Xu, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Shuo Shang, 23 Sep 2024, MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding, https://arxiv.org/abs/2409.14818
- Douglas C. Youvan, September 27, 2024, Building and Running Large-Scale Language Models: The Infrastructure and Techniques Behind GPT-4, https://www.researchgate.net/profile/Douglas-Youvan/publication/384398902_Building_and_Running_Large-Scale_Language_Models_The_Infrastructure_and_Techniques_Behind_GPT-4/links/66f6f4d3906bca2ac3d20e68/Building-and-Running-Large-Scale-Language-Models-The-Infrastructure-and-Techniques-Behind-GPT-4.pdf
- Pierre-Carl Langlais, Anastasia Stasenko, Catherine Arnett, November 13, 2024, Releasing the largest multilingual open pretraining dataset, https://huggingface.co/blog/Pclanglais/two-trillion-tokens-open
- Arindam Mitra, Ahmed Awadallah, Yash Lara, November 14, 2024, Orca-AgentInstruct: Agentic flows can be effective synthetic-data generators, Microsoft Research Blog, https://www.microsoft.com/en-us/research/blog/orca-agentinstruct-agentic-flows-can-be-effective-synthetic-data-generators/
- Paul Sawers, Dec 2024, Harvard and Google to release 1 million public-domain books as AI training dataset, https://techcrunch.com/2024/12/12/harvard-and-google-to-release-1-million-public-domain-books-as-ai-training-dataset/
- Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
- Ali Forootani, 22 Mar 2025, A Survey on Mathematical Reasoning and Optimization with Large Language Models, https://arxiv.org/abs/2503.17726
- Cameron R. Wolfe, Ph.D., May 19, 2025, A Guide for Debugging LLM Training Data: Data-centric techniques and tools that anyone should use when training an LLM, https://cameronrwolfe.substack.com/p/llm-debugging
- Yi Dong, Yusuke Muraoka, Scott Shi, and Yi Zhang, 14 Aug 2025, MM-Food-100K: A 100,000-Sample Multimodal Food Intelligence Dataset with Verifiable Provenance, https://arxiv.org/abs/2508.10429
- Haydn Thomas Jones, Natalie Maus, Josh Magnus Ludan, Maggie Ziyu Huan, Jiaming Liang, Marcelo Der Torossian Torres, Jiatao Liang, Zachary Ives, Yoseph Barash, Cesar de la Fuente-Nunez, Jacob R. Gardner, Mark Yatskar, 14 Aug 2025, A Dataset for Distilling Knowledge Priors from Literature for Therapeutic Design, https://arxiv.org/abs/2508.10899
- Ziye Deng, Ruihan He, Jiaxiang Liu, Yuan Wang, Zijie Meng, Songtao Jiang, Yong Xie, Zuozhu Liu, 14 Aug 2025, Med-GLIP: Advancing Medical Language-Image Pre-training with Large-scale Grounded Dataset, https://arxiv.org/abs/2508.10528
- Feiran Li, Qianqian Xu, Shilong Bao, Boyu Han, Zhiyong Yang, Qingming Huang, 14 Aug 2025, Hybrid Generative Fusion for Efficient and Privacy-Preserving Face Recognition Dataset Generation, https://arxiv.org/abs/2508.10672
- Yuzhuo Xiao, Zeyu Han, Yuhan Wang, Huaizu Jiang, 4 Aug 2025, XFacta: Contemporary, Real-World Dataset and Evaluation for Multimodal Misinformation Detection with Multimodal LLMs, https://arxiv.org/abs/2508.09999
- Seunghyeok Back, Joosoon Lee, Kangmin Kim, Heeseon Rho, Geonhyup Lee, Raeyoung Kang, Sangbeom Lee, Sangjun Noh, Youngjin Lee, Taeyeop Lee, Kyoobin Lee, 14 Aug 2025, GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes, https://arxiv.org/abs/2504.06866
- Quang-Trung Truong, Yuk-Kwan Wong, Vo Hoang Kim Tuyen Dang, Rinaldi Gotama, Duc Thanh Nguyen, Sai-Kit Yeung, 14 Aug 2025, MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning, https://arxiv.org/abs/2508.04549
- Youneng Bao, Yiping Liu, Zhuo Chen, Yongsheng Liang, Mu Li, Kede Ma, 23 Jul 2025, Dataset Distillation as Data Compression: A Rate-Utility Perspective, https://arxiv.org/abs/2507.17221
- Md Min-Ha-Zul Abedin and Tazqia Mehrub, 22 Jul 2025, Evaluating Ensemble and Deep Learning Models for Static Malware Detection with Dimensionality Reduction Using the EMBER Dataset, https://arxiv.org/abs/2507.16952
- Mashiro Toyooka, Kiyoharu Aizawa and Yoko Yamakata, 23 Jul 2025, A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task, https://arxiv.org/abs/2507.17232
- Yuanchen Shi, Biao Ma, Longyin Zhang, and Fang Kong, 23 Jul 2025, Impact of Stickers on Multimodal Sentiment and Intent in Social Media: A New Task, Dataset and Baseline, https://arxiv.org/abs/2405.08427
- David Kurtenbach, Lior Shamir, 15 Jul 2025, An open dataset of neural networks for hypernetwork research, https://arxiv.org/abs/2507.15869
- Morad Tukan, Loay Mualem, Eitan Netzer, Liran Sigalat, 22 Jul 2025, Improving Model Classification by Optimizing the Training Dataset, https://arxiv.org/abs/2507.16729
- Yasser Ashraf, Ahmed Sharshar, Velibor Bojkovic, Bin Gu, 22 Jul 2025, SPACT18: Spiking Human Action Recognition Benchmark Dataset with Complementary RGB and Thermal Modalities, https://arxiv.org/abs/2507.16151
- Aaron Ho, Lorenzo Zanisi, Bram de Leeuw, Vincent Galvan, Pablo Rodriguez-Fernandez, Nathaniel T. Howard, 21 Jul 2025, Efficient dataset construction using active learning and uncertainty-aware neural networks for plasma turbulent transport surrogate models, https://arxiv.org/abs/2507.15976
- Ang Li, Charles Wang, Kaiyu Yue, Zikui Cai, Ollie Liu, Deqing Fu, Peng Guo, Wang Bill Zhu, Vatsal Sharan, Robin Jia, Willie Neiswanger, Furong Huang, Tom Goldstein, Micah Goldblum, 22 Jul 2025, Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning, https://arxiv.org/abs/2507.16746
- Fateme Nateghi Haredasht, Fatemeh Amrollahi, Manoj Maddali, Nicholas Marshall, Stephen P. Ma, Lauren N. Cooper, Andrew O. Johnson, Ziming Wei, Richard J. Medford, Sanjat Kanjilal, Niaz Banaei, Stanley Deresinski, Mary K. Goldstein, Steven M. Asch, Amy Chang, Jonathan H. Chen, 21 Jul 2025, Antibiotic Resistance Microbiology Dataset (ARMD): A Resource for Antimicrobial Resistance from EHRs, https://arxiv.org/abs/2503.07664
- Daniel Grimm, Ahmed Abouelazm, J. Marius Zöllner, 24 Jul 2025, Goal-based Trajectory Prediction for improved Cross-Dataset Generalization, https://arxiv.org/abs/2507.18196
- Paulo Mendes, Eva Maia, Isabel Praça, 23 Jul 2025, MeAJOR Corpus: A Multi-Source Dataset for Phishing Email Detection, https://arxiv.org/abs/2507.17978
- Maria Vlachou, 24 Jul 2025, Fashion-AlterEval: A Dataset for Improved Evaluation of Conversational Recommendation Systems with Alternative Relevant Items, https://arxiv.org/abs/2507.18017
- Xuebo Jin, Longfei Gao, Anshuo Tong, Zhengyang Chen, Jianlei Kong, Ning Sun, Huijun Ma, Qiang Wang, Yuting Bai, Tingli Su, 24 Jul 2025, TCM-Tongue: A Standardized Tongue Image Dataset with Pathological Annotations for AI-Assisted TCM Diagnosis, https://arxiv.org/abs/2507.18288
- Minje Park, Jeonghwa Lim, Taehyung Yu, and Sunghoon Joo, 24 Jul 2025, A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation, https://arxiv.org/abs/2507.18323
- Baoyao Yang, Wanyun Li, Dixin Chen, Junxiang Chen, Wenbin Yao, Haifeng Lin, 24 Jul 2025, VideoMind: An Omni-Modal Video Dataset with Intent Grounding for Deep-Cognitive Video Understanding, https://arxiv.org/abs/2507.18552
- Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim, 24 Jul 2025, SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning, https://arxiv.org/abs/2507.18616
- Sam Gordon James, Miranda Elaine Glynis Armstrong, Aisling Ann O'Kane, Harry Emerson and Zahraa S. Abdallah, 7 May 2025, BrisT1D Dataset: Young Adults with Type 1 Diabetes in the UK using Smartwatches, https://arxiv.org/abs/2507.17757
- Gabriel Jarry, Ramon Dalmau, Philippe Very, Franck Ballerini, Stephania-Denisa Bocu, 24 Jul 2025, GVCCS: A Dataset for Contrail Identification and Tracking on Visible Whole Sky Camera Sequences, https://arxiv.org/abs/2507.18330
- Charvi Rastogi, Tian Huey Teh, Pushkar Mishra, Roma Patel, Ding Wang, Mark Díaz, Alicia Parrish, Aida Mostafazadeh Davani, Zoe Ashwood, Michela Paganini, Vinodkumar Prabhakaran, Verena Rieser, Lora Aroyo, 15 Jul 2025, Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models, https://arxiv.org/abs/2507.13383
- Paul E. Calzada, Zahin Ibnat, Tanvir Rahman, Kamal Kandula, Danyu Lu, Sujan Kumar Saha, Farimah Farahmandi, Mark Tehranipoor, 9 Jul 2025, VerilogDB: The Largest, Highest-Quality Dataset with a Preprocessing Framework for LLM-based RTL Generation, https://arxiv.org/abs/2507.13369
- Xiao Wang, Qian Zhu, Shujuan Wu, Bo Jiang, Shiliang Zhang, Yaowei Wang, Yonghong Tian, Bin Luo, 18 Jul 2025, When Person Re-Identification Meets Event Camera: A Benchmark Dataset and An Attribute-guided Re-Identification Framework, https://arxiv.org/abs/2507.13659
- Morteza Bodaghi, Majid Hosseini, Raju Gottumukkala, Ravi Teja Bhupatiraju, Iftikhar Ahmad, Moncef Gabbouj, 16 Jul 2025, UL-DD: A Multimodal Drowsiness Dataset Using Video, Biometric Signals, and Behavioral Data, https://arxiv.org/abs/2507.13403
- Hengjie Yu, Kenneth A. Dawson, Haiyun Yang, Shuya Liu, Yan Yan, Yaochu Jin, 18 Jul 2025, A million-scale dataset and generalizable foundation model for nanomaterial-protein interactions, https://arxiv.org/abs/2507.14245
- Daniel Fein, Gabriela Aranguiz-Dias, 18 Jul 2025, Influence Functions for Preference Dataset Pruning, https://arxiv.org/abs/2507.14344
- Refik Samet, Nooshin Nemati, Emrah Hancer, Serpil Sak, Bilge Ayca Kirmizi, Zeynep Yildirim, 18 Jul 2025, MiDeSeC: A Dataset for Mitosis Detection and Segmentation in Breast Cancer Histopathology Images, https://arxiv.org/abs/2507.14271
- Refik Samet, Nooshin Nemati, Emrah Hancer, Serpil Sak, Bilge Ayca Kirmizi, 18 Jul 2025, NuSeC: A Dataset for Nuclei Segmentation in Breast Cancer Histopathology Images, https://arxiv.org/abs/2507.14272
- Deyun Zhang, Xiang Lan, Shijia Geng, Qinghao Zhao, Sumei Fan, Mengling Feng, and Shenda Hong, 21 Jul 2025, MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations, https://arxiv.org/abs/2507.15255
- Shuo Tang, Jian Xu, Jiadong Zhang, Yi Chen, Qizhao Jin, Lingdong Shen, Chenglin Liu, Shiming Xiang, 9 Aug 2025, MeteorPred: A Meteorological Multimodal Large Model and Dataset for Severe Weather Event Prediction, https://arxiv.org/abs/2508.06859
- Keyu Li, Mohan Jiang, Dayuan Fu, Yunze Wu, Xiangkun Hu, Dequan Wang, Pengfei Liu, 9 Aug 2025, DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery, https://arxiv.org/abs/2508.06960
- Naseem Machlovi, Maryam Saleki, Innocent Ababio, Ruhul Amin, 9 Aug 2025, Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach, https://arxiv.org/abs/2508.07063
- Xiaoyuan Zhu, Muru Zhang, Ollie Liu, Robin Jia, Willie Neiswanger, 8 Aug 2025, LLM Unlearning Without an Expert Curated Dataset, https://arxiv.org/abs/2508.06595
- Xiaobo Zhang, Congqing He, Ying He, Jian Peng, Dajie Fu, Tien-Ping Tan, 9 Aug 2025, ESNERA: Empirical and semantic named entity alignment for named entity dataset merging, https://arxiv.org/abs/2508.06877
- Muhammad Dehan Al Kautsar, Aswin Candra, Muhammad Alif Al Hakim, Maxalmina Satria Kahfi, Fajri Koto, Alham Fikri Aji, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Genta Indra Winata, 9 Aug 2025, SEADialogues: A Multilingual Culturally Grounded Multi-turn Dialogue Dataset on Southeast Asian Languages, https://arxiv.org/abs/2508.07069
- Licheng Zhang, Bach Le, Naveed Akhtar, Tuan Ngo, 11 Aug 2025, DoorDet: Semi-Automated Multi-Class Door Detection Dataset via Object Detection and Large Language Models, https://arxiv.org/abs/2508.07714
- Jinke Li, Jiarui Yu, Chenxing Wei, Hande Dong, Qiang Lin, Liangjing Yang, Zhicai Wang and Yanbin Hao, 11 Aug 2025, UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models, https://arxiv.org/abs/2508.07766
- Vojtěch Staněk, Karel Srna, Anton Firc, Kamil Malinka, 11 Aug 2025, SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis, https://arxiv.org/abs/2508.07944
- Unisha Joshi, 6 Aug 2025, Age-Diverse Deepfake Dataset: Bridging the Age Gap in Deepfake Detection, https://arxiv.org/abs/2508.06552
- Mohammad Zia Ur Rehman, Anukriti Bhatnagar, Omkar Kabde, Shubhi Bansal, Nagendra Kumar, 7 Aug 2025, ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos, https://arxiv.org/abs/2508.06570
- Anurag Tripathi, Vaibhav Patle, Abhinav Jain, Ayush Pundir, Sairam Menon, Ajeet Kumar Singh, Dorien Herremans, 11 Aug 2025, End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation, https://arxiv.org/abs/2508.06387
- Bermet Burkanova, Payam Jome Yazdian, Chuxuan Zhang, Trinity Evans, Paige Tuttösi, Angelica Lim, 25 Jul 2025, Salsa as a Nonverbal Embodied Language -- The CoMPAS3D Dataset and Benchmarks, https://arxiv.org/abs/2507.19684
- Yazeed Alrubyli, Omar Alomeir, Abrar Wafa, Diána Hidvégi, Hend Alrasheed, Mohsen Bahrami, 25 Jul 2025, NAICS-Aware Graph Neural Networks for Large-Scale POI Co-visitation Prediction: A Multi-Modal Dataset and Methodology, https://arxiv.org/abs/2507.19697
- Tan-Minh Nguyen, Hoang-Trung Nguyen, Trong-Khoi Dao, Xuan-Hieu Phan, Ha-Thanh Nguyen, Thi-Hai-Yen Vuong, 26 Jul 2025, VLQA: The First Comprehensive, Large, and High-Quality Vietnamese Dataset for Legal Question Answering, https://arxiv.org/abs/2507.19995
- Adrien Bazoge, 28 Jul 2025, MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation, https://arxiv.org/abs/2507.20917
- Abir Harrasse, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks and Amir Abdullah, 27 Jul 2025, TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research, https://arxiv.org/abs/2503.12730
- Yutong Liu, Ziyue Zhang, Ban Ma-bao, Yuqing Cai, Yongbin Yu, Renzeng Duojie, Xiangxiang Wang, Fan Gao, Cheng Huang, Nyima Tashi, 27 Jul 2025, FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation, https://arxiv.org/abs/2505.14351
- Robin Burchard and Kristof Van Laerhoven, 28 Jul 2025, Enhancing Wearable Tap Water Audio Detection through Subclass Annotation in the HD-Epic Dataset, https://arxiv.org/abs/2505.20788
- Shenghe Zheng, Qianjia Cheng, Junchi Yao, Mengsong Wu, Haonan He, Ning Ding, Yu Cheng, Shuyue Hu, Lei Bai, Dongzhan Zhou, Ganqu Cui, Peng Ye, 28 Jul 2025, Scaling Physical Reasoning with the PHYSICS Dataset, https://arxiv.org/abs/2506.00022
- Andreas Spilz, Heiko Oppel, Jochen Werner, Kathrin Stucke-Straub, Felix Capanni and Michael Munz, 6 Jun 2025, GAITEX: Human motion dataset from impaired gait and rehabilitation exercises of inertial and optical sensor data, https://arxiv.org/abs/2507.21069
- Ariel E. Stassi, Yanina Boria, J. Matías Di Martino and Gregory Randall, 7 Jul 2025, iLSU-T: an Open Dataset for Uruguayan Sign Language Translation, https://arxiv.org/abs/2507.21104
- Sheng-Feng Yu, Jia-Jiun Yao, and Wei-Chen Chiu, 29 Jul 2025, Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation, https://arxiv.org/abs/2507.21455
- Basak Demirok, Mucahid Kutlu, Selin Mergen, 29 Jul 2025, MultiAIGCD: A Comprehensive dataset for AI Generated Code Detection Covering Multiple Languages, Models,Prompts, and Scenarios, https://arxiv.org/abs/2507.21693
- Mohammed Baharoon, Luyang Luo, Michael Moritz, Abhinav Kumar, Sung Eun Kim, Xiaoman Zhang, Miao Zhu, Mahmoud Hussain Alabbad, Maha Sbayel Alhazmi, Neel P. Mistry, Kent Ryan Kleinschmidt, Brady Chrisler, Sathvik Suryadevara, Sri Sai Dinesh Jaliparthi, Noah Michael Prudlo, Mark David Marino, Jeremy Palacio, Rithvik Akula, Hong-Yu Zhou, Ibrahim Ethem Hamamci, Scott J. Adams, Hassan Rayhan AlOmaish, Pranav Rajpurkar, 29 Jul 2025, ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports, https://arxiv.org/abs/2507.22030
- Salvatore Sinno, Markus Bertl, Arati Sahoo, Bhavika Bhalgamiya, Thomas Groß, Nicholas Chancellor, 29 Jul 2025, Implementing Large Quantum Boltzmann Machines as Generative AI Models for Dataset Balancing, https://arxiv.org/abs/2502.03086
- Xiaoyi Feng, Kaifeng Zou, Caichun Cen, Tao Huang, Hui Guo, Zizhou Huang, Yingli Zhao, Mingqing Zhang, Ziyuan Zheng, Diwei Wang, Yuntao Zou, Dagang Li, 29 Jul 2025, LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering, https://arxiv.org/abs/2506.02733
- Fengyi Jiang, Xiaorui Zhang, Lingbo Jin, Ruixing Liang, Yuxin Chen, Adi Chola Venkatesh, Jason Culman, Tiantian Wu, Lirong Shao, Wenqing Sun, Cong Gao, Hallie McNamara, Jingpei Lu, Omid Mohareri, 29 Jul 2025, SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures, https://arxiv.org/abs/2507.00209
- Zhangcheng Qiang, Kerry Taylor, Weiqing Wang, Jing Jiang, 25 Mar 2025, OAEI-LLM-T: A TBox Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching, https://arxiv.org/abs/2503.21813
- Bastien Le Guellec, Kokou Adambounou, Lisa C Adams, Thibault Agripnidis, Sung Soo Ahn, Radhia Ait Chalal, Tugba Akinci D Antonoli, Philippe Amouyel, Henrik Andersson, Raphael Bentegeac, Claudio Benzoni, Antonino Andrea Blandino, Felix Busch, Elif Can, Riccardo Cau, Armando Ugo Cavallo, Christelle Chavihot, Erwin Chiquete, Renato Cuocolo, Eugen Divjak, Gordana Ivanac, Barbara Dziadkowiec Macek, Armel Elogne, Salvatore Claudio Fanni, Carlos Ferrarotti, Claudia Fossataro, Federica Fossataro, Katarzyna Fulek, Michal Fulek, Pawel Gac, Martyna Gachowska, Ignacio Garcia Juarez, Marco Gatti, Natalia Gorelik, Alexia Maria Goulianou, Aghiles Hamroun, Nicolas Herinirina, Krzysztof Kraik, Dominik Krupka, Quentin Holay, Felipe Kitamura, Michail E Klontzas, Anna Kompanowska, Rafal Kompanowski, Alexandre Lefevre, et al. (43 additional authors not shown), 25 Jul 2025, PARROT: An Open Multilingual Radiology Reports Dataset, https://arxiv.org/abs/2507.22939
- Yuto Haneji, Taichi Nishimura, Hirotaka Kameko, Keisuke Shirai, Tomoya Yoshida, Keiya Kajimura, Koki Yamamoto, Taiyu Cui, Tomohiro Nishimoto, Shinsuke Mori, 31 Jul 2025, EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos referring to Procedural Texts, https://arxiv.org/abs/2410.05343
- Eylon Caplan, Tania Chakraborty, Dan Goldwasser, 31 Jul 2025, Splits! A Flexible Dataset and Evaluation Framework for Sociocultural Linguistic Investigation, https://arxiv.org/abs/2504.04640
- Thomas Sugg, Kyle O'Brien, Lekh Poudel, Alex Dumouchelle, Michelle Jou, Marc Bosch, Deva Ramanan, Srinivasa Narasimhan, Shubham Tulsiani, 30 Jul 2025, Accenture-NVS1: A Novel View Synthesis Dataset, https://arxiv.org/abs/2503.18711
- Feng Zhu, Zihang Zhang, Kangcheng Teng, Abduhelil Yakup and Xiaohong Zhang, 31 Jul 2025, SmartPNT-MSF: A Multi-Sensor Fusion Dataset for Positioning and Navigation Research, https://arxiv.org/abs/2507.19079
- Hongjie Chen, Akshay Mehra, Josh Kimball, Ryan A. Rossi, 29 Jul 2025, Measuring Time-Series Dataset Similarity using Wasserstein Distance, https://arxiv.org/abs/2507.22189
- Vanessa Rebecca Wiyono, David Anugraha, Ayu Purwarianti, Genta Indra Winata, 29 Jul 2025, IndoPref: A Multi-Domain Pairwise Preference Dataset for Indonesian, https://arxiv.org/abs/2507.22159
- Evgeniy I. Sosnin, Yuriy L. Vasilev, Roman A. Solovyev, Aleksandr L. Stempkovskiy, Dmitry V. Telpukhov, Artem A. Vasilev, Aleksandr A. Amerikanov, Aleksandr Y. Romanov, 30 Jul 2025, AlphaDent: A dataset for automated tooth pathology detection, https://arxiv.org/abs/2507.22512
- Lucas Correia, Jan-Christoph Goos, Thomas Bäck, Anna V. Kononova, 31 Jul 2025, PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series, https://arxiv.org/abs/2411.13951
- Kejia Gao, Liguo Zhou, Mingjun Liu, Alois Knoll, 1 Aug 2025, E2E Parking Dataset: An Open Benchmark for End-to-End Autonomous Parking, https://arxiv.org/abs/2504.10812
- Zihan Zheng, Tianle Cui, Chuwen Xie, Jiahui Zhang, Jiahui Pan, Lewei He, Qianglong Chen, 2 Aug 2025, NatureGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset, https://arxiv.org/abs/2508.01330
- Xuan Liu, Siru Ouyang, Xianrui Zhong, Jiawei Han, Huimin Zhao, 1 Aug 2025, FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models, https://arxiv.org/abs/2508.01055
- Ali Forootani, Raffaele Iervolino, 3 Aug 2025, Asynchronous Federated Learning with non-convex client objective functions and heterogeneous dataset, https://arxiv.org/abs/2508.01675
- Zhihao Zhu, Jiale Han, Yi Yang, 27 Jul 2025, HoneyImage: Verifiable, Harmless, and Stealthy Dataset Ownership Verification for Image Models, https://arxiv.org/abs/2508.00892
- Huyu Wu, Duo Su, Junjie Hou, Guang Li, 2 Aug 2025, Dataset Condensation with Color Compensation, https://arxiv.org/abs/2508.01139
- Han Wang, Zhuoran Wang, Roy Ka-Wei Lee, 3 Aug 2025, HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection, https://arxiv.org/abs/2508.01712
- Runkai Zheng, Vishnu Asutosh Dasu, Yinong Oliver Wang, Haohan Wang, Fernando De la Torre, 3 Aug 2025, Improving Noise Efficiency in Privacy-preserving Dataset Distillation, https://arxiv.org/abs/2508.01749
- Junyi Mo, Jiayu Li, Duo Zhang, Elynn Chen, 3 Aug 2025, ACT-Tensor: Tensor Completion Framework for Financial Dataset Imputation, https://arxiv.org/abs/2508.01861
- Fan Gao, Cheng Huang, Nyima Tashi, Yutong Liu, Xiangxiang Wang, Thupten Tsering, Ban Ma-bao, Renzeg Duojie, Gadeng Luosang, Rinchen Dongrub, Dorje Tashi, Xiao Feng, Hao Wang, Yongbin Yu, 4 Aug 2025, TIBSTC-CoT: A Multi-Domain Instruction Dataset for Chain-of-Thought Reasoning in Language Models, https://arxiv.org/abs/2508.01977
- Raviraj Joshi, Rakesh Paul, Kanishk Singla, Anusha Kamath, Michael Evans, Katherine Luna, Shaona Ghosh, Utkarsh Vaidya, Eileen Long, Sanjay Singh Chauhan, Niranjan Wartikar, 3 Aug 2025, CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications, https://arxiv.org/abs/2508.01710
- Cuno Sankey-Olsen, Rasmus Hvass Olesen, Tobias Oliver Eberhard, Andreas Triantafyllopoulos, Björn Schuller, Ilhan Aslan, 4 Aug 2025, Detecting COPD Through Speech Analysis: A Dataset of Danish Speech and Machine Learning Approach, https://arxiv.org/abs/2508.02354
- Nazmun N Khan, Taylor Sweet, Chase A Harvey, Calder Knapp, Dean J. Krusienski, David E Thompson, 4 Aug 2025, The Role of Review Process Failures in Affective State Estimation: An Empirical Investigation of DEAP Dataset, https://arxiv.org/abs/2508.02417
- Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, Nakamasa Inoue, 2 Aug 2025, CityNav: A Large-Scale Dataset for Real-World Aerial Navigation, https://arxiv.org/abs/2406.14240
- Zedong Peng, Zeju Li, Mingzhe Gao, Qiang Xu, Chen Zhang, Jieru Zhao, 4 Aug 2025, ForgeHLS: A Large-Scale, Open-Source Dataset for High-Level Synthesis, https://arxiv.org/abs/2507.03255
- Shaofeng Yin, Ting Lei, Yang Liu, 5 Aug 2025, ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools, https://arxiv.org/abs/2508.03284
- Sai Ma, Zhuang Li, John A Taylor, 5 Aug 2025, Landsat30-AU: A Vision-Language Dataset for Australian Landsat Imagery, https://arxiv.org/abs/2508.03127
- Kaiwen Zhao, Bharathan Balaji, Stephen Lee, 5 Aug 2025, CF-RAG: A Dataset and Method for Carbon Footprint QA Using Retrieval-Augmented Generation, https://arxiv.org/abs/2508.03489
- Anuroop Sriram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl, 5 Aug 2025, The Open DAC 2025 Dataset for Sorbent Discovery in Direct Air Capture, https://arxiv.org/abs/2508.03162
- Abdul Basit, Nouhaila Innan, Muhammad Haider Asif, Minghao Shao, Muhammad Kashif, Alberto Marchisio, Muhammad Shafique, 5 Aug 2025, PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset, https://arxiv.org/abs/2503.02497
- Chenxi Wang, Jizhan Fang, Xiang Chen, Bozhong Tian, Ziwen Xu, Huajun Chen, Ningyu Zhang, 5 Aug 2025, ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems, https://arxiv.org/abs/2503.20756
- Mei Jiang, Houping Yue, Bingdong Li, Hao Hao, Ying Qian, Bo Jiang, and Aimin Zhou, 6 Aug 2025, SID: Benchmarking Guided Instruction Capabilities in STEM Education with a Socratic Interdisciplinary Dialogues Dataset, https://arxiv.org/abs/2508.04563
- Shengchao Chen, Guodong Long, Jing Jiang, 6 Aug 2025, FeDaL: Federated Dataset Learning for Time Series Foundation Models, https://arxiv.org/abs/2508.04045
- Se Won Oh, Hyuntae Jeong, Seungeun Chung, Jeong Mook Lim, Kyoung Ju Noh, Sunkyung Lee, Gyuwon Jung, 18 Jul 2025, Understanding Human Daily Experience Through Continuous Sensing: ETRI Lifelog Dataset 2024, https://arxiv.org/abs/2508.03698
- Xiao Wang, Xufeng Lou, Shiao Wang, Ju Huang, Lan Chen, Bo Jiang, 6 Aug 2025, Long-Term Visual Object Tracking with Event Cameras: An Associative Memory Augmented Tracker and A Benchmark Dataset, https://arxiv.org/abs/2403.05839
- Naba Rizvi, Harper Strickland, Daniel Gitelman, Tristan Cooper, Alexis Morales-Flores, Michael Golden, Aekta Kallepalli, Akshat Alurkar, Haaset Owens, Saleha Ahmedi, Isha Khirwadkar, Imani Munyaka, Nedjma Ousidhoum, 6 Aug 2025, AUTALIC: A Dataset for Anti-AUTistic Ableist Language In Context, https://arxiv.org/abs/2410.16520
- Sung-Yeon Park, Can Cui, Yunsheng Ma, Ahmadreza Moradipari, Rohit Gupta, Kyungtae Han, Ziran Wang, 5 Aug 2025, NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models, https://arxiv.org/abs/2503.12772
- Xiao Wang, Haiyang Wang, Shiao Wang, Qiang Chen, Jiandong Jin, Haoyu Song, Bo Jiang, Chenglong Li, 6 Aug 2025, RGB-Event based Pedestrian Attribute Recognition: A Benchmark Dataset and An Asymmetric RWKV Fusion Framework, https://arxiv.org/abs/2504.10018
- Pouyan Navard, Yasemin Ozkut, Srikar Adhikari, Elaine Situ-LaCasse, Josie Acuña, Adrienne Yarnish, Alper Yilmaz, 5 Aug 2025, ERDES: A Benchmark Video Dataset for Retinal Detachment and Macular Status Classification in Ocular Ultrasound, https://arxiv.org/abs/2508.04735
- Changle Qu, Sunhao Dai, Ke Guo, Liqin Zhao, Yanan Niu, Xiao Zhang, Jun Xu, 7 Aug 2025, KuaiLive: A Real-time Interactive Dataset for Live Streaming Recommendation, https://arxiv.org/abs/2508.05633
- Vladimir Frants, Sos Agaian, 6 Aug 2025, Quaternion-Hadamard Network: A Novel Defense Against Adversarial Attacks with a New Dataset, https://arxiv.org/abs/2502.10452
- Yin Li, Qi Chen, Kai Wang, Meige Li, Liping Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen, 6 Aug 2025, A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation, https://arxiv.org/abs/2404.03253
- Ingo Ziegler, Abdullatif Köksal, Desmond Elliott, Hinrich Schütze, 6 Aug 2025, CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation, https://arxiv.org/abs/2409.02098
- Zekun Liu, Xiaowen Huang, Jitao Sang, 1 Aug 2025, ITDR: An Instruction Tuning Dataset for Enhancing Large Language Models in Recommendations, https://arxiv.org/abs/2508.05667
- Youguang Xing, Xu Luo, Junlin Xie, Lianli Gao, Hengtao Shen, Jingkuan Song, 8 Aug 2025, Shortcut Learning in Generalist Robot Policies: The Role of Dataset Diversity and Fragmentation, https://arxiv.org/abs/2508.06426
- Jucheng Hu, Surong Yang, Lijun Wu, Dongzhan Zhou, 8 Aug 2025, DONOD: Efficient and Generalizable Instruction Fine-Tuning for LLMs via Model-Intrinsic Dataset Pruning, https://arxiv.org/abs/2504.14810
- Nikolaos Dionelis, Alessandra Feliciotti, Mattia Marconcini, Devis Peressutti, Nika Oman Kadunc, JaeWan Park, Hagai Raja Sinulingga, Steve Andreas Immanuel, Ba Tran, Caroline Arnold, Nicolas Longépé, 8 Aug 2025, Building Age Estimation: A New Multi-Modal Benchmark Dataset and Community Challenge, https://arxiv.org/abs/2502.13818
- Zhenhui Ou, Dawei Li, Zhen Tan, Wenlin Li, Huan Liu, Siyuan Song, 9 Aug 2025, Building Safer Sites: A Large-Scale Multi-Level Dataset for Construction Safety Research, https://arxiv.org/abs/2508.09203
- Yuxiao Wang, Yu Lei, Wolin Liang, Weiying Xue, Zhenao Wei, Nan Zhuang, Qi Liu, 13 Aug 2025, What-Meets-Where: Unified Learning of Action and Contact Localization in a New Dataset, https://arxiv.org/abs/2508.09428
- Amir Hosseinian, Ashkan Dehghani Zahedani, Umer Mansoor, Noosheen Hashemi, Mark Woodward, 13 Aug 2025, January Food Benchmark (JFB): A Public Benchmark Dataset and Evaluation Suite for Multimodal Food Analysis, https://arxiv.org/abs/2508.09966
- Grigor Bezirganyan, Sana Sellami, Laure Berti-Équille, Sébastien Fournier, 13 Aug 2025, LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data, https://arxiv.org/abs/2406.09864
- Chunan Liu, Aurelien Pelissier, Yanjun Shao, Lilian Denzler, Andrew C.R. Martin, Brooks Paige and María Rodríguez Martínez, 13 Aug 2025, AbRank: A Benchmark Dataset and Metric-Learning Framework for Antibody-Antigen Affinity Ranking, https://arxiv.org/abs/2506.17857
- Angela John, Selvyn Allotey, Till Koebe, Alexandra Tyukavina, Ingmar Weber, 15 Aug 2025, A Global Dataset of Location Data Integrity-Assessed Reforestation Efforts, https://arxiv.org/abs/2508.11349
- Wentao Li, Yonghu He, Kun Gao, Qing Liu and Yali Zheng, 7 Aug 2025, Collaborative Learning-Enhanced Lightweight Models for Predicting Arterial Blood Pressure Waveform in a Large-scale Perioperative Dataset, https://arxiv.org/abs/2508.11669
- Manish Shukla, 17 Aug 2025, Interpreting Time Series Forecasts with LIME and SHAP: A Case Study on the Air Passengers Dataset, https://arxiv.org/abs/2508.12253
- Ananya Singha, Harshita Sahijwani, Walt Williams, Emmanuel Aboah Boateng, Nick Hausman, Miguel Di Luca, Keegan Choudhury, Chaya Binet, Vu Le, Tianwei Chen, Oryan Rokeah Chen, Sulaiman Vesal, Sadid Hasan, 14 Aug 2025, Benchmark Dataset Generation and Evaluation for Excel Formula Repair with LLMs, https://arxiv.org/abs/2508.11715
- Marcel Gregoriadis, Jingwei Kang, Johan Pouwelse, 17 Aug 2025, A Large-Scale Web Search Dataset for Federated Online Learning to Rank, https://arxiv.org/abs/2508.12353
- Koffi Ismael Ouattara, Ioannis Krontiris, Theo Dimitrakos and Frank Kargl, 19 Aug 2025, Assessing Trustworthiness of AI Training Dataset using Subjective Logic -- A Use Case on Bias, https://arxiv.org/abs/2508.13813
- Jonathan A. Karr Jr., Benjamin F. Herbst, Ting Hua, Matthew Hauenstein, Georgina Curto, Nitesh V. Chawla, 14 Aug 2025, Combating Homelessness Stigma with LLMs: A New Multi-Modal Dataset for Bias Detection, https://arxiv.org/abs/2508.13187
- Hunter McNichols, Fareya Ikram, Andrew Lan, 19 Aug 2025, The StudyChat Dataset: Student Dialogues With ChatGPT in an Artificial Intelligence Course, https://arxiv.org/abs/2503.07928
- Anirudh Sundar, Christopher Richardson, Adar Avsian, Larry Heck, 19 Aug 2025, iTBLS: A Dataset of Interactive Conversations Over Tabular Information, https://arxiv.org/abs/2404.12580
- Chinmoy Biswas, Nafis Faisal, Vivek Chowdhury, Abrar Al-Shadid Abir, Sabir Mahmud, Mithon Rahman, Shaikh Anowarul Fattah, Hafiz Imtiaz, 12 Aug 2025, Load Forecasting on A Highly Sparse Electrical Load Dataset Using Gaussian Interpolation, https://arxiv.org/abs/2508.14069
- Chanyeol Choi, Jihoon Kwon, Alejandro Lopez-Lira, Chaewoon Kim, Minjae Kim, Juneha Hwang, Jaeseon Ha, Hojun Choi, Suyeol Yun, Yongjin Kim, and Yongjae Lee, 7 Aug 2025, FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering, https://arxiv.org/abs/2508.14052
- Sujit Roy, Dinesha V. Hegde, Johannes Schmude, Amy Lin, Vishal Gaur, Rohit Lal, Kshitiz Mandal, Talwinder Singh, Andrés Muñoz-Jaramillo, Kang Yang, Chetraj Pandey, Jinsu Hong, Berkay Aydin, Ryan McGranaghan, Spiridon Kasapis, Vishal Upendran, Shah Bahauddin, Daniel da Silva, Marcus Freitag, Iksha Gurung, Nikolai Pogorelov, Campbell Watson, Manil Maskey, Juan Bernabe-Moreno, Rahul Ramachandran, 18 Aug 2025, SuryaBench: Benchmark Dataset for Advancing Machine Learning in Heliophysics and Space Weather Prediction, https://arxiv.org/abs/2508.14107
- Yuzhuo Li, Di Zhao, Tingrui Qiao, Yihao Wu, Bo Pang, Yun Sing Koh, 20 Aug 2025, MetaWild: A Multimodal Dataset for Animal Re-Identification with Environmental Metadata, https://arxiv.org/abs/2501.13368
- Rabeeh Karimi Mahabadi, Sanjeev Satheesh, Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, 20 Aug 2025, Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset, https://arxiv.org/abs/2508.15096
- Manuel Serna-Aguilera, Fiona L. Goggin, Aranyak Goswami, Alexander Bucksch, Suxing Liu, Khoa Luu, 19 Aug 2025, AGP: A Novel Arabidopsis thaliana Genomics-Phenomics Dataset and its HyperGraph Baseline Benchmarking, https://arxiv.org/abs/2508.14934
- Laura De Grazia, Pol Pastells, Mauro Vázquez Chas, Desmond Elliott, Danae Sánchez Villegas, Mireia Farrús, Mariona Taulé, 21 Aug 2025, MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos, https://arxiv.org/abs/2504.11169
- Ruiqi Wu, Yuang Yao, Tengfei Ma, Chenran Zhang, Na Su, Tao Zhou, Geng Chen, Wen Fan, Yi Zhou, 22 Aug 2025, Bridging the Gap in Ophthalmic AI: MM-Retinal-Reason Dataset and OphthaReason Model toward Dynamic Multimodal Reasoning, https://arxiv.org/abs/2508.16129
- Andreas Loizou and Dimitrios Tsoumakos, 22 Aug 2025, Chunked Data Shapley: A Scalable Dataset Quality Assessment for Machine Learning, https://arxiv.org/abs/2508.16255
- Anyu Ying, Natarajan Balaji Shankar, Chyi-Jiunn Lin, Mohan Shi, Pu Wang, Hye-jin Shim, Siddhant Arora, Hugo Van hamme, Abeer Alwan, and Shinji Watanabe, 22 Aug 2025, Benchmarking Training Paradigms, Dataset Composition, and Model Scaling for Child ASR in ESPnet, https://arxiv.org/abs/2508.16576
- Jerry Cao-Xue, Tien Comlekoglu, Keyi Xue, Guanliang Wang, Jiang Li, Gordon Laurie, 21 Aug 2025, Automated Multi-label Classification of Eleven Retinal Diseases: A Benchmark of Modern Architectures and a Meta-Ensemble on a Large Synthetic Dataset, https://arxiv.org/abs/2508.15986
- Boran Zhao, Hetian Liu, Zihang Yuan, Li Zhu, Fan Yang, Lina Xie Tian Xia, Wenzhe Zhao, Pengju Ren, 19 Aug 2025, AdapSNE: Adaptive Fireworks-Optimized and Entropy-Guided Dataset Sampling for Edge DNN Training, https://arxiv.org/abs/2508.16647
- Syed Nazmus Sakib, Nafiul Haque, Mohammad Zabed Hossain, and Shifat E. Arman, 23 Aug 2025, PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science, https://arxiv.org/abs/2508.17117
- Siying Zhou, Yiquan Wu, Hui Chen, Xavier Hu, Kun Kuang, Adam Jatowt, Ming Hu, Chunyan Zheng, Fei Wu, 24 Aug 2025, ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation, https://arxiv.org/abs/2508.17234
- Pedro Antonio Rabelo Saraiva, Enzo Ferreira de Souza, Joao Manoel Herrera Pinheiro, Thiago H. Segreto, Ricardo V. Godoy, Marcelo Becker, 24 Aug 2025, A Synthetic Dataset for Manometry Recognition in Robotic Applications, https://arxiv.org/abs/2508.17468
- Yan Cathy Hua, Paul Denny, Jörg Wicker, Katerina Taskova, 23 Aug 2025, EduRABSA: An Education Review Dataset for Aspect-based Sentiment Analysis Tasks, https://arxiv.org/abs/2508.17008
- Dakuan Lu, Xiaoyu Tan, Rui Xu, Tianchu Yao, Chao Qu, Wei Chu, Yinghui Xu, Yuan Qi, 24 Aug 2025, SCP-116K: A High-Quality Problem-Solution Dataset and a Generalized Pipeline for Automated Extraction in the Higher Education Science Domain, https://arxiv.org/abs/2501.15587
- Hua Li, Shijie Lian, Zhiyuan Li, Runmin Cong, Chongyi Li, Laurence T. Yang, Weidong Zhang, Sam Kwong, 25 Aug 2025, Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation, https://arxiv.org/abs/2505.15581
- Andy Bonnetto and Haozhe Qi and Franklin Leong and Matea Tashkovska and Mahdi Rad and Solaiman Shokur and Friedhelm Hummel and Silvestro Micera and Marc Pollefeys and Alexander Mathis, 25 Aug 2025, EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models, https://arxiv.org/abs/2506.01608
- Mika Leo Hube, Filip Lemic, Ethungshan Shitiri, Gerard Calvo Bartra, Sergi Abadal, Xavier Costa Pérez, 22 Aug 2025, Set Transformer Architectures and Synthetic Data Generation for Flow-Guided Nanoscale Localization, https://arxiv.org/abs/2508.16200
- Rafael Ayllón-Gavilán, David Guijo-Rubio, Antonio Manuel Gómez-Orellana, David Guijo-Rubio, Francisco Bérchez-Moreno, Víctor Manuel Vargas-Yun and Pedro A. Gutiérrez, 23 Jul 2025, TOC-UCO: a comprehensive repository of tabular ordinal classification datasets, https://arxiv.org/abs/2507.17348
- Run-Ze Fan and Zengzhi Wang and Pengfei Liu, 22 Jul 2025, MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning, https://arxiv.org/abs/2507.16812
- Varsha Ramineni, Hossein A. Rahmani, Emine Yilmaz, David Barber, 24 Jul 2025, Beyond Internal Data: Constructing Complete Datasets for Fairness Testing, https://arxiv.org/abs/2507.18561
- Ruizhe Chen, Zhiting Fan, Tianze Luo, Heqing Zou, Zhaopeng Feng, Guiyang Xie, Hansheng Zhang, Zhuochen Wang, Zuozhu Liu, Huaijian Zhang, 24 Jul 2025, Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning, https://arxiv.org/abs/2507.18100
- Juhwan Choi, Junehyoung Kwon, JungMin Yun, Seunguk Yu, YoungBin Kim, 24 Jul 2025, VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks, https://arxiv.org/abs/2407.19795
- Xin Gu, Gautam Kamath, Zhiwei Steven Wu, 23 Jul 2025, Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance, https://arxiv.org/abs/2303.01256
- Temiloluwa Prioleau, Baiying Lu, Yanjun Cui, 18 Jul 2025, Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions, https://arxiv.org/abs/2507.14077
- Joanna Komorniczak, 20 Jul 2025, Transforming Datasets to Requested Complexity with Projection-based Many-Objective Genetic Algorithm, https://arxiv.org/abs/2507.15132
- Zihang Ma and Qitian Yin, 21 Jul 2025, Graph Attention Specialized Expert Fusion Model for Node Classification: Based on Cora and Pubmed Datasets, https://arxiv.org/abs/2507.15784
- Mohammed Alkhowaiter, Norah Alshahrani, Saied Alshahrani, Reem I. Masoud, Alaa Alzahrani, Deema Alnuhait, Emad A. Alghamdi, Khalid Almubarak, 19 Jul 2025, Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations, https://arxiv.org/abs/2507.14688
- Giwon Lee, Wooseong Jeong, Daehee Park, Jaewoo Jeong, and Kuk-Jin Yoon, 21 Jul 2025, Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning, https://arxiv.org/abs/2507.04790
- Bartlomiej Chybowski, Shima Abdullateef, Hollan Haule, Alfredo Gonzalez-Sulser, Javier Escudero, 10 Aug 2025, PySeizure: A single machine learning classifier framework to detect seizures in diverse datasets, https://arxiv.org/abs/2508.07253
- Cem Ata Baykara, Saurav Raj Pandey, Ali Burak Ünal, Harlin Lee, and Mete Akgün, 11 Aug 2025, Federated Learning for Epileptic Seizure Prediction Across Heterogeneous EEG Datasets, https://arxiv.org/abs/2508.08159
- Cristian Cosentino, Annamaria Defilippo, Marco Dossena, Christopher Irwin, Sara Joubbi and Pietro Liò, 10 Aug 2025, HealthBranches: Synthesizing Clinically-Grounded Question Answering Datasets via Decision Pathways, https://arxiv.org/abs/2508.07308
- Sarina Penquitt, Jonathan Klees, Rinor Cakaj, Daniel Kondermann, Matthias Rottmann, Lars Schmarje, 6 Aug 2025, From Label Error Detection to Correction: A Modular Framework and Benchmark for Object Detection Datasets, https://arxiv.org/abs/2508.06556
- Yuya Kawakami, Daniel Cayan, Dongyu Liu, and Kwan-Liu Ma, 8 Aug 2025, ClimateSOM: A Visual Analysis Workflow for Climate Ensemble Datasets, https://arxiv.org/abs/2508.06732
- Sajjad Rezvani Boroujeni, Hossein Abedi, Tom Bush, 29 Jul 2025, Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control, https://arxiv.org/abs/2505.03134
- Nicolas Lapautre, Maria Marchenko, Carlos Miguel Patiño, Xin Zhou, 14 Aug 2025, Natively Trainable Sparse Attention for Hierarchical Point Cloud Datasets, https://arxiv.org/abs/2508.10758
- Yuchang Zhu, Huizhe Zhang, Bingzhe Wu, Jintang Li, Zibin Zheng, Peilin Zhao, Liang Chen, Yatao Bian, 14 Aug 2025, Measuring Diversity in Synthetic Datasets, https://arxiv.org/abs/2502.08512
- Fabrizio Nunnari, Alakshendra Jyotsnaditya Ramkrishna Singh, Patrick Gebhard, 27 Jul 2025, Color histogram equalization and fine-tuning to improve expression recognition of (partially occluded) faces on sign language datasets, https://arxiv.org/abs/2507.20197
- Gabriel Downer, Sean Craven, Damian Ruck, Jake Thomas, 28 Jul 2025, Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models, https://arxiv.org/abs/2507.20704
- Aria Salari, Abtin Djavadifar, Xiangrui Liu, Homayoun Najjaran, 30 Jul 2025, Object Recognition Datasets and Challenges: A Review, https://arxiv.org/abs/2507.22361
- Farid Ariai, Joel Mackenzie and Gianluca Demartini, 30 Jul 2025, Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges, https://arxiv.org/abs/2410.21306
- Maziyar Panahi, 3 Aug 2025, OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets, https://arxiv.org/abs/2508.01630
- Kenneth Enevoldsen, Kristian Nørgaard Jensen, Jan Kostkan, Balázs Szabó, Márton Kardos, Kirten Vad, Andrea Blasi Núñez, Gianluca Barmina, Jacob Nielsen, Rasmus Larsen, Peter Vahlstrup, Per Møldrup Dalum, Desmond Elliott, Lukas Galke, Peter Schneider-Kamp, Kristoffer Nielbo, 4 Aug 2025, Dynaword: From One-shot to Continuously Developed Datasets, https://arxiv.org/abs/2508.02271
- Bhavesh Neekhra, Debayan Gupta, Partha Pratim Chakravarti, 5 Aug 2025, On the (In)Significance of Feature Selection in High-Dimensional Datasets, https://arxiv.org/abs/2508.03593
- J. Alex Hurt, Trevor M. Bajkowski, Grant J. Scott, Curt H. Davis, 4 Aug 2025, Evaluation and Analysis of Deep Neural Transformers and Convolutional Neural Networks on Modern Remote Sensing Datasets, https://arxiv.org/abs/2508.02871
- Wesley Brewer, Murali Meena Gopalakrishnan, Matthias Maiterth, Aditya Kashi, Jong Youl Choi, Pei Zhang, Stephen Nichols, Riccardo Balin, Miles Couchman, Stephen de Bruyn Kops, P.K. Yeung, Daniel Dotson, Rohini Uma-Vaideswaran, Sarp Oral, Feiyi Wang, 5 Aug 2025, Intelligent Sampling of Extreme-Scale Turbulence Datasets for Accurate and Efficient Spatiotemporal Model Training, https://arxiv.org/abs/2508.03872
- Wei Liu, Zhongyu Niu, Lang Gao, Zhiying Deng, Jun Wang, Haozhao Wang, Ruixuan Li, 6 Aug 2025, Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets, https://arxiv.org/abs/2505.02118
- Burak Can Kaplan, Hugo Cesar De Castro Carneiro, Stefan Wermter, 7 Aug 2025, Can Large Language Models Generate Effective Datasets for Emotion Recognition in Conversations?, https://arxiv.org/abs/2508.05474
- Minwoo Oh, Minsu Park, Eunil Park, 8 Aug 2025, Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline, https://arxiv.org/abs/2504.21772
- Connor Wilhelm, Dan Ventura, 12 Aug 2025, Distilling Reinforcement Learning into Single-Batch Datasets, https://arxiv.org/abs/2508.09283
- Viacheslav Barkov, Jonas Schmidinger, Robin Gebbers, Martin Atzmueller, 13 Aug 2025, Modern Neural Networks for Small Tabular Datasets: The New Default for Field-Scale Digital Soil Mapping?, https://arxiv.org/abs/2508.09888
- Simon Klüttermann, Emmanuel Müller, 13 Aug 2025, Rare anomalies require large datasets: About proving the existence of anomalies, https://arxiv.org/abs/2508.09894
- Aishik Mandal, Prottay Kumar Adhikary, Hiba Arnaout, Iryna Gurevych, Tanmoy Chakraborty, 13 Aug 2025, A Comprehensive Survey of Datasets for Clinical Mental Health AI Systems, https://arxiv.org/abs/2508.09809
- Lingyu Chen, Yawen Zeng, Yue Wang, Peng Wan, Guo-chen Ning, Hongen Liao, Daoqiang Zhang, Fang Chen, 13 Aug 2025, COME: Dual Structure-Semantic Learning with Collaborative MoE for Universal Lesion Detection Across Heterogeneous Ultrasound Datasets, https://arxiv.org/abs/2508.09886
- Sai Krishna Mendu, Harish Yenala, Aditi Gulati, Shanu Kumar, Parag Agrawal, 12 Aug 2025, Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs, https://arxiv.org/abs/2505.02009
- Gauri Jain, Dominik Rothenhäusler, Kirk Bansak, Elisabeth Paulson, 15 Aug 2025, CTRL Your Shift: Clustered Transfer Residual Learning for Many Small Datasets, https://arxiv.org/abs/2508.11144
- SeungBum Ha, Taehwan Lee, Jiyoun Lim, Sung Whan Yoon, 17 Aug 2025, Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation, https://arxiv.org/abs/2412.10436
- Mizuki Ohira, Toshimichi Saito, 17 Aug 2025, A Recurrent Neural Network based Clustering Method for Binary Data Sets in Education, https://arxiv.org/abs/2508.13224
- Wanjun Hu, 19 Aug 2025, Typed Topological Structures Of Datasets, https://arxiv.org/abs/2508.14008
- Haohang Xu, Chengjie Liu, Qihang Wang, Wenhao Huang, Yongjian Xu, Weiyu Chen, Anlan Peng, Zhijun Li, Bo Li, Lei Qi, Jun Yang, Yuan Du, and Li Du, 27 Jun 2025, Image2Net: Datasets, Benchmark and Hybrid Framework to Convert Analog Circuit Diagrams into Netlists, https://arxiv.org/abs/2508.13157
- Qian Zhang, Ruilin Zhang, Jun Xiao, Yifan Liu and Zhe Wang, 12 Aug 2025, MCLPD: Multi-view Contrastive Learning for EEG-based PD Detection Across Datasets, https://arxiv.org/abs/2508.14073
- Ishaan Mahapatra and Nihar R. Mahapatra, 14 Aug 2025, Systematic FAIRness Assessment of Open Voice Biomarker Datasets for Mental Health and Neurodegenerative Diseases, https://arxiv.org/abs/2508.14089
- Corinna Coupette and Jeremy Wayland and Emily Simons and Bastian Rieck, 20 Aug 2025, No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets, https://arxiv.org/abs/2502.02379
- Sen Yan, Chinmaya Kaundanya, Noel E. O'Connor, Suzanne Little, Mingming Liu, 22 Aug 2025, Machine Learning in Micromobility: A Systematic Review of Datasets, Techniques, and Applications, https://arxiv.org/abs/2508.16135
- Sridevi Bonthu, S.Rama Sree, M.H.M. Krishna Prasad, 19 Aug 2025, Statistical Comparative Analysis of Semantic Similarities and Model Transferability Across Datasets for Short Answer Grading, https://arxiv.org/abs/2508.15837
- Julian Oestreich and Lydia M\"uller, 21 Aug 2025, Evaluating Structured Decoding for Text-to-Table Generation: Evidence from Three Datasets, https://arxiv.org/abs/2508.15910
- Andreas Loizou and Dimitrios Tsoumakos, 22 Aug 2025, Analytics Modelling over Multiple Datasets using Vector Embeddings, https://arxiv.org/abs/2502.17060
- Aaron Rodrigues, Mahmood Hegazy and Azzam Naeem, 22 Aug 2025, Enhancing and Scaling Search Query Datasets for Recommendation Systems, https://arxiv.org/abs/2505.11176
- Nikolaos Pavlidis, Vasilis Perifanis, Symeon Symeonidis, Pavlos S. Efraimidis, 24 Aug 2025, Large Language Models as Universal Predictors? An Empirical Study on Small Tabular Datasets, https://arxiv.org/abs/2508.17391
- Prashant Gupta, 23 Aug 2025, Learning ON Large Datasets Using Bit-String Trees, https://arxiv.org/abs/2508.17083
- Sarina Penquitt, Tobias Riedlinger, Timo Heller, Markus Reischl, Matthias Rottmann, 25 Aug 2025, Learning to Detect Label Errors by Making Them: A Method for Segmentation and Object Detection Datasets, https://arxiv.org/abs/2508.17930
- Maximilian Burzer, Tobias King, Till Riedel, Michael Beigl and Tobias R\"oddiger, 12 Aug 2025, WHAR Datasets: An Open Source Library for Wearable Human Activity Recognition, https://arxiv.org/abs/2508.16604
- Milad Hoseinpour, Vladimir Dvorkin, 25 Aug 2025, Constrained Diffusion Models for Synthesizing Representative Power Flow Datasets, https://arxiv.org/abs/2506.11281
Synthetic Data
Research papers on LLM-generated synthetic data for training (a minimal generation sketch follows this list):
- Skurzhanskyi, O.H., Marchenko, O.O. & Anisimov, A.V., 2024, Specialized Pre-Training of Neural Networks on Synthetic Data for Improving Paraphrase Generation. Cybern Syst Anal 2024 https://doi.org/10.1007/s10559-024-00658-7 https://link.springer.com/article/10.1007/s10559-024-00658-7
- Pratyush Maini, Skyler Seto, He Bai, David Grangier, Yizhe Zhang, Navdeep Jaitly, 29 Jan 2024, Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling, https://arxiv.org/abs/2401.16380
- André Bauer, Simon Trapp, Michael Stenger, Robert Leppich, Samuel Kounev, Mark Leznik, Kyle Chard, Ian Foster, 4 Jan 2024, Comprehensive Exploration of Synthetic Data Generation: A Survey https://arxiv.org/abs/2401.02524
- Ankit Patel, June 14, 2024, NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models, https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/
- David Spuler, March 2024, Chapter 45. Knowledge Distillation, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- A Gudibande, E Wallace, C Snell, X Geng, H Liu 2023, The false promise of imitating proprietary llms, https://arxiv.org/abs/2305.15717
- Y Wang, W Zhong, L Li, F Mi, X Zeng, W Huang 2023, Aligning large language models with human: A survey, https://arxiv.org/abs/2307.12966
- Y Gu, L Dong, F Wei, M Huang, 2023, Knowledge Distillation of Large Language Models, https://arxiv.org/abs/2306.08543
- X Wan, R Sun, H Dai, SO Arik, T Pfister, 2023, Better zero-shot reasoning with self-adaptive prompting, https://arxiv.org/abs/2305.14106
- S Horawalavithana, S Munikoti, I Stewart, 2023, SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions, https://arxiv.org/abs/2307.01139
- X Daull, P Bellot, E Bruno, V Martin, 2023, Complex QA and language models hybrid architectures, Survey, https://arxiv.org/abs/2302.09051
- Z Yuan, J Liu, Q Zi, M Liu, X Peng, Y Lou, 2023, Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation, https://arxiv.org/abs/2308.01240
- W AlShikh, M Daaboul, K Goddard, B Imel, 2023, Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning, https://arxiv.org/abs/2307.03692
- Z He, Z Xie, R Jha, H Steck, D Liang, Y Feng, 2023, Large Language Models as Zero-Shot Conversational Recommenders, https://arxiv.org/abs/2308.10053
- NVIDIA, June 2024, Nemotron-4 340B Technical Report, https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_4_340B_8T_0.pdf (Architecture is decoder-only with GQA, SentencePiece tokenizer, causal attention masks, RoPE, 96 layers, 96 heads, 8 KV heads, 256,000 vocabulary, 18432 internal dimension, context window 4096, and uses squared RELU.)
- Michael Nuñez, July 18, 2024, Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling, https://venturebeat.com/ai/groq-open-source-llama-ai-model-tops-leaderboard-outperforming-gpt-4o-and-claude-in-function-calling/
- Louie Peters, Aug 27, 2024, Two Paths to Small LMs? Synthetic Data (Phi 3.5) vs Pruning & Distillation (Llama-3.1-Minitron), https://newsletter.towardsai.net/p/114-two-paths-to-small-lms-synthetic
- Aatish Bhatia, Aug. 25, 2024, When A.I.’s Output Is a Threat to A.I. Itself: As A.I.-generated data becomes harder to detect, it’s increasingly likely to be ingested by future A.I., leading to worse results, NY Times, https://www.nytimes.com/interactive/2024/08/26/upshot/ai-synthetic-data.html
- Shumailov, I., Shumaylov, Z., Zhao, Y. et al. 2024, AI models collapse when trained on recursively generated data. Nature 631, 755–759. https://doi.org/10.1038/s41586-024-07566-y https://www.nature.com/articles/s41586-024-07566-y
- Damien Ferbach, Quentin Bertrand, Avishek Joey Bose, Gauthier Gidel, 12 Jun 2024, Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences, https://arxiv.org/abs/2407.09499
- Ryan McNeal, Aug 27, 2024, ChatGPT and GPT-4 could get a sweet upgrade this fall with 'strawberry', https://www.androidauthority.com/openai-strawberry-ai-3475682/
- Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai, 10 Aug 2024 (v2), Best Practices and Lessons Learned on Synthetic Data, https://arxiv.org/abs/2404.07503
- Georgia Argyro, Angeliki Dimitriou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou, 10 Sep 2024, Prompt2Fashion: An automatically generated fashion dataset, https://arxiv.org/abs/2409.06442
- Alisia Lupidi, Carlos Gemmell, Nicola Cancedda, Jane Dwivedi-Yu, Jason Weston, Jakob Foerster, Roberta Raileanu, Maria Lomeli, 12 Sep 2024, Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources, https://arxiv.org/abs/2409.08239
- Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi, 29 Aug 2024, Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling, https://arxiv.org/abs/2408.16737
- Ulyana Piterbarg, Lerrel Pinto, Rob Fergus, 3 Oct 2024, Training Language Models on Synthetic Edit Sequences Improves Code Synthesis, https://arxiv.org/abs/2410.02749
- Ke Wang, Jiahui Zhu, Minjie Ren, Zeming Liu, Shiwei Li, Zongye Zhang, Chenkai Zhang, Xiaoyu Wu, Qiqi Zhan, Qingjie Liu, Yunhong Wang, 16 Oct 2024, A Survey on Data Synthesis and Augmentation for Large Language Models, https://arxiv.org/abs/2410.12896
- Ran Xu, Hui Liu, Sreyashi Nag, Zhenwei Dai, Yaochen Xie, Xianfeng Tang, Chen Luo, Yang Li, Joyce C. Ho, Carl Yang, Qi He, 23 Oct 2024, SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains, https://arxiv.org/abs/2410.17952
- Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Zhen Yang, Jonny Han, Xiaobo Shu, Jiahao Bu, (and many more authors), 4 Nov 2024, Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent, https://arxiv.org/abs/2411.02265 https://github.com/Tencent/Hunyuan-Large https://huggingface.co/tencent/Tencent-Hunyuan-Large
- Arindam Mitra, Ahmed Awadallah, Yash Lara, November 14, 2024, Orca-AgentInstruct: Agentic flows can be effective synthetic-data generators, Microsoft Research Blog, https://www.microsoft.com/en-us/research/blog/orca-agentinstruct-agentic-flows-can-be-effective-synthetic-data-generators/
- Seungone Kim, Juyoung Suk, Xiang Yue, Vijay Viswanathan, Seongyun Lee, Yizhong Wang, Kiril Gashteovski, Carolin Lawrence, Sean Welleck, Graham Neubig, 4 Dec 2024, Evaluating Language Models as Synthetic Data Generators, https://arxiv.org/abs/2412.03679
- Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid, 30 Oct 2024 (v3), The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities, https://arxiv.org/abs/2408.13296
- Xiang Huang, Jiayu Shen, Shanshan Huang, Sitao Cheng, Xiaxia Wang, Yuzhong Qu, 27 Dec 2024, TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data, https://arxiv.org/abs/2412.19544
- Sebastian Raschka, PhD, Jan 15, 2025, Noteworthy AI Research Papers of 2024 (Part Two). Six influential AI papers from July to December, https://magazine.sebastianraschka.com/p/ai-research-papers-2024-part-2 (Examines multimodal LLama3 models and the different multimodal architectures.)
- FZ Subah, Oct 2024, Mitigating and Assessing Bias and Fairness in Large Language Model-Generated Synthetic Tabular Data, Master's Thesis, Department of Engineering, University of Cambridge, https://www.mlmi.eng.cam.ac.uk/files/2023-2024/fzs21_mitigating_2024.pdf
- Chetan Harsha, Karmvir Singh Phogat, Sridhar Dasaratha, Sai Akhil Puranam, Shashishekar Ramakrishna, Jan 2025, Synthetic Data Generation Using Large Language Models for Financial Question Answering, Proceedings of the Joint Workshop of the 9th FinNLP, the 6th FNP, and the 1st LLMFinLegal, pages 76–95 January 19–20, 2025, Association for Computational Linguistics, https://aclanthology.org/2025.finnlp-1.7.pdf
- Zhan Ling, Kang Liu, Kai Yan, Yifan Yang, Weijian Lin, Ting-Han Fan, Lingfeng Shen, Zhengyin Du, Jiecao Chen, 25 Jan 2025, LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion, https://arxiv.org/abs/2501.15089
- Minsang Kim, Seungjun Baek, 6 Feb 2025, Syntriever: How to Train Your Retriever with Synthetic Data from LLMs, https://arxiv.org/abs/2502.03824
- Hanmeng Liu, Zhizhang Fu, Mengru Ding, Ruoxi Ning, Chaoli Zhang, Xiaozhang Liu, Yue Zhang, 13 Feb 2025, Logical Reasoning in Large Language Models: A Survey, https://arxiv.org/abs/2502.09100
- Joshua Ong Jun Leang, Giwon Hong, Wenda Li, Shay B. Cohen, 18 Feb 2025, Theorem Prover as a Judge for Synthetic Data Generation, https://arxiv.org/abs/2502.13137
- Maria Korolov, Jun 25, 2025, 7 ways synthetic data creates business value, https://www.cio.com/article/4003262/7-ways-synthetic-data-creates-business-value.html
- Ali Zolnour, Hossein Azadmaleki, Yasaman Haghbin, Fatemeh Taherinezhad, Mohamad Javad Momeni Nezhad, Sina Rashidi, Masoud Khani, AmirSajjad Taleban, Samin Mahdizadeh Sani, Maryam Dadkhah, James M. Noble, Suzanne Bakken, Yadollah Yaghoobzadeh, Abdol-Hossein Vahabie, Masoud Rouhizadeh, Maryam Zolnoori, 8 Aug 2025, LLMCARE: Alzheimer's Detection via Transformer Models Enhanced by LLM-Generated Synthetic Data, https://arxiv.org/abs/2508.10027
- Nitin Rai, Nathan S. Boyd, Gary E. Vallad, Arnold W. Schumann, 13 Aug 2025, Improving watermelon (Citrullus lanatus) disease classification with generative artificial intelligence (GenAI)-based synthetic and real-field images via a custom EfficientNetV2-L model, https://arxiv.org/abs/2508.10156
- Yuchang Zhu, Huizhe Zhang, Bingzhe Wu, Jintang Li, Zibin Zheng, Peilin Zhao, Liang Chen, Yatao Bian, 14 Aug 2025, Measuring Diversity in Synthetic Datasets, https://arxiv.org/abs/2502.08512
- Jessup Byun, Xiaofeng Lin, Joshua Ward, Guang Cheng, 22 Jul 2025, Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation, https://arxiv.org/abs/2507.17066
- Álvaro Ruiz-Ródenas, Jaime Pujante Sáez, Daniel García-Algora, Mario Rodríguez Béjar, Jorge Blasco and José Luis Hernández-Ramos, 21 Jul 2025, SynthCTI: LLM-Driven Synthetic CTI Generation to enhance MITRE Technique Mapping, https://arxiv.org/abs/2507.16852
- Rishemjit Kaur, Arshdeep Singh Bhankhar, Surangika Ranathunga, Jashanpreet Singh Salh, Sudhir Rajput, Vidhi, Kashish Mahendra, Bhavika Berwal, Ritesh Kumar, 22 Jul 2025, Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain, https://arxiv.org/abs/2507.16974
- Yifan Wang, Runjin Chen, Bolian Li, David Cho, Yihe Deng, Ruqi Zhang, Tianlong Chen, Zhangyang Wang, Ananth Grama, Junyuan Hong, 22 Jul 2025, More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment, https://arxiv.org/abs/2504.02193
- Shreya Saxena, Siva Prasad, Zishan Ahmad, Vishal Vaddina, 22 Jul 2025, ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training, https://arxiv.org/abs/2507.16478
- Ivona Krchova, Michael Platzer, Paul Tiwald, 22 Jul 2025, Improving Predictions on Highly Unbalanced Data Using Open Source Synthetic Data Upsampling, https://arxiv.org/abs/2507.16419
- Alireza Dizaji, Benedict Aaron Tjandra, Mehrab Hamidi, Shenyang Huang, Guillaume Rabusseau, 22 Jul 2025, T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs, https://arxiv.org/abs/2507.10183
- Hoyeon Lee, Sejung Son, Ye-Eun Kang, Jong-Hwan Kim, 24 Jul 2025, Synthetic Data Generation for Phrase Break Prediction with Large Language Model, https://arxiv.org/abs/2507.18044
- Basel Alshaikhdeeb, Ahmed Abdelmonem Hemedan, Soumyabrata Ghosh, Irina Balaur, and Venkata Satagopam, 24 Jul 2025, Generation of Synthetic Clinical Text: A Systematic Review, https://arxiv.org/abs/2507.18451
- Zhengyun Zhao, Huaiyuan Ying, Yue Zhong, Sheng Yu, 24 Jul 2025, DR.EHR: Dense Retrieval for Electronic Health Record with Knowledge Injection and Synthetic Data, https://arxiv.org/abs/2507.18583
- Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim, 24 Jul 2025, SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning, https://arxiv.org/abs/2507.18616
- Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim, 24 Jul 2025, SIDA: Synthetic Image Driven Zero-shot Domain Adaptation, https://arxiv.org/abs/2507.18632
- Tevin Atwal, Chan Nam Tieu, Yefeng Yuan, Zhan Shi, Yuhong Liu, Liang Cheng, 24 Jul 2025, Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs, https://arxiv.org/abs/2507.18055
- Yefeng Yuan, Yuhong Liu, Liang Cheng, 24 Jul 2025, A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models, https://arxiv.org/abs/2404.14445
- Gregor Baer, Isel Grau, Chao Zhang, Pieter Van Gorp, 24 Jul 2025, Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation, https://arxiv.org/abs/2506.11790
- Keito Inoshita, Rushia Harada, 15 Jul 2025, Persona-Based Synthetic Data Generation Using Multi-Stage Conditioning with Large Language Models for Emotion Recognition, https://arxiv.org/abs/2507.13380
- Junsu Kim, Yunhoe Ku, Seungryul Baek, 18 Jul 2025, Can Synthetic Images Conquer Forgetting? Beyond Unexplored Doubts in Few-Shot Class-Incremental Learning, https://arxiv.org/abs/2507.13739
- Matthew A. Chan, Casey J. Pellizzari, Christopher A. Metzler, 17 Jul 2025, Inverse Synthetic Aperture Fourier Ptychography, https://arxiv.org/abs/2507.03733
- Claudio Giusti, Luca Guarnera, Mirko Casu, Sebastiano Battiato, 19 Jul 2025, Fraud is Not Just Rarity: A Causal Prototype Attention Approach to Realistic Synthetic Oversampling, https://arxiv.org/abs/2507.14706
- Anh Nguyen, Sam Schafft, Nicholas Hale, John Alfaro, 21 Jul 2025, FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs, https://arxiv.org/abs/2507.15839
- Pan Peng, Hangyu Xu, 20 Jul 2025, Differentially Private Synthetic Graphs Preserving Triangle-Motif Cuts, https://arxiv.org/abs/2507.14835
- Zijian Ding, Tung Nguyen, Weikai Li, Aditya Grover, Yizhou Sun, Jason Cong, 19 Jul 2025, Iceberg: Enhancing HLS Modeling with Synthetic Data, https://arxiv.org/abs/2507.09948
- Rohit Kundu, Shan Jia, Vishal Mohanty, Athula Balachandran, Amit K. Roy-Chowdhury, 19 Jul 2025, TruthLens: Explainable DeepFake Detection for Face Manipulated and Fully Synthetic Data, https://arxiv.org/abs/2503.15867
- Yewon Byun, Shantanu Gupta, Zachary C. Lipton, Rachel Leah Childers, Bryan Wilder, 8 Aug 2025, Using Imperfect Synthetic Data in Downstream Inference Tasks, https://arxiv.org/abs/2508.06635
- Andrey Sidorenko and Paul Tiwald, 8 Aug 2025, Privacy-Preserving Tabular Synthetic Data Generation Using TabularARGN, https://arxiv.org/abs/2508.06647
- Sabrina Namazova, Alessandra Brondetta, Younes Strittmatter, Matthew Nassar, Sebastian Musslick, 11 Aug 2025, Not Yet AlphaFold for the Mind: Evaluating Centaur as a Synthetic Participant, https://arxiv.org/abs/2508.07887
- Raunak Narwal and Syed Abbas, 10 Aug 2025, BIGBOY1.2: Generating Realistic Synthetic Data for Disease Outbreak Modelling and Analytics, https://arxiv.org/abs/2508.07239
- Ethan Lo and Dan C. Lo, 18 Jul 2025, Exoplanet Detection Using Machine Learning Models Trained on Synthetic Light Curves, https://arxiv.org/abs/2507.19520
- Jovana Kondic, Pengyuan Li, Dhiraj Joshi, Zexue He, Shafiq Abedin, Jennifer Sun, Ben Wiesel, Eli Schwartz, Ahmed Nassar, Bo Wu, Assaf Arbelle, Aude Oliva, Dan Gutfreund, Leonid Karlinsky, Rogerio Feris, 31 May 2025, ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation, https://arxiv.org/abs/2507.19492
- Tao Lian, Jose L. Gómez, Antonio M. López, 26 Jul 2025, FedS2R: One-Shot Federated Domain Generalization for Synthetic-to-Real Semantic Segmentation in Autonomous Driving, https://arxiv.org/abs/2507.19881
- Pavel Korshunov, Ketan Kotwal, Christophe Ecabert, Vidit Vidit, Amir Mohammadi, and Sebastien Marcel, 28 Jul 2025, Investigation of Accuracy and Bias in Face Recognition Trained with Synthetic Data, https://arxiv.org/abs/2507.20782
- Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, 25 Jul 2025, Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task, https://arxiv.org/abs/2310.09336
- Yixin Wu, Feiran Zhang, Tianyuan Shi, Ruicheng Yin, Zhenghua Wang, Zhenliang Gan, Xiaohua Wang, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 28 Jul 2025, Explainable Synthetic Image Detection through Diffusion Timestep Ensembling, https://arxiv.org/abs/2503.06201
- Satyananda Kashyap, Sola Shirai, Nandana Mihindukulasooriya, Horst Samulowitz, 28 Jul 2025, StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation, https://arxiv.org/abs/2507.21340
- Yida Tao, Yen-Chia Hsu, 29 Jul 2025, Bridging Synthetic and Real-World Domains: A Human-in-the-Loop Weakly-Supervised Framework for Industrial Toxic Emission Segmentation, https://arxiv.org/abs/2507.22002
- Ping Yu, Jack Lanchantin, Tianlu Wang, Weizhe Yuan, Olga Golovneva, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Jing Xu, 31 Jul 2025, CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks, https://arxiv.org/abs/2507.23751
- Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen, 31 Jul 2025, Continual Learning with Synthetic Boundary Experience Blending, https://arxiv.org/abs/2507.23534
- Jessica Bader, Leander Girrbach, Stephan Alaniz, Zeynep Akata, 31 Jul 2025, SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions, https://arxiv.org/abs/2507.23784
- Patricia A. Apellániz and Ana Jiménez and Borja Arroyo Galende and Juan Parras and Santiago Zazo, 31 Jul 2025, Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios, https://arxiv.org/abs/2407.03080
- Aleksander Ficek, Somshubra Majumdar, Vahid Noroozi, Boris Ginsburg, 30 Jul 2025, Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning, https://arxiv.org/abs/2502.13820
- Georgi Ganev and Meenatchi Sundaram Muthu Selva Annamalai and Sofiane Mahiou and Emiliano De Cristofaro, 29 Jul 2025, The Importance of Being Discrete: Measuring the Impact of Discretization in End-to-End Differentially Private Synthetic Data, https://arxiv.org/abs/2504.06923
- Tom Or and Omri Azencot (Ben Gurion University of the Negev), 1 Aug 2025, Unraveling Hidden Representations: A Multi-Modal Layer Analysis for Better Synthetic Content Forensics, https://arxiv.org/abs/2508.00784
- Ivona Krchova, Mariana Vargas Vieyra, Mario Scriminaci, Andrey Sidorenko, 1 Aug 2025, Democratizing Tabular Data Access with an Open-Source Synthetic-Data SDK, https://arxiv.org/abs/2508.00718
- Jianwei Wang, Ziming Wu, Fuming Lai, Shaobing Lian, Ziqian Zeng, 1 Aug 2025, SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought, https://arxiv.org/abs/2508.00574
- Abdulmajid Murad, Massimiliano Ruocco, 4 Aug 2025, Pre-Tactical Flight-Delay and Turnaround Forecasting with Synthetic Aviation Data, https://arxiv.org/abs/2508.02294
- Ahmad Rezaie Mianroodi, Amirali Rezaie, Niko Grisel Todorov, Cyril Rakovski, Frank Rudzicz, 2 Aug 2025, MedSynth: Realistic, Synthetic Medical Dialogue-Note Pairs, https://arxiv.org/abs/2508.01401
- Vinicius Lima, Dzung T. Phan, Jayant Kalagnanam, Dhaval Patel, Nianjun Zhou, 5 Aug 2025, Toward a Trustworthy Optimization Modeling Agent via Verifiable Synthetic Data Generation, https://arxiv.org/abs/2508.03117
- Océane Doremus, Ariel Guerra-Adames, Marta Avalos-Fernandez, Vianney Jouhet, Cédric Gil-Jardiné, Emmanuel Lagarde, 4 Aug 2025, Synthetic medical data generation: state of the art and application to trauma mechanism classification, https://arxiv.org/abs/2508.02771
- Shifeng Xie, Vasilii Feofanov, Marius Alonso, Ambroise Odonnat, Jianfeng Zhang, Themis Palpanas, and Ievgen Redko, 4 Aug 2025, CauKer: classification time series foundation models can be pretrained on synthetic data only, https://arxiv.org/abs/2508.02879
- Yongyi Wang, Lingfeng Li, Bozhou Chen, Ang Li, Hanyu Liu, Qirui Zheng, Xionghui Yang, Wenxin Li, 6 Aug 2025, Synthetic POMDPs to Challenge Memory-Augmented RL: Memory Demand Structure Modeling, https://arxiv.org/abs/2508.04282
- George Bredis, Stanislav Dereka, Viacheslav Sinii, Ruslan Rakhimov, Daniil Gavrilov, 6 Aug 2025, Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success, https://arxiv.org/abs/2508.04280
- Mohd Ashhad and Ricardo Henao, 5 Aug 2025, Generating Accurate Synthetic Survival Data by Conditioning on Outcomes, https://arxiv.org/abs/2405.17333
- Yunbo Long, Liming Xu, Alexandra Brintrup, 7 Aug 2025, LLM-TabLogic: Preserving Inter-Column Logical Relationships in Synthetic Tabular Data via Prompt-Guided Latent Diffusion, https://arxiv.org/abs/2503.02161
- Ingo Ziegler, Abdullatif K\"oksal, Desmond Elliott, Hinrich Sch\"utze, 6 Aug 2025, CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation, https://arxiv.org/abs/2409.02098
- Alejandro Moreno R., Desale Fentaw, Samuel Palmer, Raúl Salles de Padua, Ninad Dixit, Samuel Mugel, Roman Orús, Manuel Radons, Josef Menter, and Ali Abedi, 8 Aug 2025, Synthetic Data Generation and Differential Privacy using Tensor Networks' Matrix Product States (MPS), https://arxiv.org/abs/2508.06251
- Ojonugwa Oluwafemi Ejiga Peter, Akingbola Oluwapemiisin, Amalahu Chetachi, Adeniran Opeyemi, Fahmi Khalifa, and Md Mahmudur Rahman, 8 Aug 2025, Synthetic Data-Driven Multi-Architecture Framework for Automated Polyp Segmentation Through Integrated Detection and Mask Generation, https://arxiv.org/abs/2508.06170
- Pavitra Chauhan, Mohsen Gamal Saad Askar, Kristian Svendsen, Bjørn Fjukstad, Brita Elvevåg, Lars Ailo Bongo, Edvard Pedersen, 8 Aug 2025, From research to clinic: Accelerating the translation of clinical decision support systems by making synthetic data interoperable, https://arxiv.org/abs/2308.02613
- Shayan Alahyari, Mike Domaratzki, 8 Aug 2025, SMOGAN: Synthetic Minority Oversampling with GAN Refinement for Imbalanced Regression, https://arxiv.org/abs/2504.21152
- Arshia Ilaty, Hossein Shirazi, Hajar Homayouni, 11 Aug 2025, SynLLM: A Comparative Analysis of Large Language Models for Medical Tabular Synthetic Data Generation via Prompt Engineering, https://arxiv.org/abs/2508.08529
- Audrey Poinsot, Panayiotis Panayiotou, Alessandro Leite, Nicolas Chesneau, Özgür Şimşek, Marc Schoenauer, 12 Aug 2025, Position: Causal Machine Learning Requires Rigorous Synthetic Experiments for Broader Adoption, https://arxiv.org/abs/2508.08883
- Farah Atif, Nursultan Askarbekuly, Kareem Darwish, Monojit Choudhury, 4 Aug 2025, Sacred or Synthetic? Evaluating LLM Reliability and Abstention for Religious Questions, https://arxiv.org/abs/2508.08287
- Vibeke Binz Vallevik, Anne Kjersti C. Befring, Severin Elvatun and Jan Franz Nygaard, 11 Aug 2025, Processing of synthetic data in AI development for healthcare and the definition of personal data in EU law, https://arxiv.org/abs/2508.08353
- Aydin Zaboli and Junho Hong, 12 Aug 2025, Generative AI for Critical Infrastructure in Smart Grids: A Unified Framework for Synthetic Data Generation and Anomaly Detection, https://arxiv.org/abs/2508.08593
- Taedong Yun, Eric Yang, Mustafa Safdari, Jong Ha Lee, Vaishnavi Vinod Kumar, S. Sara Mahdavi, Jonathan Amar, Derek Peyton, Reut Aharony, Andreas Michaelides, Logan Schneider, Isaac Galatzer-Levy, Yugang Jia, John Canny, Arthur Gretton, Maja Matarić, 12 Aug 2025, Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions, https://arxiv.org/abs/2502.13135
- Min Tang, Peng Lu, Qing Feng, 6 Aug 2025, Generating Feasible and Diverse Synthetic Populations Using Diffusion Models, https://arxiv.org/abs/2508.09164
- Junyan Ye, Dongzhi Jiang, Zihao Wang, Leqi Zhu, Zhenghao Hu, Zilong Huang, Jun He, Zhiyuan Yan, Jinghua Yu, Hongsheng Li, Conghui He, Weijia Li, 13 Aug 2025, Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation, https://arxiv.org/abs/2508.09987
- Shuzheng Si, Haozhe Zhao, Cheng Gao, Yuzhuo Bai, Zhitong Wang, Bofei Gao, Kangyang Luo, Wenhao Li, Yufei Huang, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun, 13 Aug 2025, Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning, https://arxiv.org/abs/2505.16483
- Pratyush Maini, Vineeth Dorna, Parth Doshi, Aldo Carranza, Fan Pan, Jack Urbanek, Paul Burstein, Alex Fang, Alvin Deng, Amro Abbas, Brett Larsen, Cody Blakeney, Charvi Bannur, Christina Baek, Darren Teh, David Schwab, Haakon Mongstad, Haoli Yin, Josh Wills, Kaleigh Mentzer, Luke Merrick, Ricardo Monti, Rishabh Adiga, Siddharth Joshi, Spandan Das, Zhengping Wang, Bogdan Gaza, Ari Morcos, Matthew Leavitt, 14 Aug 2025, BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining, https://arxiv.org/abs/2508.10975
- Liam Chalcroft and Ioannis Pappas and Cathy J. Price and John Ashburner, 15 Aug 2025, Synthetic Data for Robust Stroke Segmentation, https://arxiv.org/abs/2404.01946
- Nitish Nagesh, Salar Shakibhamedan, Mahdi Bagheri, Ziyu Wang, Nima TaheriNejad, Axel Jantsch, Amir M. Rahmani, 15 Aug 2025, FairTabGen: Unifying Counterfactual and Causal Fairness in Synthetic Tabular Data Generation, https://arxiv.org/abs/2508.11810
- Jonas van Elburg, Peter van der Putten, Maarten Marx, 15 Aug 2025, Can we Evaluate RAGs with Synthetic Data?, https://arxiv.org/abs/2508.11758
- Ahmet H. Güzel, Ilija Bogunovic, Jack Parker-Holder, 17 Aug 2025, Synthetic Data is Sufficient for Zero-Shot Visual Generalization from Offline Data, https://arxiv.org/abs/2508.12356
- Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xinyun Liu, Yulia Tsvetkov, 17 Aug 2025, Generalizable LLM Learning of Graph Synthetic Data with Post-training Alignment, https://arxiv.org/abs/2506.00845
- Matey Krastev, Miklos Hamar, Danilo Toapanta, Jesse Brouwers, Yibin Lei, 19 Aug 2025, InPars+: Supercharging Synthetic Data Generation for Information Retrieval Systems, https://arxiv.org/abs/2508.13930
- Charlie Hou, Mei-Yu Wang, Yige Zhu, Daniel Lazar, Giulia Fanti, 19 Aug 2025, POPri: Private Federated Learning using Preference-Optimized Synthetic Data, https://arxiv.org/abs/2504.16438
- Suleyman Olcay Polat, Poli A. Nemkova, Mark V. Albert, 20 Aug 2025, Synthetic Adaptive Guided Embeddings (SAGE): A Novel Knowledge Distillation Method, https://arxiv.org/abs/2508.14783
- Samir Abdaljalil, Erchin Serpedin, Khalid Qaraqe, Hasan Kurban, 20 Aug 2025, Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference, https://arxiv.org/abs/2508.14735
- Gaston Gustavo Rios, 20 Aug 2025, HandCraft: Dynamic Sign Generation for Synthetic Data Augmentation, https://arxiv.org/abs/2508.14345
- Saptarshi Neil Sinha and P. Julius Kuehn and Johannes Koppe and Arjan Kuijper and Michael Weinmann, 20 Aug 2025, Neural Restoration of Greening Defects in Historical Autochrome Photographs Based on Purely Synthetic Data, https://arxiv.org/abs/2505.22291
- Bidyapati Pradhan, Surajit Dasgupta, Amit Kumar Saha, Omkar Anustoop, Sriram Puttagunta, Vipul Mittal, Gopal Sarda, 21 Aug 2025, GraSP: A Unified Graph-Based Framework for Scalable Generation, Quality Tagging, and Management of Synthetic Data for SFT and DPO, https://arxiv.org/abs/2508.15432
- Jan Kapar, Kathrin Günther, Lori Ann Vallis, Klaus Berger, Nadine Binder, Hermann Brenner, Stefanie Castell, Beate Fischer, Volker Harth, Bernd Holleczek, Timm Intemann, Till Ittermann, André Karch, Thomas Keil, Lilian Krist, Berit Lange, Michael F. Leitzmann, Katharina Nimptsch, Nadia Obi, Iris Pigeot, Tobias Pischon, Tamara Schikowski, Börge Schmidt, Carsten Oliver Schmidt, Anja M. Sedlmair, Justine Tanoey, Harm Wienbergen, Andreas Wienke, Claudia Wigmann and Marvin N. Wright, 19 Aug 2025, Can synthetic data reproduce real-world findings in epidemiology? A replication study using tree-based generative AI, https://arxiv.org/abs/2508.14936
- Juntao Tan, Liangwei Yang, Zuxin Liu, Zhiwei Liu, Rithesh Murthy, Tulika Manoj Awalgaonkar, Jianguo Zhang, Weiran Yao, Ming Zhu, Shirley Kokane, Silvio Savarese, Huan Wang, Caiming Xiong, Shelby Heinecke, 20 Aug 2025, PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data, https://arxiv.org/abs/2502.20616
- Arefeh Kazemi and Sri Balaaji Natarajan Kalaivendan and Joachim Wagner and Hamza Qadeer and Kanishk Verma and Brian Davis, 20 Aug 2025, Synthetic vs. Gold: The Role of LLM Generated Labels and Data in Cyberbullying Detection, https://arxiv.org/abs/2502.15860
- Weijie Niu, Alberto Huertas Celdran, Karoline Siarsky, Burkhard Stiller, 22 Aug 2025, FEST: A Unified Framework for Evaluating Synthetic Tabular Data, https://arxiv.org/abs/2508.16254
- Seyedali Mohammadi, Manas Paldhe, Amit Chhabra, 13 Aug 2025, LingVarBench: Benchmarking LLM for Automated Named Entity Recognition in Structured Synthetic Spoken Transcriptions, https://arxiv.org/abs/2508.15801
- Jerry Cao-Xue, Tien Comlekoglu, Keyi Xue, Guanliang Wang, Jiang Li, Gordon Laurie, 21 Aug 2025, Automated Multi-label Classification of Eleven Retinal Diseases: A Benchmark of Modern Architectures and a Meta-Ensemble on a Large Synthetic Dataset, https://arxiv.org/abs/2508.15986
- Mika Leo Hube, Filip Lemic, Ethungshan Shitiri, Gerard Calvo Bartra, Sergi Abadal, Xavier Costa Pérez, 22 Aug 2025, Set Transformer Architectures and Synthetic Data Generation for Flow-Guided Nanoscale Localization, https://arxiv.org/abs/2508.16200
- Stefania L. Moroianu, Christian Bluethgen, Pierre Chambon, Mehdi Cherti, Jean-Benoit Delbrouck, Magdalini Paschali, Brandon Price, Judy Gichoya, Jenia Jitsev, Curtis P. Langlotz, Akshay S. Chaudhari, 22 Aug 2025, Improving Performance, Robustness, and Fairness of Radiographic AI Models with Finely-Controllable Synthetic Data, https://arxiv.org/abs/2508.16783
- Pedro Antonio Rabelo Saraiva, Enzo Ferreira de Souza, Joao Manoel Herrera Pinheiro, Thiago H. Segreto, Ricardo V. Godoy, Marcelo Becker, 24 Aug 2025, A Synthetic Dataset for Manometry Recognition in Robotic Applications, https://arxiv.org/abs/2508.17468
- Weikang Wan, Jiawei Fu, Xiaodi Yuan, Yifeng Zhu, Hao Su, 24 Aug 2025, LodeStar: Long-horizon Dexterity via Synthetic Data Augmentation from Human Demonstrations, https://arxiv.org/abs/2508.17547
- Rishikesh Devanathan, Varun Nathan, Ayush Kumar, 25 Aug 2025, Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation, https://arxiv.org/abs/2508.18210
- Melissa Kazemi Rad, Alberto Purpura, Himanshu Kumar, Emily Chen, Mohammad Shahed Sorower, 23 Aug 2025, GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection, https://arxiv.org/abs/2508.17057
- Chenhao Xue, Yuanzhe Jin, Adrian Carrasco-Revilla, Joyraj Chakraborty, Min Chen, 4 Aug 2025, AutoGeTS: Knowledge-based Automated Generation of Text Synthetics for Improving Text Classification, https://arxiv.org/abs/2508.10000
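Most of the generation pipelines surveyed above share a simple core loop: prompt a strong LLM with seed content, parse and filter its output, and write the survivors out as training records. The sketch below illustrates that loop under stated assumptions: llm_generate() is a hypothetical placeholder for whatever model API is used, and the JSON validity check stands in for the much richer quality and diversity filtering applied in practice.

```python
# Minimal sketch of LLM-generated synthetic training data. The function
# llm_generate() is a hypothetical placeholder for any real model call.
import json

def llm_generate(prompt: str) -> str:
    """Placeholder: call a local model or hosted LLM API here."""
    raise NotImplementedError("plug in a real model call")

PROMPT_TEMPLATE = (
    "Rewrite the following passage as a question-and-answer pair suitable "
    "for instruction tuning. Respond as JSON with keys 'question' and "
    "'answer'.\n\nPassage:\n{passage}"
)

def synthesize(seed_passages, out_path="synthetic_pairs.jsonl"):
    """Turn raw seed passages into synthetic instruction-tuning records."""
    with open(out_path, "w") as f:
        for passage in seed_passages:
            raw = llm_generate(PROMPT_TEMPLATE.format(passage=passage))
            try:
                record = json.loads(raw)
            except json.JSONDecodeError:
                continue  # crude quality filter: drop malformed generations
            if record.get("question") and record.get("answer"):
                f.write(json.dumps(record) + "\n")
```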
Unnatural Instructions (Synthetic Data)
Research papers on "unnatural instructions," a type of synthetic data for training (a minimal generation sketch follows this list):
- A Gudibande, E Wallace, C Snell, X Geng, H Liu 2023, The false promise of imitating proprietary llms, https://arxiv.org/abs/2305.15717
- Y Wang, W Zhong, L Li, F Mi, X Zeng, W Huang 2023, Aligning large language models with human: A survey, https://arxiv.org/abs/2307.12966
- Y Gu, L Dong, F Wei, M Huang, 2023, Knowledge Distillation of Large Language Models, https://arxiv.org/abs/2306.08543
- X Wan, R Sun, H Dai, SO Arik, T Pfister, 2023, Better zero-shot reasoning with self-adaptive prompting, https://arxiv.org/abs/2305.14106
- S Horawalavithana, S Munikoti, I Stewart, 2023, SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions, https://arxiv.org/abs/2307.01139
- X Daull, P Bellot, E Bruno, V Martin, 2023, Complex QA and language models hybrid architectures, Survey, https://arxiv.org/abs/2302.09051
- Z Yuan, J Liu, Q Zi, M Liu, X Peng, Y Lou, 2023, Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation, https://arxiv.org/abs/2308.01240
- W AlShikh, M Daaboul, K Goddard, B Imel, 2023, Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning, https://arxiv.org/abs/2307.03692
- Z He, Z Xie, R Jha, H Steck, D Liang, Y Feng, 2023, Large Language Models as Zero-Shot Conversational Recommenders, https://arxiv.org/abs/2308.10053
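In the "unnatural instructions" style of pipeline, the model itself proposes new tasks: a few seed instructions are shown few-shot, and the LLM is asked to continue the pattern with a novel instruction. Below is a minimal sketch under the same assumptions as the earlier example (llm_generate() is again a hypothetical placeholder for a real model call).

```python
# Minimal sketch of "unnatural instructions"-style generation: seed
# instructions are shown few-shot and the LLM proposes a novel one.
def llm_generate(prompt: str) -> str:
    """Placeholder: call a local model or hosted LLM API here."""
    raise NotImplementedError("plug in a real model call")

SEED_INSTRUCTIONS = [
    "Summarize the following paragraph in one sentence.",
    "Translate this sentence into French.",
    "Write a regular expression that matches email addresses.",
]

def propose_instructions(n: int):
    """Ask the LLM for n new instructions by continuing a few-shot pattern."""
    new_instructions = []
    for _ in range(n):
        examples = "\n".join(
            f"Example {i + 1}: {s}" for i, s in enumerate(SEED_INSTRUCTIONS)
        )
        prompt = f"{examples}\nExample {len(SEED_INSTRUCTIONS) + 1}:"
        candidate = llm_generate(prompt).strip()
        if candidate and candidate not in SEED_INSTRUCTIONS:
            new_instructions.append(candidate)
            SEED_INSTRUCTIONS.append(candidate)  # grow the pool, self-instruct style
    return new_instructions
```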
Distributed Training
Distributed training spreads the training computation across multiple GPUs and multiple servers. Trillion-parameter models are trained on large clusters of 100,000+ GPUs, with complex multi-server, multi-GPU architectures. Distributed training can also run on much more loosely coupled architectures, with servers communicating over the internet.
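As a concrete illustration, here is a minimal sketch of the simplest scheme, data parallelism, using PyTorch's DistributedDataParallel: each process holds a full replica of the model on one GPU, trains on its own shard of each batch, and gradients are averaged across processes during the backward pass. The model, data, and hyperparameters are toy placeholders, not a recommended recipe.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel.
# Launch with: torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 512).cuda()     # toy stand-in for an LLM
    model = DDP(model, device_ids=[local_rank])  # wraps gradient all-reduce

    data = TensorDataset(torch.randn(4096, 512), torch.randn(4096, 512))
    sampler = DistributedSampler(data)           # each rank gets a disjoint shard
    loader = DataLoader(data, batch_size=32, sampler=sampler)

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.MSELoss()
    for epoch in range(2):
        sampler.set_epoch(epoch)                 # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(), y.cuda()
            opt.zero_grad()
            loss_fn(model(x), y).backward()      # gradients averaged across ranks here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```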
Some of the research papers on distributed training:
- Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun, 29 Jul 2024, Efficient Training of Large Language Models on Distributed Infrastructures: A Survey, https://arxiv.org/abs/2407.20018
- WenZheng Zhang, Yang Hu, Jing Shi, Xiaoying Bai, 22 Aug 2024, Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters, https://arxiv.org/abs/2408.12596
- Douglas C. Youvan, September 27, 2024, Building and Running Large-Scale Language Models: The Infrastructure and Techniques Behind GPT-4, https://www.researchgate.net/profile/Douglas-Youvan/publication/384398902_Building_and_Running_Large-Scale_Language_Models_The_Infrastructure_and_Techniques_Behind_GPT-4/links/66f6f4d3906bca2ac3d20e68/Building-and-Running-Large-Scale-Language-Models-The-Infrastructure-and-Techniques-Behind-GPT-4.pdf
- Palak (Microsoft Research India), Rohan Gandhi (Microsoft Research India), Karan Tandon (Microsoft Research India), Debopam Bhattacherjee (Microsoft Research India), Venkata N. Padmanabhan (Microsoft Research India), 16 Nov 2024, Improving training time and GPU utilization in geo-distributed language model training, https://arxiv.org/abs/2411.14458
- M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2024, Resource-efficient Algorithms and Systems of Foundation Models: A Survey, https://dl.acm.org/doi/pdf/10.1145/3706418
- Bowen Peng, Jeffrey Quesnelle, Diederik P. Kingma, 29 Nov 2024, DeMo: Decoupled Momentum Optimization, https://arxiv.org/abs/2411.19870 https://github.com/bloc97/DeMo (Extension to ADAM optimizer that greatly reduces network communication in training.)
- Carl Franzen, August 27, 2024, ‘This could change everything!’ Nous Research unveils new tool to train powerful AI models with 10,000x efficiency, https://venturebeat.com/ai/this-could-change-everything-nous-research-unveils-new-tool-to-train-powerful-ai-models-with-10000x-efficiency/
- Carl Franzen, December 2, 2024, Nous Research is training an AI model using machines distributed across the internet, https://venturebeat.com/ai/nous-research-is-training-an-ai-model-using-machines-distributed-across-the-internet/
- Yicheng Feng, Yuetao Chen, Kaiwen Chen, Jingzong Li, Tianyuan Wu, Peng Cheng, Chuan Wu, Wei Wang, Tsung-Yi Ho, Hong Xu, 17 Dec 2024, Echo: Simulating Distributed Training At Scale, https://arxiv.org/abs/2412.12487
- Kaiyuan Tian, Linbo Qiao, Baihui Liu, Gongqingjian Jiang, Dongsheng Li, 21 Jan 2025, A Survey on Memory-Efficient Large-Scale Model Training in AI for Science, https://arxiv.org/abs/2501.11847
- Nir Barazida, Mar 9, 2022, Distributed training of deep learning models: handling stragglers and latency in synchronous training. A review of the challenges in synchronous distributed training and the best solutions for stragglers and high latency, https://towardsdatascience.com/stragglers-and-latency-in-synchronous-distributed-training-of-deep-learning-models-43783b0266d9
- Zhuang Wang, Zhen Jia, Shuai Zheng, Zhen Zhang, Xinwei Fu, T. S. Eugene Ng, and Yida Wang. 2023. GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints. In Proceedings of the 29th Symposium on Operating Systems Principles (SOSP '23). Association for Computing Machinery, New York, NY, USA, 364–381. https://doi.org/10.1145/3600006.3613145 https://dl.acm.org/doi/10.1145/3600006.3613145 https://www.cs.rice.edu/~eugeneng/papers/SOSP23.pdf (First paper on in-memory checkpointing to CPU memory, and also covers interleaving of checkpointing network traffic with training traffic.)
- Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, Zhaoxin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou, 15 Apr 2024, AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes, https://arxiv.org/abs/2404.09679
- Xinyu Lian, Sam Ade Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, Minjia Zhang, 28 Jun 2024 (v2), Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training, https://arxiv.org/abs/2406.18820
- Xinyi Liu, Yujie Wang, Shenhan Zhu, Fangcheng Fu, Qingshuo Liu, Guangming Lin, Bin Cui, 30 Apr 2025, Galvatron: An Automatic Distributed System for Efficient Foundation Model Training, https://arxiv.org/abs/2504.21411 https://github.com/PKU-DAIR/Hetu-Galvatron
- Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro von Werra, Thomas Wolf, Feb 19, 2025, The Ultra-Scale Playbook: Training LLMs on GPU Clusters, Hugging Face, https://huggingface.co/spaces/nanotron/ultrascale-playbook https://huggingface.co/spaces/nanotron/ultrascale-playbook/resolve/main/The_Ultra-Scale_Playbook_Training_LLMs_on_GPU_Clusters.pdf
- Zihao Song, Shirantha Welikala, Panos J. Antsaklis and Hai Lin, 22 Jul 2025, Graph Neural Network-Based Distributed Optimal Control for Linear Networked Systems: An Online Distributed Training Approach, https://arxiv.org/abs/2504.06439
- Seth Ockerman, Amal Gueroudji, Tanwi Mallick, Yixuan He, Line Pouchard, Robert Ross, Shivaram Venkataraman, 20 Jul 2025, PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training, https://arxiv.org/abs/2507.11683
- Tolga Dimlioglu, Anna Choromanska, 27 Jul 2025, Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning, https://arxiv.org/abs/2507.20424
- Samarth Gupta, Raghudeep Gadde, Rui Chen, Aleix M. Martinez, 20 Aug 2025, Disentanglement in T-space for Faster and Distributed Training of Diffusion Models with Fewer Latent-states, https://arxiv.org/abs/2508.14413
- Qianli Ma, Yaowei Zheng, Zhelun Shi, Zhongkai Zhao, Bin Jia, Ziyue Huang, Zhiqi Lin, Youjie Li, Jiacheng Yang, Yanghua Peng, Zhi Zhang, Xin Liu, 4 Aug 2025, VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo, https://arxiv.org/abs/2508.02317
- Xudong Liao, Yijun Sun, Han Tian, Xinchen Wan, Yilun Jin, Zilong Wang, Zhenghang Ren, Xinyang Huang, Wenxue Li, Kin Fai Tse, Zhizhen Zhong, Guyue Liu, Ying Zhang, Xiaofeng Ye, Yiming Zhang, Kai Chen, 4 Aug 2025, MixNet: A Runtime Reconfigurable Optical-Electrical Fabric for Distributed Mixture-of-Experts Training, https://arxiv.org/abs/2501.03905
Training Costs
Research on the total costs of performing LLM training:
- Will Henshall, June 3, 2024, The Billion-Dollar Price Tag of Building AI, Time, https://time.com/6984292/cost-artificial-intelligence-compute-epoch-report/
- Epoch AI, 2024, How Much Does It Cost to Train Frontier AI Models? https://epochai.org/blog/how-much-does-it-cost-to-train-frontier-ai-models
- Ben Cottier, Robi Rahman, Loredana Fattorini, Nestor Maslej, David Owen, 31 May 2024, The rising costs of training frontier AI models, https://arxiv.org/abs/2405.21015
- Douglas C. Youvan, September 27, 2024, Building and Running Large-Scale Language Models: The Infrastructure and Techniques Behind GPT-4 , https://www.researchgate.net/profile/Douglas-Youvan/publication/384398902_Building_and_Running_Large-Scale_Language_Models_The_Infrastructure_and_Techniques_Behind_GPT-4/links/66f6f4d3906bca2ac3d20e68/Building-and-Running-Large-Scale-Language-Models-The-Infrastructure-and-Techniques-Behind-GPT-4.pdf
- NovaSky, Jan 2025, Sky-T1: Train your own O1 preview model within $450, https://novasky-ai.github.io/posts/sky-t1/
- Alberto Romero, Jan 2025, DeepSeek, a little-known Chinese startup, released R1 yesterday, https://substack.com/@thealgorithmicbridge/note/c-87664591-
- Maxwell Zeff, February 5, 2025, Researchers created an open rival to OpenAI’s o1 ‘reasoning’ model for under $50, https://techcrunch.com/2025/02/05/researchers-created-an-open-rival-to-openais-o1-reasoning-model-for-under-50/
- Kyle Wiggers, January 11, 2025, Researchers open source Sky-T1, a ‘reasoning’ AI model that can be trained for less than $450, https://techcrunch.com/2025/01/11/researchers-open-source-sky-t1-a-reasoning-ai-model-that-can-be-trained-for-less-than-450/
- Alexandra Sternlicht, June 18, 2025, China’s MiniMax debuts M1 AI model that it says costs 200x less to train than OpenAI’s GPT-4, https://fortune.com/2025/06/18/chinas-minimax-m1-ai-model-200x-less-expensive-to-train-than-openai-gpt-4/
- Epoch AI, July 2025, Large-Scale AI Models: Our Large-Scale AI Models dataset documents over 400 models trained with more than 10^23 floating point operations, at the leading edge of scale and capabilities, https://epoch.ai/data/large-scale-ai-models (Training costs around $10m+; about 200 large-scale models were launched in 2024.)
Federated Learning
Federated learning is a type of distributed training for LLMs in which many clients each train on their own local, private data and share only model updates with a central server, which aggregates them into a single global model.
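A minimal sketch of the core aggregation step (federated averaging, in the style of FedAvg) is shown below. The client and function names here are illustrative, not a real library API; each client is assumed to return its locally trained weights together with its local sample count.

```python
# Hypothetical FedAvg-style aggregation sketch (illustrative names only).

def fedavg(client_weights, client_sizes):
    """Average client weight dicts, weighted by local dataset size."""
    total = float(sum(client_sizes))
    merged = {}
    for key in client_weights[0]:
        merged[key] = sum(w[key] * (n / total)
                          for w, n in zip(client_weights, client_sizes))
    return merged

def training_round(server_weights, clients):
    # Broadcast the global weights; each client trains locally and returns
    # (updated_weights, num_local_samples) -- the raw data never leaves it.
    results = [client.local_update(server_weights) for client in clients]
    weights, sizes = zip(*results)
    return fedavg(list(weights), list(sizes))
```

Research papers on federated learning: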
- Caelin Kaplan, Tareq Si Salem, Angelo Rodio, Chuan Xu, Giovanni Neglia, 7 May 2024, Federated Learning for Cooperative Inference Systems: The Case of Early Exit Networks, https://arxiv.org/abs/2405.04249
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey (Broad survey with many optimizations including this topic.)
- Mohamed Nabih Ali, Daniele Falavigna, Alessio Brutti, 2024, Fed-EE: Federating Heterogeneous ASR Models using Early-Exit Architectures, PDF: https://cris.fbk.eu/bitstream/11582/343747/1/paper_49.pdf
- H Woisetschläger, A Isenko, S Wang, R Mayer, 2023, Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly, https://arxiv.org/abs/2310.03150
- Lorenzo Sani, Alex Iacob, Zeyu Cao, Bill Marino, Yan Gao, Tomas Paulik, Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane, 19 Jul 2024 (v2), The Future of Large Language Model Pre-training is Federated, https://arxiv.org/abs/2405.10853
- Jaxpruner: A Concise Library for Sparsity Research, Joo Hyung Lee, Wonpyo Park, Nicole Elyse Mitchell, Jonathan Pilault, Johan Samir Obando Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Woohyun Han, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart J.C. Bik, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci, Conference on Parsimony and Learning, PMLR 234:515-528, 2024. https://proceedings.mlr.press/v234/lee24a.html https://proceedings.mlr.press/v234/lee24a/lee24a.pdf https://openreview.net/forum?id=H2rCZCfXkS https://openreview.net/pdf?id=H2rCZCfXkS
- Eric Samikwa, 2024, Resource-Aware Distributed Machine Learning for Artificial Intelligence of Things, Ph.D. thesis, Faculty of Science, University of Bern, Switzerland, https://boristheses.unibe.ch/5378/1/24samikwa_e_1_.pdf https://doi.org/10.48549/5378 (Multi-edge device with early exit, "micro-split" scheduling, split/federated learning, and distributed inference.)
- Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
- Shengwen Ding, Chenhui Hu, 24 Nov 2024, eFedLLM: Efficient LLM Inference Based on Federated Learning, https://arxiv.org/abs/2411.16003
- Natalie Lang, Alejandro Cohen, Nir Shlezinger, 27 Mar 2024, Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates, https://arxiv.org/abs/2403.18375
- Chengxi Li, Ming Xiao, Mikael Skoglund, 22 Mar 2024, Adaptive Coded Federated Learning: Privacy Preservation and Straggler Mitigation, https://arxiv.org/abs/2403.14905
- Andrew Hard, Antonious M. Girgis, Ehsan Amid, Sean Augenstein, Lara McConnaughey, Rajiv Mathews, Rohan Anil, 14 Mar 2024, Learning from straggler clients in federated learning, https://arxiv.org/abs/2403.09086
- Hongpeng Guo, Haotian Gu, Xiaoyang Wang, Bo Chen, Eun Kyung Lee, Tamar Eilam, Deming Chen, Klara Nahrstedt, 31 Jan 2024, FedCore: Straggler-Free Federated Learning with Distributed Coresets, https://arxiv.org/abs/2402.00219
- Frederico Vicente, Cláudia Soares, Dušan Jakovetić, 13 May 2025, Modular Federated Learning: A Meta-Framework Perspective, https://arxiv.org/abs/2505.08646
- Keke Gai, Dongjue Wang, Jing Yu, Liehuang Zhu, Qi Wu, 14 Aug 2025, A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning, https://arxiv.org/abs/2508.10315
- Kejia Fan, Jianheng Tang, Zhirui Yang, Feijiang Han, Jiaxu Li, Run He, Yajiang Huang, Anfeng Liu, Houbing Herbert Song, Yunhuai Liu, Huiping Zhuang, 14 Aug 2025, APFL: Analytic Personalized Federated Learning via Dual-Stream Least Squares, https://arxiv.org/abs/2508.10732
- Rodrigo Tertulino, 6 Aug 2025, A Robust Pipeline for Differentially Private Federated Learning on Imbalanced Clinical Data using SMOTETomek and FedProx, https://arxiv.org/abs/2508.10017
- Jane Carney, Kushal Upreti, Gaby G. Dagher, Tim Andersen, 11 Aug 2025, FIDELIS: Blockchain-Enabled Protection Against Poisoning Attacks in Federated Learning, https://arxiv.org/abs/2508.10042
- Tianjun Yuan, Jiaxiang Geng, Pengchao Han, Xianhao Chen, Bing Luo, 14 Aug 2025, Flexible Personalized Split Federated Learning for On-Device Fine-Tuning of Foundation Models, https://arxiv.org/abs/2508.10349
- Wenxuan Ye, Xueli An, Junfan Wang, Xueqiang Yan, Georg Carle, 14 Aug 2025, FedABC: Attention-Based Client Selection for Federated Learning with Long-Term View, https://arxiv.org/abs/2507.20871
- Murtaza Rangwala, KR Venugopal, Rajkumar Buyya, 14 Aug 2025, Blockchain-Enabled Federated Learning, https://arxiv.org/abs/2508.06406
- Mattia Sabella and Monica Vitali, 23 Jul 2025, Eco-Friendly AI: Unleashing Data Power for Green Federated Learning, https://arxiv.org/abs/2507.17241
- Aritz Pérez, Carlos Echegoyen and Guzmán Santafé, 23 Jul 2025, Decentralized Federated Learning of Probabilistic Generative Classifiers, https://arxiv.org/abs/2507.17285
- Amandeep Singh Bhatia, Sabre Kais, 23 Jul 2025, Enhancing Quantum Federated Learning with Fisher Information-Based Optimization, https://arxiv.org/abs/2507.17580
- Dario Fenoglio, Gabriele Dominici, Pietro Barbiero, Alberto Tonda, Martin Gjoreski, Marc Langheinrich, 23 Jul 2025, Federated Behavioural Planes: Explaining the Evolution of Client Behaviour in Federated Learning, https://arxiv.org/abs/2405.15632
- Mehdi Khalaj, Shahrzad Golestani Najafabadi, Julita Vassileva, 23 Jul 2025, Privacy-Preserving Multimodal News Recommendation through Federated Learning, https://arxiv.org/abs/2507.15460
- Binbin Ding, Penghui Yang, Sheng-Jun Huang, 22 Jul 2025, FLAIN: Mitigating Backdoor Attacks in Federated Learning via Flipping Weight Updates of Low-Activation Input Neurons, https://arxiv.org/abs/2408.08655
- Seung-Wook Kim, Seongyeol Kim, Jiah Kim, Seowon Ji, Se-Ho Lee, 22 Jul 2025, FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization, https://arxiv.org/abs/2506.23516
- Baran Can Gül, Suraksha Nadig, Stefanos Tziampazis, Nasser Jazdi, Michael Weyrich, 22 Jul 2025, FedMultiEmo: Real-Time Emotion Recognition via Multimodal Federated Learning, https://arxiv.org/abs/2507.15470
- Obaidullah Zaland, Chanh Nguyen, Florian T. Pokorny and Monowar Bhuyan, 23 Jul 2025, Federated Learning for Large-Scale Cloud Robotic Manipulation: Opportunities and Challenges, https://arxiv.org/abs/2507.17903
- Ahmad Alhonainy (1), Praveen Rao (1) ((1) University of Missouri, USA), 19 Jul 2025, Caching Techniques for Reducing the Communication Cost of Federated Learning in IoT Environments, https://arxiv.org/abs/2507.17772
- Constantin Philippenko and Aymeric Dieuleveut, 24 Jul 2025, Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning, https://arxiv.org/abs/2308.01358
- Daniel Commey, Kamel Abbad, Garth V. Crosby and Lyes Khoukhi, 18 Jul 2025, FedSkipTwin: Digital-Twin-Guided Client Skipping for Communication-Efficient Federated Learning, https://arxiv.org/abs/2507.13624
- Sahar Ghoflsaz Ghinani and Elaheh Sadredini, 18 Jul 2025, FuSeFL: Fully Secure and Scalable Cross-Silo Federated Learning, https://arxiv.org/abs/2507.13591
- Di Yu, Xin Du, Linshan Jiang, Huijing Zhang, Shuiguang Deng, 18 Jul 2025, Exploiting Label Skewness for Spiking Neural Networks in Federated Learning, https://arxiv.org/abs/2412.17305
- Huan Wang, Haoran Li, Huaming Chen, Jun Yan, Jiahua Shi, Jun Shen, 18 Jul 2025, FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning, https://arxiv.org/abs/2507.06482
- Zhiyong Jin, Runhua Xu, Chao Li, Yizhong Liu, Jianxin Li, 18 Jul 2025, Sparsification Under Siege: Defending Against Poisoning Attacks in Communication-Efficient Federated Learning, https://arxiv.org/abs/2505.01454
- Nuria Rodríguez-Barroso and Mario García-Márquez and M. Victoria Luzón and Francisco Herrera, 21 Jul 2025, Challenges of Trustworthy Federated Learning: What's Done, Current Trends and Remaining Work, https://arxiv.org/abs/2507.15796
- Yajiao Dai, Jun Li, Zhen Mei, Yiyang Ni, Shi Jin, Zengxiang Li, Sheng Guo, Wei Xiang, 12 Jul 2025, Semi-Supervised Federated Learning via Dual Contrastive Learning and Soft Labeling for Intelligent Fault Diagnosis, https://arxiv.org/abs/2507.14181
- Md Rafid Haque, Abu Raihan Mostofa Kamal, Md. Azam Hossain, 18 Jul 2025, FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning, https://arxiv.org/abs/2507.14322
- Tianle Li, Yongzhi Huang, Linshan Jiang, Qipeng Xie, Chang Liu, Wenfeng Du, Lu Wang, and Kaishun Wu, 20 Jul 2025, FedWCM: Unleashing the Potential of Momentum-based Federated Learning in Long-Tailed Scenarios, https://arxiv.org/abs/2507.14980
- Yunfeng Li, Junhong Liu, Zhaohui Yang, Guofu Liao, Chuyun Zhang, 20 Jul 2025, Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data, https://arxiv.org/abs/2507.14999
- Huiling Yang, Zhanwei Wang, and Kaibin Huang, 21 Jul 2025, Optimal Batch-Size Control for Low-Latency Federated Learning with Device Heterogeneity, https://arxiv.org/abs/2507.15601
- Juntao Tan, Anran Li, Quanchao Liu, Peng Ran, Lan Zhang, 19 Jul 2025, VTarbel: Targeted Label Attack with Minimal Knowledge on Detector-enhanced Vertical Federated Learning, https://arxiv.org/abs/2507.14625
- Juntao Tan, Lan Zhang, Zhonghao Hu, Kai Yang, Peng Ran, Bo Li, 19 Jul 2025, VMask: Tunable Label Privacy Protection for Vertical Federated Learning via Layer Masking, https://arxiv.org/abs/2507.14629
- Khoa Nguyen, Tanveer Khan, Antonis Michalas, 20 Jul 2025, A Privacy-Centric Approach: Scalable and Secure Federated Learning Enabled by Hybrid Homomorphic Encryption, https://arxiv.org/abs/2507.14853
- Zhipeng Wang, Nanqing Dong, Jiahao Sun, William Knottenbelt, Yike Guo, 21 Jul 2025, zkFL: Zero-Knowledge Proof-based Gradient Aggregation for Federated Learning, https://arxiv.org/abs/2310.02554
- Shunsuke Yoneda, Valdemar Švábenský, Gen Li, Daisuke Deguchi, Atsushi Shimada, 21 Jul 2025, Ranking-Based At-Risk Student Prediction Using Federated Learning and Differential Features, https://arxiv.org/abs/2505.09287
- Xinglin Zhao, Yanwen Wang, Xiaobo Liu, Yanrong Hao, Rui Cao, Xin Wen, 8 Aug 2025, A Federated Learning Framework for Handling Subtype Confounding and Heterogeneity in Large-Scale Neuroimaging Diagnosis, https://arxiv.org/abs/2508.06589
- Md. Akmol Masud, Md Abrar Jahin, Mahmud Hasan, 8 Aug 2025, Stabilizing Federated Learning under Extreme Heterogeneity with HeteRo-Select, https://arxiv.org/abs/2508.06692
- Yashwant Krishna Pagoti, Arunesh Sinha, Shamik Sural, 10 Aug 2025, Strategic Incentivization for Locally Differentially Private Federated Learning, https://arxiv.org/abs/2508.07138
- Chenchen Lin, Xuehe Wang, 11 Aug 2025, Multi-Hop Privacy Propagation for Differentially Private Federated Learning in Social Networks, https://arxiv.org/abs/2508.07676
- Mohamad Assaad, Zeinab Nehme, Merouane Debbah, 11 Aug 2025, Communication-Efficient Zero-Order and First-Order Federated Learning Methods over Wireless Networks, https://arxiv.org/abs/2508.08013
- Maozhen Zhang, Mengnan Zhao, Bo Wang, 11 Aug 2025, BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models, https://arxiv.org/abs/2508.08040
- Cem Ata Baykara, Saurav Raj Pandey, Ali Burak \"Unal, Harlin Lee, and Mete Akg\"un, 11 Aug 2025, Federated Learning for Epileptic Seizure Prediction Across Heterogeneous EEG Datasets, https://arxiv.org/abs/2508.08159
- Roopkatha Banerjee, Sampath Koti, Gyanendra Singh, Anirban Chakraborty, Gurunath Gurrala, Bhushan Jagyasi and Yogesh Simmhan, 11 Aug 2025, Optimizing Federated Learning for Scalable Power-demand Forecasting in Microgrids, https://arxiv.org/abs/2508.08022
- Zilong Zhao, Robert Birke, Aditya Kunar, Lydia Y. Chen, 11 Aug 2025, Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data, https://arxiv.org/abs/2108.07927
- Dawood Wasif, Dian Chen, Sindhuja Madabushi, Nithin Alluru, Terrence J. Moore, Jin-Hee Cho, 9 Aug 2025, Empirical Analysis of Privacy-Fairness-Accuracy Trade-offs in Federated Learning: A Step Towards Responsible AI, https://arxiv.org/abs/2503.16233
- Kaveen Hiniduma, Zilinghan Li, Aditya Sinha, Ravi Madduri, Suren Byna, 11 Aug 2025, CADRE: Customizable Assurance of Data Readiness in Privacy-Preserving Federated Learning, https://arxiv.org/abs/2505.23849
- Ali Shakeri, Wei Emma Zhang, Amin Beheshti, Weitong Chen, Jian Yang and Lishan Yang, 22 Jul 2025, FedDPG: An Adaptive Yet Efficient Prompt-tuning Approach in Federated Learning Settings, https://arxiv.org/abs/2507.19534
- Youngjoon Lee, Hyukjoon Lee, Jinu Gong, Yang Cao, Joonhyuk Kang, 26 Jul 2025, Debunking Optimization Myths in Federated Learning for Medical Image Classification, https://arxiv.org/abs/2507.19822
- Liu junkang and Yuanyuan Liu and Fanhua Shang and Hongying Liu and Jin Liu and Wei Feng, 26 Jul 2025, FedSWA: Improving Generalization in Federated Learning with Highly Heterogeneous Data via Momentum-Based Stochastic Controlled Weight Averaging, https://arxiv.org/abs/2507.20016
- Shuaipeng Zhang, Lanju Kong, Yixin Zhang, Wei He, Yongqing Zheng, Han Yu, Lizhen Cui, 28 Jul 2025, DAG-AFL: Directed Acyclic Graph-based Asynchronous Federated Learning, https://arxiv.org/abs/2507.20571
- Wenxuan Bao, Ruxi Deng, Ruizhong Qiu, Tianxin Wei, Hanghang Tong, Jingrui He, 29 Jul 2025, Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning, https://arxiv.org/abs/2507.21494
- Sven Lankester, Manel Slokom, Gustavo de Carvalho Bertoli, Matias Vizcaino, Emmanuelle Beauxis Aussalet, Laura Hollink, 15 Jul 2025, FedFlex: Federated Learning for Diverse Netflix Recommendations, https://arxiv.org/abs/2507.21115
- Xinhai Yan, Libing Wu, Zhuangzhuang Zhang, Bingyi Liu, Lijuan Huo, Jing Wang, 26 Jul 2025, FedBAP: Backdoor Defense via Benign Adversarial Perturbation in Federated Learning, https://arxiv.org/abs/2507.21177
- Abdelrhman Gaber, Hassan Abd-Eltawab, John Elgallab, Youssif Abuzied, Dineo Mpanya, Turgay Celik, Swarun Kumar, Tamer ElBatt, 30 Jul 2025, FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model Optimization, https://arxiv.org/abs/2507.22963
- David J Goetze, Dahlia J Felten, Jeannie R Albrecht, Rohit Bhattacharya, 30 Jul 2025, FLOSS: Federated Learning with Opt-Out and Straggler Support, https://arxiv.org/abs/2507.23115
- Mohammad Karami, Fatemeh Ghassemi, Hamed Kebriaei, Hamid Azadegan, 31 Jul 2025, OptiGradTrust: Byzantine-Robust Federated Learning with Multi-Feature Gradient Analysis and Reinforcement Learning-Based Trust Weighting, https://arxiv.org/abs/2507.23638
- Taeheon Lim, Joohyung Lee, Kyungjae Lee, Jungchan Cho, 31 Jul 2025, Mitigating Resolution-Drift in Federated Learning: Case of Keypoint Detection, https://arxiv.org/abs/2507.23461
- Chen Zhang, Husheng Li, Xiang Liu, Linshan Jiang, Danxin Wang, 30 Jul 2025, Hypernetworks for Model-Heterogeneous Personalized Federated Learning, https://arxiv.org/abs/2507.22330
- Wei Guo, Yiyang Duan, Zhaojun Hu, Yiqi Tong, Fuzhen Zhuang, Xiao Zhang, Jin Dong, Ruofan Wu, Tengfei Liu, Yifan Sun, 30 Jul 2025, Proto-EVFL: Enhanced Vertical Federated Learning via Dual Prototype with Extremely Unaligned Data, https://arxiv.org/abs/2507.22488
- Zhuocheng Liu, Zhishu Shen, Qiushi Zheng, Tiehua Zhang, Zheng Lei, Jiong Jin, 30 Jul 2025, A Semi-Supervised Federated Learning Framework with Hierarchical Clustering Aggregation for Heterogeneous Satellite Networks, https://arxiv.org/abs/2507.22339
- Hongye Wang, Zhaoye Pan, Chang He, Jiaxiang Li, Bo Jiang, 30 Jul 2025, Federated Learning on Riemannian Manifolds: A Gradient-Free Projection-Based Approach, https://arxiv.org/abs/2507.22855
- Bokun Wang and Axel Berg and Durmus Alp Emre Acar and Chuteng Zhou, 30 Jul 2025, Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point, https://arxiv.org/abs/2407.02610
- Minyeong Choe, Cheolhee Park, Changho Seo, and Hyunil Kim, 30 Jul 2025, SDBA: A Stealthy and Long-Lasting Durable Backdoor Attack in Federated Learning, https://arxiv.org/abs/2409.14805
- Hanchi Ren and Jingjing Deng and Xianghua Xie, 1 Aug 2025, Gradient Leakage Defense with Key-Lock Module for Federated Learning, https://arxiv.org/abs/2305.04095
- Honoka Anada, Tatsuya Kaneko, Shinya Takamaeda-Yamazaki, 1 Aug 2025, How to Evaluate Participant Contributions in Decentralized Federated Learning, https://arxiv.org/abs/2505.23246
- Hangyu Li and Hongyue Wu and Guodong Fan and Zhen Zhang and Shizhan Chen and Zhiyong Feng, 1 Aug 2025, Efficient Federated Learning with Encrypted Data Sharing for Data-Heterogeneous Edge Devices, https://arxiv.org/abs/2506.20644
- Jinnan Guo, Kapil Vaswani, Andrew Paverd, Peter Pietzuch, 1 Aug 2025, ExclaveFL: Providing Transparency to Federated Learning using Exclaves, https://arxiv.org/abs/2412.10537
- Xin Chen, Shuaijun Chen, Omid Tavallaie, Nguyen Tran, Shuhuang Xiang, Albert Zomaya, 2 Aug 2025, Convergence Analysis of Aggregation-Broadcast in LoRA-enabled Federated Learning, https://arxiv.org/abs/2508.01348
- Heting Liu, Junzhe Huang, Fang He, Guohong Cao, 3 Aug 2025, Dynamic Clustering for Personalized Federated Learning on Heterogeneous Edge Devices, https://arxiv.org/abs/2508.01580
- Ziru Niu, Hai Dong, A.K. Qin, 3 Aug 2025, Boosting Generalization Performance in Model-Heterogeneous Federated Learning Using Variational Transposed Convolution, https://arxiv.org/abs/2508.01669
- Ali Forootani, Raffaele Iervolino, 3 Aug 2025, Asynchronous Federated Learning with non-convex client objective functions and heterogeneous dataset, https://arxiv.org/abs/2508.01675
- Xiangwang Hou, Jingjing Wang, Fangming Guan, Jun Du, Chunxiao Jiang, Yong Ren, 3 Aug 2025, Energy-Efficient Federated Learning for Edge Real-Time Vision via Joint Data, Computation, and Communication Design, https://arxiv.org/abs/2508.01745
- Ignacy Stępka, Nicholas Gisolfi, Kacper Trębacz, Artur Dubrawski, 3 Aug 2025, Mitigating Persistent Client Dropout in Asynchronous Decentralized Federated Learning, https://arxiv.org/abs/2508.01807
- Qi Xiong, Hai Dong, Nasrin Sohrabi, Zahir Tari, 4 Aug 2025, FedLAD: A Linear Algebra Based Data Poisoning Defence for Federated Learning, https://arxiv.org/abs/2508.02136
- Mirko Konstantin, Moritz Fuchs and Anirban Mukhopadhyay, 4 Aug 2025, ASMR: Angular Support for Malfunctioning Client Resilience in Federated Learning, https://arxiv.org/abs/2508.02414
- Shunxian Gu, Chaoqun You, Bangbang Ren, Deke Guo, 4 Aug 2025, Communication and Computation Efficient Split Federated Learning in O-RAN, https://arxiv.org/abs/2508.02534
- Junjie Shan, Ziqi Zhao, Jialin Lu, Rui Zhang, Siu Ming Yiu and Ka-Ho Chow, 2 Aug 2025, Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning, https://arxiv.org/abs/2411.14937
- Sota Mashiko, Yuji Kawamata, Tomoru Nakayama, Tetsuya Sakurai, Yukihiko Okada, 1 Aug 2025, Anomaly Detection in Double-entry Bookkeeping Data by Federated Learning System with Non-model Sharing Approach, https://arxiv.org/abs/2501.12723
- Keke Gai, Mohan Wang, Jing Yu, Dongjue Wang, Qi Wu, 3 Aug 2025, Adaptive Prototype Knowledge Transfer for Federated Learning with Mixed Modalities and Heterogeneous Tasks, https://arxiv.org/abs/2502.04400
- Jiahui Bai, Hai Dong, A. K. Qin, 5 Aug 2025, On the Fast Adaptation of Delayed Clients in Decentralized Federated Learning: A Centroid-Aligned Distillation Approach, https://arxiv.org/abs/2508.02993
- Weiyao Zhang, Jinyang Li, Qi Song, Miao Wang, Chungang Lin, Haitong Luo, Xuying Meng, Yujun Zhang, 5 Aug 2025, Heterogeneity-Oblivious Robust Federated Learning, https://arxiv.org/abs/2508.03579
- Hao Di, Yi Yang, Haishan Ye, Xiangyu Chang, 5 Aug 2025, PPFL: A Personalized Federated Learning Framework for Heterogeneous Population, https://arxiv.org/abs/2310.14337
- Hyungbin Kim, Incheol Baek, Yon Dohn Chung, 6 Aug 2025, Decoupled Contrastive Learning for Federated Learning, https://arxiv.org/abs/2508.04005
- Tuan Nguyen, Khoa D Doan, and Kok-Seng Wong, 6 Aug 2025, FLAT: Latent-Driven Arbitrary-Target Backdoor Attacks in Federated Learning, https://arxiv.org/abs/2508.04064
- Jianheng Tang, Zhirui Yang, Jingchao Wang, Kejia Fan, Jinfeng Xu, Huiping Zhuang, Anfeng Liu, Houbing Herbert Song, Leye Wang, Yunhuai Liu, 6 Aug 2025, FedHiP: Heterogeneity-Invariant Personalized Federated Learning Through Closed-Form Solutions, https://arxiv.org/abs/2508.04470
- Borui Li, Li Yan, Junhao Han, Jianmin Liu, Lei Yu, 6 Aug 2025, SenseCrypt: Sensitivity-guided Selective Homomorphic Encryption for Joint Federated Learning in Cross-Device Scenarios, https://arxiv.org/abs/2508.04100
- Borui Li, Li Yan, Jianmin Liu, 6 Aug 2025, SelectiveShield: Lightweight Hybrid Defense Against Gradient Leakage in Federated Learning, https://arxiv.org/abs/2508.04265
- Jiahao Xu, Rui Hu, Olivera Kotevska, Zikai Zhang, 5 Aug 2025, Traceable Black-box Watermarks for Federated Learning, https://arxiv.org/abs/2505.13651
- Thinh Nguyen, Le Huy Khiem, Van-Tuan Tran, Khoa D Doan, Nitesh V Chawla, Kok-Seng Wong, 7 Aug 2025, pFedDSH: Enabling Knowledge Transfer in Personalized Federated Learning through Data-free Sub-Hypernetwork, https://arxiv.org/abs/2508.05157
- Mirko Konstantin and Anirban Mukhopadhyay, 7 Aug 2025, Don't Reach for the Stars: Rethinking Topology for Resilient Federated Learning, https://arxiv.org/abs/2508.05224
- Qinghua Yao, Xiangrui Xu, Zhize Li, 7 Aug 2025, X-VFL: A New Vertical Federated Learning Framework with Cross Completion and Decision Subspace Alignment, https://arxiv.org/abs/2508.05568
- Sachin Dudda Nagaraju, Ashkan Moradi, Bendik Skarre Abrahamsen, and Mattijs Elschot, 7 Aug 2025, FedGIN: Federated Learning with Dynamic Global Intensity Non-linear Augmentation for Organ Segmentation using Multi-modal Images, https://arxiv.org/abs/2508.05137
- Ce Na, Kai Yang, Dengzhao Fang, Yu Li, Jingtong Gao, Chengcheng Zhu, Jiale Zhang, Xiaobing Sun, Yi Chang, 8 Aug 2025, Graph Federated Learning for Personalized Privacy Recommendation, https://arxiv.org/abs/2508.06208
- Yuze Liu, Tiehua Zhang, Zhishu Shen, Libing Wu, Shiping Chen and Jiong Jin, 1 Aug 2025, Towards Heterogeneity-Aware and Energy-Efficient Topology Optimization for Decentralized Federated Learning in Edge Environment, https://arxiv.org/abs/2508.08278
- Dung T. Tran, Nguyen B. Ha, Van-Dinh Nguyen, Kok-Seng Wong, 11 Aug 2025, SHeRL-FL: When Representation Learning Meets Split Learning in Hierarchical Federated Learning, https://arxiv.org/abs/2508.08339
- Keumseo Ryum, Jinu Gong, and Joonhyuk Kang, 12 Aug 2025, SHEFL: Resource-Aware Aggregation and Sparsification in Heterogeneous Ensemble Federated Learning, https://arxiv.org/abs/2508.08552
- Wenyou Guo, Ting Qu, Chunrong Pan, George Q. Huang, 12 Aug 2025, Distributed optimization: designed for federated learning, https://arxiv.org/abs/2508.08606
- Yuvraj Dutta, Soumyajit Chatterjee, Sandip Chakraborty, Basabdatta Palit, 11 Aug 2025, Benchmarking Federated Learning for Throughput Prediction in 5G Live Streaming Applications, https://arxiv.org/abs/2508.08479
- Davide Domini, Gianluca Aguzzi, Lukas Esterle and Mirko Viroli, 12 Aug 2025, FBFL: A Field-Based Coordination Approach for Data Heterogeneity in Federated Learning, https://arxiv.org/abs/2502.08577
- Ratun Rahman, 12 Aug 2025, Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence, https://arxiv.org/abs/2504.17703
- Zhekai Zhou, Shudong Liu, Zhaokun Zhou, Yang Liu, Qiang Yang, Yuesheng Zhu, Guibo Luo, 7 Aug 2025, FedMP: Tackling Medical Feature Heterogeneity in Federated Learning from a Manifold Perspective, https://arxiv.org/abs/2508.09174
- Jinghong Tan, Zhian Liu, Kun Guo, Mingxiong Zhao, 7 Aug 2025, Long-Term Client Selection for Federated Learning with Non-IID Data: A Truthful Auction Approach, https://arxiv.org/abs/2508.09181
- Zikai Zhang, Suman Rath, Jiahao Xu, Tingsong Xiao, 13 Aug 2025, Federated Learning for Smart Grid: A Survey on Applications and Potential Vulnerabilities, https://arxiv.org/abs/2409.10764
- Heqiang Wang, Weihong Yang, Xiaoxiong Zhong, Jia Zhou, Fangming Liu, Weizhe Zhang, 15 Aug 2025, Mitigating Modality Quantity and Quality Imbalance in Multimodal Online Federated Learning, https://arxiv.org/abs/2508.11159
- Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Christopher G. Brinton, Tatiana Likhomanenko, 14 Aug 2025, Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping, https://arxiv.org/abs/2310.00098
- You Hak Lee, Xiaofan Yu, Quanling Zhao, Flavio Ponzina, Tajana Rosing, 16 Aug 2025, FedUHD: Unsupervised Federated Learning using Hyperdimensional Computing, https://arxiv.org/abs/2508.12021
- Zahra Kharaghani, Ali Dadras, Tommy L\"ofstedt, 16 Aug 2025, Fairness Regularization in Federated Learning, https://arxiv.org/abs/2508.12042
- Emmanouil Kritharakis, Dusan Jakovetic, Antonios Makris, Konstantinos Tserpes, 18 Aug 2025, Robust Federated Learning under Adversarial Attacks via Loss-Based Client Clustering, https://arxiv.org/abs/2508.12672
- Yuhao Zhou, Jindi Lv, Yuxin Tian, Dan Si, Qing Ye, Jiancheng Lv, 18 Aug 2025, Deploying Models to Non-participating Clients in Federated Learning without Fine-tuning: A Hypernetwork-based Approach, https://arxiv.org/abs/2508.12673
- Beomseok Seo, Kichang Lee, JaeYeon Park, 18 Aug 2025, FedUNet: A Lightweight Additive U-Net Module for Federated Learning with Heterogeneous Models, https://arxiv.org/abs/2508.12740
- Yue Xia, Tayyebeh Jahani-Nezhad and Rawad Bitar, 18 Aug 2025, Fed-DPRoC:Communication-Efficient Differentially Private and Robust Federated Learning, https://arxiv.org/abs/2508.12978
- Xiaojin Zhang, Mingcong Xu, Yiming Li, Wei Chen, Qiang Yang, 16 Aug 2025, Deciphering the Interplay between Attack and Protection Complexity in Privacy-Preserving Federated Learning, https://arxiv.org/abs/2508.11907
- Ratun Rahman, Atit Pokharel, Md Raihan Uddin, and Dinh C. Nguyen, 17 Aug 2025, SimQFL: A Quantum Federated Learning Simulator with Real-Time Visualization, https://arxiv.org/abs/2508.12477
- Jihyun Lim, Junhyuk Jo, Tuo Zhang, Sunwoo Lee, 17 Aug 2025, Enabling Weak Client Participation via On-device Knowledge Distillation in Heterogenous Federated Learning, https://arxiv.org/abs/2503.11151
- Shiwei Li, Xiandi Luo, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li, 17 Aug 2025, The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning, https://arxiv.org/abs/2505.23176
- SeungBum Ha, Taehwan Lee, Jiyoun Lim, Sung Whan Yoon, 17 Aug 2025, Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation, https://arxiv.org/abs/2412.10436
- Wenxuan Ye, Xueli An, Onur Ayan, Junfan Wang, Xueqiang Yan, Georg Carle, 19 Aug 2025, Towards a Larger Model via One-Shot Federated Learning on Heterogeneous Client Models, https://arxiv.org/abs/2508.13625
- Wenfei Liang, Yanan Zhao, Rui She, Yiming Li and Wee Peng Tay, 19 Aug 2025, Personalized Subgraph Federated Learning with Sheaf Collaboration, https://arxiv.org/abs/2508.13642
- Jie Shi, Arno P. J. M. Siebes, Siamak Mehrkanoon, 19 Aug 2025, Trans-XFed: An Explainable Federated Learning for Supply Chain Credit Assessment, https://arxiv.org/abs/2508.13715
- Sergey Skorik, Vladislav Dorofeev, Gleb Molodtsov, Aram Avetisyan, Dmitry Bylinkin, Daniil Medyakov, Aleksandr Beznosikov, 19 Aug 2025, Communication-Efficient Federated Learning with Adaptive Number of Participants, https://arxiv.org/abs/2508.13803
- Daniel M. Jimenez-Gutierrez, Yelizaveta Falkouskaya, Jose L. Hernandez-Ramos, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti, 19 Aug 2025, On the Security and Privacy of Federated Learning: A Survey with Attacks, Defenses, Frameworks, Applications, and Future Directions, https://arxiv.org/abs/2508.13730
- Charlie Hou, Mei-Yu Wang, Yige Zhu, Daniel Lazar, Giulia Fanti, 19 Aug 2025, POPri: Private Federated Learning using Preference-Optimized Synthetic Data, https://arxiv.org/abs/2504.16438
- Nazatul Haque Sultan, Yan Bo, Yansong Gao, Seyit Camtepe, Arash Mahboubi, Hang Thanh Bui, Aufeef Chauhan, Hamed Aboutorab, Michael Bewong, Dineshkumar Singh, Praveen Gauravaram, Rafiqul Islam, and Sharif Abuadbba, 19 Aug 2025, Setup Once, Secure Always: A Single-Setup Secure Federated Learning Aggregation Protocol with Forward and Backward Secrecy for Dynamic Users, https://arxiv.org/abs/2502.08989
- Tao Shen, Zexi Li, Didi Zhu, Ziyu Zhao, Chao Wu, Fei Wu, 20 Aug 2025, FedEve: On Bridging the Client Drift and Period Drift for Cross-device Federated Learning, https://arxiv.org/abs/2508.14539
- Yichen Li, Xiuying Wang, Wenchao Xu, Haozhao Wang, Yining Qi, Jiahua Dong, Ruixuan Li, 20 Aug 2025, Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning, https://arxiv.org/abs/2507.10348
- Miha Ožbot, Igor Škrjanc, 21 Aug 2025, Federated Learning based on Self-Evolving Gaussian Clustering, https://arxiv.org/abs/2508.15393
- Bingguang Lu, Hongsheng Hu, Yuantian Miao, Shaleeza Sohail, Chaoxiang He, Shuo Wang, Xiao Chen, 21 Aug 2025, BadFU: Backdoor Federated Learning through Adversarial Machine Unlearning, https://arxiv.org/abs/2508.15541
- Lishan Yang, Wei Emma Zhang, Quan Z. Sheng, Lina Yao, Weitong Chen and Ali Shakeri, 21 Aug 2025, MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning, https://arxiv.org/abs/2505.06911
- Dinh C. Nguyen, Md Raihan Uddin, Shaba Shaon, Ratun Rahman, Octavia Dobre, and Dusit Niyato, 21 Aug 2025, Quantum Federated Learning: A Comprehensive Survey, https://arxiv.org/abs/2508.15998
- Renxuan Tan, Rongpeng Li, Xiaoxue Yu, Xianfu Chen, Xing Xu, and Zhifeng Zhao, 22 Aug 2025, Pareto Actor-Critic for Communication and Computation Co-Optimization in Non-Cooperative Federated Learning Services, https://arxiv.org/abs/2508.16037
- Guangyu Sun, Jingtao Li, Weiming Zhuang, Chen Chen, Chen Chen, Lingjuan Lyu, 22 Aug 2025, Closer to Reality: Practical Semi-Supervised Federated Learning for Foundation Model Adaptation, https://arxiv.org/abs/2508.16568
- Bibo Wu, Fang Fang, Ming Zeng and Xianbin Wang, 17 Aug 2025, Straggler-Resilient Federated Learning over A Hybrid Conventional and Pinching Antenna Network, https://arxiv.org/abs/2508.15821
- Zhenan Fan, Huang Fang, Xinglu Wang, Zirui Zhou, Jian Pei, Michael P. Friedlander, Yong Zhang, 21 Aug 2025, Fair and efficient contribution valuation for vertical federated learning, https://arxiv.org/abs/2201.02658
- Xinyu Zhou, Jun Zhao, Huimei Han, Claude Guet, 22 Aug 2025, Joint Optimization of Energy Consumption and Completion Time in Federated Learning, https://arxiv.org/abs/2209.14900
- Seunghun Yu, Jin-Hyun Ahn, Joonhyuk Kang, 22 Aug 2025, FedEFC: Federated Learning Using Enhanced Forward Correction Against Noisy Labels, https://arxiv.org/abs/2504.05615
- Tao Liu, Xuehe Wang, 23 Aug 2025, Degree of Staleness-Aware Data Updating in Federated Learning, https://arxiv.org/abs/2508.16931
- Jiaqi Zhu, Bikramjit Das, Yong Xie, Nikolaos Pappas, and Howard H. Yang, 25 Aug 2025, Rethinking Federated Learning Over the Air: The Blessing of Scaling Up, https://arxiv.org/abs/2508.17697
- Ming Yang, Dongrun Li, Xin Wang, Xiaoyang Yu, Xiaoming Wu, Shibo He, 25 Aug 2025, Choice Outweighs Effort: Facilitating Complementary Knowledge Fusion in Federated Learning via Re-calibration and Merit-discrimination, https://arxiv.org/abs/2508.17954
- Emmanouil Kritharakis and Antonios Makris and Dusan Jakovetic and Konstantinos Tserpes, 25 Aug 2025, FedGreed: A Byzantine-Robust Loss-Based Aggregation Method for Federated Learning, https://arxiv.org/abs/2508.18060
- Po-Hsien Yu, Yu-Syuan Tseng, and Shao-Yi Chien, 24 Aug 2025, FedKLPR: Personalized Federated Learning for Person Re-Identification with Adaptive Pruning, https://arxiv.org/abs/2508.17431
- Bishwamittra Ghosh, Debabrota Basu, Fu Huazhu, Wang Yuan, Renuga Kanagavelu, Jiang Jin Peng, Liu Yong, Goh Siow Mong Rick, and Wei Qingsong, 23 Aug 2025, History-Aware and Dynamic Client Contribution in Federated Learning, https://arxiv.org/abs/2403.07151
- Ruofan Jia, Weiying Xie, Jie Lei, Jitao Ma, Haonan Qin, Leyuan Fang, 25 Aug 2025, HeteroTune: Efficient Federated Learning for Large Heterogeneous Models, https://arxiv.org/abs/2411.16796
- Chao Feng, Yuanzhe Gao, Alberto Huertas Celdran, Gerome Bovet, Burkhard Stiller, 25 Aug 2025, From Models to Network Topologies: A Topology Inference Attack in Decentralized Federated Learning, https://arxiv.org/abs/2501.03119
- Harish Karthikeyan and Antigoni Polychroniadou, 24 Aug 2025, OPA: One-shot Private Aggregation with Single Client Interaction and its Applications to Federated Learning, https://arxiv.org/abs/2410.22303
- Zhengyu Wu, Xunkai Li, Yinlin Zhu, Zekai Chen, Guochen Yan, Yanyu Yan, Hao Zhang, Yuming Ai, Xinmo Jin, Rong-Hua Li, and Guoren Wang, 22 Jul 2025, A Comprehensive Data-centric Overview of Federated Graph Learning, https://arxiv.org/abs/2507.16541
- Minh Ngoc Luu, Minh-Duong Nguyen, Ebrahim Bedeer, Van Duc Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Quoc-Viet Pham, 22 Jul 2025, Energy-Efficient and Real-Time Sensing for Federated Continual Learning via Sample-Driven Control, https://arxiv.org/abs/2310.07497
- Zhongzheng Yuan, Lianshuai Guo, Xunkai Li, Yinlin Zhu, Wenyu Wang, Meixia Qu, 24 Jul 2025, FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting, https://arxiv.org/abs/2507.18219
- Xu Zhang, Zhenyuan Yuan, Minghui Zhu, 18 Jul 2025, Byzantine-resilient federated online learning for Gaussian process regression, https://arxiv.org/abs/2507.14021
- Ukjo Hwang, Songnam Hong, 19 Jul 2025, Federated Reinforcement Learning in Heterogeneous Environments, https://arxiv.org/abs/2507.14487
- Yujia Mu, Cong Shen, 21 Jul 2025, Federated Split Learning with Improved Communication and Storage Efficiency, https://arxiv.org/abs/2507.15816
- Zihao Hu (1), Jia Yan (2), Ying-Jun Angela Zhang (1) ((1) The Chinese University of Hong Kong, (2) The Hong Kong University of Science and Technology (Guangzhou)), 6 Aug 2025, Communication-Learning Co-Design for Differentially Private Over-the-Air Federated Distillation, https://arxiv.org/abs/2508.06557
- Jingmao Li, Yuanxing Chen, Shuangge Ma, Kuangnan Fang, 8 Aug 2025, Federated Online Learning for Heterogeneous Multisource Streaming Data, https://arxiv.org/abs/2508.06652
- Abhishek Sawaika, Swetang Krishna, Tushar Tomar, Durga Pritam Suggisetti, Aditi Lal, Tanmaya Shrivastav, Nouhaila Innan, Muhammad Shafique, 15 Jul 2025, A Privacy-Preserving Federated Framework with Hybrid Quantum-Enhanced Learning for Financial Fraud Detection, https://arxiv.org/abs/2507.22908
- Danni Peng, Yuan Wang, Kangning Cai, Peiyan Ning, Jiming Xu, Yong Liu, Rick Siow Mong Goh, Qingsong Wei, Huazhu Fu, 14 Aug 2025, Improving Learning of New Diseases through Knowledge-Enhanced Initialization for Federated Adapter Tuning, https://arxiv.org/abs/2508.10299
- Xinrui Li, Qilin Fan, Tianfu Wang, Kaiwen Wei, Ke Yu, Xu Zhang, 14 Aug 2025, GraphFedMIG: Tackling Class Imbalance in Federated Graph Learning via Mutual Information-Guided Generation, https://arxiv.org/abs/2508.10471
- Zekai Chen, Xunkai Li, Yinlin Zhu, Rong-Hua Li, Guoren Wang, 14 Aug 2025, Rethinking Client-oriented Federated Graph Learning, https://arxiv.org/abs/2504.14188
- Chengzhuo Han, 28 Jul 2025, Enhancing QoS in Edge Computing through Federated Layering Techniques: A Pathway to Resilient AI Lifelong Learning Systems, https://arxiv.org/abs/2507.20444
- Yebo Wu, Jingguang Li, Zhijiang Guo and Li Li, 31 Jul 2025, Learning Like Humans: Resource-Efficient Federated Fine-Tuning through Cognitive Developmental Stages, https://arxiv.org/abs/2508.00041
- Hung-Chieh Fang, Hsuan-Tien Lin, Irwin King, Yifei Zhang, 2 Aug 2025, Soft Separation and Distillation: Toward Global Uniformity in Federated Unsupervised Learning, https://arxiv.org/abs/2508.01251
- Cui Miao, Tao Chang, Meihan Wu, Hongbin Xu, Chun Li, Ming Li, Xiaodong Wang, 4 Aug 2025, FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation, https://arxiv.org/abs/2508.02190
- Shuo Wang and Keke Gai and Jing Yu and Liehuang Zhu and Qi Wu, 5 Aug 2025, Vertical Federated Continual Learning via Evolving Prototype Knowledge, https://arxiv.org/abs/2502.09152
- Zihan Tan, Suyuan Huang, Guancheng Wan, Wenke Huang, He Li and Mang Ye, 5 Aug 2025, S2FGL: Spatial Spectral Federated Graph Learning, https://arxiv.org/abs/2507.02409
- Shengchao Chen, Guodong Long, Jing Jiang, 6 Aug 2025, FeDaL: Federated Dataset Learning for Time Series Foundation Models, https://arxiv.org/abs/2508.04045
- Jiansheng Rao, Jiayi Li, Zhizhi Gong, Soummya Kar, Haoxuan Li, 7 Aug 2025, Federated Multi-Objective Learning with Controlled Pareto Frontiers, https://arxiv.org/abs/2508.05424
- Junhyeog Yun, Minui Hong, Gunhee Kim, 8 Aug 2025, FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields, https://arxiv.org/abs/2508.06301
- Fuyao Zhang, Xinyu Yan, Tiantong Wu, Wenjie Li, Tianxiang Chen, Yang Cao, Ran Yan, Longtao Huang, Wei Yang Bryan Lim, Qiang Yang, 12 Aug 2025, Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models, https://arxiv.org/abs/2508.08875
- Hao Yu, Xin Yang, Boyang Fan, Xuemei Cao, Hanlin Gu, Lixin Fan, Qiang Yang, 13 Aug 2025, Large-Small Model Collaborative Framework for Federated Continual Learning, https://arxiv.org/abs/2508.09489
- Lianshuai Guo, Zhongzheng Yuan, Xunkai Li, Yinlin Zhu, Meixia Qu, Wenyu Wang, 15 Aug 2025, DFed-SST: Building Semantic- and Structure-aware Topologies for Decentralized Federated Graph Learning, https://arxiv.org/abs/2508.11530
- Marcel Gregoriadis, Jingwei Kang, Johan Pouwelse, 17 Aug 2025, A Large-Scale Web Search Dataset for Federated Online Learning to Rank, https://arxiv.org/abs/2508.12353
- Dingzhu Wen, Sijing Xie, Xiaowen Cao, Yuanhao Cui, Jie Xu, Yuanming Shi, and Shuguang Cui, 21 Aug 2025, Integrated Sensing, Communication, and Computation for Over-the-Air Federated Edge Learning, https://arxiv.org/abs/2508.15185
- Hamta Sedghani, Abednego Wamuhindo Kambale, Federica Filippini, Francesca Palermo, Diana Trojaniello, Danilo Ardagna, 24 Aug 2025, Federated Reinforcement Learning for Runtime Optimization of AI Applications in Smart Eyewears, https://arxiv.org/abs/2508.17262
- Omar Bekdache and Naresh Shanbhag, 24 Aug 2025, FedERL: Federated Efficient and Robust Learning for Common Corruptions, https://arxiv.org/abs/2508.17381
Mixed-Precision Training
Mixed-precision training performs most of the underlying computation in lower-precision formats (e.g., FP16, BF16, or FP8) for weights, activations, and gradients, while keeping a master copy of the weights in FP32 and using loss scaling to avoid gradient underflow.
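One common realization is PyTorch's automatic mixed precision (AMP). Below is a minimal training-loop sketch, assuming that model, optimizer, loss_fn, and loader already exist; the GradScaler applies dynamic loss scaling so that small FP16 gradients do not underflow to zero.

```python
import torch

# Minimal AMP sketch (assumes model, optimizer, loss_fn, loader exist).
scaler = torch.cuda.amp.GradScaler()   # dynamic loss scaling for FP16

for inputs, targets in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward pass in FP16/BF16 where safe
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()      # backprop on the scaled loss
    scaler.step(optimizer)             # unscale gradients, FP32 weight update
    scaler.update()                    # adapt the loss scale for the next step
```

Research papers on mixed-precision training: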
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey (Broad survey with many optimizations including this topic.)
- Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, Liang Zhao, 4 Jan 2024, Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models https://arxiv.org/abs/2401.00625 (A general survey paper with coverage of many techniques including this one.)
- Jiahang Zhou, Yanyu Chen, Zicong Hong, Wuhui Chen, Yue Yu, Tao Zhang, Hui Wang, Chuanfu Zhang, Zibin Zheng, 5 Jan 2024, Training and Serving System of Foundation Models: A Comprehensive Survey, https://arxiv.org/abs/2401.02643
- Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, Feb 2018, Mixed Precision Training, https://arxiv.org/abs/1710.03740
- 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, https://arxiv.org/abs/2312.00678
- Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun, 29 Jul 2024, Efficient Training of Large Language Models on Distributed Infrastructures: A Survey, https://arxiv.org/abs/2407.20018
- Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge, 6 Jan 2024 (v2), Understanding LLMs: A Comprehensive Overview from Training to Inference, https://arxiv.org/abs/2401.02038
- Douglas C. Youvan, September 27, 2024, Building and Running Large-Scale Language Models: The Infrastructure and Techniques Behind GPT-4 , https://www.researchgate.net/profile/Douglas-Youvan/publication/384398902_Building_and_Running_Large-Scale_Language_Models_The_Infrastructure_and_Techniques_Behind_GPT-4/links/66f6f4d3906bca2ac3d20e68/Building-and-Running-Large-Scale-Language-Models-The-Infrastructure-and-Techniques-Behind-GPT-4.pdf
- Kaiyuan Tian, Linbo Qiao, Baihui Liu, Gongqingjian Jiang, Dongsheng Li, 21 Jan 2025, A Survey on Memory-Efficient Large-Scale Model Training in AI for Science, https://arxiv.org/abs/2501.11847
- Minhajul Hoque, Jan 4, 2025, DeepSeek V3: How They Achieved Big Results with Small Compute, https://ai.plainenglish.io/deepseek-v3-how-they-achieved-big-results-with-small-compute-fb694606d59a (DeepSeek optimizations included FP8 quantization with outlier handling, attention and KV cache optimization via Multi-Head Latent Attention (MHLA), and multi-token decoding.)
- Nandini Lokesh Reddy, Jan 2025, DeepSeek: Bridging Performance and Efficiency in Modern AI, https://medium.com/@nandinilreddy/deepseek-bridging-performance-and-efficiency-in-modern-ai-106181a85693
- Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro Werra, Thomas Wolf, Feb 19, 2025, The Ultra-Scale Playbook: Training LLMs on GPU Clusters, Hugging Face, https://huggingface.co/spaces/nanotron/ultrascale-playbook https://huggingface.co/spaces/nanotron/ultrascale-playbook/resolve/main/The_Ultra-Scale_Playbook_Training_LLMs_on_GPU_Clusters.pdf
Model Merging
Model merging is a technique whereby two separate LLMs are combined to create a new model with the joint expertise of the two individual models. Surprisingly, the two sets of weights can often simply be combined element-wise, such as by addition or averaging.
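As a minimal sketch, assuming the two models share the same architecture and parameter names, merging can be as simple as element-wise interpolation of the two weight dictionaries:

```python
import torch

def merge_models(state_a, state_b, alpha=0.5):
    """Element-wise linear interpolation of two compatible state dicts."""
    return {k: alpha * state_a[k] + (1.0 - alpha) * state_b[k]
            for k in state_a}

# Usage sketch: merged = merge_models(model_a.state_dict(),
#                                     model_b.state_dict())
# followed by model.load_state_dict(merged).
```

Practical merging methods surveyed below (e.g., SLERP-based or task-vector approaches) refine this simple weighted average.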
Research papers on model merging:
- Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao, 15 Aug 2024 (v2), Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities, https://arxiv.org/abs/2408.07666 Project: https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications (An extensive review of merging two models.)
- Cameron R. Wolfe, Sep 16, 2024, Model Merging: A Survey: From modern LLM applications to the early days of machine learning research, https://cameronrwolfe.substack.com/p/model-merging
- Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, Bing Liu, 2 Oct 2024, Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models, https://arxiv.org/abs/2410.01335
- Yuxuan Zhang, Ruizhe Li, 2 Oct 2024, DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models, https://arxiv.org/abs/2410.01497 https://github.com/MeCuping/DLP-LoRA (Merging multiple LoRA adapters for parallel inference.)
- Sean Michael Kerner, October 23, 2024, Differentiable Adaptive Merging is accelerating SLMs for enterprises, https://venturebeat.com/ai/differentiable-adaptive-merging-is-accelerating-slms-for-enterprises/
- Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang, 18 Dec 2024, Channel Merging: Preserving Specialization for Merged Experts, https://arxiv.org/abs/2412.15283
- Sakana.ai, March 21, 2024, Evolving New Foundation Models: Unleashing the Power of Automating Model Development, https://sakana.ai/evolutionary-model-merge/
- Sakana.ai, December 03, 2024, Population-based Model Merging via Quality Diversity, https://sakana.ai/cycleqd/
- Ayoub Ben Chaliah, Hela Dellagi, 31 Dec 2024, Superposition in Transformers: A Novel Way of Building Mixture of Experts, https://arxiv.org/abs/2501.00530 (Effectively model merging to combine a base model and its fine-tuned version, to avoid catastrophic forgetting.)
- Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
- Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, Mike Lewis, 31 Jan 2025, BTS: Harmonizing Specialized Experts into a Generalist LLM, https://arxiv.org/abs/2502.00075 (Combining multiple fine-tuned expert models via "layer stitching").
- Yuhang Zhou, Giannis Karamanolakis, Victor Soto, Anna Rumshisky, Mayank Kulkarni, Furong Huang, Wei Ai, Jianhua Lu, 4 Feb 2025 (v2), MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs, https://arxiv.org/abs/2502.00997
- Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu, 11 Feb 2025 (v2), Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing, https://arxiv.org/abs/2502.04411
- Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, Jianfeng Gao, 8 Mar 2025, A Survey on Post-training of Large Language Models, https://arxiv.org/abs/2503.06072
- Hangyu Zhou, Aaron Gokaslan, Volodymyr Kuleshov, Bharath Hariharan, 16 May 2025, RanDeS: Randomized Delta Superposition for Multi-Model Compression, https://arxiv.org/abs/2505.11204
- Ryota Miyano, Yuki Arase, 30 May 2025, Adaptive LoRA Merge with Parameter Pruning for Low-Resource Generation, https://arxiv.org/abs/2505.24174
- Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li, Vikram Appia, Emad Barsoum, 22 May 2025, Zebra-Llama: Towards Extremely Efficient Hybrid Models, https://arxiv.org/abs/2505.17272 (Merging SSM and LLM.)
- Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates, 2 Jul 2025, Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs, https://arxiv.org/abs/2507.02076
- Jiang, H., Wang, R., Liang, W. et al. Slerp-Opt: merging large language models via adaptive strategies. J Supercomput 81, 1223 (2025). https://doi.org/10.1007/s11227-025-07727-4 https://link.springer.com/article/10.1007/s11227-025-07727-4
- Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao, 14 Aug 2025, LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint, https://arxiv.org/abs/2502.16770
- Ryo Bertolissi, Jonas Hübotter, Ido Hakimi, Andreas Krause, 30 Jul 2025, Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging, https://arxiv.org/abs/2505.14136
- Kotaro Yoshida, Yuji Naraki, Takafumi Horie, Ryotaro Shimizu, Hiroki Naganuma, 2 Aug 2025, DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging, https://arxiv.org/abs/2508.01148
- Xin He, Junxi Shen, Zhenheng Tang, Xiaowen Chu, Bo Li, Ivor W. Tsang, Yew-Soon Ong, 3 Aug 2025, RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging, https://arxiv.org/abs/2508.01784
- The-Hai Nguyen, Dang Huu-Tien, Takeshi Suzuki, and Le-Minh Nguyen, 5 Aug 2025, RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging, https://arxiv.org/abs/2508.03121
- Youngeun Kim, Seunghwan Lee, Aecheon Jung, Bogon Ryu, Sungeun Hong, 7 Aug 2025, Task Vector Quantization for Memory-Efficient Model Merging, https://arxiv.org/abs/2503.06921
- Yingfeng Luo, Dingyang Lin, Junxin Wang, Ziqiang Xu, Kaiyan Chang, Tong Zheng, Bei Li, Anxiang Ma, Tong Xiao, Zhengtao Yu, Jingbo Zhu, 8 Aug 2025, One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging, https://arxiv.org/abs/2508.06163
- Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Fabrizio Silvestri, Emanuele Rodolà, 8 Aug 2025, ATM: Improving Model Merging by Alternating Tuning and Merging, https://arxiv.org/abs/2411.03055
- Ilja Kuzborskij, Yasin Abbasi Yadkori, 20 Aug 2025, Low-rank bias, weight decay, and model merging in neural networks, https://arxiv.org/abs/2502.17340
- Haris Khan, Shumaila Asif, Sadia Asif, 28 Jul 2025, Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition, https://arxiv.org/abs/2507.20997
- Hu Wang, Congbo Ma, Ibrahim Almakky, Ian Reid, Gustavo Carneiro, Mohammad Yaqub, 19 Aug 2025, Rethinking Weight-Averaged Model-merging, https://arxiv.org/abs/2411.09263
Early Dropout
Early dropout is an LLM training optimization whereby training computations are skipped by "dropping out early" during a training cycle. In much of the research literature it is simply called "dropout." It should not be confused with "early exiting," which is an LLM inference optimization that skips layers.
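For reference, below is a minimal sketch of standard (inverted) dropout, the mechanism that these training-time schemes build on: each activation is randomly zeroed during training and the survivors are rescaled, so nothing needs to change at inference time.

```python
import torch

def dropout(x, p=0.1, training=True):
    # Inverted dropout: zero each element with probability p during training,
    # scaling survivors by 1/(1-p) so the expected activation is unchanged.
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).to(x.dtype)
    return x * mask / (1.0 - p)
```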
Research papers on early dropout in LLM training:
- Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Communications of the ACM, Volume 60, Issue 6, June 2017, pp 84–90, https://doi.org/10.1145/3065386 https://dl.acm.org/doi/10.1145/3065386 PDF: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf Code: http://code.google.com/p/cuda-convnet/ (The early paper that introduced a grouped convolution architecture for multi-GPUs, later the basis of AlexNet, which was a famous image recognition CNN.)
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1), 1929–1958 (2014), https://dl.acm.org/doi/abs/10.5555/2627435.2670313
- Srivastava, Nitish. Improving neural networks with dropout. 2013, Master’s thesis, U. Toronto, https://www.semanticscholar.org/paper/Improving-Neural-Networks-with-Dropout-Srivastava/5d5d4f49d6443c8529a6f5ebef5c499d47a869da PDF: http://www.cs.toronto.edu/~nitish/msc_thesis.pdf
- Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu, 29 Apr 2024 (v2), LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding, https://arxiv.org/abs/2404.16710
- Alperen Gormez, 2024, Efficient Neural Network Inference and Training Using Early Exit Strategies, Ph.D. Thesis, Electrical and Computer Engineering, University of Illinois at Chicago, 2024, https://alperengormez.github.io/assets/phd/agormez_phd_thesis.pdf (Early exit in inference, training, and fine-tuning.)
- Ignacy Stępka, Nicholas Gisolfi, Kacper Trębacz, Artur Dubrawski, 3 Aug 2025, Mitigating Persistent Client Dropout in Asynchronous Decentralized Federated Learning, https://arxiv.org/abs/2508.01807
- Rita Gonz\'alez-M\'arquez, Philipp Berens, Dmitry Kobak, 5 Aug 2025, Cropping outperforms dropout as an augmentation strategy for training self-supervised text embeddings, https://arxiv.org/abs/2508.03453
- Zhihao Guo, Peng Wang, Zidong Chen, Xiangyu Kong, Yan Lyu, Guanyu Gao, Liangxiu Han, 7 Aug 2025, UGOD: Uncertainty-Guided Differentiable Opacity and Soft Dropout for Enhanced Sparse-View 3DGS, https://arxiv.org/abs/2508.04968
- Pablo G. Almeida, Guilherme A. L. Silva, Valéria Santos, Gladston Moreira, Pedro Silva and Eduardo Luz, 9 Aug 2025, Deep Learning for School Dropout Detection: A Comparison of Tabular and Graph-Based Models for Predicting At-Risk Students, https://arxiv.org/abs/2508.14057
- Xinhua Chen, Sitao Huang, Cong Guo, Chiyue Wei, Yintao He, Jianyi Zhang, Hai "Hellen" Li, Yiran Chen, 19 Aug 2025, DPad: Efficient Diffusion Language Models with Suffix Dropout, https://arxiv.org/abs/2508.14148
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research
Read more about: