Aussie AI

Gradient Optimizer Research

  • Last Updated 25 April, 2026
  • by David Spuler, Ph.D.

ADAM
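For quick reference, the standard Adam update rule (first and second moment estimates with bias correction) can be sketched in NumPy; the function name, hyperparameters, and toy quadratic below are illustrative, not taken from any of the cited papers.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum (m) plus RMS scaling (v), both bias-corrected."""
    m = b1 * m + (1 - b1) * grad           # first moment (momentum)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (squared gradients)
    m_hat = m / (1 - b1 ** t)              # bias correction for zero-initialised m
    v_hat = v / (1 - b2 ** t)              # bias correction for zero-initialised v
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy run: minimise f(w) = w^2 (gradient 2w) starting from w = 5.
w, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 301):
    w, m, v = adam_step(w, 2 * w, m, v, t)
```

The bias-correction terms matter most in early steps, when `m` and `v` are still dominated by their zero initialisation.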

RMSprop
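RMSprop's defining idea is normalising each step by a decaying average of squared gradients rather than AdaGrad's unbounded sum. A minimal sketch (hyperparameter values and the toy quadratic are illustrative assumptions):

```python
import numpy as np

def rmsprop_step(w, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update: scale by an exponential moving average of g^2."""
    v = beta * v + (1 - beta) * grad ** 2   # EMA, unlike AdaGrad's growing sum
    w = w - lr * grad / (np.sqrt(v) + eps)
    return w, v

# Toy run: minimise f(w) = w^2 from w = 5.
w, v = np.array([5.0]), np.zeros(1)
for _ in range(1000):
    w, v = rmsprop_step(w, 2 * w, v)
```

Because `v` decays, the effective step size does not shrink to zero over time, which is the usual motivation for preferring RMSprop over AdaGrad on non-convex problems.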

AdaDelta
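AdaDelta removes the global learning rate entirely: the step size is derived from the ratio of the RMS of recent updates to the RMS of recent gradients. A minimal sketch (variable names and the toy example are illustrative):

```python
import numpy as np

def adadelta_step(w, grad, Eg, Edx, rho=0.95, eps=1e-6):
    """One AdaDelta update: no global learning rate; the step scale adapts
    from past update magnitudes (Edx) relative to past gradients (Eg)."""
    Eg = rho * Eg + (1 - rho) * grad ** 2
    dx = -(np.sqrt(Edx + eps) / np.sqrt(Eg + eps)) * grad  # unit-consistent step
    Edx = rho * Edx + (1 - rho) * dx ** 2
    return w + dx, Eg, Edx

# Toy run: minimise f(w) = w^2 from w = 5. Early steps are tiny (~sqrt(eps))
# and grow as the update accumulator Edx builds up.
w, Eg, Edx = np.array([5.0]), np.zeros(1), np.zeros(1)
for _ in range(500):
    w, Eg, Edx = adadelta_step(w, 2 * w, Eg, Edx)
```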

AdaGrad
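Many of the papers below analyse the basic AdaGrad recursion: each parameter's learning rate is divided by the square root of its accumulated squared gradients. A minimal sketch (the function name and toy quadratic are illustrative, not from the cited papers):

```python
import numpy as np

def adagrad_step(w, grad, G, lr=0.1, eps=1e-8):
    """One AdaGrad update: per-parameter step sizes shrink as the
    running sum of squared gradients G grows (G never decays)."""
    G = G + grad ** 2
    w = w - lr * grad / (np.sqrt(G) + eps)
    return w, G

# Toy run: minimise f(w) = w^2 from w = 5.
w, G = np.array([5.0]), np.zeros(1)
for _ in range(100):
    w, G = adagrad_step(w, 2 * w, G)
```

Because `G` only grows, the effective step size decays monotonically, roughly like 1/sqrt(t); this is the property that RMSprop and AdaDelta were designed to relax.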

  • Mahesh Chandra Mukkamala, Matthias Hein, 28 Nov 2017 (v2), Variants of RMSProp and Adagrad with Logarithmic Regret Bounds, https://arxiv.org/abs/1706.05507
  • Rachel Ward, Xiaoxia Wu, Leon Bottou, 19 Apr 2021 (v8), AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, https://arxiv.org/abs/1806.01811
  • Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu, 15 May 2023 (v4), A Unified Analysis of AdaGrad with Weighted Aggregation and Momentum Acceleration, https://arxiv.org/abs/1808.03408
  • Qian Qian, Xiaoyuan Qian, 9 Jun 2019, The Implicit Bias of AdaGrad on Separable Data, https://arxiv.org/abs/1906.03559
  • Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier, 17 Oct 2022 (v3), A Simple Convergence Proof of Adam and Adagrad, https://arxiv.org/abs/2003.02395
  • Peter Kairouz, Mónica Ribero, Keith Rush, Abhradeep Thakurta, 30 Jan 2021 (v2), Fast Dimension Independent Private AdaGrad on Publicly Estimated Subspaces, https://arxiv.org/abs/2008.06570
  • Cheik Traoré, Edouard Pauwels, 13 Apr 2021 (v3), Sequential convergence of AdaGrad algorithm for smooth convex optimization, https://arxiv.org/abs/2011.12341
  • Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Simon Lacoste-Julien, 3 Nov 2021 (v2), SVRG Meets AdaGrad: Painless Variance Reduction, https://arxiv.org/abs/2102.09645
  • Kushal Chakrabarti, Nikhil Chopra, 30 Sep 2021 (v2), Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective, https://arxiv.org/abs/2106.00092
  • Luofeng Liao, Li Shen, Jia Duan, Mladen Kolar, Dacheng Tao, 23 Sep 2022 (v2), Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Optimization, https://arxiv.org/abs/2106.10022
  • Ruinan Jin, Yu Xing, Xingkang He, 26 Jan 2022, On the Convergence of mSGD and AdaGrad for Stochastic Optimization, https://arxiv.org/abs/2201.11204
  • Ali Kavis, Kfir Yehuda Levy, Volkan Cevher, 6 Apr 2022, High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize, https://arxiv.org/abs/2204.02833
  • Zijian Liu, Ta Duy Nguyen, Alina Ene, Huy L. Nguyen, 4 Oct 2023 (v4), On the Convergence of AdaGrad(Norm) on $\mathbb{R}^{d}$: Beyond Convexity, Non-Asymptotic Rate and Acceleration, https://arxiv.org/abs/2209.14827
  • Amit Attia, Tomer Koren, 11 Jun 2023 (v2), SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance, https://arxiv.org/abs/2302.08783
  • R. Selvaraj, T. Satheesh, V. Suresh, V. Yathavaraj, 30 Apr 2023, Optimized Machine Learning for CHD Detection using 3D CNN-based Segmentation, Transfer Learning and Adagrad Optimization, https://arxiv.org/abs/2305.00411
  • Bohan Wang, Huishuai Zhang, Zhi-Ming Ma, Wei Chen, 28 Sep 2023 (v2), Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions, https://arxiv.org/abs/2305.18471
  • Yusu Hong, Junhong Lin, 13 Sep 2024 (v2), Revisiting Convergence of AdaGrad with Relaxed Assumptions, https://arxiv.org/abs/2402.13794
  • Sayantan Choudhury, Nazarii Tupitsa, Nicolas Loizou, Samuel Horvath, Martin Takac, Eduard Gorbunov, 5 Jun 2024 (v2), Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad, https://arxiv.org/abs/2403.02648
  • Antoine Godichon-Baggioni, Wei Lu, Bruno Portier, 3 May 2024, A Full Adagrad algorithm with O(Nd) operations, https://arxiv.org/abs/2405.01908
  • Savelii Chezhegov, Yaroslav Klyukin, Andrei Semenov, Aleksandr Beznosikov, Alexander Gasnikov, Samuel Horváth, Martin Takáč, Eduard Gorbunov, 6 Jun 2024, Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed, https://arxiv.org/abs/2406.04443
  • Anton Rodomanov, Xiaowen Jiang, Sebastian Stich, 10 Jun 2024, Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction, https://arxiv.org/abs/2406.06398
  • Yuxing Liu, Rui Pan, Tong Zhang, 14 Oct 2024 (v2), AdaGrad under Anisotropic Smoothness, https://arxiv.org/abs/2406.15244
  • Serge Gratton, Sadok Jerad, Philippe L. Toint, 1 Nov 2024 (v3), Complexity of Adagrad and other first-order methods for nonconvex optimization problems with bounds constraints, https://arxiv.org/abs/2406.15793
  • Ruinan Jin, Xiaoyu Wang, Baoxiang Wang, 8 Sep 2024, Asymptotic and Non-Asymptotic Convergence Analysis of AdaGrad for Non-Convex Optimization via Novel Stopping Time-based Analysis, https://arxiv.org/abs/2409.05023
  • R Abdulkadirov, P Lyakhov, N Nagornov - Mathematics, 2023, Survey of optimization algorithms in modern neural networks, https://doi.org/10.3390/math11112466 https://www.mdpi.com/2227-7390/11/11/2466
  • Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid, 30 Oct 2024 (v3), The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities, https://arxiv.org/abs/2408.13296
  • Yuanzhe Tao, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu, 27 Dec 2024, Towards Simple and Provable Parameter-Free Adaptive Gradient Methods, https://arxiv.org/abs/2412.19444
  • Minxin Zhang, Yuxuan Liu, Hayden Schaeffer, 3 Sep 2025, AdaGrad Meets Muon: Adaptive Stepsizes for Orthogonal Updates, https://arxiv.org/abs/2509.02981
  • Carlos Heredia, 13 Oct 2025, Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations, https://arxiv.org/abs/2411.09734

AMSGrad
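AMSGrad is Adam with one change: normalise by the running *maximum* of the second-moment estimate, so the effective step size is non-increasing (this is the fix proposed for Adam's convergence counterexamples). A minimal sketch with illustrative hyperparameters:

```python
import numpy as np

def amsgrad_step(w, grad, m, v, v_max, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One AMSGrad update: Adam's moments, but normalised by max-so-far of v."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    v_max = np.maximum(v_max, v)           # the one-line difference from Adam
    w = w - lr * m / (np.sqrt(v_max) + eps)
    return w, m, v, v_max

# Toy run: minimise f(w) = w^2 from w = 5.
w, m, v, v_max = np.array([5.0]), np.zeros(1), np.zeros(1), np.zeros(1)
for _ in range(300):
    w, m, v, v_max = amsgrad_step(w, 2 * w, m, v, v_max)
```

Note that the original AMSGrad formulation omits Adam's bias correction, as this sketch does.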

Stochastic Gradient Descent (SGD)
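Plain SGD is the baseline that all of the adaptive methods on this page modify: one fixed global learning rate, no per-parameter scaling. A minimal sketch (learning rate and toy quadratic are illustrative):

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """One vanilla SGD update: a single fixed learning rate for all parameters."""
    return w - lr * grad

# Toy run: minimise f(w) = w^2 from w = 5 using the exact gradient 2w.
# Each step multiplies w by (1 - 2*lr) = 0.8, so w decays geometrically.
w = np.array([5.0])
for _ in range(50):
    w = sgd_step(w, 2 * w)
```

In practice "stochastic" refers to computing `grad` from a random minibatch rather than the full dataset; the update rule itself is unchanged.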

Research on Gradient Optimizers

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++ programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: