Aussie AI

Gradient Optimizer Research

  • Last Updated 29 August, 2025
  • by David Spuler, Ph.D.

ADAM
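
Adam combines a momentum-style estimate of the mean gradient with an RMSprop-style estimate of the squared gradient, plus bias correction for the zero initialization of both (Kingma & Ba; see the citation under Research on Gradient Optimizers below). The following is a minimal per-step sketch in NumPy; the function name adam_step and the hyperparameter defaults are illustrative, not taken from any particular framework or paper.

    import numpy as np

    def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update; t is the 1-based step count used for bias correction."""
        m = beta1 * m + (1.0 - beta1) * grad         # first moment (mean of gradients)
        v = beta2 * v + (1.0 - beta2) * grad * grad  # second moment (uncentered variance)
        m_hat = m / (1.0 - beta1 ** t)               # correct the bias from zero initialization
        v_hat = v / (1.0 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
        return w, m, v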

RMSprop
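
RMSprop replaces AdaGrad's lifetime sum of squared gradients with an exponential moving average, so the effective step size does not shrink toward zero over long runs (Hinton, Srivastava & Swersky, lecture slides cited below). A minimal sketch, with illustrative defaults for the decay rate and learning rate:

    import numpy as np

    def rmsprop_step(w, grad, avg_sq, lr=0.001, decay=0.9, eps=1e-8):
        """One RMSprop update using a moving average of squared gradients."""
        avg_sq = decay * avg_sq + (1.0 - decay) * grad * grad   # exponential moving average
        w = w - lr * grad / (np.sqrt(avg_sq) + eps)             # divide by RMS of recent gradients
        return w, avg_sq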

AdaDelta
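
AdaDelta extends the RMSprop idea by also tracking a running average of squared parameter updates, which removes the explicit global learning rate (Zeiler; see the citation under Research on Gradient Optimizers below). A minimal sketch with illustrative names and defaults:

    import numpy as np

    def adadelta_step(w, grad, avg_sq_grad, avg_sq_update, decay=0.95, eps=1e-6):
        """One AdaDelta update; the step scales the gradient by RMS(past updates) / RMS(gradients)."""
        avg_sq_grad = decay * avg_sq_grad + (1.0 - decay) * grad * grad
        update = -np.sqrt(avg_sq_update + eps) / np.sqrt(avg_sq_grad + eps) * grad
        avg_sq_update = decay * avg_sq_update + (1.0 - decay) * update * update
        w = w + update
        return w, avg_sq_grad, avg_sq_update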

AdaGrad
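
AdaGrad, analyzed in the papers below, scales each parameter's step by the inverse square root of the running sum of that parameter's squared gradients, so frequently updated coordinates take smaller steps. A minimal sketch; adagrad_step and its defaults are illustrative, not drawn from any of the papers listed.

    import numpy as np

    def adagrad_step(w, grad, sum_sq, lr=0.01, eps=1e-8):
        """One AdaGrad update: accumulate squared gradients, then scale the step per parameter."""
        sum_sq = sum_sq + grad * grad                 # lifetime accumulator of squared gradients
        w = w - lr * grad / (np.sqrt(sum_sq) + eps)   # larger history means a smaller step
        return w, sum_sq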

  • Mahesh Chandra Mukkamala, Matthias Hein, 28 Nov 2017 (v2), Variants of RMSProp and Adagrad with Logarithmic Regret Bounds, https://arxiv.org/abs/1706.05507
  • Rachel Ward, Xiaoxia Wu, Leon Bottou, 19 Apr 2021 (v8), AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, https://arxiv.org/abs/1806.01811
  • Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu, 15 May 2023 (v4), A Unified Analysis of AdaGrad with Weighted Aggregation and Momentum Acceleration, https://arxiv.org/abs/1808.03408
  • Qian Qian, Xiaoyuan Qian, 9 Jun 2019, The Implicit Bias of AdaGrad on Separable Data, https://arxiv.org/abs/1906.03559
  • Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier, 17 Oct 2022 (v3), A Simple Convergence Proof of Adam and Adagrad, https://arxiv.org/abs/2003.02395
  • Peter Kairouz, Mónica Ribero, Keith Rush, Abhradeep Thakurta, 30 Jan 2021 (v2), Fast Dimension Independent Private AdaGrad on Publicly Estimated Subspaces, https://arxiv.org/abs/2008.06570
  • Cheik Traoré, Edouard Pauwels, 13 Apr 2021 (v3), Sequential convergence of AdaGrad algorithm for smooth convex optimization, https://arxiv.org/abs/2011.12341
  • Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Simon Lacoste-Julien, 3 Nov 2021 (v2), SVRG Meets AdaGrad: Painless Variance Reduction, https://arxiv.org/abs/2102.09645
  • Kushal Chakrabarti, Nikhil Chopra, 30 Sep 2021 (v2), Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective, https://arxiv.org/abs/2106.00092
  • Luofeng Liao, Li Shen, Jia Duan, Mladen Kolar, Dacheng Tao, 23 Sep 2022 (v2), Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Optimization, https://arxiv.org/abs/2106.10022
  • Ruinan Jin, Yu Xing, Xingkang He, 26 Jan 2022, On the Convergence of mSGD and AdaGrad for Stochastic Optimization, https://arxiv.org/abs/2201.11204
  • Ali Kavis, Kfir Yehuda Levy, Volkan Cevher, 6 Apr 2022, High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize, https://arxiv.org/abs/2204.02833
  • Zijian Liu, Ta Duy Nguyen, Alina Ene, Huy L. Nguyen, 4 Oct 2023 (v4), On the Convergence of AdaGrad(Norm) on ℝ^d: Beyond Convexity, Non-Asymptotic Rate and Acceleration, https://arxiv.org/abs/2209.14827
  • Amit Attia, Tomer Koren, 11 Jun 2023 (v2), SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance, https://arxiv.org/abs/2302.08783
  • R. Selvaraj, T. Satheesh, V. Suresh, V. Yathavaraj, 30 Apr 2023, Optimized Machine Learning for CHD Detection using 3D CNN-based Segmentation, Transfer Learning and Adagrad Optimization, https://arxiv.org/abs/2305.00411
  • Bohan Wang, Huishuai Zhang, Zhi-Ming Ma, Wei Chen, 28 Sep 2023 (v2), Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions, https://arxiv.org/abs/2305.18471
  • Yusu Hong, Junhong Lin, 13 Sep 2024 (v2), Revisiting Convergence of AdaGrad with Relaxed Assumptions, https://arxiv.org/abs/2402.13794
  • Sayantan Choudhury, Nazarii Tupitsa, Nicolas Loizou, Samuel Horvath, Martin Takac, Eduard Gorbunov, 5 Jun 2024 (v2), Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad, https://arxiv.org/abs/2403.02648
  • Antoine Godichon-Baggioni (LPSM (UMR_8001)), Wei Lu (LMI), Bruno Portier (LMI), 3 May 2024, A Full Adagrad algorithm with O(Nd) operations, https://arxiv.org/abs/2405.01908
  • Savelii Chezhegov, Yaroslav Klyukin, Andrei Semenov, Aleksandr Beznosikov, Alexander Gasnikov, Samuel Horváth, Martin Takáč, Eduard Gorbunov, 6 Jun 2024, Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed, https://arxiv.org/abs/2406.04443
  • Anton Rodomanov, Xiaowen Jiang, Sebastian Stich, 10 Jun 2024, Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction, https://arxiv.org/abs/2406.06398
  • Yuxing Liu, Rui Pan, Tong Zhang, 14 Oct 2024 (v2), AdaGrad under Anisotropic Smoothness, https://arxiv.org/abs/2406.15244
  • Serge Gratton, Sadok Jerad, Philippe L. Toint, 1 Nov 2024 (v3), Complexity of Adagrad and other first-order methods for nonconvex optimization problems with bounds constraints, https://arxiv.org/abs/2406.15793
  • Ruinan Jin, Xiaoyu Wang, Baoxiang Wang, 8 Sep 2024, Asymptotic and Non-Asymptotic Convergence Analysis of AdaGrad for Non-Convex Optimization via Novel Stopping Time-based Analysis, https://arxiv.org/abs/2409.05023
  • R Abdulkadirov, P Lyakhov, N Nagornov - Mathematics, 2023, Survey of optimization algorithms in modern neural networks, https://doi.org/10.3390/math11112466 https://www.mdpi.com/2227-7390/11/11/2466
  • Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid, 30 Oct 2024 (v3), The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities, https://arxiv.org/abs/2408.13296
  • Yuanzhe Tao, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu, 27 Dec 2024, Towards Simple and Provable Parameter-Free Adaptive Gradient Methods, https://arxiv.org/abs/2412.19444

AMSGrad
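
AMSGrad modifies Adam by using the element-wise maximum of all past second-moment estimates in the denominator, so the effective step size never increases (Reddi, Kale & Kumar, "On the Convergence of Adam and Beyond", cited under Research on Gradient Optimizers below). A minimal sketch, following the common formulation without bias correction; names and defaults are illustrative.

    import numpy as np

    def amsgrad_step(w, grad, m, v, v_max, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        """One AMSGrad update: like Adam, but the denominator uses a running maximum."""
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad * grad
        v_max = np.maximum(v_max, v)                # never let the adaptive denominator shrink
        w = w - lr * m / (np.sqrt(v_max) + eps)
        return w, m, v, v_max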

Stochastic Gradient Descent (SGD)
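
Plain SGD takes a fixed-size step against the mini-batch gradient, usually smoothed with momentum; it is the baseline against which the adaptive methods above are compared. A minimal sketch of SGD with classical heavy-ball momentum; the hyperparameter defaults are illustrative only.

    def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
        """One SGD update with heavy-ball momentum (works on scalars or NumPy arrays)."""
        velocity = momentum * velocity + grad   # running accumulation of recent gradients
        w = w - lr * velocity                   # step against the smoothed direction
        return w, velocity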

Research on Gradient Optimizers

  • Shohei Taniguchi, Keno Harada, Gouki Minegishi, Yuta Oshima, Seong Cheol Jeong, Go Nagahara, Tomoshi Iiyama, Masahiro Suzuki, Yusuke Iwasawa, Yutaka Matsuo, 5 Nov 2024, ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate, Neural Information Processing Systems (NeurIPS 2024), https://arxiv.org/abs/2411.02853 https://github.com/iShohei220/adopt
  • Diederik P. Kingma, Jimmy Ba, 30 Jan 2017 (v9), Adam: A Method for Stochastic Optimization, https://arxiv.org/abs/1412.6980
  • Jun-Kun Wang, Xiaoyun Li, Belhal Karimi, Ping Li, 3 Nov 2020 (v3), An Optimistic Acceleration of AMSGrad for Nonconvex Optimization, https://arxiv.org/abs/1903.01435
  • Tran Thi Phuong, Le Trieu Phong, 31 Oct 2019 (v4), On the Convergence Proof of AMSGrad and a New Version, https://arxiv.org/abs/1904.03590
  • Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, 19 Apr 2019, On the Convergence of Adam and Beyond, https://arxiv.org/abs/1904.09237
  • Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky, 2012, Lecture 6e rmsprop: Divide the gradient by a running average of its recent magnitude, https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
  • Mahesh Chandra Mukkamala, Matthias Hein, 28 Nov 2017 (v2), Variants of RMSProp and Adagrad with Logarithmic Regret Bounds, https://arxiv.org/abs/1706.05507
  • Thomas Kurbiel, Shahrzad Khaleghian, 6 Aug 2017, Training of Deep Neural Networks based on Distance Measures using RMSProp, https://arxiv.org/abs/1708.01911
  • Mohammad Emtiyaz Khan, Zuozhu Liu, Voot Tangkaratt, Yarin Gal, 4 Dec 2017, Vprop: Variational Inference using RMSprop, https://arxiv.org/abs/1712.01038
  • Soham De, Anirbit Mukherjee, Enayat Ullah, 20 Nov 2018 (v3), Convergence guarantees for RMSProp and ADAM in non-convex optimization and an empirical comparison to Nesterov acceleration, https://arxiv.org/abs/1807.06766
  • Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu, 25 Jun 2019 (v3), A Sufficient Condition for Convergences of Adam and RMSProp, https://arxiv.org/abs/1811.09358
  • Huan Li, Zhouchen Lin, 15 Apr 2024 (v3), On the O(√d/T^(1/4)) Convergence Rate of RMSProp and Its Momentum Extension Measured by ℓ1 Norm, https://arxiv.org/abs/2402.00389
  • Qi Zhang, Yi Zhou, Shaofeng Zou, 3 Apr 2024 (v2), Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance, https://arxiv.org/abs/2404.01436
  • Bilel Bensaid (CEA-CESTA, IMB), Gaël Poëtte (CEA-CESTA), Rodolphe Turpault (IMB), 22 Jul 2024, Convergence of the Iterates for Momentum and RMSProp for Local Smooth Functions: Adaptation is the Key, https://arxiv.org/abs/2407.15471
  • Patrick McNamee, Zahra Nili Ahmadabadi, 18 Sep 2024, Adaptive Extremum Seeking Control via the RMSprop Optimizer, https://arxiv.org/abs/2409.12290
  • Matthew D. Zeiler, 22 Dec 2012, ADADELTA: An Adaptive Learning Rate Method, https://arxiv.org/abs/1212.5701
  • Sebastian Bock, Josef Goppold, Martin Weiß, 27 Apr 2018, An improvement of the convergence proof of the ADAM-Optimizer, https://arxiv.org/abs/1804.10587
  • Jiawei Zhang, Fisher B. Gouza, 10 Mar 2019 (v2), GADAM: Genetic-Evolutionary ADAM for Deep Neural Network Optimization, https://arxiv.org/abs/1805.07500
  • Jiawei Zhang, 11 Mar 2019, Gradient Descent based Optimization Algorithms for Deep Learning Models Training, https://arxiv.org/abs/1903.03614
  • Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong, 10 Mar 2019 (v2), On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization, https://arxiv.org/abs/1808.02941
  • Remi Genet, Hugo Inzirillo, 31 Oct 2024, CaAdam: Improving Adam optimizer using connection aware methods, https://arxiv.org/abs/2410.24216
  • R Abdulkadirov, P Lyakhov, N Nagornov - Mathematics, 2023, Survey of optimization algorithms in modern neural networks, https://doi.org/10.3390/math11112466 https://www.mdpi.com/2227-7390/11/11/2466
  • Nir Barazida, Mar 9, 2022, Distributed training of deep learning models: handling stragglers and latency in synchronous training A review of the challenges in Synchronous distributed training and best solutions for stragglers and high latency https://towardsdatascience.com/stragglers-and-latency-in-synchronous-distributed-training-of-deep-learning-models-43783b0266d9
  • Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, Rafal Jozefowicz, 21 Mar 2017 (v3), Revisiting Distributed Synchronous SGD, https://arxiv.org/abs/1604.00981
  • Bowen Peng, Jeffrey Quesnelle, Diederik P. Kingma, 29 Nov 2024, DeMo: Decoupled Momentum Optimization, https://arxiv.org/abs/2411.19870 https://github.com/bloc97/DeMo (Extension to the Adam optimizer that greatly reduces network communication during training.)
  • Shaowen Wang, Anan Liu, Jian Xiao, Huan Liu, Yuekui Yang, Cong Xu, Qianqian Pu, Suncong Zheng, Wei Zhang, Jian Li, 29 Nov 2024, CAdam: Confidence-Based Optimization for Online Learning, https://arxiv.org/abs/2411.19647
  • Abulikemu Abuduweili, Changliu Liu, 3 Dec 2024, Revisiting the Initial Steps in Adaptive Gradient Descent Optimization, https://arxiv.org/abs/2412.02153
  • Kwangryeol Park, Seulki Lee, 12 Dec 2024, SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization, https://arxiv.org/abs/2412.08894 (Gradient optimizer Adam optimized using low-rank matrix factorization.)
  • Minghao Xu, Lichuan Xiang, Xu Cai, Hongkai Wen, 17 Dec 2024 (v2), No More Adam: Learning Rate Scaling at Initialization is All You Need, https://arxiv.org/abs/2412.11768
  • Wenhan Jiang, Jinlan Liu, Naimin Zhang, Dongpo Xu, DMAdam: Dual averaging enhanced adaptive gradient method for deep neural networks, Knowledge-Based Systems, 2024, 112886, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2024.112886 https://www.sciencedirect.com/science/article/abs/pii/S095070512401520X
  • Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid, 30 Oct 2024 (v3), The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities, https://arxiv.org/abs/2408.13296
  • O. F. Razzouki, A. Charroud, Z. E. Allali, A. Chetouani and N. Aslimani, "A Survey of Advanced Gradient Methods in Machine Learning," 2024 7th International Conference on Advanced Communication Technologies and Networking (CommNet), Rabat, Morocco, 2024, pp. 1-7, doi: 10.1109/CommNet63022.2024.10793249. https://ieeexplore.ieee.org/abstract/document/10793249
  • Shubhankar Bhakta, Utpal Nandi, Chiranjit Changdar, Bachchu Paul, Tapas Si, Rajat Kumar Pal, aMacP: An adaptive optimization algorithm for Deep Neural Network, Neurocomputing, Volume 620, 2025, 129242, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2024.129242 https://www.sciencedirect.com/science/article/abs/pii/S0925231224020137
  • Yuanzhe Tao, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu, 27 Dec 2024, Towards Simple and Provable Parameter-Free Adaptive Gradient Methods, https://arxiv.org/abs/2412.19444
  • Y. Li et al., 2025, "Q-DADAM: A Quantized Distributed Online Optimization Algorithm With Adaptive Momentum," in IEEE Transactions on Control of Network Systems, doi: 10.1109/TCNS.2025.3526555. https://ieeexplore.ieee.org/abstract/document/10830565
  • Jing Wang, Anna Choromanska, 24 Jan 2025, A Survey of Optimization Methods for Training DL Models: Theoretical Perspective on Convergence and Generalization, https://arxiv.org/abs/2501.14458
  • Shuo Xie, Tianhao Wang, Sashank Reddi, Sanjiv Kumar, Zhiyuan Li, 13 Mar 2025, Structured Preconditioners in Adaptive Optimization: A Unified Analysis, https://arxiv.org/abs/2503.10537
  • Michael Nuñez, July 11, 2025, Moonshot AI’s Kimi K2 outperforms GPT-4 in key benchmarks — and it’s free, https://venturebeat.com/ai/moonshot-ais-kimi-k2-outperforms-gpt-4-in-key-benchmarks-and-its-free/ (One trillion total parameters with about 32B activated per token via mixture-of-experts. Examines the new MuonClip training optimizer as more efficient and more stable than AdamW variants.)
  • Jingyuan Liu, Jianlin Su, Xingcheng Yao, Zhejun Jiang, Guokun Lai, Yulun Du, Yidao Qin, Weixin Xu, Enzhe Lu, Junjie Yan, Yanru Chen, Huabin Zheng, Yibo Liu, Shaowei Liu, Bohong Yin, Weiran He, Han Zhu, Yuzhi Wang, Jianzhou Wang, Mengnan Dong, Zheng Zhang, Yongsheng Kang, Hao Zhang, Xinran Xu, Yutao Zhang, Yuxin Wu, Xinyu Zhou, Zhilin Yang, 24 Feb 2025, Muon is Scalable for LLM Training, https://arxiv.org/abs/2502.16982
  • Tim Tsz-Kit Lau, Qi Long, Weijie Su, 2 Aug 2025, PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective, https://arxiv.org/abs/2505.21799
  • Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Christopher G. Brinton, Tatiana Likhomanenko, 14 Aug 2025, Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping, https://arxiv.org/abs/2310.00098

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: