Aussie AI

Transformer Optimization

  • Last Updated 30 August, 2025
  • by David Spuler, Ph.D.

The Transformer was invented at Google in 2017 and open-sourced by their research group. It became the most widely used AI engine architecture, notably as the basis of OpenAI's GPT-3 and ChatGPT. Since then, optimization research has taken off.

There are various ways to optimize a Transformer with code optimizations. Much research has also been conducted on slight modifications to the architecture of the Transformer to improve latency and throughput in both inference and training.

Transformer Inference Optimizations

See also these articles for further information on Transformer inference optimization:

Transformer Kernel Code Optimizations

Some of the specific kernel optimizations of inference engines include:

  • Attention head caching: Precomputing and caching attention head matrices from already-processed tokens (HuggingFace, 2021). This reduces the cost of autoregression when outputting multiple tokens, which is the usual case. See also attention head pruning.
  • KV caching: Caching the results of the attention head K and V tensor matrix multiplications during decoding (Intel, 2023). This reduces the number of decoder matrix multiplications. See KV caching research.
  • Padding byte optimizations: Removing padding from the Feed Forward Network tensor/matrix computations (Intel, 2023; also ByteTransformer by Zhai et al., 2023); see "zero padding removal". This reduces the total number of multiplications.
  • Attention dimensions: Merging the Q, K, and V matrices (of identical size) into a single large matrix for better matrix multiplication throughput (Zhai et al., 2023).
  • Operator fusion and reordering: Reordering reshaping and matmul operations (Intel, 2023). This streamlines some of the arithmetic operations so that they can use more compact low-level libraries. See kernel fusion optimizations.
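
As a rough illustration of the KV caching and merged Q/K/V ideas in the list above, the following pure-Python sketch decodes a short sequence twice with single-head causal attention: once re-projecting every earlier token at each step, and once appending each new K and V to a cache. It is not the Intel or ByteTransformer implementation. The function names, the toy dimensions, and the fused weight layout (W_QKV holding the Q, K and V weight columns side by side) are assumptions made for this example only.

```python
import math

def matvec(x, W):
    """Multiply row vector x (length d_in) by matrix W (d_in rows, d_out columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def project_qkv(x, W_qkv, d):
    """One fused matmul yields Q, K and V together; slice the result apart."""
    qkv = matvec(x, W_qkv)
    return qkv[:d], qkv[d:2 * d], qkv[2 * d:]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [sum((e / total) * v[j] for e, v in zip(exps, values))
            for j in range(len(values[0]))]

def decode_no_cache(tokens, W_qkv, d):
    """Baseline: at every step, re-project all tokens seen so far."""
    outputs, projections = [], 0
    for t in range(len(tokens)):
        qkv = [project_qkv(x, W_qkv, d) for x in tokens[:t + 1]]
        projections += len(qkv)
        q = qkv[-1][0]  # query of the newest token
        keys = [k for _, k, _ in qkv]
        values = [v for _, _, v in qkv]
        outputs.append(attend(q, keys, values))
    return outputs, projections

def decode_with_cache(tokens, W_qkv, d):
    """KV caching: project only the newest token and append its K and V."""
    outputs, projections = [], 0
    k_cache, v_cache = [], []
    for x in tokens:
        q, k, v = project_qkv(x, W_qkv, d)
        projections += 1
        k_cache.append(k)
        v_cache.append(v)
        outputs.append(attend(q, k_cache, v_cache))
    return outputs, projections

# Toy data (hypothetical): 4 token embeddings of width 2, head size D = 2.
D = 2
TOKENS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
W_QKV = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
         [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]]

if __name__ == "__main__":
    slow, n_slow = decode_no_cache(TOKENS, W_QKV, D)
    fast, n_fast = decode_with_cache(TOKENS, W_QKV, D)
    print("projections without cache:", n_slow)
    print("projections with cache:", n_fast)
```

For the four toy tokens, the baseline performs 1 + 2 + 3 + 4 = 10 fused projections while the cached version performs 4, and both return the same attention outputs. A production engine would more likely hold the cache in preallocated tensors per layer and per head rather than in Python lists.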

Kernel Optimization Research Papers

Reference papers on some of the specific code optimizations in Transformer engines:

See also general research on code optimizations.

Transformer General Optimizations

Some of the general classes of optimization techniques for the Transformer architecture include:

And here is a long list of the various other optimizations possible:

For even more, see inference optimizations, Transformer architectural optimizations, and a complete list of Transformer optimizations.

Survey Papers on Transformer Optimization

Review and survey papers on faster Transformer engines:

  • Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami, Full stack optimization of transformer inference: a survey, Feb 2023, arXiv:2302.14017, https://arxiv.org/abs/2302.14017
  • Full Stack Optimization of Transformer Inference: a Survey. Part 2 on Transformer Optimization, A Paper Overview, https://www.nebuly.com/blog/full-stack-optimization-of-transformer-inference-a-survey-part-2
  • Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. Efficient transformers: A survey (v2). arXiv preprint arXiv:2009.06732, 2022, https://arxiv.org/abs/2009.06732
  • Krishna Teja Chitty-Venkata, Sparsh Mittal, Murali Emani, Venkatram Vishwanath, Arun K. Somani, July 2023, A Survey of Techniques for Optimizing Transformer Inference, https://arxiv.org/abs/2307.07982
  • L Papa, P Russo, I Amerini, L Zhou, Sep 2023, A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking, arXiv preprint arXiv:2309.02031, 2023, https://arxiv.org/abs/2309.02031
  • Efficient Attention: Breaking The Quadratic Transformer Bottleneck, 2023 (accessed 8/12/23), https://gwern.net/note/attention, (A regularly updated bibliography of transformer attention optimization papers)

Tips for Transformer Optimization

Articles and papers with general tips on optimizing a Transformer:

Research on Specific Fast Transformers

These papers are on new faster Transformer architectures tested by researchers:

General Research on Transformer Optimization

These papers review Transformer optimization techniques in general.

Kernel Optimizations

  • Soroush Ghodrati, Sean Kinzer, Hanyang Xu, Rohan Mahapatra, Yoonsung Kim, Byung Hoon Ahn, Dong Kai Wang, Lavanya Karthikeyan, Amir Yazdanbakhsh, Jongse Park, Nam Sung Kim, Hadi Esmaeilzadeh, April 2024, Tandem processor: Grappling with emerging operators in neural networks, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, April 2024, Pages 1165–1182, https://doi.org/10.1145/3620665.3640365 https://dl.acm.org/doi/abs/10.1145/3620665.3640365 Code: https://actlab-genesys.github.io (Reviews hardware acceleration of all sub-layer kernel operators, with a focus beyond just GEMM/MatMul operators.)
  • Make LLM Fine-tuning 2x faster with Unsloth and HF TRL, January 10, 2024, Daniel Han-Chen, https://huggingface.co/blog/unsloth-trl Code: https://github.com/huggingface/blog/blob/main/unsloth-trl.md (Optimizes some PyTorch kernels for back-propagation and reduces memory usage in fine-tuning; currently works with Llama and Mistral architectures.)
  • H Shen, H Chang, B Dong, Y Luo, H Meng, Nov 2023, Efficient LLM Inference on CPUs, arXiv preprint arXiv:2311.00502, https://arxiv.org/pdf/2311.00502.pdf Code: https://github.com/intel/intel-extension-for-transformers (INT4 weight quantization with 16-bit activations, a highly optimized kernel with support for AVX2, AVX512, AVX512_VNNI and Advanced Matrix Extensions (AMX), and KV caching; tested on Llama2 3B to 20B with 20-80ms latency per token.)
  • Piotr Kluska, Adrián Castello, Florian Scheidegger, A. Cristiano I. Malossi, 2024, QAttn: Efficient GPU Kernels for mixed-precision Vision Transformers https://openaccess.thecvf.com/content/CVPR2024W/eLVM/papers/Kluska_QAttn_Efficient_GPU_Kernels_for_Mixed-precision_Vision_Transformers_CVPRW_2024_paper.pdf
  • Christian Szegedy et al., 2015, Going Deeper with Convolutions, http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf (The GoogLeNet paper.)
  • Benjamin Charlier, Jean Feydy, Joan Alexis Glaunès, François-David Collin, Ghislain Durif, 8 Apr 2021 (v2), Kernel Operations on the GPU, with Autodiff, without Memory Overflows, https://arxiv.org/abs/2004.11127 Code: https://www.kernel-operations.io/keops/index.html
  • 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, https://arxiv.org/abs/2404.14294
  • Alejandro Araya-Núñez, Justin Fernández-Badilla, Daniel González-Vargas, Jimena León-Huertas, Erick-Andrés Obregón-Fonseca, Danny Xie-Li, June, 2024, Proposal of an open-source accelerators library for inference of transformer networks in edge devices based on Linux, Tecnología en Marcha. Vol. 37, special issue. IEEE Latin American Electron Devices Conference (LAEDC), pages 118-125, https://doi.org/10.18845/tm.v37i5.7225 PDF: https://revistas.tec.ac.cr/index.php/tec_marcha/article/download/7225/7076
  • Luchang Li, Sheng Qian, Jie Lu, Lunxi Yuan, Rui Wang, Qin Xie, 5 Jul 2024 (v3), Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs, https://arxiv.org/abs/2403.20041
  • Zheming Jin, July 2024, Evaluating Operators in Deep Neural Networks for Improving Performance Portability of SYCL, Oak Ridge National Laboratory, ORNL/TM-2024/3463, https://info.ornl.gov/sites/publications/Files/Pub217394.pdf
  • Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun, Rui Kong, Yile Wang, Hanfei Geng, Jian Luan, Xuefeng Jin, Zilong Ye, Guanjing Xiong, Fan Zhang, Xiang Li, Mengwei Xu, Zhijun Li, Peng Li, Yang Liu, Ya-Qin Zhang, Yunxin Liu, 8 May 2024 (v2), Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security, https://arxiv.org/abs/2401.05459 https://github.com/MobileLLM/Personal_LLM_Agents_Survey
  • Intel, 2024, Get Started with Intel® oneAPI Math Kernel Library, https://www.intel.com/content/www/us/en/docs/onemkl/get-started-guide/2023-0/overview.html
  • T Zhao, 2024, Acceleration of Deep Learning Algorithms with Transformers, https://escholarship.org/uc/item/3419t2z6
  • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia, 23 Dec 2023, Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, https://arxiv.org/abs/2312.15234
  • Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang, 26 Sep 2024, Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores, https://arxiv.org/abs/2409.17870
  • Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, Guohao Dai, 6 Oct 2024, Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective, https://arxiv.org/abs/2410.04466
  • J. Bi et al., "Efficient and Fast High-performance Library Generation for Deep Learning Accelerators," in IEEE Transactions on Computers, doi: 10.1109/TC.2024.3475575, https://ieeexplore.ieee.org/abstract/document/10707341 (Finding the most efficient kernel.)
  • Wei Zhao, Anand Jayarajan, Gennady Pekhimenko, 9 Oct 2024, Tally: Non-Intrusive Performance Isolation for Concurrent Deep Learning Workloads, https://arxiv.org/abs/2410.07381 (Interleaved scheduling layer for GPU workloads.)
  • Byron (Pin-Lun) Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning, Yanning Chen, 14 Oct 2024, Liger Kernel: Efficient Triton Kernels for LLM Training, https://arxiv.org/abs/2410.10989 http://github.com/linkedin/Liger-Kernel
  • Mingcong Song, Xinru Tang, Fengfan Hou, Jing Li, Wei Wei, Yipeng Ma, Runqiu Xiao, Hongjie Si, Dingcheng Jiang, Shouyi Yin, Yang Hu, Guoping Long, 24 Dec 2024, Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels, https://arxiv.org/abs/2412.18106
  • Andrew Chan, Dec 12, 2024, Fast LLM Inference From Scratch: Pushing single-GPU inference throughput to the edge without libraries, https://andrewkchan.dev/posts/yalm.html
  • HF, 2024, TGI v3 overview, https://huggingface.co/docs/text-generation-inference/conceptual/chunking
  • Haihao Shen, Hanwen Chang, Bo Dong, Yu Luo, Hengyu Meng, 7 Dec 2023 (v2), Efficient LLM Inference on CPUs, https://arxiv.org/abs/2311.00502 https://github.com/intel/intel-extension-for-transformers
  • Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu, 24 Feb 2025, Thus Spake Long-Context Large Language Model, https://arxiv.org/abs/2502.17129 (Impressive survey of many techniques to improve efficiency and accuracy of long context processing in both inference and training, covering text, video and multimodal models.)
  • Runxin Zhong, Yuyang Jin, Chen Zhang, Kinman Lei, Shuangyu Li, and Jidong Zhai. 2025. FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property. In Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP '25). Association for Computing Machinery, New York, NY, USA, 183–196. https://doi.org/10.1145/3710848.3710864 https://dl.acm.org/doi/abs/10.1145/3710848.3710864
  • Burkhard Ringlein, Thomas Parnell, Radu Stoica, 15 May 2025 (v2), GPU Performance Portability needs Autotuning, https://arxiv.org/abs/2505.03780
  • Anne Ouyang and Azalia Mirhoseini and Percy Liang, June 2025, Surprisingly Fast AI-Generated Kernels We Didn’t Mean to Publish (Yet), https://crfm.stanford.edu/2025/05/28/fast-kernels.html
  • Aniruddha Nrusimha, William Brandon, Mayank Mishra, Yikang Shen, Rameswar Panda, Jonathan Ragan-Kelley, Yoon Kim, 28 May 2025, FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference, https://arxiv.org/abs/2505.22758 https://github.com/aninrusimha/flashformer (Optimizing kernels for low latency in a single isolated query, not a batch, via kernel fusion and running all components in one kernel, along with programming techniques like metaprogramming.)
  • Bonwoo Lee, Cheolwoo Park, Jeongyoun Ahn, 23 Jul 2025, Optimal differentially private kernel learning with random projection, https://arxiv.org/abs/2507.17544
  • Zhongzhen Wen, Yinghui Zhang, Zhong Li, Zhongxin Liu, Linna Xie, Tian Zhang, 20 Jul 2025, MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation, https://arxiv.org/abs/2507.17773
  • Kaizheng Wang, 24 Jul 2025, Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift, https://arxiv.org/abs/2302.10160
  • Masaki Adachi, Masahiro Fujisawa, Michael A Osborne, 24 Jul 2025, Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature, https://arxiv.org/abs/2503.06079
  • Daehyeon Baek, Jieun Choi, Jimyoung Son, Kyungmin Bin, Seungbeom Choi, Kihyo Moon, Minsung Jang, Hyojung Lee, 18 Jul 2025, FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration, https://arxiv.org/abs/2505.20839
  • Zikai Xie, Linjiang Chen, 18 Jul 2025, Merge Kernel for Bayesian Optimization on Permutation Space, https://arxiv.org/abs/2507.13263
  • Jie Wang and March Boedihardjo and Yao Xie, 18 Jul 2025, Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances, https://arxiv.org/abs/2405.15441
  • Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi, 19 Jul 2025, Kernel Based Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games, https://arxiv.org/abs/2507.14529
  • Youran Zhou, Mohamed Reda Bouadjenek, Jonathan Wells, Sunil Aryal, 20 Jul 2025, HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation, https://arxiv.org/abs/2501.04300
  • Alexander Rose, Philipp Schaub, Rolf Findeisen, 21 Jul 2025, Safe and High-Performance Learning of Model Predicitve Control using Kernel-Based Interpolation, https://arxiv.org/abs/2410.06771
  • Sachin Garg, Michał Dereziński, 19 Jul 2025, Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method, https://arxiv.org/abs/2506.17556
  • Leonardo V. Santoro, Victor M. Panaretos, 11 Aug 2025, Likelihood Ratio Tests by Kernel Gaussian Embedding, https://arxiv.org/abs/2508.07982
  • Martin Rouault, Rémi Bardenet, Mylène Maïda, 9 Aug 2025, Monte Carlo with kernel-based Gibbs measures: Guarantees for probabilistic herding, https://arxiv.org/abs/2402.11736
  • Shuyin Xia, Yifan Wang, Lifeng Shen, Guoyin Wang, 11 Aug 2025, Granular-Ball-Induced Multiple Kernel K-Means, https://arxiv.org/abs/2506.18637
  • David M. Bossens, Kishor Bharti, and Jayne Thompson, 11 Aug 2025, Quantum Policy Gradient in Reproducing Kernel Hilbert Space, https://arxiv.org/abs/2411.06650
  • Antonin Schrab, 8 Aug 2025, A Practical Introduction to Kernel Discrepancies: MMD, HSIC & KSD, https://arxiv.org/abs/2503.04820
  • Rajalaxmi Rajagopalan, Yu-Lin Wei, Romit Roy Choudhury, 28 Jul 2025, Kernel Learning for Sample Constrained Black-Box Optimization, https://arxiv.org/abs/2507.20533
  • Jagruti Patel, Mikkel Schöttner, Thomas A. W. Bolton, Patric Hagmann (Department of Radiology, Lausanne University Hospital and University of Lausanne (CHUV-UNIL), Lausanne, Switzerland), 28 Jul 2025, Predicting Cognition from fMRI: A Comparative Study of Graph, Transformer, and Kernel Models Across Task and Rest Conditions, https://arxiv.org/abs/2507.21016
  • Victor Rielly, Kamel Lahouel, Ethan Lew, Nicholas Fisher, Vicky Haney, Michael Wells, Bruno Jedynak, 25 Jul 2025, MOCK: an Algorithm for Learning Nonparametric Differential Equations via Multivariate Occupation Kernel Functions, https://arxiv.org/abs/2306.10189
  • Shervin Rahimzadeh Arashloo, 31 Jul 2025, Manifold-regularised Signature Kernel Large-Margin $\ell_p$-SVDD for Multidimensional Time Series Anomaly Detection, https://arxiv.org/abs/2507.23449
  • Piotr Indyk, Michael Kapralov, Kshiteej Sheth, Tal Wagner, 31 Jul 2025, Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions, https://arxiv.org/abs/2507.23539
  • Jianghui Wang, Vinay Joshi, Saptarshi Majumder, Xu Chao, Bin Ding, Ziqiong Liu, Pratik Prabhanjan Brahma, Dong Li, Zicheng Liu, and Emad Barsoum, 31 Jul 2025, Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks, https://arxiv.org/abs/2507.23194
  • Abhinav Das, Stephan Schlüter, Lorenz Schneider, 31 Jul 2025, Electricity Price Prediction Using Multi-Kernel Gaussian Process Regression Combined with Kernel-Based Support Vector Regression, https://arxiv.org/abs/2412.00123
  • Filippo Utro, Meltem Tolunay, Kahn Rhrissorrakrai, Tanvi P. Gujarati, Jie Shi, Sara Capponi, Mirko Amico, Nate Earnest-Noble, Laxmi Parida, 30 Jul 2025, Enhanced Prediction of CAR T-Cell Cytotoxicity with Quantum-Kernel Methods, https://arxiv.org/abs/2507.22710
  • Erwin de Gelder, Maren Buermann, Olaf Op den Camp, 30 Jul 2025, Comparing Normalizing Flows with Kernel Density Estimation in Estimating Risk of Automated Driving Systems, https://arxiv.org/abs/2507.22429
  • Tianqing Fang, Zhisong Zhang, Xiaoyang Wang, Rui Wang, Can Qin, Yuxuan Wan, Jun-Yu Ma, Ce Zhang, Jiaqi Chen, Xiyun Li, Hongming Zhang, Haitao Mi, Dong Yu, 1 Aug 2025, Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training, https://arxiv.org/abs/2508.00414
  • Rajpreet Singh, Vidhi Kothari, 1 Aug 2025, Composable OS Kernel Architectures for Autonomous Intelligence, https://arxiv.org/abs/2508.00604
  • Joon-Hyun Park, Mujin Cheon, Dong-Yeun Koh, 4 Aug 2025, BOOST: Bayesian Optimization with Optimal Kernel and Acquisition Function Selection Technique, https://arxiv.org/abs/2508.02332
  • Andrea Gayon-Lombardo, Ehecatl A. del Rio-Chanona, Catalina A. Pino-Munoz, Nigel P. Brandon, 7 Jun 2025, Deep Kernel Bayesian Optimisation for Closed-Loop Electrode Microstructure Design with User-Defined Properties based on GANs, https://arxiv.org/abs/2508.00833
  • Haoquan Lu, Hanzhe Liang, Jie Zhang, Chenxi Hu, Jinbao Wang, Can Gao, 2 Aug 2025, C3D-AD: Toward Continual 3D Anomaly Detection via Kernel Attention with Learnable Advisor, https://arxiv.org/abs/2508.01311
  • Sadegh Ebrahimkhani and John Lataire, 2 Aug 2025, Kernel-Based Sparse Additive Nonlinear Model Structure Detection through a Linearization Approach, https://arxiv.org/abs/2508.01453
  • Nicolas Langrené, Xavier Warin, Pierre Gruet, 3 Aug 2025, Fast Gaussian process inference by exact Matérn kernel decomposition, https://arxiv.org/abs/2508.01864
  • Qian Tang, Yuwen Gu, Boxiang Wang, 12 Aug 2025, fastkqr: A Fast Algorithm for Kernel Quantile Regression, https://arxiv.org/abs/2408.05393
  • Wouter M. Kouw, 13 Aug 2025, Bayesian autoregression to optimize temporal Matérn kernel Gaussian process hyperparameters, https://arxiv.org/abs/2508.09792
  • Yuan-Hao Wei, Fu-Hao Deng, Lin-Yong Cui, Yan-Jie Sun, 13 Aug 2025, Structured Kernel Regression VAE: A Computationally Efficient Surrogate for GP-VAEs in ICA, https://arxiv.org/abs/2508.09721
  • Xing Liu, François-Xavier Briol, 12 Aug 2025, On the Robustness of Kernel Goodness-of-Fit Tests, https://arxiv.org/abs/2408.05854
  • Paul Dommel and Rajmadan Lakshmanan, 15 Aug 2025, Uniform convergence for Gaussian kernel ridge regression, https://arxiv.org/abs/2508.11274
  • Zhan Yu, Zhongjie Shi, Ding-Xuan Zhou, 15 Aug 2025, Theory of Decentralized Robust Kernel-Based Learning, https://arxiv.org/abs/2506.05215
  • Hongyu Lin, Yuchen Li, Haoran Luo, Kaichun Yao, Libo Zhang, Mingjie Xing, Yanjun Wu, 18 Aug 2025, OS-R1: Agentic Operating System Kernel Tuning with Reinforcement Learning, https://arxiv.org/abs/2508.12551
  • Iam Kim de S. Hermont, Andre R. Flores and Rodrigo C. de Lamare, 18 Aug 2025, Design and Analysis of Robust Adaptive Filtering with the Hyperbolic Tangent Exponential Kernel M-Estimator Function for Active Noise Control, https://arxiv.org/abs/2508.13018
  • Rahul Singh and Suhas Vijaykumar, 18 Aug 2025, Kernel Ridge Regression Inference, https://arxiv.org/abs/2302.06578
  • Hengrui Luo and Yunzhang Zhu, 16 Aug 2025, Asymptotic Optimism of Random-Design Linear and Kernel Regression Models, https://arxiv.org/abs/2502.12999
  • Anabel Yong, 12 Aug 2025, Multi-Objective Bayesian Optimization with Independent Tanimoto Kernel Gaussian Processes for Diverse Pareto Front Exploration, https://arxiv.org/abs/2508.14072
  • Xudong Wang, Ziheng Sun, Chris Ding, Jicong Fan, 20 Aug 2025, Learnable Kernel Density Estimation for Graphs, https://arxiv.org/abs/2505.21285
  • Yijin Ni and Xiaoming Huo, 20 Aug 2025, Kernel-based Equalized Odds: A Quantification of Accuracy-Fairness Trade-off in Fair Representation Learning, https://arxiv.org/abs/2508.15084
  • Reilly Haskins and Benjamin Adams, 21 Aug 2025, KEA Explain: Explanations of Hallucinations using Graph Kernel Analysis, https://arxiv.org/abs/2507.03847
  • Francesca Bartolucci, Ernesto De Vito, Lorenzo Rosasco, Stefano Vigogna, 21 Aug 2025, Neural reproducing kernel Banach spaces and representer theorems for deep networks, https://arxiv.org/abs/2403.08750
  • Pietro Fré, Federico Milanesio, Marcelo Oyarzo, Matteo Santoro and Mario Trigiante, 22 Aug 2025, Tessellation Groups, Harmonic Analysis on Non-compact Symmetric Spaces and the Heat Kernel in view of Cartan Convolutional Neural Networks, https://arxiv.org/abs/2508.16015
  • Jamal Hwaidi and Mohamed Chahine Ghanem, 22 Aug 2025, Motor Imagery EEG Signal Classification Using Minimally Random Convolutional Kernel Transform and Hybrid Deep Learning, https://arxiv.org/abs/2508.16179
  • Martin Andrews, Sam Witteveen, 22 Aug 2025, GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization, https://arxiv.org/abs/2506.20807
  • Ran Yan, Youhe Jiang, Binhang Yuan, 25 Aug 2025, Flash Sparse Attention: An Alternative Efficient Implementation of Native Sparse Attention Kernel, https://arxiv.org/abs/2508.18224
  • Akira Tamamori, 24 Aug 2025, Kernel Ridge Regression for Efficient Learning of High-Capacity Hopfield Networks, https://arxiv.org/abs/2504.12561
  • Kyung-hwan Lee and Kyung-tae Kim, 21 Jul 2025, Semantic-Aware Gaussian Process Calibration with Structured Layerwise Kernels for Deep Neural Networks, https://arxiv.org/abs/2507.15987
  • Rahul Khorana, 22 Jul 2025, Families of Optimal Transport Kernels for Cell Complexes, https://arxiv.org/abs/2507.16569
  • Jun'ichi Takeuchi, Yoshinari Takeishi, Noboru Murata, Kazushi Mimura, Ka Long Keith Ho, Hiroshi Nagaoka, 24 Jul 2025, Neural Tangent Kernels and Fisher Information Matrices for Simple ReLU Networks with Random Hidden Weights, https://arxiv.org/abs/2507.18555
  • Yaniv Shulman, 20 Jul 2025, Robust Local Polynomial Regression with Similarity Kernels, https://arxiv.org/abs/2501.10729
  • Jie Hu, Yi-Ting Ma, Do Young Eun, 27 Jul 2025, Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs, https://arxiv.org/abs/2505.18300
  • Roberto Flórez-Ablan, Marco Roth, and Jan Schnabel, 28 Jul 2025, On the similarity of bandwidth-tuned quantum kernels and classical kernels, https://arxiv.org/abs/2503.05602
  • Christian Wald and Gabriele Steidl, 2 Aug 2025, Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans, https://arxiv.org/abs/2501.16839
  • Max Guillen, Philipp Misof, Jan E. Gerken, 15 Aug 2025, Finite-Width Neural Tangent Kernels from Feynman Diagrams, https://arxiv.org/abs/2508.11522
  • Nan-Hong Kuo, Renata Wong, 16 Feb 2025, SVM/SVR Kernels as Quantum Propagators, https://arxiv.org/abs/2502.11153
  • Patrick J.F. Groenen and Michael Greenacre, 21 Aug 2025, Interpretable Kernels, https://arxiv.org/abs/2508.15932
  • Ana Martínez-Sabiote, Michalis Skotiniotis, Jara J. Bermejo-Vega, Daniel Manzano, Carlos Cano, 25 Aug 2025, Entanglement Detection with Quantum-inspired Kernels and SVMs, https://arxiv.org/abs/2508.17909

