Aussie AI
Transformer Architectures
-
Last Updated 30 August, 2025
-
by David Spuler, Ph.D.
The Transformer was itself a major architectural advance in 2017. Since then, numerous modified Transformer architectures have been tested, and many ways to optimize Transformers have been found. Discussion of the major architectural changes is given below; see also Transformer code optimizations, inference optimization techniques, and the very long list of Transformer optimizations.
Global Transformer Architecture Changes
Since the introduction of the vanilla Transformer in 2017, researchers have been searching for optimizations up and down the Transformer's tech stack.
Global Transformer optimizations: Some of the architectural-level optimizations to Transformer inference engines include:
- Quantization: clearly the most popular method, with endless research papers and many practical implementations in modern toolkits; see the quantization page (a minimal quantization sketch also follows this list).
- Pruning: There is depth pruning, width pruning, and length pruning, and then some bright spark thought of combining them, so now there's dual pruning and triple pruning.
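To make the quantization idea concrete, here is a minimal C++ sketch of symmetric per-tensor INT8 weight quantization. It is illustrative only, with toy values and no real toolkit API, but it shows where the precision loss comes from:

```cpp
// Minimal sketch of symmetric per-tensor INT8 weight quantization (illustrative only).
#include <cstdint>
#include <cmath>
#include <cstdio>
#include <vector>
#include <algorithm>

// Quantize a float weight vector to int8 with a single per-tensor scale.
float quantize_int8(const std::vector<float>& w, std::vector<int8_t>& q) {
    float maxabs = 0.0f;
    for (float x : w) maxabs = std::max(maxabs, std::fabs(x));
    float scale = maxabs / 127.0f;              // maps [-maxabs, +maxabs] to [-127, 127]
    if (scale == 0.0f) scale = 1.0f;            // guard against an all-zero tensor
    q.resize(w.size());
    for (size_t i = 0; i < w.size(); ++i)
        q[i] = static_cast<int8_t>(std::round(w[i] / scale));
    return scale;                               // needed to dequantize later
}

// Dequantize back to float (lossy; this is where the approximation lives).
float dequantize_int8(int8_t q, float scale) { return q * scale; }

int main() {
    std::vector<float> w = {0.12f, -0.5f, 0.33f, -0.01f};
    std::vector<int8_t> q;
    float scale = quantize_int8(w, q);
    for (size_t i = 0; i < w.size(); ++i)
        std::printf("%f -> %d -> %f\n", w[i], (int)q[i], dequantize_int8(q[i], scale));
    return 0;
}
```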
Depth-wise optimizations (layers):
- Layer pruning / early exit: cutting short the layers in the encoder and/or decoder is a successful optimization strategy; see layer pruning and early exit inference (a minimal early-exit sketch follows this list).
- Shallow decoder architecture. This modification applies layer pruning in the decoder to achieve a "deep encoder/shallow decoder" architecture, as reported in several research papers, such as Kasai et al. (2021) and Hsu et al. (2020); see shallow decoder architectures, and also depth pruning.
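Here is a minimal C++ sketch of the early exit idea mentioned above: run the layer loop, but stop once an intermediate confidence estimate passes a threshold. The layer and confidence functions are toy stand-ins, not any real engine:

```cpp
// Minimal sketch of early exit across layers (dummy layers, illustrative only).
#include <cstdio>
#include <vector>

struct Layer { float gain; };   // stand-in for a real transformer layer's weights

// Stand-in layer computation: scales the hidden state
// (a real layer would run attention plus FFN here).
static void run_layer(const Layer& layer, std::vector<float>& hidden) {
    for (float& h : hidden) h *= layer.gain;
}

// Stand-in confidence: a real implementation would attach a small classifier
// head to the intermediate hidden state and measure its certainty.
static float exit_confidence(const std::vector<float>& hidden, int layer_index) {
    (void)hidden;                        // a real classifier would read the hidden state
    return 0.5f + 0.1f * layer_index;    // grows with depth in this toy example
}

int main() {
    std::vector<Layer> layers(12, Layer{1.01f});    // 12 toy "layers"
    std::vector<float> hidden = {0.1f, 0.2f, 0.3f};
    const float threshold = 0.9f;
    for (size_t i = 0; i < layers.size(); ++i) {
        run_layer(layers[i], hidden);
        if (exit_confidence(hidden, (int)i) >= threshold) {   // early exit test
            std::printf("Exited early after layer %zu of %zu\n", i + 1, layers.size());
            break;                        // remaining layers are skipped entirely
        }
    }
    return 0;
}
```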
Width-wise optimizations (attention heads):
- Attention head pruning: not all of the attention heads are important, especially in the decoder (numerous research papers; see head pruning, and the head-masking sketch after this list). There's some irony here when you consider the title of the original 2017 Transformer paper!
- Flash Attention: Of all the various attention optimizations, Flash Attention (Dao et al., June 2022), and particularly Flash Attention 2 (Dao, July 2023), seems to have emerged as the most popular. See attention optimization methods.
- Smaller Attention Head Components. It is possible to use simplified attention head components, especially in the decoder, as in Kasai et al. (2021); see approximate attention heads and head pruning. Another method is "weight sharing" for attention heads (or "fused" heads), such as in Zhai et al. (2023).
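A minimal C++ sketch of head pruning as a per-head mask: heads marked zero are skipped entirely, so their QKV computation never happens. The attention math itself is replaced by a placeholder here, and the mask values are made up for illustration:

```cpp
// Minimal sketch of width pruning: skipping masked-out attention heads.
#include <cstdio>
#include <vector>

int main() {
    const int num_heads = 8, head_dim = 4;
    // Per-head importance mask, e.g. from a head importance analysis;
    // a zero means that head's computation is skipped entirely.
    std::vector<int> head_mask = {1, 1, 0, 1, 0, 0, 1, 1};

    std::vector<float> output(num_heads * head_dim, 0.0f);
    int computed = 0;
    for (int h = 0; h < num_heads; ++h) {
        if (!head_mask[h]) continue;            // pruned head: no QKV work at all
        // A real implementation computes softmax(QK^T/sqrt(d))V for this head;
        // here we just fill a placeholder value to show the control flow.
        for (int d = 0; d < head_dim; ++d)
            output[h * head_dim + d] = 0.1f * (h + 1);
        ++computed;
    }
    std::printf("Computed %d of %d heads\n", computed, num_heads);
    return 0;
}
```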
Lengthwise optimizations (input sequences):
- Length pruning. Removing padding from the input vectors for short queries avoids redundant computations; for example, see ByteTransformer in Zhai et al. (2023). Read more about length pruning and zero padding removal, and see the packing sketch after this list.
- Auto-regression optimizations such as semi-autoregressive and non-autoregressive Transformer architectures.
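A minimal C++ sketch of zero-padding removal: pack only the real tokens from a right-padded batch into a flat buffer, tracking the true lengths, so later kernels never touch the padding. The token values and pad convention are made up for illustration:

```cpp
// Minimal sketch of zero-padding removal (length pruning) for a batch.
#include <cstdio>
#include <vector>

int main() {
    const int PAD = 0;
    // A padded batch of token ids; trailing zeros are padding added so that
    // all sequences have the same length.
    std::vector<std::vector<int>> batch = {
        {5, 9, 2, PAD, PAD, PAD},
        {7, 1, 4, 8, 3, PAD},
    };

    // Pack only the real tokens into one flat buffer and remember each
    // sequence's true length, so downstream kernels skip the padding.
    std::vector<int> packed;
    std::vector<int> lengths;
    for (const auto& seq : batch) {
        int len = 0;
        for (int tok : seq) {
            if (tok == PAD) break;              // assumes right-padding
            packed.push_back(tok);
            ++len;
        }
        lengths.push_back(len);
    }
    std::printf("Packed %zu tokens (lengths %d and %d) instead of %zu padded slots\n",
                packed.size(), lengths[0], lengths[1],
                batch.size() * batch[0].size());
    return 0;
}
```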
Too Much of a Smart Thing: Just like in Highlander, there can be only one. No, wait, that's incorrect! It's called "multi-AI" or "ensemble" AI:
- Ensemble architectures. Most architectures with two or more Transformers are aiming to achieve more advanced reasoning (usually at a worse speed), but the "big-small" dual architecture aims to improve inference speed by sending common queries to the smaller model. See ensemble architectures, and the routing sketch below.
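A minimal C++ sketch of big-small routing: a cheap heuristic decides whether a query goes to the small or the large model. The difficulty test and the two "models" here are hypothetical placeholders:

```cpp
// Minimal sketch of a "big-small" routing architecture (hypothetical heuristics).
#include <cstdio>
#include <string>

// Stand-in for a cheap difficulty estimate; a real router might use query
// length, a classifier, or a confidence score from the small model itself.
static bool looks_easy(const std::string& query) {
    return query.size() < 40;
}

static std::string small_model(const std::string& q) { return "[small] answer to: " + q; }
static std::string large_model(const std::string& q) { return "[large] answer to: " + q; }

int main() {
    std::string queries[] = {
        "What is 2+2?",
        "Explain the trade-offs between encoder-decoder and decoder-only designs."
    };
    for (const auto& q : queries) {
        // Route common/easy queries to the cheaper model; fall back to the big one.
        std::string answer = looks_easy(q) ? small_model(q) : large_model(q);
        std::printf("%s\n", answer.c_str());
    }
    return 0;
}
```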
Component-Level Transformer Architecture Changes
Attention heads are addressed above under width pruning, and layers under depth pruning, but various other Transformer components can be optimized:
Normalization optimizations:
- Norm merging (operator fusion). The normalization component can often be merged with another component. This is a type of "kernel fusion" involving the LayerNorm. See "fused LayerNorm" in kernel operator fusion methods, and the fused add-plus-LayerNorm sketch after this list.
- Norm pruning (removal). Some research also suggests removal of normalization; see pruning normalization components.
- Norm placement. See pre-norm vs post-norm.
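A minimal C++ sketch of norm fusion: the residual add is folded into the LayerNorm's first pass over the vector, so the intermediate sum is never written out and re-read. The learned scale/shift parameters are omitted to keep it short:

```cpp
// Minimal sketch of kernel fusion: residual add fused into LayerNorm in one pass.
#include <cmath>
#include <cstdio>
#include <vector>

// An unfused version would first write (x + residual) to memory, then re-read it
// to normalize. The fused version does the add inside the normalization loop.
static void fused_add_layernorm(std::vector<float>& x,
                                const std::vector<float>& residual,
                                float epsilon = 1e-5f) {
    const size_t n = x.size();
    float mean = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        x[i] += residual[i];              // fused residual add
        mean += x[i];
    }
    mean /= n;
    float var = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = x[i] - mean;
        var += d * d;
    }
    var /= n;
    float inv_std = 1.0f / std::sqrt(var + epsilon);
    for (size_t i = 0; i < n; ++i)
        x[i] = (x[i] - mean) * inv_std;   // learned scale/shift parameters omitted
}

int main() {
    std::vector<float> x = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> res = {0.5f, -0.5f, 0.25f, 0.0f};
    fused_add_layernorm(x, res);
    for (float v : x) std::printf("%f ", v);
    std::printf("\n");
    return 0;
}
```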
Activation function optimizations:
- Optimizing activations. See overview of activation function optimizations.
- Approximate activation functions. See activation function approximation methods, and the GELU approximation sketch after this list.
- Fused activations. See "fused RELU" and others in kernel operator fusion methods.
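A minimal C++ sketch of activation approximation, comparing exact GELU (via erf) with the common tanh-based approximation:

```cpp
// Minimal sketch of activation approximation: exact GELU vs. the common
// tanh-based approximation.
#include <cmath>
#include <cstdio>

// Exact GELU uses the Gaussian CDF via erf (relatively expensive).
static float gelu_exact(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// Widely used tanh approximation (as in the original GELU paper).
static float gelu_tanh_approx(float x) {
    const float c = std::sqrt(2.0f / 3.14159265358979f);
    return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
}

int main() {
    for (float x = -2.0f; x <= 2.01f; x += 1.0f)
        std::printf("x=%5.2f exact=%8.5f approx=%8.5f\n",
                    x, gelu_exact(x), gelu_tanh_approx(x));
    return 0;
}
```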
Decoder algorithms:
- Faster decoding algorithms. Research in Transformers includes beam search decoding and greedy decoding. There's also aggressive decoding, speculative decoding and collaborative decoding.
- Speculative decoding (supervised dual decoding). A parallelization method whereby a small model runs ahead to draft candidate tokens, and a larger model then confirms or vetoes the suggestions, which is basically the same plot as Terminator II. No, really, I'm just checking whether you're reading this stuff like an AI, rather than scanning and skipping like a human. If the small model is usually correct, this speeds up the overall process compared to running only the large model. This is similar to Big-Little architectures, but differs because both models are still running. See speculative decoding, and the drafting-and-verification sketch after this list.
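A minimal C++ sketch of the speculative decoding loop: a toy "small model" drafts a few tokens, a toy "large model" verifies them, and the longest agreeing prefix (plus the large model's correction) is accepted. Both models are deterministic stand-ins, so the control flow is the only real content here:

```cpp
// Minimal sketch of speculative decoding with greedy verification (toy models).
#include <cstdio>
#include <vector>

using Tokens = std::vector<int>;

// Toy stand-ins: each "model" deterministically picks the next token.
static int small_model_next(const Tokens& ctx) { return (int)(ctx.size() * 2) % 50; }
static int large_model_next(const Tokens& ctx) {
    int t = (int)(ctx.size() * 2) % 50;
    return (ctx.size() % 4 == 3) ? t + 1 : t;   // occasionally disagrees with the draft
}

int main() {
    Tokens output = {1};                         // prompt
    const int draft_len = 4, target_len = 16;
    while ((int)output.size() < target_len) {
        // 1. Small model drafts draft_len tokens ahead (cheap, sequential).
        Tokens draft = output;
        for (int i = 0; i < draft_len; ++i) draft.push_back(small_model_next(draft));
        // 2. Large model verifies each drafted position (these checks can be
        //    batched or parallelized in a real implementation).
        int accepted = 0;
        Tokens ctx = output;
        for (int i = 0; i < draft_len; ++i) {
            int verified = large_model_next(ctx);
            int proposed = draft[output.size() + i];
            ctx.push_back(verified);
            ++accepted;
            if (verified != proposed) break;     // keep the corrected token, drop the rest
        }
        // 3. Accept the verified tokens onto the output.
        output.insert(output.end(),
                      ctx.begin() + output.size(),
                      ctx.begin() + output.size() + accepted);
        std::printf("accepted %d tokens this round (total %zu)\n", accepted, output.size());
    }
    return 0;
}
```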
Feed-Forward Network optimizations:
- FFN Pruning. Simplified decoders, with FFN removed, as in Kasai et al. (2021), although this may be dependent on the use case; see "FFN pruning" section.
MatMul optimizations: Also known as GEMM and various other dumb names. It's matrix multiplication and vector dot products like you did in high school.
- Approximate MatMul. There is much research about using approximate multiplication algorithms.
- Matrix multiplication improvements: See matrix algebra, sparsification, and low-rank matrices (a low-rank factorization sketch follows this list).
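A minimal C++ sketch of the low-rank idea: replace y = Wx (an n x m matrix times a vector) with y = A(Bx), where A is n x r and B is r x m, which is cheaper whenever r is much smaller than n and m. The toy sizes below are just to show the operation count:

```cpp
// Minimal sketch of a low-rank matrix-vector product: y = A (B x) instead of y = W x.
#include <cstdio>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;   // row-major: Mat[i][j]

// Plain matrix-vector multiply.
static Vec matvec(const Mat& M, const Vec& x) {
    Vec y(M.size(), 0.0f);
    for (size_t i = 0; i < M.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += M[i][j] * x[j];
    return y;
}

int main() {
    const size_t n = 6, m = 6, r = 2;           // toy sizes; r << n, m in practice
    Mat A(n, Vec(r, 0.5f)), B(r, Vec(m, 0.25f));
    Vec x(m, 1.0f);

    Vec y = matvec(A, matvec(B, x));            // two thin multiplies instead of one big one
    std::printf("low-rank result y[0]=%f\n", y[0]);
    std::printf("multiplies: full=%zu low-rank=%zu\n",
                n * m, n * r + r * m);          // 36 vs 24 for these toy sizes
    return 0;
}
```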
Softmax optimizations: Softmax occurs less frequently than MatMul, but it can still be optimized:
- Softmax approximation. The use of simplified approximate Softmax components (a baseline Softmax sketch follows this list).
- Softmax removal. See Softmax pruning.
- Softmax replacement. See Softmax alternatives and substitutes.
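For reference, here is a minimal C++ sketch of the standard (numerically stable) Softmax that these approximation, pruning, and replacement techniques start from:

```cpp
// Minimal sketch of the standard numerically stable Softmax.
#include <cmath>
#include <cstdio>
#include <vector>

static void softmax(std::vector<float>& v) {
    float maxval = v[0];
    for (float x : v) if (x > maxval) maxval = x;   // subtract the max for numerical stability
    float sum = 0.0f;
    for (float& x : v) { x = std::exp(x - maxval); sum += x; }
    for (float& x : v) x /= sum;
}

int main() {
    std::vector<float> scores = {2.0f, 1.0f, 0.1f, -3.0f};
    softmax(scores);
    for (float p : scores) std::printf("%f ", p);   // probabilities summing to 1
    std::printf("\n");
    return 0;
}
```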
Positional encoding optimizations: Not usually considered a bottleneck, but even the PE can be optimized:
- PE Optimizations. See positional encoding optimizations, and the sinusoidal encoding sketch after this list.
- PE Pruning (Removal). Positional encoding modules may not be as essential as assumed; see positional encoding pruning.
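A minimal C++ sketch of the original sinusoidal positional encoding, which is the component that PE optimization and pruning research modifies or removes:

```cpp
// Minimal sketch of the original Transformer's sinusoidal positional encoding.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int seq_len = 4, d_model = 8;
    std::vector<std::vector<float>> pe(seq_len, std::vector<float>(d_model));
    for (int pos = 0; pos < seq_len; ++pos) {
        for (int i = 0; i < d_model; i += 2) {
            // Each pair of dimensions gets a sine/cosine at a different frequency.
            float freq = std::pow(10000.0f, -(float)i / d_model);
            pe[pos][i]     = std::sin(pos * freq);
            pe[pos][i + 1] = std::cos(pos * freq);
        }
    }
    std::printf("pe[3][0]=%f pe[3][1]=%f\n", pe[3][0], pe[3][1]);
    return 0;
}
```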
But wait, there's more: these are not the only ways to optimize. Refer to the complete list of Transformer optimizations.
Survey Papers on Transformer Architectures
Several papers have surveyed the literature for the latest Transformer ideas:
- Tianyang Lin, Yuxin Wang, Xiangyang Liu, and Xipeng Qiu. A survey of transformers. AI Open, 2022. https://arxiv.org/abs/2106.04554 (An extensive and useful survey of Transformer architectures.)
- Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. Efficient transformers: A survey (v2). arXiv preprint arXiv:2009.06732, 2022, https://arxiv.org/abs/2009.06732
- Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, Hao Peng, Jianxin Li, Jia Wu, Ziwei Liu, Pengtao Xie, Caiming Xiong, Jian Pei, Philip S. Yu, Lichao Sun, May 2023, A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT, https://arxiv.org/abs/2302.09419
- Q Fournier, GM Caron, D Aloise, 2023, A practical survey on faster and lighter transformers, ACM Computing Surveys, https://dl.acm.org/doi/abs/10.1145/3586074, https://arxiv.org/abs/2103.14636
- Xipeng Qiu, TianXiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained Models for Natural Language Processing: A Survey. SCIENCE CHINA Technological Sciences 63, 10 (2020), 1872–1897. https://doi.org/10.1007/s11431-020-1647-3, https://arxiv.org/abs/2003.08271 (Good survey of Transformer architectures in 2020.)
- Y Chang, X Wang, J Wang, Y Wu, K Zhu, 2023, A survey on evaluation of large language models, arXiv preprint, https://arxiv.org/abs/2307.03109
- N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, Y Bai, A Chen, T Conerly, et al. 2021. A mathematical framework for transformer circuits. https://transformer-circuits.pub/2021/framework/index.html (Detailed theoretical examination of how various Transformer components work.)
- W Li, H Hacid, E Almazrouei, M Debbah, 2023, A Comprehensive Review and a Taxonomy of Edge Machine Learning: Requirements, Paradigms, and Techniques, AI 2023, 4(3), 729-786, https://www.mdpi.com/2673-2688/4/3/39 (Extensive survey related to optimizing on edge devices.)
- J Zhong, Z Liu, X Chen, Apr 2023, Transformer-based models and hardware acceleration analysis in autonomous driving: A survey, https://arxiv.org/abs/2304.10891
- Y Li, S Wang, H Ding, H Chen, 2023, Large Language Models in Finance: A Survey, PDF: https://www.researchgate.net/profile/Yinheng-Li/publication/374546790_Large_Language_Models_in_Finance_A_Survey/links/6523988afc5c2a0c3bc534fc/Large-Language-Models-in-Finance-A-Survey.pdf
- Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique, 2020, Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead, https://ieeexplore.ieee.org/iel7/6287639/6514899/09269334.pdf, https://arxiv.org/abs/2012.11233
- Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, Erik Cambria, Oct 2023, A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics, https://arxiv.org/abs/2310.05694
- Minghao Shao, Abdul Basit, Ramesh Karri, Muhammad Shafique, Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges, https://www.researchgate.net/profile/Minghao_Shao2/publication/383976933_Survey_of_different_Large_Language_Model_Architectures_Trends_Benchmarks_and_Challenges/links/66e2d320f84dd1716ce79f85/Survey-of-different-Large-Language-Model-Architectures-Trends-Benchmarks-and-Challenges.pdf
Decoder-Only Architectures
Decoder-only architectures are the modern version of Transformers, such as GPT. It was discovered that the encoder in the original 2017 encoder-decoder Transformer was not needed for many generative tasks, and was actually a source of inefficiency. Decoder-only models are faster and need fewer weights.
Research on the decoder-only transformer architectures:
- Sathya Krishnan Suresh, Shunmugapriya P, 24 Apr 2024 (v2), Towards smaller, faster decoder-only transformers: Architectural variants and their implications, https://arxiv.org/abs/2404.14462 Code: https://github.com/SkAndMl/gpt-variations (Focuses on three new variants of decoder-only Transformer architectures: ParallelGPT (p-gpt), LinearlyCompressedGPT (lc-gpt), and ConvCompressedGPT (cc-gpt).)
- Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari, 22 Apr 2024, OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework, Apple Research, https://arxiv.org/abs/2404.14619 Code: https://huggingface.co/apple/OpenELM
- Jesse Roberts, 2 Feb 2024 (v3), How Powerful are Decoder-Only Transformer Neural Models? https://arxiv.org/abs/2305.17026
- Georgy Tyukin, 2 Apr 2024, Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations, Masters Thesis, Data Science and Machine Learning, University College London., https://arxiv.org/abs/2404.05741 (Reviews various model compression and inference optimization techniques, and specifically analyzes layer skipping and sublayer skipping, such as attention head pruning and FFN/MLP pruning.)
- Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel, Apr 2022, What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? https://arxiv.org/abs/2204.05832
- Urvashi Khandelwal, Kevin Clark, Dan Jurafsky, Lukasz Kaiser, 21 May 2019, Sample Efficient Text Summarization Using a Single Pre-Trained Transformer, https://arxiv.org/abs/1905.08836
- Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Łukasz Kaiser, Noam Shazeer, Jan 2018, GENERATING WIKIPEDIA BY SUMMARIZING LONG SEQUENCES, ICLR 2018 https://arxiv.org/pdf/1801.10198.pdf
- Peter J Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. 2018. Generating Wikipedia by Summarizing Long Sequences. In Proceedings of the 6th International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1801.10198
- Yumo Bai, Feb 3, 2024 Why are most LLMs decoder-only? Dive into the rabbit hole of recent advancement in Large Language Models, https://medium.com/@yumo-bai/why-are-most-llms-decoder-only-590c903e4789
- M Fujitake, 2023 DTrOCR: Decoder-only Transformer for Optical Character Recognition, https://arxiv.org/pdf/2308.15996.pdf
- Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi, 26 Feb 2024, Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding, https://arxiv.org/abs/2402.16844
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- Meta, July 23, 2024, Introducing Llama 3.1: Our most capable models to date, https://ai.meta.com/blog/meta-llama-3-1/
- Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge, 6 Jan 2024 (v2), Understanding LLMs: A Comprehensive Overview from Training to Inference, https://arxiv.org/abs/2401.02038
- Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu, 17 May 2024, Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities, https://arxiv.org/abs/2405.10825
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
- Ailiang Lin, Zhuoyun Li, Kotaro Funakoshi, 31 Jul 2025, Causal2Vec: Improving Decoder-only LLMs as Versatile Embedding Models, https://arxiv.org/abs/2507.23386
- Beilong Tang, Bang Zeng, Ming Li, 16 Aug 2025, LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models, https://arxiv.org/abs/2504.07402
- Hamed Firooz, Maziar Sanjabi, Adrian Englhardt, Aman Gupta, Ben Levine, Dre Olgiati, Gungor Polatkan, Iuliia Melnychuk, Karthik Ramgopal, Kirill Talanine, Kutta Srinivasan, Luke Simon, Natesh Sivasubramoniapillai, Necip Fazil Ayan, Qingquan Song, Samira Sriram, Souvik Ghosh, Tao Song, Vignesh Kothapalli, Xiaoling Zhai, Ya Xu, Yu Wang, and Yun Dai, 23 Aug 2025, 360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation, https://arxiv.org/abs/2501.16450
Encoder-Decoder Architectures
Encoder-decoder Transformers are the older architecture from 2017. Decoder-only architectures have largely superseded this version, but it is still used in some use cases such as machine translation (foreign language translation).
Research on encoder-decoder architectures:
- Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, Tie-Yan Liu, 2018, Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation, Advances in Neural Information Processing Systems 31 (NeurIPS 2018) https://papers.nips.cc/paper/2018/hash/4fb8a7a22a82c80f2c26fe6c1e0dcbb3-Abstract.html
- Nadeem Vidhya Mar 11, 2021, Encoders-Decoders, Sequence to Sequence Architecture, Analytics Vidhya, Medium, https://medium.com/analytics-vidhya/encoders-decoders-sequence-to-sequence-architecture-5644efbb3392
- Yumo Bai, Feb 3, 2024 Why are most LLMs decoder-only? Dive into the rabbit hole of recent advancement in Large Language Models, https://medium.com/@yumo-bai/why-are-most-llms-decoder-only-590c903e4789
- Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi, 26 Feb 2024, Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding, https://arxiv.org/abs/2402.16844
- João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vázquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian, 23 Apr 2024, XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference, https://arxiv.org/abs/2404.15420
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- kipply's blog, 2023-03-30, Transformer Taxonomy (the last lit review), https://kipp.ly/transformer-taxonomy/ (Papers for all the Transformer architectures and milestone papers for the major optimization improvements on them.)
- Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge, 6 Jan 2024 (v2), Understanding LLMs: A Comprehensive Overview from Training to Inference, https://arxiv.org/abs/2401.02038
- Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu, 17 May 2024, Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities, https://arxiv.org/abs/2405.10825
- Anjali Shah, Kshitiz Gupta, Jiahong Liu and Haohang Huang, Dec 11, 2024, NVIDIA TensorRT-LLM Now Accelerates Encoder-Decoder Models with In-Flight Batching, https://developer.nvidia.com/blog/nvidia-tensorrt-llm-now-accelerates-encoder-decoder-models-with-in-flight-batching/
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
- Wenji Fang, Jing Wang, Yao Lu, Shang Liu, Zhiyao Xie, 6 Aug 2025, GenEDA: Towards Generative Netlist Functional Reasoning via Cross-Modal Circuit Encoder-Decoder Alignment, https://arxiv.org/abs/2504.09485
Encoder-Only Architectures
Encoder-only architectures lack a decoder, and are only used where the output is not a full text sequence. This makes sense for models whose output is an embedding vector. Most modern LLMs do not use this architecture.
Research on encoder-only architectures:
- Ting Hu, Christoph Meinel, Haojin Yang, 2024, A flexible BERT model enabling width- and depth-dynamic inference, Computer Speech & Language 4 April 2024, 101646, https://www.sciencedirect.com/science/article/pii/S0885230824000299 (Dual pruning method with layerwise "neural grafting" that gives dynamic width models, and combined with early exit on the depth dimension.)
- Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao, 5 Feb 2024. A Survey on Transformer Compression. https://arxiv.org/abs/2402.05964 (Model compression survey paper with focus on pruning, quantization, knowledge distillation, and efficient architecture design.)
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge, 6 Jan 2024 (v2), Understanding LLMs: A Comprehensive Overview from Training to Inference, https://arxiv.org/abs/2401.02038
- Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu, 17 May 2024, Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities, https://arxiv.org/abs/2405.10825
- Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli, 19 Dec 2024 (v2), Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference, https://arxiv.org/abs/2412.13663 (Encoder-only BERT model updated with modern optimizations including Flash Attention, bias removal, RoPE, pre-norm, GeGLU (a GELU variant), hybrid local-global attention, and zero padding removal.)
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
Hybrid Transformer Architectures
Transformer architectures have been merged with aspects of previous neural network theory to create hybrid architectures. Examples include:
- Vision Transformer (ViT)
- Transformer-RNN hybrid architectures
- Transformer-CNN hybrid architectures
Research papers on hybrid transformer architectures:
- Seokju Yun, Dongheon Lee, Youngmin Ro, 4 Jun 2024, MetaMixer Is All You Need, https://arxiv.org/abs/2406.02021
- Sathya Krishnan Suresh, Shunmugapriya P, 24 Apr 2024 (v2), Towards smaller, faster decoder-only transformers: Architectural variants and their implications, https://arxiv.org/abs/2404.14462 Code: https://github.com/SkAndMl/gpt-variations (Focuses on three new variants of decoder-only Transformer architectures: ParallelGPT (p-gpt), LinearlyCompressedGPT (lc-gpt), and ConvCompressedGPT (cc-gpt).)
- Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Yui Li, Wen-Huang Cheng, 8 Apr 2024, Lightweight Deep Learning for Resource-Constrained Environments: A Survey, https://arxiv.org/abs/2404.07236 (A survey of various optimizations, with a lot of focus on image and vision models, including CNNs, RNNs, and Transformers.)
- Jiuqiang Li; Yutong Ke, 2024, Hybrid Convolution-Transformer for Lightweight Single Image Super-Resolution, ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), https://ieeexplore.ieee.org/abstract/document/10446977 (Hybrid of convolutions and Transformer architecture in image analysis.)
- Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, Ji-Rong Wen, Dec 2023, Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation, https://arxiv.org/abs/2311.09049 Code: https://github.com/RUCAIBox/LC-Rec/
- Jamba Team, 22 Aug 2024, Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, https://arxiv.org/abs/2408.12570
- Cong Bi, Wenhua Qian, Jinde Cao, Xue Wang, 2024, LightingFormer: Transformer-CNN hybrid network for low-light image enhancement, Computers & Graphics, 104089, ISSN 0097-8493, https://doi.org/10.1016/j.cag.2024.104089 https://www.sciencedirect.com/science/article/abs/pii/S0097849324002243
- Weigao Sun, Jiaxi Hu, Yucheng Zhou, Jusen Du, Disen Lan, Kexin Wang, Tong Zhu, Xiaoye Qu, Yu Zhang, Xiaoyu Mo, Daizong Liu, Yuxuan Liang, Wenliang Chen, Guoqi Li, Yu Cheng, 13 Aug 2025, Speed Always Wins: A Survey on Efficient Architectures for Large Language Models, https://arxiv.org/abs/2508.09834
Innovative New Transformer Architecture Research Papers
Since the original Transformer paper in 2017, and various other Transformer milestone papers, there have been numerous architectural variations proposed to alleviate efficiency or accuracy concerns. Research papers on specific modifications to the Transformer architecture include:
- Chen, M. X., Firat, O., Bapna, A., Johnson, M., Macherey, W., Foster, G., Jones, L., Schuster, M., Shazeer, N., Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Chen, Z., Wu, Y., and Hughes, M. The best of both worlds: Combining recent advances in neural machine translation. In ACL, 2018, https://arxiv.org/abs/1804.09849 (Hybrid Transformer architectures.)
- David So, Quoc Le, and Chen Liang. The evolved transformer. In International Conference on Machine Learning, pages 5877–5886. PMLR, 2019. https://arxiv.org/abs/1901.11117
- Piotr Nawrot, Szymon Tworkowski, Michael Tyrolski, Lukasz Kaiser, Yuhuai Wu, Christian Szegedy, and Henryk Michalewski. Hierarchical transformers are more efficient language models. arXiv preprint arXiv:2110.13711, 2021. https://arxiv.org/abs/2110.13711
- Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu, 2023, ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs, https://arxiv.org/abs/2210.03052 (An advanced new architecture.)
- Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, and Noah A. Smith. 2020. Deep encoder, shallow decoder: Reevaluating the speed-quality tradeoff in machine translation. CoRR, abs/2006.10369. https://arxiv.org/abs/2006.10369 Code: https://github.com/jungokasai/deep-shallow
- Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, and R. Salakhutdinov, “Transformer-xl: Attentive language models beyond a fixed-length context,” arXiv, 2019, https://arxiv.org/abs/1901.02860
- Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The Long-Document Transformer. arXiv e-prints (2020), arXiv:2004.05150. https://arxiv.org/abs/2004.05150
- Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. 2019. Universal Transformers. In ICLR, https://arxiv.org/abs/1807.03819
- William Fedus, Barret Zoph, and Noam Shazeer. 2021. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. arXiv e-prints (2021), arXiv:2101.03961. https://arxiv.org/abs/2101.03961
- Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, et al. 2020. Big Bird: Transformers for Longer Sequences. In NeurIPS, Vol. 33. 17283–17297, https://arxiv.org/abs/2007.14062
- Iz Beltagy, Matthew E Peters, and Arman Cohan. Longformer: The long-document transformer. Proceedings of EMNLP, 2020, https://arxiv.org/abs/2004.05150
- Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, and Zheng Zhang. Star-transformer. Proceedings of NAACL, 2019, https://arxiv.org/abs/1902.09113
- Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030, 2021. https://arxiv.org/abs/2103.14030
- Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, and Song Han. HAT: Hardware-aware transformers for efficient natural language processing. arXiv preprint arXiv:2005.14187, 2020. https://arxiv.org/abs/2005.14187 Code: https://github.com/mit-han-lab/hardware-aware-transformers.git
- Zihang Dai, Guokun Lai, Yiming Yang, and Quoc Le. 2020. Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. In NeurIPS. https://arxiv.org/abs/2006.03236
- A Agrawal, A Panwar, J Mohan, N Kwatra, BS Gulavani, 2023, SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills, arXiv preprint, https://arxiv.org/abs/2308.16369
- NVIDIA, NVIDIA FasterTransformer, https://github.com/NVIDIA/FasterTransformer
- Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, and Alexander J. Smola. 2020. Transformer on a Diet. arXiv e-prints (2020), arXiv:2002.06170. https://arxiv.org/abs/2002.06170
- Aminabadi, R. Y.; Rajbhandari, S.; Zhang, M.; Awan, A. A.; Li, C.; Li, D.; Zheng, E.; Rasley, J.; Smith, S.; Ruwase, O.; and He, Y. 2022. DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. arXiv:2207.00032. https://arxiv.org/abs/2207.00032
- Sheng, Y.; Zheng, L.; Yuan, B.; Li, Z.; Ryabinin, M.; Fu, D. Y.; Xie, Z.; Chen, B.; Barrett, C.; Gonzalez, J. E.; Liang, P.; Re, C.; Stoica, I.; and Zhang, C. 2023. FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU. arXiv:2303.06865 https://arxiv.org/abs/2303.06865, Code: https://github.com/FMInference/FlexGen
- Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Xiangru Tang, Bolun Wang, Johan S. Wind, Stansilaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu, May 2023, RWKV: Reinventing RNNs for the Transformer Era, https://arxiv.org/pdf/2305.13048.pdf, Code: https://github.com/BlinkDL/RWKV-LM (RWKV transformers are a hybrid RNN-Transformer that replaces QKV attention with Receptance Weighted Key Value (RWKV)).
- A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, YOLOv4: Optimal Speed and Accuracy of Object Detection, arXiv:2004.10934 [cs, eess], Apr. 2020. https://arxiv.org/abs/2004.10934, Code: https://github.com/AlexeyAB/darknet
- Keyu An, Shiliang Zhang, Sep 2023, Exploring RWKV for Memory Efficient and Low Latency Streaming ASR, arXiv preprint arXiv:2309.14758, https://arxiv.org/pdf/2309.14758.pdf (Analysis of the RWKV Transformer-RNN hybrid architecture.)
- CC Atabansi, J Nie, H Liu, Q Song, L Yan, X Zhou, 2023, A survey of Transformer applications for histopathological image analysis: New developments and future directions, BioMedical Engineering OnLine, https://link.springer.com/article/10.1186/s12938-023-01157-0 (Massive survey of medical imaging analysis use case, including discussion of various hybrid Transformer models.)
- Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, Mateusz Kochanek, Dominika Szydło, Joanna Baran, Julita Bielaniewicz, Marcin Gruza, Arkadiusz Janz, Kamil Kanclerz, Anna Kocoń, Bartłomiej Koptyra, Wiktoria Mieleszczenko-Kowszewicz, Piotr Miłkowski, Marcin Oleksy, Maciej Piasecki, Łukasz Radliński, Konrad Wojtasik, Stanisław Woźniak, and Przemysław Kazienko. June 2023. ChatGPT: Jack of all trades, master of none. https://arxiv.org/abs/2302.10724 (A detailed analysis of ChatGPT, including GPT-4, in various use cases.)
- Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. March 2022. Training compute-optimal large language models. https://arxiv.org/abs/2203.15556 (This paper presents the 70B Chinchilla model.)
- Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, Josh Susskind, Sep 2021, An Attention Free Transformer, https://arxiv.org/abs/2105.14103
- Chien-Yao Wang, Alexey Bochkovskiy, Hong-Yuan Mark Liao, 2023, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 7464-7475, PDF: http://openaccess.thecvf.com/content/CVPR2023/papers/Wang_YOLOv7_Trainable_Bag-of-Freebies_Sets_New_State-of-the-Art_for_Real-Time_Object_Detectors_CVPR_2023_paper.pdf
- Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei, July 2023, Retentive Network: A Successor to Transformer for Large Language Models, https://arxiv.org/abs/2307.08621, Code: https://aka.ms/retnet (A proposed new "Retentive Network" architecture to supersede Transformers; includes some analysis of KV cache memory usage, though that is not the paper's primary focus.)
- Hanting Chen, Yunhe Wang, Jianyuan Guo, and Dacheng Tao, May 2023, Vanillanet: the power of minimalism in deep learning, https://arxiv.org/abs/2305.12972, Code: https://github.com/huawei-noah/VanillaNet, Code: https://gitee.com/mindspore/models/tree/master/research/cv/vanillanet
- Tobias Domhan. 2018. How much attention do you need? a granular analysis of neural machine translation architectures. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1799–1808. PDF: https://aclanthology.org/P18-1167.pdf (Examines in detail the various components of the early Transformer architectures, using a pre-norm architecture, based on Tensor2Tensor.)
- Davis Yoshida, Allyson Ettinger, and Kevin Gimpel. 2020. Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size. CoRR abs/2008.07027 (2020). arXiv:2008.07027 https://arxiv.org/abs/2008.07027 (Hybrid RNN-Transformer architecture.)
- Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar, Oct 2023 ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models, https://arxiv.org/abs/2310.04564 (Recommends reinstating the simpler RELU rather than GELU or SiLU, with a focus on inference efficiency.)
- Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed, Oct 2023, Mistral 7B, https://arxiv.org/abs/2310.06825, Code: https://mistral.ai/news/announcing-mistral-7b/ (Uses grouped-query attention and sliding window attention for long context handling.)
- A Langedijk, H Mohebbi, G Sarti, W Zuidema, J Jumelet, Oct 2023, DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers, https://arxiv.org/abs/2310.03686, https://pure.rug.nl/ws/portalfiles/portal/799424386/2310.03686v1.pdf (Allows the decoder to cross-attend to earlier layers of the encoder, rather than only the final output layer.)
- nostalgebraist. 2020. Interpreting GPT: The logit lens. AI Alignment Forum. https://www.alignmentforum.org/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens, https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens (Detailed analysis of how Transformers seem to actually work.)
- J Alman, Z Song, Oct 2023, How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation, arXiv preprint arXiv:2310.04064, https://arxiv.org/abs/2310.04064 (Uses more advanced QKV attention mechanism with even more computations than vanilla Transformer.)
- Sharan Narang, Hyung Won Chung, Yi Tay, Liam Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, and Colin Raffel. 2021, Do transformer modifications transfer across implementations and applications? Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, 7-11 November, 2021, pages 5758–5773. Association for Computational Linguistics, 2021. https://arxiv.org/abs/2102.11972 (Paper examines various Transformer variants.)
- Shabnam Daghaghi, Nicholas Meisburger, Mengnan Zhao, Yong Wu, Sameh Gobriel, Charlie Tai, Anshumali Shrivastava, Mar 2021, Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More, https://arxiv.org/abs/2103.10891, Code: https://github.com/RUSH-LAB/SLIDE (Fast training on CPUs using AVX-512 and locality-sensitive hashing of vectors.)
- Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain, Oct 2023, MatFormer: Nested Transformer for Elastic Inference, https://arxiv.org/abs/2310.07707 (Multiple submodels inside a large model.)
- M Xia, T Gao, Z Zeng, D Chen, Oct 2023, Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning, arXiv preprint arXiv:2310.06694, https://arxiv.org/pdf/2310.06694.pdf, Code: https://github.com/princeton-nlp/LLM-Shearing
- C Wang, 2023, Applied Intelligence volume 53, pages 19990–20006, HCT-Det: a hybrid CNN-transformer architecture for 3D object detection from point clouds, https://link.springer.com/article/10.1007/s10489-023-04570-z, https://www.spiedigitallibrary.org/conference-proceedings-of-spie/12799/127993I/HCT-Det--a-hybrid-CNN-transformer-architecture-for-3D/10.1117/12.3005832.short?SSO=1, Code: https://github.com/yuzh2022/HCT-Net
- P Shamsolmoali, M Zareapoor, H Zhou, X Li, Y Lu, Oct 2023, Distance-based Weighted Transformer Network for Image Completion, arXiv preprint arXiv:2310.07440, https://arxiv.org/abs/2310.07440
- S Tan, Y Shen, Z Chen, A Courville, C Gan, Oct 2023, Sparse Universal Transformer, arXiv preprint arXiv:2310.07096, https://arxiv.org/pdf/2310.07096.pdf
- H Xu, Y Song, Q Liu, J van Genabith, D Xiong, 2024, Rewiring the Transformer with Depth-Wise LSTMs, LREC-COLING 2024, pages 14122–14133, 20-25 May, 2024, https://aclanthology.org/2024.lrec-main.1231.pdf
- Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar, 10 May 2024, Linearizing Large Language Models, https://arxiv.org/abs/2405.06640 Code: https://github.com/TRI-ML/linear_open_lm
- Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui, 7 May 2024, Acceleration Algorithms in GNNs: A Survey, https://arxiv.org/abs/2405.04114
- Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei, 9 May 2024 (v2), You Only Cache Once: Decoder-Decoder Architectures for Language Models, https://arxiv.org/abs/2405.05254 Code: https://aka.ms/YOCO (A novel decoder-decoder architecture with fast KV caching and cross-attention.)
- Badri Narayana Patro, Vijay Srinivas Agneeswaran, 24 Apr 2024, Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges, https://arxiv.org/abs/2404.16112
- Sathya Krishnan Suresh, Shunmugapriya P, 24 Apr 2024 (v2), Towards smaller, faster decoder-only transformers: Architectural variants and their implications, https://arxiv.org/abs/2404.14462 Code: https://github.com/SkAndMl/gpt-variations (Focuses on three new variants of decoder-only Transformer architectures: ParallelGPT (p-gpt), LinearlyCompressedGPT (lc-gpt), and ConvCompressedGPT (cc-gpt).)
- Jianhui Pang, Fanghua Ye, Longyue Wang, Dian Yu, Derek F. Wong, Shuming Shi, Zhaopeng Tu, 17 Jan 2024 (v2), Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models, https://arxiv.org/abs/2401.08350 Code: https://github.com/pangjh3/LLM4MT
- Mackenzie Morehead, Apr 16, 2024, Is Attention All You Need? https://www.mackenziemorehead.com/is-attention-all-you-need/
- Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas, 11 Apr 2024, RecurrentGemma: Moving Past Transformers for Efficient Open Language Models, Google Research, https://arxiv.org/abs/2404.07839
- Panjie Qi; Edwin Hsing-Mean Sha; Qingfeng Zhuge; Hongwu Peng; Shaoyi Hua, 2021, Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), https://ieeexplore.ieee.org/document/9643586
- X Wang, B Zhuang, Q Wu, 2023, SwitchGPT: Adapting Large Language Models for Non-Text Outputs, arXiv preprint arXiv:2309.07623, https://arxiv.org/pdf/2309.07623.pdf
- Staphord Bengesi, Hoda El-Sayed, Md Kamruzzaman Sarker, Yao Houkpati, John Irungu, Timothy Oladunni, 2023, Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers, 21 Nov 2023, https://arxiv.org/abs/2311.10242
- Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge, Jan 2024, Understanding LLMs: A Comprehensive Overview from Training to Inference https://arxiv.org/abs/2401.02038
- Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni, Nov 2023, Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models, https://arxiv.org/abs/2311.00871
- Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré, Apr 2023, Hyena Hierarchy: Towards Larger Convolutional Language Models, https://arxiv.org/pdf/2302.10866.pdf
- Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, Hao Peng, Jianxin Li, Jia Wu, Ziwei Liu, Pengtao Xie, Caiming Xiong, Jian Pei, Philip S. Yu, Lichao Sun, May 2023, A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT, https://arxiv.org/abs/2302.09419
- Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2020. DeLighT: Very Deep and Light-weight Transformer. arXiv:2008.00623 https://arxiv.org/abs/2008.00623 (Different Transformer architecture that includes removing attention heads and simplifies the FFN.)
- Zihang Dai, Guokun Lai, Yiming Yang, and Quoc Le. 2020. Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. In Proceedings of NeurIPS. https://proceedings.neurips.cc/paper/2020/hash/2cd2915e69546904e4e5d4a2ac9e1652-Abstract.html https://arxiv.org/abs/2006.03236 Code: https://github.com/laiguokun/Funnel-Transformer
- Xipeng Qiu, TianXiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained Models for Natural Language Processing: A Survey. SCIENCE CHINA Technological Sciences 63, 10 (2020), 1872–1897. https://doi.org/10.1007/s11431-020-1647-3 https://arxiv.org/abs/2003.08271 (Good survey of Transformer architectures in 2020.)
- A Ouyang, June 2023, Understanding the Performance of Transformer Inference, Masters Thesis, Electrical Engineering and Computer Science, MIT, https://dspace.mit.edu/handle/1721.1/151543 https://dspace.mit.edu/bitstream/handle/1721.1/151543/ouyang-aouyang-meng-eecs-2023-thesis.pdf?sequence=1&isAllowed=y (Detailed analysis of Transformer performance, including the techniques of KV caching.)
- Sandeep Subramanian, Ronan Collobert, Marc’Aurelio Ranzato, and Y-Lan Boureau. Multi-scale transformer language models. arXiv preprint arXiv:2005.00581, 2020. https://arxiv.org/abs/2005.00581
- Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, and Jimmy Lin. DeeBERT: Dynamic early exiting for accelerating BERT inference. arXiv preprint arXiv:2004.12993, 2020. https://arxiv.org/abs/2004.12993
- Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019, https://arxiv.org/abs/1910.01108
- Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. MobileBERT: a compact task-agnostic BERT for resource-limited devices. arXiv preprint arXiv:2004.02984, 2020. https://arxiv.org/abs/2004.02984
- Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, and Qi Ju. FastBERT: a self-distilling BERT with adaptive inference time. arXiv preprint arXiv:2004.02178, 2020. https://arxiv.org/abs/2004.02178
- Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, and Jingren Zhou. AdaBERT: Task-adaptive BERT compression with differentiable neural architecture search. arXiv preprint arXiv:2001.04246, 2020. https://arxiv.org/abs/2001.04246
- Forrest N Iandola, Albert E Shaw, Ravi Krishna, and Kurt W Keutzer. SqueezeBERT: What can computer vision teach NLP about efficient neural networks? arXiv preprint arXiv:2006.11316, 2020. https://arxiv.org/abs/2006.11316
- Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, Riccardo Marin, Emanuele Rodolà, May 2023, Accelerating Transformer Inference for Translation via Parallel Decoding, https://arxiv.org/abs/2305.10427
- Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussà, 2 May 2024 (v2), A Primer on the Inner Workings of Transformer-based Language Models, https://arxiv.org/pdf/2405.00208 (Analyzes the theory of the Transformer architecture, including an interesting separation of the effects of attention versus FFNs on logits to give attributions.)
- Simeon Emanuilov, Apr 4, 2024 LLM agent operating system (AIOS) and the future of LLM-powered agents, https://medium.com/@simeon.emanuilov/llm-agent-operating-system-aios-and-the-future-of-llm-powered-agents-3d08b4e91c34 https://unfoldai.com/aios-llm-powered-agents/
- CAMERON R. WOLFE, PH.D. MAR 04, 2024, Decoder-Only Transformers: The Workhorse of Generative LLMs, https://cameronrwolfe.substack.com/p/decoder-only-transformers-the-workhorse
- Rachel Gordon, Publication Date:March 21, 2024, AI generates high-quality images 30 times faster in a single step, MIT News, https://news.mit.edu/2024/ai-generates-high-quality-images-30-times-faster-single-step-0321 (MIT's new image generation framework called "distribution matching distillation" is faster than diffusion models.)
- Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao, 5 Feb 2024. A Survey on Transformer Compression. https://arxiv.org/abs/2402.05964 (Model compression survey paper with focus on pruning, quantization, knowledge distillation, and efficient architecture design.)
- Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang, 22 Mar 2024, Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference, https://arxiv.org/abs/2403.14520 Code: https://sites.google.com/view/cobravlm (Multimodal version of the new Mamba architecture.)
- Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim, 18 Jan 2024, Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation, https://arxiv.org/abs/2401.08417
- Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, 1 Dec 2023, The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, https://arxiv.org/abs/2312.00678 Project: https://github.com/tding1/Efficient-LLM-Survey
- Jesse Roberts, 2 Feb 2024 (v3), How Powerful are Decoder-Only Transformer Neural Models? https://arxiv.org/abs/2305.17026
- Georgy Tyukin, 2 Apr 2024, Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations, Masters Thesis, Data Science and Machine Learning, University College London., https://arxiv.org/abs/2404.05741 (Reviews various model compression and inference optimization techniques, and specifically analyzes layer skipping and sublayer skipping, such as attention head pruning and FFN/MLP pruning.)
- Stan Gibson, 03 Jun 2024, Getting infrastructure right for generative AI, CIO, https://www.cio.com/article/2128440/getting-infrastructure-right-for-generative-ai.html
- Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai, 25 Jan 2024, ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models, https://arxiv.org/abs/2401.14351 Code: https://github.com/ServerlessLLM/ServerlessLLM
- Gavin Li, Nov 19, 2023, Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique, AI Advances https://ai.gopubby.com/unbelievable-run-70b-llm-inference-on-a-single-4gb-gpu-with-this-new-technique-93e2057c7eeb
- Yumo Bai, Feb 3, 2024 Why are most LLMs decoder-only? Dive into the rabbit hole of recent advancement in Large Language Models, https://medium.com/@yumo-bai/why-are-most-llms-decoder-only-590c903e4789
- Sergey Levine, 2023, UC Berkeley Transformers: CS W182/282A (slides), accessed 3rd Oct 2023, PDF Slides: https://cs182sp21.github.io/static/slides/lec-12.pdf
- Christopher Wolters, Xiaoxuan Yang, Ulf Schlichtmann, Toyotaro Suzumura, 12 Jun 2024, Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference, https://arxiv.org/abs/2406.08413
- Wang, X.; Zhang, L.L.; Wang, Y.; Yang, M. Towards Efficient Vision Transformer Inference: A First Study of Transformers on Mobile Devices. In Proceedings of the 23rd Annual International Workshop on Mobile Computing Systems and Applications, HotMobile 2022, Orange County, CA, USA, 22–23 February 2022; pp. 1–7. http://dx.doi.org/10.1145/3508396.3512869
- Graham, B.; El-Nouby, A.; Touvron, H.; Stock, P.; Joulin, A.; Jégou, H.; Douze, M. LeViT: A Vision Transformer in ConvNet’s Clothing for Faster Inference. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 12239–12249. http://dx.doi.org/10.1109/ICCV48922.2021.01204
- Roh, B.; Shin, J.; Shin, W.; Kim, S. Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity. arXiv 2021. http://dx.doi.org/10.48550/arXiv.2111.14330
- Li, Y.; Yuan, G.; Wen, Y.; Hu, E.; Evangelidis, G.; Tulyakov, S.; Wang, Y.; Ren, J. EfficientFormer: Vision Transformers at MobileNet Speed. arXiv 2022. http://dx.doi.org/10.48550/arXiv.2206.01191
- David Spuler, March 2024, Chapter 2. Transformers & LLMs, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen, June 2023, A Survey of Large Language Models, https://arxiv.org/abs/2303.18223
- Shuning Chang, Pichao Wang, Hao Luo, Fan Wang, Mike Zheng Shou, 2023, Revisiting Vision Transformer from the View of Path Ensemble, https://arxiv.org/abs/2308.06548 PDF: https://arxiv.org/pdf/2308.06548.pdf
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, Attention is all you need, 2017, arXiv preprint arXiv:1706.03762. https://arxiv.org/abs/1706.03762
- azhar, Dec 29, 2023, Decoding Mamba: The Next Big Leap in AI Sequence Modeling, https://medium.com/ai-insights-cobet/decoding-mamba-the-next-big-leap-in-ai-sequence-modeling-ef3908060cb8
- kipply's blog, 2023-03-30, Transformer Taxonomy (the last lit review), https://kipp.ly/transformer-taxonomy/ (Papers for all the Transformer architectures and milestone papers for the major optimization improvements on them.)
- Chen, C, 2024, Hardware‑software co‑exploration and optimization for next‑generation learning machines. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/178423 (Extensive coverage of hardware design with multiple contributions to accelerating various neural network types, ranging from acceleration of various single non-linear functions and end-to-end optimization algorithms. Specific topics include data compression, non-maximum suppression, MHA, and MatMul/GEMM optimizations.)
- Louis-François Bouchard, Louie Peters, May 2024, Chapter 2: Architectures, Building LLMs for Production: Enhancing LLM Abilities and Reliability with Prompting, Fine-Tuning, and RAG, https://www.amazon.com/Building-LLMs-Production-Reliability-Fine-Tuning/dp/B0D4FFPFW8/
- Matt Murphy, Tim Tully, Derek Xiao, January 18, 2024, The Modern AI Stack: Design Principles for the Future of Enterprise AI Architectures, Menlo Ventures, https://menlovc.com/perspective/the-modern-ai-stack-design-principles-for-the-future-of-enterprise-ai-architectures/ (Various details about the AI tech stack, organizational AI maturity levels, and several interesting facts: inference is 95% of AI cost now, 60% of organizations are using multi-model methods, RAG is the dominant architecture currently, and AI application development teams are primarily made up of non-ML software engineers leveraging on top of AI models.)
- Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue, 18 Jul 2024, Retrieval-Augmented Generation for Natural Language Processing: A Survey, https://arxiv.org/abs/2407.13193
- Chip Huyen, Jul 25, 2024, Building A Generative AI Platform, https://huyenchip.com/2024/07/25/genai-platform.html
- Yorick Sens, Henriette Knopp, Sven Peldszus, Thorsten Berger, 12 Aug 2024, A Large-Scale Study of Model Integration in ML-Enabled Software Systems, https://arxiv.org/abs/2408.06226
- Rohan Baskar Prabhakar, Hengrui Zhang, David Wentlzaff, 14 Aug 2024, Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference, https://arxiv.org/abs/2408.07802 (Modified Transformer architecture with parallelized sub-layers of attention and FFN.)
- Hugo Laurençon, Andrés Marafioti, Victor Sanh, Léo Tronchon, 22 Aug 2024, Building and better understanding vision-language models: insights and future directions, https://arxiv.org/abs/2408.12637
- Tymofii Reizin, 2024, Fast Algorithms for Attention Mechanism, Bachelor Thesis, Department of Applied Mathematics, Charles University, Prague, https://dspace.cuni.cz/bitstream/handle/20.500.11956/192084/130390128.pdf?sequence=1
- Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, and Qun Liu. Dynabert: Dynamic bert with adaptive width and depth. arXiv preprint arXiv:2004.04037, 2020. https://arxiv.org/abs/2004.04037
- Zejian Liu, Fanrong Li, Gang Li, and Jian Cheng. 2021, EBERT: Efficient BERT Inference with Dynamic Structured Pruning. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4814– 4823, 2021. https://aclanthology.org/2021.findings-acl.425/
- Bobby He, Thomas Hofmann, 31 May 2024 (v2), Simplifying Transformer Blocks, https://arxiv.org/abs/2311.01906 (Examines the removal of various Transformer sublayer components including skip connections, projection/value parameters, and normalization.)
- Minghao Shao, Abdul Basit, Ramesh Karri, Muhammad Shafique, Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges, https://www.researchgate.net/profile/Minghao_Shao2/publication/383976933_Survey_of_different_Large_Language_Model_Architectures_Trends_Benchmarks_and_Challenges/links/66e2d320f84dd1716ce79f85/Survey-of-different-Large-Language-Model-Architectures-Trends-Benchmarks-and-Challenges.pdf
- Douglas C. Youvan, September 27, 2024, Building and Running Large-Scale Language Models: The Infrastructure and Techniques Behind GPT-4, https://www.researchgate.net/profile/Douglas-Youvan/publication/384398902_Building_and_Running_Large-Scale_Language_Models_The_Infrastructure_and_Techniques_Behind_GPT-4/links/66f6f4d3906bca2ac3d20e68/Building-and-Running-Large-Scale-Language-Models-The-Infrastructure-and-Techniques-Behind-GPT-4.pdf
- Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuoling Yang, Zihan Liu, Jon Barker, Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping, 17 Sep 2024, NVLM: Open Frontier-Class Multimodal LLMs, NVIDIA, https://arxiv.org/abs/2409.11402 https://huggingface.co/nvidia/NVLM-D-72B https://nvlm-project.github.io/
- Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, Ping Luo, 17 Oct 2024, Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation, https://arxiv.org/abs/2410.13848 https://github.com/deepseek-ai/Janus?tab=readme-ov-file
- Akash Bajwa, Feb 03, 2025, Forward Deployed Engineers: A Means To An End For AI Startups: Capturing Business Logic And Expert Reasoning, https://akashbajwa.substack.com/p/forward-deployed-engineers-a-means (" AI truly is a new way of computing, and that means the better analogies are to computing itself. Transformers are the transistor, and mainframes are today’s models. The GUI is, arguably, still TBD.")
- Devansh, Jun 1, 2025, The Costly Open-Source LLM Lie: Open Source LLMs are not Free, https://machine-learning-made-simple.medium.com/the-costly-open-source-llm-lie-f83fdc5d5701
That's a Lot of BERTs!
BERT was an early encoder-only Transformer architecture, released by Google in late 2018, that was significantly innovative. Since then, there have been a great many variants of "BERT" (e.g., FastBERT, MobileBERT, DistilBERT), many of them aimed at making it smaller or faster; a small sketch comparing model sizes appears after the reference list below. Research papers on variants of BERT include:
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, May 2019, https://arxiv.org/abs/1810.04805, Code: https://github.com/google-research/bert (The one BERT to rule them all.)
- Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. MobileBERT: a compact task-agnostic BERT for resource-limited devices. arXiv preprint arXiv:2004.02984, 2020. https://arxiv.org/abs/2004.02984
- Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, and Qi Ju. FastBERT: a self-distilling BERT with adaptive inference time. arXiv preprint arXiv:2004.02178, 2020. https://arxiv.org/abs/2004.02178
- Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, and Jimmy Lin. DeeBERT: Dynamic early exiting for accelerating BERT inference. arXiv preprint arXiv:2004.12993, 2020. https://arxiv.org/abs/2004.12993
- Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019, https://arxiv.org/abs/1910.01108
- Forrest N Iandola, Albert E Shaw, Ravi Krishna, and Kurt W Keutzer. SqueezeBERT: What can computer vision teach NLP about efficient neural networks? arXiv preprint arXiv:2006.11316, 2020. https://arxiv.org/abs/2006.11316
- Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, and Jingren Zhou. AdaBERT: Task-adaptive BERT compression with differentiable neural architecture search. arXiv preprint arXiv:2001.04246, 2020. https://arxiv.org/abs/2001.04246
- Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, and Qun Liu. Dynabert: Dynamic bert with adaptive width and depth. arXiv preprint arXiv:2004.04037, 2020. https://arxiv.org/abs/2004.04037
- Zejian Liu, Fanrong Li, Gang Li, and Jian Cheng. EBERT: Efficient BERT Inference with Dynamic Structured Pruning. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4814– 4823, 2021. https://aclanthology.org/2021.findings-acl.425/
- Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A robustly optimized BERT pretraining approach,” CoRR, 2019. https://arxiv.org/abs/1907.11692
- Z. Jiang, W. Yu, D. Zhou, Y. Chen, J. Feng, and S. Yan, “Convbert: Improving BERT with span-based dynamic convolution,” in NeurIPS, 2020, https://arxiv.org/abs/2008.02496
- H. Bao, L. Dong, S. Piao, and F. Wei, “BEit: BERT pre-training of image transformers,” in International Conference on Learning Representations, 2022. https://arxiv.org/abs/2106.08254
- Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu, Oct 2020, TinyBERT: Distilling BERT for Natural Language Understanding, https://arxiv.org/abs/1909.10351
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
- Kester Wong, Sahan Bulathwela and Mutlu Cukurova, 19 Jul 2025, Exploring Human-AI Complementarity in CPS Diagnosis Using Unimodal and Multimodal BERT Models, https://arxiv.org/abs/2507.14579
- Kester Wong, Sahan Bulathwela and Mutlu Cukurova, 19 Jul 2025, Explainable Collaborative Problem Solving Diagnosis with BERT using SHAP and its Implications for Teacher Adoption, https://arxiv.org/abs/2507.14584
- Qiyao Xue, Yuchen Dou, Ryan Shi, Xiang Lorraine Li, Wei Gao, 1 Aug 2025, MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations, https://arxiv.org/abs/2508.00760
- Tianpei Lu, Bingsheng Zhang, Lekun Peng, Bowen Zheng, Lichun Li, Kui Ren, 3 Aug 2025, Privacy-Preserving Inference for Quantized BERT Models, https://arxiv.org/abs/2508.01636
- Zijian Zhao, Fanyi Meng, Zhonghao Lyu, Hang Li, Xiaoyang Li, Guangxu Zhu, 3 Aug 2025, CSI-BERT2: A BERT-inspired Framework for Efficient CSI Prediction and Classification in Wireless Communication and Sensing, https://arxiv.org/abs/2412.06861
- Tai Vu, Robert Yang, 14 Aug 2025, BERT-VQA: Visual Question Answering on Plots, https://arxiv.org/abs/2508.13184
- Minh Tran, Jeffery C. Chan, Min Li Huang, Maya Kansara, John P. Grady, Christine E. Napier, Subotheni Thavaneswaran, Mandy L. Ballinger, David M. Thomas, Frank P. Lin, 21 Aug 2025, A Robust BERT-Based Deep Learning Model for Automated Cancer Type Extraction from Unstructured Pathology Reports, https://arxiv.org/abs/2508.15149
- Kun Liu, Tuozhen Liu, Feifei Wang, and Rui Pan, 13 Aug 2025, A BERT-based Hierarchical Classification Model with Applications in Chinese Commodity Classification, https://arxiv.org/abs/2508.15800
- Daniel Frees, Aditri Bhagirath, Moritz Bolling, 25 Aug 2025, Exploring Efficient Learning of Small BERT Networks with LoRA and DoRA, https://arxiv.org/abs/2508.17586
- Suramya Jadhav, Abhay Shanbhag, Amogh Thakurdesai, Ridhima Sinare, Ananya Joshi, Raviraj Joshi, 24 Aug 2025, MahaParaphrase: A Marathi Paraphrase Detection Corpus and BERT-based Models, https://arxiv.org/abs/2508.17444
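Most of these BERT variants trade a little accuracy for a much smaller or faster model. As a rough illustration, the snippet below is a minimal sketch (assuming the Hugging Face transformers and PyTorch packages are installed) that compares the parameter counts of the public bert-base-uncased and distilbert-base-uncased checkpoints:

```python
# Minimal sketch: compare the size of original BERT against a distilled variant.
# Assumes the Hugging Face "transformers" and "torch" packages are installed.
from transformers import AutoModel

def count_params(model_name: str) -> int:
    """Load a model checkpoint and count its parameters."""
    model = AutoModel.from_pretrained(model_name)
    return sum(p.numel() for p in model.parameters())

for name in ["bert-base-uncased", "distilbert-base-uncased"]:
    print(f"{name}: {count_params(name) / 1e6:.0f}M parameters")
# DistilBERT has roughly 40% fewer parameters than BERT base.
```

The DistilBERT paper reports roughly 40% fewer parameters and 60% faster inference while retaining about 97% of BERT's language-understanding performance, which is the general flavor of most of the efficiency-oriented variants listed above.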
Next-Generation Architectures
What comes after Transformers? Maybe the answer is: more Transformers! Certainly, the newer multi-modal Transformers are gaining momentum, and there are other advanced Transformers:
- Vision Transformer (ViT)
- Multimodal transformer
- Ensemble architectures (multi-AI, such as MoE)
- Agent architectures (e.g., function calling, autonomous agents)
- Advanced RAG architectures
- Tool-Augmented Language Models (TALM)
- Retrieval-Augmented Language Models (RALM)
- Compound AI architectures
However, there are some alternatives to Transformers that have been gathering steam. Here are a few newer architectures already being worked on:
- State Space Models (SSMs)
- RWKV (Transformer-RNN hybrid)
- Mamba (a type of SSM)
- Graph Neural Networks and Knowledge Graph extensions
- Hyena architecture (an evolution of S4-style long convolution models)
- Spiking Neural Networks (SNNs) and Spiking Transformers
- Weightless Neural Networks (WNNs)
- Liquid Neural Networks (LNNs)
- Hybrid Transformer-RNN architectures
- Hybrid Transformer-CNN architectures
Research papers on next-gen architectures:
- Rob Toews, Sep 3, 2023, Transformers Revolutionized AI. What Will Replace Them? Forbes, https://www.forbes.com/sites/robtoews/2023/09/03/transformers-revolutionized-ai-what-will-replace-them/
- Badri Narayana Patro, Vijay Srinivas Agneeswaran, 24 Apr 2024, Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges, https://arxiv.org/abs/2404.16112
- Sathya Krishnan Suresh, Shunmugapriya P, 24 Apr 2024 (v2), Towards smaller, faster decoder-only transformers: Architectural variants and their implications, https://arxiv.org/abs/2404.14462 Code: https://github.com/SkAndMl/gpt-variations (Focuses on three new variants of decoder-only Transformer architectures: ParallelGPT (p-gpt), LinearlyCompressedGPT (lc-gpt), and ConvCompressedGPT (cc-gpt).)
- Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 22 Apr 2024, A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
- David Spuler, March 2024, Chapter 43. Overview of AI Research, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, https://arxiv.org/abs/2312.00678
- Johannes Schneider, 1 Aug 2024, What comes after transformers? -- A selective survey connecting ideas in deep learning, https://arxiv.org/abs/2408.00386
- Rohan Baskar Prabhakar, Hengrui Zhang, David Wentlzaff, 14 Aug 2024, Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference, https://arxiv.org/abs/2408.07802 (Modified Transformer architecture with parallelized sub-layers of attention and FFN.)
- Cem Dilmegani, Jan 10, 2024, The Future of Large Language Models in 2024, https://research.aimultiple.com/future-of-large-language-models/
- Bobby He, Thomas Hofmann, 31 May 2024 (v2), Simplifying Transformer Blocks, https://arxiv.org/abs/2311.01906 (Examines the removal of various Transformer sublayer components including skip connections, projection/value parameters, and normalization.)
- Roy Lo, June 13, 2024, Defining AI 2.0: Beyond Generative AI, https://www.linkedin.com/pulse/defining-ai-20-beyond-generative-roy-lo-tbvie/
- Ryan McNeal, Aug 27, 2024, ChatGPT and GPT-4 could get a sweet upgrade this fall with 'strawberry', https://www.androidauthority.com/openai-strawberry-ai-3475682/
- Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou, 26 May 2024, Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers, https://arxiv.org/abs/2405.16411 (Higher-order attention using tensors to generalize QKV matrices.)
- Joanne Chen, July 23, 2024, What’s Next After Transformers, https://foundationcapital.com/whats-next-after-transformers/
- Martin_Casado, Aug 31, 2024, Tweet (State of LLMs) https://threadreaderapp.com/thread/1829905130512400775.html
- Anil Ananthaswamy, August 30, 2024, A new way to build neural networks could make AI more understandable, https://www.technologyreview.com/2024/08/30/1103385/a-new-way-to-build-neural-networks-could-make-ai-more-understandable/?tpcc=NL_Marketing (About Kolmogorov-Arnold Networks or KANs.)
- Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela, 17 Apr 2024 (v2), Generative Representational Instruction Tuning, https://arxiv.org/abs/2402.09906
- Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, 9 Mar 2024, Algorithmic progress in language models, https://arxiv.org/abs/2403.05812
- Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy, 20 Aug 2024, Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model, https://www.arxiv.org/abs/2408.11039 (Merging Transformer architectures with diffusion in training multimodal models.)
- Cobus Greyling, Sep 2024, An AI Agent Architecture & Framework Is Emerging, https://cobusgreyling.medium.com/an-ai-agent-architecture-framework-is-emerging-addae3804f23
- Douglas C. Youvan, September 27, 2024, Building and Running Large-Scale Language Models: The Infrastructure and Techniques Behind GPT-4, https://www.researchgate.net/profile/Douglas-Youvan/publication/384398902_Building_and_Running_Large-Scale_Language_Models_The_Infrastructure_and_Techniques_Behind_GPT-4/links/66f6f4d3906bca2ac3d20e68/Building-and-Running-Large-Scale-Language-Models-The-Infrastructure-and-Techniques-Behind-GPT-4.pdf
- Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuoling Yang, Zihan Liu, Jon Barker, Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping, 17 Sep 2024, NVLM: Open Frontier-Class Multimodal LLMs, NVIDIA, https://arxiv.org/abs/2409.11402 https://huggingface.co/nvidia/NVLM-D-72B https://nvlm-project.github.io/
- Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, Ping Luo, 17 Oct 2024, Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation, https://arxiv.org/abs/2410.13848 https://github.com/deepseek-ai/Janus?tab=readme-ov-file
- Carl Franzen, October 23, 2024, OpenAI researchers develop new model that speeds up media generation by 50X, https://venturebeat.com/ai/openai-researchers-develop-new-model-that-speeds-up-media-generation-by-50x/
- Dr. Ashish Bamania, Nov 2024, XNets Are Here To Outcompete MLPs & KANs: A deep dive into XNets, a new neural network architecture that outperforms MLPs, KANs, and PINNs across various benchmarks, along with a guide to building one from scratch. https://levelup.gitconnected.com/xnets-are-here-to-outcompete-mlps-kans-3ff569819165
- Xin Li, Zhihong Xia, Hongkun Zhang, 28 Sep 2024, Cauchy activation function and XNet, https://arxiv.org/abs/2409.19221
- Felix Petersen, Hilde Kuehne, Christian Borgelt, Julian Welzel, Stefano Ermon, 7 Nov 2024, Convolutional Differentiable Logic Gate Networks, 38th Conference on Neural Information Processing Systems (NeurIPS 2024), https://arxiv.org/abs/2411.04732
- H. Xu, Z. Bi, H. Tseng, X. Song, P. Feng, From Transformers to the Future: An In-Depth Exploration of Modern Language Model Architectures, https://osf.io/n8r5j/download
- Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Shih-Yang Liu, Matthijs Van Keirsbilck, Min-Hung Chen, Yoshi Suhara, Yingyan Lin, Jan Kautz, Pavlo Molchanov, 20 Nov 2024, Hymba: A Hybrid-head Architecture for Small Language Models, https://arxiv.org/abs/2411.13676
- Gil Dibner, Sep 25, 2024, Am I thinking about AI the right way? Angular Ventures, https://medium.com/angularventures/am-i-thinking-about-ai-the-right-way-4513760cd83e
- Vincent-Pierre Berges, Barlas Oguz, December 12, 2024, Memory Layers at Scale, Meta, https://ai.meta.com/research/publications/memory-layers-at-scale/ https://github.com/facebookresearch/memory (Augmentation of an LLM with an additional key-value associative memory, by replacing some FFNs with a "memory layer".)
- Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, Yongqin Xian, Jan Eric Lenssen, Liwei Wang, Federico Tombari, Bernt Schiele, 30 Oct 2024, TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters, https://haiyang-w.github.io/tokenformer.github.io/ (Novel attention mechanism that treats model parameters as tokens.)
- Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Doing additional processing of the KV cache data to improve accuracy.)
- Paul Sawers, January 23, 2025, Meta’s Yann LeCun predicts a ‘new AI architectures paradigm’ within 5 years and ‘decade of robotics’, https://techcrunch.com/2025/01/23/metas-yann-lecun-predicts-a-new-ai-architectures-paradigm-within-5-years-and-decade-of-robotics/
- Akash Bajwa, Feb 03, 2025, Forward Deployed Engineers: A Means To An End For AI Startups: Capturing Business Logic And Expert Reasoning, https://akashbajwa.substack.com/p/forward-deployed-engineers-a-means (" AI truly is a new way of computing, and that means the better analogies are to computing itself. Transformers are the transistor, and mainframes are today’s models. The GUI is, arguably, still TBD.")
- Marina Temkin, February 26, 2025, Inception emerges from stealth with a new type of AI model, https://techcrunch.com/2025/02/26/inception-emerges-from-stealth-with-a-new-type-of-ai-model/ (This is a "Diffusion Language Model" or DLM.)
- Jacinta Bowler, Wed 5 Mar 2025, Melbourne start-up launches 'biological computer' made of human brain cells, ABC Science, https://www.abc.net.au/news/science/2025-03-05/cortical-labs-neuron-brain-chip/104996484 (LOL, human brains strike back!)
- Dr. Ashish Bamania, March 3rd, 2025, ‘FANformer’ Is The New Game-Changing Architecture For LLMs: A deep dive into how FANFormer architecture works and what makes it so powerful compared to Transformers, https://levelup.gitconnected.com/fanformer-is-the-new-game-changing-architecture-for-llms-d56999fab7f2
- Yihong Dong, Ge Li, Xue Jiang, Yongding Tao, Kechi Zhang, Hao Zhu, Huanyu Liu, Jiazheng Ding, Jia Li, Jinliang Deng, Hong Mei, 28 Feb 2025, FANformer: Improving Large Language Models Through Effective Periodicity Modeling, https://www.arxiv.org/abs/2502.21309
- lucalp, 24 Jun 2025, The Bitter Lesson is coming for Tokenization: a world of LLMs without tokenization is desirable and increasingly possible, https://lucalp.dev/bitter-lesson-tokenization-and-blt/
- Dr. Ashish Bamania, Aug 2025, Hierarchical Reasoning Model: An AI Architecture That Beats OpenAI’s ‘o3-mini-high’ Is Here: A deep dive into the Hierarchical Reasoning Model (HRM) to understand its internals that help it outperform powerful reasoning models available to us today, https://ai.gopubby.com/hierarchical-reasoning-model-an-ai-architecture-that-beats-openais-o3-mini-high-is-here-2c3128ba1727
- Kenneth Wolters, Aug 12, 2025, No AGI in Sight: What This Means for LLMs, https://kennethwolters.com/posts/no-agi/
- Beining Wu, Jun Huang and Shui Yu, 25 Jul 2025, "X of Information" Continuum: A Survey on AI-Driven Multi-dimensional Metrics for Next-Generation Networked Systems, https://arxiv.org/abs/2507.19657
- Ayan Biswas, Terece L. Turton, Nishath Rajiv Ranasinghe, Shawn Jones, Bradley Love, William Jones, Aric Hagberg, Han-Wei Shen, Nathan DeBardeleben and Earl Lawrence, 18 Jul 2025, VizGenie: Toward Self-Refining, Domain-Aware Workflows for Next-Generation Scientific Visualization, https://arxiv.org/abs/2507.21124
- Nadja R. Ging-Jehli, Russell K. Childers, Joshua Lu, Robert Gemma, Rachel Zhu, 11 Jul 2025, Gearshift Fellowship: A Next-Generation Neurocomputational Game Platform to Model and Train Human-AI Adaptability, https://arxiv.org/abs/2508.00850
- Liangbo Ning, Ziran Liang, Zhuohang Jiang, Haohao Qu, Yujuan Ding, Wenqi Fan, Xiao-yong Wei, Shanru Lin, Hui Liu, Philip S. Yu, Qing Li, 5 Aug 2025, A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models, https://arxiv.org/abs/2503.23350
- Fardis Nadimi, Payam Abdisarabshali, Kasra Borazjani, Jacob Chakareski, Seyyedali Hosseinalipour, 5 Aug 2025, Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR, https://arxiv.org/abs/2506.05683
- Huan Zhang, Daokun Zhang, Kexin Meng, and Geoffrey I. Webb, 15 Aug 2025, Towards the Next-generation Bayesian Network Classifiers, https://arxiv.org/abs/2508.11145
- Suman Saha and Fatemeh Rahbari and Farhan Sadique and Sri Krishna Chaitanya Velamakanni, Mahfuza Farooque and William J. Rothwell, 13 Aug 2025, Next-Gen Education: Enhancing AI for Microlearning, https://arxiv.org/abs/2508.11704
- Jesmin Jahan Tithi and Hanjiang Wu and Avishaii Abuhatzera and Fabrizio Petrini, 19 Aug 2025, Scaling Intelligence: Designing Data Centers for Next-Gen Language Models, https://arxiv.org/abs/2506.15006
- Pengsong Zhang, Xiang Hu, Guowei Huang, Yang Qi, Heng Zhang, Xiuxu Li, Jiaxing Song, Jiabin Luo, Yijiang Li, Shuo Yin, Chengxiao Dai, Eric Hanchen Jiang, Xiaoyan Zhou, Zhenfei Yin, Boqin Yuan, Jing Dong, Guinan Su, Guanren Qiao, Haiming Tang, Anghong Du, Lili Pan, Zhenzhong Lan, Xinyu Liu, 20 Aug 2025, aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists, https://arxiv.org/abs/2508.15126
RWKV Architecture
The RWKV ("Receptance Weighted Key Value") architecture is a hybrid Transformer-RNN architecture: it can be trained in parallel like a Transformer, but runs inference as a recurrence with constant memory per token; a minimal sketch of the core recurrence appears after the reference list below. Research papers on RWKV include:
- Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar, 10 May 2024, Linearizing Large Language Models, https://arxiv.org/abs/2405.06640 Code: https://github.com/TRI-ML/linear_open_lm
- Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao, 5 Feb 2024. A Survey on Transformer Compression. https://arxiv.org/abs/2402.05964 (Model compression survey paper with focus on pruning, quantization, knowledge distillation, and efficient architecture design.)
- Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng, 11 Jun 2024. RWKV-CLIP: A Robust Vision-Language Representation Learner, https://arxiv.org/abs/2406.06973 Code: https://github.com/deepglint/RWKV-CLIP
- Xinji Mai, Zeng Tao, Junxiong Lin, Haoran Wang, Yang Chang, Yanlan Kang, Yan Wang, Wenqiang Zhang, 27 Jun 2024, From Efficient Multimodal Models to World Models: A Survey, https://arxiv.org/abs/2407.00118 (A survey of multimodal models with coverage of many optimization techniques.)
- 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, https://arxiv.org/abs/2312.00678
- Joanne Chen, July 23, 2024, What’s Next After Transformers, https://foundationcapital.com/whats-next-after-transformers/
- Théodor Lemerle, Harrison Vanderbyl, Vaibhav Srivastav, Nicolas Obin, Axel Roebel, 30 Oct 2024, Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis, https://arxiv.org/abs/2410.23320 https://theodorblackbird.github.io/blog/demo_lina/
- Akul Datta, 5 Nov 2024, The Evolution of RWKV: Advancements in Efficient Language Modeling, https://arxiv.org/abs/2411.02795
- H. Xu, Z. Bi, H. Tseng, X. Song, P. Feng, From Transformers to the Future: An In-Depth Exploration of Modern Language Model Architectures, https://osf.io/n8r5j/download
- Wonkyo Choe, Yangfeng Ji, Felix Lin, 14 Dec 2024, RWKV-edge: Deeply Compressed RWKV for Resource-Constrained Devices, https://arxiv.org/abs/2412.10856
- Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
- Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu, 24 Feb 2025, Thus Spake Long-Context Large Language Model, https://arxiv.org/abs/2502.17129 (Impressive survey of many techniques to improve efficiency and accuracy of long context processing in both inference and training, covering text, video and multimodal models.)
- Sicheng Chen, Tianyi Zhang, Dankai Liao, Dandan Li, Low Chang Han, Yanqin Jiang, Yueming Jin, Shangqing Lyu, 5 Mar 2025, PathRWKV: Enabling Whole Slide Prediction with Recurrent-Transformer, https://arxiv.org/abs/2503.03199
- Liu Xiao, Li Zhiyuan, Lin Yueyu, 27 Apr 2025, WuNeng: Hybrid State with Attention, https://arxiv.org/abs/2504.19191
- Xiao Wang, Haiyang Wang, Shiao Wang, Qiang Chen, Jiandong Jin, Haoyu Song, Bo Jiang, Chenglong Li, 6 Aug 2025, RGB-Event based Pedestrian Attribute Recognition: A Benchmark Dataset and An Asymmetric RWKV Fusion Framework, https://arxiv.org/abs/2504.10018
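For intuition, here is a deliberately simplified sketch of the RWKV-style "WKV" recurrence: a per-channel, exponentially decaying numerator and denominator replace the quadratic attention sum, so each new token costs O(1) work regardless of context length. It omits RWKV's token-shift, receptance gating, the bonus term for the current token, and numerical stabilization, and the variable names are illustrative rather than taken from any RWKV codebase.

```python
import numpy as np

def wkv_recurrence(keys, values, w):
    """Simplified RWKV-style recurrence over a sequence.

    keys, values: arrays of shape (T, d)
    w: per-channel decay rate, shape (d,); larger w means faster forgetting
    Returns outputs of shape (T, d).
    """
    T, d = keys.shape
    decay = np.exp(-w)              # per-channel exponential decay factor
    num = np.zeros(d)               # running weighted sum of values
    den = np.zeros(d)               # running sum of weights
    outputs = np.zeros((T, d))
    for t in range(T):
        weight = np.exp(keys[t])    # attention-like weight for this token
        num = decay * num + weight * values[t]
        den = decay * den + weight
        outputs[t] = num / (den + 1e-8)
    return outputs

# Example: 10 tokens with 4 channels and mild decay.
out = wkv_recurrence(np.random.randn(10, 4), np.random.randn(10, 4), w=np.full(4, 0.5))
print(out.shape)  # (10, 4)
```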
State Space Models (SSMs)
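State space models replace attention with a linear recurrence over a hidden state: in discretized form the update is h_t = A*h_{t-1} + B*x_t with readout y_t = C*h_t, so inference needs only constant memory per token, and the same computation can be unrolled as a long convolution for parallel training. The toy NumPy sketch below runs a diagonal SSM as a recurrence; the shapes and names are illustrative only and do not correspond to any particular SSM library.

```python
import numpy as np

def diagonal_ssm(x, A_bar, B_bar, C):
    """Minimal discretized diagonal state space model, run as a recurrence.

    x:     input sequence, shape (T,)          (one channel for simplicity)
    A_bar: diagonal state transition, shape (N,), entries in (0, 1) for stability
    B_bar: input projection, shape (N,)
    C:     output projection, shape (N,)
    Returns outputs y of shape (T,).
    """
    N = A_bar.shape[0]
    h = np.zeros(N)                      # hidden state carried across time steps
    y = np.zeros_like(x)
    for t in range(len(x)):
        h = A_bar * h + B_bar * x[t]     # linear state update
        y[t] = C @ h                     # readout
    return y

# Example: 16-dimensional state over a short sequence.
T, N = 32, 16
y = diagonal_ssm(np.random.randn(T), np.full(N, 0.9), np.random.randn(N), np.random.randn(N))
print(y.shape)  # (32,)
```

Research papers on state space models include: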
- Badri Narayana Patro, Vijay Srinivas Agneeswaran, 24 Apr 2024, Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges, https://arxiv.org/abs/2404.16112
- 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, https://arxiv.org/abs/2404.14294
- Karan Goel, August 27, 2024, The On‑Device Intelligence Update https://cartesia.ai/blog/2024-08-27-on-device (On-device state space models.)
- Nicolas Stellwag, 2024, Structured State Space Models, https://nicolasstellwag.com/download/structured_SSMs.pdf
- Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, Guohao Dai, 6 Oct 2024, Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective, https://arxiv.org/abs/2410.04466
- Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, Shiyu Li, Ziru Li, Mingyuan Ma, Tergel Molom-Ochir, Benjamin Morris, Haoxuan Shan, Jingwei Sun, Yitu Wang, Chiyue Wei, Xueying Wu, Yuhao Wu, Hao Frank Yang, Jingyang Zhang, Junyao Zhang, Qilin Zheng, Guanglei Zhou, Hai (Helen)Li, Yiran Chen, 8 Oct 2024. A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models, https://arxiv.org/abs/2410.07265
- Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Shih-Yang Liu, Matthijs Van Keirsbilck, Min-Hung Chen, Yoshi Suhara, Yingyan Lin, Jan Kautz, Pavlo Molchanov, 20 Nov 2024, Hymba: A Hybrid-head Architecture for Small Language Models, https://arxiv.org/abs/2411.13676
- Yash Akhauri, Safeen Huda, Mohamed S. Abdelfattah, 26 Nov 2024, Attamba: Attending To Multi-Token States, https://arxiv.org/abs/2411.17685
- Rui Pan, Zhuang Wang, Zhen Jia, Can Karakus, Luca Zancato, Tri Dao, Ravi Netravali, Yida Wang, 28 Nov 2024, Marconi: Prefix Caching for the Era of Hybrid LLMs, https://arxiv.org/abs/2411.19379 (Prefix caching applied to hybrid SSM-Transformer LLMs.)
- Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
- Jonas Ulmen, Ganesh Sundaram, and Daniel Görges, 14 Aug 2025, Learning State-Space Models of Dynamic Systems from Arbitrary Data using Joint Embedding Predictive Architectures, https://arxiv.org/abs/2508.10489
- Xiaochun Lei, Siqi Wu, Weilin Wu, Zetao Jiang, 24 Jul 2025, MambaNeXt-YOLO: A Hybrid State Space Model for Real-time Object Detection, https://arxiv.org/abs/2506.03654
- Sen Lu, Xiaoyu Zhang, Mingtao Hu, Eric Yeu-Jer Lee, Soohyeon Kim, Wei D. Lu, 18 Jul 2025, State Space Models Naturally Produce Traveling Waves, Time Cells, and Scale to Abstract Cognitive Functions, https://arxiv.org/abs/2507.13638
- Saptarshi Mitra, Rachid Karami, Haocheng Xu, Sitao Huang, Hyoukjun Kwon, 19 Jul 2025, Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length, https://arxiv.org/abs/2507.12442
- A. Quadir, M. Tanveer, 8 Aug 2025, Hypergraph Neural Network with State Space Models for Node Classification, https://arxiv.org/abs/2508.06587
- Yizhuo Wu, Francesco Fioranelli, Chang Gao, 27 Jul 2025, RadMamba: Efficient Human Activity Recognition through Radar-based Micro-Doppler-Oriented Mamba State-Space Model, https://arxiv.org/abs/2504.12039
- Shiva Raja, Cansu Demirkiran, Aakash Sarkar, Milos Popovic, Ajay Joshi, 29 Jul 2025, Systolic Array-based Accelerator for State-Space Models, https://arxiv.org/abs/2507.21394
- Yifan Yu, Shengjie Xiu, Daniel P. Palomar, 30 Jul 2025, Robust Filtering and Learning in State-Space Models: Skewness and Heavy Tails Via Asymmetric Laplace Distribution, https://arxiv.org/abs/2507.22343
- Hiroki Sakamoto and Kazuhiro Sato, 30 Jul 2025, Compression Method for Deep Diagonal State Space Model Based on $H^2$ Optimal Reduction, https://arxiv.org/abs/2507.10078
- Julian Lemmel, Manuel Kranzl, Adam Lamine, Philipp Neubauer, Radu Grosu, Sophie Neubauer, 1 Aug 2025, Online Fine-Tuning of Carbon Emission Predictions using Real-Time Recurrent Learning for State Space Models, https://arxiv.org/abs/2508.00804
- Joshua Dimasaka, Christian Geiß, Emily So, 2 Aug 2025, GraphVSSM: Graph Variational State-Space Model for Probabilistic Spatiotemporal Inference of Dynamic Exposure and Vulnerability for Regional Disaster Resilience Assessment, https://arxiv.org/abs/2508.01310
- Federico Arangath Joseph, Kilian Konstantin Haefeli, Noah Liniger and Caglar Gulcehre, 3 Aug 2025, HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context, https://arxiv.org/abs/2407.09375
- Yiyi Wang, Jian'an Zhang, Hongyi Duan, Haoyang Liu, Qingyang Li, 5 Aug 2025, Rethinking Selectivity in State Space Models: A Minimal Predictive Sufficiency Approach, https://arxiv.org/abs/2508.03158
- Leon Götz, Marcel Kollovieh, Stephan Günnemann, Leo Schwinn, 5 Aug 2025, Efficient Time Series Processing for Transformers and State-Space Models through Token Merging, https://arxiv.org/abs/2405.17951
- Yuannuo Feng, Wenyong Zhou, Yuexi Lyu, Hanjie Liu, Zhengwu Liu, Ngai Wong, Wang Kang, 16 Aug 2025, HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware, https://arxiv.org/abs/2508.11935
- Chenhui Xu, Dancheng Liu, Yuting Hu, Jiajie Li, Ruiyang Qin, Qingxiao Zheng, Jinjun Xiong, 16 Aug 2025, Sub-Sequential Physics-Informed Learning with State Space Model, https://arxiv.org/abs/2502.00318
- Zhihao Zhan, Jianan Zhao, Zhaocheng Zhu, Jian Tang, 16 Aug 2025, Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention, https://arxiv.org/abs/2507.00449
- Hongfan Gao, Wangmeng Shen, Xiangfei Qiu, Ronghui Xu, Jilin Hu and Bin Yang, 19 Aug 2025, SSD-TS: Exploring the Potential of Linear State Space Models for Diffusion Models in Time Series Imputation, https://arxiv.org/abs/2410.13338
- Peiming Li, Ziyi Wang, Yulin Yuan, Hong Liu, Xiangming Meng, Junsong Yuan, Mengyuan Liu, 20 Aug 2025, UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling, https://arxiv.org/abs/2508.14604
- Trinayan Baruah, Kaustubh Shivdikar, Sara Prescott, and David Kaeli, 25 Aug 2025, Characterizing the Behavior of Training Mamba-based State Space Models on GPUs, https://arxiv.org/abs/2508.17679
- Eric Alsmann, Martin Lange, 25 Aug 2025, The Computational Complexity of Satisfiability in State Space Models, https://arxiv.org/abs/2508.18162
- Xavier Gonzalez, Leo Kozachkov, David M. Zoltowski, Kenneth L. Clarkson, Scott W. Linderman, 22 Aug 2025, Predictability Enables Parallelization of Nonlinear State Space Models, https://arxiv.org/abs/2508.16817
- Siddharth Chaudhary, Bennett Browning, 20 Aug 2025, Hydra: A 1.6B-Parameter State-Space Language Model with Sparse Attention, Mixture-of-Experts, and Memory, https://arxiv.org/abs/2508.15099
- Behnoush Khavari, Mehran Shakerinava, Jayesh Khullar, Jerry Huang, François Rivest, Siamak Ravanbakhsh, Sarath Chandar, 10 Aug 2025, Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs, https://arxiv.org/abs/2508.07395
Hyena Architecture
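The Hyena architecture (introduced in the 2023 "Hyena Hierarchy" paper by Poli et al.) replaces attention with long implicit convolutions interleaved with data-controlled gating, bringing the per-layer cost down to roughly O(L log L) in sequence length via FFT-based convolution. The snippet below shows only the FFT-based causal long convolution at the heart of the idea; it is a minimal sketch, not the full Hyena operator, which also uses implicitly parameterized filters and multiplicative gating.

```python
import numpy as np

def causal_long_conv(x, h):
    """Causal convolution of a sequence x with a filter h of the same length,
    computed in O(L log L) with the FFT (zero-padded to avoid circular wrap-around)."""
    L = len(x)
    n = 2 * L                              # pad so circular convolution equals linear convolution
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)
    return y[:L]                           # keep only the causal part

L = 8
x = np.random.randn(L)
h = np.exp(-0.5 * np.arange(L))            # a simple decaying filter as a stand-in
print(np.allclose(causal_long_conv(x, h), np.convolve(x, h)[:L]))  # True
```

Research papers related to the Hyena architecture include: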
- Pierre-David Letourneau, Manish Kumar Singh, Hsin-Pai Cheng, Shizhong Han, Yunxiao Shi, Dalton Jones, Matthew Harper Langston, Hong Cai, Fatih Porikli, 16 Jul 2024, PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer, https://arxiv.org/abs/2407.11306
- 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, https://arxiv.org/abs/2312.00678
- H. Xu, Z. Bi, H. Tseng, X. Song, P. Feng, From Transformers to the Future: An In-Depth Exploration of Modern Language Model Architectures, https://osf.io/n8r5j/download
Mamba Architecture
The Mamba architecture is a selective State Space Model (SSM): the discretization step size and the input/output projections are computed from the current input, so the model can decide on a per-token basis what to keep in its state and what to forget, while retaining linear-time sequence processing. A heavily simplified sketch of the selective scan appears after the reference list below. Research papers on Mamba include:
- Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar, 10 May 2024, Linearizing Large Language Models, https://arxiv.org/abs/2405.06640 Code: https://github.com/TRI-ML/linear_open_lm
- Badri Narayana Patro, Vijay Srinivas Agneeswaran, 24 Apr 2024, Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges, https://arxiv.org/abs/2404.16112
- Zeyu Wang, Chen Li, Huiying Xu, Xinzhong Zhu, 9 Jun 2024, Mamba YOLO: SSMs-Based YOLO For Object Detection, https://arxiv.org/abs/2406.05835
- Mehmet Hamza Erol, Arda Senocak, Jiu Feng, Joon Son Chung, 5 Jun 2024, Audio Mamba: Bidirectional State Space Model for Audio Representation Learning, https://arxiv.org/abs/2406.03344
- Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Youqiang Zhang, Junshi Huang, 3 Jun 2024, Dimba: Transformer-Mamba Diffusion Models, https://arxiv.org/abs/2406.01159
- Radar AI, Mar 2024, An Introduction to the Mamba LLM Architecture: A New Paradigm in Machine Learning, https://www.datacamp.com/tutorial/introduction-to-the-mamba-llm-architecture
- Albert Gu, Tri Dao, 31 May 2024 (v2), Mamba: Linear-Time Sequence Modeling with Selective State Spaces, https://arxiv.org/abs/2312.00752
- Xiaogang Jia, Qian Wang, Atalay Donat, Bowen Xing, Ge Li, Hongyi Zhou, Onur Celik, Denis Blessing, Rudolf Lioutikov, Gerhard Neumann, 12 Jun 2024, MaIL: Improving Imitation Learning with Mamba, https://arxiv.org/abs/2406.08234
- Marko Vidrih, Jun 7, 2024, Mamba-2 is Out: Can it replace Transformers? https://vidrihmarko.medium.com/mamba-2-is-out-can-it-replace-transformers-6cfb3372ea39
- Albert Gu, Tri Dao, State Space Duality (Mamba-2) Part I - The Model, May 31, 2024, https://goombalab.github.io/blog/2024/mamba2-part1-model/
- azhar, Dec 29, 2023, Decoding Mamba: The Next Big Leap in AI Sequence Modeling, https://medium.com/ai-insights-cobet/decoding-mamba-the-next-big-leap-in-ai-sequence-modeling-ef3908060cb8
- Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro, June 2024, An Empirical Study of Mamba-based Language Models, https://arxiv.org/abs/2406.07887 https://ui.adsabs.harvard.edu/abs/2024arXiv240607887W/abstract
- 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, https://arxiv.org/abs/2404.14294
- Xinji Mai, Zeng Tao, Junxiong Lin, Haoran Wang, Yang Chang, Yanlan Kang, Yan Wang, Wenqiang Zhang, 27 Jun 2024, From Efficient Multimodal Models to World Models: A Survey, https://arxiv.org/abs/2407.00118 (A survey of multimodal models with coverage of many optimization techniques.)
- Pierre-David Letourneau, Manish Kumar Singh, Hsin-Pai Cheng, Shizhong Han, Yunxiao Shi, Dalton Jones, Matthew Harper Langston, Hong Cai, Fatih Porikli, 16 Jul 2024, PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer, https://arxiv.org/abs/2407.11306
- 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, https://arxiv.org/abs/2312.00678
- Haohao Qu, Liangbo Ning, Rui An, Wenqi Fan, Tyler Derr, Xin Xu, Qing Li, 2 Aug 2024, A Survey of Mamba, https://arxiv.org/abs/2408.01129
- Jingwei Zuo, Maksim Velikanov, Dhiya Eddine, Ilyas Chahed, Younes Belkada, Guillaume Kunsch, August 12, 2024, Welcome FalconMamba: The first strong attention-free 7B model, https://huggingface.co/blog/falconmamba
- Jamba Team, 22 Aug 2024, Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, https://arxiv.org/abs/2408.12570
- Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao, 9 Aug 2024 (v2), ReMamba: Equip Mamba with Effective Long-Sequence Modeling, https://arxiv.org/abs/2408.15496
- H. Xu, Z. Bi, H. Tseng, X. Song, P. Feng, From Transformers to the Future: An In-Depth Exploration of Modern Language Model Architectures, https://osf.io/n8r5j/download
- Shengkun Tang, Liqun Ma, Haonan Li, Mingjie Sun, Zhiqiang Shen, 18 Nov 2024, Bi-Mamba: Towards Accurate 1-Bit State Space Models, https://arxiv.org/abs/2411.11843
- Thanaphon Suwannaphong, Ferdian Jovan, Ian Craddock, Ryan McConville, 12 Dec 2024, Optimising TinyML with Quantization and Distillation of Transformer and Mamba Models for Indoor Localisation on Edge Devices, https://arxiv.org/abs/2412.09289
- Mingjia Shi, Yuhao Zhou, Ruiji Yu, Zekai Li, Zhiyuan Liang, Xuanlei Zhao, Xiaojiang Peng, Tanmay Rajpurohit, Shanmukha Ramakrishna Vedantam, Wangbo Zhao, Kai Wang, Yang You, 17 Dec 2024, Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training, https://arxiv.org/abs/2412.12496
- HF, December 18, 2024, Bamba: Inference-Efficient Hybrid Mamba2 Model, https://huggingface.co/blog/bamba
- Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
- Zhenxuan Yu, Yutaka Matsuo, Takeshi Kojima, Yusuke Iwasawa, Jan 2025, Slender-Mamba: Fully Quantized Mamba in 1.58 Bits From Head to Toe, Proceedings of the 31st International Conference on Computational Linguistics, pages 4715–4724, January 19–24, 2025, Association for Computational Linguistics, https://aclanthology.org/2025.coling-main.316.pdf
- Zukang Xu, Yuxuan Yue, Xing Hu, Zhihang Yuan, Zixu Jiang, Zhixuan Chen, Jiangyong Yu, Chen Xu, Sifan Zhou, Dawei Yang, 23 Jan 2025, MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods, https://arxiv.org/abs/2501.13484
- Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu, 24 Feb 2025, Thus Spake Long-Context Large Language Model, https://arxiv.org/abs/2502.17129 (Impressive survey of many techniques to improve efficiency and accuracy of long context processing in both inference and training, covering text, video and multimodal models.)
- Jiyong Kim, Jaeho Lee, Jiahao Lin, Alish Kanani, Miao Sun, Umit Y. Ogras, and Jaehyun Park, 14 Aug 2025, eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing, https://arxiv.org/abs/2508.10370
- Farnoush Bayatmakou, Reza Taleei, Nicole Simone, Arash Mohammadi, 23 Jul 2025, Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography, https://arxiv.org/abs/2507.17662
- Hanwen Liu, Yifeng Gong, Zuwei Yan, Zeheng Zhuang, Jiaxuan Lu, 21 Jul 2025, MSGM: A Multi-Scale Spatiotemporal Graph Mamba for EEG Emotion Recognition, https://arxiv.org/abs/2507.15914
- Osama Hardan, Omar Elshenhabi, Tamer Khattab, Mohamed Mabrok, 15 Jul 2025, Flatten Wisely: How Patch Order Shapes Mamba-Powered Vision for MRI Segmentation, https://arxiv.org/abs/2507.13384
- Andrew H. Zhang, Alex He-Mo, Richard Fei Yin, Chunlin Li, Yuzhi Tang, Dharmendra Gurve, Veronique van der Horst, Aron S. Buchman, Nasim Montazeri Ghahjaverestan, Maged Goubran, Bo Wang, Andrew S. P. Lim, 9 Aug 2025, Mamba-based Deep Learning Approach for Sleep Staging on a Wireless Multimodal Wearable System without Electroencephalography, https://arxiv.org/abs/2412.15947
- Ze Rong, ZiYue Zhao, Zhaoxin Wang, Lei Ma, 26 Jul 2025, FaRMamba: Frequency-based learning and Reconstruction aided Mamba for Medical Segmentation, https://arxiv.org/abs/2507.20056
- Baijiong Lin, Weisen Jiang, Pengguang Chen, Shu Liu, and Ying-Cong Chen, 26 Jul 2025, MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders, https://arxiv.org/abs/2408.15101
- Yizhuo Wu, Francesco Fioranelli, Chang Gao, 27 Jul 2025, RadMamba: Efficient Human Activity Recognition through Radar-based Micro-Doppler-Oriented Mamba State-Space Model, https://arxiv.org/abs/2504.12039
- Aotao Wang, Haikuo Shao, Shaobo Ma, Zhongfeng Wang, 28 Jul 2025, FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization, https://arxiv.org/abs/2505.18975
- Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Mengfei Shi, Xia Xie, Shengyong Chen, 31 Jul 2025, LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural Cracks, https://arxiv.org/abs/2507.22477
- Alice Zhang, Chao Li, 29 Jul 2025, Adaptive State-Space Mamba for Real-Time Sensor Data Anomaly Detection, https://arxiv.org/abs/2503.22743
- Jiaxuan Lu, Yuhui Lin, Junyan Shi, Fang Yan, Dongzhan Zhou, Yue Gao, Xiaosong Wang, 4 Aug 2025, Hypergraph Mamba for Efficient Whole Slide Image Understanding, https://arxiv.org/abs/2505.17457
- Meng Zhou, Farzad Khalvati, 5 Aug 2025, ClinicalFMamba: Advancing Clinical Assessment using Mamba-based Multimodal Neuroimaging Fusion, https://arxiv.org/abs/2508.03008
- Siyi Lu, Run Liu, Dongsheng Yang, Lei He, 8 Aug 2025, ME$^3$-BEV: Mamba-Enhanced Deep Reinforcement Learning for End-to-End Autonomous Driving with BEV-Perception, https://arxiv.org/abs/2508.06074
- Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Zhiying Li, Guanggang Geng, Jian Weng, 6 Aug 2025, MambaITD: An Efficient Cross-Modal Mamba Network for Insider Threat Detection, https://arxiv.org/abs/2508.05695
- Zineddine Bettouche, Khalid Ali, Andreas Fischer, Andreas Kassler, 7 Aug 2025, HiSTM: Hierarchical Spatiotemporal Mamba for Cellular Traffic Forecasting, https://arxiv.org/abs/2508.09184
- Xi Xuan, Zimo Zhu, Wenxin Zhang, Yi-Cheng Lin, Tomi Kinnunen, 12 Aug 2025, Fake-Mamba: Real-Time Speech Deepfake Detection Using Bidirectional Mamba as Self-Attention's Alternative, https://arxiv.org/abs/2508.09294
- Honggang Jia, Nan Cheng, Xiucheng Wang, Conghao Zhou, Ruijin Sun, Xuemin (Sherman) Shen, 28 Jul 2025, RadioMamba: Breaking the Accuracy-Efficiency Trade-off in Radio Map Construction via a Hybrid Mamba-UNet, https://arxiv.org/abs/2508.09140
- Haolong Chen, Liang Zhang, Zhengyuan Xin, Guangxu Zhu, 17 Aug 2025, STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction, https://arxiv.org/abs/2508.12247
- Jun Zeng, Yannan Huang, Elif Keles, Halil Ertugrul Aktas, Gorkem Durak, Nikhil Kumar Tomar, Quoc-Huy Trinh, Deepak Ranjan Nayak, Ulas Bagci, Debesh Jha, 17 Aug 2025, SRMA-Mamba: Spatial Reverse Mamba Attention Network for Pathological Liver Segmentation in MRI Volumes, https://arxiv.org/abs/2508.12410
- NVIDIA: Aarti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, Adi Renduchintala, Adithya Renduchintala, Aditya Malte, Akhiad Bercovich, Akshay Hazare, Alejandra Rico, Aleksander Ficek, Alex Kondratenko, Alex Shaposhnikov, Ali Taghibakhshi, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amy Shen, Andrew Tao, Ann Guan, Anna Shors, Anubhav Mandarwal, Arham Mehta, Arun Venkatesan, Ashton Sharabiani, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Banghua Zhu, Barnaby Simkin, Bilal Kartal, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Brian Yu, Bryan Catanzaro, Charles Wang, Charlie Truong, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Christian Munley, Christopher Parisien, Dan Su, Daniel Afrimi, Daniel Korzekwa, Daniel Rohrer, Daria Gitman, et al. (161 additional authors not shown), 20 Aug 2025, NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model, https://arxiv.org/abs/2508.14444
- Trinayan Baruah, Kaustubh Shivdikar, Sara Prescott, and David Kaeli, 25 Aug 2025, Characterizing the Behavior of Training Mamba-based State Space Models on GPUs, https://arxiv.org/abs/2508.17679
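For intuition, the toy NumPy sketch below shows a single-readout selective scan in which the step size and the B and C projections all depend on the current input (the "selection" mechanism); the shapes, names, and parameterization are illustrative only and do not match the real Mamba kernels, which are multi-channel, fused, and hardware-aware.

```python
import numpy as np

def selective_scan(x, A, W_b, W_c, W_dt):
    """Toy selective SSM: the step size and the input/output projections
    all depend on the current input x[t].

    x: (T, D) input sequence; A: (N,) negative-real diagonal state matrix;
    W_b, W_c: (D, N) projections; W_dt: (D,) step-size projection.
    Returns y of shape (T,) -- a single readout channel for simplicity.
    """
    T, D = x.shape
    N = A.shape[0]
    h = np.zeros(N)
    y = np.zeros(T)
    for t in range(T):
        dt = np.log1p(np.exp(x[t] @ W_dt))    # softplus keeps the step size positive
        B = x[t] @ W_b                        # input-dependent input projection, shape (N,)
        C = x[t] @ W_c                        # input-dependent output projection, shape (N,)
        A_bar = np.exp(dt * A)                # per-token discretization of the state matrix
        h = A_bar * h + dt * B * x[t].mean()  # simplified update (real Mamba is per-channel)
        y[t] = C @ h
    return y

T, D, N = 16, 4, 8
y = selective_scan(np.random.randn(T, D), -np.linspace(0.5, 2.0, N),
                   np.random.randn(D, N), np.random.randn(D, N), np.random.randn(D))
print(y.shape)  # (16,)
```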
Knowledge Graph AI Architectures
Knowledge graphs represent structured information as a directed, labeled graph of entities and relationships, typically stored as subject-predicate-object triples. This additional structural information can improve LLM results, but it is not easy to integrate graph-structured data into the sequential text expected by an LLM. One particular usage of knowledge graphs is to extend RAG architectures, commonly called "GraphRAG."
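A common bridge is to retrieve a small, relevant subgraph and linearize its triples into plain text that is placed in the LLM's prompt, which is the basic move behind most GraphRAG pipelines. A minimal sketch with a made-up toy graph (no external libraries; the graph contents and prompt wording are illustrative only):

```python
# Toy example: linearize knowledge-graph triples into prompt text for an LLM.
triples = [
    ("Transformer", "introduced_in", "2017"),
    ("Mamba", "is_a", "state space model"),
    ("Mamba", "alternative_to", "Transformer"),
]

def linearize(triples, subject):
    """Select triples about one subject and render them as natural-language facts."""
    facts = [f"{s} {p.replace('_', ' ')} {o}." for (s, p, o) in triples if s == subject]
    return " ".join(facts)

context = linearize(triples, "Mamba")
prompt = f"Use these facts to answer the question.\nFacts: {context}\nQuestion: What is Mamba?"
print(prompt)
```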
Research papers on Knowledge Graphs in AI include:
- Shenzhe Zhu, 6 May 2024, Exploring knowledge graph-based neural-symbolic system from application perspective, https://arxiv.org/abs/2405.03524 (Integrate knowledge graph and symbolic reasoning into neural networks.)
- GG Klager, March 12, 2024, Is GPT fit for KGQA? Master's Thesis, Department of Information Systems & Operations Management, Vienna University of Economics and Business, https://aic.ai.wu.ac.at/~polleres/supervised_theses/Gerhard_Klager_MSc_2024.pdf
- Louis-François Bouchard, Aug 12, 2024, When to Use GraphRAG, https://louisbouchard.substack.com/p/when-to-use-graphrag
- Bhaskarjit Sarmah, Benika Hall, Rohan Rao, Sunil Patel, Stefano Pasquali, Dhagash Mehta, 9 Aug 2024, HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction, https://arxiv.org/abs/2408.04948
- Dr. Ashish Bamania, Aug 2024, ‘MedGraphRAG’ Is A Complete Game Changer For AI In Medicine: A deep-dive into how RAG, GraphRAG, and MedGraphRAG work and how they significantly improve the performance of LLM responses in Medicine, https://levelup.gitconnected.com/medgraphrag-is-a-complete-game-changer-for-ai-in-medicine-c6b41b0effd6
- Junde Wu, Jiayuan Zhu, Yunli Qi, 8 Aug 2024, Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation, https://arxiv.org/abs/2408.04187 Code: https://github.com/MedicineToken/Medical-Graph-RAG/tree/main
- Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao, 26 May 2024, GRAG: Graph Retrieval-Augmented Generation, https://arxiv.org/abs/2405.16506
- Philip Rathle, Jul 11, 2024, The GraphRAG Manifesto: Adding Knowledge to GenAI, https://neo4j.com/blog/graphrag-manifesto/
- Microsoft, Aug 2024 (accessed), GraphRAG: A modular graph-based Retrieval-Augmented Generation (RAG) system, https://github.com/microsoft/graphrag
- Harry Li, Gabriel Appleby, Ashley Suh, 7 Jun 2024, LinkQ: An LLM-Assisted Visual Interface for Knowledge Graph Question-Answering, https://arxiv.org/abs/2406.06621
- Xuan Chen, Tong Lu, Zhichun Wang, 6 Dec 2024, LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs, https://arxiv.org/abs/2412.04690
- Lei Liang, Mengshu Sun, Zhengke Gui, Zhongshu Zhu, Zhouyu Jiang, Ling Zhong, Yuan Qu, Peilong Zhao, Zhongpu Bo, Jin Yang, Huaidong Xiong, Lin Yuan, Jun Xu, Zaoyang Wang, Zhiqiang Zhang, Wen Zhang, Huajun Chen, Wenguang Chen, Jun Zhou, 26 Sep 2024 (v3), KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation, https://arxiv.org/abs/2409.13731
- Mayi Xu, Yunfeng Ning, Yongqi Li, Jianhao Chen, Jintao Wen, Yao Xiao, Shen Zhou, Birong Pan, Zepeng Bao, Xin Miao, Hankun Kang, Ke Sun, Tieyun Qian, 2 Jan 2025, Reasoning based on symbolic and parametric knowledge bases: a survey, https://arxiv.org/abs/2501.01030 (Extensive survey of reasoning from CoT to knowledge graphs to table-based reasoning.)
- Alhassan Mumuni, Fuseini Mumuni, 6 Jan 2025, Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches, https://arxiv.org/abs/2501.03151
- Aidan Hogan, Xin Luna Dong, Denny Vrandečić, Gerhard Weikum, 12 Jan 2025, Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions, https://arxiv.org/abs/2501.06699 (Classic search engines versus LLMs with knowledge graphs with a categorization of search use cases.)
- Tiesunlong Shen, Jin Wang, Xuejie Zhang, Erik Cambria, Jan 2025, Reasoning with Trees: Faithful Question Answering over Knowledge Graph, Proceedings of the 31st International Conference on Computational Linguistics, pages 3138–3157, January 19–24, 2025, Association for Computational Linguistics, https://aclanthology.org/2025.coling-main.211.pdf
- Yuxing Lu, Sin Yee Goi, Xukai Zhao, Jinzhuo Wang, 22 Jan 2025 (v2), Biomedical Knowledge Graph: A Survey of Domains, Tasks, and Real-World Applications, https://arxiv.org/abs/2501.11632
- Maria Korolov, 29 Jan 2025, Knowledge graphs: the missing link in enterprise AI, CIO, https://www.cio.com/article/3808569/knowledge-graphs-the-missing-link-in-enterprise-ai.html
- Junde Wu, Jiayuan Zhu, Yuyuan Liu, 7 Feb 2025, Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research, https://arxiv.org/abs/2502.04644 https://github.com/theworldofagents/Agentic-Reasoning
- Pengcheng Huang, Zhenghao Liu, Yukun Yan, Xiaoyuan Yi, Hao Chen, Zhiyuan Liu, Maosong Sun, Tong Xiao, Ge Yu, Chenyan Xiong, 21 Feb 2025, PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning, https://arxiv.org/abs/2502.15543
- Han Zhang, Langshi Zhou, Hanfang Yang, 20 Feb 2025, Learning to Retrieve and Reason on Knowledge Graph through Active Self-Reflection, https://arxiv.org/abs/2502.14932
- Anastasios Nentidis, Charilaos Akasiadis, Angelos Charalambidis, Alexander Artikis, 26 Feb 2025, Dealing with Inconsistency for Reasoning over Knowledge Graphs: A Survey, https://arxiv.org/abs/2502.19023
- R Chen, Mar 2025, Retrieval-Augmented Generation with Knowledge Graphs: A Survey, Computer Science Undergraduate Conference 2025, https://openreview.net/pdf?id=ZikTuGY28C
- Khorashadizadeh Hanieh, Amara Fatima Zahra, Ezzabady Morteza, Ieng Frédéric, Tiwari Sanju, et al., Research Trends for the Interplay between Large Language Models and Knowledge Graphs. 1st International Workshop on Data Management Opportunities in Unifying Large Language Models + Knowledge Graph, Workshop at the 50th International Conference on Very Large Data Bases (VLDB 2024), Aug 2024, Guangzhou, China. hal-04770598 https://hal.science/hal-04770598/document
- Ziheng Zhang, Zhenxi Lin, Yefeng Zheng, and Xian Wu. 2025. How much Medical Knowledge do LLMs have? An Evaluation of Medical Knowledge Coverage for LLMs. In Proceedings of the ACM on Web Conference 2025 (WWW '25). Association for Computing Machinery, New York, NY, USA, 5330–5341. https://doi.org/10.1145/3696410.3714535 https://dl.acm.org/doi/abs/10.1145/3696410.3714535 https://dl.acm.org/doi/pdf/10.1145/3696410.3714535
- Chuzhan Hao, Wenfeng Feng, Yuewei Zhang, Hao Wang, 23 Jul 2025, DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning, https://arxiv.org/abs/2507.17365
- Qikai Wei and Huansheng Ning and Chunlong Han and Jianguo Ding, 7 Jul 2025, A Query-Aware Multi-Path Knowledge Graph Fusion Approach for Enhancing Retrieval-Augmented Generation in Large Language Models, https://arxiv.org/abs/2507.16826
- Mingda Zhang, Na Zhao, Jianglong Qin, Guoyu Ye, Ruixiang Tang, 22 Jul 2025, A Multi-granularity Concept Sparse Activation and Hierarchical Knowledge Graph Fusion Framework for Rare Disease Diagnosis, https://arxiv.org/abs/2507.08529
- Junming Liu, Siyuan Meng, Yanting Gao, Song Mao, Pinlong Cai, Guohang Yan, Yirong Chen, Zilin Bian, Ding Wang, Botian Shi, 24 Jul 2025, Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning, https://arxiv.org/abs/2503.12972
- Bhishma Dedhia, Yuval Kansal, Niraj K. Jha, 18 Jul 2025, Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need, https://arxiv.org/abs/2507.13966
- Arief Purnama Muharram and Ayu Purwarianti, 21 Jul 2025, Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language, https://arxiv.org/abs/2409.00061
- Xueli Pan, Victor de Boer, Jacco van Ossenbruggen, 14 Aug 2025, FIRESPARQL: A LLM-based Framework for SPARQL Query Generation over Scholarly Knowledge Graphs, https://arxiv.org/abs/2508.10467
- Rishi Parekh, Saisubramaniam Gopalakrishnan, Zishan Ahmad, Anirudh Deodhar, 23 Jul 2025, Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance, https://arxiv.org/abs/2507.17273
- Aleksandr Perevalov, Andreas Both, 22 Jul 2025, Text-to-SPARQL Goes Beyond English: Multilingual Question Answering Over Knowledge Graphs through Human-Inspired Reasoning, https://arxiv.org/abs/2507.16971
- Haoran Jiang, Shaohan Shi, Yunjie Yao, Chang Jiang, Quan Li, 23 Jul 2025, HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery, https://arxiv.org/abs/2507.17209
- Jianhao Chen, Junyang Ren, Wentao Ding, Haoyuan Ouyang, Wei Hu, Yuzhong Qu, 23 Jul 2025, Conflict Detection for Temporal Knowledge Graphs:A Fast Constraint Mining Algorithm and New Benchmarks, https://arxiv.org/abs/2312.11053
- Adrian Kaiser and Claudiu Leoveanu-Condrei and Ryan Gold and Marius-Constantin Dinu and Markus Hofmarcher, 23 Jul 2025, HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs, https://arxiv.org/abs/2507.15917
- Jean Lelong, Adnane Errazine and Annabelle Blangero, 22 Jul 2025, Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications, https://arxiv.org/abs/2507.16507
- Mingda Zhang, Na Zhao, Jianglong Qing, Qing xu, Kaiwen Pan, Ting luo, 22 Jul 2025, An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis, https://arxiv.org/abs/2507.07893
- Yuxin Zhang, Xi Wang, Mo Hu, Zhenyu Zhang (Department of Construction Science, College of Architecture, Texas A&M University, College Station, USA), 18 Jul 2025, BifrostRAG: Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety, https://arxiv.org/abs/2507.13625
- Nur A Zarin Nishat, Andrea Coletta, Luigi Bellomarini, Kossi Amouzouvi, Jens Lehmann, Sahar Vahdati, 17 Jul 2025, Aligning Knowledge Graphs and Language Models for Factual Accuracy, https://arxiv.org/abs/2507.13411
- Hosein Azarbonyad, Zi Long Zhu, Georgios Cheirmpos, Zubair Afzal, Vikrant Yadav, Georgios Tsatsaronis, 18 Jul 2025, Question-Answer Extraction from Scientific Articles Using Knowledge Graphs and Large Language Models, https://arxiv.org/abs/2507.13827
- Akash Bajwa and Chia Jeng Yang, May 27, 2024, The RAG Stack: Featuring Knowledge Graphs: Reducing Hallucinations To Make LLMs Production-Grade With Complex RAG, https://akashbajwa.substack.com/p/the-rag-stack-featuring-knowledge
- Igor Novikov, Jul 23, 2024, RAG Architecture: Advanced RAG, https://pub.towardsai.net/rag-architecture-advanced-rag-3fea83e0d189
- Junda Wu, Xintong Li, Ruoyu Wang, Yu Xia, Yuxin Xiong, Jianing Wang, Tong Yu, Xiang Chen, Branislav Kveton, Lina Yao, Jingbo Shang, Julian McAuley, 31 Oct 2024, OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models, https://arxiv.org/abs/2410.23703
- Jaikrishna Manojkumar Patil, Nathaniel Lee, Al Mehdi Saadat Chowdhury, YooJung Choi, Paulo Shakarian, 8 Aug 2025, Probabilistic Circuits for Knowledge Graph Completion with Reduced Rule Sets, https://arxiv.org/abs/2508.06706
- Yongkang Xiao, Rui Zhang, 8 Aug 2025, HERGC: Heterogeneous Experts Representation and Generative Completion for Multimodal Knowledge Graphs, https://arxiv.org/abs/2506.00826
- Yuzhang Xie, Xu Han, Ran Xu, Xiao Hu, Jiaying Lu, Carl Yang, 26 Jul 2025, HypKG: Hypergraph-based Knowledge Graph Contextualization for Precision Healthcare, https://arxiv.org/abs/2507.19726
- Alec Scully, Cameron Stockton, and Forrest Hare, 26 Jul 2025, Integrating Activity Predictions in Knowledge Graphs, https://arxiv.org/abs/2507.19733
- Keyan Ding, Jing Yu, Junjie Huang, Yuchen Yang, Qiang Zhang, Huajun Chen, 27 Jul 2025, SciToolAgent: A Knowledge Graph-Driven Scientific Agent for Multi-Tool Integration, https://arxiv.org/abs/2507.20280
- Jiajun Liu, Wenjun Ke, Peng Wang, Yao He, Ziyu Shang, Guozheng Li, Zijie Xu, and Ke Ji, 28 Jul 2025, Unlearning of Knowledge Graph Embedding via Preference Optimization, https://arxiv.org/abs/2507.20566
- Lijian Li, 28 Jul 2025, Complementarity-driven Representation Learning for Multi-modal Knowledge Graph Completion, https://arxiv.org/abs/2507.20620
- Xueyao Wan, Hang Yu, 28 Jul 2025, MMGraphRAG: Bridging Vision and Language with Interpretable Multimodal Knowledge Graphs, https://arxiv.org/abs/2507.20804
- Enjun Du, Siyi Liu, Yongqi Zhang, 28 Jul 2025, Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning, https://arxiv.org/abs/2507.20498
- Wenbin Guo, Xin Wang, Jiaoyan Chen, Zhao Li and Zirui Chen, 28 Jul 2025, Ontology-Enhanced Knowledge Graph Completion using Large Language Models, https://arxiv.org/abs/2507.20643
- Muhammad Tayyab Khan, Lequn Chen, Wenhe Feng and Seung Ki Moon, 28 Jul 2025, Large Language Model Powered Decision Support for a Metal Additive Manufacturing Knowledge Graph, https://arxiv.org/abs/2505.20308
- Hao Ye, Mengshi Qi, Zhaohong Liu, Liang Liu and Huadong Ma, 29 Jul 2025, SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation, https://arxiv.org/abs/2507.21585
- Alessandro Lonardi and Samy Badreddine and Tarek R. Besold and Pablo Sanchez Martin, 29 Jul 2025, Unifying Post-hoc Explanations of Knowledge Graph Completions, https://arxiv.org/abs/2507.22951
- Nasim Shirvani-Mahdavi, Devin Wingfield, Amin Ghasemi, Chengkai Li, 31 Jul 2025, Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs, https://arxiv.org/abs/2507.23740
- Jiaxin Bai, Wei Fan, Qi Hu, Qing Zong, Chunyang Li, Hong Ting Tsang, Hongyu Luo, Yauwai Yim, Haoyu Huang, Xiao Zhou, Feng Qin, Tianshi Zheng, Xi Peng, Xin Yao, Huiwen Yang, Leijie Wu, Yi Ji, Gong Zhang, Renhai Chen, Yangqiu Song, 31 Jul 2025, AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora, https://arxiv.org/abs/2505.23628
- Tung-Wei Lin, Gabe Fierro, Han Li, Tianzhen Hong, Pierluigi Nuzzo, Alberto Sangiovanni-Vinentelli, 30 Jul 2025, Systematic Evaluation of Knowledge Graph Repair with Large Language Models, https://arxiv.org/abs/2507.22419
- Thanh Hoang-Minh, 30 Jul 2025, Graph Collaborative Attention Network for Link Prediction in Knowledge Graphs, https://arxiv.org/abs/2507.03947
- Antonis Klironomos, Baifan Zhou, Zhipeng Tan, Zhuoxun Zheng, Mohamed H. Gad-Elrab, Heiko Paulheim, Evgeny Kharlamov, 1 Aug 2025, ExeKGLib: A Platform for Machine Learning Analytics based on Knowledge Graphs, https://arxiv.org/abs/2508.00394
- Yuanyuan Liang, Xiaoman Wang, Tingyu Xie, and Lei Pan, 3 Aug 2025, ProKG-Dial: Progressive Multi-Turn Dialogue Construction with Domain Knowledge Graphs, https://arxiv.org/abs/2508.01869
- Hanchen Yang, Jiaqi Wang, Jiannong Cao, Wengen Li, Jialun Zheng, Yangning Li, Chunyu Miao, Jihong Guan, Shuigeng Zhou, and Philip S. Yu, 31 Jul 2025, OKG-LLM: Aligning Ocean Knowledge Graph with Observation Data via LLMs for Global Sea Surface Temperature Prediction, https://arxiv.org/abs/2508.00933
- Xiang Li, Penglei Sun, Wanyun Zhou, Zikai Wei, Yongqi Zhang, Xiaowen Chu, 1 Aug 2025, FinKario: Event-Enhanced Automated Construction of Financial Knowledge Graph, https://arxiv.org/abs/2508.00961
- Wei Zhou, Peng Sun, Xuanhe Zhou, Qianglei Zang, Ji Xu, Tieying Zhang, Guoliang Li, Fan Wu, 2 Aug 2025, DBAIOps: A Reasoning LLM-Enhanced Database Operation and Maintenance System using Knowledge Graphs, https://arxiv.org/abs/2508.01136
- Yang Zhao, Chengxiao Dai, Wei Zhuo, Tan Chuan Fu, Yue Xiu, Dusit Niyato, Jonathan Z. Low, Eugene Ho Hong Zhuang, Daren Zong Loong Tan, 3 Aug 2025, AGENTICT²S: Robust Text-to-SPARQL via Agentic Collaborative Reasoning over Heterogeneous Knowledge Graphs for the Circular Economy, https://arxiv.org/abs/2508.01815
- Linyu Li, Zhi Jin, Yuanpeng He, Dongming Jin, Yichi Zhang, Haoran Duan, Nyima Tash, 4 Aug 2025, Learning to Evolve: Bayesian-Guided Continual Knowledge Graph Embedding, https://arxiv.org/abs/2508.02426
- Xinjie Zhao, Moritz Blum, Fan Gao, Yingjian Chen, Boming Yang, Luis Marquez-Carpintero, Mónica Pina-Navarro, Yanran Fu, So Morikawa, Yusuke Iwasawa, Yutaka Matsuo, Chanjun Park, Irene Li, 5 Aug 2025, AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots, https://arxiv.org/abs/2508.02999
- Taine J. Elliott, Stephen P. Levitt, Ken Nixon and Martin Bekker, 5 Aug 2025, Data Overdose? Time for a Quadruple Shot: Knowledge Graph Construction using Enhanced Triple Extraction, https://arxiv.org/abs/2508.03438
- Yubo Wang, Shimin Di, Zhili Wang, Haoyang Li, Fei Teng, Hao Xin and Lei Chen, 5 Aug 2025, Understanding the Embedding Models on Hyper-relational Knowledge Graph, https://arxiv.org/abs/2508.03280
- Ge Shi, Kaiyu Huang, Guochen Feng, 5 Aug 2025, Long Story Generation via Knowledge Graph and Literary Theory, https://arxiv.org/abs/2508.03137
- Futian Wang, Yuhan Qiao, Xiao Wang, Fuling Wang, Yuxiang Zhang, Dengdi Sun, 5 Aug 2025, R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation, https://arxiv.org/abs/2508.03426
- Nandana Mihindukulasooriya, Niharika S. D'Souza, Faisal Chowdhury, Horst Samulowitz, 4 Aug 2025, Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study, https://arxiv.org/abs/2506.19773
- Hudson de Martim, 5 Aug 2025, A Foundational Schema.org Mapping for a Legal Knowledge Graph: Representing Brazilian Legal Norms as FRBR Works, https://arxiv.org/abs/2508.00827
- Ruochen Zhao, Simone Conia, Eric Peng, Min Li, Saloni Potdar, 6 Aug 2025, AgREE: Agentic Reasoning for Knowledge Graph Completion on Emerging Entities, https://arxiv.org/abs/2508.04118
- Qian Yong, Yanhui Li, Jialiang Shi, Yaguang Dou, Tian Qi, 6 Aug 2025, Enhancing Serendipity Recommendation System by Constructing Dynamic User Knowledge Graphs with Large Language Models, https://arxiv.org/abs/2508.04032
- Krzysztof Olejniczak, Xingyue Huang, Mikhail Galkin, İsmail İlkan Ceylan, 5 Aug 2025, One Model, Any Conjunctive Query: Graph Neural Networks for Answering Queries over Incomplete Knowledge Graphs, https://arxiv.org/abs/2409.13959
- Ge Chang, Jinbo Su, Jiacheng Liu, Pengfei Yang, Yuhao Shang, Huiwen Zheng, Hongli Ma, Yan Liang, Yuanchun Li, Yunxin Liu, 7 Aug 2025, GRAIL: Learning to Interact with Large Knowledge Graphs for Retrieval Augmented Reasoning, https://arxiv.org/abs/2508.05498
- Claudia d'Amato, Ivan Diliso, Nicola Fanizzi, Zafar Saeed, 7 Aug 2025, Enhancing PyKEEN with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models, https://arxiv.org/abs/2508.05587
- Xu Yuan, Liangbo Ning, Wenqi Fan, Qing Li, 7 Aug 2025, mKG-RAG: Multimodal Knowledge Graph-Enhanced RAG for Visual Question Answering, https://arxiv.org/abs/2508.05318
- Claudia d'Amato, Giuseppe Rubini, Francesco Didio, Donato Francioso, Fatima Zahra Amara, Nicola Fanizzi, 8 Aug 2025, Automated Creation of the Legal Knowledge Graph Addressing Legislation on Violence Against Women: Resource, Methodology and Lessons Learned, https://arxiv.org/abs/2508.06368
- Siamak Farshidi and Amir Saberhabibi and Behbod Eskafi and Niloofar Nikfarjam and Sadegh Eskandari and Slinger Jansen and Michel Chaudron and Bedir Tekinerdogan, 6 Aug 2025, Empirical Evaluation of AI-Assisted Software Package Selection: A Knowledge Graph Approach, https://arxiv.org/abs/2508.05693
- Congmin Min, Rhea Mathew, Joyce Pan, Sahil Bansal, Abbas Keshavarzi, Amar Viswanathan Kannan, 7 Aug 2025, Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems, https://arxiv.org/abs/2507.03226
- Dongzhuoran Zhou, Yuqicheng Zhu, Xiaxia Wang, Hongkuan Zhou, Yuan He, Jiaoyan Chen, Evgeny Kharlamov, Steffen Staab, 11 Aug 2025, What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge, https://arxiv.org/abs/2508.08344
- Roberto Barile, Claudia d'Amato, Nicola Fanizzi, 12 Aug 2025, GRainsaCK: a Comprehensive Software Library for Benchmarking Explanations of Link Prediction Tasks on Knowledge Graphs, https://arxiv.org/abs/2508.08815
- Bhavik Agarwal, Hemant Sunil Jomraj, Simone Kaplunov, Jack Krolick, Viktoria Rojkova, 13 Aug 2025, RAGulating Compliance: A Multi-Agent Knowledge Graph for Regulatory QA, https://arxiv.org/abs/2508.09893
- Yuheng Wang, Tianze Yu, Jiayue Cai, Sunil Kalia, Harvey Lui, Z. Jane Wang, Tim K. Lee, 13 Aug 2025, Integrating Clinical Knowledge Graphs and Gradient-Based Neural Systems for Enhanced Melanoma Diagnosis via the 7-Point Checklist, https://arxiv.org/abs/2407.16822
- Yifei Li, Lingling Zhang, Hang Yan, Tianzhe Zhao, Zihan Ma, Muye Huang, Jun Liu, 15 Aug 2025, SAGE: Scale-Aware Gradual Evolution for Continual Knowledge Graph Embedding, https://arxiv.org/abs/2508.11347
- Nasim Shirvani-Mahdavi, Chengkai Li, 14 Aug 2025, Rule2Text: A Framework for Generating and Evaluating Natural Language Explanations of Knowledge Graph Rules, https://arxiv.org/abs/2508.10971
- Duzhen Zhang, Zixiao Wang, Zhong-Zhi Li, Yahan Yu, Shuncheng Jia, Jiahua Dong, Haotian Xu, Xing Wu, Yingying Zhang, Tielin Zhang, Jie Yang, Xiuying Chen, Le Song, 17 Aug 2025, MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph, https://arxiv.org/abs/2508.12393
- Ziteng Hu, Yingjie Xia, Xiyuan Chen, Li Kuang, 18 Aug 2025, SecFSM: Knowledge Graph-Guided Verilog Code Generation for Secure Finite State Machines in Systems-on-Chip, https://arxiv.org/abs/2508.12910
- Hung Nghiep Tran, Atsuhiro Takasu, 15 Aug 2025, Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space, https://arxiv.org/abs/1909.08191
- Daniel Daza, Alberto Bernardi, Luca Costabello, Christophe Gueret, Masoud Mansoury, Michael Cochez, Martijn Schut, 19 Aug 2025, Interactive Query Answering on Knowledge Graphs with Soft Entity Constraints, https://arxiv.org/abs/2508.13663
- Mariam Arustashvili, Jörg Deigmöller, Heiko Paulheim, 19 Aug 2025, Knowledge Graph Completion for Action Prediction on Situational Graphs -- A Case Study on Household Tasks, https://arxiv.org/abs/2508.13675
- Yang Xiao, Ruimeng Ye, Bohan Liu, Xiaolong Ma, Bo Hui, 19 Aug 2025, Efficient Knowledge Graph Unlearning with Zeroth-order Information, https://arxiv.org/abs/2508.14013
- Peilin Ji, Xiao Xue, Simeng Wang, Wenhao Yan, 20 Aug 2025, Entropy-Constrained Strategy Optimization in Urban Floods: A Multi-Agent Framework with LLM and Knowledge Graph Integration, https://arxiv.org/abs/2508.14654
- Dennis Schiese, Aleksandr Perevalov, Andreas Both, 20 Aug 2025, Towards LLM-generated explanations for Component-based Knowledge Graph Question Answering Systems, https://arxiv.org/abs/2508.14553
- Haji Gul, Abul Ghani Naim, Ajaz Ahmad Bhat, 21 Aug 2025, Evaluating Knowledge Graph Complexity via Semantic, Spectral, and Structural Metrics for Link Prediction, https://arxiv.org/abs/2508.15291
- Runxuan Liu, Bei Luo, Jiaqi Li, Baoxin Wang, Ming Liu, Dayong Wu, Shijin Wang, Bing Qin, 21 Aug 2025, Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question Answering, https://arxiv.org/abs/2502.11491
- Nan Wang, Yongqi Fan, yansha zhu, ZongYu Wang, Xuezhi Cao, Xinyan He, Haiyun Jiang, Tong Ruan, Jingping Liu, 12 Aug 2025, KG-o1: Enhancing Multi-hop Question Answering in Large Language Models via Knowledge Graph Integration, https://arxiv.org/abs/2508.15790
- Ryoma Kondo, Riona Matsuoka, Takahiro Yoshida, Kazuyuki Yamasawa, Ryohei Hisano, 24 Aug 2025, Capturing Legal Reasoning Paths from Facts to Law in Court Judgments using Knowledge Graphs, https://arxiv.org/abs/2508.17340
- Yitong Lin, Jiaying He, Jiahe Chen, Xinnan Zhu, Jianwei Zheng, Tao Bo, 22 Jul 2025, BioGraphFusion: Graph Knowledge Embedding for Biological Completion and Reasoning, https://arxiv.org/abs/2507.14468
- Jialiang Wang, Hanmo Liu, Shimin Di, Zhili Wang, Jiachuan Wang, Lei Chen, Xiaofang Zhou, 21 Jul 2025, Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models, https://arxiv.org/abs/2408.06717
- Mubaris Nadeem, Johannes Zenkert, Lisa Bender, Christian Weber, Madjid Fathi, 11 Aug 2025, KIRETT: Knowledge-Graph-Based Smart Treatment Assistant for Intelligent Rescue Operations, https://arxiv.org/abs/2508.07834
- Yaoze Zhang, Rong Wu, Pinlong Cai, Xiaoman Wang, Guohang Yan, Song Mao, Ding Wang, Botian Shi, 14 Aug 2025, LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval, https://arxiv.org/abs/2508.10391
- Robert Frenken, Sidra Ghayour Bhatti, Hanqin Zhang, Qadeer Ahmed, 25 Jul 2025, KD-GAT: Combining Knowledge Distillation and Graph Attention Transformer for a Controller Area Network Intrusion Detection System, https://arxiv.org/abs/2507.19686
- Zhaoyan Wang, Hyunjun Ahn, In-Young Ko, 28 Jul 2025, Beyond Interactions: Node-Level Graph Generation for Knowledge-Free Augmentation in Recommender Systems, https://arxiv.org/abs/2507.20578
- Zhen Wu, Ritam Dutt, Luke M. Breitfeller, Armineh Nourbakhsh, Siddharth Parekh, Carolyn Rosé, 2 Aug 2025, R²-CoD: Understanding Text-Graph Complementarity in Relational Reasoning via Knowledge Co-Distillation, https://arxiv.org/abs/2508.01475
- Jiayi Wen, Tianxin Chen, Zhirun Zheng, Cheng Huang, 6 Aug 2025, A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models, https://arxiv.org/abs/2508.04276
- Zhu Xu, Ting Lei, Zhimin Li, Guan Wang, Qingchao Chen, Yuxin Peng, Yang liu, 7 Aug 2025, TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring, https://arxiv.org/abs/2508.04943
- Daniel Airinei, Elena Burceanu, Marius Leordeanu, 15 Aug 2025, Inside Knowledge: Graph-based Path Generation with Explainable Data Augmentation and Curriculum Learning for Visual Indoor Navigation, https://arxiv.org/abs/2508.11446
- Bowen Wang, Zhouqiang Jiang, Yasuaki Susumu, Shotaro Miwa, Tianwei Chen, Yuta Nakashima, 25 Aug 2025, Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown, https://arxiv.org/abs/2506.17589
Graph Neural Networks
Research papers on Graph Neural Networks (GNNs):
- Zhichun Guo, April 2024, Empowering Graph Neural Networks for Real-World Tasks, Ph.D. Thesis, Computer Science and Engineering, University of Notre Dame, Indiana, https://doi.org/10.7274/25608504.v1 https://curate.nd.edu/articles/dataset/Empowering_Graph_Neural_Networks_for_Real-World_Tasks/25608504/1 PDF: https://curate.nd.edu/ndownloader/files/46035312/1
- Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui, 7 May 2024, Acceleration Algorithms in GNNs: A Survey, https://arxiv.org/abs/2405.04114
- Yun Zhu, Yaoke Wang, Haizhou Shi, Siliang Tang, 28 Jan 2024, Efficient Tuning and Inference for Large Language Models on Textual Graphs, https://arxiv.org/abs/2401.15569 (Optimizing Graph Neural Networks on textual graphs using caching and early exit inference.)
- Sebastian Eliassen, Raghavendra Selvan, 16 Jan 2024 (v2), Activation Compression of Graph Neural Networks using Block-wise Quantization with Improved Variance Minimization, https://arxiv.org/abs/2309.11856
- Weishu Deng, Jia Rao, 2024, Mega: More Efficient Graph Attention for GNNs, 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), Year: 2024, Pages: 71-81, DOI Bookmark: 10.1109/ICDCS60910.2024.00016, https://www.computer.org/csdl/proceedings-article/icdcs/2024/860500a071/1ZCgMaVLfRm
- Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi, 9 Aug 2024 (v2), A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations, IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2024.3447085, https://arxiv.org/abs/2308.06767 https://ieeexplore.ieee.org/abstract/document/10643325
- Lena Sasal, Daniel Busby, Abdenour Hadid, 29 Aug 2024, TempoKGAT: A Novel Graph Attention Network Approach for Temporal Graph Analysis, https://arxiv.org/abs/2408.16391
- Xinke Jiang, Rihong Qiu, Yongxin Xu, Wentao Zhang, Yichen Zhu, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang, 31 Oct 2024, RAGraph: A General Retrieval-Augmented Graph Learning Framework, https://arxiv.org/abs/2410.23855
- V. Slavin, O. Kryvchikov, D. Laptev, 23 Jul 2025, Graph Neural Network Approach to Predicting Magnetization in Quasi-One-Dimensional Ising Systems, https://arxiv.org/abs/2507.17509
- Yuelin Wang, Kai Yi, Xinliang Liu, Yu Guang Wang, Shi Jin, 23 Jul 2025, ACMP: Allen-Cahn Message Passing with Attractive and Repulsive Forces for Graph Neural Networks, https://arxiv.org/abs/2206.05437
- Ana Gonzalez Bermudez, Miquel Farreras, Milan Groshev, José Antonio Trujillo, Isabel de la Bandera and Raquel Barco, 23 Jul 2025, Graph Neural Networks for O-RAN Mobility Management: A Link Prediction Approach, https://arxiv.org/abs/2502.02170
- Yumeng Wang, Zengyi Wo, Wenjun Wang, Xingcheng Fu, Minglai Shao, 22 Jul 2025, Leveraging Personalized PageRank and Higher-Order Topological Structures for Heterophily Mitigation in Graph Neural Networks, https://arxiv.org/abs/2507.16347
- Olga Solodova, Nick Richardson, Deniz Oktay, Ryan P. Adams, 22 Jul 2025, Graph Neural Networks Gone Hogwild, https://arxiv.org/abs/2407.00494
- Zihao Song, Shirantha Welikala, Panos J. Antsaklis and Hai Lin, 22 Jul 2025, Graph Neural Network-Based Distributed Optimal Control for Linear Networked Systems: An Online Distributed Training Approach, https://arxiv.org/abs/2504.06439
- Jai Bardhan, Tanumoy Mandal, Subhadip Mitra, Cyrin Neeraj, Mihir Rawat, 22 Jul 2025, Tagging fully hadronic exotic decays of the vectorlike B quark using a graph neural network, https://arxiv.org/abs/2505.07769
- Lijun Wu, Dong Hao, Zhiyi Fan, 19 Jul 2025, Explainable Graph Neural Networks via Structural Externalities, https://arxiv.org/abs/2507.17848
- Ahmad ALBarqawi, Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan, 24 Jul 2025, ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks, https://arxiv.org/abs/2507.18031
- Xinran Li, Xiujuan Xu, Jiaqi Qiao, 24 Jul 2025, Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation, https://arxiv.org/abs/2507.15205
- Hao Ai and Yu-xi Liu, 24 Jul 2025, Scalable Parameter Design for Superconducting Quantum Circuits with Graph Neural Networks, https://arxiv.org/abs/2411.16354
- Guanyuan Pan, Tiansheng Zhou, Bingtao Ma, Yaqi Wang, Jianxiang Zhao, Zhi Li, Yugui Lin, Pietro Lio, Shuai Wang, 24 Jul 2025, GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction, https://arxiv.org/abs/2504.10240
- Edward Henderson, Dewi Gould, Richard Everson, George De Ath, Nick Pepper, 17 Jul 2025, Air Traffic Controller Task Demand via Graph Neural Networks: An Interpretable Approach to Airspace Complexity, https://arxiv.org/abs/2507.13423
- Yifan Wei, Anwar Said, Waseem Abbas, Xenofon Koutsoukos, 18 Jul 2025, Robust Anomaly Detection with Graph Neural Networks using Controllability, https://arxiv.org/abs/2507.13954
- Jagruti Patel, Thomas A. W. Bolton, Mikkel Schöttner, Anjali Tarun, Sebastien Tourbier, Yasser Alemàn-Gòmez, Jonas Richiardi, Patric Hagmann, 18 Jul 2025, Structural Connectome Harmonization Using Deep Learning: The Strength of Graph Neural Networks, https://arxiv.org/abs/2507.13992
- Yeming Cai, Zhenglin Li, Yang Wang, 11 Jul 2025, Enhancing Breast Cancer Detection with Vision Transformers and Graph Neural Networks, https://arxiv.org/abs/2507.13372
- Vijay K. Dubey (1), Collin E. Haese (1), Osman Gültekin (1), David Dalton (2), Manuel K. Rausch (1), Jan N. Fuhg (1) ((1) The University of Texas at Austin, (2) University of Glasgow), 17 Jul 2025, Graph Neural Network Surrogates for Contacting Deformable Bodies with Necessary and Sufficient Contact Detection, https://arxiv.org/abs/2507.13459
- Srinitish Srinivasan and Omkumar CU, 18 Jul 2025, Can we ease the Injectivity Bottleneck on Lorentzian Manifolds for Graph Neural Networks?, https://arxiv.org/abs/2504.00142
- Xu Cheng, Liang Yao, Feng He, Yukuo Cen, Yufei He, Chenhui Zhang, Wenzheng Feng, Hongyun Cai, Jie Tang, 19 Jul 2025, LPS-GNN : Deploying Graph Neural Networks on Graphs with 100-Billion Edges, https://arxiv.org/abs/2507.14570
- Rabia Latief Bhat and Iqra Altaf Gillani, 21 Jul 2025, Spatio-Temporal Demand Prediction for Food Delivery Using Attention-Driven Graph Neural Networks, https://arxiv.org/abs/2507.15246
- Yufei Jin and Xingquan Zhu, 18 Jul 2025, Oversmoothing Alleviation in Graph Neural Networks: A Survey and Unified View, https://arxiv.org/abs/2405.01663
- Jialiang Wang, Hanmo Liu, Shimin Di, Zhili Wang, Jiachuan Wang, Lei Chen, Xiaofang Zhou, 21 Jul 2025, Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models, https://arxiv.org/abs/2408.06717
- Zizhou Zhang, Qinyan Shen, Zhuohuan Hu, Qianying Liu, Huijie Shen, 20 Jul 2025, Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain, https://arxiv.org/abs/2507.07854
- Peyman Baghershahi, Gregoire Fournier, Pranav Nyati, Sourav Medya, 9 Aug 2025, From Nodes to Narratives: Explaining Graph Neural Networks with LLMs and Graph Context, https://arxiv.org/abs/2508.07117
- Zhihao Xue, Yun Zi, Nia Qi, Ming Gong, Yujun Zou, 9 Aug 2025, Multi-Level Service Performance Forecasting via Spatiotemporal Graph Neural Networks, https://arxiv.org/abs/2508.07122
- Tiantian Yang, Zhiqian Chen, 10 Aug 2025, MOTGNN: Interpretable Graph Neural Networks for Multi-Omics Disease Classification, https://arxiv.org/abs/2508.07465
- Rahul Khorana, 11 Aug 2025, Topological Feature Compression for Molecular Graph Neural Networks, https://arxiv.org/abs/2508.07807
- Bowen Zhang, Genan Dai, Hu Huang, Long Lan, 9 Aug 2025, Geometry-Aware Spiking Graph Neural Network, https://arxiv.org/abs/2508.06793
- Morteza Ziabakhsh, Kiyan Rezaee, Sadegh Eskandari, Seyed Amir Hossein Tabatabaei, Mohammad M. Ghassemi, 10 Aug 2025, Extracting Overlapping Microservices from Monolithic Code via Deep Semantic Embeddings and Graph Neural Network-Based Soft Clustering, https://arxiv.org/abs/2508.07486
- Sujia Huang, Lele Fu, Zhen Cui, Tong Zhang, Na Song, Bo Huang, 29 Jul 2025, Torque-based Graph Surgery: Enhancing Graph Neural Networks with Hierarchical Rewiring, https://arxiv.org/abs/2507.21422
- Mustapha Hemis, Hamza Kheddar, Mohamed Chahine Ghanem, Bachir Boudraa, 29 Jul 2025, Hierarchical Graph Neural Network for Compressed Speech Steganalysis, https://arxiv.org/abs/2507.21591
- Zhanhong Cheng, Lingqian Hu, Yuheng Bu, Yuqi Zhou, Shenhao Wang, 28 Jul 2025, Graph neural networks for residential location choice: connection to classical logit models, https://arxiv.org/abs/2507.21334
- Haolin Li, Haoyu Wang, Luana Ruiz, 30 Jul 2025, Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic Graphs, https://arxiv.org/abs/2410.16593
- Shuyang Guo, Wenjin Xie, Ping Lu, Ting Deng, Richong Zhang, Jianxin Li, Xiangping Huang, Zhongyi Liu, 27 Jul 2025, Improving Subgraph Matching by Combining Algorithms and Graph Neural Networks, https://arxiv.org/abs/2507.20226
- Mohit Gupta, Debjit Bhowmick, Ben Beck, 18 Jul 2025, BikeVAE-GNN: A Variational Autoencoder-Augmented Hybrid Graph Neural Network for Sparse Bicycle Volume Estimation, https://arxiv.org/abs/2507.19517
- Yihan Wang, Jianing Zhao, 20 Jul 2025, Research on the application of graph data structure and graph neural network in node classification/clustering tasks, https://arxiv.org/abs/2507.19527
- Yazeed Alrubyli, Omar Alomeir, Abrar Wafa, Diána Hidvégi, Hend Alrasheed, Mohsen Bahrami, 25 Jul 2025, NAICS-Aware Graph Neural Networks for Large-Scale POI Co-visitation Prediction: A Multi-Modal Dataset and Methodology, https://arxiv.org/abs/2507.19697
- Kunhao Li, Di Wu, Jun Bai, Jing Xu, Lei Yang, Ziyi Zhang, Yiliao Song, Wencheng Yang, Taotao Cai, Yan Li, 26 Jul 2025, Who Owns This Sample: Cross-Client Membership Inference Attack in Federated Graph Neural Networks, https://arxiv.org/abs/2507.19964
- Vicente Ramos (1), Sundous Hussein (1), Mohamed Abdel-Hafiz (1), Arunangshu Sarkar (2), Weixuan Liu (2), Katerina J. Kechris (2), Russell P. Bowler (3), Leslie Lange (4), Farnoush Banaei-Kashani (1) ((1) Department of Computer Science and Engineering, University of Colorado Denver, Denver, USA, (2) Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, USA, (3) Genomic Medicine Institute, Cleveland Clinic, Cleveland, USA, (4) Division of Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, USA), 27 Jul 2025, BioNeuralNet: A Graph Neural Network based Multi-Omics Network Data Analysis Tool, https://arxiv.org/abs/2507.20440
- Yongzheng Liu, Yiming Wang, Po Xu, Yingjie Xu, Yuntian Chen, Dongxiao Zhang, 28 Jul 2025, BuildSTG: A Multi-building Energy Load Forecasting Method using Spatio-Temporal Graph Neural Network, https://arxiv.org/abs/2507.20838
- Bernardo Cuenca Grau, Eva Feng, Przemysław A. Wałęga, 26 Jul 2025, The Correspondence Between Bounded Graph Neural Networks and Fragments of First-Order Logic, https://arxiv.org/abs/2505.08021
- Miguel Lopez-Duran, Julian Fierrez, Aythami Morales, Ruben Tolosana, Oscar Delgado-Mohatar, Alvaro Ortigosa, 28 Jul 2025, Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs, https://arxiv.org/abs/2505.14699
- Eran Rosenbluth and Martin Grohe, 30 Jul 2025, Repetition Makes Perfect: Recurrent Graph Neural Networks Match Message Passing Limit, https://arxiv.org/abs/2505.00291
- Tong Nie, Jian Sun, Wei Ma, 31 Jul 2025, Predicting Large-scale Urban Network Dynamics with Energy-informed Graph Neural Diffusion, https://arxiv.org/abs/2508.00037
- Mohit Gupta, Debjit Bhowmick, Rhys Newbury, Meead Saberi, Shirui Pan and Ben Beck, 31 Jul 2025, INSPIRE-GNN: Intelligent Sensor Placement to Improve Sparse Bicycling Network Prediction via Reinforcement Learning Boosted Graph Neural Networks, https://arxiv.org/abs/2508.00141
- Yoonhyuk Choi, Jiho Choi, Chong-Kwon Kim, 1 Aug 2025, Sheaf Graph Neural Networks via PAC-Bayes Spectral Optimization, https://arxiv.org/abs/2508.00357
- Mukesh Kumar Sahu and Pinki Roy, 1 Aug 2025, Similarity-Based Self-Construct Graph Model for Predicting Patient Criticalness Using Graph Neural Networks and EHR Data, https://arxiv.org/abs/2508.00615
- Molly Noel, Gabriel Mancino-Ball, Yangyang Xu, 1 Aug 2025, Neighbor-Sampling Based Momentum Stochastic Methods for Training Graph Neural Networks, https://arxiv.org/abs/2508.00267
- Gaotang Li, Danai Koutra, Yujun Yan, 1 Aug 2025, Tackling Size Generalization of Graph Neural Networks on Biological Data from a Spectral Perspective, https://arxiv.org/abs/2305.15611
- Yoonhyuk Choi, Jiho Choi, Chong-Kwon Kim, 1 Aug 2025, Adaptive Branch Specialization in Spectral-Spatial Graph Neural Networks for Certified Robustness, https://arxiv.org/abs/2505.08320
- Trung Nguyen, Md Masud Rana, Farjana Tasnim Mukta, Chang-Guo Zhan, Duc Duy Nguyen, 1 Aug 2025, Geometric Multi-color Message Passing Graph Neural Networks for Blood-brain Barrier Permeability Prediction, https://arxiv.org/abs/2507.18926
- Xudong Wang, Tongxin Li, Chris Ding, Jicong Fan, 4 Aug 2025, Adaptive Riemannian Graph Neural Networks, https://arxiv.org/abs/2508.02600
- Yanmei Hu, Siyuan Yin, Yihang Wu, Xue Yue and Yue Liu, 2 Aug 2025, A graph neural network based on feature network for identifying influential nodes, https://arxiv.org/abs/2508.01278
- Divya Anand Sinha, Ruijie Du, Yezi Liu, Athina Markopolou, Yanning Shen, 3 Aug 2025, Gradient Inversion Attack on Graph Neural Networks, https://arxiv.org/abs/2411.19440
- Antonio Tudisco, Deborah Volpe, Giacomo Orlandi, Giovanna Turvani, 4 Aug 2025, Graph Neural Network-Based Predictor for Optimal Quantum Hardware Selection, https://arxiv.org/abs/2507.19093
- Kangkang Lu, Yanhua Yu, Zhiyong Huang, Tat-Seng Chua, 5 Aug 2025, Enhancing Spectral Graph Neural Networks with LLM-Predicted Homophily, https://arxiv.org/abs/2506.14220
- Santhoshkumar Peddi, Sadhvik Bathini, Arun Balasubramanian, Monalisa Sarma, Debasis Samanta, 6 Aug 2025, ProtoN: Prototype Node Graph Neural Network for Unconstrained Multi-Impression Ear Recognition, https://arxiv.org/abs/2508.04381
- Zhihao Wen, Yuan Fang, Pengcheng Wei, Fayao Liu, Zhenghua Chen, Min Wu, 6 Aug 2025, Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction, https://arxiv.org/abs/2405.04336
- Krzysztof Olejniczak, Xingyue Huang, Mikhail Galkin, İsmail İlkan Ceylan, 5 Aug 2025, One Model, Any Conjunctive Query: Graph Neural Networks for Answering Queries over Incomplete Knowledge Graphs, https://arxiv.org/abs/2409.13959
- Moshe Eliasof, Eldad Haber, Carola-Bibiane Schönlieb, 7 Aug 2025, TANGO: Graph Neural Dynamics via Learned Energy and Tangential Flows, https://arxiv.org/abs/2508.05070
- Massimiliano Romiti, 7 Aug 2025, A Graph Neural Network Approach for Mapping the Conceptual Structure and Inter-Branch Connectivity of Physics, https://arxiv.org/abs/2508.05724
- Qin Chen, Guojie Song, 8 Aug 2025, Adaptive Heterogeneous Graph Neural Networks: Bridging Heterophily and Heterogeneity, https://arxiv.org/abs/2508.06034
- Vibhor Agrawal, Fay Wang, Rishi Puri, 25 Jul 2025, Query-Aware Graph Neural Networks for Enhanced Retrieval-Augmented Generation, https://arxiv.org/abs/2508.05647
- Dahai Yu, Dingyi Zhuang, Lin Jiang, Rongchao Xu, Xinyue Ye, Yuheng Bu, Shenhao Wang, Guang Wang, 12 Aug 2025, UQGNN: Uncertainty Quantification of Graph Neural Networks for Multivariate Spatiotemporal Prediction, https://arxiv.org/abs/2508.08551
- Luigi D'Amico, Daniel De Rosso, Ninad Dixit, Raul Salles de Padua, Samuel Palmer, Samuel Mugel, Román Orús, Holger Eble, and Ali Abedi, 12 Aug 2025, Blockchain Network Analysis using Quantum Inspired Graph Neural Networks & Ensemble Models, https://arxiv.org/abs/2508.09237
- Minghao Liu, Chia-Hsuan Lu, Marta Kwiatkowska, 12 Aug 2025, Exact Verification of Graph Neural Networks with Incremental Constraint Solving, https://arxiv.org/abs/2508.09320
- Yun Zi, Ming Gong, Zhihao Xue, Yujun Zou, Nia Qi, Yingnan Deng, 13 Aug 2025, Graph Neural Network and Transformer Integration for Unsupervised System Anomaly Discovery, https://arxiv.org/abs/2508.09401
- Fang Wang and Ernesto Damiani, 13 Aug 2025, Time-Aware and Transition-Semantic Graph Neural Networks for Interpretable Predictive Business Process Monitoring, https://arxiv.org/abs/2508.09527
- Subhankar Sarkar and Souvik Chakraborty, 13 Aug 2025, Physics- and geometry-aware spatio-spectral graph neural operator for time-independent and time-dependent PDEs, https://arxiv.org/abs/2508.09627
- Mohammad Zia Ur Rehman, Sufyaan Zahoor, Areeb Manzoor, Musharaf Maqbool, Nagendra Kumar, 7 Aug 2025, A Context-aware Attention and Graph Neural Network-based Multimodal Framework for Misogyny Detection, https://arxiv.org/abs/2508.09175
- Marco Sälzer, François Schwarzentruber, Nicolas Troquard, 13 Aug 2025, Verifying Quantized Graph Neural Networks is PSPACE-complete, https://arxiv.org/abs/2502.16244
- Asela Hevapathige, Asiri Wijesinghe, Ahad N. Zehmakan, 15 Aug 2025, Graph Neural Diffusion via Generalized Opinion Dynamics, https://arxiv.org/abs/2508.11249
- Fanzhen Liu, Xiaoxiao Ma, Jian Yang, Alsharif Abuadbba, Kristen Moore, Surya Nepal, Cecile Paris, Quan Z. Sheng, Jia Wu, 15 Aug 2025, Towards Faithful Class-level Self-explainability in Graph Neural Networks by Subgraph Dependencies, https://arxiv.org/abs/2508.11513
- Hossein Shokouhinejad, Roozbeh Razavi-Far, Griffin Higgins, Ali A Ghorbani, 14 Aug 2025, Explainable Attention-Guided Stacked Graph Neural Networks for Malware Detection, https://arxiv.org/abs/2508.09801
- BG Tong, 10 Aug 2025, A Graph Neural Network based on a Functional Topology Model: Unveiling the Dynamic Mechanisms of Non-Suicidal Self-Injury in Single-Channel EEG, https://arxiv.org/abs/2508.11684
- Anshul Ahluwalia, Payman Behnam, Rohit Das, Alind Khare, Biswadeep Chakraborty, Pan Li, Alexey Tumanov, 16 Aug 2025, STRIDE: Structure and Embedding Distillation with Attention for Graph Neural Networks, https://arxiv.org/abs/2310.15938
- Ningyi Liao, Haoyu Liu, Zulun Zhu, Siqiang Luo, Laks V.S. Lakshmanan, 18 Aug 2025, Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency, https://arxiv.org/abs/2406.09675
- Su Chen, Xiaohua Qi, Xixun Lin, Yanmin Shang, Xiaolin Xu and Yangxi Li, 17 Aug 2025, Deep Graph Neural Point Process For Learning Temporal Interactive Networks, https://arxiv.org/abs/2508.13219
- Junwei Su, Chuan Wu, 20 Aug 2025, On the Interplay between Graph Structure and Learning Algorithms in Graph Neural Networks, https://arxiv.org/abs/2508.14338
- Zengyi Wo, Chang Liu, Yumeng Wang, Minglai Shao, Wenjun Wang, 20 Aug 2025, Improving Fairness in Graph Neural Networks via Counterfactual Debiasing, https://arxiv.org/abs/2508.14683
- Mengyang Cao, Frank F. Yang, Yi Jin, Yijun Yan, 10 Aug 2025, Graph Neural Network for Product Recommendation on the Amazon Co-purchase Graph, https://arxiv.org/abs/2508.14059
- Sebastian Musiał, Bartosz Zieliński, Tomasz Danel, 20 Aug 2025, Fragment-Wise Interpretability in Graph Neural Networks via Molecule Decomposition and Contribution Analysis, https://arxiv.org/abs/2508.15015
- Mustafa Mohammadi Gharasuie and Luis Rueda, 20 Aug 2025, Fast Graph Neural Network for Image Classification, https://arxiv.org/abs/2508.14958
- Zhiqiang Que, Chang Sun, Sudarshan Paramesvaran, Emyr Clement, Katerina Karakoulaki, Christopher Brown, Lauri Laatu, Arianna Cox, Alexander Tapper, Wayne Luk, Maria Spiropulu, 21 Aug 2025, JEDI-linear: Fast and Efficient Graph Neural Networks for Jet Tagging on FPGAs, https://arxiv.org/abs/2508.15468
- Anahita Asadi, Leonid Popryho, Inna Partin-Vaisband, 22 Aug 2025, Fast and Accurate RFIC Performance Prediction via Pin Level Graph Neural Networks and Probabilistic Flow, https://arxiv.org/abs/2508.16403
- Circe Hsu, Claire Schlesinger, Karan Mudaliar, Jordan Leung, Robin Walters, Peter Schindler, 22 Aug 2025, FIRE-GNN: Force-informed, Relaxed Equivariance Graph Neural Network for Rapid and Accurate Prediction of Surface Properties, https://arxiv.org/abs/2508.16012
- Yuebo Luo, Shiyang Li, Junran Tao, Kiran Thorat, Xi Xie, Hongwu Peng, Nuo Xu, Caiwen Ding, Shaoyi Huang, 22 Aug 2025, DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs, https://arxiv.org/abs/2508.16769
- Junhyun Lee, Veronika Thost, Bumsoo Kim, Jaewoo Kang, Tengfei Ma, 22 Aug 2025, Understanding and Tackling Over-Dilution in Graph Neural Networks, https://arxiv.org/abs/2508.16829
- Bicheng Wang and Junping Wang and Yibo Xue, 22 Aug 2025, Physics-Inspired Spatial Temporal Graph Neural Networks for Predicting Industrial Chain Resilience, https://arxiv.org/abs/2508.16836
- Silvia Beddar-Wiesing and Alice Moallemy-Oureh, 25 Aug 2025, Weisfeiler-Lehman meets Events: An Expressivity Analysis for Continuous-Time Dynamic Graph Neural Networks, https://arxiv.org/abs/2508.18052
- Menglin Yang, Min Zhou, Tong Zhang, Jiahong Liu, Zhihao Li, Lujia Pan, Hui Xiong, Irwin King, 22 Aug 2025, Hyperbolic Graph Neural Networks: A Review of Methods and Applications, https://arxiv.org/abs/2202.13852
- Moshe Eliasof, Eldad Haber, 23 Aug 2025, Quadratic Binary Optimization with Graph Neural Networks, https://arxiv.org/abs/2404.04874
- XiaYu Liu, Chao Fan, Yang Liu, Hou-biao Li, 24 Aug 2025, Multi-Level Fusion Graph Neural Network for Molecule Property Prediction, https://arxiv.org/abs/2507.03430
- Michela Lapenna, Caterina De Bacco, 22 Aug 2025, How do Probabilistic Graphical Models and Graph Neural Networks Look at Network Data?, https://arxiv.org/abs/2506.11869
- Kevin Monteiro, Sam Nallaperuma-Herzberg, Martina Mason, Steve Niederer, 2 Jul 2025, Graph Convolutional Neural Networks to Model the Brain for Insomnia, https://arxiv.org/abs/2507.14147
- Ganesh Sundaram, Jonas Ulmen, and Daniel Görges, 20 Jul 2025, Enhanced Pruning Strategy for Multi-Component Neural Architectures Using Component-Aware Graph Analysis, https://arxiv.org/abs/2504.13296
- Andrew Kiruluta, Andreas Lemos, and Priscilla Burity, 27 Jul 2025, Beyond Neural Networks: Symbolic Reasoning over Wavelet Logic Graph Signals, https://arxiv.org/abs/2507.21190
- Cencheng Shen, Yuexiao Dong, 8 Aug 2025, A Graph Sufficiency Perspective for Neural Networks, https://arxiv.org/abs/2507.10215
- Mustafa Mohammadi Gharasuie and Luis Rueda, 19 Aug 2025, Accelerating Image Classification with Graph Convolutional Neural Networks using Voronoi Diagrams, https://arxiv.org/abs/2508.14218
- Nathan X. Kodama and Kenneth A. Loparo, 22 Aug 2025, Latent Graph Learning in Generative Models of Neural Signals, https://arxiv.org/abs/2508.16776
- Lingkai Kong, Haotian Sun, Yuchen Zhuang, Haorui Wang, Wenhao Mu, Chao Zhang, 23 Aug 2025, Two Birds with One Stone: Enhancing Uncertainty Quantification and Interpretability with Graph Functional Neural Process, https://arxiv.org/abs/2508.17097
- Riccardo Cappi, Paolo Frazzetto, Nicolò Navarin, Alessandro Sperduti, 25 Aug 2025, Unveiling the Actual Performance of Neural-based Models for Equation Discovery on Graph Dynamical Systems, https://arxiv.org/abs/2508.18173
Compound AI Architectures
Compound AI architectures are a newer category that generalizes both RAG and multi-AI ensemble architectures. The general idea is that an LLM is surrounded by additional components, or is called multiple times, with orchestration logic combining the pieces in a variety of ways to answer a single query. RAG is the best-known subcategory in this vein, as are its extensions using Knowledge Graphs.
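To make the pattern concrete, here is a minimal sketch of a compound pipeline that combines retrieval, repeated sampling, and an LLM-based verifier around a single query. It is an illustration only, not any particular system's design: call_llm and retrieve_context are hypothetical stand-ins for whatever model API and retrieval component a real deployment would use.

# Minimal sketch of a compound AI pipeline (hypothetical component names):
# retrieval around the LLM, multiple LLM calls, and an LLM-based verifier.

from typing import List


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any LLM API client."""
    raise NotImplementedError


def retrieve_context(query: str) -> List[str]:
    """Hypothetical retriever (vector store, knowledge graph, etc.)."""
    raise NotImplementedError


def compound_answer(query: str, num_samples: int = 3) -> str:
    # Component 1: retrieval placed around the LLM (the RAG subcategory).
    context = "\n".join(retrieve_context(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

    # Component 2: multiple LLM calls produce candidate answers.
    candidates = [call_llm(prompt) for _ in range(num_samples)]

    # Component 3: one more LLM call acts as a verifier that picks the best
    # candidate (cf. repeated-sampling-plus-verifier approaches).
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    choice = call_llm(
        f"Question: {query}\nCandidate answers:\n{numbered}\n"
        "Reply with only the number of the best answer."
    )
    try:
        idx = int(choice.strip()) - 1
    except ValueError:
        idx = 0  # fall back to the first candidate if the verifier reply is unparsable
    return candidates[idx] if 0 <= idx < len(candidates) else candidates[0]

The same skeleton generalizes: swap the retriever for a Knowledge Graph lookup, route the query to a small or large model first, or add tool calls, and it remains a compound system rather than a single model invocation.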
Research on Compound AI architectures:
- Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini, 31 Jul 2024, Large Language Monkeys: Scaling Inference Compute with Repeated Sampling, https://arxiv.org/abs/2407.21787 (Generating multiple answers by repeated inference queries, and then using a verifier to choose the best one, which is shown to greatly increase overall accuracy.)
- Xi Wang, Procheta Sen, Ruizhe Li, Emine Yilmaz, 31 Jul 2024, Adaptive Retrieval-Augmented Generation for Conversational Systems, https://arxiv.org/abs/2407.21712 (Deciding whether or not to include a RAG external data request in the inference of a chatbot in a multi-turn conversation.)
- Matei Zaharia, Omar Khattab, Lingjiao Chen, Jared Quincy Davis, Heather Miller, Chris Potts, James Zou, Michael Carbin, Jonathan Frankle, Naveen Rao, Ali Ghodsi, Feb 18, 2024, The Shift from Models to Compound AI Systems, https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/
- Jared Quincy Davis, Boris Hanin, Lingjiao Chen, Peter Bailis, Ion Stoica, Matei Zaharia, 23 Jul 2024, Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design, https://www.arxiv.org/abs/2407.16831
- Sherry Ruan, Tian Zhao, 28 May 2024, JungleGPT: Designing and Optimizing Compound AI Systems for E-Commerce, https://arxiv.org/abs/2407.00038
- Cognine, 2024, Why 2024 is the Year of AI Agents and Compound AI Systems? https://cognine.com/why-2024-is-the-year-of-ai-agents-and-compound-ai-systems/
- Sean Sheng and Sherlock Xu, August 15, 2024, A Guide to Compound AI Systems, https://www.bentoml.com/blog/a-guide-to-compound-ai-systems
- Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng, 6 Jun 2024 (v2), SGLang: Efficient Execution of Structured Language Model Programs, https://arxiv.org/abs/2312.07104 https://github.com/sgl-project/sglang
- Muhammad Shahir Abdurrahman, Stanford University, Stanford, California, USA, An Efficient Network Orchestrator for Distributed Compound Language Model Systems, https://www.scs.stanford.edu/24sp-cs244b/projects/An_Efficient_Network_Orchestrator_for_Distributed_Compound_Language_Model_Systems.pdf
- Melissa Malec, June 5, 2024, AI Orchestration Explained: The What, Why & How for 2024, https://hatchworks.com/blog/gen-ai/ai-orchestration/
- Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou, 20 Jul 2024, On the Design and Analysis of LLM-Based Algorithms, https://arxiv.org/abs/2407.14788 https://github.com/modelscope/agentscope/tree/main/examples/paper_llm_based_algorithm
- Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou, 4 Jun 2024 (v2), Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems, https://arxiv.org/abs/2403.02419
- Latent Space, Nov 2024, Why Compound AI + Open Source will beat Closed AI, https://www.latent.space/p/fireworks
- Gohar Irfan Chaudhry, Esha Choukse, Íñigo Goiri, Rodrigo Fonseca, Adam Belay, Ricardo Bianchini, 29 Jan 2025 (v2), Towards Resource-Efficient Compound AI Systems, https://arxiv.org/abs/2501.16634
- Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Matei Zaharia, James Zou, Ion Stoica, 20 Feb 2025, Optimizing Model Selection for Compound AI Systems, https://arxiv.org/abs/2502.14815
- Abhimanyu Rajeshkumar Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna, April 2025, Understanding and Optimizing Multi-Stage AI Inference Pipelines, https://ui.adsabs.harvard.edu/abs/2025arXiv250409775R/abstract https://arxiv.org/abs/2504.09775
- OnlyCFO, Apr 29, 2025, Bullish: Vertical & Compound Software: In a world of AI, companies need to be more multi-product and vertical to win, https://www.onlycfo.io/p/bullish-vertical-and-compound-software
- Tomasz Tunguz, Jul 17, 2025, Hidden Technical Debt in AI, https://tomtunguz.com/hidden-technical-debt-in-ai/
- Yang Liu, Bingjie Yan, Tianyuan Zou, Jianqing Zhang, Zixuan Gu, Jianbing Ding, Xidong Wang, Jingyi Li, Xiaozhou Ye, Ye Ouyang, Qiang Yang, Ya-Qin Zhang, 24 Apr 2025, Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks, https://arxiv.org/abs/2504.17421
- Marc Brooker, Aug 2025, LLMs as Parts of Systems, https://brooker.co.za/blog/2025/08/12/llms-as-components.html
- Deepti Raghavan, Keshav Santhanam, Muhammad Shahir Rahman, Nayani Modugula, Luis Gaspar Schroeder, Maximilien Cura, Houjun Liu, Pratiksha Thaker, Philip Levis, Matei Zaharia, 22 Jul 2025, Alto: Orchestrating Distributed Compound AI Systems with Nested Ancestry, https://arxiv.org/abs/2403.04311
- Soheil Radfar, Faezeh Maghsoodifar, Hamed Moftakhari and Hamid Moradkhani, 20 Jul 2025, Integrating Newton's Laws with deep learning for enhanced physics-informed compound flood modelling, https://arxiv.org/abs/2507.15021
- Hongzhi Zhang, Zhonglie Liu, Kun Meng, Jiameng Chen, Jia Wu, Bo Du, Di Lin, Yan Che, Wenbin Hu, 28 Jul 2025, Zero-Shot Learning with Subsequence Reordering Pretraining for Compound-Protein Interaction, https://arxiv.org/abs/2507.20925
- Nguyen Manh Son, Pham Huu Vang, Nguyen Thi Dung, Nguyen Manh Ha, Ta Thi Thao, Tran Thi Thu Thuy, Phan Minh Giang, 13 Aug 2025, In silico study on the cytotoxicity against Hela cancer cells of xanthones bioactive compounds from Garcinia cowa: QSAR based on Graph Deep Learning, Network Pharmacology, and Molecular Docking, https://arxiv.org/abs/2508.10117
Research on Efficient Architectures
- Canwen Xu, 2024, Efficient Natural Language Processing for Language Models, Ph.D. thesis, Computer Science, UNIVERSITY OF CALIFORNIA SAN DIEGO, PDF: https://escholarship.org/uc/item/9dv1k5xv PDF: https://escholarship.org/content/qt9dv1k5xv/qt9dv1k5xv.pdf?t=sc34ay (Evaluates several acceleration methods including early-exit, PEFT, and distillation.)
- Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, Liang Zhao, 4 Jan 2024, Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models https://arxiv.org/abs/2401.00625 (A general survey paper with coverage of many techniques including this one.)
- Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar, Dec 2023, LLM in a flash: Efficient Large Language Model Inference with Limited Memory Apple Research, https://arxiv.org/abs/2312.11514
- Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, and Yuxiong He. 2022b. Extreme compression for pre-trained transformers made simple and efficient. In Advances in Neural Information Processing Systems https://arxiv.org/abs/2206.01859
- Howard, A., Sandler, M., Chu, G., Chen, L., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., Le, Q. V., and Adam, H. (2019). Searching for mobilenetv3. CoRR, abs/1905.02244. URL: http://arxiv.org/abs/1905.02244
- Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q. V. (2019). Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://openaccess.thecvf.com/content_CVPR_2019/html/Tan_MnasNet_Platform_Aware_Neural_Architecture_Search_for_Mobile_CVPR_2019_paper.html.
- Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 6105–6114, Long Beach, California, USA. PMLR. URL: http://proceedings.mlr.press/v97/tan19a.html
- Tan, M. and Le, Q. (2021). EfficientNetV2: Smaller models and faster training. In Meila, M. and Zhang, T., editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 10096–10106. PMLR. URL: https://proceedings.mlr.press/v139/tan21a.html
- Iandola, F. N., Moskewicz, M. W., Ashraf, K., Han, S., Dally, W. J., and Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. CoRR, abs/1602.07360. URL: http://arxiv.org/abs/1602.07360
- Gholami, A., Kwon, K., Wu, B., Tai, Z., Yue, X., Jin, P., Zhao, S., and Keutzer, K. (2018). Squeezenext: Hardware-aware neural network design. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. URL: https://openaccess.thecvf.com/content_cvpr_2018_workshops/w33/html/Gholami_SqueezeNext_Hardware_Aware_Neural_CVPR_2018_paper.html
- Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861. URL: http://arxiv.org/abs/1704.04861
- Vivienne Sze, Yu-Hsin Chen, et al., Jun 24, 2020, Efficient Processing of Deep Neural Networks (Synthesis Lectures on Computer Architecture), Part of: Synthesis Lectures on Computer Architecture (7 books), https://www.amazon.com/Efficient-Processing-Networks-Synthesis-Architecture/dp/1681738317/
- Samsul Ariffin Abdul Karim, Oct 12, 2022, Intelligent Systems Modeling and Simulation II: Machine Learning, Neural Networks, Efficient Numerical Algorithm and Statistical Methods (Studies in Systems, Decision and Control Book 444) https://www.amazon.com/Intelligent-Systems-Modeling-Simulation-Statistical-ebook/dp/B0BJ1P94WC/
- Manpreet Singh Ghotra and Rajdeep Dua, Nov 10, 2017, Neural Network Programming with TensorFlow: Unleash the power of TensorFlow to train efficient neural networks, https://www.amazon.com/Neural-Network-Programming-TensorFlow-efficient-ebook/dp/B077DFVV43/
- Lukas Arno Jakob Cavigelli, Qiuting Huang, et al., Jul 26, 2019, Towards Energy-Efficient Convolutional Neural Network Inference, https://www.amazon.com/Towards-Energy-Efficient-Convolutional-Network-Inference/dp/3866286511/
- Vgel, December 18, 2023, How to make LLMs go fast, https://vgel.me/posts/faster-inference/
- H Xu, Y Song, Q Liu, J van Genabith, D Xiong, 2024, Rewiring the Transformer with Depth-Wise LSTMs, LREC-COLING 2024, pages 14122–14133, 20-25 May, 2024, https://aclanthology.org/2024.lrec-main.1231.pdf
- Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar, 10 May 2024, Linearizing Large Language Models, https://arxiv.org/abs/2405.06640 Code: https://github.com/TRI-ML/linear_open_lm
- Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui, 7 May 2024, Acceleration Algorithms in GNNs: A Survey, https://arxiv.org/abs/2405.04114
- Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei, 9 May 2024 (v2), You Only Cache Once: Decoder-Decoder Architectures for Language Models, https://arxiv.org/abs/2405.05254 Code: https://aka.ms/YOCO (A novel decoder-decoder architecture with fast KV caching and cross-attention.)
- Badri Narayana Patro, Vijay Srinivas Agneeswaran, 24 Apr 2024, Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges, https://arxiv.org/abs/2404.16112
- Sathya Krishnan Suresh, Shunmugapriya P, 24 Apr 2024 (v2), Towards smaller, faster decoder-only transformers: Architectural variants and their implications, https://arxiv.org/abs/2404.14462 Code: https://github.com/SkAndMl/gpt-variations (Focuses on three new variants of decoder-only Transformer architectures: ParallelGPT (p-gpt), LinearlyCompressedGPT (lc-gpt), and ConvCompressedGPT (cc-gpt).)
- Jianhui Pang, Fanghua Ye, Longyue Wang, Dian Yu, Derek F. Wong, Shuming Shi, Zhaopeng Tu, 17 Jan 2024 (v2), Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models, https://arxiv.org/abs/2401.08350 Code: https://github.com/pangjh3/LLM4MT
- Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, 1 Dec 2023, The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, https://arxiv.org/abs/2312.00678 Project: https://github.com/tding1/Efficient-LLM-Survey
- Jesse Roberts, 2 Feb 2024 (v3), How Powerful are Decoder-Only Transformer Neural Models? https://arxiv.org/abs/2305.17026
- Mackenzie Morehead, Apr 16, 2024, Is Attention All You Need? https://www.mackenziemorehead.com/is-attention-all-you-need/
- Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas, 11 Apr 2024, RecurrentGemma: Moving Past Transformers for Efficient Open Language Models, Google Research, https://arxiv.org/abs/2404.07839
- Georgy Tyukin, 2 Apr 2024, Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations, Masters Thesis, Data Science and Machine Learning, University College London., https://arxiv.org/abs/2404.05741 (Reviews various model compression and inference optimization techniques, and specifically analyzes layer skipping and sublayer skipping, such as attention head pruning and FFN/MLP pruning.)
- Stan Gibson, 03 Jun 2024, Getting infrastructure right for generative AI, CIO, https://www.cio.com/article/2128440/getting-infrastructure-right-for-generative-ai.html
- Staphord Bengesi, Hoda El-Sayed, Md Kamruzzaman Sarker, Yao Houkpati, John Irungu, Timothy Oladunni, 2023, Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers, 21 Nov 2023, https://arxiv.org/abs/2311.10242
- Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge, Jan 2024, Understanding LLMs: A Comprehensive Overview from Training to Inference https://arxiv.org/abs/2401.02038
- Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni, Nov 2023, Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models, https://arxiv.org/abs/2311.00871
- Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré, Apr 2023, Hyena Hierarchy: Towards Larger Convolutional Language Models, https://arxiv.org/pdf/2302.10866.pdf
- Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussà, 2 May 2024 (v2), A Primer on the Inner Workings of Transformer-based Language Models, https://arxiv.org/pdf/2405.00208 (Analyzes the theory of the Transformer architecture, including an interesting separation of the effects of attention versus FFNs on logits to give attributions.)
- Simeon Emanuilov, Apr 4, 2024 LLM agent operating system (AIOS) and the future of LLM-powered agents, https://medium.com/@simeon.emanuilov/llm-agent-operating-system-aios-and-the-future-of-llm-powered-agents-3d08b4e91c34 https://unfoldai.com/aios-llm-powered-agents/
- CAMERON R. WOLFE, PH.D. MAR 04, 2024, Decoder-Only Transformers: The Workhorse of Generative LLMs, https://cameronrwolfe.substack.com/p/decoder-only-transformers-the-workhorse
- Rachel Gordon, Publication Date:March 21, 2024, AI generates high-quality images 30 times faster in a single step, MIT News, https://news.mit.edu/2024/ai-generates-high-quality-images-30-times-faster-single-step-0321 (MIT's new image generation framework called "distribution matching distillation" is faster than diffusion models.)
- Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao, 5 Feb 2024. A Survey on Transformer Compression. https://arxiv.org/abs/2402.05964 (Model compression survey paper with focus on pruning, quantization, knowledge distillation, and efficient architecture design.)
- Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang, 22 Mar 2024, Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference, https://arxiv.org/abs/2403.14520 Code: https://sites.google.com/view/cobravlm (Multimodal version of the new Mamba architecture.)
- Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai, 25 Jan 2024, ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models, https://arxiv.org/abs/2401.14351 Code: https://github.com/ServerlessLLM/ServerlessLLM
- Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim, 18 Jan 2024, Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation, https://arxiv.org/abs/2401.08417
- Gavin Li, Nov 19, 2023, Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique, AI Advances https://ai.gopubby.com/unbelievable-run-70b-llm-inference-on-a-single-4gb-gpu-with-this-new-technique-93e2057c7eeb
- Yumo Bai, Feb 3, 2024 Why are most LLMs decoder-only? Dive into the rabbit hole of recent advancement in Large Language Models, https://medium.com/@yumo-bai/why-are-most-llms-decoder-only-590c903e4789
- Christopher Wolters, Xiaoxuan Yang, Ulf Schlichtmann, Toyotaro Suzumura, 12 Jun 2024, Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference, https://arxiv.org/abs/2406.08413
- David Spuler, March 2024, Chapter 2. Transformers & LLMs, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Shuning Chang, Pichao Wang, Hao Luo, Fan Wang, Mike Zheng Shou, 2023, Revisiting Vision Transformer from the View of Path Ensemble, https://arxiv.org/abs/2308.06548 PDF: https://arxiv.org/pdf/2308.06548.pdf
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Attention Is All You Need, arXiv preprint arXiv:1706.03762, https://arxiv.org/abs/1706.03762
- azhar, Dec 29, 2023, Decoding Mamba: The Next Big Leap in AI Sequence Modeling, https://medium.com/ai-insights-cobet/decoding-mamba-the-next-big-leap-in-ai-sequence-modeling-ef3908060cb8
- Chen, C, 2024, Hardware‑software co‑exploration and optimization for next‑generation learning machines. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/178423 (Extensive coverage of hardware design with multiple contributions to accelerating various neural network types, ranging from acceleration of various single non-linear functions and end-to-end optimization algorithms. Specific topics include data compression, non-maximum suppression, MHA, and MatMul/GEMM optimizations.)
- Louis-François Bouchard, Louie Peters, May 2024, Chapter 2: Architectures, Building LLMs for Production: Enhancing LLM Abilities and Reliability with Prompting, Fine-Tuning, and RAG, https://www.amazon.com/Building-LLMs-Production-Reliability-Fine-Tuning/dp/B0D4FFPFW8/
- Matt Murphy, Tim Tully, Derek Xiao, January 18, 2024, The Modern AI Stack: Design Principles for the Future of Enterprise AI Architectures, Menlo Ventures, https://menlovc.com/perspective/the-modern-ai-stack-design-principles-for-the-future-of-enterprise-ai-architectures/ (Various details about the AI tech stack, organizational AI maturity levels, and several interesting facts: inference is 95% of AI cost now, 60% of organizations are using multi-model methods, RAG is the dominant architecture currently, and AI application development teams are primarily made up of non-ML software engineers leveraging on top of AI models.)
- MongoDB, Jun 20, 2024, Understanding the AI Stack In the Era of Generative AI: Exploring the Layers and Components of Today’s AI Applications https://medium.com/mongodb/understanding-the-ai-stack-in-the-era-of-generative-ai-f1fcd66e1393
- Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue, 18 Jul 2024, Retrieval-Augmented Generation for Natural Language Processing: A Survey, https://arxiv.org/abs/2407.13193
- Chip Huyen, Jul 25, 2024, Building A Generative AI Platform, https://huyenchip.com/2024/07/25/genai-platform.html
- Yorick Sens, Henriette Knopp, Sven Peldszus, Thorsten Berger, 12 Aug 2024, A Large-Scale Study of Model Integration in ML-Enabled Software Systems, https://arxiv.org/abs/2408.06226
- Rohan Baskar Prabhakar, Hengrui Zhang, David Wentlzaff, 14 Aug 2024, Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference, https://arxiv.org/abs/2408.07802 (Modified Transformer architecture with parallelized sub-layers of attention and FFN; a generic sketch of this parallel-sublayer pattern appears after this reference list.)
- Hugo Laurençon, Andrés Marafioti, Victor Sanh, Léo Tronchon, 22 Aug 2024, Building and better understanding vision-language models: insights and future directions, https://arxiv.org/abs/2408.12637
- Tymofii Reizin, 2024, Fast Algorithms for Attention Mechanism, Bachelor Thesis, Department of Applied Mathematics, Charles University, Prague, https://dspace.cuni.cz/bitstream/handle/20.500.11956/192084/130390128.pdf?sequence=1
- Minghao Shao, Abdul Basit, Ramesh Karri, Muhammad Shafique, Survey of Different Large Language Model Architectures: Trends, Benchmarks, and Challenges, https://www.researchgate.net/profile/Minghao_Shao2/publication/383976933_Survey_of_different_Large_Language_Model_Architectures_Trends_Benchmarks_and_Challenges/links/66e2d320f84dd1716ce79f85/Survey-of-different-Large-Language-Model-Architectures-Trends-Benchmarks-and-Challenges.pdf
- Douglas C. Youvan, September 27, 2024, Building and Running Large-Scale Language Models: The Infrastructure and Techniques Behind GPT-4, https://www.researchgate.net/profile/Douglas-Youvan/publication/384398902_Building_and_Running_Large-Scale_Language_Models_The_Infrastructure_and_Techniques_Behind_GPT-4/links/66f6f4d3906bca2ac3d20e68/Building-and-Running-Large-Scale-Language-Models-The-Infrastructure-and-Techniques-Behind-GPT-4.pdf
- Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuoling Yang, Zihan Liu, Jon Barker, Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping, 17 Sep 2024, NVLM: Open Frontier-Class Multimodal LLMs, NVIDIA, https://arxiv.org/abs/2409.11402 https://huggingface.co/nvidia/NVLM-D-72B https://nvlm-project.github.io/
- Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, Ping Luo, 17 Oct 2024, Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation, https://arxiv.org/abs/2410.13848 https://github.com/deepseek-ai/Janus?tab=readme-ov-file
- Lak Lakshmanan, Oct 4, 2024, How to Choose the Architecture for Your GenAI Application. A framework to select the simplest, fastest, cheapest architecture that will balance LLMs’ creativity and risk, https://towardsdatascience.com/how-to-choose-the-architecture-for-your-genai-application-6053e862c457
- Dr. Ashish Bamania, Nov 2024, Vision Transformers Completely Redefine How AI Perceives The Real World: A deep dive into the Vision Transformer (ViT) architecture that transformed Computer Vision and learning to build one from scratch, https://levelup.gitconnected.com/vision-transformers-completely-redefine-how-ai-perceives-the-real-world-e3a06b826760
- H. Xu, Z. Bi, H. Tseng, X. Song, P. Feng, From Transformers to the Future: An In-Depth Exploration of Modern Language Model Architectures, https://osf.io/n8r5j/download
- Narcisa Guran, Florian Knauf, Man Ngo, Stefan Petrescu, Jan S. Rellermeyer, 21 Nov 2024, Towards a Middleware for Large Language Models, https://arxiv.org/abs/2411.14513
- Akash Bajwa, Feb 03, 2025, Forward Deployed Engineers: A Means To An End For AI Startups: Capturing Business Logic And Expert Reasoning, https://akashbajwa.substack.com/p/forward-deployed-engineers-a-means ("AI truly is a new way of computing, and that means the better analogies are to computing itself. Transformers are the transistor, and mainframes are today's models. The GUI is, arguably, still TBD.")
- Abhimanyu Rajeshkumar Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna, April 2025, Understanding and Optimizing Multi-Stage AI Inference Pipelines, https://ui.adsabs.harvard.edu/abs/2025arXiv250409775R/abstract https://arxiv.org/abs/2504.09775
- Devansh, Jun 1, 2025, The Costly Open-Source LLM Lie: Open Source LLMs are not Free, https://machine-learning-made-simple.medium.com/the-costly-open-source-llm-lie-f83fdc5d5701
- Sebastian Raschka, Jul 19, 2025, The Big LLM Architecture Comparison: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design, https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison
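For context on the parallel-sublayer idea noted in the Kraken entry above, the sketch below shows the general pattern of computing attention and the FFN in parallel from a shared normalized input and summing both branches into the residual stream, rather than running them sequentially. This is a minimal illustrative PyTorch example under assumed settings (class name, dimensions, and a single shared pre-norm are all choices made here for brevity), not the Kraken paper's actual implementation.

```python
# Minimal sketch of a "parallel attention + FFN" Transformer sub-layer.
# Hypothetical example; names and sizes are illustrative, not from Kraken.
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        h = self.norm(x)                  # shared pre-norm input for both branches
        attn_out, _ = self.attn(h, h, h)  # attention branch
        ffn_out = self.ffn(h)             # FFN branch, independent of attention output
        return x + attn_out + ffn_out     # sum both branches into the residual stream

# Usage: y = ParallelBlock()(torch.randn(2, 16, 512))
```

Because the FFN no longer waits for the attention output, the two branches can in principle be evaluated concurrently (e.g., on different devices), which is the efficiency motivation behind this style of block.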
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book: Get your copy from Amazon: Generative AI Applications
- Generative AI programming book: Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book: Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book: Get your copy from Amazon: CUDA C++ Debugging
More AI Research
Read more about:
- List of AI Optimizations
- Transformer Optimizations
- Inference Optimizations
- Shallow decoder architecture
- Inference Cache
- Zero-Multiplication Models
- Attention head pruning
- Embeddings pruning
- FFN pruning
- Loop Optimizations
- Code Optimizations
- « Research Home