Aussie AI

Retrieval Augmented Generation (RAG) Architectures

  • Last Updated 30 August, 2025
  • by David Spuler, Ph.D.

What is RAG?

RAG is a fundamental technique in generative AI that extends the knowledge of an LLM without fine-tuning. Rather than training new knowledge into the LLM's parameters, we instead look up the extra information by searching a database. The LLM receives the user's prompt along with the extra information found by the RAG lookup (performed by the "retriever" component). The LLM then uses its summarization and natural language capabilities to answer the user's question, using the retrieved RAG text as input context.
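As a sketch of that flow, here is a minimal hypothetical RAG pipeline. The `DOCS` data, the toy word-overlap `retrieve` function, and the `llm_generate` placeholder are all illustrative stand-ins, not a real retriever or LLM API:

```python
# Minimal RAG flow: retrieve extra context, augment the prompt, generate.

DOCS = [
    "Our flagship widget ships in blue and green.",
    "Support hours are 9am-5pm weekdays.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy keyword retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def llm_generate(prompt: str) -> str:
    """Placeholder for a real LLM inference call."""
    return f"[LLM answer based on prompt of {len(prompt)} chars]"

def rag_answer(query: str) -> str:
    # The retrieved chunks are prepended to the user's question as context.
    context = "\n".join(retrieve(query, DOCS))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm_generate(prompt)
```

A real system would replace `retrieve` with a vector or keyword database lookup and `llm_generate` with an actual model call; only the overall shape of the pipeline is the point here.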

RAG is commonly used as the go-to architecture for specializing an LLM on a business's own data without fine-tuning. For example, to create a chatbot that knows about your products, you could use fine-tuning to train a custom LLM on product information. The more efficient way is to leave your LLM unchanged, put your special documents into a RAG database (e.g., your entire website), and then have the LLM search these documents using a RAG architecture.

The AI assistants in Google and Bing search use a RAG-like architecture, although it's more of a mega-RAG architecture with a very large database of documents. The way it works is that Google or Bing first searches the entire internet (however they do this), and then the LLM summarizes the handful of retrieved internet documents into the final AI answer.

Beyond RAG

There are many variations on the RAG architecture, and RAG architectures can be extended in various ways. Related capabilities that also "augment" the LLM's input prompt with extra data include:

  • Retrieval Augmented Language Models (RALM) — the most general category including augmentation by basically anything; see more about RALM.
  • Tool-Augmented Language Models (TALM) — use dynamic tool execution to compute extra input data. See more about tool integrations.
  • Data source integrations ("plugins") — extended ways to search big databases, such as real estate listings or the entire internet, using a RAG-like approach.

Finally, note that RAG is an inherently "read-only" approach: it only generates answers and doesn't change anything for the user. The generalization of that idea is "agents," which can perform real-world actions (i.e., they're "read-write"). For example, RAG might tell you what could be causing your symptoms, but an LLM agent can also book your doctor's appointment for you.

RAG Optimizations

RAG optimizations are LLM efficiency improvements applied to a RAG architecture. First point: RAG architectures are inherently an optimization, themselves. RAG was created because fine-tuning was too expensive and had various other limitations (e.g., attribution, explainability), although Parameter-Efficient Fine-Tuning (PEFT) techniques have also attacked the inefficiencies of fine-tuning, so maybe it's a tie between RAG and FT/PEFT.

But you can also optimize your RAG architecture itself. Many of the major LLM optimizations also work on the RAG system's LLM, so there are many ways to do this (e.g., quantization, pruning, and other inference optimizations).

However, there are a few techniques that are specifically applicable to RAG architectures because they optimize either (a) non-LLM RAG components, or (b) the RAG prompt structure.

Some examples of RAG non-LLM optimizations include:

  • RAG database speedups (e.g., indexing, all the usual database stuff)
  • Keyword versus vector lookups in the retriever (e.g., hybrid keyword-vector search, metadata search, etc.)
  • Caching — multiple types (e.g. caching in the retriever versus the LLM parts)
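As an illustration of the retriever options above, here is a toy sketch of hybrid keyword-vector search. The scoring functions, the two-dimensional embeddings, and the `alpha` blending weight are all illustrative assumptions, not any particular library's API:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query words that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def hybrid_search(query: str, query_vec: list[float],
                  corpus: list[tuple[str, list[float]]],
                  alpha: float = 0.5, k: int = 2) -> list[str]:
    """Blend keyword and vector scores; corpus is (text, embedding) pairs."""
    scored = []
    for text, vec in corpus:
        score = alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, vec)
        scored.append((score, text))
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]
```

Production systems typically get the keyword side from a BM25 index and the vector side from an ANN index, then blend or interleave the two rankings.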

Secondly, there are some RAG-specific techniques on the "length" dimension (i.e., input tokens) that are applicable to an input prompt extended with extra prepended "context" tokens, such as prompt compression.

RAG is not the only architecture to use prepended context. For example, chatbots prepend the conversation history, so many of these approaches apply there too.
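One simple length-dimension control is a token budget on the prepended chunks. A minimal sketch, using whitespace "tokens" in place of a real tokenizer (a hypothetical simplification):

```python
def count_tokens(text: str) -> int:
    """Crude token estimate; a real system would use the model's tokenizer."""
    return len(text.split())

def pack_context(chunks: list[str], query: str, budget: int = 50) -> str:
    """Prepend retrieved chunks to the query, stopping at the first chunk
    that would push the total input-token count over the budget.
    Chunks are assumed to be ordered by relevance already."""
    used = count_tokens(query)
    kept = []
    for chunk in chunks:
        n = count_tokens(chunk)
        if used + n > budget:
            break
        kept.append(chunk)
        used += n
    return "\n".join(kept) + "\n\n" + query
```

The same budgeting idea applies to chatbot conversation history: older turns are dropped (or compressed) once the context budget is reached.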

RAG Survey Papers

Survey papers on RAG architectures:

  • Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li, 17 Jun 2024 (v3), A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2405.06211 Project: https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/
  • Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue, 18 Jul 2024, Retrieval-Augmented Generation for Natural Language Processing: A Survey, https://arxiv.org/abs/2407.13193
  • Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
  • Huayang Li, Yixuan Su, Deng Cai, Yan Wang, Lemao Liu, 13 Feb 2022 (v2), A Survey on Retrieval-Augmented Text Generation, https://arxiv.org/abs/2202.01110
  • Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, Bin Cui, 21 Jun 2024 (v6), Retrieval-Augmented Generation for AI-Generated Content: A Survey, https://arxiv.org/abs/2402.19473
  • Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, Zhaofeng Liu, 3 Jul 2024 (v2), Evaluation of Retrieval-Augmented Generation: A Survey, https://arxiv.org/abs/2405.07437
  • Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang, 27 Mar 2024 (v5), Retrieval-Augmented Generation for Large Language Models: A Survey, https://arxiv.org/abs/2312.10997
  • Yucheng Hu, Yuxing Lu, 30 Apr 2024, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543
  • Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, 15 Jan 2025, Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG, https://arxiv.org/abs/2501.09136
  • Mingyue Cheng, Yucong Luo, Jie Ouyang, Qi Liu, Huijie Liu, Li Li, Shuo Yu, Bohou Zhang, Jiawei Cao, Jie Ma, Daoyu Wang, Enhong Chen, 17 Mar 2025 (v2), A Survey on Knowledge-Oriented Retrieval-Augmented Generation, https://arxiv.org/abs/2503.10677
  • Chaitanya Sharma, 28 May 2025, Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers, https://arxiv.org/abs/2506.00054
  • Andrew Brown, Muhammad Roman, Barry Devereux, 8 Aug 2025, A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges, https://arxiv.org/abs/2508.06401

RAG Tool Usage

Research on RAG architectures using dynamic tools:

  • Elias Lumer, Vamse Kumar Subbiah, James A. Burke, Pradeep Honaganahalli Basavaraju, Austin Huber, 22 Oct 2024 (v2), Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases, https://arxiv.org/abs/2410.14594
  • Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, 15 Jan 2025, Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG, https://arxiv.org/abs/2501.09136
  • Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
  • Mengsong Wu, Tong Zhu, Han Han, Xiang Zhang, Wenbiao Shao, Wenliang Chen, 21 Mar 2025, Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models, https://arxiv.org/abs/2503.16779 https://github.com/fairyshine/Chain-of-Tools
  • Zhejun Zhao, Yuehu Dong, Alley Liu, Lixue Zheng, Pingsheng Liu, Dongdong Shen, Long Xia, Jiashu Zhao, Dawei Yin, 6 Aug 2025, TURA: Tool-Augmented Unified Retrieval Agent for AI Search, https://arxiv.org/abs/2508.04604
  • Andrew Brown, Muhammad Roman, Barry Devereux, 8 Aug 2025, A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges, https://arxiv.org/abs/2508.06401

RAG Reasoning

Research papers on reasoning models and RAG include:

  • B Zhan, A Li, X Yang, D He, Y Duan, S Yan, 2024, RARoK: Retrieval-Augmented Reasoning on Knowledge for Medical Question Answering, 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 2837-2843, DOI: 10.1109/BIBM62325.2024.10822341, https://www.computer.org/csdl/proceedings-article/bibm/2024/10822341/23onp6dXOSI (RAG combined with Chain-of-Thought for medical reasoning.)
  • Xinyan Guan, Jiali Zeng, Fandong Meng, Chunlei Xin, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, Jie Zhou. 3 Feb 2025, DeepRAG: Thinking to Retrieval Step by Step for Large Language Models, https://arxiv.org/abs/2502.01142
  • P Verma, SP Midigeshi, G Sinha, A Solin, N Natarajan, Mar 2025, Plan *RAG: Efficient Test-Time Planning for Retrieval Augmented Generation, ICLR 2025 review, https://openreview.net/pdf?id=gi9aqlYdBk (Improve RAG reasoning efficiency via planning for parallel reasoning.)
  • Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 13 Mar 2025 (v2), Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567 (Massive and broad survey of all types of reasoning.)
  • Mingyue Cheng, Yucong Luo, Jie Ouyang, Qi Liu, Huijie Liu, Li Li, Shuo Yu, Bohou Zhang, Jiawei Cao, Jie Ma, Daoyu Wang, Enhong Chen, 17 Mar 2025 (v2), A Survey on Knowledge-Oriented Retrieval-Augmented Generation, https://arxiv.org/abs/2503.10677
  • Yu Wang, Shiwan Zhao, Zhihu Wang, Ming Fan, Yubo Zhang, Xicheng Zhang, Zhengfan Wang, Heyuan Huang, Ting Liu, 4 Jul 2025 (v3), RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning, https://arxiv.org/abs/2506.11555
  • Yunfan Gao, Yun Xiong, Yijie Zhong, Yuxi Bi, Ming Xue, Haofen Wang, 24 Apr 2025 (v2), Synergizing RAG and Reasoning: A Systematic Review, https://arxiv.org/abs/2504.15909
  • Weitao Li, Boran Xiang, Xiaolong Wang, Zhinan Gou, Weizhi Ma, Yang Liu, 8 Aug 2025, UR: Unify RAG and Reasoning through Reinforcement Learning, https://arxiv.org/abs/2508.06165 https://github.com/Tsinghua-dhy/UR2

RAG Best Practices

RAG best practices are practical guidelines on getting the most out of your RAG architecture. This can include accuracy improvements and efficiency optimizations. Research papers that examine the general state of RAG architectures in terms of their best practices include:

  • Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 1 Jul 2024, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219 Project: https://github.com/FudanDNN-NLP/RAG (Attempts to optimize the entire RAG system, including the various options for different RAG modules in the RAG pipeline, such as optimal methods for chunking, retrieval, embedding models, vector databases, prompt compression, reranking, repacking, summarizers, and other components.)
  • Florian June Aug 2024, The Best Practices of RAG: Typical RAG Process, Best Practices for Each Module, and Comprehensive Evaluation, https://pub.towardsai.net/the-best-practices-of-rag-300e313322e6
  • Siran Li, Linus Stenzel, Carsten Eickhoff, Seyed Ali Bahrainian, 13 Jan 2025, Enhancing Retrieval-Augmented Generation: A Study of Best Practices, https://arxiv.org/abs/2501.07391 https://github.com/ali-bahrainian/RAG_best_practices (Examines RAG best practices such as model size, prompt wording, chunk size, knowledge base size, and more.)
  • Harvey Bower, 2024, Debugging RAG Pipelines: Best Practices for High-Performance LLMs, https://www.amazon.com/dp/B0DNWN5RB1
  • Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid, 30 Oct 2024 (v3), The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities, https://arxiv.org/abs/2408.13296

Chunking

Chunking is the splitting of documents into sections called "chunks" that are used as extra context for the LLM. Retrieving relevant chunks is very important for accurate RAG results, and the speed of a RAG system is also affected by the size of each chunk, as measured in tokens. Chunking is a complex issue in itself, including the decision of where to split a document, such as at paragraph or section boundaries.
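A minimal sketch of paragraph-based chunking with a maximum chunk size, again using whitespace "tokens" as a stand-in for real tokenizer counts:

```python
def chunk_document(text: str, max_tokens: int = 100) -> list[str]:
    """Split on blank-line paragraph boundaries, then greedily merge
    consecutive paragraphs into chunks of at most max_tokens words."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, size = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        if current and size + n > max_tokens:
            # Flush the current chunk before it overflows the budget.
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Real chunkers add refinements such as overlapping windows, sentence-boundary splitting of oversized paragraphs, and semantic or "agentic" splitting, as discussed in the papers below.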

Research papers on chunking:

  • Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 1 Jul 2024, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219 Project: https://github.com/FudanDNN-NLP/RAG (Attempts to optimize the entire RAG system, including the various options for different RAG modules in the RAG pipeline, such as optimal methods for chunking, retrieval, embedding models, vector databases, prompt compression, reranking, repacking, summarizers, and other components.)
  • Thuwarakesh Murallie, Aug 2024, How to Achieve Near Human-Level Performance in Chunking for RAGs: The costly yet powerful splitting technique for superior RAG retrieval, https://towardsdatascience.com/agentic-chunking-for-rags-091beccd94b1
  • Florian June, Sep 2024, Kotaemon Unveiled: Innovations in RAG Framework for Document QA: PDF Parsing, GraphRAG, Agent-Based Reasoning, and Insights, https://ai.gopubby.com/kotaemon-unveiled-innovations-in-rag-framework-for-document-qa-0b6d67e4b9b7
  • Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, Amina Terfai, Anoop Surya, Tracey Mercer, Vinodh Kumar Thanigachalam, Tamar Bar, Sanjana Krishnan, Samy Kilaru, Jasmine Jaksic, Nave Algarici, Jacob Liberman, Joey Conway, Sonu Nayyar, Justin Boitano, 10 Jul 2024, FACTS About Building Retrieval Augmented Generation-based Chatbots, NVIDIA Research, https://arxiv.org/abs/2407.07858
  • Brandon Smith, Anton Troynikov, July 03, 2024, Evaluating Chunking Strategies for Retrieval, Chroma Technical Report, https://research.trychroma.com/evaluating-chunking https://github.com/brandonstarxel/chunking_evaluation
  • Siran Li, Linus Stenzel, Carsten Eickhoff, Seyed Ali Bahrainian, 13 Jan 2025, Enhancing Retrieval-Augmented Generation: A Study of Best Practices, https://arxiv.org/abs/2501.07391 https://github.com/ali-bahrainian/RAG_best_practices (Examines RAG best practices such as model size, prompt wording, chunk size, knowledge base size, and more.)
  • Sergey Filimonov, Jan 15, 2025, Ingesting Millions of PDFs and why Gemini 2.0 Changes Everything, https://www.sergey.fyi/articles/gemini-flash-2
  • Andrew Neeser, Kaylen Latimer, Aadyant Khatri, Chris Latimer, Naren Ramakrishnan, 16 Feb 2025, QuOTE: Question-Oriented Text Embeddings, https://arxiv.org/abs/2502.10976 (Augmenting RAG chunks with additional information, such as questions the chunk might answer.)
  • Andrew Brown, Muhammad Roman, Barry Devereux, 8 Aug 2025, A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges, https://arxiv.org/abs/2508.06401
  • Robin D. Pesl, Jerin G. Mathew, Massimo Mecella, Marco Aiello, 28 Jul 2025, Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation, https://arxiv.org/abs/2411.19804
  • Takumi Kobayashi, Masato Kobayashi, Thanpimon Buamanee, Yuki Uranishi, 28 Jul 2025, Bi-LAT: Bilateral Control-Based Imitation Learning via Natural Language and Action Chunking with Transformers, https://arxiv.org/abs/2504.01301
  • Mehrdad Zakershahrak, Samira Ghodratnama, 7 Aug 2025, H-Net++: Hierarchical Dynamic Chunking for Tokenizer-Free Language Modelling in Morphologically-Rich Languages, https://arxiv.org/abs/2508.05628

Multimodal RAG

Multimodal RAG is the use of images in the datastore for chunk retrieval, and is also sometimes called "visual RAG." A common example of multimodal RAG is ingesting PDF documents in their native format, using image-based analysis, rather than converting them to text. The retriever in multimodal RAG may return images and/or text to be passed to the Multimodal LLM (MLLM) for inference. The final output from the visual RAG system may be text or images or both, as with any other use of a multimodal LLM.

Multimodal RAG is one of the newest areas of AI research, combining the recent advances in multimodal LLMs with the older RAG architectural styles. Research papers on multimodal RAG (visual RAG):

RAG Fusion

RAG fusion is a RAG extension that analyzes multiple versions of the query to return the best context chunks. The model generates multiple "reformulated" versions of the original text query, each of which is sent to the retriever, and a final "Reciprocal Rank Fusion" step combines all of the returned chunks into a single ranking, like a "reranker" component, but working from multiple similar rankings. The main advantage is finding more accurate context for the LLM; the downside is the many additional calls to the retriever database with slightly modified queries.
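The fusion step itself can be sketched as plain Reciprocal Rank Fusion over several ranked lists of chunk IDs (k = 60 is the constant commonly used for RRF; the example IDs are hypothetical):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Combine several ranked lists into one ranking. Each item's score is
    the sum of 1 / (k + rank) over the lists it appears in (rank is 1-based),
    so items ranked highly by many lists rise to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

In a RAG-fusion pipeline, each reformulated query produces one ranking, and the fused order decides which chunks reach the LLM's context.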

Research on RAG fusion algorithms:

Super RAG

Super RAG is a generalization of retrieval to accept more general information than naive RAG systems. Hence, a "super RAG" system is an embodiment of a more general type of RALM. Research papers on "super RAG" include:

Agentic RAG

Agentic RAG is the combination of agent and RAG technologies. Traditional RAG is a read-only use of extra context, but adding agent capabilities to the system allows a RAG-based application to perform tasks or actions.

Papers on agentic RAG include:

Reranker Component in RAG

The reranker is a RAG component that aims to put the retrieved chunks into the best order for the LLM to use. The input is a set of chunks or documents from the retriever in a preliminary ordering, which are then "re-ranked" into a better order. The basic idea is:

  • Retriever returns several chunks
  • Reranker orders them in priority of relevance
  • Packer merges the chunks with the user's query and other global instructions
  • One final LLM request answers the user's question

Here are some research papers specific to the reranker component:

Long Context RAG

Long context RAG, or simply "long RAG", is the use of LLM long context capabilities to improve RAG architectures. The simplest ideas include using bigger chunks or sending more chunks to the LLM, both of which give more tokens for the LLM to process as context. There is a lot of research on getting LLMs to run fast on long context inputs, and some of this is specially related to RAG architectures.

Research papers on "long RAG" include:

Mini-RAG

Mini-RAG is single-document RAG that puts the entirety of the knowledge base into the LLM's input context. The advantage of this architecture is that there is no need for a retriever component at all, but the disadvantages include high input-token counts for inference and practical limitations on the size of the document being used. The efficiency constraints have been easing lately, via "long RAG" techniques based on LLM efficiency optimizations such as prefix KV caching.

Research papers on single-document RAG or "mini-RAG" include:

RAG Knowledge Graph

A RAG Knowledge Graph architecture, or a "RAG Graph," is a combination of RAG with a Knowledge Graph. Instead of returning text chunks, the retriever returns a structured "graph" that represents additional knowledge. The advantage of a graph is that it contains concept relationships such as hierarchies.
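As an illustration, here is a toy graph retriever over (subject, relation, object) triples. The medical triples and the `hops` parameter are purely hypothetical; real systems query a graph database instead:

```python
# Toy knowledge graph stored as (subject, relation, object) triples.
TRIPLES = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "is_a", "NSAID"),
    ("NSAID", "is_a", "anti-inflammatory drug"),
]

def retrieve_subgraph(entity: str, triples, hops: int = 2):
    """Return triples reachable from `entity` by following relations
    (e.g., is_a hierarchies) for up to `hops` steps. The resulting
    subgraph can be serialized into the prompt in place of text chunks."""
    frontier, found = {entity}, []
    for _ in range(hops):
        next_frontier = set()
        for s, r, o in triples:
            if s in frontier and (s, r, o) not in found:
                found.append((s, r, o))
                next_frontier.add(o)
        frontier = next_frontier
    return found
```

Because the second hop follows `NSAID` upward, the retrieved subgraph captures a hierarchy ("aspirin is an NSAID, which is an anti-inflammatory drug") that flat text chunks would not express directly.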

Research on RAG with Knowledge Graphs:

Ontology RAG

Ontology-based RAG is the use of a special type of Knowledge Graph, known as an "ontology" or "taxonomy" of the concept space. Extra information can be extracted from the taxonomy as a special type of retrieval for RAG-based systems. The advantage is the ability to better capture structured information and hierarchical relationships between concepts in the ontology.

Research papers on LLMs and Ontologies include:

  • Prajwal Kailas, Max Homilius, Rahul C. Deo, Calum A. MacRae, 16 Dec 2024, NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text, https://arxiv.org/abs/2412.11477
  • Muhayy Ud Din, Jan Rosell, Waseem Akram, Isiah Zaplana, Maximo A Roa, Lakmal Seneviratne, Irfan Hussain, 10 Dec 2024, Ontology-driven Prompt Tuning for LLM-based Task and Motion Planning, https://arxiv.org/abs/2412.07493 https://muhayyuddin.github.io/llm-tamp/ (Detecting objects in the prompt text and then using a RALM algorithm to query an ontology database.)
  • Oleksandr Palagin, Vladislav Kaverinskiy, Anna Litvin, Kyrylo Malakhov, 11 Jul 2023, OntoChatGPT Information System: Ontology-Driven Structured Prompts for ChatGPT Meta-Learning, International Journal of Computing, 22(2), 170-183, https://arxiv.org/abs/2307.05082 https://doi.org/10.47839/ijc.22.2.3086 https://computingonline.net/computing/article/view/3086
  • Alhassan Mumuni, Fuseini Mumuni, 6 Jan 2025, Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches, https://arxiv.org/abs/2501.03151
  • Kartik Sharma, Peeyush Kumar, Yunqing Li, 12 Dec 2024, OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large Language Models, https://arxiv.org/abs/2412.15235
  • Chengshuai Zhao, Garima Agrawal, Tharindu Kumarage, Zhen Tan, Yuli Deng, Ying-Chih Chen, Huan Liu, 10 Dec 2024, Ontology-Aware RAG for Improved Question-Answering in Cybersecurity Education, https://arxiv.org/abs/2412.14191
  • Ramona Kühn, Jelena Mitrović, Michael Granitzer, 18 Dec 2024, Enhancing Rhetorical Figure Annotation: An Ontology-Based Web Application with RAG Integration, https://arxiv.org/abs/2412.13799
  • Xueli Pan, Jacco van Ossenbruggen, Victor de Boer, Zhisheng Huang, 13 Sep 2024, A RAG Approach for Generating Competency Questions in Ontology Engineering, https://arxiv.org/abs/2409.08820
  • Rafael Teixeira de Lima, Shubham Gupta, Cesar Berrospi, Lokesh Mishra, Michele Dolfi, Peter Staar, Panagiotis Vagenas, 29 Nov 2024, Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems, https://arxiv.org/abs/2411.19710
  • Yuxing Lu, Sin Yee Goi, Xukai Zhao, Jinzhuo Wang, 22 Jan 2025 (v2), Biomedical Knowledge Graph: A Survey of Domains, Tasks, and Real-World Applications, https://arxiv.org/abs/2501.11632
  • Battazza, I. F. C., Rodrigues, C. M. d. O., & Oliveira, J. F. L. d. (2025). A Framework for Market State Prediction with Ontological Asset Selection: A Multimodal Approach. Applied Sciences, 15(3), 1034. https://doi.org/10.3390/app15031034 https://www.mdpi.com/2076-3417/15/3/1034
  • AD Al Hauna, AP Yunus, M Fukui, S Khomsah - International Journal on Robotics, Apr 2025, Enhancing LLM Efficiency: A Literature Review of Emerging Prompt Optimization Strategies, https://doi.org/10.33093/ijoras.2025.7.1.9 https://mmupress.com/index.php/ijoras/article/view/1311 PDF: https://mmupress.com/index.php/ijoras/article/view/1311/834
  • Jean-Philippe Corbeil, Amin Dada, Jean-Michel Attendu, Asma Ben Abacha, Alessandro Sordoni, Lucas Caccia, François Beaulieu, Thomas Lin, Jens Kleesiek, Paul Vozila, 15 May 2025, A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment, https://arxiv.org/abs/2505.10717
  • Junde Wu, Jiayuan Zhu, Yunli Qi, Jingkun Chen, Min Xu, Filippo Menolascina, Yueming Jin, Vicente Grau, Medical Graph RAG: Evidence-based Medical Large Language Model via Graph Retrieval-Augmented Generation, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 28443–28467, July 27 - August 1, 2025, https://aclanthology.org/2025.acl-long.1381.pdf
  • Ziheng Zhang, Zhenxi Lin, Yefeng Zheng, and Xian Wu. 2025. How much Medical Knowledge do LLMs have? An Evaluation of Medical Knowledge Coverage for LLMs. In Proceedings of the ACM on Web Conference 2025 (WWW '25). Association for Computing Machinery, New York, NY, USA, 5330–5341. https://doi.org/10.1145/3696410.3714535 https://dl.acm.org/doi/abs/10.1145/3696410.3714535 https://dl.acm.org/doi/pdf/10.1145/3696410.3714535
  • Yan Ting Chok, Soyon Park, Seungheun Baek, Hajung Kim, Junhyun Lee, Jaewoo Kang, 14 Aug 2025, HiRef: Leveraging Hierarchical Ontology and Network Refinement for Robust Medication Recommendation, https://arxiv.org/abs/2508.10425
  • Yiping Song, Jiaoyan Chen and Renate A. Schmidt, 14 Aug 2025, GenOM: Ontology Matching with Description Generation and Large Language Model, https://arxiv.org/abs/2508.10703
  • Qing Cheng, Zefan Zeng, Xingchen Hu, Yuehang Si, Zhong Liu, 23 Jul 2025, A Survey of Event Causality Identification: Taxonomy, Challenges, Assessment, and Prospects, https://arxiv.org/abs/2411.10371
  • Stefan Borgwardt, Duy Nhu, Gabriele Röger, 23 Jul 2025, Automated planning with ontologies under coherence update semantics (Extended Version), https://arxiv.org/abs/2507.15120
  • Lam Nguyen and Erika Barcelos and Roger French and Yinghui Wu, 18 Jul 2025, KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models, https://arxiv.org/abs/2507.14032
  • Oussama Bouaggad, Natalia Grabar, 18 Jul 2025, Search-Optimized Quantization in Biomedical Ontology Alignment, https://arxiv.org/abs/2507.13742
  • Hui Yang, Jiaoyan Chen, Yuan He, Yongsheng Gao, Ian Horrocks, 18 Jul 2025, Language Models as Ontology Encoders, https://arxiv.org/abs/2507.14334
  • Anna Sofia Lippolis, Mohammad Javad Saeedizade, Robin Keskisärkkä, Aldo Gangemi, Eva Blomqvist, Andrea Giovanni Nuzzolese, 19 Jul 2025, Large Language Models Assisting Ontology Evaluation, https://arxiv.org/abs/2507.14552
  • Ritesh Chandra, Shashi Shekhar Kumar, Rushil Patra, Sonali Agarwal, 21 Jul 2025, Decision support system for Forest fire management using Ontology with Big Data and LLMs, https://arxiv.org/abs/2405.11346
  • Devichand Budagam, Ashutosh Kumar, Mahsa Khoshnoodi, Sankalp KJ, Vinija Jain, Aman Chadha, 21 Jul 2025, Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles, https://arxiv.org/abs/2406.12644
  • Soumen Sinha, Tanisha Rana, Rahul Roy, 22 Jul 2025, A novel approach to navigate the taxonomic hierarchy to address the Open-World Scenarios in Medicinal Plant Classification, https://arxiv.org/abs/2502.17289
  • Maurice Funk, Marvin Grosser, Carsten Lutz, 11 Aug 2025, Fitting Description Logic Ontologies to ABox and Query Examples, https://arxiv.org/abs/2508.08007
  • Xiaohua Feng, Jiaming Zhang, Fengyuan Yu, Chengye Wang, Li Zhang, Kaixiang Li, Yuyuan Li, Chaochao Chen, Jianwei Yin, 26 Jul 2025, A Survey on Generative Model Unlearning: Fundamentals, Taxonomy, Evaluation, and Future Direction, https://arxiv.org/abs/2507.19894
  • Md Fantacher Islam, Jarrod Mosier, Vignesh Subbian, 26 Jul 2025, NIRS: An Ontology for Non-Invasive Respiratory Support in Acute Care, https://arxiv.org/abs/2507.19992
  • Joydeep Chandra and Satyam Kumar Navneet, 26 Jul 2025, Policy-Driven AI in Dataspaces: Taxonomy, Explainability, and Pathways for Compliant Innovation, https://arxiv.org/abs/2507.20014
  • Wenbin Guo, Xin Wang, Jiaoyan Chen, Zhao Li and Zirui Chen, 28 Jul 2025, Ontology-Enhanced Knowledge Graph Completion using Large Language Models, https://arxiv.org/abs/2507.20643
  • Federico Donato and Adrien Barton, 26 Jul 2025, An ontological analysis of risk in Basic Formal Ontology, https://arxiv.org/abs/2507.21171
  • Vishal Raman, Vijai Aravindh R, 29 Jul 2025, Evo-DKD: Dual-Knowledge Decoding for Autonomous Ontology Evolution in Large Language Models, https://arxiv.org/abs/2507.21438
  • Sabrina Patania, Luca Annese, Cansu Koyuturk, Azzurra Ruggeri, Dimitri Ognibene, 25 May 2025, Dialogic Social Learning for Artificial Agents: Enhancing LLM Ontology Acquisition through Mixed-Initiative Educational Interactions, https://arxiv.org/abs/2507.21065
  • Meghyn Bienvenu, Diego Figueira, Pierre Lafourcade, 31 Jul 2025, Tractable Responsibility Measures for Ontology-Mediated Query Answering, https://arxiv.org/abs/2507.23191
  • Zhangcheng Qiang, Kerry Taylor, Weiqing Wang, Jing Jiang, 25 Mar 2025, OAEI-LLM-T: A TBox Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching, https://arxiv.org/abs/2503.21813
  • Renato Vukovic, Carel van Niekerk, Michael Heck, Benjamin Ruppik, Hsien-Chin Lin, Shutong Feng, Nurul Lubis, Milica Gasic, 31 Jul 2025, Text-to-SQL Task-oriented Dialogue Ontology Construction, https://arxiv.org/abs/2507.23358
  • Haonan Bian, Yutao Qi, Rui Yang, Yuanxi Che, Jiaqian Wang, Heming Xia, Ranran Zhen, 2 Aug 2025, From Query to Logic: Ontology-Driven Multi-Hop Reasoning in LLMs, https://arxiv.org/abs/2508.01424
  • Manuel Cossio, 3 Aug 2025, A comprehensive taxonomy of hallucinations in Large Language Models, https://arxiv.org/abs/2508.01781
  • Yuki Yamagata, Koji Kyoda, Hiroya Itoga, Emi Fujisawa and Shuichi Onami, 4 Aug 2025, SSBD Ontology: A Two-Tier Approach for Interoperable Bioimaging Metadata, https://arxiv.org/abs/2508.02084
  • Haoran Sun, Yusen Wu, Peng Wang, Wei Chen, Yukun Cheng, Xiaotie Deng, Xu Chu, 5 Aug 2025, Game Theory Meets Large Language Models: A Systematic Survey with Taxonomy and New Frontiers, https://arxiv.org/abs/2502.09053
  • Alessia Pisu, Livio Pompianu, Francesco Osborne, Diego Reforgiato Recupero, Daniele Riboni, Angelo Salatino, 6 Aug 2025, A Hybrid AI Methodology for Generating Ontologies of Research Topics from Scientific Paper Corpora, https://arxiv.org/abs/2508.04213
  • Yuyang Liu, Qiuhe Hong, Linlan Huang, Alexandra Gomez-Villa, Dipam Goswami, Xialei Liu, Joost van de Weijer, Yonghong Tian, 6 Aug 2025, Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting, https://arxiv.org/abs/2508.04227
  • Sigma Jahan, Saurabh Singh Rajput, Tushar Sharma, Mohammad Masudur Rahman, 6 Aug 2025, Taxonomy of Faults in Attention-Based Neural Networks, https://arxiv.org/abs/2508.04925
  • Anouk Oudshoorn, Magdalena Ortiz, Mantas Simkus, 16 Jul 2025, SHACL Validation in the Presence of Ontologies: Semantics and Rewriting Techniques, https://arxiv.org/abs/2507.12286
  • Sviatoslav Lushnei, Dmytro Shumskyi, Severyn Shykula, Ernesto Jimenez-Ruiz, Artur d'Avila Garcez, 11 Aug 2025, Large Language Models as Oracles for Ontology Alignment, https://arxiv.org/abs/2508.08500
  • Amir Mohammad Salehoof, Ali Ramezani, Yadollah Yaghoobzadeh, Majid Nili Ahmadabadi, 12 Aug 2025, A Dual-Axis Taxonomy of Knowledge Editing for LLMs: From Mechanisms to Functions, https://arxiv.org/abs/2508.08795
  • Farzana Zahid, Anjalika Sewwandi, Lee Brandon, Vimal Kumar, Roopak Sinha, 12 Aug 2025, Securing Educational LLMs: A Generalised Taxonomy of Attacks on LLMs and DREAD Risk Assessment, https://arxiv.org/abs/2508.08629
  • Jiawei Zhou, Amy Z. Chen, Darshi Shah, Laura M. Schwab Reese, and Munmun De Choudhury, 11 Aug 2025, A Risk Taxonomy and Reflection Tool for Large Language Model Adoption in Public Health, https://arxiv.org/abs/2411.02594
  • David J. Moore, 18 Aug 2025, A Taxonomy of Hierarchical Multi-Agent Systems: Design Patterns, Coordination Mechanisms, and Industrial Applications, https://arxiv.org/abs/2508.12683
  • Zabir Al Nazi, Vagelis Hristidis, Aaron Lawson McLean, Jannat Ara Meem and Md Taukir Azam Chowdhury, 15 Aug 2025, Ontology-Guided Query Expansion for Biomedical Document Retrieval using Large Language Models, https://arxiv.org/abs/2508.11784
  • Simon Hosemann, Jean Christoph Jung, Carsten Lutz, Sebastian Rudolph, 11 Aug 2025, Fitting Ontologies and Constraints to Relational Structures, https://arxiv.org/abs/2508.13176
  • Hui Wei, Dong Yoon Lee, Shubham Rohal, Zhizhang Hu, Ryan Rossi, Shiwei Fang, Shijia Pan, 21 Aug 2025, A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis, https://arxiv.org/abs/2506.12263
  • Runxuan Liu, Bei Luo, Jiaqi Li, Baoxin Wang, Ming Liu, Dayong Wu, Shijin Wang, Bing Qin, 21 Aug 2025, Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question Answering, https://arxiv.org/abs/2502.11491
  • John Beverley and Danielle Limbaugh, 26 Jul 2025, Ontological Foundations of State Sovereignty, https://arxiv.org/abs/2507.21172
  • Michael Banf and Johannes Kuhn, 22 Aug 2025, Tripartite-GraphRAG via Plugin Ontologies, https://arxiv.org/abs/2504.19667
  • Natalie Abreu, Edwin Zhang, Eran Malach, Naomi Saphra, 25 Aug 2025, A Taxonomy of Transcendence, https://arxiv.org/abs/2508.17669

RAG Caching

RAG caching is the use of caching optimizations to improve the latency and throughput of a RAG system. Several components in a RAG architecture can be optimized with a cache. The retrieval component can use all of the types of caching applicable to whatever database or datastore architecture it uses, irrespective of whether it performs keyword or vector lookup, and whether the data is stored on disk or cached in memory. At the lowest level of the LLM, there are various KV caching techniques (see further below). At the topmost level, there can be an overall cache: an "inference cache" for exactly identical queries, or a "semantic cache" for similar queries.
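As a rough illustration, the topmost caching level can combine an exact-match inference cache with a similarity-based semantic cache. This is a minimal sketch, not a production design: the `embed` function is a placeholder for whatever sentence-embedding model the system already uses, and the similarity threshold is an arbitrary illustrative choice.

```python
import hashlib
import math

class RagResponseCache:
    """Two-level response cache: exact-match inference cache,
    then a semantic cache based on embedding similarity.
    embed() is a stand-in for any sentence-embedding model."""

    def __init__(self, embed, threshold=0.95):
        self.embed = embed          # maps query text -> list[float]
        self.threshold = threshold  # cosine similarity cutoff for a "hit"
        self.exact = {}             # hash of normalized query -> answer
        self.semantic = []          # list of (embedding, answer) pairs

    def _key(self, query):
        # Trivial normalization before exact matching.
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def get(self, query):
        # Level 1: identical query (inference cache).
        hit = self.exact.get(self._key(query))
        if hit is not None:
            return hit
        # Level 2: semantically similar query (semantic cache).
        qv = self.embed(query)
        for vec, answer in self.semantic:
            if self._cosine(qv, vec) >= self.threshold:
                return answer
        return None  # cache miss: caller runs the full RAG pipeline

    def put(self, query, answer):
        self.exact[self._key(query)] = answer
        self.semantic.append((self.embed(query), answer))
```

In a real deployment the semantic lookup would be an approximate nearest-neighbor search in a vector index rather than a linear scan, and both levels would need eviction and invalidation policies.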

Research papers on RAG cache architectures:

RAG KV Caching Optimizations

KV caching optimizations store the key-value (KV) data computed during LLM inference for reuse in subsequent inference requests in a RAG system. In addition to RAG-level caches, such as retrieval caches, there are various LLM cache methods. Several of the many types of KV caching optimizations apply to RAG architectures (and to other LLM use cases). The main KV cache techniques involve precomputed caches for RAG chunks, such as prefix caching or session caching. More information is available:
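The cache-management side of precomputed chunk KV data can be sketched as a simple LRU store, as below. This is a hypothetical sketch: `compute_kv` stands in for the expensive prefill pass that produces a chunk's key-value tensors, and naive reuse is only exact for chunks appearing at a cached prefix position; reusing chunk KV data at arbitrary positions requires fusion techniques with selective recomputation, as in the CacheBlend paper cited on this page.

```python
from collections import OrderedDict

class ChunkKVCache:
    """LRU cache of precomputed KV data for RAG chunks.
    compute_kv() is a placeholder for a prefill pass over the chunk;
    the cached object would be the per-layer key/value tensors."""

    def __init__(self, compute_kv, max_entries=128):
        self.compute_kv = compute_kv
        self.max_entries = max_entries
        self.cache = OrderedDict()  # chunk text -> KV object
        self.hits = 0
        self.misses = 0

    def get_kv(self, chunk):
        if chunk in self.cache:
            self.cache.move_to_end(chunk)  # mark as recently used
            self.hits += 1
            return self.cache[chunk]
        self.misses += 1
        kv = self.compute_kv(chunk)        # expensive prefill computation
        self.cache[chunk] = kv
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # evict least recently used
        return kv

    def kv_for_prompt(self, chunks):
        # Gather KV data for every retrieved chunk in a prompt; popular
        # chunks retrieved across many queries skip their prefill cost.
        return [self.get_kv(c) for c in chunks]
```

The payoff is largest when the same chunks are retrieved repeatedly across queries, which is common when a handful of documents dominate the retrieval results.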

Other general types of caching that apply to any LLM system, and can be used with RAG:

RAG Optimization Research Papers

Research papers on optimization of RAG architectures:

General Research Papers on RAG

There are rather a lot of research papers on RAG, as it's a fundamental underpinning technique of generative AI. Here are a few of them:

  • Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna, 3 Jun 2024, Demystifying Platform Requirements for Diverse LLM Inference Use Cases, https://arxiv.org/abs/2406.01698 Code: https://github.com/abhibambhaniya/GenZ-LLM-Analyzer (Analysis of cost of serving LLMs, including separate profiles of prefill versus decoding phases, and the cost of extra prompt processing in RAG architectures with prepended information.)
  • Timo Lehto, June 2024, Developing LLM-powered Applications Using Modern Frameworks, Bachelor’s Thesis, Information and Communications Technology, Jamk University of Applied Sciences, Finland, June 2024, 53 pages., https://www.theseus.fi/bitstream/handle/10024/862271/Lehto_Timo.pdf?sequence=2 (Building LLM-based applications in RAG architecture using LangChain.)
  • Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu, 7 May 2024, FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference, https://arxiv.org/abs/2405.04065 (Optimize RAG by appending rather than prepending documents, and modifying the attention for improvements in KV caching, by shimming or replacing some of the CUDA GPU low-level memory management APIs to avoid the need to rewrite kernels with extra higher-level memory management code.)
  • Yucheng Hu, Yuxing Lu, 30 Apr 2024, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543 Project: https://github.com/2471023025/RALM_Survey
  • Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 22 Apr 2024, A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
  • Mandar Karhade, Mar 20, 2024, Why RAG Applications Fail in Production, Towards AI, https://pub.towardsai.net/why-rag-applications-fail-in-production-a-technical-deep-dive-15cc976af52c
  • Priyank Rathod, May 21, 2024, Efficient Usage of RAG Systems in the World of LLMs, https://www.techrxiv.org/doi/full/10.36227/techrxiv.171625877.73379410/v1
  • June 2024 (accessed), R2R: The ultimate open-source RAG framework, https://github.com/SciPhi-AI/R2R
  • Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Bin Cui, 27 Mar 2024 (v2), Retrieval-Augmented Generation for AI-Generated Content: A Survey, https://arxiv.org/abs/2402.19473 Project: https://github.com/hymie122/RAG-Survey
  • Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe, 12 Jan 2024, The Unreasonable Effectiveness of Easy Training Data for Hard Tasks, https://arxiv.org/abs/2401.06751
  • Bijit Ghosh, Dec 25, 2023, Advanced RAG for LLMs/SLMs, Medium, https://medium.com/@bijit211987/advanced-rag-for-llms-slms-5bcc6fbba411
  • Iulia Brezeanu, Jan 5, 2024, How to Cut RAG Costs by 80% Using Prompt Compression, Towards Data Science, https://towardsdatascience.com/how-to-cut-rag-costs-by-80-using-prompt-compression-877a07c6bedb
  • James Nguyen, Nov 19, 2023, Forget RAG: Embrace agent design for a more intelligent grounded ChatGPT! https://james-tn.medium.com/forget-rag-embrace-agent-design-for-a-more-intelligent-grounded-chatgpt-6c562d903c61
  • Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, Apr 2021, Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, https://arxiv.org/abs/2005.11401
  • Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang, 3 Jun 2024 (v2), CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion, https://arxiv.org/abs/2405.16444 Code: https://github.com/YaoJiayi/CacheBlend.git (Generalizes prefix KV caching to KV cache fusion with selective recomputation of some KV cache data.)
  • David Spuler, March 2024, Chapter 6. Training, Fine-Tuning & RAG, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
  • Tiernan Ray, June 3, 2024, Make room for RAG: How Gen AI's balance of power is shifting, https://www.zdnet.com/article/make-room-for-rag-how-gen-ais-balance-of-power-is-shifting/
  • Shicheng Xu, Liang Pang, Mo Yu, Fandong Meng, Huawei Shen, Xueqi Cheng, Jie Zhou, 12 Jun 2024 (v2), Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation, https://arxiv.org/abs/2402.18150 (Analysis about how LLMs can mishandle information retrieved from a datastore and how to make LLMs better at handling RAG information using a specialized training regime.)
  • Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li, 17 Jun 2024 (v3), A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2405.06211 Project: https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/
  • Myeonghwa Lee, Seonho An, Min-Soo Kim, 18 Jun 2024, PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, https://arxiv.org/abs/2406.12430 Code: https://github.com/myeon9h/PlanRAG
  • Ziyan Jiang, Xueguang Ma, Wenhu Chen, June 2024, LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs, arXiv preprint arXiv:2406.15319, https://arxiv.org/abs/2406.15319 (Improved accuracy performance of RAG methods when using a long context LLM and longer chunk sizes for the retriever.)
  • Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King, 25 Jun 2024, Entropy-Based Decoding for Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2406.17519 (Enhanced decoding algorithm for multi-document RAG processing.)
  • Louis-François Bouchard, Louie Peters, May 2024, Chapter 7: RAG, and Chapter 8, Advanced RAG, Building LLMs for Production: Enhancing LLM Abilities and Reliability with Prompting, Fine-Tuning, and RAG, https://www.amazon.com/Building-LLMs-Production-Reliability-Fine-Tuning/dp/B0D4FFPFW8/
  • Matt Murphy, Tim Tully, Derek Xiao, January 18, 2024, The Modern AI Stack: Design Principles for the Future of Enterprise AI Architectures, Menlo Ventures, https://menlovc.com/perspective/the-modern-ai-stack-design-principles-for-the-future-of-enterprise-ai-architectures/ (Various details about the AI tech stack, organizational AI maturity levels, and several interesting facts: inference is 95% of AI cost now, 60% of organizations are using multi-model methods, RAG is the dominant architecture currently, and AI application development teams are primarily made up of non-ML software engineers leveraging on top of AI models.)
  • Anirban Ghoshal, July 3, 2024, AWS approach to RAG evaluation could help enterprises reduce AI spending, https://www.infoworld.com/article/3715629/aws-new-approach-to-rag-evaluation-could-help-enterprises-reduce-ai-spending.html
  • Yi Zhou, Dec 16, 2023, Optimizing GenAI: Comparing Model Training, Fine-Tuning, RAG, and Prompt Engineering, https://medium.com/generative-ai-revolution-ai-native-transformation/optimizing-genai-comparing-model-training-fine-tuning-rag-and-prompt-engineering-7a7c6c65e0f0
  • Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 1 Jul 2024, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219 Project: https://github.com/FudanDNN-NLP/RAG (Attempts to optimize the entire RAG system, including the various options for different RAG modules in the RAG pipeline, such as optimal methods for chunking, retrieval, embedding models, vector databases, prompt compression, reranking, repacking, summarizers, and other components.)
  • Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue, 18 Jul 2024, Retrieval-Augmented Generation for Natural Language Processing: A Survey, https://arxiv.org/abs/2407.13193
  • Chip Huyen, Jul 25, 2024, Building A Generative AI Platform, https://huyenchip.com/2024/07/25/genai-platform.html
  • Akash Bajwa and Chia Jeng Yang, May 27, 2024, The RAG Stack: Featuring Knowledge Graphs: Reducing Hallucinations To Make LLMs Production-Grade With Complex RAG, https://akashbajwa.substack.com/p/the-rag-stack-featuring-knowledge
  • Chips Ahoy Capital, Jul 02, 2024, Evolution of Databases in the World of AI Apps, https://chipsahoycapital.substack.com/p/evolution-of-databases-in-the-world
  • Pavan Belagatti, Jul 31, 2024, Semantic Chunking for Enhanced RAG Applications! https://levelup.gitconnected.com/semantic-chunking-for-enhanced-rag-applications-b6bc92942af0
  • Florian June Aug 2024, The Best Practices of RAG: Typical RAG Process, Best Practices for Each Module, and Comprehensive Evaluation, https://pub.towardsai.net/the-best-practices-of-rag-300e313322e6
  • Louis-François Bouchard, Aug 12, 2024, When to Use GraphRAG, https://louisbouchard.substack.com/p/when-to-use-graphrag
  • Vahe Aslanyan, June 11, 2024, Next-Gen Large Language Models: The Retrieval-Augmented Generation (RAG) Handbook, https://www.freecodecamp.org/news/retrieval-augmented-generation-rag-handbook/
  • Yao Lu, Song Bian, Lequn Chen, Yongjun He, Yulong Hui, Matthew Lentz, Beibin Li, Fei Liu, Jialin Li, Qi Liu, Rui Liu, Xiaoxuan Liu, Lin Ma, Kexin Rong, Jianguo Wang, Yingjun Wu, Yongji Wu, Huanchen Zhang, Minjia Zhang, Qizhen Zhang, Tianyi Zhou, Danyang Zhuo, 17 Jan 2024, Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native, https://arxiv.org/abs/2401.12230
  • David Spuler, March 2024, Use Cases for FT vs RAG, in Generative AI in C++, https://www.aussieai.com/book/ch6-use-cases-rag-vs-ft
  • Jason Perlow, Sept. 6, 2024, Understanding RAG: How to integrate generative AI LLMs with your business knowledge, https://www.zdnet.com/article/understanding-rag-how-to-integrate-generative-ai-llms-with-your-business-knowledge/
  • Sau Sheong, Jun 13, 2024, Programming with AI — RAG: Using RAG in LLM Applications, https://sausheong.com/programming-with-ai-rag-27bf5c19daa7
  • Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid, 30 Oct 2024 (v3), The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities, https://arxiv.org/abs/2408.13296
  • Chenliang Zhang, Lin Wang, Yuanyuan Lu, Yusheng Qi, Kexin Wang, Peixu Hou, Wenshi Chen, 14 Aug 2025, A Curriculum Learning Approach to Reinforcement Learning: Leveraging RAG for Multimodal Question Answering, https://arxiv.org/abs/2508.10337
  • Juyuan Wang, Rongchen Zhao, Wei Wei, Yufeng Wang, Mo Yu, Jie Zhou, Jin Xu, Liyan Xu, 14 Aug 2025, ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning, https://arxiv.org/abs/2508.10419
  • Xinyu Wang, Jijun Chi, Zhenghan Tai, Tung Sum Thomas Kwok, Muzhi Li, Zhuhong Li, Hailin He, Yuchen Hua, Peng Lu, Suyuchen Wang, Yihong Wu, Jerry Huang, Jingrui Tian, Fengran Mo, Yufei Cui, Ling Zhou, 13 Aug 2025, FinSage: A Multi-aspect RAG System for Financial Filings Question Answering, https://arxiv.org/abs/2504.14493
  • Athanasios Davvetas, Xenia Ziouvelou, Ypatia Dami, Alexis Kaponis, Konstantina Giouvanopoulou, Michael Papademas, 23 Jul 2025, TAI Scan Tool: A RAG-Based Tool With Minimalistic Input for Trustworthy AI Self-Assessment, https://arxiv.org/abs/2507.17514
  • Shiting Chen, Zijian Zhao, Jinsong Chen, 23 Jul 2025, Each to Their Own: Exploring the Optimal Embedding in RAG, https://arxiv.org/abs/2507.17442
  • Yue Ding, Conor McCarthy, Kevin O'Shea and Mingming Liu, 23 Jul 2025, Leveraging RAG-LLMs for Urban Mobility Simulation and Analysis, https://arxiv.org/abs/2507.10382
  • Jean Lelong, Adnane Errazine and Annabelle Blangero, 22 Jul 2025, Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications, https://arxiv.org/abs/2507.16507
  • Lars Hillebrand, Armin Berger, Daniel Uedelhoven, David Berghaus, Ulrich Warning, Tim Dilmaghani, Bernd Kliem, Thomas Schmid, Rüdiger Loitz, Rafet Sifa, 22 Jul 2025, Advancing Risk and Quality Assurance: A RAG Chatbot for Improved Regulatory Compliance, https://arxiv.org/abs/2507.16711
  • San Kim, Jonghwi Kim, Yejin Jeon, Gary Geunbae Lee, 24 Jul 2025, Safeguarding RAG Pipelines with GMTP: A Gradient-based Masked Token Probability Method for Poisoned Document Detection, https://arxiv.org/abs/2507.18202
  • Shad Nygren, Pinar Avci, Andre Daniels, Reza Rassol, Afshin Beheshti, Diego Galeano, 18 Jul 2025, RAG-based Architectures for Drug Side Effect Retrieval in LLMs, https://arxiv.org/abs/2507.13822
  • Mohita Chowdhury, Yajie Vera He, Jared Joselowitz, Aisling Higham, Ernest Lim, 18 Jul 2025, ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems, https://arxiv.org/abs/2501.08208
  • Konstantinos I. Roumeliotis, Ranjan Sapkota, Manoj Karkee, Nikolaos D. Tselikas, 18 Jul 2025, Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning, https://arxiv.org/abs/2507.10571
  • Qikai Wei and Huansheng Ning and Chunlong Han and Jianguo Ding, 7 Jul 2025, A Query-Aware Multi-Path Knowledge Graph Fusion Approach for Enhancing Retrieval-Augmented Generation in Large Language Models, https://arxiv.org/abs/2507.16826
  • Shubham Mohole, Hongjun Choi, Shusen Liu, Christine Klymko, Shashank Kushwaha, Derek Shi, Wesam Sakla, Sainyam Galhotra, Ruben Glatt, 23 Jul 2025, VERIRAG: Healthcare Claim Verification via Statistical Audit in Retrieval-Augmented Generation, https://arxiv.org/abs/2507.17948
  • Jie Ouyang, Tingyue Pan, Mingyue Cheng, Ruiran Yan, Yucong Luo, Jiaying Lin, Qi Liu, 18 Jul 2025, HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation, https://arxiv.org/abs/2503.04800
  • Jerry Wang and Fang Yu, 20 Jul 2025, DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection, https://arxiv.org/abs/2507.15042
  • Marina Danilevsky, Kristjan Greenewald, Chulaka Gunasekara, Maeda Hanafi, Lihong He, Yannis Katsis, Krishnateja Killamsetty, Yulong Li, Yatin Nandwani, Lucian Popa, Dinesh Raghu, Frederick Reiss, Vraj Shah, Khoi-Nguyen Tran, Huaiyu Zhu, Luis Lastras, 20 Jul 2025, A Library of LLM Intrinsics for Retrieval-Augmented Generation, https://arxiv.org/abs/2504.11704
  • Shengming Zhao, Yuchen Shao, Yuheng Huang, Jiayang Song, Zhijie Wang, Chengcheng Wan, Lei Ma, 21 Jul 2025, Understanding the Design Decisions of Retrieval-Augmented Generation Systems, https://arxiv.org/abs/2411.19463
  • Jubin Abhishek Soni, Amit Anand, Rajesh Kumar Pandey, Aniket Abhishek Soni, 19 Jul 2025, Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing Multi-Turn Planning and Tool Adaptation, https://arxiv.org/abs/2506.11092
  • Pravallika Abbineni, Saoud Aldowaish, Colin Liechty, Soroosh Noorzad, Ali Ghazizadeh, Morteza Fayazi, 11 Aug 2025, MuaLLM: A Multimodal Large Language Model Agent for Circuit Design Assistance with Hybrid Contextual Retrieval-Augmented Generation, https://arxiv.org/abs/2508.08137
  • Agada Joseph Oche and Arpan Biswas, 8 Aug 2025, Role of Large Language Models and Retrieval-Augmented Generation for Accelerating Crystalline Material Discovery: A Systematic Review, https://arxiv.org/abs/2508.06691
  • Ran Xu, Yuchen Zhuang, Yue Yu, Haoyu Wang, Wenqi Shi, Carl Yang, 26 Jul 2025, RAG in the Wild: On the (In)effectiveness of LLMs with Mixture-of-Knowledge Retrieval Augmentation, https://arxiv.org/abs/2507.20059
  • Baiyu Chen, Wilson Wongso, Xiaoqian Hu, Yue Tan, Flora Salim, 27 Jul 2025, Multi-Stage Verification-Centric Framework for Mitigating Hallucination in Multi-Modal RAG, https://arxiv.org/abs/2507.20136
  • Robin D. Pesl, Jerin G. Mathew, Massimo Mecella, Marco Aiello, 28 Jul 2025, Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation, https://arxiv.org/abs/2411.19804
  • Jinyan Su, Jennifer Healey, Preslav Nakov, Claire Cardie, 27 Jul 2025, Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control, https://arxiv.org/abs/2502.12145
  • Nicholas Botti (Federal Reserve Board), Flora Haberkorn (Federal Reserve Board), Charlotte Hoopes (Federal Reserve Board), Shaun Khan (Federal Reserve Board), 28 Jul 2025, Efficacy of AI RAG Tools for Complex Information Extraction and Data Annotation Tasks: A Case Study Using Banks Public Disclosures, https://arxiv.org/abs/2507.21360
  • Ashley Rector, Keaton Minor, Kamden Minor, Jeff McCormack, Beth Breeden, Ryan Nowers, Jay Dorris, 29 Jul 2025, Validating Pharmacogenomics Generative Artificial Intelligence Query Prompts Using Retrieval-Augmented Generation (RAG), https://arxiv.org/abs/2507.21453
  • Hao Ye, Mengshi Qi, Zhaohong Liu, Liang Liu and Huadong Ma, 29 Jul 2025, SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation, https://arxiv.org/abs/2507.21585
  • Grégoire Martinon, Alexandra Lorenzo de Brionne, Jérôme Bohard, Antoine Lojou, Damien Hervault, Nicolas J-B. Brunel (ENSIIE, LaMME), 29 Jul 2025, Towards a rigorous evaluation of RAG systems: the challenge of due diligence, https://arxiv.org/abs/2507.21753
  • Kezhen Zhong, Basem Suleiman, Abdelkarim Erradi, Shijing Chen, 10 Jul 2025, SemRAG: Semantic Knowledge-Augmented RAG for Improved Question-Answering, https://arxiv.org/abs/2507.21110
  • Kushal Chawla, Alfy Samuel, Anoop Kumar, Daben Liu, 29 Jul 2025, FB-RAG: Improving RAG with Forward and Backward Lookup, https://arxiv.org/abs/2505.17206
  • YiHan Jiao, ZheHao Tan, Dan Yang, DuoLin Sun, Jie Feng, Yue Shen, Jian Wang, Peng Wei, 29 Jul 2025, HIRAG: Hierarchical-Thought Instruction-Tuning Retrieval-Augmented Generation, https://arxiv.org/abs/2507.05714
  • Hyeon Seong Jeong, Sangwoo Jo, Byeong Hyun Yoon, Yoonseok Heo, Haedong Jeong, Taehoon Kim, 31 Jul 2025, Zero-Shot Document Understanding using Pseudo Table of Contents-Guided Retrieval-Augmented Generation, https://arxiv.org/abs/2507.23217
  • Chuanyue Yu, Kuo Zhao, Yuhan Li, Heng Chang, Mingjian Feng, Xiangzhe Jiang, Yufei Sun, Jia Li, Yuzhi Zhang, Jianxin Li, Ziwei Zhang, 31 Jul 2025, GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning, https://arxiv.org/abs/2507.23581
  • Kwun Hang Lau, Ruiyuan Zhang, Weijie Shi, Xiaofang Zhou, Xiaojun Cheng, 21 Jul 2025, Reading Between the Timelines: RAG for Answering Diachronic Questions, https://arxiv.org/abs/2507.22917
  • Shuyu Guo, Zhaochun Ren, 24 Jul 2025, Enhancing RAG Efficiency with Adaptive Context Compression, https://arxiv.org/abs/2507.22931
  • Daeyong Kwon, SeungHeon Doh and Juhan Nam, 31 Jul 2025, MUST-RAG: MUSical Text Question Answering with Retrieval Augmented Generation, https://arxiv.org/abs/2507.23334
  • Zerui Yang and Yuwei Wan and Siyu Yan and Yudai Matsuda and Tong Xie and Bram Hoex and Linqi Song, 31 Jul 2025, DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search, https://arxiv.org/abs/2507.07426
  • Hruday Markondapatnaikuni, Basem Suleiman, Abdelkarim Erradi, Shijing Chen, 31 Jul 2025, KeyKnowledgeRAG (K^2RAG): An Enhanced RAG method for improved LLM question-answering capabilities, https://arxiv.org/abs/2507.07695
  • Roie Kazoom, Raz Lapid, Moshe Sipper and Ofer Hadar, 30 Jul 2025, Don't Lag, RAG: Training-Free Adversarial Detection Using RAG, https://arxiv.org/abs/2504.04858
  • Eric Yang, Jonathan Amar, Jong Ha Lee, Bhawesh Kumar, Yugang Jia, 30 Jul 2025, The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation for Healthcare QA, https://arxiv.org/abs/2407.18044
  • Shubham Kumar Nigam, Balaramamahanthi Deepak Patnaik, Shivam Mishra, Ajay Varghese Thomas, Noel Shallum, Kripabandhu Ghosh and Arnab Bhattacharya, 1 Aug 2025, NyayaRAG: Realistic Legal Judgment Prediction with RAG under the Indian Common Law System, https://arxiv.org/abs/2508.00709
  • Ningning Zhang, Chi Zhang, Zhizhong Tan, Xingxing Yang, Weiping Deng, Wenyong Wang, 1 Aug 2025, Credible Plan-Driven RAG Method for Multi-Hop Question Answering, https://arxiv.org/abs/2504.16787
  • Zhiwen Tan, Jiaming Huang, Qintong Wu, Hongxuan Zhang, Chenyi Zhuang, Jinjie Gu, 1 Aug 2025, RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism, https://arxiv.org/abs/2507.02962
  • Roie Kazoom, Ofir Cohen, Rami Puzis, Asaf Shabtai, Ofer Hadar, 1 Aug 2025, VAULT: Vigilant Adversarial Updates via LLM-Driven Retrieval-Augmented Generation for NLI, https://arxiv.org/abs/2508.00965
  • Pengcheng Zhou, Yinglun Feng, Zhongliang Yang, 1 Aug 2025, Provably Secure Retrieval-Augmented Generation, https://arxiv.org/abs/2508.01084
  • Jaskaranjeet Singh, Rakesh Thakur, 3 Aug 2025, Quantum-RAG and PunGPT2: Advancing Low-Resource Language Generation and Retrieval for the Punjabi Language, https://arxiv.org/abs/2508.01918
  • Vali Tawosia, Salwa Alamir, Xiaomo Liu, Manuela Veloso, 4 Aug 2025, Meta-RAG on Large Codebases Using Code Summarization, https://arxiv.org/abs/2508.02611
  • Jimeng Shi, Sizhe Zhou, Bowen Jin, Wei Hu, Runchu Tian, Shaowen Wang, Giri Narasimhan, Jiawei Han, 4 Aug 2025, Hypercube-Based Retrieval-Augmented Generation for Scientific Question-Answering, https://arxiv.org/abs/2505.19288
  • Yaodong Su, Yixiang Fang, Yingli Zhou, Quanqing Xu, Chuanhui Yang, 3 Aug 2025, Clue-RAG: Towards Accurate and Cost-Efficient Graph-based RAG via Multi-Partite Graph and Query-Driven Iterative Retrieval, https://arxiv.org/abs/2507.08445
  • Tuan-Dung Bui, Duc-Thieu Luu-Van, Thanh-Phat Nguyen, Thu-Trang Nguyen, Son Nguyen, and Hieu Dinh Vo, 2 Aug 2025, RAMBO: Enhancing RAG-based Repository-Level Method Body Completion, https://arxiv.org/abs/2409.15204
  • Kennedy Edemacu, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, Jong Wook Kim, 4 Aug 2025, Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation, https://arxiv.org/abs/2508.02835
  • Kaiwen Zhao, Bharathan Balaji, Stephen Lee, 5 Aug 2025, CF-RAG: A Dataset and Method for Carbon Footprint QA Using Retrieval-Augmented Generation, https://arxiv.org/abs/2508.03489
  • Giovanni Cherubin, Andrew Paverd, 4 Aug 2025, Highlight & Summarize: RAG without the jailbreaks, https://arxiv.org/abs/2508.02872
  • Kunal Sawarkar, Shivam R. Solanki, Abhilasha Mangal, 5 Aug 2025, MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering, https://arxiv.org/abs/2505.18247
  • Jiayi Wen, Tianxin Chen, Zhirun Zheng, Cheng Huang, 6 Aug 2025, A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models, https://arxiv.org/abs/2508.04276
  • Herbert Ullrich, Jan Drchal, 5 Aug 2025, AIC CTU@FEVER 8: On-premise fact checking through long context RAG, https://arxiv.org/abs/2508.04390
  • Tianxiao Li, Zhenglin Huang, Haiquan Wen, Yiwei He, Shuchang Lyu, Baoyuan Wu, and Guangliang Cheng, 6 Aug 2025, RAIDX: A Retrieval-Augmented Generation and GRPO Reinforcement Learning Framework for Explainable Deepfake Detection, https://arxiv.org/abs/2508.04524
  • Chitranshu Harbola, Anupam Purwar, 28 Jul 2025, Prescriptive Agents based on Rag for Automated Maintenance (PARAM), https://arxiv.org/abs/2508.04714
  • Zhuohang Jiang, Pangjing Wu, Xu Yuan, Wenqi Fan, Qing Li, 7 Aug 2025, QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering, https://arxiv.org/abs/2508.05197
  • Xu Yuan, Liangbo Ning, Wenqi Fan, Qing Li, 7 Aug 2025, mKG-RAG: Multimodal Knowledge Graph-Enhanced RAG for Visual Question Answering, https://arxiv.org/abs/2508.05318
  • Zhenghao Liu, Xingsheng Zhu, Tianshuo Zhou, Xinyi Zhang, Xiaoyuan Yi, Yukun Yan, Ge Yu, Maosong Sun, 7 Aug 2025, Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts, https://arxiv.org/abs/2502.17297
  • Vibhor Agrawal, Fay Wang, Rishi Puri, 25 Jul 2025, Query-Aware Graph Neural Networks for Enhanced Retrieval-Augmented Generation, https://arxiv.org/abs/2508.05647
  • Chandler Campbell, Bernie Boscoe, Tuan Do, 25 Jul 2025, AquiLLM: a RAG Tool for Capturing Tacit Knowledge in Research Groups, https://arxiv.org/abs/2508.05648
  • Jiaxuan Liang, Shide Zhou, and Kailong Wang, 26 Jul 2025, OmniBench-RAG: A Multi-Domain Evaluation Platform for Retrieval-Augmented Generation Tools, https://arxiv.org/abs/2508.05650
  • Aditya Nagori, Ricardo Accorsi Casonatto, Ayush Gautam, Abhinav Manikantha Sai Cheruvu, and Rishikesan Kamaleswaran, 30 Jul 2025, Open-Source Agentic Hybrid RAG Framework for Scientific Literature Review, https://arxiv.org/abs/2508.05660
  • Yuzhou Zhu, 31 Jul 2025, From Static to Dynamic: A Streaming RAG Approach to Real-time Knowledge Base, https://arxiv.org/abs/2508.05662
  • Hei Yu Chan, Kuok Tou Ho, Chenglong Ma, Yujing Si, Hok Lai Lin, Sa Lei Lam, 1 Aug 2025, Enhancing Retrieval-Augmented Generation for Electric Power Industry Customer Support, https://arxiv.org/abs/2508.05664
  • Alejandro Godinez, 1 Aug 2025, HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis, https://arxiv.org/abs/2508.05666
  • Weitao Li, Boran Xiang, Xiaolong Wang, Zhinan Gou, Weizhi Ma, Yang Liu, 8 Aug 2025, UR$^2$: Unify RAG and Reasoning through Reinforcement Learning, https://arxiv.org/abs/2508.06165
  • Richard Willats, Josh Pennington, Aravind Mohan, Bertie Vidgen, 8 Aug 2025, Classification is a RAG problem: A case study on hate speech detection, https://arxiv.org/abs/2508.06204
  • Andrew Brown, Muhammad Roman and Barry Devereux, 8 Aug 2025, A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges, https://arxiv.org/abs/2508.06401
  • Congmin Min, Rhea Mathew, Joyce Pan, Sahil Bansal, Abbas Keshavarzi, Amar Viswanathan Kannan, 7 Aug 2025, Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems, https://arxiv.org/abs/2507.03226
  • Shu Wang, Yixiang Fang, Yingli Zhou, Xilin Liu, Yuchi Ma, 8 Aug 2025, ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation, https://arxiv.org/abs/2502.09891
  • Dongzhuoran Zhou, Yuqicheng Zhu, Xiaxia Wang, Hongkuan Zhou, Yuan He, Jiaoyan Chen, Evgeny Kharlamov, Steffen Staab, 11 Aug 2025, What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge, https://arxiv.org/abs/2508.08344
  • Shixuan Sun, Siyuan Liang, Ruoyu Chen, Jianjie Huang, Jingzhi Li, Xiaochun Cao, 12 Aug 2025, SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling, https://arxiv.org/abs/2508.09105
  • Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal, 12 Aug 2025, Retrieval-Augmented Generation with Conflicting Evidence, https://arxiv.org/abs/2504.13079
  • Tim Cofala, Oleh Astappiev, William Xion, Hailay Teklehaymanot, 12 Aug 2025, RAGtifier: Evaluating RAG Generation Approaches of State-of-the-Art RAG Systems for the SIGIR LiveRAG Competition, https://arxiv.org/abs/2506.14412
  • Amit Kumar Jaiswal, Haiming Liu, Ingo Frommholz, 6 Aug 2025, Multimodal RAG Enhanced Visual Description, https://arxiv.org/abs/2508.09170
  • Seyed Shayan Daneshvar, Yu Nong, Xu Yang, Shaowei Wang, Haipeng Cai, 12 Aug 2025, VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs, https://arxiv.org/abs/2408.04125
  • Changjian Wang, Weihong Deng, Weili Guan, Quan Lu, Ning Jiang, 15 Aug 2025, Cross-Granularity Hypergraph Retrieval-Augmented Generation for Multi-hop Question Answering, https://arxiv.org/abs/2508.11247
  • Jinquan Shi, Yingying Cheng, Fan Zhang, Miao Jiang, Jun Lin, Yanbai Shen, 18 Aug 2025, GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance, https://arxiv.org/abs/2508.12682
  • Yifei Chen, Guanting Dong, Yutao Zhu, Zhicheng Dou, 19 Aug 2025, Revisiting RAG Ensemble: A Theoretical and Mechanistic Analysis of Multi-RAG System Collaboration, https://arxiv.org/abs/2508.13828
  • Yukun Cao, Zengyi Gao, Zhiyang Li, Xike Xie, S. Kevin Zhou, Jianliang Xu, 19 Aug 2025, LEGO-GraphRAG: Modularizing Graph-based Retrieval-Augmented Generation for Design Space Exploration, https://arxiv.org/abs/2411.05844
  • Yao Ding, Yuqing Wu, Ziyang Ding, 11 Aug 2025, An automatic patent literature retrieval system based on LLM-RAG, https://arxiv.org/abs/2508.14064
  • Lorenz Brehme, Benedikt Dornauer, Thomas Ströhle, Maximilian Ehrhart and Ruth Breu, 11 Aug 2025, Retrieval-Augmented Generation in Industry: An Interview Study on Use Cases, Requirements, Challenges, and Evaluation, https://arxiv.org/abs/2508.14066
  • Skatje Myers, Dmitriy Dligach, Timothy A. Miller, Samantha Barr, Yanjun Gao, Matthew Churpek, Anoop Mayampurath, Majid Afshar, 20 Aug 2025, Evaluating Retrieval-Augmented Generation vs. Long-Context Input for Clinical Reasoning over EHRs, https://arxiv.org/abs/2508.14817
  • Sarat Ahmad, Zeinab Nezami, Maryam Hafeez, Syed Ali Raza Zaidi, 20 Aug 2025, Benchmarking Vector, Graph and Hybrid Retrieval Augmented Generation (RAG) Pipelines for Open Radio Access Networks (ORAN), https://arxiv.org/abs/2507.03608
  • Eunseong Choi, June Park, Hyeri Lee, Jongwuk Lee, 21 Aug 2025, Conflict-Aware Soft Prompting for Retrieval-Augmented Generation, https://arxiv.org/abs/2508.15253
  • Wutao Liu, YiDan Wang, and Pan Gao, 21 Aug 2025, First RAG, Second SEG: A Training-Free Paradigm for Camouflaged Object Detection, https://arxiv.org/abs/2508.15313
  • Mandeep Rathee, Venktesh V, Sean MacAvaney, Avishek Anand, 21 Aug 2025, Test-time Corpus Feedback: From Retrieval to RAG, https://arxiv.org/abs/2508.15437
  • Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang and Weidi Xie, 21 Aug 2025, End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning, https://arxiv.org/abs/2508.15746
  • Kai Hu, Parfait Atchade-Adelomou, Carlo Adornetto, Adrian Mora-Carrero, Luis Alonso-Pastor, Ariel Noyman, Yubo Liu, Kent Larson, 22 Aug 2025, Graph RAG as Human Choice Model: Building a Data-Driven Mobility Agent with Preference Chain, https://arxiv.org/abs/2508.16172
  • Yosef Dayani, Omer Benishu, Sagie Benaim, 22 Aug 2025, MV-RAG: Retrieval Augmented Multiview Diffusion, https://arxiv.org/abs/2508.16577
  • Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo, Shihao Liu, Daiting Shi, Dawei Yin, Xueqi Cheng, 22 Aug 2025, Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG, https://arxiv.org/abs/2504.05220
  • Yiming Xu, Junfeng Jiao, 24 Aug 2025, Evaluating Retrieval-Augmented Generation Strategies for Large Language Models in Travel Mode Choice Prediction, https://arxiv.org/abs/2508.17527
  • Kaiwen Zuo, Zelin Liu, Raman Dutt, Ziyang Wang, Zhongtian Sun, Yeming Wang, Fan Mo, Pietro Liò, 24 Aug 2025, How to make Medical AI Systems safer? Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System, https://arxiv.org/abs/2508.17215
  • Hsuan-Kung Yang, Tsu-Ching Hsiao, Ryoichiro Oka, Ryuya Nishino, Satoko Tofukuji, Norimasa Kobori, 10 Aug 2025, An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance, https://arxiv.org/abs/2508.16602
  • Jeongsoo Lee, Daeyong Kwon, Kyohoon Jin, 23 Aug 2025, GRADE: Generating multi-hop QA and fine-gRAined Difficulty matrix for RAG Evaluation, https://arxiv.org/abs/2508.16994
  • Jiale Liu, Jiahao Zhang, Suhang Wang, 24 Aug 2025, Exposing Privacy Risks in Graph Retrieval-Augmented Generation, https://arxiv.org/abs/2508.17222
  • Xiaqiang Tang, Yi Wang, Keyu Hu, Rui Xu, Chuang Li, Weigao Sun, Jian Li, Sihong Xie, 24 Aug 2025, SSFO: Self-Supervised Faithfulness Optimization for Retrieval-Augmented Generation, https://arxiv.org/abs/2508.17225
  • Amulya Suravarjhula, Rashi Chandrashekhar Agrawal, Sakshi Jayesh Patel, Rahul Gupta, 11 Aug 2025, Retrieval-Augmented Multi-Agent System for Rapid Statement of Work Generation, https://arxiv.org/abs/2508.07569
  • Haidong Xu, Guangwei Xu, Zhedong Zheng, Xiatian Zhu, Wei Ji, Xiangtai Li, Ruijie Guo, Meishan Zhang, Min Zhang, Hao Fei, 16 Aug 2025, VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models, https://arxiv.org/abs/2508.12081
  • Steeve Cuthbert Marcelyn, Yucen Gao, Yuzhe Zhang, Xiaofeng Gao, 20 Aug 2025, PathGPT: Reframing Path Recommendation as a Natural Language Generation Task with Retrieval-Augmented Language Models, https://arxiv.org/abs/2504.05846
  • Xiaokai Bai, Chenxu Zhou, Lianqing Zheng, Si-Yuan Cao, Jianan Liu, Xiaohan Zhang, Zhengzhuang Zhang, Hui-liang Shen, 26 Jul 2025, RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection, https://arxiv.org/abs/2507.19856
  • Jeiyoon Park, Yongshin Han, Minseop Kim, Kisu Yang, 4 Aug 2025, Dynamic Context Adaptation for Consistent Role-Playing Agents with Retrieval-Augmented Generations, https://arxiv.org/abs/2508.02016
  • Jonas van Elburg, Peter van der Putten, Maarten Marx, 15 Aug 2025, Can we Evaluate RAGs with Synthetic Data?, https://arxiv.org/abs/2508.11758
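The papers above all build on the same core pipeline: a "retriever" component finds the most relevant document chunks for the user's query, and those chunks are prepended to the LLM prompt as extra context. The following is a minimal, self-contained sketch of that loop; the toy bag-of-words similarity scorer and the sample documents are illustrative placeholders (a real system would use vector embeddings and a vector database), not part of any specific paper's method.

```python
# Minimal sketch of the basic RAG loop: retrieve top-k relevant chunks,
# then augment the LLM prompt with them. Toy similarity stands in for
# a real embedding model; DOCS stands in for a real vector database.
from collections import Counter
import math

DOCS = [
    "Our Model X widget supports 240V power and ships worldwide.",
    "Refunds are available within 30 days of purchase.",
    "The company was founded in 2019 in Sydney.",
]

def score(query: str, doc: str) -> float:
    """Cosine similarity over word counts (stand-in for vector embeddings)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[w] * d[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """The 'retriever' component: return the top-k chunks for the query."""
    return sorted(DOCS, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Augment the user's question with the retrieved context chunks."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What voltage does the Model X widget support?"))
```

The augmented prompt is then sent to an unchanged LLM, which answers from the supplied context; the advanced architectures surveyed above vary the retriever (graph, hybrid, multimodal), the ranking, and the generation step around this skeleton.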

Advanced RAG

Research papers on advanced RAG architectures:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: