Aussie AI
Chapter 17. Research on RAG
Book Excerpt from "RAG Optimization: Accurate and Efficient LLM Applications"
by David Spuler and Michael Sharpe
What’s Hot in RAG Research?
Looking for a dissertation idea in RAG? Well, RAG is inherently hot, as one of the most popular architectures for generative AI apps. Hence, there's a steady stream of all sorts of research on RAG, as you can see from the list at the end of this chapter.
Nevertheless, here are some thoughts on specific areas of RAG research that are currently hot, or predicted to get hotter soon:
- Agentic RAG
- Multi-step reasoning (for RAG)
- RAG tool integrations
- Long context RAG
- Advanced RAG architectures
- Fused KV caching of RAG chunks
- Multimodal RAG chunks
Agentic RAG. This goes two ways: (a) agents using RAG for data lookups, or (b) RAG apps performing actions via agent interfaces. Most likely, both. There are plenty of research papers on agents and "agentic architectures," and there's an endless stream of RAG papers, but not many yet in the combined area. Agentic RAG is examined in Chapter 15.
Multi-step reasoning. There probably isn't a hotter area of research right now than "multi-step reasoning" or "test time compute" as a path to greater model intelligence. The leader of the pack is "Chain-of-Thought," as implemented in OpenAI's "o1" model (the "Strawberry" version), but there are numerous "chain-of-something" variants. All of these can be used by RAG as part of its LLM invocation; see Chapter 13 for discussion of reasoning and RAG. However, there hasn't been much research on this "reasoning for RAG" special case yet. We predict more!
RAG tool integrations. There is ongoing research into the integration of tools (“function calling”) for RAG architectures. This goes in multiple directions: (a) using data integration tool interfaces for more general RALM architectures (see Chapter 14), (b) using computation tools in RAG (e.g., clocks, calculators), and (c) using action tools in agentic RAG architectures (see Chapter 15). Although there’s much research on tool usage by LLMs in general, one of the RAG-specific issues with tool usage is that the RAG LLM must make two similar and overlapping choices: whether or not to use tools, and whether or not a RAG chunk is needed to answer a query. Hence, the general area of RAG query planning involves both areas, and now must also be combined with new “multi-step reasoning” choices, too.
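The two overlapping routing decisions described above (whether to call a tool, and whether a RAG chunk lookup is needed) can be sketched as a toy query planner. This is a minimal illustration only: real systems use an LLM or a trained classifier for this routing, and the keyword rules, tool names, and `QueryPlan` type below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QueryPlan:
    use_retrieval: bool       # fetch RAG chunks from the datastore?
    tool_name: Optional[str]  # tool to invoke, or None for no tool

def plan_query(query: str) -> QueryPlan:
    """Toy query planner combining the two overlapping RAG decisions:
    whether to call a tool, and whether a RAG chunk lookup is needed.
    Real systems use an LLM or trained classifier for this routing;
    the keyword rules below are placeholders for illustration only."""
    q = query.lower()
    tool = None
    if any(w in q for w in ("time", "today", "date")):
        tool = "clock"            # hypothetical clock tool
    elif any(w in q for w in ("sum", "calculate", "+")):
        tool = "calculator"       # hypothetical calculator tool
    # Pure computation queries skip retrieval; knowledge queries need it.
    needs_retrieval = tool is None or "policy" in q
    return QueryPlan(use_retrieval=needs_retrieval, tool_name=tool)

plan = plan_query("What time is it right now?")
# plan.tool_name == "clock"; plan.use_retrieval == False
```

In a full query-planning pipeline, a multi-step reasoning model might revisit this decision at each step, which is exactly why the tool-use and retrieval choices become entangled.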
Long context RAG. There are numerous papers on long context LLMs, and many commercial inference engines now support long context lookups, so basic long-context support has largely become a solved problem. However, the use of long context LLMs in RAG hasn't received as much research attention. The extreme ideas are mini-RAG (single-document RAG) and mega-RAG (big chunk RAG), which are covered in Chapter 10.
Advanced RAG architectures. There continues to be a stream of research papers on numerous ways to improve the basic RAG algorithm. The vector database received most of the early research attention, but researchers are now sharing the love with rerankers, packers, keyword datastore lookups, and more. Advanced RAG architectures are examined in Chapter 14.
Fused KV caching. There's a problem with RAG in that "prefix caching" doesn't work well as a speed optimization, because returning multiple chunks in query-specific orderings is not amenable to prefix caching. The fix is "non-prefix KV caching," and there are a few papers on this already. If solved, the KV cache can be precomputed for every RAG chunk, eliminating the need for the prefill (prompt processing) phase of LLM inference entirely. On the other hand, the growth of "big chunks" in mini-RAG (single-document) and mega-RAG architectures (long context chunks) means there is less of a problem, because there are fewer chunks overall, so the chunk-ordering workarounds for prefix KV caching may be adequate. There aren't many papers in this area yet. Nevertheless, this idea of non-prefix fused KV caching, as used in a RAG cache, is examined in detail in Chapter 11.
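The per-chunk precomputation idea can be sketched abstractly. This is a toy illustration, not a real inference engine: `compute_kv_cache` is a stand-in for the expensive prefill computation, and the fusion step that reconciles cross-chunk attention (the hard part addressed by papers such as CacheBlend, listed later in this chapter) is not shown.

```python
import hashlib

def compute_kv_cache(chunk_text: str) -> dict:
    """Stand-in for prefill: a real engine would run the transformer
    layers over the chunk and return per-layer key/value tensors."""
    return {"tokens": chunk_text.split(), "layers": "kv-tensors-placeholder"}

class RAGKVCache:
    """Precompute and store one KV cache per RAG chunk, so the prefill
    phase can be skipped for chunks seen before. A non-prefix fusion
    step (not shown) would still need to patch cross-chunk attention."""
    def __init__(self):
        self._cache = {}
        self.misses = 0

    def _key(self, chunk_text: str) -> str:
        # Content-addressed key: identical chunk text reuses its KV cache.
        return hashlib.sha256(chunk_text.encode()).hexdigest()

    def get(self, chunk_text: str) -> dict:
        key = self._key(chunk_text)
        if key not in self._cache:
            self.misses += 1                      # prefill needed
            self._cache[key] = compute_kv_cache(chunk_text)
        return self._cache[key]                   # cached: no prefill

cache = RAGKVCache()
cache.get("Returns accepted within 30 days.")     # miss: computes KV
cache.get("Returns accepted within 30 days.")     # hit: reuses KV
```

The content-addressed keying is why this works best for static document sets: re-chunking or editing a document invalidates its cached entries automatically.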
Context compression. Every token in a RAG chunk costs money in terms of LLM inference processing, so it’s desirable to reduce those tokens. There’s various research on “context compression” techniques, such as token pruning or token merging, in general LLM inference optimization. However, reducing the size of RAG chunks using these context compression ideas needs some more research.
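As a toy illustration of the idea (not any particular paper's method), a RAG chunk can be compressed by keeping only the sentences most relevant to the query, under a word budget. The word-overlap scoring below is a placeholder; real context compression systems use model perplexity or learned token importance instead.

```python
def compress_chunk(chunk: str, query: str, max_words: int = 30) -> str:
    """Toy context compression: keep the sentences of a RAG chunk most
    relevant to the query (by word overlap) until a word budget is hit.
    Real systems use perplexity-based or learned importance measures."""
    query_words = set(query.lower().split())
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    # Rank sentences by overlap with the query, best first.
    ranked = sorted(sentences,
                    key=lambda s: len(query_words & set(s.lower().split())),
                    reverse=True)
    kept, used = [], 0
    for s in ranked:
        n = len(s.split())
        if used + n <= max_words:
            kept.append(s)
            used += n
    kept.sort(key=sentences.index)  # restore original sentence order
    return ". ".join(kept) + "."

chunk = ("Our store opened in 1995. Refunds are issued within 14 days. "
         "The founder likes golf.")
short = compress_chunk(chunk, "How long do refunds take?", max_words=8)
# short == "Refunds are issued within 14 days."
```

Even this crude sentence-level pruning shows the trade-off: every dropped word saves inference cost, but risks discarding context the LLM needed.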
Multimodal RAG chunks. Most of the research about RAG relates to text. However, there are various obvious cases where you'd want an image to be returned by an LLM. For example, if you're implementing a customer support chatbot on your car dealership's website, you might want to show a picture of a car. However, you don't want to use a typical multimodal LLM for this, because it will generate a random new image, which is both expensive and potentially risky.
The use of images and multimodal data with RAG is an emerging research area. There are several distinct sub-areas including:
- Using visual features of PDFs for better retrieval (e.g., ColPali)
- Ingesting multimodal data for both retrieval and display in results
- Image results display (simpler methods)
RAG Optimization Papers
Some of the best general research papers with overviews of optimizations for RAG techniques include:
- Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 1 Jul 2024, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219 Project: https://github.com/FudanDNN-NLP/RAG (Attempts to optimize the entire RAG system, including the various options for different RAG modules in the RAG pipeline, such as optimal methods for chunking, retrieval, embedding models, vector databases, prompt compression, reranking, repacking, summarizers, and other components.)
- Joyce Birkins, Oct 10, 2024, 6 Advanced RAG Optimization Strategies: Analysis of 14 Key Research Papers, https://medium.com/@pamperherself/6-advanced-rag-optimization-strategies-analysis-of-14-key-research-papers-f12329975009
- Michael Shen, Muhammad Umar, Kiwan Maeng, G. Edward Suh, Udit Gupta, 16 Dec 2024, Towards Understanding Systems Trade-offs in Retrieval-Augmented Generation Model Inference, https://arxiv.org/abs/2412.11854
- Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, Amina Terfai, Anoop Surya, Tracey Mercer, Vinodh Kumar Thanigachalam, Tamar Bar, Sanjana Krishnan, Samy Kilaru, Jasmine Jaksic, Nave Algarici, Jacob Liberman, Joey Conway, Sonu Nayyar, Justin Boitano, 10 Jul 2024, FACTS About Building Retrieval Augmented Generation-based Chatbots, NVIDIA Research, https://arxiv.org/abs/2407.07858
- Contextual AI Team, March 19, 2024, Introducing RAG 2.0, https://contextual.ai/introducing-rag2/
RAG Best Practices
RAG best practices are practical guidelines on getting the most out of your RAG architecture. This can include accuracy improvements and efficiency optimizations. Research papers that examine the general state of RAG architectures in terms of their best practices include:
- Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 1 Jul 2024, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219 Project: https://github.com/FudanDNN-NLP/RAG (Attempts to optimize the entire RAG system, including the various options for different RAG modules in the RAG pipeline, such as optimal methods for chunking, retrieval, embedding models, vector databases, prompt compression, reranking, repacking, summarizers, and other components.)
- Florian June Aug 2024, The Best Practices of RAG: Typical RAG Process, Best Practices for Each Module, and Comprehensive Evaluation, https://pub.towardsai.net/the-best-practices-of-rag-300e313322e6
- Siran Li, Linus Stenzel, Carsten Eickhoff, Seyed Ali Bahrainian, 13 Jan 2025, Enhancing Retrieval-Augmented Generation: A Study of Best Practices, https://arxiv.org/abs/2501.07391 https://github.com/ali-bahrainian/RAG_best_practices (Examines RAG best practices such as model size, prompt wording, chunk size, knowledge base size, and more.)
- Harvey Bower, 2024, Debugging RAG Pipelines: Best Practices for High-Performance LLMs, https://www.amazon.com/dp/B0DNWN5RB1
- Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid, 30 Oct 2024 (v3), The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities, https://arxiv.org/abs/2408.13296
RAG Evaluation Research
RAG evaluation is the analysis of the LLM-based RAG architecture as a whole, rather than conventional model evaluation that examines only the model. A typical RAG system includes not only an LLM, but a vector database of document chunks, and an orchestrator component. Advanced RAG architectures typically also include a keyword search datastore, reranker, packer, and other components.
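A minimal sketch of whole-pipeline retrieval scoring helps make this concrete. Hit rate and mean reciprocal rank (MRR) are standard retrieval metrics; the function shape and the hand-built test set below are illustrative assumptions, not any specific framework's API.

```python
def retrieval_metrics(results, expected):
    """Score a retriever over a test set with two end-to-end metrics:
    hit rate (did any relevant chunk appear in the results?) and mean
    reciprocal rank (how high did the first relevant chunk rank?).
    `results` maps each query to its ranked chunk IDs; `expected`
    maps each query to the set of relevant chunk IDs."""
    hits, rr_total = 0, 0.0
    for query, ranked in results.items():
        relevant = expected[query]
        first = next((i for i, c in enumerate(ranked, 1) if c in relevant), None)
        if first is not None:
            hits += 1
            rr_total += 1.0 / first
    n = len(results)
    return {"hit_rate": hits / n, "mrr": rr_total / n}

# Two test queries: the retriever found the right chunk for q1 (at
# rank 2) and missed entirely for q2.
metrics = retrieval_metrics(
    {"q1": ["c9", "c3", "c7"], "q2": ["c1", "c2"]},
    {"q1": {"c3"}, "q2": {"c5"}},
)
# metrics == {"hit_rate": 0.5, "mrr": 0.25}
```

Full RAG evaluation frameworks such as those in the papers below go further, scoring the generated answers (faithfulness, relevance) as well as the retrieval step.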
Research papers on testing and evaluation of entire RAG systems:
- Shahul Es, Jithin James, Luis Espinosa-Anke, Steven Schockaert, 26 Sep 2023, RAGAS: Automated Evaluation of Retrieval Augmented Generation, https://arxiv.org/abs/2309.15217
- Shangeetha Sivasothy, Scott Barnett, Stefanus Kurniawan, Zafaryab Rasool, Rajesh Vasa, 24 Sep 2024, RAGProbe: An Automated Approach for Evaluating RAG Applications, https://arxiv.org/abs/2409.19019
- Jon Saad-Falcon, Omar Khattab, Christopher Potts, Matei Zaharia, 31 Mar 2024 (v2), ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems, https://arxiv.org/abs/2311.09476
- Kevin Wu, Eric Wu, James Zou, 10 Jun 2024 (v2), ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence, https://arxiv.org/abs/2404.10198
- Galla, D., Hoda, S., Zhang, M., Quan, W., Yang, T.D., Voyles, J., 2024, CoURAGE: A Framework to Evaluate RAG Systems, In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol 14763. Springer, Cham. https://doi.org/10.1007/978-3-031-70242-6_37 https://link.springer.com/chapter/10.1007/978-3-031-70242-6_37
- Rafael Teixeira de Lima, Shubham Gupta, Cesar Berrospi, Lokesh Mishra, Michele Dolfi, Peter Staar, Panagiotis Vagenas, 29 Nov 2024, Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems, IBM Research, https://arxiv.org/abs/2411.19710
- Lilian Weng, July 7, 2024, Extrinsic Hallucinations in LLMs, https://lilianweng.github.io/posts/2024-07-07-hallucination/
- Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, Amina Terfai, Anoop Surya, Tracey Mercer, Vinodh Kumar Thanigachalam, Tamar Bar, Sanjana Krishnan, Samy Kilaru, Jasmine Jaksic, Nave Algarici, Jacob Liberman, Joey Conway, Sonu Nayyar, Justin Boitano, 10 Jul 2024, FACTS About Building Retrieval Augmented Generation-based Chatbots, NVIDIA Research, https://arxiv.org/abs/2407.07858
- Contextual AI Team, March 19, 2024 Introducing RAG 2.0, https://contextual.ai/introducing-rag2/
- Angels Balaguer, Vinamra Benara, Renato Luiz de Freitas Cunha, Roberto de M. Estevão Filho, Todd Hendry, Daniel Holstein, Jennifer Marsman, Nick Mecklenburg, Sara Malvar, Leonardo O. Nunes, Rafael Padilha, Morris Sharp, Bruno Silva, Swati Sharma, Vijay Aski, Ranveer Chandra, 30 Jan 2024 (v3), RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture, https://arxiv.org/abs/2401.08406
- Mingyue Cheng, Yucong Luo, Jie Ouyang, Qi Liu, Huijie Liu, Li Li, Shuo Yu, Bohou Zhang, Jiawei Cao, Jie Ma, Daoyu Wang, Enhong Chen, 17 Mar 2025 (v2), A Survey on Knowledge-Oriented Retrieval-Augmented Generation, https://arxiv.org/abs/2503.10677
RAG Fusion
RAG fusion is a RAG extension that analyzes multiple versions of the query to retrieve the best context chunks. The model generates multiple "reformulated" versions of the original text query, each of which is sent to the retriever, and a final "Reciprocal Rank Fusion" step combines all of the returned chunks into a single ranking, like a "reranker" component, but operating on multiple similar rankings. The main advantage is finding more accurate context for the LLM; the downside is the many additional calls to the retriever database with slightly modified queries.
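The Reciprocal Rank Fusion step described above can be sketched in a few lines of Python. This is a minimal illustration with made-up chunk IDs; the constant k=60 is the value used in the original RRF formulation, which damps the influence of any single top-ranked outlier.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Combine multiple ranked lists of chunk IDs into one ranking.

    Each chunk's fused score is the sum of 1/(k + rank) over every
    ranked list it appears in (ranks start at 1), so chunks that rank
    well across many reformulated queries float to the top."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Three retriever calls (one per reformulated query) returned:
rankings = [
    ["chunk_a", "chunk_b", "chunk_c"],
    ["chunk_b", "chunk_a", "chunk_d"],
    ["chunk_b", "chunk_c", "chunk_a"],
]
fused = reciprocal_rank_fusion(rankings)
# fused == ["chunk_b", "chunk_a", "chunk_c", "chunk_d"]
```

Note that chunk_b wins despite never being more than one rank ahead of chunk_a in any single list: consistency across the reformulated queries is exactly what RRF rewards.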
Research on RAG fusion algorithms:
- Igor Novikov, Jul 23, 2024, RAG Architecture: Advanced RAG, https://pub.towardsai.net/rag-architecture-advanced-rag-3fea83e0d189
- Surya Maddula, Sep 2024, Not RAG, but RAG Fusion? Understanding Next-Gen Info Retrieval, https://pub.towardsai.net/not-rag-but-rag-fusion-understanding-next-gen-info-retrieval-477788da02e2
- Adrian H. Raudaschl, Oct 6, 2023, Forget RAG, the Future is RAG-Fusion: The Next Frontier of Search: Retrieval Augmented Generation meets Reciprocal Rank Fusion and Generated Queries, https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1
- Deval Shah, Jul 4, 2024, Reciprocal Rank Fusion (RRF) explained in 4 mins — How to score results from multiple retrieval methods in RAG: Unlock the power of Reciprocal Rank Fusion in Retrieval-Augmented Generation, https://medium.com/@devalshah1619/mathematical-intuition-behind-reciprocal-rank-fusion-rrf-explained-in-2-mins-002df0cc5e2a
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- Sanjay Kumar, Apr 2, 2024, RAG Fusion: A New Frontier in Search and Generative AI, https://medium.com/@Sanjaynk7907/rag-fusion-a-new-frontier-in-search-and-generative-ai-ebb24e7e905e
- Omar Santos, Jun 15, 2024, Comparing RAG, RAG Fusion, with RAPTOR: Different AI Retrieval-Augmented Implementations, https://becomingahacker.org/comparing-rag-rag-fusion-with-raptor-different-ai-retrieval-augmented-implementations-1aa76fce6a5c
Super RAG
Super RAG is a generalization of retrieval that accepts more general information sources than naive RAG systems. Hence, a "super RAG" system is an embodiment of a more general type of RALM. Research papers on "super RAG" include:
- Ayush Thakur, Raghav Gupta, 13 Apr 2024, Introducing Super RAGs in Mistral 8x7B-v1, https://arxiv.org/abs/2404.08940
- SuperAgent, 2024, Super-Rag with SAML, https://docs.superagent.sh/overview/rag-retrieval/super-rag-with-saml
- Andrew Ditmer, May 13 2024, SuperRAG – How to achieve higher accuracy with Retrieval Augmented Generation, https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/superrag-how-to-achieve-higher-accuracy-with-retrieval-augmented/ba-p/4139004
RAG Survey Papers
General survey papers on RAG architectures include:
- Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li, 17 Jun 2024 (v3), A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2405.06211 Project: https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/
- Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue, 18 Jul 2024, Retrieval-Augmented Generation for Natural Language Processing: A Survey, https://arxiv.org/abs/2407.13193
- Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
- Huayang Li, Yixuan Su, Deng Cai, Yan Wang, Lemao Liu, 13 Feb 2022 (v2), A Survey on Retrieval-Augmented Text Generation, https://arxiv.org/abs/2202.01110
- Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, Bin Cui, 21 Jun 2024 (v6), Retrieval-Augmented Generation for AI-Generated Content: A Survey, https://arxiv.org/abs/2402.19473
- Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, Zhaofeng Liu, 3 Jul 2024 (v2), Evaluation of Retrieval-Augmented Generation: A Survey, https://arxiv.org/abs/2405.07437
- Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang, 27 Mar 2024 (v5), Retrieval-Augmented Generation for Large Language Models: A Survey, https://arxiv.org/abs/2312.10997
General Research Papers on RAG
There are rather a lot of research papers on RAG, as it's a fundamental underpinning technique in generative AI. Here are a few of them:
- Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna, 3 Jun 2024, Demystifying Platform Requirements for Diverse LLM Inference Use Cases, https://arxiv.org/abs/2406.01698 Code: https://github.com/abhibambhaniya/GenZ-LLM-Analyzer (Analysis of cost of serving LLMs, including separate profiles of prefill versus decoding phases, and the cost of extra prompt processing in RAG architectures with prepended information.)
- Timo Lehto, June 2024, Developing LLM-powered Applications Using Modern Frameworks, Bachelor's Thesis, Information and Communications Technology, Jamk University of Applied Sciences, Finland, June 2024, 53 pages, https://www.theseus.fi/bitstream/handle/10024/862271/Lehto_Timo.pdf?sequence=2 (Building LLM-based applications in RAG architecture using LangChain.)
- Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu, 7 May 2024, FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference, https://arxiv.org/abs/2405.04065 (Optimize RAG by appending rather than prepending documents, and modifying the attention for improvements in KV caching, by shimming or replacing some of the CUDA GPU low-level memory management APIs to avoid the need to rewrite kernels with extra higher-level memory management code.)
- Yucheng Hu, Yuxing Lu, 30 Apr 2024, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543 Project: https://github.com/2471023025/RALM_Survey
- Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 22 Apr 2024, A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
- Mandar Karhade, Mar 20, 2024, Why RAG Applications Fail in Production, Towards AI, https://pub.towardsai.net/why-rag-applications-fail-in-production-a-technical-deep-dive-15cc976af52c
- Priyank Rathod, May 21, 2024, Efficient Usage of RAG Systems in the World of LLMs, https://www.techrxiv.org/doi/full/10.36227/techrxiv.171625877.73379410/v1
- SciPhi AI, June 2024 (accessed), R2R: The ultimate open-source RAG framework, https://github.com/SciPhi-AI/R2R
- Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Bin Cui, 27 Mar 2024 (v2), Retrieval-Augmented Generation for AI-Generated Content: A Survey, https://arxiv.org/abs/2402.19473 Project: https://github.com/hymie122/RAG-Survey
- Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe, 12 Jan 2024, The Unreasonable Effectiveness of Easy Training Data for Hard Tasks, https://arxiv.org/abs/2401.06751
- Bijit Ghosh, Dec 25, 2023, Advanced RAG for LLMs/SLMs, Medium, https://medium.com/@bijit211987/advanced-rag-for-llms-slms-5bcc6fbba411
- Iulia Brezeanu, Jan 5, 2024, How to Cut RAG Costs by 80% Using Prompt Compression, Towards Data Science, https://towardsdatascience.com/how-to-cut-rag-costs-by-80-using-prompt-compression-877a07c6bedb
- James Nguyen, Nov 19, 2023, Forget RAG: Embrace agent design for a more intelligent grounded ChatGPT! https://james-tn.medium.com/forget-rag-embrace-agent-design-for-a-more-intelligent-grounded-chatgpt-6c562d903c61
- Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, Apr 2021, Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, https://arxiv.org/abs/2005.11401
- Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang, 3 Jun 2024 (v2), CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion, https://arxiv.org/abs/2405.16444 Code: https://github.com/YaoJiayi/CacheBlend.git (Generalizes prefix KV caching to KV cache fusion with selective recomputation of some KV cache data.)
- David Spuler, March 2024, Chapter 6. Training, Fine-Tuning & RAG, in Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Tiernan Ray, June 3, 2024, Make room for RAG: How Gen AI’s balance of power is shifting, https://www.zdnet.com/article/make-room-for-rag-how-gen-ais-balance-of-power-is-shifting/
- Shicheng Xu, Liang Pang, Mo Yu, Fandong Meng, Huawei Shen, Xueqi Cheng, Jie Zhou, 12 Jun 2024 (v2), Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation, https://arxiv.org/abs/2402.18150 (Analysis about how LLMs can mishandle information retrieved from a datastore and how to make LLMs better at handling RAG information using a specialized training regime.)
- Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li, 17 Jun 2024 (v3), A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2405.06211 Project: https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/
- Myeonghwa Lee, Seonho An, Min-Soo Kim, 18 Jun 2024, PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, https://arxiv.org/abs/2406.12430 Code: https://github.com/myeon9h/PlanRAG
- Ziyan Jiang, Xueguang Ma, Wenhu Chen, June 2024, LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs, arXiv preprint arXiv:2406.15319, https://arxiv.org/abs/2406.15319 (Improved accuracy performance of RAG methods when using a long context LLM and longer chunk sizes for the retriever.)
- Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King, 25 Jun 2024, Entropy-Based Decoding for Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2406.17519 (Enhanced decoding algorithm for multi-document RAG processing.)
- Louis-François Bouchard, Louie Peters, May 2024, Chapter 7: RAG, and Chapter 8, Advanced RAG, in Building LLMs for Production: Enhancing LLM Abilities and Reliability with Prompting, Fine-Tuning, and RAG, https://www.amazon.com/Building-LLMs-Production-Reliability-Fine-Tuning/dp/B0D4FFPFW8/
- Matt Murphy, Tim Tully, Derek Xiao, January 18, 2024, The Modern AI Stack: Design Principles for the Future of Enterprise AI Architectures, Menlo Ventures, https://menlovc.com/perspective/the-modern-ai-stack-design-principles-for-the-future-of-enterprise-ai-architectures/ (Various details about the AI tech stack, organizational AI maturity levels, and several interesting facts: inference is 95% of AI cost now, 60% of organizations are using multi-model methods, RAG is the dominant architecture currently, and AI application development teams are primarily made up of non-ML software engineers leveraging on top of AI models.)
- Anirban Ghoshal, July 3, 2024, AWS approach to RAG evaluation could help enterprises reduce AI spending, https://www.infoworld.com/article/3715629/aws-new-approach-to-rag-evaluation-could-help-enterprises-reduce-ai-spending.html
- Yi Zhou, Dec 16, 2023, Optimizing GenAI: Comparing Model Training, Fine-Tuning, RAG, and Prompt Engineering, https://medium.com/generative-ai-revolution-ai-native-transformation/optimizing-genai-comparing-model-training-fine-tuning-rag-and-prompt-engineering-7a7c6c65e0f0
- Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue, 18 Jul 2024, Retrieval-Augmented Generation for Natural Language Processing: A Survey, https://arxiv.org/abs/2407.13193
- Chip Huyen, Jul 25, 2024, Building A Generative AI Platform, https://huyenchip.com/2024/07/25/genai-platform.html
- Akash Bajwa and Chia Jeng Yang, May 27, 2024, The RAG Stack: Featuring Knowledge Graphs: Reducing Hallucinations To Make LLMs Production-Grade With Complex RAG, https://akashbajwa.substack.com/p/the-rag-stack-featuring-knowledge
- Chips Ahoy Capital, Jul 02, 2024, Evolution of Databases in the World of AI Apps, https://chipsahoycapital.substack.com/p/evolution-of-databases-in-the-world
- Pavan Belagatti, Jul 31, 2024, Semantic Chunking for Enhanced RAG Applications! https://levelup.gitconnected.com/semantic-chunking-for-enhanced-rag-applications-b6bc92942af0
- Florian June Aug 2024, The Best Practices of RAG: Typical RAG Process, Best Practices for Each Module, and Comprehensive Evaluation, https://pub.towardsai.net/the-best-practices-of-rag-300e313322e6
- Louis-François Bouchard, Aug 12, 2024, When to Use GraphRAG, https://louisbouchard.substack.com/p/when-to-use-graphrag
- Vahe Aslanyan, June 11, 2024, Next-Gen Large Language Models: The Retrieval-Augmented Generation (RAG) Handbook, https://www.freecodecamp.org/news/retrieval-augmented-generation-rag-handbook/
- Yao Lu, Song Bian, Lequn Chen, Yongjun He, Yulong Hui, Matthew Lentz, Beibin Li, Fei Liu, Jialin Li, Qi Liu, Rui Liu, Xiaoxuan Liu, Lin Ma, Kexin Rong, Jianguo Wang, Yingjun Wu, Yongji Wu, Huanchen Zhang, Minjia Zhang, Qizhen Zhang, Tianyi Zhou, Danyang Zhuo, 17 Jan 2024, Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native, https://arxiv.org/abs/2401.12230
- David Spuler, March 2024, Use Cases for FT vs RAG, in Generative AI in C++, https://www.aussieai.com/book/ch6-use-cases-rag-vs-ft
- Jason Perlow, Sept. 6, 2024, Understanding RAG: How to integrate generative AI LLMs with your business knowledge, https://www.zdnet.com/article/understanding-rag-how-to-integrate-generative-ai-llms-with-your-business-knowledge/
- Sau Sheong, Jun 13, 2024, Programming with AI — RAG: Using RAG in LLM Applications, https://sausheong.com/programming-with-ai-rag-27bf5c19daa7