Aussie AI

Chapter 14. Advanced RAG Architectures

  • Book Excerpt from "RAG Optimization: Accurate and Efficient LLM Applications"
  • by David Spuler and Michael Sharpe

Overview of Advanced RAG

We’ve already covered some of the more advanced types of RAG architectures, such as combining fine-tuning and RAG. Here are some more possible extensions to a basic RAG architecture.

Citation management. An important part of a production RAG system is having it emit citations in its answers. It’s relatively easy to store a URL or other identifier in the RAG datastore for every chunk, and a simplistic approach is to list the citations at the end of the LLM response. However, it’s a little trickier to know whether or not the LLM has actually used a chunk in its answer, or to insert the citation into the middle of the output text as an HREF link.
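The simplistic end-of-answer approach can be sketched in a few lines. This assumes each retrieved chunk record carries a hypothetical "url" field (the field name is invented for illustration):

```python
# Minimal sketch of appending citations to an LLM answer, assuming each
# retrieved chunk carries a "url" field in the datastore (illustrative schema).

def append_citations(answer: str, chunks: list[dict]) -> str:
    """List each chunk's URL at the end of the answer (simplistic approach)."""
    seen, cites = set(), []
    for chunk in chunks:
        url = chunk["url"]
        if url not in seen:      # deduplicate, preserving retrieval order
            seen.add(url)
            cites.append(url)
    lines = [f"[{i + 1}] {u}" for i, u in enumerate(cites)]
    return answer + "\n\nSources:\n" + "\n".join(lines)

chunks = [
    {"text": "Returns accepted within 30 days.", "url": "https://example.com/returns"},
    {"text": "Shipping takes 3-5 days.", "url": "https://example.com/shipping"},
    {"text": "Another chunk.", "url": "https://example.com/returns"},  # duplicate URL
]
print(append_citations("You can return items within 30 days.", chunks))
```

Note that this appends citations for every retrieved chunk, which is exactly the weakness described above: it cannot tell whether the LLM actually used each chunk.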

Reranking. Advanced RAG systems add a few more steps after the retriever returns its chunks of text. The reranker attempts to decide which chunk is the most relevant. Why do you need a “reranker” if the retriever has just looked the chunks up? Presumably the retriever did its best to order them, right? Yes, and that’s why it’s called a “reranker” rather than just a “ranker.”

The reranker is optional, but having one can improve the RAG system overall. It can be a model-based algorithm to review the chunk’s relevance to the query in more detail, since a typical retriever is non-LLM technology. The reranker may also have information that is unavailable to the retriever, such as user feedback data about chunks, or whether the RAG chunk has its KV data cached (i.e., “cache-aware” reranking).
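A reranker can be sketched as a scoring function applied to each retrieved chunk. A real system would use a cross-encoder model for the score; the word-overlap scorer below is only a stand-in for illustration:

```python
# Reranker sketch: re-order retrieved chunks by a relevance score.
# A real reranker would use a cross-encoder model; this word-overlap
# score is a placeholder standing in for the model.

def score(query: str, chunk: str) -> float:
    """Fraction of query words that also appear in the chunk."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def rerank(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks, best first, by the relevance score."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]

chunks = [
    "Our store opens at 9am on weekdays.",
    "Refunds are processed within 30 days of purchase.",
    "Gift cards never expire.",
]
print(rerank("when are refunds processed", chunks, top_k=2))
```

Extra signals such as user feedback or KV-cache availability (“cache-aware” reranking) could be folded into the score as additional weighted terms.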

Packing. The packing phase comes after the reranker, and merges the ranked chunks into one text sequence. This could be simple string concatenation of the text chunks, or they might be incorporated into a more structured template (e.g., with separator lines and numbering). Another issue is whether or not to put the citation URLs into the final text for the LLM to use.

Some research has shown that, at least for some models, the best packing is “reverse” order, with the most relevant chunk last. This sounds strange, but it puts the retriever’s best chunk closest to the user’s query at the end, which helps the LLM pay “attention” to that last chunk. However, more advanced models are better at extracting text across longer contexts, so this advice may not be that important for long.
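The “reverse” packing order described above can be sketched as a small template function. The separator format is an invented example, not a required convention:

```python
# Sketch of "reverse" packing: concatenate ranked chunks so the most
# relevant chunk appears last, closest to the user's question.

def pack(query: str, ranked_chunks: list[str]) -> str:
    """ranked_chunks is best-first; reverse it so the best chunk is last."""
    parts = []
    for i, chunk in enumerate(reversed(ranked_chunks)):
        parts.append(f"--- Context {i + 1} ---\n{chunk}")
    parts.append(f"Question: {query}")
    return "\n\n".join(parts)

prompt = pack("What is the refund window?",
              ["Refunds within 30 days.", "Store opens 9am."])
print(prompt)
```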

Too much of a good chunk. A typical RAG system does a retrieval from the external datastore for every query. This is probably fine for a Q&A style lookup for factual answers, but is not always optimal for every query from a human, such as in a conversation with a chatbot. There are times where it’s inappropriate for a chatbot to respond with an answer based on some document. Sometimes people just want a chunk-free answer.

The basic approach is just to hope the datastore’s nearest-neighbor vector lookup will fail to match a chunk in such cases, or else let the LLM sort it out and do its best. The more advanced idea of using additional models to determine whether or not a user’s query requires an external RAG lookup is called “adaptive RAG” and it’s a relatively new area of research.
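An adaptive-RAG gate can be sketched as a function that decides whether a query needs a lookup at all. Real systems train a small classifier for this decision; the chit-chat keyword check below is only a stand-in for illustration:

```python
# Adaptive-RAG sketch: a gate that decides whether a query needs a
# datastore lookup at all. Real systems train a classifier model;
# this chit-chat phrase check is a placeholder for that model.

CHITCHAT = {"hi", "hello", "thanks", "thank you", "how are you"}

def needs_retrieval(query: str) -> bool:
    """Return False for conversational queries that need no RAG chunk."""
    q = query.lower().strip().rstrip("?!. ")
    return q not in CHITCHAT

print(needs_retrieval("Thanks!"))                    # chit-chat: skip the lookup
print(needs_retrieval("what is the refund policy"))  # factual: do the lookup
```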

Knowledge graphs. Instead of basic text chunks of documents, a RAG system can use more complex representations of information. One of the rising stars in this type of system is the knowledge graph. This represents information in a hierarchical graph structure, allowing for more advanced reasoning in the results.

Prefix KV caching. There are multiple ways to use caches in RAG, such as using basic datastore lookup caching (and indexing) optimizations for the retriever component. Another deeper way to integrate RAG into a Transformer system is to also store the KV cache data alongside the RAG chunk text. The idea is to pre-compute the KV data that results from processing any RAG chunk. This ensures that the inference engine does not re-process the same chunk every time it’s retrieved, but only needs to process the new tokens, which are usually the user’s actual question.

The idea is that the RAG chunk is usually prepended as the prefix of the full input to the LLM. Note that the “prefix” text might include both the “global instructions” along with the RAG chunk, but this still works; it’s just a longer prefix to pre-compute for each chunk. Hence, a RAG prefix KV cache can speed up the latency significantly.

The downside is the need to add an extra caching component that maps the RAG chunk id to a blob of KV cache data (one for every layer of the model), and the inference engine needs to load that KV cache at the start of prefill. There are also difficulties in pre-processing multiple chunks, such as if they are ordered differently, but caching two or more chunks as a single prefix is possible.
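The extra caching component that maps a chunk id to its KV blob can be sketched as follows. The “blob” here is an opaque placeholder; a real engine would store serialized per-layer key/value tensors:

```python
# Sketch of a prefix KV cache keyed by chunk id. The "KV blob" is an
# opaque placeholder; real data is one key/value tensor pair per layer.

kv_cache: dict[str, bytes] = {}

def precompute_kv(chunk_text: str) -> bytes:
    # Stand-in for running prefill over the chunk and serializing the
    # per-layer key/value tensors (the real blobs are large).
    return f"KV({chunk_text})".encode()

def get_prefix_kv(chunk_id: str, chunk_text: str) -> bytes:
    """Return cached KV data for this chunk, computing it on first use."""
    if chunk_id not in kv_cache:
        kv_cache[chunk_id] = precompute_kv(chunk_text)
    return kv_cache[chunk_id]  # a hit skips re-processing the chunk

first = get_prefix_kv("doc1#3", "Refund policy text...")
again = get_prefix_kv("doc1#3", "ignored on a cache hit")
print(first is again)  # the chunk text is only processed once
```

The inference engine would load this blob at the start of prefill, then process only the tokens that follow the cached prefix.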

Image Results

Most RAG applications are text-to-text, such as answering a user’s question in a support chatbot. But what if your RAG app wakes up as a salesy chatbot that wants to make a few commissions? Wouldn’t it want to show some pictures of the products to the user?

Yes, you want to show images, but no, you don’t want the LLM to create them. Using an advanced multimodal LLM to create images to return in its results would be: (a) slow, (b) expensive, and (c) unpredictable.

You really don’t want to worry about hallucinations in images. Instead, you want to supply a set of canned images that have been carefully vetted by your sales professionals. There are a few ways to do this:

  • Image database component
  • Image results LLM
  • Tie images to chunks or citations

The first point, for latency reasons, is that your RAG application should not return the image file data itself. Rather, you return a URL, or a section of HTML that shows an image, and let the user’s browser load the image from wherever you host these images on a web server. Usually, this would be the same method of serving images that you’re already using for your public website or intranet.

One way to choose which image URL to return is to create a whole new component that takes the user’s query and returns an image URL to display (from a predefined set). This is effectively a vector database that uses semantic embeddings, but returns image URLs or HTML snippets instead of text. Easier said than done!

If you want to chew up even more GPU juice, you could build an LLM for that! You could train an LLM to map input questions to image filenames or URLs. This is obviously going to be slower than an image vector database, but you’ve got a big compute budget, haven’t you?

Another way is to notice that your retriever component already returns various attributes with each chunk (from the vector database). You can add image filenames, URLs, and other metadata like width and height to the existing database of chunks. Note that you don’t always want to show an image, so one of the metadata settings is whether or not to show an image for a given chunk; another could specify showing two images for a chunk.

Alternatively, rather than mapping image URLs to each individual chunk, you can create a mapping between citations and the images to display. This is like using a document-level mapping to relevant images, rather than a chunk-level mapping. Either way, you aren’t adding another vectorized embeddings search, but are leveraging the existing one in the vector database for text chunks.
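The chunk-level metadata approach can be sketched with a simple record-to-HTML function. The field names ("show_image", "image_url", etc.) are illustrative, not a real vector-database schema:

```python
# Sketch of chunk-level image metadata: the retriever already returns a
# record per chunk, so extra fields can carry an optional image URL and
# size. Field names are invented for illustration.

def image_html(chunk: dict) -> str:
    """Return an <img> snippet for the chunk, or '' if no image is flagged."""
    if not chunk.get("show_image"):
        return ""
    return (f'<img src="{chunk["image_url"]}" '
            f'width="{chunk["width"]}" height="{chunk["height"]}">')

chunk = {"text": "The X100 camera...", "show_image": True,
         "image_url": "https://example.com/x100.jpg",
         "width": 320, "height": 240}
print(image_html(chunk))
```

The application then splices this snippet into the response page, leaving the browser to fetch the image itself.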

One final point: you could do the same thing for videos, too! (Also, audio voiceovers, background music, sound effects, GIF animations, SVG designs, or 3D augmented-reality world generation in real-time for goggle gaming.)

Beyond RAG

There are many different variations on the RAG architecture, and RAG architectures can also be extended in various ways. Some of the similar capabilities involving “augmentation” of the LLM’s input prompt with extra data include:

  • Retrieval Augmented Language Models (RALM) — the most general category including augmentation by basically anything; see more about RALM.
  • Tool-Augmented Language Models (TALM) — use dynamic tool execution to compute extra input data. See more about tool integrations.
  • Data source integrations (“plugins”) — extended ways to search big databases, such as real estate listings or the entire internet, using a RAG-like approach.
  • Agentic RAG — integrating agent features with RAG for action capabilities.
  • Table-Augmented Generation (TAG) — overcoming limitations in processing tabular data, to allow spreadsheets and classic SQL database rows to be used as RAG inputs.

The first three of these RAG extensions aim to provide the LLM with more data than from simple document chunks. RALM can access dynamic data in any possible way, of which the most obvious is via a “plug-in” interface, which generalizes the vector database lookup of RAG. TALM involves extending the input dynamically with tools that perform computations, and inject new data into the overall input sequence, rather than simply inserting verbatim sections of stored data.

Finally, note that classic RAG and these RAG extensions are inherently “read-only” approaches that only generate answers. They don’t change anything for the user, and the generalization of that idea is “agents” that can perform real-world actions (i.e., they’re “read-write” and can do “actions”). For example, classic RAG could maybe tell you what your symptoms might be caused by, but an LLM agent can also book your doctor’s appointment for you. For more on this combination of technologies, see Chapter 15 covering agentic RAG architectures.

RALM

Retrieval Augmented Language Models (RALM) is the general method of using external data sources to make LLMs more powerful. It improves the “smartness” of the LLM, rather than being a speed optimization. In fact, it is often slower, because accessing a secondary data source requires an extra step.

RALM and RAG are almost the same thing, but RALM is a little more general: it allows any kind of “retrieval” in the architecture. RALM is the broad category of methods whereby an LLM is extended using an external data source, whereas RAG is a more specific type where the retrieval takes place in a datastore of document sections. Going beyond basic RAG chunk lookups, RALM may also include capabilities such as:

  • Data source integrations (“plug-ins”)
  • Tool Augmented Language Models (TALM)
  • “Hooks” for tool preprocessing

RALM generally refers to a read-only type architecture that simply returns information for the LLM to use, whereas more powerful two-way integrations with tools that “do” something are called “agents.” Read more about “agentic RAG” in Chapter 15.

Research Papers on RALM

Papers on the use of RALM techniques in LLMs and Transformer architectures:

  1. Dakhel, A.M., Nikanjam, A., Khomh, F., Desmarais, M.C., Washizaki, H. (2024). An Overview on Large Language Models. In: Nguyen-Duc, A., Abrahamsson, P., Khomh, F. (eds) Generative AI for Effective Software Development. Springer, Cham. https://doi.org/10.1007/978-3-031-55642-5_1 https://link.springer.com/chapter/10.1007/978-3-031-55642-5_1
  2. Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin, 29 May 2024, Nearest Neighbor Speculative Decoding for LLM Generation and Attribution, https://arxiv.org/abs/2405.19325 (Merging of RALM and speculative decoding.)
  3. Lianming Huang, Shangyu Wu, Yufei Cui, Ying Xiong, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue, 24 May 2024, RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference, https://arxiv.org/abs/2405.15198 (Early exit classifiers built with pre-computation using a retrieval database.)
  4. Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu, 7 May 2024, FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference, https://arxiv.org/abs/2405.04065 (Optimize RAG by appending rather than prepending documents, and modifying the attention for improvements in KV caching, by shimming or replacing some of the CUDA GPU low-level memory management APIs to avoid the need to rewrite kernels with extra higher-level memory management code.)
  5. Yucheng Hu, Yuxing Lu, 30 Apr 2024, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543 Project: https://github.com/2471023025/RALM_Survey
  6. Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen, 21 Feb 2024 (v2), SciAgent: Tool-augmented Language Models for Scientific Reasoning, https://arxiv.org/abs/2402.11451
  7. Aaron Parisi, Yao Zhao, and Noah Fiedel. Talm: Tool augmented language models. arXiv preprint arXiv:2205.12255, 2022. https://arxiv.org/abs/2205.12255
  8. Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, Phitchaya Mangpo Phothilimthana, Zhihao Jia, 2024, Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation, Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024, https://openreview.net/pdf?id=CDnv4vg02f
  9. Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang, 2024, INFERCEPT: Efficient Intercept Support for Augmented Large Language Model Inference, https://openreview.net/pdf?id=wDDGQabYPQ
  10. Zile Qiao, Wei Ye, Yong Jiang, Tong Mo, Pengjun Xie, Weiping Li, Fei Huang, Shikun Zhang, 12 Jun 2024, Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling, https://arxiv.org/abs/2406.08116
  11. Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li, 17 Jun 2024 (v3), A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2405.06211 Project: https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/
  12. Vishal Rajput, Apr 16, 2024, RAG 2.0: Retrieval Augmented Language Models, https://medium.com/aiguys/rag-2-0-retrieval-augmented-language-models-3762f3047256
  13. Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, Phitchaya Mangpo Phothilimthana, Zhihao Jia, July 2024, Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:60626-60643, 2024, https://proceedings.mlr.press/v235/zhang24cq.html
  14. Gauthier Guinet, Behrooz Omidvar-Tehrani, Anoop Deoras, Laurent Callot, July 2024, Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:16773-16801, 2024, https://proceedings.mlr.press/v235/guinet24a.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/guinet24a/guinet24a.pdf
  15. Mintong Kang, Nezihe Merve Gürel, Ning Yu, Dawn Song, Bo Li, July 2024, C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:22963-23000, 2024, https://proceedings.mlr.press/v235/kang24a.html
  16. Seong-Il Park, Jay-Yoon Lee, 19 Oct 2024, Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models, https://arxiv.org/abs/2410.15107
  17. Rana Shahout, Cong Liang, Shiji Xin, Qianru Lao, Yong Cui, Minlan Yu, Michael Mitzenmacher, 23 Oct 2024, Efficient Inference for Augmented Large Language Models, https://arxiv.org/abs/2410.18248
  18. Qingfei Zhao, Ruobing Wang, Yukuo Cen, Daren Zha, Shicheng Tan, Yuxiao Dong, Jie Tang, 23 Oct 2024, LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering, https://arxiv.org/abs/2410.18050 https://github.com/QingFei1/LongRAG
  19. Hieu Tran, Zonghai Yao, Junda Wang, Yifan Zhang, Zhichao Yang, Hong Yu, 5 Dec 2024 (v2), RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2412.02830

RAG Tool Usage

LLMs need to use dynamic tools to create some results. For example, to answer a user asking what time it is, the LLM needs to call a “clock” tool. Searching the internet needs a different type of tool.

Typically, tools have been handled in a different part of the LLM architecture from RAG. Never the twain shall meet. But recently, there has been some research on merging the requirements for RAG chunks and tool usage. The general idea is:

  • Chunks — static or fixed answers.
  • Tools — dynamic answers that require computation.
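The chunks-versus-tools split above can be sketched as one dispatch function. The keyword routing check is a placeholder for a trained router model, and the chunk store and tool are invented examples:

```python
# Sketch of merging chunk retrieval and tool calls behind one interface:
# static answers come from stored chunks, dynamic ones from a tool.
# The keyword routing test stands in for a trained router model.

import datetime

def clock_tool() -> str:
    """Dynamic tool: the answer must be computed, not stored."""
    return datetime.datetime.now().strftime("%H:%M")

# Static chunk store (invented content, keyed for simple matching).
CHUNKS = {"refund policy": "Refunds are accepted within 30 days."}

def answer_context(query: str) -> str:
    q = query.lower()
    if "time" in q:                   # dynamic: requires computation
        return f"The current time is {clock_tool()}."
    for key, text in CHUNKS.items():  # static: stored chunk
        if key in q:
            return text
    return ""

print(answer_context("What is your refund policy?"))
print(answer_context("what time is it"))
```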

Research on RAG architectures using dynamic tools:

  1. Elias Lumer, Vamse Kumar Subbiah, James A. Burke, Pradeep Honaganahalli Basavaraju, Austin Huber, 22 Oct 2024 (v2), Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases, https://arxiv.org/abs/2410.14594
  2. Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, 15 Jan 2025, Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG, https://arxiv.org/abs/2501.09136
  3. Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
  4. Mengsong Wu, Tong Zhu, Han Han, Xiang Zhang, Wenbiao Shao, Wenliang Chen, 21 Mar 2025, Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models, https://arxiv.org/abs/2503.16779 https://github.com/fairyshine/Chain-of-Tools

TAG

Table Augmented Generation (TAG) is an extension of RAG that returns live tables of data relevant to a query. This involves direct access to a database system to return whatever rows of data are related to the user’s query. Compared to regular RAG chunks, this method allows returning much more specific, personalized, and up-to-date data that is relevant to the specific query. This method is also often easier to keep current, because the business database is already being updated, so the extra step of exporting and chunking updated data is avoided.
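A minimal TAG sketch pulls live rows from a SQL database and packs them as context text, instead of retrieving pre-chunked documents. The table and column names are invented for illustration:

```python
# TAG sketch: answer a query by pulling live rows from a SQL database
# and formatting them as LLM context. Schema and data are invented.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "alice", 19.99), (2, "bob", 5.00), (3, "alice", 7.50)])

def table_context(customer: str) -> str:
    """Fetch this customer's rows and format them as context text."""
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE customer = ?", (customer,)
    ).fetchall()
    lines = [f"order {oid}: ${total:.2f}" for oid, total in rows]
    return "Customer orders:\n" + "\n".join(lines)

print(table_context("alice"))
```

A production TAG system would generate or select the SQL from the user’s natural-language query, which is the hard part this sketch omits.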

There are many applicable usages of this TAG architecture, limited only by the variety of databases available. The TAG architecture is very flexible, and is also similar to the use of data source integrations in agentic RAG architectures.

Research papers on TAG include those on the use of Excel spreadsheets and SQL databases:

  1. Sreedevi Gogusetty, Dec 6, 2024, From RAG to TAG: Leveraging the Power of Table-Augmented Generation (TAG): A Leap Beyond Retrieval-Augmented Generation (RAG), https://ai.plainenglish.io/from-rag-to-tag-leveraging-the-power-of-table-augmented-generation-tag-a-leap-beyond-54d1cfadb994 (TAG for augmenting LLMs with queries from database tables, similar to data source plugins.)
  2. Zipeng Qiu, You Peng, Guangxin He, Binhang Yuan, Chen Wang, 29 Nov 2024, TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension, https://arxiv.org/abs/2411.19504
  3. Tom Martin, Oct 15, 2024, From RAG to TAG: Exploring the Power of Table-Augmented Generation (TAG): A Leap Beyond Retrieval-Augmented Generation (RAG), https://ai.plainenglish.io/from-rag-to-tag-exploring-the-power-of-table-augmented-generation-tag-a-leap-beyond-b2c165309f63
  4. Asim Biswal, Liana Patel, Siddarth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia, 27 Aug 2024, Text2SQL is Not Enough: Unifying AI and Databases with TAG, https://arxiv.org/abs/2408.14717, Code: https://github.com/TAG-Research/TAG-Bench

RAG Knowledge Graph

A RAG Knowledge Graph architecture, or a “RAG Graph,” is a combination of RAG with a Knowledge Graph. Instead of returning text chunks, the retriever returns a structured “graph” that represents additional knowledge. The advantage of a graph is that it contains concept relationships such as hierarchies.
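Graph retrieval can be sketched as a walk over concept relationships, collecting facts along the edges. The graph content below is invented for illustration:

```python
# Sketch of graph-based retrieval: the "datastore" is a small graph of
# concept relationships, and retrieval walks edges outward from the
# query's entity to collect related facts. Graph content is invented.

GRAPH = {
    "aspirin": [("treats", "headache"), ("is_a", "NSAID")],
    "NSAID":   [("may_cause", "stomach upset")],
}

def graph_retrieve(entity: str, depth: int = 2) -> list[str]:
    """Collect facts reachable within `depth` hops of the entity."""
    facts, frontier = [], [entity]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for relation, target in GRAPH.get(node, []):
                facts.append(f"{node} {relation} {target}")
                next_frontier.append(target)
        frontier = next_frontier
    return facts

print(graph_retrieve("aspirin"))
```

Note how the second hop surfaces a fact about NSAIDs that a flat chunk about aspirin might not contain; that hierarchy traversal is the advantage over plain text chunks.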

Research on RAG with Knowledge Graphs:

  1. Dr. Ashish Bamania, Aug 2024, ‘MedGraphRAG’ Is A Complete Game Changer For AI In Medicine A deep-dive into how RAG, GraphRAG, and MedGraphRAG work and how they significantly improve the performance of LLM responses in Medicine, https://levelup.gitconnected.com/medgraphrag-is-a-complete-game-changer-for-ai-in-medicine-c6b41b0effd6
  2. Junde Wu, Jiayuan Zhu, Yunli Qi, 8 Aug 2024, Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation, https://arxiv.org/abs/2408.04187 Code: https://github.com/MedicineToken/Medical-Graph-RAG/tree/main
  3. Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao, 26 May 2024, GRAG: Graph Retrieval-Augmented Generation, https://arxiv.org/abs/2405.16506
  4. Philip Rathle, Jul 11, 2024, The GraphRAG Manifesto: Adding Knowledge to GenAI, https://neo4j.com/blog/graphrag-manifesto/
  5. Microsoft, Aug 2024 (accessed), GraphRAG: A modular graph-based Retrieval-Augmented Generation (RAG) system, https://github.com/microsoft/graphrag
  6. Chia Jeng Yang, Dec 14, 2023, A first intro to Complex RAG (Retrieval Augmented Generation), https://medium.com/enterprise-rag/a-first-intro-to-complex-rag-retrieval-augmented-generation-a8624d70090f
  7. Vahe Aslanyan, June 11, 2024, Next-Gen Large Language Models: The Retrieval-Augmented Generation (RAG) Handbook, https://www.freecodecamp.org/news/retrieval-augmented-generation-rag-handbook/
  8. Lei Liang, Mengshu Sun, Zhengke Gui, Zhongshu Zhu, Zhouyu Jiang, Ling Zhong, Yuan Qu, Peilong Zhao, Zhongpu Bo, Jin Yang, Huaidong Xiong, Lin Yuan, Jun Xu, Zaoyang Wang, Zhiqiang Zhang, Wen Zhang, Huajun Chen, Wenguang Chen, Jun Zhou, 24 Sep 2024 (v2), KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation, https://arxiv.org/abs/2409.13731
  9. Xinke Jiang, Rihong Qiu, Yongxin Xu, Wentao Zhang, Yichen Zhu, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang, 31 Oct 2024, RAGraph: A General Retrieval-Augmented Graph Learning Framework, https://arxiv.org/abs/2410.23855
  10. Cristian-George Crăciun, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel, Mihaela-Claudia Cercel, 5 Dec 2024, GRAF: Graph Retrieval Augmented by Facts for Legal Question Answering, https://arxiv.org/abs/2412.04119

Multimodal RAG

The obvious extension to chunking text documents is for RAG to ingest multimodal data. Whereas text-based RAG is the bread-and-butter of AI projects in business, this next step is an emerging field of research, and is advancing rapidly. Initial work has focused mainly on extending document chunking to the visual representation of those same documents (i.e., direct from PDFs), but the trend towards ingesting all manner of visual data sources is clearly coming soon to a RAG bot near you.

Depending on the product, you might want multimodal data for your RAG project. These are some of the types of data to inventory:

  • Images (e.g., photos, diagrams, drawings)
  • Videos
  • Animations
  • Audio (e.g., music, sound effects)
  • Advanced formats (e.g., 3D CAD/CAM design data)

Each of these non-text data sources has a variety of different available formats. I’m not going to go into the details of image formats and video codecs, because, well, I don’t know anything about that stuff, although I know someone who does.

Research papers on visual RAG:

  1. Shi Yu, Chaoyue Tang, Bokai Xu, Junbo Cui, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun, 14 Oct 2024, VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents, https://arxiv.org/abs/2410.10594 https://github.com/openbmb/visrag
  2. Junyuan Zhang, Qintong Zhang, Bin Wang, Linke Ouyang, Zichen Wen, Ying Li, Ka-Ho Chow, Conghui He, Wentao Zhang, 3 Dec 2024, OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation, https://arxiv.org/abs/2412.02592 https://github.com/opendatalab/OHR-Bench
  3. Junjie Zhou, Zheng Liu, Ze Liu, Shitao Xiao, Yueze Wang, Bo Zhao, Chen Jason Zhang, Defu Lian, Yongping Xiong, 19 Dec 2024, MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval, https://arxiv.org/abs/2412.14475
  4. Jaemin Cho, Debanjan Mahata, Ozan Irsoy, Yujie He, Mohit Bansal, 7 Nov 2024, M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding, https://arxiv.org/abs/2411.04952 https://m3docrag.github.io/
  5. Manan Suri, Puneet Mathur, Franck Dernoncourt, Kanika Goswami, Ryan A. Rossi, Dinesh Manocha, 14 Dec 2024, VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation, https://arxiv.org/abs/2412.10704

Information Retrieval Algorithms

What’s next after vector databases? The generic research field is “information retrieval” and it predates AI research by decades. People have been scanning through documents with computers for over 60 years. Some of that older research is now getting re-purposed to extend RAG retrievers beyond keyword search (e.g., BM25) and vector databases.

The idea of vector databases is to have a single embedding vector representing each query and each document chunk, with a focus on text for both. Seen that way, it’s clearly more general than keyword search, but there are also obvious ways to go beyond it.
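The single-vector lookup that underlies a vector database can be sketched as nearest-neighbor search by cosine similarity. The toy three-dimensional embeddings are invented; real embedding models produce vectors with hundreds of dimensions:

```python
# Sketch of single-embedding-vector retrieval: one vector per chunk,
# nearest neighbor chosen by cosine similarity. Toy vectors stand in
# for real embedding-model output.

import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

chunks = {
    "refund policy text": [0.9, 0.1, 0.0],
    "shipping times text": [0.1, 0.8, 0.2],
}

def nearest(query_vec: list[float]) -> str:
    return max(chunks, key=lambda c: cosine(query_vec, chunks[c]))

print(nearest([0.8, 0.2, 0.1]))  # closest to the refund chunk's vector
```

Late-interaction methods such as ColBERT generalize this by keeping one vector per token rather than one per chunk, then scoring via per-token maxima.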

The focus of this research is similar to multimodal RAG, but mostly limited to better indexing by examining visual elements of PDFs, rather than surfacing image portions in results. Some of the contenders in this space are:

  • ColPali
  • ColBERT
  • ColQwen

And of course there are research papers for that!

  1. Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo, 7 Oct 2024 (v3), ColPali: Efficient Document Retrieval with Vision Language Models, https://arxiv.org/abs/2407.01449
  2. Omar Khattab, Matei Zaharia, 4 Jun 2020 (v2), ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, https://arxiv.org/abs/2004.12832
  3. Jaemin Cho, Debanjan Mahata, Ozan Irsoy, Yujie He, Mohit Bansal, 7 Nov 2024, M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding, https://arxiv.org/abs/2411.04952 https://m3docrag.github.io/

Embedding Models

The area of embedding models is a research area in itself. Choosing and tweaking an embedding model is an important part of RAG implementation. Some of the choices include:

  • Matryoshka embeddings (the de facto standard)
  • Nomic embed
  • OpenAI embeddings
  • CDE embeddings

Model evaluation for embedding models is also a specialty. The best-known evaluation benchmark is the Massive Text Embedding Benchmark (MTEB).
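The Matryoshka idea is that an embedding is trained so that any prefix of it still works as a smaller embedding; you truncate and renormalize. A minimal sketch, using an invented eight-dimensional vector in place of real model output:

```python
# Sketch of Matryoshka-style truncation: keep a prefix of the embedding
# and renormalize it to unit length. The toy vector stands in for a
# real embedding model's output.

import math

def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components, then renormalize to unit length."""
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

full = [0.5, 0.5, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1]
small = truncate_embedding(full, 4)
print(len(small))  # a 4-dimensional embedding usable for cheaper search
```

The practical payoff is that one stored embedding supports several speed/accuracy trade-offs: search cheaply at low dimension, then rescore candidates at full dimension.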

Research papers on embedding models include:

  1. HF, February 23, 2024, Introduction to Matryoshka Embedding Models, https://huggingface.co/blog/matryoshka
  2. Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi, 8 Feb 2024 (v4), Matryoshka Representation Learning, https://arxiv.org/abs/2205.13147 https://github.com/RAIVNLab/MRL
  3. OpenAI, January 25, 2024, New embedding models and API updates, https://openai.com/index/new-embedding-models-and-api-updates/
  4. Nomic Team, 2024, Introducing Nomic Embed: A Truly Open Embedding Model, https://www.nomic.ai/blog/posts/nomic-embed-text-v1
  5. Zach Nussbaum, John X. Morris, Brandon Duderstadt, Andriy Mulyar, 2 Feb 2024, Nomic Embed: Training a Reproducible Long Context Text Embedder, https://arxiv.org/abs/2402.01613
  6. Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang, Han Xiao, 19 Sep 2024 (v3), jina-embeddings-v3: Multilingual Embeddings With Task LoRA, https://arxiv.org/abs/2409.10173
  7. John X. Morris, Alexander M. Rush, 8 Nov 2024 (v4), Contextual Document Embeddings, https://arxiv.org/abs/2410.02525
  8. Niklas Muennighoff, Nouamane Tazi, Loïc Magne, Nils Reimers, 19 Mar 2023 (v3), MTEB: Massive Text Embedding Benchmark, https://arxiv.org/abs/2210.07316 https://github.com/embeddings-benchmark/mteb

Advanced RAG Research Papers

Research papers on advanced RAG architectures:

  1. Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 1 Jul 2024, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219 Project: https://github.com/FudanDNN-NLP/RAG (Attempts to optimize the entire RAG system, including the various options for different RAG modules in the RAG pipeline, such as optimal methods for chunking, retrieval, embedding models, vector databases, prompt compression, reranking, repacking, summarizers, and other components.)
  2. Akash Bajwa and Chia Jeng Yang, May 27, 2024, The RAG Stack: Featuring Knowledge Graphs: Reducing Hallucinations To Make LLMs Production-Grade With Complex RAG, https://akashbajwa.substack.com/p/the-rag-stack-featuring-knowledge
  3. Xi Wang, Procheta Sen, Ruizhe Li, Emine Yilmaz, 31 Jul 2024, Adaptive Retrieval-Augmented Generation for Conversational Systems, https://arxiv.org/abs/2407.21712 (Deciding whether or not to include a RAG external data request in the inference of a chatbot in a multi-turn conversation.)
  4. Igor Novikov, Jul 23, 2024, RAG Architecture: Advanced RAG, https://pub.towardsai.net/rag-architecture-advanced-rag-3fea83e0d189
  5. Vishal Rajput, Apr 16, 2024, RAG 2.0: Retrieval Augmented Language Models, https://medium.com/aiguys/rag-2-0-retrieval-augmented-language-models-3762f3047256
  6. Florian June Aug 2024, The Best Practices of RAG: Typical RAG Process, Best Practices for Each Module, and Comprehensive Evaluation, https://pub.towardsai.net/the-best-practices-of-rag-300e313322e6
  7. Chandini Jain, Aug 15, 2024, The magic of RAG is in the retrieval, https://www.infoworld.com/article/3484132/the-magic-of-rag-is-in-the-retrieval.html (Quality of RAG answers is more dependent on the retriever than the LLM, needing both high quality data availability and accurate retriever query lookup.)
  8. Bhaskarjit Sarmah, Benika Hall, Rohan Rao, Sunil Patel, Stefano Pasquali, Dhagash Mehta, 9 Aug 2024, HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction, https://arxiv.org/abs/2408.04948
  9. Florian June, Jul 14, 2024, Three Practical Challenges of RAG and Their Mitigation Ideas: Strategies for Overcoming Obstacles in Real-World RAG Projects https://ai.gopubby.com/three-practical-challenges-of-rag-and-their-mitigation-ideas-5cc8e6dd7e30
  10. Matei Zaharia, Omar Khattab, Lingjiao Chen, Jared Quincy Davis, Heather Miller, Chris Potts, James Zou, Michael Carbin, Jonathan Frankle, Naveen Rao, Ali Ghodsi, Feb 18, 2024, The Shift from Models to Compound AI Systems, https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/
  11. Dr. Ashish Bamania, Aug 2024, ‘MedGraphRAG’ Is A Complete Game Changer For AI In Medicine A deep-dive into how RAG, GraphRAG, and MedGraphRAG work and how they significantly improve the performance of LLM responses in Medicine, https://levelup.gitconnected.com/medgraphrag-is-a-complete-game-changer-for-ai-in-medicine-c6b41b0effd6
  12. Junde Wu, Jiayuan Zhu, Yunli Qi, 8 Aug 2024, Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation, https://arxiv.org/abs/2408.04187 Code: https://github.com/MedicineToken/Medical-Graph-RAG/tree/main
  13. Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao, 26 May 2024, GRAG: Graph Retrieval-Augmented Generation, https://arxiv.org/abs/2405.16506
  14. Philip Rathle, Jul 11, 2024, The GraphRAG Manifesto: Adding Knowledge to GenAI, https://neo4j.com/blog/graphrag-manifesto/
  15. Tomaž Bratanič, Mar 12, 2024, Implementing Advanced Retrieval RAG Strategies With Neo4j, https://neo4j.com/developer-blog/advanced-rag-strategies-neo4j/
  16. Microsoft, Aug 2024 (accessed), GraphRAG: A modular graph-based Retrieval-Augmented Generation (RAG) system, https://github.com/microsoft/graphrag
  17. Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, Phitchaya Mangpo Phothilimthana, Zhihao Jia, July 2024, Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:60626-60643, 2024, https://proceedings.mlr.press/v235/zhang24cq.html
  18. Mintong Kang, Nezihe Merve Gürel, Ning Yu, Dawn Song, Bo Li, July 2024, C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:22963-23000, 2024, https://proceedings.mlr.press/v235/kang24a.html
  19. Ahmed Besbes, Aug 24, 2024, What Nobody Tells You About RAGs, https://towardsdatascience.com/what-nobody-tells-you-about-rags-b35f017e1570
  20. Ayush RoyChowdhury, Mulong Luo, Prateek Sahu, Sarbartha Banerjee, Mohit Tiwari, Aug 2024, ConfusedPilot: Confused Deputy Risks in RAG-based LLMs, https://confusedpilot.info/confused_pilot_new.pdf
  21. Ziyan Jiang, Xueguang Ma, Wenhu Chen, June 2024, LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs, arXiv preprint arXiv:2406.15319, https://arxiv.org/abs/2406.15319 (Improved accuracy performance of RAG methods when using a long context LLM and longer chunk sizes for the retriever.)
  22. Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat, Peter Izsak, 5 Aug 2024, RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation, https://arxiv.org/abs/2408.02545 https://github.com/IntelLabs/RAGFoundry
  23. Jiajie Jin, Yutao Zhu, Xinyu Yang, Chenghao Zhang, Zhicheng Dou, 22 May 2024, FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research, https://arxiv.org/abs/2405.13576 https://github.com/RUC-NLPIR/FlashRAG
  24. David Rau, Hervé Déjean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, Vassilina Nikoulina, Stéphane Clinchant, 1 Jul 2024, BERGEN: A Benchmarking Library for Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01102
  25. Ayush Thakur, Raghav Gupta, 13 Apr 2024, Introducing Super RAGs in Mistral 8x7B-v1, https://arxiv.org/abs/2404.08940
  26. SuperAgent, 2024, Super-Rag with SAML, https://docs.superagent.sh/overview/rag-retrieval/super-rag-with-saml
  27. Andrew Ditmer, May 13 2024, SuperRAG – How to achieve higher accuracy with Retrieval Augmented Generation, https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/superrag-how-to-achieve-higher-accuracy-with-retrieval-augmented/ba-p/4139004
  28. Chia Jeng Yang, Dec 14, 2023, A first intro to Complex RAG (Retrieval Augmented Generation), https://medium.com/enterprise-rag/a-first-intro-to-complex-rag-retrieval-augmented-generation-a8624d70090f
  29. Vahe Aslanyan, June 11, 2024, Next-Gen Large Language Models: The Retrieval-Augmented Generation (RAG) Handbook, https://www.freecodecamp.org/news/retrieval-augmented-generation-rag-handbook/
  30. Chandini Jain, August 28, 2024, The magic of RAG is in the retrieval, https://edt.infoworld.com/q/1tldUPQDxjluYqjeyhS98AV4/wv
  31. NirDiamant, Aug 2024, Advanced RAG Techniques: Elevating Your Retrieval-Augmented Generation Systems, https://github.com/NirDiamant/RAG_Techniques
  32. Asim Biswal, Liana Patel, Siddarth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia, 27 Aug 2024, Text2SQL is Not Enough: Unifying AI and Databases with TAG, https://arxiv.org/abs/2408.14717 https://github.com/TAG-Research/TAG-Bench
  33. Thomas Merth, Qichen Fu, Mohammad Rastegari, Mahyar Najibi, 19 Jul 2024 (v2), Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation, https://arxiv.org/abs/2404.06910 (Process each RAG chunk in parallel and choose a final output.)
  34. Zheng Wang, Shu Xian Teo, Jieer Ouyang, Yongjun Xu, Wei Shi, 26 May 2024, M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions, https://arxiv.org/abs/2405.16420
  35. Shenggang Li, Jul 30, 2024, Mem0: Is This the Future of AI Memory Management? https://ai.gopubby.com/mem0-is-this-the-future-of-ai-memory-management-1e228dc8220a
  36. C Yang, S Fujita, 2024, Adaptive Control of Retrieval-Augmented Generation for LLMs Through Reflective Tags, https://www.preprints.org/manuscript/202408.2152/download/final_file
  37. Thuwarakesh Murallie, Aug 2024, How to Achieve Near Human-Level Performance in Chunking for RAGs: The costly yet powerful splitting technique for superior RAG retrieval, https://towardsdatascience.com/agentic-chunking-for-rags-091beccd94b1
  38. Dom Couldwell, Sep 03, 2024, Dealing with ‘day two’ issues in generative AI deployments, https://www.infoworld.com/article/3493255/dealing-with-day-two-issues-in-generative-ai-deployments.html
  39. Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela, 17 Apr 2024 (v2), Generative Representational Instruction Tuning, https://arxiv.org/abs/2402.09906
  40. Tan Yu, Anbang Xu, Rama Akkiraju, 3 Sep 2024, In Defense of RAG in the Era of Long-Context Language Models, https://arxiv.org/abs/2409.01666
  41. Florian June, Feb 3, 2024, Advanced RAG 02: Unveiling PDF Parsing, https://pub.towardsai.net/advanced-rag-02-unveiling-pdf-parsing-b84ae866344e
  42. Lior Solomon, Sep 2024, Gen AI testing strategies and tools, https://medium.com/ai-in-grc/gen-ai-testing-strategies-and-tools-257383e5cbfb
  43. Vivedha Elango, Sep 2024, Search in the age of AI- Retrieval methods for Beginners, https://ai.gopubby.com/search-in-the-age-of-ai-retrieval-methods-for-beginners-557621e12ded
  44. Ali Forootani, Danial Esmaeili Aliabadi, Daniela Thraen, 11 Sep 2024, Bio-Eng-LMM AI Assist chatbot: A Comprehensive Tool for Research and Education, https://arxiv.org/abs/2409.07110
  45. Louis Bouchard, Sep 13, 2024, Top RAG Techniques You Should Know (Wang et al., 2024), https://www.louisbouchard.ai/top-rag-techniques/
  46. Sascha Heyer, Sep 2024, RAG API: 30 lines of code is all you need for RAG. The easiest way to get started with RAG. https://medium.com/google-cloud/google-cloud-rag-api-c7e3c9931b3e
  47. Florian June, Sep 2024, Kotaemon Unveiled: Innovations in RAG Framework for Document QA: PDF Parsing, GraphRAG, Agent-Based Reasoning, and Insights, https://ai.gopubby.com/kotaemon-unveiled-innovations-in-rag-framework-for-document-qa-0b6d67e4b9b7
  48. Michael D. Skarlinski, James D. Braza, Sam Cox, Michaela Hinks, Manvitha Ponnapati, Samuel G. Rodriques, Jon M. Laurent, Michael J. Hammerling, Andrew D. White, Sep 2024, Language Agents Achieve Superhuman Synthesis of Scientific Knowledge, https://storage.googleapis.com/fh-public/paperqa/Language_Agents_Science.pdf https://github.com/Future-House/paper-qa
  49. Pathway, Sep 2024, 2024 Top RAG Frameworks, https://pathway.com/rag-frameworks
  50. Anthropic, 20 Sep 2024, Introducing Contextual Retrieval, https://www.anthropic.com/news/contextual-retrieval
  51. Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
  52. Lei Liang, Mengshu Sun, Zhengke Gui, Zhongshu Zhu, Zhouyu Jiang, Ling Zhong, Yuan Qu, Peilong Zhao, Zhongpu Bo, Jin Yang, Huaidong Xiong, Lin Yuan, Jun Xu, Zaoyang Wang, Zhiqiang Zhang, Wen Zhang, Huajun Chen, Wenguang Chen, Jun Zhou, 24 Sep 2024 (v2), KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation, https://arxiv.org/abs/2409.13731
  53. Surya Maddula, Sep 2024, Not RAG, but RAG Fusion? Understanding Next-Gen Info Retrieval. https://pub.towardsai.net/not-rag-but-rag-fusion-understanding-next-gen-info-retrieval-477788da02e2
  54. Adrian H. Raudaschl, Oct 6, 2023, Forget RAG, the Future is RAG-Fusion: The Next Frontier of Search: Retrieval Augmented Generation meets Reciprocal Rank Fusion and Generated Queries, https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1
  55. Deval Shah, Jul 4, 2024, Reciprocal Rank Fusion (RRF) explained in 4 mins — How to score results from multiple retrieval methods in RAG: Unlock the power of Reciprocal Rank Fusion in Retrieval-Augmented Generation. https://medium.com/@devalshah1619/mathematical-intuition-behind-reciprocal-rank-fusion-rrf-explained-in-2-mins-002df0cc5e2a
  56. Vishal Rajput, Sep 27, 2024, Why Scaling RAGs For Production Is So Hard? https://medium.com/aiguys/why-scaling-rags-for-production-is-so-hard-a2f540785e97
  57. Chirag Agrawal, Sep 20, 2024, Unlocking the Power of Efficient Vector Search in RAG Applications, https://pub.towardsai.net/unlocking-the-power-of-efficient-vector-search-in-rag-applications-c2e3a0c551d5
  58. Zixuan Li, Jing Xiong, Fanghua Ye, Chuanyang Zheng, Xun Wu, Jianqiao Lu, Zhongwei Wan, Xiaodan Liang, Chengming Li, Zhenan Sun, Lingpeng Kong, Ngai Wong, 3 Oct 2024, UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation, https://arxiv.org/abs/2410.02719
  59. Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky, 6 Oct 2024, Inference Scaling for Long-Context Retrieval Augmented Generation, https://arxiv.org/abs/2410.04343
  60. Bowen Jin, Jinsung Yoon, Jiawei Han, Sercan O. Arik, 8 Oct 2024, Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG, https://arxiv.org/abs/2410.05983
  61. Zhangchi Feng, Dongdong Kuang, Zhongyuan Wang, Zhijie Nie, Yaowei Zheng, Richong Zhang, 15 Oct 2024 (v2), EasyRAG: Efficient Retrieval-Augmented Generation Framework for Automated Network Operations, https://arxiv.org/abs/2410.10315 https://github.com/BUAADreamer/EasyRAG
  62. Barhoumi Mosbeh, Sep 29, 2024, Anthropic’s New RAG Approach, https://pub.towardsai.net/anthropics-new-rag-approach-e0c24a68893b
  63. Tianyang Zhang, Zhuoxuan Jiang, Shengguang Bai, Tianrui Zhang, Lin Lin, Yang Liu, Jiawei Ren, 21 Oct 2024, RAG4ITOps: A Supervised Fine-Tunable and Comprehensive RAG Framework for IT Operations and Maintenance, https://arxiv.org/abs/2410.15805
  64. Ran Xu, Hui Liu, Sreyashi Nag, Zhenwei Dai, Yaochen Xie, Xianfeng Tang, Chen Luo, Yang Li, Joyce C. Ho, Carl Yang, Qi He, 23 Oct 2024, SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains, https://arxiv.org/abs/2410.17952
  65. Elias Lumer, Vamse Kumar Subbiah, James A. Burke, Pradeep Honaganahalli Basavaraju, Austin Huber, 22 Oct 2024 (v2), Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases, https://arxiv.org/abs/2410.14594
  66. Kibeom Lee, Oct 2024, Retrieval-Augmented Generation: Enhancing LLMs with Dynamic Information Access, https://sendbird.com/developer/tutorials/rag (Covers BM25 “Best Match 25” vector search for RAG.)
  67. Damian Gil, Apr 17, 2024, Advanced Retriever Techniques to Improve Your RAGs, https://towardsdatascience.com/advanced-retriever-techniques-to-improve-your-rags-1fac2b86dd61
  68. Vectorize, October 29, 2024, Multimodal RAG Patterns Every AI Developer Should Know, https://vectorize.io/multimodal-rag-patterns/
  69. Tolga Şakar and Hakan Emekci, 30 October 2024, Maximizing RAG efficiency: A comparative analysis of RAG methods, Natural Language Processing. doi:10.1017/nlp.2024.53, https://www.cambridge.org/core/journals/natural-language-processing/article/maximizing-rag-efficiency-a-comparative-analysis-of-rag-methods/D7B259BCD35586E04358DF06006E0A85 https://www.cambridge.org/core/services/aop-cambridge-core/content/view/D7B259BCD35586E04358DF06006E0A85/S2977042424000530a.pdf/div-class-title-maximizing-rag-efficiency-a-comparative-analysis-of-rag-methods-div.pdf
  70. Sebastian Petrus, Sep 4, 2024, Top 10 RAG Frameworks Github Repos 2024, https://sebastian-petrus.medium.com/top-10-rag-frameworks-github-repos-2024-12b2a81f4a49
  71. Jason Perlow, Nov. 6, 2024, The best open-source AI models: All your free-to-use options explained: Here are the best open-source and free-to-use AI models for text, images, and audio, organized by type, application, and licensing considerations. https://www.zdnet.com/article/the-best-open-source-ai-models-all-your-free-to-use-options-explained/
  72. Ziting Wang, Haitao Yuan, Wei Dong, Gao Cong, Feifei Li, 1 Nov 2024, CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation, https://arxiv.org/abs/2411.00744
  73. Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
  74. Emilia David, November 8, 2024, Multimodal RAG is growing, here’s the best way to get started, https://venturebeat.com/ai/multimodal-rag-is-growing-heres-the-best-way-to-get-started/
  75. Shubham Sharma, November 12, 2024, How agentic RAG can be a game-changer for data processing and retrieval, https://venturebeat.com/ai/how-agentic-rag-can-be-a-game-changer-for-data-processing-and-retrieval/
  76. Alden Do Rosario, Nov 2024, Dear IT Departments, Please Stop Trying To Build Your Own RAG, https://pub.towardsai.net/dear-it-departments-please-stop-trying-to-build-your-own-rag-4546b4638273
  77. Cobus Greyling, Nov 2024, Four Levels of RAG — Research from Microsoft. Improving Retrieval-Augmented Generation (RAG) involves classifying queries based on user intent & focusing on context. Also utilising SLMs and fine-tuning to deliver more accurate & relevant results. https://cobusgreyling.medium.com/four-levels-of-rag-research-from-microsoft-fdc54388f0ff
  78. Rupali Patil, Nov 10, 2024, RAGate: Adaptive RAG for Conversational AI, https://pub.towardsai.net/ragate-adaptive-rag-for-conversational-ai-94b5ca469b7d
  79. Shalin Shah, Srikanth Ryali, Ramasubbu Venkatesh, 8 Nov 2024, Multi-Document Financial Question Answering using LLMs, https://arxiv.org/abs/2411.07264
  80. Alexandria Leto, Cecilia Aguerrebere, Ishwar Bhati, Ted Willke, Mariano Tepper, Vy Ai Vo, 11 Nov 2024, Toward Optimal Search and Retrieval for RAG, https://arxiv.org/abs/2411.07396
  81. Jiejun Tan, Zhicheng Dou, Wen Wang, Mang Wang, Weipeng Chen, Ji-Rong Wen, 5 Nov 2024, HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems, https://arxiv.org/abs/2411.02959
  82. Louis-François Bouchard, Nov 22, 2024, Advanced RAG Evaluation Techniques for Optimal LLM Performance. Why RAG Evaluation Matters and Techniques to Leverage, https://louisbouchard.substack.com/p/advanced-rag-evaluation-techniques
  83. Sonal Prabhune, Donald J. Berndt, 7 Nov 2024, Deploying Large Language Models With Retrieval Augmented Generation, https://arxiv.org/abs/2411.11895
  84. Mohammad Hassan Heydari, Arshia Hemmat, Erfan Naman, Afsaneh Fatemi, 25 Nov 2024, Context Awareness Gate For Retrieval Augmented Generation, https://arxiv.org/abs/2411.16133
  85. Shengming Zhao, Yuheng Huang, Jiayang Song, Zhijie Wang, Chengcheng Wan, Lei Ma, 29 Nov 2024, Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems, https://arxiv.org/abs/2411.19463
  86. Matvey Arye, Avthar Sewrathan, 29 Oct 2024, Vector Databases Are the Wrong Abstraction, https://www.timescale.com/blog/vector-databases-are-the-wrong-abstraction/
  87. Chaitanya Sharma, 28 May 2025, Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers, https://arxiv.org/abs/2506.00054
