Chapter 2. Smarter RAG

  • Book Excerpt from "RAG Optimization: Accurate and Efficient LLM Applications"
  • by David Spuler and Michael Sharpe

Why RAG is Smart

The whole point of RAG applications is to make them “smarter” than the basic LLM. The intention is to be equivalent to fine-tuning additional company-specific knowledge into the LLM. However, RAG doesn’t do any training; instead, it finds relevant extra information “chunks” in its databases and presents them to the LLM to use as context.

The purpose of the extra content is to allow the LLM to be smarter in its answers. This achieves improved results in answer quality in multiple ways:

  • Proprietary data usage
  • Updated information
  • Reduced hallucinations

For this to work well, the basic RAG setup must be in place, with several components all functioning correctly in order to get a good answer. The basic areas that are important to accuracy include:

  • Data completeness — enough chunks of relevant data.
  • Data accuracy — answers that are correct and up-to-date in the documents.
  • Retriever lookup — returning relevant document chunks for the user’s question.
  • RAG prompt — must contain: (a) global instructions, (b) chunks, and (c) the user’s query.
  • System instructions — prompt’s global instructions to the LLM are clear and effective.

And let us not forget, you need an LLM at the end that’s powerful enough to understand that big prompt text containing three major sections and generate an answer in lovely written prose.
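
To make the prompt structure concrete, here is a minimal sketch of how those three sections might be assembled into one prompt string. The template wording and the function name are illustrative assumptions, not a prescribed format:

    def build_rag_prompt(system_instructions, chunks, user_query):
        """Assemble the three sections of a basic RAG prompt:
        (a) global instructions, (b) retrieved chunks, (c) the user's query."""
        # Keep the retrieved chunks clearly separated from the instructions and the query.
        context = "\n\n".join(f"[Chunk {i+1}]\n{chunk}" for i, chunk in enumerate(chunks))
        return (
            f"{system_instructions}\n\n"
            f"Use only the following context to answer the question.\n\n"
            f"=== CONTEXT ===\n{context}\n\n"
            f"=== QUESTION ===\n{user_query}\n"
        )

    prompt = build_rag_prompt(
        "You are a helpful assistant. Answer using the provided context; "
        "if the context is insufficient, say so.",
        ["Widgets ship within 3 business days.", "Returns are accepted for 30 days."],
        "How long does shipping take?",
    )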

RAG Accuracy Optimizations

There are several areas to configure and optimize to make your RAG application smarter. The main areas for improving accuracy are:

  • Documents used for chunks
  • Vector and keyword lookup algorithms
  • Reranking module
  • Combined prompting and packing
  • The main LLM underneath it all

Surprisingly, using a bigger LLM model is not highest on the list, because the RAG application is supposed to get its “smarts” from the RAG text chunks that it receives in the prompt. Accuracy of the results is not supposed to rely on the innate knowledge that has been pre-trained into the model parameters, nor has that LLM been “fine-tuned” with your company-specific data. It’s supposed to be an off-the-shelf LLM of reasonable capability, whether it’s an open source LLM or one accessed via a commercial API wrapper.

Some example strategies for improving an existing RAG application’s accuracy can include:

  • System prompt improvements
  • Chunking sizes and shapes
  • Overlapping chunks
  • Keyword lookup extensions
  • Query expansion (parallel lookup)
  • Pre-summarization of chunks

However, some of these ideas are becoming less important as LLM context windows expand. Early LLMs had a context limit of about 4,000 tokens, but 128k tokens is now quite common; these are called “long context” LLMs. Some models even exceed the 1M-token threshold and are called “ultralong context” models. Using a longer context LLM can be a powerful way to increase accuracy:

  • Larger chunks!
  • Long RAG optimizations
  • Mini-RAG (single document only)

See Chapter 10 for more details about long context RAG architectures and the single document variants that have become possible.

Going even further, if you want to add more to your RAG application, and give yourself some more coding work to do, here are some ideas:

  • Automatic prompt optimization component
  • Refusal module (safety)
  • Prompt shield (jailbreak prevention)

I’m sure you can think of a few more.

Data Chunk Pipeline

Data in the main documents is important for accuracy, as you won’t get very far without good data. Some of the main issues with optimizing chunks out of documents include:

  • Ensure you have enough RAG documents that there are chunks for all the (reasonable) questions you’re likely to get.
  • Cull any irrelevant or ambiguous chunks of data.
  • Use larger chunk sizes (to ensure each chunk fully covers its topic), as in the chunking sketch below.
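
As a rough illustration of chunk size and overlap, here is a minimal fixed-size chunker written in Python. The sizes are arbitrary assumptions, and real pipelines usually prefer to split on sentence or section boundaries rather than raw character counts:

    def chunk_text(text, chunk_size=800, overlap=200):
        """Split text into overlapping fixed-size chunks (character-based for simplicity)."""
        if overlap >= chunk_size:
            raise ValueError("overlap must be smaller than chunk_size")
        chunks = []
        start = 0
        while start < len(text):
            chunks.append(text[start:start + chunk_size])
            start += chunk_size - overlap  # step forward, keeping some overlap with the previous chunk
        return chunks

The overlap is there so that a topic straddling a boundary still appears intact in at least one chunk.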

The next step is ensuring that the best chunks for each user’s query are being found by the two lookup engines (vector database and keyword database). Techniques include:

  • Ensure the embedding model is capturing the semantic meaning of all terms or jargon, some of which may be specific to your industry or company.
  • Tune the accuracy of the vector database lookup.
  • Tweak the settings of your keyword lookup datastore.
  • Return more chunks from either database in the retriever (to give the LLM more context), as in the hybrid lookup sketch below.
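
As a sketch of how the two lookup engines can be combined, the following hybrid retriever queries both stores and merges their ranked results. The vector_store and keyword_index objects and their search() methods are assumed interfaces, not any particular library’s API:

    from itertools import zip_longest

    def hybrid_retrieve(query, vector_store, keyword_index, top_k=10):
        """Query both lookup engines and merge the ranked results, dropping duplicates.
        Assumes each store's search() returns chunk IDs ranked best-first (hypothetical interface)."""
        vector_hits = vector_store.search(query, top_k=top_k)    # semantic similarity lookup
        keyword_hits = keyword_index.search(query, top_k=top_k)  # lexical (e.g. BM25-style) lookup
        merged, seen = [], set()
        # Interleave the two ranked lists so neither engine dominates the final set.
        for vec_id, kw_id in zip_longest(vector_hits, keyword_hits):
            for chunk_id in (vec_id, kw_id):
                if chunk_id is not None and chunk_id not in seen:
                    seen.add(chunk_id)
                    merged.append(chunk_id)
        return merged[:top_k]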

The final step is to review the mechanisms whereby the data chunks are being arranged into a prompt that gets sent to the LLM. Areas to examine include:

  • Review the main meta-prompt text to ensure the LLM is getting strong global instructions.
  • Ensure that text chunks are clearly separated from the (a) global instructions, and (b) user query text, in the prompt template text.
  • Test the “reverse” packing algorithm (worst-to-best) for ordering the chunks in the prompt text, as sketched below.
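
For the last point, here is a minimal sketch of “reverse” packing, which places the lowest-scoring chunks first so the most relevant text sits closest to the user’s question at the end of the prompt. The (score, text) input format is an assumption:

    def pack_chunks_reverse(scored_chunks):
        """Order chunks worst-to-best so the most relevant text sits nearest the query.
        scored_chunks is a list of (relevance_score, chunk_text) pairs."""
        ordered = sorted(scored_chunks, key=lambda pair: pair[0])  # ascending: worst first
        return "\n\n".join(chunk for _, chunk in ordered)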

One underappreciated issue with chunking and RAG is the amount of manual work that is required. Yeah, AI is supposed to be smart, but come on, it’s not that smart. You have a bigger brain and it needs to do this:

  • Test your RAG application like crazy.
  • Manually tweak keywords to fix any lookup problems.
  • Find new text chunks to fill holes.

Overall, testing of the pipeline whereby chunks are added to queries and then used in answers is very important for creating a smart RAG application.
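
One practical way to test the pipeline is a small retrieval regression suite: a list of known questions and the chunk each one should retrieve, re-run whenever the data or retriever settings change. The test-case format and the retriever interface below are assumptions:

    # Hypothetical regression cases: each question should retrieve its expected chunk ID.
    RETRIEVAL_TESTS = [
        ("How long does shipping take?", "shipping-policy-01"),
        ("Can I return a damaged widget?", "returns-policy-03"),
    ]

    def run_retrieval_tests(retriever, top_k=5):
        """Report every test question whose expected chunk is missing from the top-k results."""
        failures = []
        for question, expected_chunk_id in RETRIEVAL_TESTS:
            hits = retriever.search(question, top_k=top_k)  # assumed to return ranked chunk IDs
            if expected_chunk_id not in hits:
                failures.append(f"{question!r} did not retrieve {expected_chunk_id}")
        return failures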

Prompt Optimization

RAG systems rely on LLMs underneath for the heavy lifting based on prompting. There are several ways that modifications to prompting can improve the accuracy of a RAG system.

  • Basic prompt engineering — changes to the system prompts.
  • Automatic prompt optimization — LLM-based prompt rewriting.
  • Query enhancement — using modified prompts in retrieval.

The simplest idea is just to tweak the words in the overall RAG system prompt. Any of the various advanced prompt engineering techniques can be worth considering. Prompt engineering is examined in detail in Chapter 7.

Automatic Prompt Optimization. The idea of using LLMs to automatically improve human queries into better prompts has been gaining traction in general LLM usage. This technique is also called “programmatic prompting” in some research papers. There hasn’t been as much research on the area in relation to RAG architectures, but there’s no reason that it wouldn’t be applicable.

The idea is simply to feed the user’s query to an LLM and have it output a better prompt. Surprisingly, or perhaps unsurprisingly, LLMs are great at tweaking the words in a prompt to get a better answer from an AI engine.

The downside is obviously that it’s an extra LLM call, leading to extra cost and increased response-time latency, although the prompt optimizer LLM can be a smaller model than the one used for the final RAG answers.
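
As a hedged sketch of this idea, the rewriting step can be a single cheap call to a smaller model before the main RAG pipeline runs. The small_llm callable below is an assumed interface to whatever model you choose, and the rewriting instructions are only an example:

    REWRITE_INSTRUCTIONS = (
        "Rewrite the user's question as a clear, specific, self-contained prompt. "
        "Keep the original intent. Return only the rewritten prompt."
    )

    def optimize_prompt(user_query, small_llm):
        """Use a cheap LLM call to turn a rough user query into a better prompt.
        small_llm(prompt) is assumed to return the model's text completion."""
        rewritten = small_llm(f"{REWRITE_INSTRUCTIONS}\n\nUser question: {user_query}")
        return rewritten.strip() or user_query  # fall back to the original if rewriting fails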

References on Automatic Prompt Optimization. Research papers on prompt optimization include:

  1. Cameron R. Wolfe, Nov 04, 2024, Automatic Prompt Optimization. Practical techniques for improving prompt quality without manual effort, https://cameronrwolfe.substack.com/p/automatic-prompt-optimization
  2. Akshay Nambi, Tanuja Ganu, December 17, 2024, PromptWizard: The future of prompt optimization through feedback-driven self-evolving prompts, https://www.microsoft.com/en-us/research/blog/promptwizard-the-future-of-prompt-optimization-through-feedback-driven-self-evolving-prompts/
  3. Shuzheng Gao, Chaozheng Wang, Cuiyun Gao, Xiaoqian Jiao, Chun Yong Chong, Shan Gao, Michael Lyu, 2 Jan 2025, The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation, https://arxiv.org/abs/2501.01329
  4. Wenxin Luo, Weirui Wang, Xiaopeng Li, Weibo Zhou, Pengyue Jia, Xiangyu Zhao, 12 Jan 2025, TAPO: Task-Referenced Adaptation for Prompt Optimization, https://arxiv.org/abs/2501.06689
  5. Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
  6. Can Wang, Dianbo Sui, Bolin Zhang, Xiaoyu Liu, Jiabao Kang, Zhidong Qiao, Zhiying Tu, Jan 2025, A Framework for Effective Invocation Methods of Various LLM Services, Proceedings of the 31st International Conference on Computational Linguistics, pages 6953–6965, January 19–24, 2025, Association for Computational Linguistics, https://aclanthology.org/2025.coling-main.464.pdf
  7. Yuanheng Fang, Guoqing Chao, Wenqiang Lei, Shaobo Li, Dianhui Chu, 21 Jan 2025, CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning, https://arxiv.org/abs/2501.12226 (CoT with integration of clustering and prompt optimization techniques.)
  8. Krish Maniar and William Fu-Hinthorn, LangChain, Jan 28, 2025, Exploring Prompt Optimization, https://blog.langchain.dev/exploring-prompt-optimization/ (Long article evaluating various LLMs for automatic prompt optimization.)
  9. Mingze Kong, Zhiyong Wang, Yao Shu, Zhongxiang Dai, 2 Feb 2025, Meta-Prompt Optimization for LLM-Based Sequential Decision Making, https://arxiv.org/abs/2502.00728
  10. Jinyu Xiang, Jiayi Zhang, Zhaoyang Yu, Fengwei Teng, Jinhao Tu, Xinbing Liang, Sirui Hong, Chenglin Wu, Yuyu Luo, 7 Feb 2025. Self-Supervised Prompt Optimization, https://arxiv.org/abs/2502.06855 https://github.com/geekan/MetaGPT
  11. Yupeng Chang, Yi Chang, Yuan Wu, 20 Feb 2025, Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization, https://arxiv.org/abs/2502.14211 https://github.com/llm172/Transfer-Prompting
  12. Rin Ashizawa, Yoichi Hirose, Nozomu Yoshinari, Kento Uchida, Shinichi Shirakawa, 3 Mar 2025, Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers, https://arxiv.org/abs/2503.01163 https://github.com/shiralab/OPTS
  13. Leixian Shen, Haotian Li, Yifang Wang, Xing Xie, Huamin Qu, 4 Mar 2025, Prompting Generative AI with Interaction-Augmented Instructions, https://arxiv.org/abs/2503.02874
  14. Thilak Shekhar Shriyan, Janavi Srinivasan, Suhail Ahmed, Richa Sharma, Arti Arya, March 2025, SwarmPrompt: Swarm Intelligence-Driven Prompt Optimization Using Large Language Models, Proceedings of the 17th International Conference on Agents and Artificial Intelligence (ICAART 2025), Volume 3, pages 86-93, https://www.scitepress.org/Papers/2025/130903/130903.pdf
  15. Dengyun Peng, Yuhang Zhou, Qiguang Chen, Jinhao Liu, Jingjing Chen, Libo Qin, 19 Mar 2025 (v3), DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective, https://arxiv.org/abs/2503.13413 https://github.com/sfasfaffa/DLPO
  16. Vagner Figueredo de Santana, Sara Berger, Tiago Machado, Maysa Malfiza Garcia de Macedo, Cassia Sampaio Sanctos, Lemara Williams, and Zhaoqing Wu, 2025, Can LLMs Recommend More Responsible Prompts? In Proceedings of the 30th International Conference on Intelligent User Interfaces (IUI '25). Association for Computing Machinery, New York, NY, USA, 298–313. https://doi.org/10.1145/3708359.3712137 https://dl.acm.org/doi/full/10.1145/3708359.3712137 https://dl.acm.org/doi/pdf/10.1145/3708359.3712137
  17. Jiale Cheng, Ruiliang Lyu, Xiaotao Gu, Xiao Liu, Jiazheng Xu, Yida Lu, Jiayan Teng, Zhuoyi Yang, Yuxiao Dong, Jie Tang, Hongning Wang, Minlie Huang, 26 Mar 2025, VPO: Aligning Text-to-Video Generation Models with Prompt Optimization, https://arxiv.org/abs/2503.20491
  18. Jian Zhang, Zhangqi Wang, Haiping Zhu, Jun Liu, Qika Lin, Erik Cambria, 21 Mar 2025, MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization, https://arxiv.org/abs/2503.16874
  19. AD Al Hauna, AP Yunus, M Fukui, S Khomsah - International Journal on Robotics, Apr 2025, Enhancing LLM Efficiency: A Literature Review of Emerging Prompt Optimization Strategies, https://doi.org/10.33093/ijoras.2025.7.1.9 https://mmupress.com/index.php/ijoras/article/view/1311 PDF: https://mmupress.com/index.php/ijoras/article/view/1311/834
  20. Ximing Dong, Shaowei Wang, Dayi Lin, Ahmed E. Hassan, 15 May 2025, Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization, https://arxiv.org/abs/2505.10736
  21. Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Tianjiao Li, Chua Jia Jim Deryl, Mak Lee Onn, Gee Wah Ng, Kezhi Mao, 20 May 2025 (v2), Rethinking Prompt Optimizers: From Prompt Merits to Optimization, https://arxiv.org/abs/2505.09930 https://github.com/MidiyaZhu/MePO
  22. Yumin Choi, Jinheon Baek, Sung Ju Hwang, 14 May 2025, System Prompt Optimization with Meta-Learning, https://arxiv.org/abs/2505.09666
  23. Ziyu Zhou, Yihang Wu, Jingyuan Yang, Zhan Xiao, Rongjun Li, 13 May 2025, Evaluating the Effectiveness of Black-Box Prompt Optimization as the Scale of LLMs Continues to Grow, https://arxiv.org/abs/2505.08303
  24. Chun-Pai Yang, Kan Zheng, Shou-De Lin, 11 May 2025, PLHF: Prompt Optimization with Few-Shot Human Feedback, https://arxiv.org/abs/2505.07886

Query Enhancement

The idea of query enhancement is a RAG-specific optimization to the retrieval phase of the architecture. This technique is also called “query expansion” in some resources. Generally, the idea is to convert the user’s query into multiple queries that are then used by the retriever in parallel. The idea is to improve the accuracy of retrieved chunks by:

  • Capturing semantic meanings of user queries.
  • Resolving ambiguity or unclear wordings.
  • Broadening topics if user queries are too specific.

There are two ways to run a query expansion mode:

    1. One query — rewrite the user’s query into a better query.

    2. Multiple queries — send multiple improved queries to the chunk lookup.

Multiple queries can be sent to the retrieval modules, and once the results of each query are returned, the different sets of chunks can be combined into a final set by the reranker. Depending on the design, the extra queries may go to the vector database lookup module, to the keyword datastore, or to both. The aim is to return a broader set of document chunks for use by the final LLM phase.

Generation of alternative queries via query expansion is another step in the RAG pipeline, which can slow down the overall algorithm, and also introduce extra processing cost in both the generation of alternative queries and then looking them up in the retrieval modules. However, the extra LLM cost can be somewhat mitigated by using a simpler LLM, since this is a relatively low-difficulty task for a small model. Also possible is the use of non-LLM heuristic approaches such as synonym-based query rewriting.
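
Here is a sketch of the multi-query variant: a small LLM proposes a few rephrasings, each one is sent to the retriever, and the pooled chunks are handed to the reranker. The model and retriever interfaces, and the number of rephrasings, are assumptions:

    def expand_and_retrieve(user_query, small_llm, retriever, n_variants=3, top_k=5):
        """Generate alternative phrasings of the query and pool the retrieved chunks."""
        prompt = (
            f"Rewrite the following question in {n_variants} different ways, one per line, "
            f"keeping the same meaning:\n{user_query}"
        )
        variants = [line.strip() for line in small_llm(prompt).splitlines() if line.strip()]
        pooled, seen = [], set()
        for query in [user_query] + variants[:n_variants]:
            for chunk_id in retriever.search(query, top_k=top_k):  # assumed ranked chunk IDs
                if chunk_id not in seen:
                    seen.add(chunk_id)
                    pooled.append(chunk_id)
        return pooled  # hand this pooled set to the reranker for final ordering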

RQ-RAG

One of the major papers on query expansion introduced RQ-RAG in 2024, which worked by “rewriting, decomposing, and clarifying ambiguities” in the user’s prompt text. The system breaks complicated questions down into multiple simpler questions, and uses the simpler questions for RAG lookups.

Their work went somewhat beyond plain query expansion: they trained a 7B Llama2 model with extra capabilities to enhance user queries, which were then used to return chunks with improved relevance, ultimately leading the RQ-RAG system to give better answers.

The RQ-RAG system also used the LLM to detect whether the user query was actually a question for the RAG data set. For example, a general query like “Hi, how are you?” is not a question for the RAG dataset, and can be sent straight to the LLM without involving any RAG chunk data. Or perhaps your system could save the tokens and reply without the LLM, “Enough about me, do you have a real question?”
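
That routing step can be sketched as a cheap classification call that decides whether the query needs retrieval at all. This is only an illustration of the idea, not RQ-RAG’s actual implementation, and the small_llm interface is an assumption:

    def needs_retrieval(user_query, small_llm):
        """Ask a small LLM whether the query requires searching the document database."""
        verdict = small_llm(
            "Does answering this message require searching the company knowledge base? "
            f"Answer YES or NO only.\n\nMessage: {user_query}"
        )
        return verdict.strip().upper().startswith("YES")

    # Chit-chat like "Hi, how are you?" routes straight to the LLM (or a canned reply),
    # while real questions go through the full RAG pipeline.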

Query Enhancement References

References on query enhancement include:

  1. Hamin Koo, Minseon Kim, Sung Ju Hwang, 17 Jul 2024, Optimizing Query Generation for Enhanced Document Retrieval in RAG, https://arxiv.org/abs/2407.12325
  2. L Jourdain, S Hellal, T Marini, 2024, Why just search when you can expand? Enhancing RAG with Query Expansion Strategies and Document Aggregation, Conference on Artificial Intelligence for Defense, Nov 2024, Rennes, France. 2024, https://hal.science/hal-05046658/, PDF: https://hal.science/hal-05046658/document
  3. Sejong Kim, Hyunseo Song, Hyunwoo Seo, Hyunjun Kim, 19 Mar 2025, Optimizing Retrieval Strategies for Financial Question Answering Documents in Retrieval-Augmented Generation Systems, https://arxiv.org/abs/2503.15191
  4. Joohyun Lee, Minji Roh, 23 Nov 2024, Multi-Reranker: Maximizing performance of retrieval-augmented generation in the FinanceRAG challenge, https://arxiv.org/abs/2411.16732
  5. Tuana Celik, November 1, 2024, Advanced RAG: Query Expansion, https://haystack.deepset.ai/cookbook/query-expansion
  6. Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu, 31 Mar 2024, RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation, https://arxiv.org/abs/2404.00610

RAG Fusion

RAG fusion is a RAG extension based on query expansion that analyzes multiple versions of the query to return the best context chunks. The model generates multiple “reformulated” versions of the original text query, each of which is sent to the retriever, and a final “Reciprocal Rank Fusion” step combines all of the returned chunks into a single ranking, much like a “reranker” component, but operating over multiple similar rankings.

The main advantage of RAG fusion is finding more accurate context for the LLM to use in answering the user’s question. However, the downside is the many additional calls to the retriever database with slightly modified queries.
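
Reciprocal Rank Fusion itself is a simple formula: each chunk scores 1/(k + rank) in every ranked list it appears in, and the scores are summed. A minimal sketch, using the commonly cited constant k = 60:

    def reciprocal_rank_fusion(ranked_lists, k=60):
        """Combine several ranked lists of chunk IDs into one fused ranking.
        Each appearance contributes 1 / (k + rank), and contributions are summed."""
        scores = {}
        for ranking in ranked_lists:
            for rank, chunk_id in enumerate(ranking, start=1):
                scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    # Example: fuse the rankings returned for three reformulated queries.
    fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "a", "d"], ["c", "b", "e"]])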

Research on RAG fusion algorithms:

  1. Igor Novikov, Jul 23, 2024, RAG Architecture: Advanced RAG, https://pub.towardsai.net/rag-architecture-advanced-rag-3fea83e0d189
  2. Surya Maddula, Sep 2024, Not RAG, but RAG Fusion? Understanding Next-Gen Info Retrieval, https://pub.towardsai.net/not-rag-but-rag-fusion-understanding-next-gen-info-retrieval-477788da02e2
  3. Adrian H. Raudaschl, Oct 6, 2023, Forget RAG, the Future is RAG-Fusion: The Next Frontier of Search: Retrieval Augmented Generation meets Reciprocal Rank Fusion and Generated Queries, https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1
  4. Deval Shah, Jul 4, 2024, Reciprocal Rank Fusion (RRF) explained in 4 mins — How to score results from multiple retrieval methods in RAG: Unlock the power of Reciprocal Rank Fusion in Retrieval-Augmented Generation, https://medium.com/@devalshah1619/mathematical-intuition-behind-reciprocal-rank-fusion-rrf-explained-in-2-mins-002df0cc5e2a
  5. Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
  6. Sanjay Kumar, Apr 2, 2024, RAG Fusion: A New Frontier in Search and Generative AI, https://medium.com/@Sanjaynk7907/rag-fusion-a-new-frontier-in-search-and-generative-ai-ebb24e7e905e
  7. Omar Santos, Jun 15, 2024, Comparing RAG, RAG Fusion, with RAPTOR: Different AI Retrieval-Augmented Implementations, https://becomingahacker.org/comparing-rag-rag-fusion-with-raptor-different-ai-retrieval-augmented-implementations-1aa76fce6a5c

Hypothetical Document Embeddings (HyDE)

HyDE is an extension of the idea of query expansion, whereby an entire LLM answer is used for retrieval. This is far more general than query enhancement or prompt optimization, which only try to rewrite the query.

Here’s an insightful idea: just answer the question! A full LLM answer is used, rather than tweaking a few words in the query. The RAG retrieval process becomes a two-stage sequence:

    1. Generate an answer via an LLM (without RAG chunks), and

    2. Submit this answer for chunk retrieval (vector-based, keyword, or both).

Hence, the initial answer from an LLM becomes the expanded query. The retrieval phase becomes the task of finding other answers similar to our hypothetical answer. Note that the LLM used for this initial answer need not be as powerful as the one used to process all the RAG document chunks for the fully-drafted answer.

Isn’t this the same as just asking a non-RAG LLM for the answer? Well, no. The initial LLM answer is used in the chunk retrieval processes, but is not included as a chunk in the final LLM processing phase. The auto-generated answer is not one of the candidate inputs at the end.

Doesn’t this encourage hallucinations? Again, no. The first LLM that generates the preliminary answer is relying on its parametric knowledge only, so admittedly, we haven’t got the hallucination-reducing benefits of using RAG chunks in this first phase. However, it’s not going to hallucinate by default, anyway, and furthermore, its answers don’t get passed as input to the final LLM. Even in the unlikely case where the initial LLM does hallucinate its answer, this would only cause the chunk retrieval process to return irrelevant information, which the more powerful final LLM would probably sort out.

Does it work? The theory is that the user’s question plus the generated answer provide a better vector lookup than just the question alone. Your mileage may vary, because it depends how unique the RAG data set is. If the source data is very specific to a particular domain, and that domain is not part of the LLM training, it might not help. Having said that, even a hallucinated answer might provide some guidance, although there’s probably a better way in such cases.
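
A minimal sketch of the two-stage HyDE lookup is below. The small_llm, embed, and vector_store interfaces are assumptions, and note that the hypothetical answer is discarded after retrieval:

    def hyde_retrieve(user_query, small_llm, embed, vector_store, top_k=5):
        """HyDE-style lookup: draft a hypothetical answer, then retrieve chunks similar to it."""
        # Stage 1: draft an answer from parametric knowledge only (no RAG chunks).
        hypothetical_answer = small_llm(f"Answer this question briefly:\n{user_query}")
        # Stage 2: embed the drafted answer and use it for the vector lookup.
        query_vector = embed(hypothetical_answer)
        chunk_ids = vector_store.search_by_vector(query_vector, top_k=top_k)  # assumed interface
        # The hypothetical answer is discarded; only the retrieved chunks go to the final LLM.
        return chunk_ids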

Reference papers on the HyDE lookup mechanism include:

  1. Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan, 20 Dec 2022, HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels, arXiv preprint arXiv:2212.10496, https://arxiv.org/abs/2212.10496, https://github.com/texttron/hyde
  2. Igor Novikov, Jul 23, 2024, RAG Architecture: Advanced RAG, https://pub.towardsai.net/rag-architecture-advanced-rag-3fea83e0d189
  3. Mark Craddock, Oct 28, 2023, HYDE: Revolutionising Search with Hypothetical Document Embeddings, https://medium.com/prompt-engineering/hyde-revolutionising-search-with-hypothetical-document-embeddings-3474df795af8
  4. Zilliz, Jul 25, 2024, Improving Information Retrieval and RAG with Hypothetical Document Embeddings (HyDE), https://zilliz.com/learn/improve-rag-and-information-retrieval-with-hyde-hypothetical-document-embeddings

Manual Curation

This is the part we don’t say out loud when the boss is in earshot, but sometimes we have to...curate things. This is a highly advanced scientific process about which there are many research papers written. Generally, the areas in need of a carbon-based neural network approach include:

  • Adding extra keyword associations to chunks (or removing some).
  • Fixing keyword mappings in the query lookup.
  • Editing a problem in one of the chunks.
  • Adding a missing snippet of text to a chunk.
  • Rummaging through MarComm PDF glossies to find a useful paragraph.

In some situations, it might be useful to provide extra hints to the logic that chunks the data, prior to running the chunking process. For example, a comment in the HTML source can mark a chunk boundary, or can state the length of the current section so that the chunking algorithm can decide whether the content needs to be split. Extra keywords can also be provided in comments to help with chunking, where they act as alternative matching words for retrieval.
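
One hedged example of how such hints could look: a hypothetical chunk-break comment that a pre-processing step splits on before the normal chunking algorithm runs. The marker name is purely illustrative:

    CHUNK_BREAK_MARKER = "<!-- chunk-break -->"  # hypothetical marker added by a human curator

    def split_on_hints(html_source):
        """Split the source at manually inserted chunk-boundary comments.
        Sections without markers fall through to the normal chunking algorithm."""
        sections = [part.strip() for part in html_source.split(CHUNK_BREAK_MARKER)]
        return [section for section in sections if section]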

This curation can be informed by data from the running RAG system. It is necessary to periodically re-chunk or re-ingest the source data, so anything added to the source will eventually make its way back into the RAG system.

Even “links” between various parts of the source data could be provided in the source. For example, perhaps a concept is mentioned in an introductory part of the documentation and appears in more detail later. It might be useful to add some context in comments to relate the two areas more explicitly. Overlapping chunks won’t capture related material that sits far apart in the source, but such sections can be manually merged. Or you can go looking for another document that better covers the user’s query.

It’s like dumpster diving inside a disk drive, where it’s a win when you find an FAQ document.

FAQ documents are particularly useful for the times when certain questions are asked over and over again, but there’s not yet a chunk in the database. If the RAG monitoring system is capturing the right data, then the frequently asked questions can be detected. Once FAQs are known, the best chunks corresponding to those questions can be included in the set passed to the LLM when generating the answer. In fact, the LLM can be used to determine whether a user question matches an FAQ, too. And if you can’t find a pre-written document that answers your user FAQ, you could, you know, write one.

Yikes. We have writers for that.

References on RAG Accuracy

General research papers and articles on RAG “smartness” strategies include:

  1. Chaitanya Sharma, 28 May 2025, Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers, https://arxiv.org/abs/2506.00054
  2. Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 1 Jul 2024, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219 Project: https://github.com/FudanDNN-NLP/RAG
  3. Florian June Aug 2024, The Best Practices of RAG: Typical RAG Process, Best Practices for Each Module, and Comprehensive Evaluation, https://pub.towardsai.net/the-best-practices-of-rag-300e313322e6
  4. Chandini Jain, Aug 15, 2024, The magic of RAG is in the retrieval, https://www.infoworld.com/article/3484132/the-magic-of-rag-is-in-the-retrieval.html
  5. Ziyan Jiang, Xueguang Ma, Wenhu Chen, June 2024, LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs, arXiv preprint arXiv:2406.15319, https://arxiv.org/abs/2406.15319 (More accurate RAG with a long context LLM and larger chunk sizes.)
  6. Andrew Ditmer, May 13 2024, SuperRAG – How to achieve higher accuracy with Retrieval Augmented Generation, https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/superrag-how-to-achieve-higher-accuracy-with-retrieval-augmented/ba-p/4139004 (Iterating through up to 50 documents with testing of a user’s query.)
  7. Shengming Zhao, Yuheng Huang, Jiayang Song, Zhijie Wang, Chengcheng Wan, Lei Ma, 29 Nov 2024, Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems, https://arxiv.org/abs/2411.19463

 
