Aussie AI

RAG Evaluation

  • Last Updated 29 August, 2025
  • by David Spuler, Ph.D.

RAG evaluation analyzes the LLM-based RAG architecture as a whole, in contrast to conventional model evaluation, which examines only the model. A typical RAG system includes not only an LLM but also a vector database of document chunks and an orchestrator component. Advanced RAG architectures typically add a keyword search datastore, a reranker, a packer, and other components.
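To make the idea of evaluating the whole pipeline concrete, here is a minimal Python sketch that scores the retrieval stage and the generated answer separately. Everything in it is an illustrative assumption rather than the method of any paper listed below: the names `EvalCase`, `recall_at_k`, `token_f1`, and `evaluate_rag` are hypothetical, the retriever and generator are passed in as plain callables, and recall@k and token-overlap F1 are simple stand-in metrics (many published frameworks use LLM-based judges instead).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One hypothetical test case: a question with gold chunks and a gold answer."""
    question: str
    relevant_chunk_ids: set[str]
    reference_answer: str


def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the gold chunks that appear in the top-k retrieved chunks."""
    if not relevant_ids:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant_ids) / len(relevant_ids)


def token_f1(predicted: str, reference: str) -> float:
    """Token-overlap F1 between the generated answer and the reference answer."""
    pred = predicted.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate_rag(
    cases: list[EvalCase],
    retrieve: Callable[[str], list[str]],        # question -> ranked chunk ids
    generate: Callable[[str, list[str]], str],   # question + chunk ids -> answer
    k: int = 3,
) -> dict[str, float]:
    """Run each case through retrieval and generation; average both metrics."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    recalls, f1s = [], []
    for case in cases:
        chunk_ids = retrieve(case.question)
        answer = generate(case.question, chunk_ids)
        recalls.append(recall_at_k(chunk_ids, case.relevant_chunk_ids, k))
        f1s.append(token_f1(answer, case.reference_answer))
    return {
        "recall_at_k": sum(recalls) / len(cases),
        "answer_f1": sum(f1s) / len(cases),
    }
```

Reporting the two scores separately is a deliberate choice in this sketch: a low `recall_at_k` points at the retriever, datastore, or reranker, while a low `answer_f1` with high recall points at the packer, prompt, or LLM.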

Research on RAG Evaluation

  • Shahul Es, Jithin James, Luis Espinosa-Anke, Steven Schockaert, 26 Sep 2023, RAGAS: Automated Evaluation of Retrieval Augmented Generation, https://arxiv.org/abs/2309.15217
  • Shangeetha Sivasothy, Scott Barnett, Stefanus Kurniawan, Zafaryab Rasool, Rajesh Vasa, 24 Sep 2024, RAGProbe: An Automated Approach for Evaluating RAG Applications, https://arxiv.org/abs/2409.19019
  • Jon Saad-Falcon, Omar Khattab, Christopher Potts, Matei Zaharia, 31 Mar 2024 (v2), ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems, https://arxiv.org/abs/2311.09476
  • Kevin Wu, Eric Wu, James Zou, 10 Jun 2024 (v2), ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence, https://arxiv.org/abs/2404.10198
  • Galla, D., Hoda, S., Zhang, M., Quan, W., Yang, T.D., Voyles, J. (2024). CoURAGE: A Framework to Evaluate RAG Systems. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol 14763. Springer, Cham. https://doi.org/10.1007/978-3-031-70242-6_37 https://link.springer.com/chapter/10.1007/978-3-031-70242-6_37
  • Rafael Teixeira de Lima, Shubham Gupta, Cesar Berrospi, Lokesh Mishra, Michele Dolfi, Peter Staar, Panagiotis Vagenas, 29 Nov 2024, Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems, IBM Research, https://arxiv.org/abs/2411.19710
  • Lilian Weng, July 7, 2024, Extrinsic Hallucinations in LLMs, https://lilianweng.github.io/posts/2024-07-07-hallucination/
  • Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, Amina Terfai, Anoop Surya, Tracey Mercer, Vinodh Kumar Thanigachalam, Tamar Bar, Sanjana Krishnan, Samy Kilaru, Jasmine Jaksic, Nave Algarici, Jacob Liberman, Joey Conway, Sonu Nayyar, Justin Boitano, 10 Jul 2024, FACTS About Building Retrieval Augmented Generation-based Chatbots, NVIDIA Research, https://arxiv.org/abs/2407.07858
  • Contextual AI Team, March 19, 2024, Introducing RAG 2.0, https://contextual.ai/introducing-rag2/
  • Angels Balaguer, Vinamra Benara, Renato Luiz de Freitas Cunha, Roberto de M. Estevão Filho, Todd Hendry, Daniel Holstein, Jennifer Marsman, Nick Mecklenburg, Sara Malvar, Leonardo O. Nunes, Rafael Padilha, Morris Sharp, Bruno Silva, Swati Sharma, Vijay Aski, Ranveer Chandra, 30 Jan 2024 (v3), RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture, https://arxiv.org/abs/2401.08406
  • Mingyue Cheng, Yucong Luo, Jie Ouyang, Qi Liu, Huijie Liu, Li Li, Shuo Yu, Bohou Zhang, Jiawei Cao, Jie Ma, Daoyu Wang, Enhong Chen, 17 Mar 2025 (v2), A Survey on Knowledge-Oriented Retrieval-Augmented Generation, https://arxiv.org/abs/2503.10677
  • Chaitanya Sharma, 28 May 2025, Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers, https://arxiv.org/abs/2506.00054
  • Quentin Romero Lauro, Shreya Shankar, Sepanta Zeighami, Aditya Parameswaran, 18 Apr 2025, RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines, https://arxiv.org/abs/2504.13587
  • Jeongsoo Lee, Daeyong Kwon, Kyohoon Jin, 23 Aug 2025, GRADE: Generating multi-hop QA and fine-gRAined Difficulty matrix for RAG Evaluation, https://arxiv.org/abs/2508.16994
  • Mohita Chowdhury, Yajie Vera He, Jared Joselowitz, Aisling Higham, Ernest Lim, 18 Jul 2025, ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems, https://arxiv.org/abs/2501.08208
  • Grégoire Martinon, Alexandra Lorenzo de Brionne, Jérôme Bohard, Antoine Lojou, Damien Hervault, Nicolas J-B. Brunel (ENSIIE, LaMME), 29 Jul 2025, Towards a rigorous evaluation of RAG systems: the challenge of due diligence, https://arxiv.org/abs/2507.21753
  • Jiaxuan Liang, Shide Zhou, Kailong Wang, 26 Jul 2025, OmniBench-RAG: A Multi-Domain Evaluation Platform for Retrieval-Augmented Generation Tools, https://arxiv.org/abs/2508.05650

