Aussie AI
MiniRAG Architectures
Last Updated 20 August, 2025
by David Spuler, Ph.D.
Research on MiniRAG Architectures
Research papers include:
- Jérôme Diaz, Dec 2024, Why Retrieval-Augmented Generation Is Still Relevant in the Era of Long-Context Language Models, https://towardsdatascience.com/why-retrieval-augmented-generation-is-still-relevant-in-the-era-of-long-context-language-models-e36f509abac5 (Explores why models with 128K-token and longer contexts cannot fully replace RAG.)
- Zhuowan Li, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky, 17 Oct 2024 (v2), Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach, https://arxiv.org/abs/2407.16833
- Tan Yu, Anbang Xu, Rama Akkiraju, 3 Sep 2024, In Defense of RAG in the Era of Long-Context Language Models, https://arxiv.org/abs/2409.01666
- Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang, 20 Nov 2023 (v3), Lost in the Middle: How Language Models Use Long Contexts, https://arxiv.org/abs/2307.03172 (Information is best placed at the start, or otherwise at the end, of a long context.)
- Brian J Chan, Chao-Ting Chen, Jui-Hung Cheng, Hen-Hsen Huang, 20 Dec 2024, Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks, https://arxiv.org/abs/2412.15605 (Mini-RAG architecture preloading the entire knowledge into the LLM context and then using KV caching.)
- Xinze Li, Yixin Cao, Yubo Ma, Aixin Sun, 27 Dec 2024, Long Context vs. RAG for LLMs: An Evaluation and Revisits, https://arxiv.org/abs/2501.01880 (Long context, summarization-based RAG, and classic chunked RAG have different strengths and weaknesses for different types of query.)
- Tianyu Fan, Jingyuan Wang, Xubin Ren, Chao Huang, 14 Jan 2025 (v2), MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation, https://arxiv.org/abs/2501.06713 https://github.com/HKUDS/MiniRAG (Uses the name "mini RAG" but is about knowledge graphs not long context RAG.)
- Isuru Lakshan Ekanayaka, Jan 2025, Retrieval-Augmented Generation (RAG) vs. Cache-Augmented Generation (CAG): A Deep Dive into Faster, Smarter Knowledge Integration, https://pub.towardsai.net/retrieval-augmented-generation-rag-vs-0b4bc63c1653
- Ashish Bamania, 10 Jan 2025, Cache-Augmented Generation (CAG) Is Here To Replace RAG, https://levelup.gitconnected.com/cache-augmented-generation-cag-is-here-to-replace-rag-3d25c52360b2 (Deep dive into how CAG works and reduces or eliminates the need for retrieval-augmented generation.)
- Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 12 Apr 2021 (v4), Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, https://arxiv.org/abs/2005.11401
- Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, Yiqun Liu, 27 Jan 2025, Parametric Retrieval Augmented Generation, https://arxiv.org/abs/2501.15915 https://github.com/oneal2000/prag (Parametric RAG (PRAG) is training the RAG documents into model parameters, rather than prepending documents using long context RAG, and this means a shorter inference token length.)
- Cristian Leo, Feb 2025, Don't Do RAG: Cache is the Future, https://levelup.gitconnected.com/dont-do-rag-cache-is-the-future-d1e995f0c76f (Explores Cache-Augmented Generation, its math, its trade-offs, and where it excels, based on the underlying research paper.)
- Manpreet Singh, Feb 2025, Goodbye RAG? Gemini 2.0 Flash Have Just Killed It! https://ai.gopubby.com/goodbye-rag-gemini-2-0-flash-have-just-killed-it-96301113c01f
- Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu, 24 Feb 2025, Thus Spake Long-Context Large Language Model, https://arxiv.org/abs/2502.17129 (Impressive survey of many techniques to improve efficiency and accuracy of long context processing in both inference and training, covering text, video and multimodal models.)
- Javier Ramos, June 2025, You Don’t Need RAG! Build a Q&A AI Agent in 30 Minutes 🚀, https://itnext.io/you-dont-need-rag-build-a-q-a-agent-in-30-minutes-and-without-a-thinking-model-52545408f495
- Alisa Fortin, Aug 18, 2025, URL context tool for Gemini API now generally available, https://developers.googleblog.com/en/url-context-tool-for-gemini-api-now-generally-available/
- Latent Space, Aug 20, 2025, "RAG is Dead, Context Engineering is King" — with Jeff Huber of Chroma: What actually matters in vector databases in 2025, why “modern search for AI” is different, and how to ship systems that don’t rot as context grows, https://www.latent.space/p/chroma
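The core idea behind Cache-Augmented Generation in several of the papers above (e.g., Chan et al., 2024) is that the knowledge corpus is prefilled into the LLM's context once, and the resulting KV cache is reused across queries, rather than retrieving and re-encoding chunks per query as in classic RAG. The toy sketch below illustrates only that amortization pattern; all class and method names are hypothetical, and a real implementation would reuse an actual model's KV cache (e.g., the `past_key_values` returned by a Transformers model when called with `use_cache=True`).

```python
# Toy illustration of the CAG amortization pattern: encode the corpus
# once at startup, then answer many queries against the cached state.
# This is a sketch of the control flow only, not a real LLM cache.

class ToyCAG:
    def __init__(self, documents):
        self.encode_calls = 0
        # One-time "prefill" over the whole corpus, done at construction.
        self.kv_cache = self._encode(" ".join(documents))

    def _encode(self, text):
        # Stand-in for the expensive prefill pass (the step CAG amortizes).
        self.encode_calls += 1
        return {"tokens": text.split()}

    def answer(self, query):
        # Reuse the preloaded cache; only the query itself is new work.
        context = self.kv_cache["tokens"]
        return [t for t in context if t.lower() in query.lower()]

docs = ["CAG preloads knowledge", "RAG retrieves chunks per query"]
cag = ToyCAG(docs)
cag.answer("what does CAG do?")
cag.answer("how does RAG differ?")
print(cag.encode_calls)  # 1: the corpus was encoded once, not per query
```

The trade-off, as the long-context papers above note, is that the entire corpus must fit in the model's context window, which is why CAG-style approaches suit small, stable knowledge bases rather than large or frequently changing ones.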
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research
- « Research Home