Aussie AI
MiniRAG Architectures
Last Updated 20 August, 2025
by David Spuler, Ph.D.
Research on MiniRAG Architectures
Research papers include:
- Jérôme Diaz, Dec 2024, Why Retrieval-Augmented Generation Is Still Relevant in the Era of Long-Context Language Models, https://towardsdatascience.com/why-retrieval-augmented-generation-is-still-relevant-in-the-era-of-long-context-language-models-e36f509abac5 (Explores why models with 128K-token and longer contexts cannot fully replace RAG.)
- Zhuowan Li, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky, 17 Oct 2024 (v2), Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach, https://arxiv.org/abs/2407.16833
- Tan Yu, Anbang Xu, Rama Akkiraju, 3 Sep 2024, In Defense of RAG in the Era of Long-Context Language Models, https://arxiv.org/abs/2409.01666
- Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang, 20 Nov 2023 (v3), Lost in the Middle: How Language Models Use Long Contexts, https://arxiv.org/abs/2307.03172 (Information is best placed at the start, or otherwise at the end, of a long context.)
- Brian J Chan, Chao-Ting Chen, Jui-Hung Cheng, Hen-Hsen Huang, 20 Dec 2024, Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks, https://arxiv.org/abs/2412.15605 (Mini-RAG architecture preloading the entire knowledge into the LLM context and then using KV caching.)
- Xinze Li, Yixin Cao, Yubo Ma, Aixin Sun, 27 Dec 2024, Long Context vs. RAG for LLMs: An Evaluation and Revisits, https://arxiv.org/abs/2501.01880 (Long context, summarization-based RAG, and classic chunked RAG have different strengths and weaknesses for different types of query.)
- Tianyu Fan, Jingyuan Wang, Xubin Ren, Chao Huang, 14 Jan 2025 (v2), MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation, https://arxiv.org/abs/2501.06713 https://github.com/HKUDS/MiniRAG (Uses the name "mini RAG" but is about knowledge graphs not long context RAG.)
- Isuru Lakshan Ekanayaka, Jan 2025, Retrieval-Augmented Generation (RAG) vs. Cache-Augmented Generation (CAG): A Deep Dive into Faster, Smarter Knowledge Integration, https://pub.towardsai.net/retrieval-augmented-generation-rag-vs-0b4bc63c1653
- Ashish Bamania, 10 Jan 2025, Cache-Augmented Generation (CAG) Is Here To Replace RAG, https://levelup.gitconnected.com/cache-augmented-generation-cag-is-here-to-replace-rag-3d25c52360b2 (Deep dive into how CAG works and reduces or eliminates the need for retrieval-augmented generation.)
- Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 12 Apr 2021 (v4), Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, https://arxiv.org/abs/2005.11401
- Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, Yiqun Liu, 27 Jan 2025, Parametric Retrieval Augmented Generation, https://arxiv.org/abs/2501.15915 https://github.com/oneal2000/prag (Parametric RAG (PRAG) is training the RAG documents into model parameters, rather than prepending documents using long context RAG, and this means a shorter inference token length.)
- Cristian Leo, Feb 2025, Don't Do RAG: Cache is the Future, https://levelup.gitconnected.com/dont-do-rag-cache-is-the-future-d1e995f0c76f (Explores Cache-Augmented Generation, its math, its trade-offs, and where it excels, based on the underlying research paper.)
- Manpreet Singh, Feb 2025, Goodbye RAG? Gemini 2.0 Flash Have Just Killed It! https://ai.gopubby.com/goodbye-rag-gemini-2-0-flash-have-just-killed-it-96301113c01f
- Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu, 24 Feb 2025, Thus Spake Long-Context Large Language Model, https://arxiv.org/abs/2502.17129 (Impressive survey of many techniques to improve efficiency and accuracy of long context processing in both inference and training, covering text, video and multimodal models.)
- Javier Ramos, June 2025, You Don’t Need RAG! Build a Q&A AI Agent in 30 Minutes 🚀, https://itnext.io/you-dont-need-rag-build-a-q-a-agent-in-30-minutes-and-without-a-thinking-model-52545408f495
- Alisa Fortin, Aug 18, 2025, URL context tool for Gemini API now generally available, https://developers.googleblog.com/en/url-context-tool-for-gemini-api-now-generally-available/
- Latent Space, Aug 20, 2025, "RAG is Dead, Context Engineering is King" — with Jeff Huber of Chroma: What actually matters in vector databases in 2025, why “modern search for AI” is different, and how to ship systems that don’t rot as context grows, https://www.latent.space/p/chroma
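The core idea behind Cache-Augmented Generation in several of the papers above (e.g., Chan et al., 2024) is that the knowledge corpus is prefilled into the LLM's context once, and the resulting KV cache is reused across queries, rather than retrieving and re-encoding chunks per query as in classic RAG. The toy sketch below illustrates only that amortization pattern; all class and method names are hypothetical, and a real implementation would reuse an actual model's KV cache (e.g., the `past_key_values` returned by a Transformers model when called with `use_cache=True`).

```python
# Toy illustration of the CAG amortization pattern: encode the corpus
# once at startup, then answer many queries against the cached state.
# This is a sketch of the control flow only, not a real LLM cache.

class ToyCAG:
    def __init__(self, documents):
        self.encode_calls = 0
        # One-time "prefill" over the whole corpus, done at construction.
        self.kv_cache = self._encode(" ".join(documents))

    def _encode(self, text):
        # Stand-in for the expensive prefill pass (the step CAG amortizes).
        self.encode_calls += 1
        return {"tokens": text.split()}

    def answer(self, query):
        # Reuse the preloaded cache; only the query itself is new work.
        context = self.kv_cache["tokens"]
        return [t for t in context if t.lower() in query.lower()]

docs = ["CAG preloads knowledge", "RAG retrieves chunks per query"]
cag = ToyCAG(docs)
cag.answer("what does CAG do?")
cag.answer("how does RAG differ?")
print(cag.encode_calls)  # 1: the corpus was encoded once, not per query
```

The trade-off, as the long-context papers above note, is that the entire corpus must fit in the model's context window, which is why CAG-style approaches suit small, stable knowledge bases rather than large or frequently changing ones.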
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research
- « Research Home