Aussie AI

Chapter 15. Agentic RAG

Book Excerpt from "RAG Optimization: Accurate and Efficient LLM Applications"

by David Spuler and Michael Sharpe

What is Agentic RAG?

Agentic RAG is basically an agent with access to extra retrieval components, such as a vector lookup or a keyword search over a datastore, or some other integration with external data. These RAG-inspired components do the “R” part, and the agent does the rest, such as overall planning and triggering actions. Hence, the basic idea is:

RAG: does the “retrieval” of useful information.
Agents: take the “actions” based on this information.
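
A minimal Python sketch of this split follows. It assumes a toy in-memory keyword search in place of a real vector or keyword index, and a hard-coded rule in place of the agent's LLM-driven planning; `DOCS`, `retrieve`, and `act` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Document:
    doc_id: str
    text: str


# Hypothetical datastore; a real system might use a vector database or keyword index.
DOCS = [
    Document("kb1", "To reset your password, open Settings and choose Reset Password."),
    Document("kb2", "Refunds are processed within five business days."),
    Document("kb3", "Password rules require at least twelve characters."),
]


def words(text: str) -> Set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 2) -> List[Document]:
    """The "R" part: rank documents by how many query words they share."""
    q = words(query)
    scored = [(len(q & words(d.text)), d) for d in DOCS]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def act(query: str) -> str:
    """The agent part: decide what to do with the retrieved information.

    A real agent would ask an LLM to plan the next action; this stub uses a rule.
    """
    context = retrieve(query)
    if not context:
        return "ACTION: escalate to a human (no relevant documents)"
    return "ACTION: answer using " + ", ".join(d.doc_id for d in context)
```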

An agentic RAG architecture ends up being just a different organization of the same RAG components, but agents are much cooler today, so you’ll get more research grants.

Why Agentic RAG?

The main advantage of agentic RAG is to avoid all the data preparation required for the data chunks in classic RAG. Instead, the agentic RAG system uses a constrained search tool to “retrieve” the data dynamically, and then the agent’s LLM wordsmiths it. The “constrained search tool” can be as simple as the “search” endpoint on the customer documentation website that a company already hosts. It’s possible to inject extra context into the search, too, such as the product version.
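
As a sketch of such a constrained search tool, the Python below builds a request against an existing documentation search page and injects the product version as extra context. The endpoint `https://docs.example.com/search` and the `version` parameter name are assumptions for illustration; a real site's search interface will differ.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical search endpoint on a company's existing documentation website.
DOCS_SEARCH_ENDPOINT = "https://docs.example.com/search"


def build_search_url(question: str, product_version: Optional[str] = None) -> str:
    """Turn a user question into a constrained search request.

    Extra context (here, the product version) is added as a query parameter so
    that the site's own search engine narrows the results.
    """
    params = {"q": question.strip()}
    if product_version:
        params["version"] = product_version  # assumed parameter name
    return f"{DOCS_SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("how do I export data", "4.2"))
```

The agent would then fetch this URL with an HTTP client, keep the top few results, and pass them to its LLM to write the answer; ranking and grounding still apply to those results.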

But does agentic RAG really avoid everything in data preparation? Yes and no. You still need to “rank” results, and still need to “ground” the LLM. Also, there’s probably still a need to build caches, and perhaps even use vector databases. So, it’s an open question whether agentic RAG is an advanced new architecture with powerful capabilities, or whether it’s a trendy conflation of two topics.

The main reason to extend classic RAG into agentic RAG would be to leverage the promise of agents. With the right tools and implementation, the agent can fit into multiple different workflows.

The disadvantages of agentic RAG are that merging two major AI architectures (agents and RAG) still makes things complicated, the data likely still needs to be “organized” to get full value out of it, and latency optimizations such as caches likely still need to be built.

Agentic RAG extends the set of use cases that can be covered to those in both the RAG and agent domains. All the RAG use cases still apply, and additional doors might open due to the agent’s ability to plug-and-play in various workflows. And let us not forget the main reason: agents are hot right now!

What are Agents?

The idea of agents tends to evoke the idea of AI engines that take over the planet. After all, agents are the AI apps that “do” things, whereas LLMs are supposed to just sit there and write poetry.

Actually, agents are just software, and aren’t necessarily an architecture you need to shy away from. These architectures are already well established in various industry frameworks. The features that are needed include:

Data sources (e.g., read your email inbox, search the internet).
Software integrations (e.g., your company database, HR system, internal financials).
LLM queries (the basic level of intelligence).
Combining it all together (i.e., planning, scheduling, delegating, and so on).

Although single agents have the potential to be powerful, it’s becoming clear that the main industry direction is “agentic architectures,” using multiple agents with specially trained LLMs for particular tasks. For example, Salesforce has multiple agents for different activities, and Apple Intelligence has a “multi-LoRA” architecture, in which a small on-phone LLM is specialized for different tasks via multiple LoRA adapters, backed up by a larger LLM running in Apple’s cloud infrastructure, called Private Cloud Compute (PCC).

More formally, an agent is an LLM with access to “tools,” “memories,” and prompts, which together give the LLM goals to achieve and the means to achieve them. Each of these is important:

Tools: simply the way agents are given access to data sources and an environment. The exact nature of the tools determines what the agent is able to do.
Memories: special tools that allow state to be stored and retrieved later, in addition to the LLM context.
Prompts: the glue that brings everything together for the agent, effectively defining the LLM’s goals and how it should use the tooling to achieve them.
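
The three ingredients can be sketched as a small Python data structure. This is a hypothetical illustration, not any particular framework's API: the LLM call itself is omitted, tools are plain functions, and memory is a dictionary kept outside the LLM context and replayed into each prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Agent:
    """Hypothetical minimal agent: an LLM plus tools, memory, and a prompt."""
    system_prompt: str                      # defines the goal and how to use the tools
    tools: Dict[str, Callable[[str], str]]  # access to data sources and the environment
    memory: Dict[str, str] = field(default_factory=dict)  # state outside the LLM context

    def remember(self, key: str, value: str) -> None:
        self.memory[key] = value

    def recall(self, key: str) -> str:
        return self.memory.get(key, "")

    def build_prompt(self, user_request: str) -> str:
        """Assemble the text that would be sent to the LLM (the call is omitted)."""
        tool_list = ", ".join(sorted(self.tools))
        notes = "; ".join(f"{k}={v}" for k, v in self.memory.items())
        return (
            f"{self.system_prompt}\n"
            f"Available tools: {tool_list}\n"
            f"Memory: {notes or 'none'}\n"
            f"User: {user_request}"
        )

    def call_tool(self, name: str, argument: str) -> str:
        """Run a tool the LLM asked for; unknown tools are rejected."""
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name](argument)
```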

It sounds deceptively simple, and it’s easy to think a RAG agent is little more than an LLM, a set of tools to manipulate a vector database, and a prompt. That would be a somewhat naive implementation. In reality, a “RAG agent” likely has multiple sub-agents inside it, each working on part of the RAG implementation and flow.
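
To make the sub-agent idea concrete, here is a hypothetical Python sketch in which a query-rewriting sub-agent, a retrieval sub-agent, and a relevance-grading sub-agent each handle one part of the RAG flow, coordinated by an orchestrator. Simple word-overlap rules stand in for the LLM and vector-search calls each sub-agent would really make; the corpus, stopword list, and grading threshold are made up.

```python
import re
from typing import Dict, List, Set

# Hypothetical document chunks that a vector database would normally hold.
CORPUS: Dict[str, str] = {
    "d1": "The warranty covers hardware faults for two years.",
    "d2": "Battery replacements are not covered by the warranty.",
    "d3": "Shipping takes three to five days.",
}
STOPWORDS = {"the", "a", "is", "are", "for", "by", "what", "does", "do", "how", "long"}


def terms(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def rewrite_query(question: str) -> str:
    """Query-rewriting sub-agent (stub: keeps content words; an LLM would paraphrase)."""
    return " ".join(sorted(terms(question)))


def retrieve(query: str, k: int = 2) -> List[str]:
    """Retrieval sub-agent: word overlap stands in for vector similarity."""
    q = set(query.split())
    ranked = sorted(CORPUS, key=lambda d: len(q & terms(CORPUS[d])), reverse=True)
    return [d for d in ranked[:k] if q & terms(CORPUS[d])]


def grade(query: str, doc_ids: List[str], min_overlap: int = 2) -> List[str]:
    """Grading sub-agent: discard chunks that look only weakly relevant."""
    q = set(query.split())
    return [d for d in doc_ids if len(q & terms(CORPUS[d])) >= min_overlap]


def answer(question: str) -> str:
    """Orchestrator: runs the sub-agents in order, then composes the answer."""
    query = rewrite_query(question)
    relevant = grade(query, retrieve(query))
    if not relevant:
        return "Sorry, I could not find that in the documentation."
    # A generator sub-agent (an LLM) would normally write the answer from these chunks.
    return " ".join(CORPUS[d] for d in relevant)
```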

Another area where RAG techniques are important in agents is managing large tool results. In many applications, tools can return a lot of data, far too much to fit into an LLM’s context window. In such situations, the tools will actually return their results in chunks, and those chunks are organized in some manner. Depending on the data and the organization, a vector database can make a lot of sense. Sometimes the chunks are explicitly iterated over, and at other times only specific chunks are retrieved, using RAG mechanisms, to satisfy the goal.
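
A minimal Python sketch of both options for oversized tool results: split the result into chunks, then either iterate over every chunk or retrieve only the chunks relevant to the goal. Word overlap is a stand-in for embedding similarity from a vector database, and the 50-word chunk size is an arbitrary assumption.

```python
import re
from typing import Iterator, List


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Split an oversized tool result into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def iterate_chunks(chunks: List[str]) -> Iterator[str]:
    """Option 1: the agent walks through every chunk in order (e.g., to summarize)."""
    yield from chunks


def top_chunks(chunks: List[str], goal: str, k: int = 2) -> List[int]:
    """Option 2: retrieve only the chunks most relevant to the goal.

    Returns chunk indices, best first; chunks with no overlap are dropped.
    """
    goal_terms = set(re.findall(r"[a-z0-9]+", goal.lower()))

    def score(i: int) -> int:
        return len(goal_terms & set(re.findall(r"[a-z0-9]+", chunks[i].lower())))

    ranked = sorted(range(len(chunks)), key=score, reverse=True)
    return [i for i in ranked[:k] if score(i) > 0]
```

For example, a 155-word log with a single error line in the middle splits into four chunks, and only the chunk containing the error is retrieved for the goal "disk error on node7".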

What Are Agentic Architectures?

“Agentic architectures” is a hot research area that is all the rage at the moment on arXiv. It’s a somewhat vague concept, but generally encompasses the use of LLM agent technologies with these important aspects:

Multiple agents
Cooperation of multiple types of agents
Workflow pathways
Planning
Scheduling, sequencing, and chaining
Retrievers (i.e., data source “read” capabilities)
Actions (“write” capabilities)

In the most basic form, it’s very much like RAG, except that the agent goes and fetches the data from somewhere, typically from something that’s not a database, such as the web. The good thing is that it requires no work to keep the data up-to-date. The bad thing is that it has access to the whole web, although there are ways to constrain the scope of a web search.

As an example, if you’re trying to produce a chatbot that is an expert on “recipes,” the agent might be specifically aware of only a few “cooking” websites and may even use the search capabilities of those sites. This is an agentic architecture with “retrievers” and you can see the analogy to RAG retrievers.
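
One way to constrain such a retriever is sketched below in Python, assuming a search engine that honours the commonly supported `site:` operator. The allow-listed domains are hypothetical placeholders for the few trusted cooking websites, and a second filter drops any result outside the allow-list in case the search engine ignores the operator.

```python
from typing import List
from urllib.parse import urlparse

# Hypothetical allow-list of trusted cooking sites for a recipe chatbot.
ALLOWED_DOMAINS = {"recipes.example.com", "cooking.example.org"}


def constrained_query(user_query: str) -> str:
    """Restrict a general web search to the allowed sites via 'site:' operators."""
    sites = " OR ".join(f"site:{d}" for d in sorted(ALLOWED_DOMAINS))
    return f"{user_query} ({sites})"


def filter_results(urls: List[str]) -> List[str]:
    """Defence in depth: keep only results whose host is on the allow-list."""
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if host in ALLOWED_DOMAINS:
            kept.append(url)
    return kept
```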

You could characterize RAG as an agent that knows how to retrieve data from your vector store or other stores. However, agentic architectures in general are more diverse, and typically involve multiple agents.

As another example, perhaps you want a website that generates 5-course meals in which the courses complement each other, perhaps also catering to special dietary needs. You might end up with a couple of agents that search for recipes, an agent that has some knowledge about which foods complement each other, another that knows about wine pairings, and so on. These agents are often arranged in a structure whereby the output of one agent feeds into another agent and the results are further refined.

Each “agent” also runs its own LLM query, whose prompt focuses the agent on a specific “role” while still including the original “question.” Once the whole structure has been traversed, the result is generated by a final LLM summary.
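
The chained structure can be sketched in Python as below. The sketch is hypothetical: each agent's LLM is replaced by a plain function, but the data flow follows the description, with every prompt carrying the agent's role plus the original question, each output feeding the next agent, and a final call summarizing the whole transcript.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]   # stand-in type for an LLM call: prompt -> reply
AgentStep = Tuple[str, LLM]  # (role, the LLM used by that agent)


def make_prompt(role: str, question: str, previous: str) -> str:
    """Focus the agent on its role while keeping the original question in view."""
    return f"You are the {role}.\nOriginal question: {question}\nInput so far: {previous}"


def run_pipeline(question: str, steps: List[AgentStep], summarizer: LLM) -> str:
    """Feed each agent's output into the next, then summarize everything."""
    previous = "(none)"
    transcript = []
    for role, llm in steps:
        previous = llm(make_prompt(role, question, previous))
        transcript.append(f"{role}: {previous}")
    return summarizer("\n".join(transcript))


if __name__ == "__main__":
    demo_steps = [
        ("recipe finder", lambda p: "pumpkin soup, lamb roast, lemon tart"),
        ("pairing expert", lambda p: "courses complement each other"),
        ("sommelier", lambda p: "pinot noir with the lamb"),
    ]
    print(run_pipeline("Plan a three-course dinner", demo_steps, summarizer=lambda t: t))
```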

An agentic architecture can be interactive, and there are often multiple places in the sequence where the user can be involved. For example, deep in the workflow there may be a choice between a “beef dish” and a “chicken dish,” and the LLM can ask a follow-up question (if trained to do so) so that the user can indicate which food they prefer.

More complex agentic architectures are not static structures or single toolchains. Advanced agentic structures can contain loops, decision points, interactivity or approval choices, feedback points, and steps for the user to fulfil.
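
A hypothetical Python sketch of a workflow with a decision point and an approval loop, using the beef-or-chicken example. The `ask_user` and `approve` callbacks stand in for the user interface, the string revision stands in for an LLM rewriting the draft, and the round limit is an assumption added so the loop always terminates.

```python
from typing import Callable, List


def plan_menu(ask_user: Callable[[str, List[str]], str],
              approve: Callable[[str], bool],
              max_rounds: int = 3) -> str:
    """Interactive workflow: one decision point, then a feedback/approval loop."""
    options = ["beef dish", "chicken dish"]
    main = ask_user("Which main course do you prefer?", options)  # decision point
    if main not in options:
        raise ValueError(f"unexpected choice: {main}")
    draft = f"Menu with {main}"
    for round_no in range(1, max_rounds + 1):
        if approve(draft):                                  # approval point
            return draft
        draft = f"Menu with {main} (revision {round_no})"  # stand-in for an LLM revision
    return "No approved menu; escalate to a human"
```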

Many of the AI software development tools that generate entire, fully coded applications are agentic systems. They can have agents that make up a typical software team: a PM agent, a coding agent, an architect agent, a testing agent, a documentation agent, and more. The user describes the software they want built, in as much detail as possible, and the PM agent refines the requirements, often in a loop with the user. Each requirement is then “designed” by the architect agent, coded by the coding agent, and tested and documented by the respective agents. Along the way, bugs can occur, so a “debugging agent” can come into the mix.

At the end of a full cycle of auto-development, the user will be brought back in to test the generated application. Behind the scenes there is yet another agent, one which builds the code and deploys the executable. Once the user accepts a requirement as complete, the agentic system loops around to the next requirement.
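
The requirement-by-requirement loop described above might be sketched in Python as follows. Everything here is hypothetical: each agent is a plain function rather than an LLM, the PM agent is assumed to return requirements separated by semicolons, `passes_tests` and `user_accepts` stand in for the testing agent and the user, and the retry limits are assumptions that keep the loops finite.

```python
from typing import Callable, Dict, List

Agent = Callable[[str], str]  # stand-in for an LLM-backed agent: text in, text out


def run_dev_loop(request: str,
                 agents: Dict[str, Agent],
                 passes_tests: Callable[[str], bool],
                 user_accepts: Callable[[str], bool],
                 max_fixes: int = 3,
                 max_rounds: int = 3) -> List[str]:
    """Process requirements one at a time; return those the user accepted.

    `agents` must provide "pm", "architect", "coder", "debugger", and "docs".
    """
    # PM agent refines the request into requirements (assumed ';'-separated).
    requirements = [r.strip() for r in agents["pm"](request).split(";") if r.strip()]
    accepted: List[str] = []
    for req in requirements:
        for _ in range(max_rounds):
            program = agents["coder"](agents["architect"](req))
            fixes = 0
            while not passes_tests(program) and fixes < max_fixes:
                program = agents["debugger"](program)  # debugging agent joins in
                fixes += 1
            agents["docs"](program)  # documentation agent (output unused here)
            # A build-and-deploy agent would compile and deploy `program` here.
            if user_accepts(program):
                accepted.append(req)
                break  # loop around to the next requirement
    return accepted
```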

Overall, the “structure” of an agentic architecture is very dynamic. Each agent can use a different LLM. The coding agent might use a code-completion LLM, whereas the “architect agent” might be a coding agent backed by an LLM with RAG over a library of design patterns. Agentic architectures can get very expensive fast!
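
The cost warning is easy to see with some back-of-the-envelope arithmetic. The Python sketch below assigns a different model to each agent and multiplies out the tokens; the model names, prices, and token counts are all invented placeholders, not real vendor pricing.

```python
from typing import Dict, Tuple

# Hypothetical agent -> (model, USD per million tokens). Invented numbers.
AGENT_MODELS: Dict[str, Tuple[str, float]] = {
    "pm":        ("large-general-llm", 10.0),
    "architect": ("large-general-llm + design-pattern RAG", 10.0),
    "coder":     ("code-completion-llm", 2.0),
    "tester":    ("small-llm", 0.5),
}


def estimate_cost(tokens_per_requirement: Dict[str, int], requirements: int,
                  loop_factor: float = 1.0) -> float:
    """Total USD: tokens x requirements x loop passes x price, summed over agents.

    loop_factor approximates extra passes from debug/feedback loops (1.0 = none).
    """
    total = 0.0
    for agent, tokens in tokens_per_requirement.items():
        _, usd_per_m = AGENT_MODELS[agent]
        total += tokens * requirements * loop_factor * usd_per_m / 1_000_000
    return round(total, 2)


if __name__ == "__main__":
    usage = {"pm": 20_000, "architect": 50_000, "coder": 200_000, "tester": 100_000}
    print(estimate_cost(usage, requirements=40))                 # 46.0
    print(estimate_cost(usage, requirements=40, loop_factor=3))  # 138.0
```

Under these made-up numbers, three passes per requirement triple the bill, which is why loops and retries dominate the cost of agentic systems.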

