Chapter 21. Agentic Architectures
Book Excerpt from "Generative AI Applications: Planning, Design and Implementation"
by David Spuler
Single Agent Architectures
The idea of agents tends to evoke visions of AI engines that take over the planet. After all, agents are the AI apps that “do” things, whereas LLMs are supposed to just sit there and write poetry.
Actually, agents are just software, and aren’t necessarily an architecture you need to shy away from. These architectures are already well established in various industry frameworks. The features that are needed include:
- Data sources (e.g., read your email inbox, search the internet).
- Software integrations (e.g., your company database, HR system, internal financials, etc.)
- LLM queries (the basic level of intelligence).
- Combining it all together (i.e., planning, scheduling, delegating, etc.), as sketched in the example after this list.
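For instance, the “combining it together” part is often just a loop in which the LLM decides which data source or integration to call next. Here is a minimal sketch in Python; the llm_query() stub, the tool functions, and the plain-text planning protocol are all illustrative assumptions, not any particular framework's API:

    # Minimal single-agent orchestration loop (illustrative sketch only).
    TOOLS = {
        "search_web": lambda arg: "...search results for " + arg,   # data source
        "read_inbox": lambda arg: "...latest emails...",            # software integration
    }

    def llm_query(prompt: str) -> str:
        # Placeholder: replace with a real call to your LLM provider's API.
        return "FINISH: (stubbed answer)"

    def run_agent(user_request: str, max_steps: int = 5) -> str:
        context = ""
        for _ in range(max_steps):                       # planning/scheduling loop
            plan = llm_query(
                f"Request: {user_request}\nContext so far: {context}\n"
                f"Reply with '<tool> <argument>' using one of {list(TOOLS)}, "
                f"or 'FINISH: <answer>' when done."
            )
            if plan.startswith("FINISH:"):
                return plan[len("FINISH:"):].strip()     # the agent's final answer
            tool, _, arg = plan.partition(" ")
            context += "\n" + TOOLS.get(tool, lambda a: "(unknown tool)")(arg)
        return context                                    # give up after max_steps

The key point is simply that the LLM call, the data-source calls, and the loop that combines them are all ordinary software.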
Although single agents have the potential to be powerful, it’s becoming clear that the main industry direction is “agentic architectures” and the use of multiple agents with specially trained LLMs for particular tasks. For example, Salesforce has multiple agents for different activities, and Apple Intelligence has a “multi-LoRA” architecture with many small on-phone LLMs, backed up by a larger LLM framework in the cloud called Private Cloud Compute (PCC).
Types of Agents
There are, in fact, several different types of agents, each with its own level of difficulty. There’s not really any widely accepted categorization of agents, but rather than be discouraged, I’ll invent my own:
- Read-only agents (report agents)
- Read-write agents (action agents)
And we can further sub-categorize based on how the agent gets kicked off:
- Manual agents
- Scheduled agents
- Triggered agents
And another dimension is whether the agent needs approval for its “write” actions:
- Supervised agents (human-in-the-loop approval needed)
- Unattended agents (automated or “autonomous agents”)
So, that’s about 12 distinct species of agents, but they’re not that different. If you’re building an agent, it needs several components (sketched in code after this list):
- Integration to its own LLM
- Data source integration (to “read”)
- Output integration (to “report”)
- Action integration (to “write” or “act”)
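In the abstract, those four components might look something like the following skeleton (purely illustrative; the class and method names here are my own, not any standard agent framework):

    # Illustrative skeleton of the four agent components listed above.
    class Agent:
        def __init__(self, llm):
            self.llm = llm               # 1. integration to its own LLM

        def read(self) -> str:           # 2. data source integration
            raise NotImplementedError

        def report(self, text: str):     # 3. output integration
            print(text)

        def act(self, command: str):     # 4. action integration ("write" or "act")
            raise NotImplementedError    # e.g., send an email, call an external API

A read-only report agent would simply leave act() unused, whereas a read-write action agent needs all four pieces.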
This is getting quite tricky, so some examples might help.
- Imagine an agent that integrates with the email subsystem, launches whenever an incoming email is received, and then uses the outgoing email API to send an automatic reply, so that you can tell people emailing you that you’re on vacation.
- An agent runs in the background and is only triggered when a text comes into your phone; the agent then accesses the speaker integration, an “action,” so that the phone goes: Ping!
- Another type of manually-launched agent could, once it receives your query request, scour the internet and return you the top ten blue links about that topic.
- Imagine the safety improvement of having a fully autonomous software agent that watches the wheels in your car (“read”), detects the signs of slippage, and automatically turns your brakes on and off in quick succession (an “action”) to regain traction, without the driver doing anything.
Oh, wait! Those are things we’ve had for years. We’ve had AI agents all around us, and never knew it?
Report Agents
Report agents are “read-only” agents that only create output such as a report after some analysis. The agent finds some data that you want, and creates a report on it. Example use cases would include:
- Email inbox summary
- News headline summarization
- Company stock online research
- Research paper literature reviews
Let’s say we want an AI engine that reads the news headlines in the morning, and shows you a summary. In order to implement this idea, the report agent needs:
1. Scheduler (to wake it up in the early hours).
2. Data source integrations (to download the news headlines from somewhere).
3. LLM query interface (to send the news headlines to the LLM to summarize).
4. Display the text summary as its “report” (a code sketch of these steps follows).
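Here is a minimal sketch of those four steps, assuming a made-up headline feed URL, a stubbed llm_summarize() helper in place of a real LLM API call, and Python's standard sched module as the scheduler:

    # Sketch of a scheduled "report agent" that summarizes morning news headlines.
    import sched
    import time
    import urllib.request

    FEED_URL = "https://example.com/news/headlines"   # hypothetical news source

    def llm_summarize(text: str) -> str:
        # Placeholder: replace with a real LLM API call.
        return "Summary: " + text[:200]

    def morning_report():
        headlines = urllib.request.urlopen(FEED_URL).read().decode()   # 2. data source
        summary = llm_summarize(headlines)                             # 3. LLM query
        print(summary)                                                 # 4. the "report"

    scheduler = sched.scheduler(time.time, time.sleep)                 # 1. scheduler
    scheduler.enter(delay=6 * 3600, priority=1, action=morning_report) # wake up in ~6 hours
    scheduler.run()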
Note that, without the scheduler, this is effectively a RAG architecture with an integrated data source. If we had to launch this report manually, it’s not really an agent.
Action Agents
Action agents are general agents that can do something that changes the outside world, with an “action” or “read-write” capability. Some example use cases include:
- Sending an email or text on our behalf
- Trading your own stocks using an automated algorithm
- Booking flights or an entire vacation
- Completing and filing a tax return (yeah, right!)
Imagine if we extended our news headlines report agent so that it not only summarized the news (i.e., “read”), but also then emailed us a report every morning. This adds one more step, which is the “action” or the “write” operation of sending the email. This needs one more component, which is an integration with the email service, so that the agent can send out an email.
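That extra step might look something like the sketch below, which extends the earlier report-agent sketch; smtplib is just one way to send mail, and the addresses and server name are invented for illustration:

    # Sketch: the news report agent gains a "write" action (sending an email).
    import smtplib
    from email.message import EmailMessage

    def send_report_email(summary: str):
        msg = EmailMessage()
        msg["Subject"] = "Your morning news summary"
        msg["From"] = "agent@example.com"                  # hypothetical sender
        msg["To"] = "you@example.com"                      # hypothetical recipient
        msg.set_content(summary)
        with smtplib.SMTP("smtp.example.com") as server:   # hypothetical mail server
            server.send_message(msg)                       # the "action" (write operation)

    def morning_report_and_email():
        # llm_summarize() as defined in the report-agent sketch above.
        summary = llm_summarize("...today's headlines...")
        send_report_email(summary)                         # unattended: no approval step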
In this case, the agent is actually running “unattended” because the scheduler wakes it up, and then it sends an email without needing human approval. Other types of AI agents could be programmed to require a human to approve the actions.
Another variation would be a “triggered agent” where some event starts the agent running, rather than a scheduler. For example, an incoming email might trigger some type of email-responding agent, or a summary notification to pop up on your phone.
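Schematically, those two variations only change how the action is gated or launched. In the sketch below, the on_incoming_email() hook is hypothetical (a real system would register a callback with the mail service), and the console prompt stands in for whatever human-in-the-loop approval UI you use:

    # Sketch: a triggered agent with an optional human-in-the-loop approval gate.
    def draft_auto_reply(email_text: str) -> str:
        # Placeholder: an LLM call that drafts a vacation auto-reply.
        return "I'm on vacation this week and will reply when I return."

    def approved(action_description: str) -> bool:
        return input(f"Agent wants to: {action_description}. Approve? (y/n) ").lower() == "y"

    def on_incoming_email(email_text: str):          # hypothetical trigger callback
        reply = draft_auto_reply(email_text)         # "read" + LLM step
        if approved("send an auto-reply"):           # supervised: ask a human first
            send_email(reply)                        # the "write" action

    def send_email(text: str):
        ...  # e.g., the same outgoing-email integration as the action agent above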
Agentic Architectures
“Agentic architectures” are a hot area that is all the rage at the moment on arXiv. It’s a somewhat vague concept, but generally encompasses the use of LLM agent technologies with these important aspects:
- Multiple agents
- Cooperation of multiple types of agents
- Workflow pathways
- Planning
- Scheduling, sequencing, and chaining
- Retrievers (i.e., data source “read” capabilities)
- Actions (“write” capabilities)
In the most basic form, it’s very much like RAG, except the agent goes and fetches the data from somewhere, typically from something that’s not a database, such as the web. The good thing is that it requires no work to keep it up-to-date. The bad thing is that it has access to the web, but there are ways to constrain the scope of a web search.
As an example, if you’re trying to produce a chatbot that is an expert on “recipes,” the agent might be specifically aware of only a few “cooking” websites and may even use the search capabilities of those sites. This is an agentic architecture with “retrievers” and you can see the analogy to RAG retrievers.
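A simple way to constrain the retriever's scope is a whitelist of known sites; the site names and the search_site() helper below are invented purely for illustration:

    # Sketch: an agent "retriever" restricted to a few known cooking sites.
    ALLOWED_SITES = ["recipes.example.com", "goodfood.example.org"]   # illustrative whitelist

    def search_site(site: str, query: str) -> list[str]:
        # Placeholder: call the site's own search API, or a site-restricted web search.
        return [f"https://{site}/search?q={query}"]

    def recipe_retriever(query: str) -> list[str]:
        results = []
        for site in ALLOWED_SITES:                  # the agent only "reads" from these sites
            results.extend(search_site(site, query))
        return results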
You could characterize RAG as an agent that knows how to retrieve data from your vector store or other stores. However, in general agentic architectures are often more diverse, and there are typically multiple agents involved.
As another example, perhaps you want to have a website which generates 5-course meals where the courses complement each other, perhaps also catering to some special dietary needs. You might end up with a couple of agents that search for recipes, an agent that has some knowledge about what foods complement each other, another that knows about wine pairings, and so on. These agents are often arranged in a structure whereby the output of one agent feeds into another agent and the results are further refined.
Each “agent” is also running an LLM query, which focuses the agent on a specific “role” but also has knowledge of the original “question”. Once the structure has all been traversed, the result is generated by a final LLM summary.
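A rough sketch of that kind of chain is below, with each “agent” reduced to a single role-specific LLM call that also sees the original request; the roles and the llm() stub are illustrative only:

    # Sketch: a chain of role-specific agents, each refining the previous output.
    def llm(role: str, prompt: str) -> str:
        # Placeholder: one LLM call with a role-specific system prompt.
        return f"[{role} output for: {prompt[:60]}...]"

    def plan_dinner(request: str) -> str:
        recipes = llm("recipe-search agent", request)                       # retrieve candidates
        menu = llm("menu-pairing agent", f"{request}\nRecipes: {recipes}")  # complementary courses
        wines = llm("wine-pairing agent", f"{request}\nMenu: {menu}")       # wine suggestions
        return llm("summary agent", f"Request: {request}\nMenu: {menu}\nWines: {wines}")

    print(plan_dinner("A 5-course dinner for six, no shellfish"))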
An agentic architecture can be interactive, and there are often multiple places in the sequence where the user can be involved. For example, deep down, it’s possible that there may be a choice between a “beef dish” or a “chicken dish” and it’s possible for the LLM to ask a followup question (if trained to do so), whereby the user can indicate which food they prefer.
More complex agentic architectures are not a static structure or single toolchain. Advanced agentic structures can contain loops, decision points, interactivity or approval choices, feedback points, and steps for the user to fulfil.
Many of the software development AI tools that generate entire fully-coded applications are agentic systems. They can have agents that make up a typical software team: there is a PM Agent, a coding agent, an architect agent, a testing agent, a documentation agent, and more. The user describes the software they want built, in as much detail as possible, and the PM agent will refine the requirements, often in a loop with the user, then each requirement will be “designed” by the architect agent, coded by the coding agent, tested and documented by the respective agents. Along the way, bugs can occur, so a “debugging agent” can come into the mix.
At the end of a full cycle of auto-development, the user will be brought back in to test the generated application. Behind the scenes there is yet another agent, one which builds the code and deploys the executable. Once the user accepts a requirement as complete, the agentic system loops around to the next requirement.
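Schematically, that per-requirement loop might look like the sketch below; each call_agent() role stands in for a separately prompted (or fine-tuned) LLM agent, and none of this is any specific product's API:

    # Sketch: looping each requirement through a "team" of software agents.
    def call_agent(role: str, task: str) -> str:
        # Placeholder: each role would be a separate LLM-backed agent.
        return f"[{role}: {task[:50]}]"

    def build_app(description: str, user_accepts) -> None:
        requirements = call_agent("PM agent", description).split(";")   # refine requirements
        for req in requirements:
            accepted = False
            while not accepted:
                design = call_agent("architect agent", req)
                code = call_agent("coding agent", design)
                tests = call_agent("testing agent", code)
                call_agent("documentation agent", code)
                call_agent("build-and-deploy agent", code)    # builds and deploys behind the scenes
                accepted = user_accepts(req)                  # user tests the generated app
                if not accepted:
                    call_agent("debugging agent", code + tests)   # fix bugs, then loop again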
Overall, the “structure” of an agentic architecture is very dynamic. Each agent can have access to a different LLM. The coding agent might use a code-completion LLM, whereas the “architect agent” might use a coding LLM backed by a RAG component of design patterns. Agentic architectures can get very expensive fast!
Security of Agents
A common concern with agents on everyone’s computer (or phone) is that they are a security vulnerability. If an AI engine can send emails, then so can the hackers, after they take over your device.
This is true, and security must be reviewed carefully, but it’s important to note that this is nothing new. It’s always been the case that hackers, if they gained control of your device, could send out emails. There’s nothing about AI agent architectures that inherently makes the situation either more or less insecure than it has been in the past.
But what about the agent itself going rogue? Could it start sending out lots of emails? Technically, it could, yes, but what would cause it to? Why would it be more likely to go rogue than plain old Microsoft Outlook? Maybe it has some potential to do so because we’ve made automation easier, but it would require a human failure in the software design. And programmers never write any bugs, so that should be reassuring.