Book Excerpt from "The Sweetest Lesson: Your Brain vs AI"
by David Spuler, Ph.D.
13. Tools
“Right now, I think the frontier is these AI tools.”
— Sam Altman, July 2025.
What are Tools?
Humans like to think that we are distinct from the “animals” because we use tools. Sadly, this does not distinguish us from the AIs, because they’ve also evolved to use tools.
LLMs require tools to do more advanced things, just like humans. For example, if someone asks you the time, you look at your watch (or your phone). If you ask an LLM “What is the time?” there is nothing in its training data set that could possibly answer this correctly. The only way is to use a tool called a “clock,” which must be integrated with the LLM and executed by the AI engine as part of answering your query (a small sketch of this appears after the list below). Some of the things that are hard for an LLM to do without tools include:
- Having real-time or up-to-date information (e.g., stock prices or the latest AI research papers).
- Computation-related questions beyond basic arithmetic.
- Information that’s only in a different place (e.g., the company’s internal ERP database).
- Time-specific or locale-specific information that differs from its training.
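Here is a minimal sketch of a “clock” tool in Python. The answer function, its keyword heuristic, and the call_llm stub are all assumptions for illustration, not a real AI engine:

    import datetime

    def clock_tool() -> str:
        # Dynamic computation tool: returns the current local time.
        return datetime.datetime.now().strftime("%I:%M %p")

    def call_llm(prompt: str) -> str:
        # Placeholder for real LLM inference (an assumption in this sketch).
        return "LLM answer for: " + prompt

    def answer(query: str) -> str:
        # Trivial routing heuristic; a real engine lets the LLM decide.
        if "time" in query.lower():
            return "The current time is " + clock_tool() + "."
        return call_llm(query)

    print(answer("What is the time?"))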
Another type of tool that LLMs can use is one that performs an action for you, such as sending an email. In the AI industry, these are called “agents” rather than “tools.”
That’s not where the naming confusion ends. The AI research community also makes distinctions between several different types of tools. For example, there are these AI research areas:
- Retrieval Augmented Generation (RAG) — find a paragraph of text from a set of documents.
- Tool Augmented Language Models (TALM) — run a dynamic computation tool like a calculator.
- Tool “hooks” — run a dynamic preprocessing tool on your text input.
- Plugins — log into someone’s database for information (e.g., real estate listings).
- Search plugins — run a full internet query to return some blue links for AI to read.
Come on, they don’t need different names when they’re all just tools! The AI engine calls out for help from one or another of these various pieces of software, whether it needs a BMI calculator, a corporate HR document, or a full search of the internet. Well, maybe there is a key distinction between two classes of “tools,” if we exclude the “action agent” types and focus on tools that help the LLM answer questions:
1. Look up more information — RAG, internet search, database plugins.
2. Dynamic computations — clocks, calculators, and many more.
One way to think about this: static search tools are looking for extra human-written information, whereas dynamic tools are creating their own! Types of dynamic calculation tools include (a small registry sketch follows this list):
- Clocks
- Calculators (arithmetic)
- Converters (e.g., pounds to kilograms)
- Calendars (date or day calculations)
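A sketch of what these dynamic tools might look like as a registry in Python (the tool names and signatures here are illustrative assumptions, not a real engine’s API):

    import datetime

    def current_time() -> str:
        # Clock: dynamic information, never in the training data.
        return datetime.datetime.now().isoformat(timespec="seconds")

    def pounds_to_kilograms(lb: float) -> float:
        # Converter: exact arithmetic instead of LLM guesswork.
        return lb * 0.45359237

    def days_between(start: str, end: str) -> int:
        # Calendar: date arithmetic on ISO-format dates.
        return (datetime.date.fromisoformat(end) - datetime.date.fromisoformat(start)).days

    # The engine can map a tool name (or a tool token) to the function to run.
    TOOLS = {
        "clock": current_time,
        "lb_to_kg": pounds_to_kilograms,
        "days_between": days_between,
    }

    print(TOOLS["lb_to_kg"](150))  # prints 68.0388555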
Modern LLMs use literally thousands of tools. All of these tools are pieces of software that sit alongside the AI model in the backend servers.
AI engines don’t normally tell you what tools they’ve used to answer. How unfair is that! We’re not allowed to use AI to answer questions in job interviews, but the AIs are allowed to sneak around using any tools they like.
There are a lot of tools that an AI can use, far beyond the basic ones. For example, if you ask your AI engine to compute the “Flesch-Kincaid” reading level metric for a paragraph of text, it has to use a tool to compute that.
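As a concrete example, the Flesch-Kincaid grade level is just a formula over word, sentence, and syllable counts, so a short script makes a perfectly good tool. This sketch uses a naive vowel-group syllable counter, which is only an approximation:

    import re

    def flesch_kincaid_grade(text: str) -> float:
        # Grade level = 0.39*(words/sentences) + 11.8*(syllables/word) - 15.59
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(1, len(words))
        def syllables(word: str) -> int:
            # Naive approximation: count groups of consecutive vowels.
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))
        n_syllables = sum(syllables(w) for w in words)
        return 0.39 * (n_words / sentences) + 11.8 * (n_syllables / n_words) - 15.59

    print(round(flesch_kincaid_grade("The cat sat on the mat. It was happy."), 1))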
Integration of Tools
There’s a lot of variability in architectures for using tools, but the basic ideas are to use them for:
- The beginning — preprocessing (“hooks”).
- The middle — interim computations.
- The end — finalization or formatting.
The main one is the muddle in the middle. Your LLM will run a sequence something like this (a minimal sketch of the loop follows the list):
1. Read your input prompt (“Should I eat jelly beans in order of wavelength?”).
2. Decide that a tool’s results are needed.
3. Insert “tool tokens” into the output text.
4. Run the tool or tools (by finding tool tokens in the LLM’s output).
5. Merge the tool output into the final output text results.
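Here is a toy Python version of that loop. The <<tool:...>> placeholder syntax and the llm_generate stub are assumptions invented for illustration; real engines use dedicated tool tokens or structured function-call messages:

    import datetime
    import re

    def llm_generate(prompt: str) -> str:
        # Placeholder for real LLM inference; it has emitted a tool placeholder.
        return "It is <<tool:clock>> right now, so start with the red ones."

    def run_tool(name: str) -> str:
        # Step 4: execute the named tool.
        if name == "clock":
            return datetime.datetime.now().strftime("%I:%M %p")
        return "[unknown tool: " + name + "]"

    def answer(prompt: str) -> str:
        draft = llm_generate(prompt)  # Steps 1-3: read prompt, decide, emit tool tokens.
        # Steps 4-5: find tool placeholders, run the tools, merge the results in place.
        return re.sub(r"<<tool:(\w+)>>", lambda m: run_tool(m.group(1)), draft)

    print(answer("Should I eat jelly beans in order of wavelength?"))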
Like humans, an AI needs to learn to look at its watch if someone asks the time. Specific training data sets are required that tell the AI what tool to use, and when. This is the “deciding” phase for tool usage.
The AI engine has to recognize in the LLM output that a tool must be executed. There are a variety of ways to do this:
- Tool-specific tokens — i.e., the LLM can emit a “trigger” token to run a tool. (Note that PEFT could be used here to fine-tune new tool capabilities, by adding only a few new tool-triggering tokens to the vocabulary.)
- Placeholder patterns — i.e., the LLM outputs a special pattern such as “--insert current time here--” and the engine then scans for these patterns. This avoids adding tool tokens to the vocabulary, but is inefficient because each pattern spans multiple text tokens in the output.
- Code generation — various AI models will generate code, such as Python, that can be executed to generate the answer. This is a general solution, because Python can call various submodules and can thereby implement many tools.
- Multi-level planning — the AI first generates a plan for how to answer the query, including which tools to use, then runs any tools, and then does another inference query to collate everything into a final answer.
Sometimes there are two steps of LLM inference being done, covering both the deciding and merging steps above. Alternatively, the tool integration can be simpler, with the tool’s output results just inserted verbatim into the middle of the output.
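A sketch of the two-pass version, where one inference pass plans the tools and a second collates the results (the decide_tools heuristic and the llm_generate stub are both assumptions so the sketch runs standalone):

    import datetime

    def decide_tools(query: str) -> list:
        # First pass ("deciding"): a real system would ask the LLM which tools
        # it needs; this stub uses a keyword heuristic instead.
        return ["clock"] if "time" in query.lower() else []

    def llm_generate(prompt: str) -> str:
        # Placeholder for real LLM inference (an assumption in this sketch).
        return "[LLM answer based on: " + prompt + "]"

    def answer(query: str) -> str:
        results = {}
        for name in decide_tools(query):
            if name == "clock":
                results[name] = datetime.datetime.now().strftime("%I:%M %p")
        # Second pass ("merging"): collate tool results into the final answer.
        return llm_generate(query + " | tool results: " + str(results))

    print(answer("What time is it in Sydney?"))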
Computers as Tools
Lately, there’s been a huge amount of work in the AI space on LLM systems called “GUI agents” or “computer use models.” The idea is that the LLM can actually look at your screen, tap away on your keyboard, and click the mouse, too.
Phones, too!
A computer or smartphone is a great tool for an LLM. Anything that a computer can do, the LLM can now do. Amusingly, it’s a really tough problem to solve, which is odd, since the AI engine is already running inside a great big computer. Lots of research papers! The technical issues to solve include (a toy action loop is sketched after this list):
- Understanding the screen (as an image)
- Integration with input devices (keyboard/mouse)
- Context of the screen at a high level (e.g., which app is running?)
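To make the loop concrete, here is a toy sketch of one agent step, assuming the third-party pyautogui library for screen capture and input control, with the vision LLM stubbed out (the action dictionary format is an invented assumption):

    import pyautogui  # third-party library: pip install pyautogui

    def vision_llm(screenshot, goal: str) -> dict:
        # Placeholder for a multimodal LLM that maps a screen image to one action.
        return {"action": "click", "x": 100, "y": 200}  # made-up output format

    def agent_step(goal: str) -> None:
        shot = pyautogui.screenshot()       # understand the screen (as an image)
        action = vision_llm(shot, goal)     # high-level context picks an action
        if action["action"] == "click":     # integration with the mouse
            pyautogui.click(action["x"], action["y"])
        elif action["action"] == "type":    # integration with the keyboard
            pyautogui.write(action.get("text", ""))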
There’s another major problem to solve: mistakes. LLMs always make mistakes, and that’s more of a problem if the agent is doing something for you on your computer.
LLM Hooks
LLM hooks are integrations of tools that perform preprocessing on prompt inputs. The use of hooks is a special case of Tool-Augmented Language Models (TALM). The idea with pre-processing hooks is that the tools can augment the input prompts with extra information that the LLM can use, which is similar to RAG-based retrieval, but is based on non-LLM tools that perform dynamic computation.
Tool Augmented Language Models (TALM) is the use of non-LLM tools to augment the processing of LLMs. Tools can be used to compute more data based on the prompts, and can be used in conjunction with reasoning like Chain-of-Thought, RAG retrievals, or agentic architectures. Hence, the idea of “hooks” is a special type of limited tool that doesn’t scour the internet for more information, but only performs some dynamic computation on the prompt text as input.
Using “hooks” to launch a tool to preprocess the user’s input prompt is a lesser-known technique. You can use heuristics to decide whether a tool should be used, and the tool is called before the LLM runs, via “hooks” in the code. This idea skips the “deciding” step above in favor of non-LLM methods. This is faster than having an LLM decide to launch a tool, but it’s not always as accurate in choosing whether or not to use a tool.
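A minimal sketch of a preprocessing hook in Python, where a heuristic unit converter augments the prompt before the LLM ever sees it (the hook list and the llm_generate stub are assumptions for illustration):

    import re

    def unit_conversion_hook(prompt: str) -> str:
        # Heuristic: if the prompt mentions pounds, append the kilogram value.
        m = re.search(r"(\d+(?:\.\d+)?)\s*(?:lb|pounds?)\b", prompt, re.I)
        if m:
            kg = float(m.group(1)) * 0.45359237
            prompt += " [Hook note: " + m.group(1) + " lb is " + format(kg, ".2f") + " kg.]"
        return prompt

    HOOKS = [unit_conversion_hook]

    def llm_generate(prompt: str) -> str:
        # Placeholder for real LLM inference (an assumption in this sketch).
        return "[LLM answer based on: " + prompt + "]"

    def answer(prompt: str) -> str:
        for hook in HOOKS:  # no LLM "deciding" step; hooks run unconditionally
            prompt = hook(prompt)
        return llm_generate(prompt)

    print(answer("Is 150 pounds a healthy weight?"))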
References
Preprocessing Hooks. Research papers on LLM “hooks” for integrating dynamic tools:
- Damien de Mijolla, Wen Yang, Philippa Duckett, Christopher Frye, Mark Worrall, 8 Dec 2024, Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt, https://arxiv.org/abs/2412.05967
- Muhayy Ud Din, Jan Rosell, Waseem Akram, Isiah Zaplana, Maximo A Roa, Lakmal Seneviratne, Irfan Hussain, 10 Dec 2024, Ontology-driven Prompt Tuning for LLM-based Task and Motion Planning, https://arxiv.org/abs/2412.07493 https://muhayyuddin.github.io/llm-tamp/ (Detecting objects in the prompt text and then using a RALM algorithm to query an ontology database.)
- Julian Perry, Surasakdi Siripong, Thanakorn Phonchai, 15 Jan 2025, Dynamic Knowledge Integration for Enhanced Vision-Language Reasoning, https://arxiv.org/abs/2501.08597 (Augment training data dynamically by retrieving extra information.)
- Liu, Z., Zheng, Y., Yin, Z. et al., 2025, ArithmeticGPT: empowering small-size large language models with advanced arithmetic skills, Mach Learn 114, 24, 2025, https://doi.org/10.1007/s10994-024-06681-1 https://link.springer.com/article/10.1007/s10994-024-06681-1 https://github.com/ai4ed/ArithmeticGPT (Integrate a calculator into the processing.)
- Sam Lin, Wenyue Hua, Lingyao Li, Zhenting Wang, Yongfeng Zhang, 17 Feb 2025. ADO: Automatic Data Optimization for Inputs in LLM Prompts, https://arxiv.org/pdf/2502.11436 (Reformulating the input context such as by semantical marking of relevant content or formatting changes.)
- Andrew Neeser, Kaylen Latimer, Aadyant Khatri, Chris Latimer, Naren Ramakrishnan, 16 Feb 2025, QuOTE: Question-Oriented Text Embeddings, https://arxiv.org/abs/2502.10976 (Augmenting RAG chunks with additional information, such as questions the chunk might answer.)
- Wendi Cui, Jiaxin Zhang, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley A. Malin, Sricharan Kumar, 26 Feb 2025, Automatic Prompt Optimization via Heuristic Search: A Survey, https://arxiv.org/abs/2502.18746 (Survey of auto prompting, from basic LLM enhancements to some methods quite similar to RALM and TALM.)
- Leixian Shen, Haotian Li, Yifang Wang, Xing Xie, Huamin Qu, 4 Mar 2025, Prompting Generative AI with Interaction-Augmented Instructions, https://arxiv.org/abs/2503.02874
TALM Research. Research papers on Tool-Augmented Language Model (TALM) technologies:
- Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo, 4 Jun 2024 (v2), Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution, https://arxiv.org/abs/2406.00059 Code: https://github.com/conveyor-sys/conveyor (Speeding up inference by partially running tools in parallel to the LLM query processing, rather than sequentially after the LLM request, by detecting tool requests deep inside the decoding algorithm and starting them off immediately, before the LLM has finished generating the fully decoded output.)
- Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen, 21 Feb 2024 (v2), SciAgent: Tool-augmented Language Models for Scientific Reasoning, https://arxiv.org/abs/2402.11451
- Aaron Parisi, Yao Zhao, and Noah Fiedel, 2022, Talm: Tool augmented language models, arXiv preprint arXiv:2205.12255, https://arxiv.org/abs/2205.12255
- Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis, 7 May 2024, An LLM-Tool Compiler for Fused Parallel Function Calling, https://arxiv.org/abs/2405.17438
- Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang, July 2024, InferCept: Efficient Intercept Support for Augmented Large Language Model Inference, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:81-95, 2024, https://proceedings.mlr.press/v235/abhyankar24a.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/abhyankar24a/abhyankar24a.pdf
- Yaroslav Zharov, Yury Khudyakov, Evgeniia Fedotova, Evgeny Grigorenko, Egor Bogomolov, 18 Feb 2024, Tool-Augmented LLMs as a Universal Interface for IDEs, https://arxiv.org/abs/2402.11635
- Florian Dietz, Dietrich Klakow, 1 Jan 2025, IGC: Integrating a Gated Calculator into an LLM to Solve Arithmetic Tasks Reliably and Efficiently, https://arxiv.org/abs/2501.00684
- Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen, 21 Feb 2024 (v4), ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving, https://arxiv.org/abs/2309.17452
- Aiyao He, Sijia Cui, Shuai Xu, Yanna Wang, Bo Xu, 13 May 2025, TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers, https://arxiv.org/abs/2505.08402
Tools. Tool integration papers (also known as “function calls”):
- Junzhi Chen, Juhao Liang, Benyou Wang, 9 May 2024, Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning, https://arxiv.org/abs/2405.05955
- reiinakano, November 12, 2019, Teaching a neural network to use a calculator, https://reiinakano.com/2019/11/12/solving-probability.html (Integrate SymPy calculator into the results of a neural network, by looking for the equals sign.)
- Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen, 21 Feb 2024 (v2), SciAgent: Tool-augmented Language Models for Scientific Reasoning, https://arxiv.org/abs/2402.11451
- Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu, 2023, ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings, Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track, https://proceedings.neurips.cc/paper_files/paper/2023/hash/8fd1a81c882cd45f64958da6284f4a3f-Abstract-Conference.html
- Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al., 2023, ToolLLM: Facilitating large language models to master 16000+ real-world APIs, arXiv preprint arXiv:2307.16789, https://arxiv.org/abs/2307.16789
- Yu Gu, Yiheng Shu, Hao Yu, Xiao Liu, Yuxiao Dong, Jie Tang, Jayanth Srinivasa, Hugo Latapie, Yu Su, 22 Feb 2024, Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments, https://arxiv.org/abs/2402.14672
- Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom, 9 Feb 2023, Toolformer: Language Models Can Teach Themselves to Use Tools, https://arxiv.org/abs/2302.04761
- Cobus Greyling, June 16, 2023, Practical Examples of OpenAI Function Calling, https://cobusgreyling.medium.com/practical-examples-of-openai-function-calling-a6419dc38775
- University of California, Berkeley, 2024, Berkeley Function-Calling Leaderboard, https://gorilla.cs.berkeley.edu/leaderboard.html https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard
- Shishir Patil, May 10, 2024, Teaching Large Language Models to Use Tools at Scale, Ph.D. Thesis, Electrical Engineering and Computer Sciences, University of California, Berkeley, Technical Report No. UCB/EECS-2024-85, http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-85.html https://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-85.pdf
- Thomas Reid, Jul 31, 2024, Ollama’s Latest Update: Tool Use: Everything you need to know about function calling in Ollama, https://ai.gopubby.com/ollamas-latest-update-tool-use-7b809e15be5c
- Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang, 8 Aug 2024, ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities, https://arxiv.org/abs/2408.04682 Code: https://github.com/apple/ToolSandbox
- Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, Sehoon Kim, Ryan Tabrizi, Suhong Moon, Coleman Hooper, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami, 1 Sep 2024, TinyAgent: Function Calling at the Edge, https://arxiv.org/abs/2409.00608 https://github.com/SqueezeAILab/TinyAgent
- Yupu Hao, Pengfei Cao, Zhuoran Jin, Huanxuan Liao, Yubo Chen, Kang Liu, Jun Zhao, 23 Sep 2024 (v2), CITI: Enhancing Tool Utilizing Ability in Large Language Models without Sacrificing General Performance, https://arxiv.org/abs/2409.13202
- Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li, 8 Oct 2024 (v2), ToolGen: Unified Tool Retrieval and Calling via Generation, https://arxiv.org/abs/2410.03439
- Elias Lumer, Vamse Kumar Subbiah, James A. Burke, Pradeep Honaganahalli Basavaraju, Austin Huber, 22 Oct 2024 (v2), Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases, https://arxiv.org/abs/2410.14594
- Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar, 14 Apr 2024, Towards Practical Tool Usage for Continually Learning LLMs, https://arxiv.org/abs/2404.09339
- Amy Marks, Jun 11, 2024, Clarifying Function Calling / Tool Use in LLMs, https://medium.com/@aevalone/clarifying-function-calling-tool-use-in-llms-6511af510f99
- In Gim, Seung-seob Lee, Lin Zhong, 9 Dec 2024, Asynchronous LLM Function Calling, https://arxiv.org/abs/2412.07017 (Overlap LLM computations and tool execution.)
- Dian Yu, Yuheng Zhang, Jiahao Xu, Tian Liang, Linfeng Song, Zhaopeng Tu, Haitao Mi, Dong Yu, 22 Dec 2024, Teaching LLMs to Refine with Tools, https://arxiv.org/abs/2412.16871
- Wenjun Li, Dexun Li, Kuicai Dong, Cong Zhang, Hao Zhang, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Liu, 18 Feb 2025, Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger, https://arxiv.org/abs/2502.12961 (Examining the decision whether or not to launch a tool, and the inefficiency of non-needed tool calls.)
- Mengsong Wu, Tong Zhu, Han Han, Xiang Zhang, Wenbiao Shao, Wenliang Chen, 21 Mar 2025, Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models, https://arxiv.org/abs/2503.16779 https://github.com/fairyshine/Chain-of-Tools
- Wang et al., 2025, Function Calling in Large Language Models: Industrial Practices, Challenges, and Future Directions, https://openreview.net/pdf?id=LNxVGPedFW
- Beong-woo Kwak, Minju Kim, Dongha Lim, Hyungjoo Chae, Dongjin Kang, Sunghwan Kim, Dongil Yang, Jinyoung Yeo, 29 May 2025, ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions, https://arxiv.org/abs/2505.23662 https://github.com/bwookwak/ToolHaystack
LLM Computer Usage. Research on computer usage agent LLMs, or “GUI agents”:
- Anthropic, 23 Oct 2024, Developing a computer use model, https://www.anthropic.com/news/developing-computer-use
- Anirban Ghoshal, 23 Oct 2024, How Anthropic’s new ‘computer use’ ability could further AI automation, https://www.cio.com/article/3583260/how-anthropics-new-computer-use-ability-could-further-ai-automation.html
- Xiao Liu, Bo Qin, Dongzhu Liang, Guang Dong, Hanyu Lai, Hanchen Zhang, Hanlin Zhao, Iat Long Iong, Jiadai Sun, Jiaqi Wang, Junjie Gao, Junjun Shan, Kangning Liu, Shudan Zhang, Shuntian Yao, Siyi Cheng, Wentao Yao, Wenyi Zhao, Xinghan Liu, Xinyi Liu, Xinying Chen, Xinyue Yang, Yang Yang, Yifan Xu, Yu Yang, Yujia Wang, Yulin Xu, Zehan Qi, Yuxiao Dong, Jie Tang, 28 Oct 2024, AutoGLM: Autonomous Foundation Agents for GUIs, https://arxiv.org/abs/2411.00820
- Shuai Wang, Weiwen Liu, Jingxuan Chen, Weinan Gan, Xingshan Zeng, Shuai Yu, Xinlong Hao, Kun Shao, Yasheng Wang, Ruiming Tang, 7 Nov 2024, GUI Agents with Foundation Models: A Comprehensive Survey, https://arxiv.org/abs/2411.04890
- Siyuan Hu, Mingyu Ouyang, Difei Gao, Mike Zheng Shou, 15 Nov 2024, The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use, https://arxiv.org/abs/2411.10323 https://github.com/showlab/computer_use_ootb
- Show Lab, Nov 2024, ShowUI: ShowUI is a lightweight (2B) vision-language-action model designed for GUI agents, https://huggingface.co/showlab/ShowUI-2B
- Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang, 27 Nov 2024, Large Language Model-Brained GUI Agents: A Survey, https://arxiv.org/abs/2411.18279
- Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu, 23 Feb 2024 (v2), SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents, https://arxiv.org/abs/2401.10935
- Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan, 8 Apr 2024, Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs, https://arxiv.org/abs/2404.05719
- Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong, 5 Dec 2024, Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction, https://arxiv.org/abs/2412.04454 https://aguvis-project.github.io/
- Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Thien Huu Nguyen, Trung Bui, Tianyi Zhou, Ryan A. Rossi, Franck Dernoncourt, 18 Dec 2024, GUI Agents: A Survey, https://arxiv.org/abs/2412.13501
- Hao Wen, Shizuo Tian, Borislav Pavlov, Wenjie Du, Yixuan Li, Ge Chang, Shanhui Zhao, Jiacheng Liu, Yunxin Liu, Ya-Qin Zhang, Yuanchun Li, 24 Dec 2024, AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation, https://arxiv.org/abs/2412.18116
- X Hu, T Xiong, B Yi, Z Wei, R Xiao, Y Chen, J Ye, M Tao, Dec 2024, OS Agents: A Survey on MLLM-Based Agents for General Computing Devices Use, https://www.preprints.org/frontend/manuscript/3842b6163d82801988adf663ee18b6d5/download_pub
- Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang, Xiaotian Han, Hongxia Yang, Fei Wu, 8 Jan 2025, InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection, https://arxiv.org/abs/2501.04575
- Maxwell Zeff, January 23, 2025, OpenAI launches Operator, an AI agent that performs tasks autonomously, https://techcrunch.com/2025/01/23/openai-launches-operator-an-ai-agent-that-performs-tasks-autonomously/
- Matt Marshall, February 22, 2025, The rise of browser-use agents: Why Convergence’s Proxy is beating OpenAI’s Operator, https://venturebeat.com/ai/the-rise-of-browser-use-agents-why-convergences-proxy-is-beating-openais-operator/
- Wenjia Jiang, Yangyang Zhuang, Chenxi Song, Xu Yang, Chi Zhang, 4 Mar 2025, AppAgentX: Evolving GUI Agents as Proficient Smartphone Users, https://arxiv.org/abs/2503.02268
- Apoorv Agrawal, May 23, 2025, Why Cars Drive Themselves Before Computers Do: Robocars are ready; robot secretaries aren’t… yet, https://apoorv03.com/p/autonomy
The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory.
Get your copy from Amazon: The Sweetest Lesson