Aussie AI

LLM GUI Agents

  • Last Updated 1 April, 2025
  • by David Spuler, Ph.D.

GUI agents are LLM-based agents that can read and/or manipulate a graphical user interface (GUI). The first GUI agents were read-only, but advanced GUI agents can now not only read the screen but also click the mouse or enter keystrokes. In principle, this lets the LLM launch and control apps on a PC or phone.

Early GUI agents were read-only, examining what was on the screen as context for a user's query. Knowing which app or window the person was looking at when issuing a query helps the LLM interpret it. There are two methods of examining the screen's display:

  • Image-based (i.e., using a screen snapshot)
  • Internal hierarchy analysis (i.e., examining the internal representation of windows)
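As a rough illustration of the second method, the sketch below flattens a toy window hierarchy into text that could be prepended to a user's query as context. The `UIElement` class, the `describe` and `build_prompt` helpers, and the prompt wording are all hypothetical names invented for this example; a real agent would obtain the hierarchy from the platform's accessibility or windowing interface rather than building it by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UIElement:
    """Hypothetical node in a window hierarchy (not a real platform API)."""
    role: str
    name: str
    children: List["UIElement"] = field(default_factory=list)


def describe(element: UIElement, depth: int = 0) -> List[str]:
    """Flatten the hierarchy into indented 'role: name' lines, depth-first."""
    lines = [f"{'  ' * depth}{element.role}: {element.name}"]
    for child in element.children:
        lines.extend(describe(child, depth + 1))
    return lines


def build_prompt(root: UIElement, query: str) -> str:
    """Combine the textual screen description with the user's query."""
    screen = "\n".join(describe(root))
    return f"Screen contents:\n{screen}\n\nUser query: {query}"


# Toy hierarchy standing in for what the user currently has on screen.
window = UIElement("window", "Settings", [
    UIElement("button", "Save"),
    UIElement("textbox", "Username"),
])
prompt = build_prompt(window, "How do I change my username?")
```

The image-based method would instead pass a screen snapshot to a multimodal model; the hierarchy approach shown here needs only text input, at the cost of depending on whatever structure the application exposes.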

Recently, more advanced GUI agents have been released that can also control the input devices: moving the mouse, clicking, or entering keystrokes. These advanced "computer usage" agents are theoretically capable of doing anything that a human user can do, but automated via an LLM.
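One common way to picture such an agent is as a loop that observes the screen, asks a model for the next action, and executes it. The sketch below is a minimal simulation of that loop under stated assumptions: `Action`, `FakeDesktop`, `scripted_policy`, and `run_agent` are hypothetical names, the "desktop" only records actions instead of driving real input devices, and the policy is a fixed script standing in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str        # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Stand-in for a real screen and input devices; it only logs actions."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def observe(self) -> str:
        # A real agent would return a screenshot or a UI hierarchy here.
        return f"screen after {len(self.log)} actions"

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def scripted_policy(observation: str, step: int) -> Action:
    """Placeholder for the LLM: replays a fixed script, ignoring the observation."""
    script = [Action("click", 100, 200), Action("type", text="hello"), Action("done")]
    return script[min(step, len(script) - 1)]


def run_agent(desktop: FakeDesktop,
              policy: Callable[[str, int], Action],
              max_steps: int = 10) -> int:
    """Observe, decide, act; stop on 'done' or after max_steps. Returns steps taken."""
    for step in range(max_steps):
        action = policy(desktop.observe(), step)
        if action.kind == "done":
            return step
        desktop.execute(action)
    return max_steps


desktop = FakeDesktop()
steps_taken = run_agent(desktop, scripted_policy)
```

The `max_steps` cap is a design choice for the sketch: because the agent acts on a live interface, bounding the loop keeps a confused policy from issuing actions indefinitely.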

Survey Papers on GUI Agents

Recent survey papers on computer usage and GUI agents:

  • Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang, 27 Nov 2024, Large Language Model-Brained GUI Agents: A Survey, https://arxiv.org/abs/2411.18279
  • Shuai Wang, Weiwen Liu, Jingxuan Chen, Weinan Gan, Xingshan Zeng, Shuai Yu, Xinlong Hao, Kun Shao, Yasheng Wang, Ruiming Tang, 7 Nov 2024, GUI Agents with Foundation Models: A Comprehensive Survey, https://arxiv.org/abs/2411.04890
  • Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Thien Huu Nguyen, Trung Bui, Tianyi Zhou, Ryan A. Rossi, Franck Dernoncourt, 18 Dec 2024, GUI Agents: A Survey, https://arxiv.org/abs/2412.13501
  • Pascal J. Sager, Benjamin Meyer, Peng Yan, Rebekka von Wartburg-Kottler, Layan Etaiwi, Aref Enayati, Gabriel Nobel, Ahmed Abdulkadir, Benjamin F. Grewe, Thilo Stadelmann, 27 Jan 2025, AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants, https://arxiv.org/abs/2501.16150

Research on LLM GUI Agents

Research papers on GUI agents:
