Related papers: Programming with Pixels: Can Computer-Use Agents d…

200 papers

Computer-Using Agents (CUAs) aim to autonomously operate computer systems to complete real-world tasks. However, existing agentic systems remain difficult to scale and lag behind human performance. A key limitation is the absence of…

Computer-Using Agents (CUA) enable users to automate increasingly complex tasks using graphical interfaces such as browsers. As many potential tasks require personal data, we propose Computer-Using Personal Agents (CUPAs) that have access…

Computer-Using Agents (CUAs) are rapidly extending large language models (LLMs) beyond text-based reasoning toward action execution in more complex environments, such as web browsers and graphical user interfaces (GUIs). However, existing…

Agents for computer use (ACUs) are an emerging class of systems capable of executing complex tasks on digital devices -- such as desktops, mobile phones, and web platforms -- given instructions in natural language. These agents can automate…

Computer-Use Agents (CUA) are becoming increasingly capable of autonomously operating digital environments through Graphical User Interfaces (GUI). Yet, most GUIs remain designed primarily for humans--prioritizing aesthetics and…

Computer Vision and Pattern Recognition · Computer Science 2025-11-20 Kevin Qinghong Lin, Siyuan Hu, Linjie Li, Zhengyuan Yang, Lijuan Wang, Philip Torr, Mike Zheng Shou

Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding interfaces (such as VS Code and…

Computer Vision and Pattern Recognition · Computer Science 2026-05-29 Himangi Mittal, Gaurav Mittal, Nelson Daniel Troncoso, Yu Hu

Graphical User Interface (GUI) agents adopt an end-to-end paradigm that maps a screenshot to an action sequence, thereby automating repetitive tasks in virtual environments. However, existing GUI agents are evaluated almost exclusively on…

Computer Vision and Pattern Recognition · Computer Science 2025-12-15 Chunyi Li, Longfei Li, Zicheng Zhang, Xiaohong Liu, Min Tang, Weisi Lin, Guangtao Zhai

As coding agents have seen rapid capability and adoption gains, users are applying them to general tasks beyond software engineering. In this post, we investigate whether coding agents can successfully generalize to end-to-end business…

Software Engineering · Computer Science 2026-04-16 Maksim Ivanov, Abhijay Rana, Gokul Prabhakaran

Computer-use agent (CUA) frameworks, powered by large language models (LLMs) or multimodal LLMs (MLLMs), are rapidly maturing as assistants that can perceive context, reason, and act directly within software environments. Among their most…

Cryptography and Security · Computer Science 2025-10-13 Weidi Luo, Qiming Zhang, Tianyu Lu, Xiaogeng Liu, Bin Hu, Hung-Chun Chiu, Siyuan Ma, Yizhe Zhang, Xusheng Xiao, Yinzhi Cao, Zhen Xiang, Chaowei Xiao

While current Computer Use Agent (CUA) benchmarks measure task completion effectively, they provide limited assessment of enterprise deployment readiness, emphasizing functional correctness over the operational reliability required for…

Software Engineering · Computer Science 2025-11-24 Horia Cristescu, Charles Park, Trong Canh Nguyen, Sergiu Talmacel, Alexandru-Gabriel Ilie, Stefan Adam

Computer-use agents (CUAs) automate on-screen work, as illustrated by GPT-5.4 and Claude. Yet their reliability on complex, low-frequency interactions is still poor, limiting user trust. Our analysis of failure cases from advanced models…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Miaosen Zhang, Xiaohan Zhao, Zhihong Tan, Zhou Huoshen, Yijia Fan, Yifan Yang, Kai Qiu, Bei Liu, Justin Wagle, Chenzhong Yin, Mingxi Cheng, Ji Li, Qi Dai, Chong Luo, Xu Yang, Xin Geng, Baining Guo

Real-world software engineering tasks require coding agents that can operate on massive repositories, sustain long-horizon sessions, and reliably coordinate complex toolchains at test time. Existing research-grade coding agents offer…

Computation and Language · Computer Science 2026-02-04 Sherman Wong, Zhenting Qi, Zhaodong Wang, Nathan Hu, Samuel Lin, Jun Ge, Erwin Gao, Wenlin Chen, Yilun Du, Minlan Yu, Ying Zhang

AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectural isolation that strictly separates trusted task…

Unlike traditional automation tools or static LLM-based systems, agents combine decision-making and tool utilization to accomplish complex tasks, showing great potential in software engineering. However, existing studies largely focus on…

Software Engineering · Computer Science 2025-11-04 Zhuowen Yin, Cuifeng Gao, Chunsong Fan, Wenzhang Yang, Yinxing Xue, Lijun Zhang

Computer-use agents (CUAs) that interact with real computer systems can perform automated tasks but face critical safety risks. Ambiguous instructions may trigger harmful actions, and adversarial users can manipulate tool execution to…

Artificial Intelligence · Computer Science 2026-02-04 Tianyu Chen, Chujia Hu, Ge Gao, Dongrui Liu, Xia Hu, Wenjie Wang

Computer-use agents hold the promise of assisting in a wide range of digital economic activities. However, current research has largely focused on short-horizon tasks over a limited set of software with limited economic value, such as basic…

Machine Learning · Computer Science 2026-04-08 Pranjal Aggarwal, Graham Neubig, Sean Welleck

Usability testing with experts and potential users can assess the effectiveness, efficiency, and user satisfaction of graphical user interfaces (GUIs) but doing so remains a costly and time-intensive process. Prior work has used computer…

Computation and Language · Computer Science 2026-04-30 Alice Gao, Weixi Tong, Rishab Vempati, Katharina Reinecke, R. Benjamin Shapiro, Tianyi Zhang, Jason Wu

Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications…

Computer use agents (CUA) are systems that automatically interact with graphical user interfaces (GUIs) to complete tasks. CUA have made significant progress with the advent of large vision-language models (VLMs). However, these agents…

Artificial Intelligence · Computer Science 2025-06-04 Man Luo, David Cobbley, Xin Su, Shachar Rosenman, Vasudev Lal, Shao-Yen Tseng, Phillip Howard

Autonomous agents that operate computers via Graphical User Interfaces (GUIs) often struggle with efficiency and reliability on complex, long-horizon tasks. While augmenting these agents with planners can improve task decomposition, they…

Computation and Language · Computer Science 2026-02-23 Linxin Song, Yutong Dai, Viraj Prabhu, Jieyu Zhang, Taiwei Shi, Li Li, Junnan Li, Silvio Savarese, Zeyuan Chen, Jieyu Zhao, Ran Xu, Caiming Xiong