Related papers: Executable Agentic Memory for GUI Agent

200 papers

Despite recent progress, Graphic User Interface (GUI) agents powered by Large Language Models (LLMs) struggle with complex mobile tasks due to limited app-specific knowledge. While UI Transition Graphs (UTGs) offer structured navigation…

Contemporary GUI agents, while increasingly capable due to advances in Large Vision-Language Models (VLMs), often operate with a critical limitation: they treat each task in isolation, lacking a mechanism to systematically learn from past…

Artificial Intelligence · Computer Science 2026-04-13 Runze Li , Yuwen Zhai , Bo Xu , LiWu Xu , Nian Shi , Wei Zhang , Ran Lin , Liang Wang

Recent advancements in Large Language Models (LLMs) have led to the development of intelligent LLM-based agents capable of interacting with graphical user interfaces (GUIs). These agents demonstrate strong reasoning and adaptability,…

Artificial Intelligence · Computer Science 2025-04-16 Wenjia Jiang , Yangyang Zhuang , Chenxi Song , Xu Yang , Joey Tianyi Zhou , Chi Zhang

To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates…

Artificial Intelligence · Computer Science 2026-04-15 Zhaofen Wu , Hanrong Zhang , Fulin Lin , Wujiang Xu , Xinran Xu , Yankai Chen , Henry Peng Zou , Shaowen Chen , Weizhi Zhang , Xue Liu , Philip S. Yu , Hongwei Wang

Existing Graphical User Interface (GUI) agents operate through step-by-step calls to vision language models (taking a screenshot, reasoning about the next action, executing it, then repeating on the new page), resulting in high costs and…

Artificial Intelligence · Computer Science 2026-02-25 Hongbin Zhong , Fazle Faisal , Luis França , Tanakorn Leesatapornwongsa , Adriana Szekeres , Kexin Rong , Suman Nath

Manipulative communication, such as gaslighting, guilt-tripping, and emotional coercion, is often difficult for individuals to recognize. Existing agentic AI systems lack the structured, longitudinal memory to track these subtle,…

Artificial Intelligence · Computer Science 2026-03-06 Ratna Kandala , Niva Manchanda , Akshata Kishore Moharir , Ananth Kandala

Large Language Models (LLMs) have demonstrated capabilities across various applications but face challenges such as hallucination, limited reasoning abilities, and factual inconsistencies, especially when tackling complex, domain-specific…

Large-scale social simulators are essential for studying complex social patterns. Prior work explores hybrid methods to scale up simulations, combining large language models (LLM)-based agents with numerical agent-based models (ABM).…

Artificial Intelligence · Computer Science 2026-05-11 Xuan Zhou , Yanhui Sun , Hantao Yao , Allen He , Yongdong Zhang , Wu Liu

Mobile Graphical User Interface (GUI) agents aim to autonomously complete tasks within or across apps based on user instructions. While recent Multimodal Large Language Models (MLLMs) enable these agents to interpret UI screens and perform…

Artificial Intelligence · Computer Science 2025-11-20 Linqiang Guo , Wei Liu , Yi Wen Heng , Tse-Hsun Chen , Yang Wang

Large language model (LLM)-based agents have demonstrated strong capabilities in complex reasoning and problem solving through multi-step interactions, yet most deployed agents remain behaviorally static, with knowledge acquired during…

Artificial Intelligence · Computer Science 2026-05-19 Yuxin Jin , Siyuan Zhang , Hanchen Wang , Lu Qin , Ying Zhang , Wenjie Zhang

Effective tool pre-selection via retrieval is essential for AI agents to select from a vast array of tools when identifying and planning actions in the context of complex user queries. Despite its central role in planning, this aspect…

Artificial Intelligence · Computer Science 2025-11-14 Sahil Bansal , Sai Shruthi Sistla , Aarti Arikatala , Sebastian Schreiber

In this paper, we aim to improve the reasoning ability of large language models (LLMs) over knowledge graphs (KGs) to answer complex questions. Inspired by existing methods that design the interaction strategy between LLMs and KG, we…

Computation and Language · Computer Science 2024-02-20 Jinhao Jiang , Kun Zhou , Wayne Xin Zhao , Yang Song , Chen Zhu , Hengshu Zhu , Ji-Rong Wen

While Large Language Models (LLMs) have demonstrated strong zero-shot reasoning capabilities, their deployment as embodied agents still faces fundamental challenges in long-horizon planning. Unlike open-ended text generation, embodied…

Computation and Language · Computer Science 2026-05-19 Xiang Li , Ning Yan , Masood Mortazavi

Replicating AI research is a crucial yet challenging task for large language model (LLM) agents. Existing approaches often struggle to generate executable code, primarily due to insufficient background knowledge and the limitations of…

Computation and Language · Computer Science 2026-04-21 Yujie Luo , Zhuoyun Yu , Xuehai Wang , Yuqi Zhu , Ningyu Zhang , Lanning Wei , Lun Du , Da Zheng , Huajun Chen

Memory emerges as the core module in the Large Language Model (LLM)-based agents for long-horizon complex tasks (e.g., multi-turn dialogue, game playing, scientific discovery), where memory can enable knowledge accumulation, iterative…

Long-horizon GUI agents are a key step toward real-world deployment, yet effective interaction memory under prevailing paradigms remains under-explored. Replaying full interaction sequences is redundant and amplifies noise, while summaries…

Graphical User Interface (GUI) agents powered by Multimodal Large Language Models (MLLMs) promise human-like interaction with software applications, yet long-horizon tasks remain challenging due to memory limitations. Existing approaches…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Zikang Liu , Junyi Li , Wayne Xin Zhao , Dawei Gao , Yaliang Li , Ji-rong Wen

Autonomous Graphical User Interface (GUI) agents often struggle with multi-step tasks due to constrained context windows and static policies that fail to adapt to dynamic environments. To address these limitations, this work proposes the…

Machine Learning · Computer Science 2026-05-19 Shilong Jin , Lanjun Wang , Zhuosheng Zhang

GUI agents are beginning to operate the web, mobile, and desktop as interactive worlds, where successful control depends on carrying forward visual, procedural, and task-level evidence beyond the fleeting present screen. Yet most agents…

Computation and Language · Computer Science 2026-05-12 Guibin Zhang , Yaohui Ling , Fanci Meng , Kun Wang , Shuicheng Yan

Mobile GUI agents powered by large foundation models enable autonomous task execution, but frequent updates altering UI appearance and reorganizing workflows cause agents trained on historical data to fail. Despite surface changes,…

Artificial Intelligence · Computer Science 2026-02-03 Libo Sun , Jiwen Zhang , Siyuan Wang , Zhongyu Wei