Related papers: Mobile-Agent-E: Self-Evolving Mobile Assistant for…

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

Mobile device agents based on Multimodal Large Language Models (MLLMs) are becoming a popular application. In this paper, we introduce Mobile-Agent, an autonomous multi-modal mobile device agent. Mobile-Agent first leverages visual perception…

Computation and Language · Computer Science · 2024-04-19 · Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

In the field of MLLM-based GUI agents, compared to smartphones, the PC scenario not only features a more complex interactive environment, but also involves more intricate intra- and inter-app workflows. To address these issues, we propose a…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-24 · Haowei Liu, Xi Zhang, Haiyang Xu, Yuyang Wanyan, Junyang Wang, Ming Yan, Ji Zhang, Chunfeng Yuan, Changsheng Xu, Weiming Hu, Fei Huang

AppAgent: Multimodal Agents as Smartphone Users

Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-25 · Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu

MobileRAG: Enhancing Mobile Agent with Retrieval-Augmented Generation

Smartphones have become indispensable in people's daily lives, permeating nearly every aspect of modern society. With the continuous advancement of large language models (LLMs), numerous LLM-based mobile agents have emerged. These agents…

Computation and Language · Computer Science · 2025-09-05 · Gowen Loo, Chang Liu, Qinghong Yin, Xiang Chen, Jiawei Chen, Jingyuan Zhang, Yu Tian

Me-Agent: A Personalized Mobile Agent with Two-Level User Habit Learning for Enhanced Interaction

Large Language Model (LLM)-based mobile agents have made significant performance advancements. However, these agents often follow explicit user instructions while overlooking personalized needs, leading to significant limitations for real…

Computation and Language · Computer Science · 2026-01-29 · Shuoxin Wang, Chang Liu, Gowen Loo, Lifan Zheng, Kaiwen Wei, Xinyi Zeng, Jingyuan Zhang, Yu Tian

Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control

Building agents that autonomously operate mobile devices has attracted increasing attention. While Vision-Language Models (VLMs) show promise, most existing approaches rely on direct state-to-action mappings, which lack structured reasoning…

Artificial Intelligence · Computer Science · 2026-02-09 · Zhe Wu, Hongjin Lu, Junliang Xing, Changhao Zhang, Yuxuan Li, Yin Zhu, Yuhao Yang, Yuheng Jing, Kai Li, Kun Shao, Jianye Hao, Jun Wang, Yuanchun Shi

MobiAgent: A Systematic Framework for Customizable Mobile Agents

With the rapid advancement of Vision-Language Models (VLMs), GUI-based mobile agents have emerged as a key development direction for intelligent mobile systems. However, existing agent models continue to face significant challenges in…

Multiagent Systems · Computer Science · 2025-09-03 · Cheng Zhang, Erhu Feng, Xi Zhao, Yisheng Zhao, Wangbo Gong, Jiahui Sun, Dong Du, Zhichao Hua, Yubin Xia, Haibo Chen

MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices

The attainment of autonomous operations in mobile computing devices has consistently been a goal of human pursuit. With the development of Large Language Models (LLMs) and Visual Language Models (VLMs), this aspiration is progressively…

Artificial Intelligence · Computer Science · 2024-07-08 · Jiayi Zhang, Chuang Zhao, Yihan Zhao, Zhaoyang Yu, Ming He, Jianping Fan

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

Large Language Model (LLM)-based agents have recently shown impressive capabilities in complex reasoning and tool use via multi-step interactions with their environments. While these agents have the potential to tackle complicated tasks,…

Artificial Intelligence · Computer Science · 2025-11-04 · Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Hongzhang Liu, Ronghao Chen, Yangfan He, Daxin Jiang, Binxing Jiao, Chen Hu, Huacan Wang

MobileUse: A GUI Agent with Hierarchical Reflection for Autonomous Mobile Operation

Recent advances in Multimodal Large Language Models (MLLMs) have enabled the development of mobile agents that can understand visual inputs and follow user instructions, unlocking new possibilities for automating complex tasks on mobile…

Robotics · Computer Science · 2025-07-24 · Ning Li, Xiangmou Qu, Jiamu Zhou, Jun Wang, Muning Wen, Kounianhua Du, Xingyu Lou, Qiuying Peng, Jun Wang, Weinan Zhang

Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation

Mobile agents show immense potential, yet current state-of-the-art (SoTA) agents exhibit inadequate success rates on real-world, long-horizon, cross-application tasks. We attribute this bottleneck to the agents' excessive reliance on…

Artificial Intelligence · Computer Science · 2026-03-13 · Yuxiang Zhou, Jichang Li, Yanhao Zhang, Haonan Lu, Guanbin Li

AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

Recent advancements in Large Language Models (LLMs) have led to the development of intelligent LLM-based agents capable of interacting with graphical user interfaces (GUIs). These agents demonstrate strong reasoning and adaptability,…

Artificial Intelligence · Computer Science · 2025-04-16 · Wenjia Jiang, Yangyang Zhuang, Chenxi Song, Xu Yang, Joey Tianyi Zhou, Chi Zhang

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as…

Computation and Language · Computer Science · 2024-06-04 · Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

Foundations and Recent Trends in Multimodal Mobile Agents: A Survey

Mobile agents are essential for automating tasks in complex and dynamic mobile environments. As foundation models evolve, the demands for agents that can adapt in real-time and process multimodal data have grown. This survey provides a…

Artificial Intelligence · Computer Science · 2025-09-16 · Biao Wu, Yanda Li, Zhiwei Zhang, Yunchao Wei, Meng Fang, Ling Chen

ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation

Mobile GUI agents exhibit substantial potential to facilitate and automate the execution of user tasks on mobile phones. However, existing mobile GUI agents predominantly privilege autonomous operation and neglect the necessity of active user…

Artificial Intelligence · Computer Science · 2025-10-10 · Haitao Jia, Ming He, Zimo Yin, Likang Wu, Jianping Fan, Jitao Sang

Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems

AI Agents are changing the way work gets done, both in consumer and enterprise domains. However, the design patterns and architectures to build highly capable agents or multi-agent systems are still developing, and the understanding of the…

Artificial Intelligence · Computer Science · 2024-07-19 · Tamer Abuelsaad, Deepak Akkil, Prasenjit Dey, Ashish Jagmohan, Aditya Vempaty, Ravi Kokku

MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions

Mobile phone agents can assist people in automating daily tasks on their phones, which have emerged as a pivotal research spotlight. However, existing procedure-oriented agents struggle with cross-app instructions, due to the following…

Multiagent Systems · Computer Science · 2025-02-25 · Yuxuan Liu, Hongda Sun, Wei Liu, Jian Luan, Bo Du, Rui Yan

S1-NexusAgent: a Self-Evolving Agent Framework for Multidisciplinary Scientific Research

Modern scientific research relies on large-scale data, complex workflows, and specialized tools, which existing LLMs and tool-based agents struggle to handle due to limitations in long-horizon planning, robust goal maintenance, and…

Artificial Intelligence · Computer Science · 2026-02-11 · NexusAgent Team

MobileAgent: enhancing mobile control via human-machine interaction and SOP integration

Agents centered around Large Language Models (LLMs) are now capable of automating mobile device operations for users. After fine-tuning to learn a user's mobile operations, these agents can adhere to high-level user instructions online.…

Human-Computer Interaction · Computer Science · 2024-01-18 · Tinghe Ding

MemEvolve: Meta-Evolution of Agent Memory Systems

Self-evolving memory systems are unprecedentedly reshaping the evolutionary paradigm of large language model (LLM)-based agents. Prior work has predominantly relied on manually engineered memory architectures to store trajectories, distill…

Computation and Language · Computer Science · 2025-12-23 · Guibin Zhang, Haotian Ren, Chong Zhan, Zhenhong Zhou, Junhao Wang, He Zhu, Wangchunshu Zhou, Shuicheng Yan