Related papers: WebLists: Extracting Structured Information From C…

WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks

Large language model (LLM) agents are becoming competent at straightforward web tasks, such as opening an item page or submitting a form, but still struggle with objectives that require long horizon navigation, large scale information…

Artificial Intelligence · Computer Science · 2025-10-09 · Jingbo Yang, Bairu Hou, Wei Wei, Shiyu Chang, Yujia Bao

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, the performance on real-world websites has still suffered from (1) open domainness, (2)…

Machine Learning · Computer Science · 2024-02-27 · Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust

BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

Efficiently solving real-world problems with LLMs increasingly hinges on their ability to interact with dynamic web environments and autonomously acquire external information. While recent research like Search-R1 and WebDancer demonstrates…

Computation and Language · Computer Science · 2025-10-15 · Tao Yu, Zhengbo Zhang, Zhiheng Lyu, Junhao Gong, Hongzhu Yi, Xinming Wang, Yuxuan Zhou, Jiabing Yang, Ping Nie, Yan Huang, Wenhu Chen

WALT: Web Agents that Learn Tools

Web agents promise to automate complex browser tasks, but current methods remain brittle -- relying on step-by-step UI interactions and heavy LLM reasoning that break under dynamic layouts and long horizons. Humans, by contrast, exploit…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-03 · Viraj Prabhu, Yutong Dai, Matthew Fernandez, Jing Gu, Krithika Ramakrishnan, Yanqi Luo, Silvio Savarese, Caiming Xiong, Junnan Li, Zeyuan Chen, Ran Xu

WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation

Large Language Model (LLM) web agents often struggle with long-horizon web navigation and web task completion in new websites, producing inefficient action sequences unless fine-tuned on environment-specific data. We show that…

Machine Learning · Computer Science · 2025-12-23 · Jiali Cheng, Anjishnu Kumar, Roshan Lal, Rishi Rajasekaran, Hani Ramezani, Omar Zia Khan, Oleg Rokhlenko, Sunny Chiu-Webster, Gang Hua, Hadi Amiri

FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often exceed tens of thousands of tokens. This saturates context limits and increases computational cost…

Computation and Language · Computer Science · 2025-10-06 · Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar, Xing Han Lù, Léo Boisvert, Massimo Caccia, Jérémy Espinas, Alexandre Aussem, Véronique Eglin, Alexandre Lacoste

Mind2Web: Towards a Generalist Agent for the Web

We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated…

Computation and Language · Computer Science · 2023-12-12 · Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, Yu Su

WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks

Powered by a large language model (LLM), a web browsing agent operates web browsers in a human-like manner and offers a highly transparent path toward automating a wide range of everyday tasks. As web agents become increasingly capable and…

Computation and Language · Computer Science · 2025-06-03 · Atsuyuki Miyai, Zaiying Zhao, Kazuki Egashira, Atsuki Sato, Tatsumi Sunada, Shota Onohara, Hiromasa Yamanishi, Mashiro Toyooka, Kunato Nishina, Ryoma Maeda, Kiyoharu Aizawa, Toshihiko Yamasaki

LongDA: Benchmarking LLM Agents for Long-Document Data Analysis

We introduce LongDA, a data analysis benchmark for evaluating LLM-based agents under documentation-intensive analytical workflows. In contrast to existing benchmarks that assume well-specified schemas and inputs, LongDA targets real-world…

Digital Libraries · Computer Science · 2026-01-13 · Yiyang Li, Zheyuan Zhang, Tianyi Ma, Zehong Wang, Keerthiram Murugesan, Chuxu Zhang, Yanfang Ye

Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

Agentic web search increasingly faces two distinct demands: deep reasoning over a single target, and structured aggregation across many entities and heterogeneous sources. Current systems struggle on both fronts. Breadth-oriented tasks…

Artificial Intelligence · Computer Science · 2026-05-01 · Yuxuan Huang, Yihang Chen, Zhiyuan He, Yuxiang Chen, Ka Yiu Lee, Huichi Zhou, Weilin Luo, Meng Fang, Jun Wang

Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory

Autonomous web agents powered by large language models (LLMs) show strong potential for performing goal-oriented tasks such as information retrieval, report generation, and online transactions. These agents mark a key step toward practical…

Artificial Intelligence · Computer Science · 2025-10-24 · Shiqi He, Yue Cui, Xinyu Ma, Yaliang Li, Bolin Ding, Mosharaf Chowdhury

ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data

Large Language Model (LLM) agents are rapidly improving to handle increasingly complex web-based tasks. Most of these agents rely on general-purpose, proprietary models like GPT-4 and focus on designing better prompts to improve their…

Computation and Language · Computer Science · 2024-12-06 · Junhong Shen, Atishay Jain, Zedian Xiao, Ishan Amlekar, Mouad Hadji, Aaron Podolny, Ameet Talwalkar

STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Recent advances in large language models (LLMs) have enabled agentic systems for sequential decision-making. Such agents must perceive their environment, reason across multiple time steps, and take actions that optimize long-term…

Artificial Intelligence · Computer Science · 2026-03-10 · Elita Lobo, Xu Chen, Jingjing Meng, Nan Xi, Yang Jiao, Chirag Agarwal, Yair Zick, Yan Gao

WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning

Large language model (LLM)-empowered web agents enable the automation of complex, real-time web navigation tasks in enterprise environments. However, existing web agents relying on supervised fine-tuning (SFT) often struggle with generalization…

Computation and Language · Computer Science · 2025-06-10 · Yuchen Zhuang, Di Jin, Jiaao Chen, Wenqi Shi, Hanrui Wang, Chao Zhang

WAREX: Web Agent Reliability Evaluation on Existing Benchmarks

Recent advances in browser-based LLM agents have shown promise for automating tasks ranging from simple form filling to hotel booking or online shopping. Current benchmarks measure agent performance in controlled environments, such as…

Artificial Intelligence · Computer Science · 2025-10-07 · Su Kara, Fazle Faisal, Suman Nath

Prune4Web: DOM Tree Pruning Programming for Web Agent

Web automation employs intelligent agents to execute high-level tasks by mimicking human interactions with web interfaces. Despite the capabilities of recent Large Language Model (LLM)-based web agents, navigating complex, real-world…

Artificial Intelligence · Computer Science · 2025-11-27 · Jiayuan Zhang, Kaiquan Chen, Zhihao Lu, Enshen Zhou, Qian Yu, Jing Zhang

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents

Autonomy via agents using large language models (LLMs) for personalized, standardized tasks boosts human efficiency. Automating web tasks (like booking hotels within a budget) is increasingly sought after. Fulfilling practical needs, the…

Artificial Intelligence · Computer Science · 2025-05-27 · Ke Yang, Yao Liu, Sapana Chaudhary, Rasool Fakoor, Pratik Chaudhari, George Karypis, Huzefa Rangwala

Executable Code Actions Elicit Better LLM Agents

Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions…

Computation and Language · Computer Science · 2024-06-10 · Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji

InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks

In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks. These tasks require agents to solve complex tasks end-to-end by interacting with an execution…

Computation and Language · Computer Science · 2024-03-12 · Xueyu Hu, Ziyu Zhao, Shuang Wei, Ziwei Chai, Qianli Ma, Guoyin Wang, Xuwu Wang, Jing Su, Jingjing Xu, Ming Zhu, Yao Cheng, Jianbo Yuan, Jiwei Li, Kun Kuang, Yang Yang, Hongxia Yang, Fei Wu

An Index-based Approach for Efficient and Effective Web Content Extraction

As web agents (e.g., Deep Research) routinely consume massive volumes of web pages to gather and analyze information, LLM context management -- under large token budgets and low signal density -- emerges as a foundational, high-importance,…

Information Retrieval · Computer Science · 2025-12-09 · Yihan Chen, Benfeng Xu, Xiaorui Wang, Zhendong Mao