Related papers: NaviQAte: Functionality-Guided Web Application Nav…

Automated Functional Testing based on the Navigation of Web Applications

Web applications are becoming more and more complex. Testing such applications is an intricate, hard, and time-consuming activity. Therefore, testing is often poorly performed or skipped by practitioners. Test automation can help to avoid…

Software Engineering · Computer Science 2011-08-12 Boni García, Juan Carlos Dueñas

End-to-End Goal-Driven Web Navigation

We propose a goal-driven web navigation as a benchmark task for evaluating an agent with abilities to understand natural language and plan on partially observed environments. In this challenging task, an agent navigates through a website,…

Artificial Intelligence · Computer Science 2016-05-23 Rodrigo Nogueira, Kyunghyun Cho

DocQAC: Adaptive Trie-Guided Decoding for Effective In-Document Query Auto-Completion

Query auto-completion (QAC) has been widely studied in the context of web search, yet remains underexplored for in-document search, which we term DocQAC. DocQAC aims to enhance search productivity within long documents by helping users…

Information Retrieval · Computer Science 2026-04-21 Rahul Mehta, Kavin R, Indrajit Pal, Tushar Abhishek, Pawan Goyal, Manish Gupta

WebQA: Multihop and Multimodal QA

Scaling Visual Question Answering (VQA) to the open-domain and multi-hop nature of web searches requires fundamental advances in visual representation learning, knowledge aggregation, and language generation. In this work, we introduce…

Computation and Language · Computer Science 2022-03-29 Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, Yonatan Bisk

WebCanvas: Benchmarking Web Agents in Online Environments

For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the…

Computation and Language · Computer Science 2024-07-17 Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu

A Survey on Web Application Testing: A Decade of Evolution

As one of the most popular software applications, a web application is a program, accessible through the web, to dynamically generate content based on user interactions or contextual data, for example, online shopping platforms, social…

Software Engineering · Computer Science 2025-04-28 Tao Li, Rubing Huang, Chenhui Cui, Dave Towey, Lei Ma, Yuan-Fang Li, Wen Xia

WebQuest: A Benchmark for Multimodal QA on Web Page Sequences

The rise of powerful multimodal LLMs has enhanced the viability of building web agents which can, with increasing levels of autonomy, assist users to retrieve information and complete tasks on various human-computer interfaces. It is hence…

Information Retrieval · Computer Science 2024-09-26 Maria Wang, Srinivas Sunkara, Gilles Baechler, Jason Lin, Yun Zhu, Fedir Zubach, Lei Shu, Jindong Chen

NavAI: A Generalizable LLM Framework for Navigation Tasks in Virtual Reality Environments

Navigation is one of the fundamental tasks for automated exploration in Virtual Reality (VR). Existing technologies primarily focus on path optimization in 360-degree image datasets and 3D simulators, which cannot be directly applied to…

Software Engineering · Computer Science 2026-01-07 Xue Qin, Matthew DiGiovanni

Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up Questions

Graphical user interface (GUI) automation agents are emerging as powerful tools, enabling humans to accomplish increasingly complex tasks on smart devices. However, users often inadvertently omit key information when conveying tasks, which…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Ziming Cheng, Zhiyuan Huang, Junting Pan, Zhaohui Hou, Mingjie Zhan

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents. Existing web agents typically…

Computation and Language · Computer Science 2024-06-10 Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu

Automated Web Application Testing: End-to-End Test Case Generation with Large Language Models and Screen Transition Graphs

Web applications are critical to modern software ecosystems, yet ensuring their reliability remains challenging due to the complexity and dynamic nature of web interfaces. Recent advances in large language models (LLMs) have shown promise…

Software Engineering · Computer Science 2026-02-20 Nguyen-Khang Le, Quan Minh Bui, Minh Ngoc Nguyen, Hiep Nguyen, Trung Vo, Son T. Luu, Shoshin Nomura, Minh Le Nguyen

FlowSearch: Advancing deep research with dynamic structured knowledge flow

Deep research is an inherently challenging task that demands both breadth and depth of thinking. It involves navigating diverse knowledge spaces and reasoning over complex, multi-step dependencies, which presents substantial challenges for…

Artificial Intelligence · Computer Science 2026-01-13 Yusong Hu, Runmin Ma, Yue Fan, Jinxin Shi, Zongsheng Cao, Yuhao Zhou, Jiakang Yuan, Shuaiyu Zhang, Shiyang Feng, Xiangchao Yan, Shufei Zhang, Wenlong Zhang, Lei Bai, Bo Zhang

Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types

Visual Question-Answering (VQA) has become key to user experience, particularly after improved generalization capabilities of Vision-Language Models (VLMs). But evaluating VLMs for an application requirement using a standardized framework…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Neelabh Sinha, Vinija Jain, Aman Chadha

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Graphical User Interface (GUI) agents show great potential for enabling foundation models to complete real-world tasks, revolutionizing human-computer interaction and improving human productivity. In this report, we present OmegaUse, a…

Artificial Intelligence · Computer Science 2026-01-29 Le Zhang, Yixiong Xiao, Xinjiang Lu, Jingjia Cao, Yusai Zhao, Jingbo Zhou, Lang An, Zikan Feng, Wanxiang Sha, Yu Shi, Congxi Xiao, Jian Xiong, Yankai Zhang, Hua Wu, Haifeng Wang

FinQAPT: Empowering Financial Decisions with End-to-End LLM-driven Question Answering Pipeline

Financial decision-making hinges on the analysis of relevant information embedded in the enormous volume of documents in the financial domain. To address this challenge, we developed FinQAPT, an end-to-end pipeline that streamlines the…

Information Retrieval · Computer Science 2024-11-04 Kuldeep Singh, Simerjot Kaur, Charese Smiley

GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning

Graphical User Interface (GUI) Agents, powered by large language and vision-language models, hold promise for enabling end-to-end automation in digital environments. However, their progress is fundamentally constrained by the scarcity of…

Machine Learning · Computer Science 2025-09-22 Musen Lin, Minghao Liu, Taoran Lu, Lichen Yuan, Yiwei Liu, Haonan Xu, Yu Miao, Yuhao Chao, Zhaojian Li

VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering

Embodied Question Answering (EQA) is a recently proposed task, where an agent is placed in a rich 3D environment and must act based solely on its egocentric input to answer a given question. The desired outcome is that the agent learns to…

Computer Vision and Pattern Recognition · Computer Science 2019-08-15 Cătălina Cangea, Eugene Belilovsky, Pietro Liò, Aaron Courville

WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions

Training web agents to navigate complex, real-world websites requires them to master subtasks - short-horizon interactions on multiple UI components (e.g., choosing the correct date in a date picker, or scrolling in a container…

Machine Learning · Computer Science 2026-05-20 Sanjari Srivastava, Gang Li, Cheng Chang, Rishu Garg, Manpreet Kaur, Charlene Y. Lee, Yuezhang Li, Yining Mao, Ignacio Cases, Yanan Xie, Peng Qi

Interactive Query Formulation using Query By Navigation

Effective information disclosure in the context of databases with a large conceptual schema is known to be a non-trivial problem. In particular the formulation of ad-hoc queries is a major problem in such contexts. Existing approaches for…

Information Retrieval · Computer Science 2021-05-21 H. A. Proper

NavBench: Probing Multimodal Large Language Models for Embodied Navigation

Multimodal Large Language Models (MLLMs) have demonstrated strong generalization in vision-language tasks, yet their ability to understand and act within embodied environments remains underexplored. We present NavBench, a benchmark to…

Computer Vision and Pattern Recognition · Computer Science 2025-06-03 Yanyuan Qiao, Haodong Hong, Wenqi Lyu, Dong An, Siqi Zhang, Yutong Xie, Xinyu Wang, Qi Wu