Related papers: NaviQAte: Functionality-Guided Web Application Nav…

200 papers

Web applications are becoming more and more complex. Testing such applications is an intricate, hard, and time-consuming activity. Therefore, testing is often poorly performed or skipped by practitioners. Test automation can help to avoid…

Software Engineering · Computer Science 2011-08-12 Boni García , Juan Carlos Dueñas

We propose goal-driven web navigation as a benchmark task for evaluating an agent with abilities to understand natural language and plan on partially observed environments. In this challenging task, an agent navigates through a website,…

Artificial Intelligence · Computer Science 2016-05-23 Rodrigo Nogueira , Kyunghyun Cho

Query auto-completion (QAC) has been widely studied in the context of web search, yet remains underexplored for in-document search, which we term DocQAC. DocQAC aims to enhance search productivity within long documents by helping users…

Information Retrieval · Computer Science 2026-04-21 Rahul Mehta , Kavin R , Indrajit Pal , Tushar Abhishek , Pawan Goyal , Manish Gupta

Scaling Visual Question Answering (VQA) to the open-domain and multi-hop nature of web searches requires fundamental advances in visual representation learning, knowledge aggregation, and language generation. In this work, we introduce…

Computation and Language · Computer Science 2022-03-29 Yingshan Chang , Mridu Narang , Hisami Suzuki , Guihong Cao , Jianfeng Gao , Yonatan Bisk

For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the…

Computation and Language · Computer Science 2024-07-17 Yichen Pan , Dehan Kong , Sida Zhou , Cheng Cui , Yifei Leng , Bing Jiang , Hangyu Liu , Yanyi Shang , Shuyan Zhou , Tongshuang Wu , Zhengyang Wu

As one of the most popular software applications, a web application is a program, accessible through the web, that dynamically generates content based on user interactions or contextual data, for example, online shopping platforms, social…

Software Engineering · Computer Science 2025-04-28 Tao Li , Rubing Huang , Chenhui Cui , Dave Towey , Lei Ma , Yuan-Fang Li , Wen Xia

The rise of powerful multimodal LLMs has enhanced the viability of building web agents which can, with increasing levels of autonomy, assist users to retrieve information and complete tasks on various human-computer interfaces. It is hence…

Information Retrieval · Computer Science 2024-09-26 Maria Wang , Srinivas Sunkara , Gilles Baechler , Jason Lin , Yun Zhu , Fedir Zubach , Lei Shu , Jindong Chen

Navigation is one of the fundamental tasks for automated exploration in Virtual Reality (VR). Existing technologies primarily focus on path optimization in 360-degree image datasets and 3D simulators, which cannot be directly applied to…

Software Engineering · Computer Science 2026-01-07 Xue Qin , Matthew DiGiovanni

Graphical user interface (GUI) automation agents are emerging as powerful tools, enabling humans to accomplish increasingly complex tasks on smart devices. However, users often inadvertently omit key information when conveying tasks, which…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Ziming Cheng , Zhiyuan Huang , Junting Pan , Zhaohui Hou , Mingjie Zhan

The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents. Existing web agents typically…

Computation and Language · Computer Science 2024-06-10 Hongliang He , Wenlin Yao , Kaixin Ma , Wenhao Yu , Yong Dai , Hongming Zhang , Zhenzhong Lan , Dong Yu

Web applications are critical to modern software ecosystems, yet ensuring their reliability remains challenging due to the complexity and dynamic nature of web interfaces. Recent advances in large language models (LLMs) have shown promise…

Software Engineering · Computer Science 2026-02-20 Nguyen-Khang Le , Quan Minh Bui , Minh Ngoc Nguyen , Hiep Nguyen , Trung Vo , Son T. Luu , Shoshin Nomura , Minh Le Nguyen

Deep research is an inherently challenging task that demands both breadth and depth of thinking. It involves navigating diverse knowledge spaces and reasoning over complex, multi-step dependencies, which presents substantial challenges for…

Visual Question-Answering (VQA) has become key to user experience, particularly after improved generalization capabilities of Vision-Language Models (VLMs). But evaluating VLMs for an application requirement using a standardized framework…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Neelabh Sinha , Vinija Jain , Aman Chadha

Graphical User Interface (GUI) agents show great potential for enabling foundation models to complete real-world tasks, revolutionizing human-computer interaction and improving human productivity. In this report, we present OmegaUse, a…

Financial decision-making hinges on the analysis of relevant information embedded in the enormous volume of documents in the financial domain. To address this challenge, we developed FinQAPT, an end-to-end pipeline that streamlines the…

Information Retrieval · Computer Science 2024-11-04 Kuldeep Singh , Simerjot Kaur , Charese Smiley

Graphical User Interface (GUI) Agents, powered by large language and vision-language models, hold promise for enabling end-to-end automation in digital environments. However, their progress is fundamentally constrained by the scarcity of…

Machine Learning · Computer Science 2025-09-22 Musen Lin , Minghao Liu , Taoran Lu , Lichen Yuan , Yiwei Liu , Haonan Xu , Yu Miao , Yuhao Chao , Zhaojian Li

Embodied Question Answering (EQA) is a recently proposed task, where an agent is placed in a rich 3D environment and must act based solely on its egocentric input to answer a given question. The desired outcome is that the agent learns to…

Computer Vision and Pattern Recognition · Computer Science 2019-08-15 Cătălina Cangea , Eugene Belilovsky , Pietro Liò , Aaron Courville

Training web agents to navigate complex, real-world websites requires them to master subtasks: short-horizon interactions on multiple UI components (e.g., choosing the correct date in a date picker, or scrolling in a container…

Effective information disclosure in the context of databases with a large conceptual schema is known to be a non-trivial problem. In particular, the formulation of ad-hoc queries is a major problem in such contexts. Existing approaches for…

Information Retrieval · Computer Science 2021-05-21 H. A. Proper

Multimodal Large Language Models (MLLMs) have demonstrated strong generalization in vision-language tasks, yet their ability to understand and act within embodied environments remains underexplored. We present NavBench, a benchmark to…

Computer Vision and Pattern Recognition · Computer Science 2025-06-03 Yanyuan Qiao , Haodong Hong , Wenqi Lyu , Dong An , Siqi Zhang , Yutong Xie , Xinyu Wang , Qi Wu