Related papers: Benchmarking Real-Time Question Answering via Exec…

200 papers

We introduce REALTIME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). REALTIME QA inquires about the current world, and QA systems need to answer…

Computation and Language · Computer Science 2024-02-29 Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Yu, Dragomir Radev, Noah A. Smith, Yejin Choi, Kentaro Inui

Popular QA benchmarks like SQuAD have driven progress on the task of identifying answer spans within a specific passage, with models now surpassing human performance. However, retrieving relevant answers from a huge corpus of documents is…

Computation and Language · Computer Science 2020-02-13 Amin Ahmad, Noah Constant, Yinfei Yang, Daniel Cer

Retrieval-Augmented Generation (RAG) systems face challenges with complex, multihop questions, and agentic frameworks such as Search-R1 (Jin et al., 2025), which operates iteratively, have been proposed to address these complexities.…

Benchmarking is crucial for testing and validating any system, even more so in real-time systems. Typical real-time applications adhere to well-understood abstractions: they exhibit a periodic behavior, operate on a well-defined working…

Software Engineering · Computer Science 2022-08-02 Mattia Nicolella, Shahin Roozkhosh, Denis Hoornaert, Andrea Bastoni, Renato Mancuso

Evaluating large language models (LLMs) on question answering often relies on static benchmarks that reward memorization and understate the role of retrieval, failing to capture the dynamic nature of world knowledge. We present…

Computation and Language · Computer Science 2025-11-07 Heng Zhou, Ao Yu, Yuchen Fan, Jianing Shi, Li Kang, Hejia Geng, Yongting Zhang, Yutao Fan, Yuhao Wu, Tiancheng He, Yiran Qin, Lei Bai, Zhenfei Yin

Time series data are integral to critical applications across domains such as finance, healthcare, transportation, and environmental science. While recent work has begun to explore multi-task time series question answering (QA), current…

Temporal Knowledge Graph Question Answering (TKGQA) is challenging because it requires multi-hop reasoning under complex temporal constraints. Recent LLM-based approaches have improved semantic modeling for this task, but many still rely on…

Computation and Language · Computer Science 2026-03-26 Xufei Lv, Jiahui Yang, Haoyuan Sun, Xialin Su, Zhiliang Tian, Yifu Gao, Linbo Qiao, Houde Liu

We introduce DeepSearchQA, a 900-prompt benchmark for evaluating agents on difficult multi-step information-seeking tasks across 17 different fields. Unlike traditional benchmarks that target single answer retrieval or broad-spectrum…

Modern information systems require autonomous agents capable of navigating complex workflows, yet current methodologies often struggle with the transition from structured metadata parsing to general environmental perception. While the…

Artificial Intelligence · Computer Science 2026-05-28 Susanna Cifani, Mario Luca Bernardi, Marta Cimitile

Current temporal knowledge graph question answering (TKGQA) methods primarily focus on implicit temporal constraints, lacking the capability of handling more complex temporal queries, and struggle with limited reasoning abilities and error…

Computation and Language · Computer Science 2025-09-05 Zhaoyan Gong, Juan Li, Zhiqiang Liu, Lei Liang, Huajun Chen, Wen Zhang

Question answering (QA) plays a central role in financial education, yet existing large language model (LLM) approaches often fail to capture the nuanced and specialized reasoning required for financial problem-solving. The financial domain…

Computation and Language · Computer Science 2025-09-15 Andy Zhu, Yingjun Du

The conventional paradigm in neural question answering (QA) for narrative content is limited to a two-stage process: first, relevant text passages are retrieved and, subsequently, a neural network for machine comprehension extracts the…

Computation and Language · Computer Science 2019-08-13 Bernhard Kratzwald, Anna Eigenmann, Stefan Feuerriegel

Benchmarking the performance of information retrieval (IR) is mostly conducted with a fixed set of documents (static corpora). However, in realistic scenarios, this is rarely the case and the documents to be retrieved are constantly updated…

Information Retrieval · Computer Science 2024-10-08 Chaeeun Kim, Soyoung Yoon, Hyunji Lee, Joel Jang, Sohee Yang, Minjoon Seo

Understanding atomic temporal events accurately is essential for video comprehension. However, current video-language benchmarks often fall short in evaluating Large Multi-modal Models' (LMMs) temporal event understanding capabilities, as they…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Yuqi Liu, Qin Jin, Tianyuan Qu, Xuan Liu, Yang Du, Bei Yu, Jiaya Jia

We present L-MARS (Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search), a multi-agent retrieval framework for grounded legal question answering that decomposes queries into structured sub-problems, retrieves evidence…

Artificial Intelligence · Computer Science 2026-03-31 Ziqi Wang, Boqin Yuan

We introduce REAL, a benchmark and framework for multi-turn agent evaluations on deterministic simulations of real-world websites. REAL comprises high-fidelity, deterministic replicas of 11 widely-used websites across domains such as…

In this paper, we investigate which questions are challenging for retrieval-based Question Answering (QA). We (i) propose retrieval complexity (RC), a novel metric conditioned on the completeness of retrieved documents, which measures the…

Computation and Language · Computer Science 2024-06-07 Matteo Gabburo, Nicolaas Paul Jedema, Siddhant Garg, Leonardo F. R. Ribeiro, Alessandro Moschitti

Question Answering (QA) systems have traditionally relied on structured text data, but the rapid growth of multimedia content (images, audio, video, and structured metadata) has introduced new challenges and opportunities for…

Information Retrieval · Computer Science 2025-10-24 Rahul Raja, Arpita Vats

Existing temporal QA benchmarks focus on simple fact-seeking queries from news corpora, while reasoning-intensive retrieval benchmarks lack temporal grounding. However, real-world information needs often require reasoning about temporal…

Information Retrieval · Computer Science 2026-01-15 Abdelrahman Abdallah, Mohammed Ali, Muhammad Abdul-Mageed, Adam Jatowt

We propose DailyQA, an automatically updated dynamic dataset that updates questions weekly and contains answers to questions on any given date. DailyQA utilizes daily updates from Wikipedia revision logs to implement a fully automated…

Information Retrieval · Computer Science 2025-05-26 Jiehan Cheng, Zhicheng Dou