Related papers: Benchmarking Real-Time Question Answering via Exec…

RealTime QA: What's the Answer Right Now?

We introduce REALTIME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). REALTIME QA inquires about the current world, and QA systems need to answer…

Computation and Language · Computer Science 2024-02-29 Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Yu, Dragomir Radev, Noah A. Smith, Yejin Choi, Kentaro Inui

ReQA: An Evaluation for End-to-End Answer Retrieval Models

Popular QA benchmarks like SQuAD have driven progress on the task of identifying answer spans within a specific passage, with models now surpassing human performance. However, retrieving relevant answers from a huge corpus of documents is…

Computation and Language · Computer Science 2020-02-13 Amin Ahmad, Noah Constant, Yinfei Yang, Daniel Cer

Test-Time Strategies for More Efficient and Accurate Agentic RAG

Retrieval-Augmented Generation (RAG) systems face challenges with complex, multihop questions, and agentic frameworks such as Search-R1 (Jin et al., 2025), which operates iteratively, have been proposed to address these complexities.…

Information Retrieval · Computer Science 2026-03-16 Brian Zhang, Deepti Guntur, Zhiyang Zuo, Abhinav Sharma, Shreyas Chaudhari, Wenlong Zhao, Franck Dernoncourt, Puneet Mathur, Ryan Rossi, Nedim Lipka

RT-Bench: an Extensible Benchmark Framework for the Analysis and Management of Real-Time Applications

Benchmarking is crucial for testing and validating any system, even more so in real-time systems. Typical real-time applications adhere to well-understood abstractions: they exhibit a periodic behavior, operate on a well-defined working…

Software Engineering · Computer Science 2022-08-02 Mattia Nicolella, Shahin Roozkhosh, Denis Hoornaert, Andrea Bastoni, Renato Mancuso

LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge

Evaluating large language models (LLMs) on question answering often relies on static benchmarks that reward memorization and understate the role of retrieval, failing to capture the dynamic nature of world knowledge. We present…

Computation and Language · Computer Science 2025-11-07 Heng Zhou, Ao Yu, Yuchen Fan, Jianing Shi, Li Kang, Hejia Geng, Yongting Zhang, Yutao Fan, Yuhao Wu, Tiancheng He, Yiran Qin, Lei Bai, Zhenfei Yin

TSAQA: Time Series Analysis Question And Answering Benchmark

Time series data are integral to critical applications across domains such as finance, healthcare, transportation, and environmental science. While recent work has begun to explore multi-task time series question answering (QA), current…

Artificial Intelligence · Computer Science 2026-02-02 Baoyu Jing, Sanhorn Chen, Lecheng Zheng, Boyu Liu, Zihao Li, Jiaru Zou, Tianxin Wei, Zhining Liu, Zhichen Zeng, Ruizhong Qiu, Xiao Lin, Yuchen Yan, Dongqi Fu, Jingchao Ni, Jingrui He, Hanghang Tong

Let the Agent Search: Autonomous Exploration Beats Rigid Workflows in Temporal Question Answering

Temporal Knowledge Graph Question Answering (TKGQA) is challenging because it requires multi-hop reasoning under complex temporal constraints. Recent LLM-based approaches have improved semantic modeling for this task, but many still rely on…

Computation and Language · Computer Science 2026-03-26 Xufei Lv, Jiahui Yang, Haoyuan Sun, Xialin Su, Zhiliang Tian, Yifu Gao, Linbo Qiao, Houde Liu

DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents

We introduce DeepSearchQA, a 900-prompt benchmark for evaluating agents on difficult multi-step information-seeking tasks across 17 different fields. Unlike traditional benchmarks that target single answer retrieval or broad-spectrum…

Computation and Language · Computer Science 2026-01-30 Nikita Gupta, Riju Chatterjee, Lukas Haas, Connie Tao, Andrew Wang, Chang Liu, Hidekazu Oiwa, Elena Gribovskaya, Jan Ackermann, John Blitzer, Sasha Goldshtein, Dipanjan Das

Adaptive Multimodal Agents-Based Framework for Automatic Workflow Execution

Modern information systems require autonomous agents capable of navigating complex workflows, yet current methodologies often struggle with the transition from structured metadata parsing to general environmental perception. While the…

Artificial Intelligence · Computer Science 2026-05-28 Susanna Cifani, Mario Luca Bernardi, Marta Cimitile

RTQA: Recursive Thinking for Complex Temporal Knowledge Graph Question Answering with Large Language Models

Current temporal knowledge graph question answering (TKGQA) methods primarily focus on implicit temporal constraints, lacking the capability of handling more complex temporal queries, and struggle with limited reasoning abilities and error…

Computation and Language · Computer Science 2025-09-05 Zhaoyan Gong, Juan Li, Zhiqiang Liu, Lei Liang, Huajun Chen, Wen Zhang

A Role-Aware Multi-Agent Framework for Financial Education Question Answering with LLMs

Question answering (QA) plays a central role in financial education, yet existing large language model (LLM) approaches often fail to capture the nuanced and specialized reasoning required for financial problem-solving. The financial domain…

Computation and Language · Computer Science 2025-09-15 Andy Zhu, Yingjun Du

RankQA: Neural Question Answering with Answer Re-Ranking

The conventional paradigm in neural question answering (QA) for narrative content is limited to a two-stage process: first, relevant text passages are retrieved and, subsequently, a neural network for machine comprehension extracts the…

Computation and Language · Computer Science 2019-08-13 Bernhard Kratzwald, Anna Eigenmann, Stefan Feuerriegel

Exploring the Practicality of Generative Retrieval on Dynamic Corpora

Benchmarking the performance of information retrieval (IR) is mostly conducted with a fixed set of documents (static corpora). However, in realistic scenarios, this is rarely the case and the documents to be retrieved are constantly updated…

Information Retrieval · Computer Science 2024-10-08 Chaeeun Kim, Soyoung Yoon, Hyunji Lee, Joel Jang, Sohee Yang, Minjoon Seo

RTime-QA: A Benchmark for Atomic Temporal Event Understanding in Large Multi-modal Models

Understanding accurate atomic temporal event is essential for video comprehension. However, current video-language benchmarks often fall short to evaluate Large Multi-modal Models' (LMMs) temporal event understanding capabilities, as they…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Yuqi Liu, Qin Jin, Tianyuan Qu, Xuan Liu, Yang Du, Bei Yu, Jiaya Jia

L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search

We present L-MARS (Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search), a multi-agent retrieval framework for grounded legal question answering that decomposes queries into structured sub-problems, retrieves evidence…

Artificial Intelligence · Computer Science 2026-03-31 Ziqi Wang, Boqin Yuan

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

We introduce REAL, a benchmark and framework for multi-turn agent evaluations on deterministic simulations of real-world websites. REAL comprises high-fidelity, deterministic replicas of 11 widely-used websites across domains such as…

Artificial Intelligence · Computer Science 2025-04-18 Divyansh Garg, Shaun VanWeelden, Diego Caples, Andis Draguns, Nikil Ravi, Pranav Putta, Naman Garg, Tomas Abraham, Michael Lara, Federico Lopez, James Liu, Atharva Gundawar, Prannay Hebbar, Youngchul Joo, Jindong Gu, Charles London, Christian Schroeder de Witt, Sumeet Motwani

Measuring Retrieval Complexity in Question Answering Systems

In this paper, we investigate which questions are challenging for retrieval-based Question Answering (QA). We (i) propose retrieval complexity (RC), a novel metric conditioned on the completeness of retrieved documents, which measures the…

Computation and Language · Computer Science 2024-06-07 Matteo Gabburo, Nicolaas Paul Jedema, Siddhant Garg, Leonardo F. R. Ribeiro, Alessandro Moschitti

Multimedia-Aware Question Answering: A Review of Retrieval and Cross-Modal Reasoning Architectures

Question Answering (QA) systems have traditionally relied on structured text data, but the rapid growth of multimedia content (images, audio, video, and structured metadata) has introduced new challenges and opportunities for…

Information Retrieval · Computer Science 2025-10-24 Rahul Raja, Arpita Vats

TEMPO: A Realistic Multi-Domain Benchmark for Temporal Reasoning-Intensive Retrieval

Existing temporal QA benchmarks focus on simple fact-seeking queries from news corpora, while reasoning-intensive retrieval benchmarks lack temporal grounding. However, real-world information needs often require reasoning about temporal…

Information Retrieval · Computer Science 2026-01-15 Abdelrahman Abdallah, Mohammed Ali, Muhammad Abdul-Mageed, Adam Jatowt

DailyQA: A Benchmark to Evaluate Web Retrieval Augmented LLMs Based on Capturing Real-World Changes

We propose DailyQA, an automatically updated dynamic dataset that updates questions weekly and contains answers to questions on any given date. DailyQA utilizes daily updates from Wikipedia revision logs to implement a fully automated…

Information Retrieval · Computer Science 2025-05-26 Jiehan Cheng, Zhicheng Dou