Related papers: CODEC: Complex Document and Entity Collection

200 papers

Across the financial domain, researchers answer complex questions by extensively "searching" for relevant information to generate long-form reports. This workshop paper discusses automating the construction of query-specific document and…

Information Retrieval · Computer Science 2022-11-09 Iain Mackie, Jeffrey Dalton

While entity-oriented neural IR models have advanced significantly, they often overlook a key nuance: the varying degrees of influence individual entities within a document have on its overall relevance. Addressing this gap, we present…

Information Retrieval · Computer Science 2024-01-12 Shubham Chatterjee, Iain Mackie, Jeff Dalton

We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three…

Computation and Language · Computer Science 2020-10-07 Tara Safavi, Danai Koutra

Publication databases rely on accurate metadata extraction from diverse web sources, yet variations in web layouts and data formats present challenges for metadata providers. This paper introduces CRAWLDoc, a new method for contextual…

Computation and Language · Computer Science 2025-06-05 Fabian Karl, Ansgar Scherp

Neural IR has advanced through two distinct paths: entity-oriented approaches leveraging knowledge graphs and multi-vector models capturing fine-grained semantics. We introduce QDER, a neural re-ranking model that unifies these approaches…

Information Retrieval · Computer Science 2025-10-14 Shubham Chatterjee, Jeff Dalton

Contrastive learning has been the dominant approach to training dense retrieval models. In this work, we investigate the impact of ranking context, an often overlooked aspect of learning dense retrieval models. In particular, we examine…

Information Retrieval · Computer Science 2023-10-24 George Zerveas, Navid Rekabsaz, Daniel Cohen, Carsten Eickhoff

Coreference resolution across multiple documents poses a significant challenge in natural language processing, particularly within the domain of knowledge graphs. This study introduces an innovative method aimed at identifying and resolving…

Computation and Language · Computer Science 2025-04-09 Zhang Dong, Mingbang Wang, Songhang deng, Le Dai, Jiyuan Li, Xingzu Liu, Ruilin Nong

With over 200 million published academic documents and millions of new documents being written each year, academic researchers face the challenge of searching for information within this vast corpus. However, existing retrieval systems…

Information Retrieval · Computer Science 2024-05-21 Gengchen Wei, Xinle Pang, Tianning Zhang, Yu Sun, Xun Qian, Chen Lin, Han-Sen Zhong, Wanli Ouyang

The ability to understand and answer questions over documents can be useful in many business and practical applications. However, documents often contain lengthy and diverse multimodal contents such as texts, figures, and tables, which are…

Computation and Language · Computer Science 2024-11-12 Yew Ken Chia, Liying Cheng, Hou Pong Chan, Chaoqun Liu, Maojia Song, Sharifah Mahani Aljunied, Soujanya Poria, Lidong Bing

Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector necessitates automating the interpretation of building regulations to achieve its full potential. Converting textual rules into…

Multi-entity question answering (MEQA) poses significant challenges for large language models (LLMs) and retrieval-augmented generation (RAG) systems, which frequently struggle to consolidate scattered information across diverse…

Computation and Language · Computer Science 2025-09-25 Teng Lin, Yuyu Luo, Honglin Zhang, Jicheng Zhang, Chunlin Liu, Kaishun Wu, Nan Tang

Deep text understanding, which requires the connections between a given document and prior knowledge beyond its text, has been highlighted by many benchmarks in recent years. However, these benchmarks have encountered two major limitations.…

Computation and Language · Computer Science 2023-07-07 Zijun Yao, Yantao Liu, Xin Lv, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li

Research into COVID-19 is a big challenge and highly relevant at the moment. New tools are required to assist medical experts in their research with relevant and valuable information. The COVID-19 Open Research Dataset Challenge (CORD-19)…

Digital Libraries · Computer Science 2020-05-19 Hermann Kroll, Jan Pirklbauer, Johannes Ruthmann, Wolf-Tilo Balke

We introduce the task of entity-centric query refinement. Given an input query whose answer is a (potentially large) collection of entities, the task output is a small set of query refinements meant to assist the user in efficient domain…

Computation and Language · Computer Science 2022-09-19 David Wadden, Nikita Gupta, Kenton Lee, Kristina Toutanova

Document-level event extraction aims to extract structured event information from unstructured text. However, a single document often contains limited event information and the roles of different event arguments may be biased due to the…

Computation and Language · Computer Science 2024-08-27 Qiang Gao, Zixiang Meng, Bobo Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji

Cross-document co-reference resolution (CDCR) is the task of identifying and linking mentions to entities and concepts across many text documents. Current state-of-the-art models for this task assume that all documents are of the same type…

Computation and Language · Computer Science 2021-02-01 James Ravenscroft, Arie Cattan, Amanda Clare, Ido Dagan, Maria Liakata

Document-level Relation Extraction (DocRE) involves identifying relations between entities across multiple sentences in a document. Evidence sentences, crucial for precisely identifying relationships between entity pairs, enhance focus on essential…

Computation and Language · Computer Science 2025-04-10 Khai Phan Tran, Xue Li

Deep Research Agents increasingly automate survey generation, yet whether they match human experts at retrieving essential papers and organizing them into expert-like taxonomies remains unclear. Existing benchmarks emphasize writing quality…

Existing scholarly information extraction (SIE) datasets focus on scientific papers and overlook implementation-level details in code repositories. README files describe datasets, source code, and other implementation-level artifacts,…

Computation and Language · Computer Science 2026-03-09 Genet Asefa Gesese, Zongxiong Chen, Shufan Jiang, Mary Ann Tan, Zhaotai Liu, Sonja Schimmler, Harald Sack

Entity-oriented retrieval assumes that relevant documents exhibit query-relevant entities, yet evaluations report conflicting results. We show this inconsistency stems not from model failure, but from evaluation. On TREC Robust04, we…

Information Retrieval · Computer Science 2026-04-08 Shubham Chatterjee