Related papers: Incremental Entity Resolution from Linked Document…

200 papers

Text clustering holds significant value across various domains due to its ability to identify patterns and group related information. Current approaches which rely heavily on a computed similarity measure between documents are often limited…

Information Retrieval · Computer Science 2025-04-09 Laurence Hirsch, Robin Hirsch, Bayode Ogunleye

Clustering web documents has numerous applications, such as aggregating news articles into meaningful events, detecting trends and hot topics on the Web, preserving diversity in search results, etc. At the same time, the importance of named…

Computation and Language · Computer Science 2016-07-19 Matthias Galle, Jean-Michel Renders, Guillaume Jacquet

The similarity between the question and indexed documents is a crucial factor in document retrieval for retrieval-augmented question answering. Although this is typically the only method for obtaining the relevant documents, it is not the…

Information Retrieval · Computer Science 2024-08-07 Hassan S. Shavarani, Anoop Sarkar

Named entities in text documents are the names of people, organizations, locations, or other types of objects in the documents that exist in the real world. A persisting research challenge is to use computational techniques to identify such…

Computation and Language · Computer Science 2019-07-09 Abdulkareem Alsudais, Hovig Tchalian

The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which make entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or…

Information Retrieval · Computer Science 2017-03-31 Besnik Fetahu, Ujwal Gadiraju, Stefan Dietze

Coreference resolution across multiple documents poses a significant challenge in natural language processing, particularly within the domain of knowledge graphs. This study introduces an innovative method aimed at identifying and resolving…

Computation and Language · Computer Science 2025-04-09 Zhang Dong, Mingbang Wang, Songhang deng, Le Dai, Jiyuan Li, Xingzu Liu, Ruilin Nong

We present generative clustering (GC) for clustering a set of documents, X, by using texts Y generated by large language models (LLMs) instead of by clustering the original documents X. Because LLMs…

Machine Learning · Computer Science 2024-12-19 Xin Du, Kumiko Tanaka-Ishii

Entity resolution is the problem of reconciling database references corresponding to the same real-world entities. Given the abundance of publicly available databases that have unresolved entities, we motivate the problem of query-time…

Databases · Computer Science 2011-11-02 I. Bhattacharya, L. Getoor

Information Retrieval systems can be improved by exploiting context information such as user and document features. This article presents a model based on overlapping probabilistic or fuzzy clusters for such features. The model is applied…

Human-Computer Interaction · Computer Science 2011-02-21 Thomas Mandl, Christa Womser-Hacker

Accurate and efficient entity resolution is an open challenge of particular relevance to intelligence organisations that collect large datasets from disparate sources with differing levels of quality and standard. Starting from a…

Databases · Computer Science 2018-03-20 Yuhang Zhang, Kee Siong Ng, Michael Walker, Pauline Chou, Tania Churchill, Peter Christen

Document clustering is an unsupervised approach in which a large collection of documents (corpus) is subdivided into smaller, meaningful, identifiable, and verifiable sub-groups (clusters). Meaningful representation of documents and…

Information Retrieval · Computer Science 2014-12-08 Muhammad Rafi, Farnaz Amin, Mohammad Shahid Shaikh

Record linkage is the process of identifying records that refer to the same entities from several databases. This process is challenging because commonly no unique entity identifiers are available. Linkage therefore has to rely on partially…

Databases · Computer Science 2016-12-14 Peter Christen

Relating entities and events in text is a key component of natural language understanding. Cross-document coreference resolution, in particular, is important for the growing interest in multi-document analysis tasks. In this work we propose…

Computation and Language · Computer Science 2021-04-20 Emily Allaway, Shuai Wang, Miguel Ballesteros

The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents like the World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi, M. Shahid Shaikh, Amir Farooq

Statements about entities occur everywhere, from newspapers and web pages to structured databases. Correlating references to entities across systems that use different identifiers or names for them is a widespread problem. In this paper, we…

Artificial Intelligence · Computer Science 2014-06-27 R. V. Guha

We consider the task of document-level entity linking (EL), where it is important to make consistent decisions for entity mentions over the full document jointly. We aim to leverage explicit "connections" among mentions within the document…

Computation and Language · Computer Science 2022-07-05 Klim Zaporojets, Johannes Deleu, Yiwei Jiang, Thomas Demeester, Chris Develder

Keyword-based information processing has limitations due to simple treatment of words. In this paper, we introduce named entities as objectives into document clustering, which are the key elements defining document semantics and in many…

Information Retrieval · Computer Science 2018-07-23 Tru H. Cao, Vuong M. Ngo, Dung T. Hong, Tho T. Quan

Entity resolution (record linkage, microclustering) systems are notoriously difficult to evaluate. Looking for a needle in a haystack, traditional evaluation methods use sophisticated, application-specific sampling schemes to find matching…

Computation and Language · Computer Science 2024-04-09 Olivier Binette, Youngsoo Baek, Siddharth Engineer, Christina Jones, Abel Dasylva, Jerome P. Reiter

This paper presents a link analysis approach for identifying privileged documents by constructing a network of human entities derived from email header metadata. Entities are classified as either counsel or non-counsel based on a predefined…

Information Retrieval · Computer Science 2025-12-10 Jianping Zhang, Han Qin, Nathaniel Huber-Fliflet

Recent advances in machine learning, particularly Large Language Models (LLMs) such as BERT and GPT, provide rich contextual embeddings that improve text representation. However, current document clustering approaches often ignore the…

Computation and Language · Computer Science 2024-12-20 Imed Keraghel, Mohamed Nadif