Related papers: Efficient and Effective ER with Progressive Blocki…

200 papers

This paper studies rule-based blocking in Entity Resolution (ER). We propose HyperBlocker, a GPU-accelerated system for blocking in ER. As opposed to previous blocking algorithms and parallel blocking solvers, HyperBlocker employs a…

Databases · Computer Science 2024-12-16 Xiaoke Zhu, Min Xie, Ting Deng, Qi Zhang

Entity Resolution (ER) is typically implemented as a batch task that processes all available data before identifying duplicate records. However, applications with time or computational constraints, e.g., those running in the cloud, require…

Databases · Computer Science 2025-03-12 Jakub Maciejewski, Konstantinos Nikoletos, George Papadakis, Yannis Velegrakis

Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers. A major advancement in ER methodology has been the application of Bayesian…

Efficiency techniques have been an integral part of Entity Resolution since its infancy. In this survey, we organize the bulk of the work in the field into Blocking, Filtering and hybrid techniques, facilitating their understanding and use. We…

Databases · Computer Science 2020-08-24 George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas

Entity Resolution, also called record linkage or deduplication, refers to the process of identifying and merging duplicate versions of the same entity into a unified representation. The standard practice is to use a Rule-based or Machine…

Artificial Intelligence · Computer Science 2016-09-22 Janani Balaji, Faizan Javed, Mayank Kejriwal, Chris Min, Sam Sander, Ozgur Ozturk

Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record…

Databases · Computer Science 2019-12-10 Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, David Page

Entity Resolution (ER) is the task of finding entity profiles that correspond to the same real-world entity. Progressive ER aims to efficiently resolve large datasets when limited time and/or computational resources are available. In…

Databases · Computer Science 2019-05-17 Giovanni Simonini, George Papadakis, Themis Palpanas, Sonia Bergamaschi

An increasing number of entities are described by interlinked data rather than documents on the Web. Entity Resolution (ER) aims to identify descriptions of the same real-world entity within one or across knowledge bases in the Web of data.…

Databases · Computer Science 2020-05-20 Vasilis Efthymiou, Kostas Stefanidis, Vassilis Christophides

Entity Resolution (ER) is a critical data cleaning task for identifying records that refer to the same real-world entity. In the era of Big Data, traditional batch ER is often infeasible due to volume and velocity constraints, necessitating…

Databases · Computer Science 2026-01-05 Dimitrios Karapiperis, George Papadakis, Vassilios Verykios

Entity Resolution constitutes a core data integration task that relies on Blocking in order to tame its quadratic time complexity. Schema-agnostic blocking achieves very high recall, requires no domain knowledge and applies to data of any…

Databases · Computer Science 2022-04-20 Luca Gagliardelli, George Papadakis, Giovanni Simonini, Sonia Bergamaschi, Themis Palpanas

Flexible sharing of electronic medical records (EMRs) is an urgent need in healthcare, as fragmented storage creates EMR management complexity for both practitioners and patients. Blockchain has emerged as a promising solution to address…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-30 Xiaohan Hu, Jyoti Sahni, Colin R. Simpson, Normalia Samian, Winston K. G. Seah

Entity resolution (ER) is a key data integration problem. Despite 70+ years of effort on all aspects of ER, there is still a high demand for democratizing ER - humans are heavily involved in labeling data, performing feature…

Databases · Computer Science 2019-11-20 Muhammad Ebraheem, Saravanan Thirumuruganathan, Shafiq Joty, Mourad Ouzzani, Nan Tang

The goal of entity resolution is to identify records in multiple datasets that represent the same real-world entity. However, comparing all records across datasets can be computationally intensive, leading to long runtimes. To reduce these…

Databases · Computer Science 2023-06-26 Alexander Brinkmann, Roee Shraga, Christian Bizer

In recent years, crowdsourcing has been increasingly applied as a means to enhance data quality. Although the crowd generates insightful information, especially for complex problems such as entity resolution (ER), the output quality of crowd…

Databases · Computer Science 2015-12-03 Anja Gruenheid, Besmira Nushi, Tim Kraska, Wolfgang Gatterbauer, Donald Kossmann

Accurate and efficient entity resolution (ER) is a significant challenge in many data mining and analysis projects that require integrating and processing massive data collections. It is becoming increasingly important in real-world…

Databases · Computer Science 2021-11-09 Samudra Herath, Matthew Roughan, Gary Glonek

Entity Resolution (ER) is a fundamental data quality improvement task that identifies and links records referring to the same real-world entity. Traditional ER approaches often rely on pairwise comparisons, which can be costly in terms of…

Databases · Computer Science 2025-06-04 Jiajie Fu, Haitong Tang, Arijit Khan, Sharad Mehrotra, Xiangyu Ke, Yunjun Gao

Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most…

Databases · Computer Science 2019-06-17 Boyi Hou, Qun Chen, Yanyan Wang, Youcef Nafa, Zhanhuai Li

Entity resolution (ER) is the problem of identifying and merging records that refer to the same real-world entity. In many scenarios, raw records are stored in heterogeneous environments. Specifically, the schemas of records may differ…

Databases · Computer Science 2016-11-01 Yiming Lin, Hongzhi Wang, Jianzhong Li, Hong Gao

This paper presents Block, a distributed scheduling framework designed to optimize load balancing and auto-provisioning across instances in large language model serving frameworks by leveraging contextual information from incoming requests.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-14 Wei Da, Evangelia Kalyvianaki

Entity Resolution suffers from quadratic time complexity. To increase its time efficiency, three kinds of filtering techniques are typically used for restricting its search space: (i) blocking workflows, which group together entity profiles…