Related papers: Efficient and Effective ER with Progressive Blocki…
This paper studies rule-based blocking in Entity Resolution (ER). We propose HyperBlocker, a GPU-accelerated system for blocking in ER. As opposed to previous blocking algorithms and parallel blocking solvers, HyperBlocker employs a…
Entity Resolution (ER) is typically implemented as a batch task that processes all available data before identifying duplicate records. However, applications with time or computational constraints, e.g., those running in the cloud, require…
Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers. A major advancement in ER methodology has been the application of Bayesian…
Efficiency techniques are an integral part of Entity Resolution, since its infancy. In this survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid techniques, facilitating their understanding and use. We…
Entity Resolution, also called record linkage or deduplication, refers to the process of identifying and merging duplicate versions of the same entity into a unified representation. The standard practice is to use a Rule based or Machine…
Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record…
Entity Resolution (ER) is the task of finding entity profiles that correspond to the same real-world entity. Progressive ER aims to efficiently resolve large datasets when limited time and/or computational resources are available. In…
An increasing number of entities are described by interlinked data rather than documents on the Web. Entity Resolution (ER) aims to identify descriptions of the same real-world entity within one or across knowledge bases in the Web of data.…
Entity Resolution (ER) is a critical data cleaning task for identifying records that refer to the same real-world entity. In the era of Big Data, traditional batch ER is often infeasible due to volume and velocity constraints, necessitating…
Entity Resolution constitutes a core data integration task that relies on Blocking in order to tame its quadratic time complexity. Schema-agnostic blocking achieves very high recall, requires no domain knowledge and applies to data of any…
Flexible sharing of electronic medical records (EMRs) is an urgent need in healthcare, as fragmented storage creates EMR management complexity for both practitioners and patients. Blockchain has emerged as a promising solution to address…
Entity resolution (ER) is a key data integration problem. Despite the efforts in 70+ years in all aspects of ER, there is still a high demand for democratizing ER - humans are heavily involved in labeling data, performing feature…
The goal of entity resolution is to identify records in multiple datasets that represent the same real-world entity. However, comparing all records across datasets can be computationally intensive, leading to long runtimes. To reduce these…
In recent years, crowdsourcing is increasingly applied as a means to enhance data quality. Although the crowd generates insightful information especially for complex problems such as entity resolution (ER), the output quality of crowd…
Accurate and efficient entity resolution (ER) is a significant challenge in many data mining and analysis projects requiring integrating and processing massive data collections. It is becoming increasingly important in real-world…
Entity Resolution (ER) is a fundamental data quality improvement task that identifies and links records referring to the same real-world entity. Traditional ER approaches often rely on pairwise comparisons, which can be costly in terms of…
Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most…
Entity resolution (ER) is the problem of identifying and merging records that refer to the same real-world entity. In many scenarios, raw records are stored under heterogeneous environment. Specifically, the schemas of records may differ…
This paper presents Block, a distributed scheduling framework designed to optimize load balancing and auto-provisioning across instances in large language model serving frameworks by leveraging contextual information from incoming requests.…
Entity Resolution suffers from quadratic time complexity. To increase its time efficiency, three kinds of filtering techniques are typically used for restricting its search space: (i) blocking workflows, which group together entity profiles…