Related papers: Skyblocking for Entity Resolution

200 papers

Entity Resolution, also called record linkage or deduplication, refers to the process of identifying and merging duplicate versions of the same entity into a unified representation. The standard practice is to use a rule-based or Machine…

Artificial Intelligence · Computer Science 2016-09-22 Janani Balaji, Faizan Javed, Mayank Kejriwal, Chris Min, Sam Sander, Ozgur Ozturk

Efficiency techniques have been an integral part of Entity Resolution since its infancy. In this survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid techniques, facilitating their understanding and use. We…

Databases · Computer Science 2020-08-24 George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas

Entity Resolution constitutes a core data integration task that relies on Blocking in order to tame its quadratic time complexity. Schema-agnostic blocking achieves very high recall, requires no domain knowledge and applies to data of any…

Databases · Computer Science 2022-04-20 Luca Gagliardelli, George Papadakis, Giovanni Simonini, Sonia Bergamaschi, Themis Palpanas

The goal of entity resolution is to identify records in multiple datasets that represent the same real-world entity. However, comparing all records across datasets can be computationally intensive, leading to long runtimes. To reduce these…

Databases · Computer Science 2023-06-26 Alexander Brinkmann, Roee Shraga, Christian Bizer

Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record…

Databases · Computer Science 2019-12-10 Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, David Page

Accurate and efficient entity resolution is an open challenge of particular relevance to intelligence organisations that collect large datasets from disparate sources with differing levels of quality and standard. Starting from a…

Databases · Computer Science 2018-03-20 Yuhang Zhang, Kee Siong Ng, Michael Walker, Pauline Chou, Tania Churchill, Peter Christen

Skyline computation is an essential database operation that has many applications in multi-criteria decision making scenarios such as recommender systems. Existing algorithms have focused on checking point domination, which lack efficiency…

Databases · Computer Science 2021-07-22 Chuanwen Li, Yu Gu, Jianzhong Qi, Ge Yu

Skyline queries are one of the most widely adopted tools for Multi-Criteria Analysis, with applications covering diverse domains, including, e.g., Database Systems, Data Mining, and Decision Making. Skylines indeed offer a useful overview…

Databases · Computer Science 2024-11-25 Paolo Ciaccia, Davide Martinenghi

Entity Resolution suffers from quadratic time complexity. To increase its time efficiency, three kinds of filtering techniques are typically used for restricting its search space: (i) blocking workflows, which group together entity profiles…

Blocking is a critical step in entity resolution, and the emergence of neural network-based representation models has led to the development of dense blocking as a promising approach for exploring deep semantics in blocking. However,…

Databases · Computer Science 2024-04-26 Tianshu Wang, Hongyu Lin, Xianpei Han, Xiaoyang Chen, Boxi Cao, Le Sun

Unravelling hidden patterns in datasets is a classical problem with many potential applications. In this paper, we present a challenge whose objective is to discover nonlinear relationships in a noisy cloud of points. If a set of point…

Machine Learning · Statistics 2018-05-31 Terry Lyons, Imanol Perez Arribas

Living in the Information Age allows almost everyone to have access to a large amount of information and options to choose from in order to fulfill their needs. In many cases, the amount of information available and the rate of change may hide…

Databases · Computer Science 2017-04-07 Christos Kalyvas, Theodoros Tzouramanis

The effectiveness and scalability of MapReduce-based implementations of complex data-intensive tasks depend on an even redistribution of data between map and reduce tasks. In the presence of skewed data, sophisticated redistribution…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-03-19 Lars Kolb, Andreas Thor, Erhard Rahm

Classification and clustering algorithms have been proved to be successful individually in different contexts. Both of them have their own advantages and limitations. For instance, although classification algorithms are more powerful than…

Machine Learning · Computer Science 2017-08-30 Tanmoy Chakraborty

While classical skyline queries identify interesting data within large datasets, flexible skylines introduce preferences through constraints on attribute weights, and further reduce the data returned. However, computing these queries can be…

Databases · Computer Science 2025-01-08 Emilio De Lorenzis, Davide Martinenghi

Entity Matching (EM) is crucial for identifying equivalent data entities across different sources, a task that becomes increasingly challenging with the growth and heterogeneity of data. Blocking techniques, which reduce the computational…

Machine Learning · Computer Science 2024-09-26 Mohammad Hossein Moslemi, Harini Balamurugan, Mostafa Milani

Extreme multi-label classification aims to learn a classifier that annotates an instance with a relevant subset of labels from an extremely large label set. Many existing solutions embed the label matrix to a low-dimensional linear…

Machine Learning · Computer Science 2018-11-06 Yuefeng Liang, Cho-Jui Hsieh, Thomas C. M. Lee

Platforms such as AirBnB, Zillow, Yelp, and related sites have transformed the way we search for accommodation, restaurants, etc. The underlying datasets in such applications have numerous attributes that are mostly Boolean or Categorical.…

Databases · Computer Science 2017-05-31 Md Farhadur Rahman, Abolfazl Asudeh, Nick Koudas, Gautam Das

The problem of optimizing across different, conceivably conflicting, criteria is called multi-objective optimization and it is widely spread across many fields. This is a recurring problem in database queries when there is the need of…

Databases · Computer Science 2022-01-14 Matteo Savino

Skyline queries have wide-ranging applications in fields that involve multi-criteria decision making, including tourism, retail industry, and human resources. By automatically removing incompetent candidates, skyline queries allow users to…

Human-Computer Interaction · Computer Science 2018-04-24 Xun Zhao, Yanhong Wu, Weiwei Cui, Xinnan Du, Yuan Chen, Yong Wang, Dik Lun Lee, Huamin Qu