Related papers: Practical Bayesian Inference for Record Linkage

200 papers

Probabilistic record linkage (PRL) is the process of determining which records in two databases correspond to the same underlying entity in the absence of a unique identifier. Bayesian solutions to this problem provide a powerful mechanism…

Methodology · Statistics 2020-01-08 Brendan S. McVeigh, Bradley T. Spahn, Jared S. Murray

Databases often contain corrupted, degraded, and noisy data with duplicate entries across and within each database. Such problems arise in citations, medical databases, genetics, human rights databases, and a variety of other applied…

Methodology · Statistics 2015-04-29 Rebecca C. Steorts

Researchers are often interested in linking individuals between two datasets that lack a common unique identifier. Matching procedures often struggle to match records with common names, birthplaces or other field values. Computational…

Methodology · Statistics 2021-06-14 Thomas Stringham

We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as…

Methodology · Statistics 2015-11-03 Rebecca C. Steorts, Rob Hall, Stephen E. Fienberg

In record linkage (RL), or exact file matching, the goal is to identify the links between entities with information on two or more files. RL is an important activity in areas including counting the population, enhancing survey frames and…

Statistics Theory · Mathematics 2012-12-21 Michael D. Larsen

We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a…

Computation · Statistics 2014-03-04 Rebecca C. Steorts, Rob Hall, Stephen E. Fienberg

Consider the following problem: given a database of records indexed by names (e.g., name of companies, restaurants, businesses, or universities) and a new name, determine whether the new name is in the database, and if so, which record it…

Databases · Computer Science 2018-06-29 Bahare Fatemi, Seyed Mehran Kazemi, David Poole

In theory, the probabilistic linkage method provides two distinct advantages over non-probabilistic methods, including minimal rates of linkage error and accurate measures of these rates for data users. However, implementations can fall…

Methodology · Statistics 2019-11-06 Abel Dasylva, Arthur Goussanou, David Ajavon, Hanan Abousaleh

Record linkage (de-duplication or entity resolution) is the process of merging noisy databases to remove duplicate entities. While record linkage removes duplicate entities from such databases, the downstream task is any inferential,…

Methodology · Statistics 2018-10-12 Rebecca C. Steorts, Andrea Tancredi, Brunero Liseo

Merging datafiles containing information on overlapping sets of entities is a challenging task in the absence of unique identifiers, and is further complicated when some entities are duplicated in the datafiles. Most approaches to this…

Methodology · Statistics 2021-10-11 Serge Aleshin-Guendel, Mauricio Sadinle

In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate the associations of interest. Common record linkage algorithms only…

Methodology · Statistics 2024-06-25 Gauri Kamat, Mingyang Shan, Roee Gutman

Given several databases containing person-specific data held by different organizations, Privacy-Preserving Record Linkage (PPRL) aims to identify and link records that correspond to the same entity/individual across different databases…

Databases · Computer Science 2022-12-13 Dinusha Vatsalan, Dimitrios Karapiperis, Vassilios S. Verykios

Probabilistic record linkage, the task of merging two or more databases in the absence of a unique identifier, is a perennial and challenging problem. It is closely related to the problem of deduplicating a single database, which can be…

Methodology · Statistics 2016-03-28 Jared S. Murray

Data cleaning is naturally framed as probabilistic inference in a generative model of ground-truth data and likely errors, but the diversity of real-world error patterns and the hardness of inference make Bayesian approaches difficult to…

Machine Learning · Computer Science 2022-11-22 Alexander K. Lew, Monica Agrawal, David Sontag, Vikash K. Mansinghka

In this paper we introduce a novel Bayesian approach for linking multiple social networks in order to discover the same real-world person having different accounts across networks. In particular, we develop a latent model that allows us to…

Applications · Statistics 2018-08-15 Juan Sosa, Abel Rodriguez

Entity resolution (record linkage or deduplication) is the process of identifying and linking duplicate records in databases. In this paper, we propose a Bayesian graphical approach for entity resolution that links records to latent…

Methodology · Statistics 2023-01-10 Neil G. Marchant, Benjamin I. P. Rubinstein, Rebecca C. Steorts

Deep learning-based linkage of records across different databases is becoming increasingly useful in data integration and mining applications to discover new insights from multiple sources of data. However, due to privacy and…

Cryptography and Security · Computer Science 2022-11-07 Thilina Ranbaduge, Dinusha Vatsalan, Ming Ding

The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. This is non-trivial in the absence of unique identifiers, and it is important for a wide variety of…

Methodology · Statistics 2016-01-26 Mauricio Sadinle

Bayesian statistics is concerned with conducting posterior inference for the unknown quantities in a given statistical model. Conventional Bayesian inference requires the specification of a probabilistic model for the observed data, and the…

Methodology · Statistics 2023-05-11 David T. Frazier, Christopher Drovandi, David J. Nott

In real-world Bayesian inference applications, prior assumptions regarding the parameters of interest may be unrepresentative of their actual values for a given dataset. In particular, if the likelihood is concentrated far out in the wings…

Computation · Statistics 2018-11-01 Xi Chen, Mike Hobson, Saptarshi Das, Paul Gelderblom