Related papers: Robust Group Linkage
Record linkage is the process of identifying records that refer to the same entities from several databases. This process is challenging because commonly no unique entity identifiers are available. Linkage therefore has to rely on partially…
Record linkage is an essential part of nearly all real-world systems that consume structured and unstructured data coming from different sources. Typically no common key is available for connecting records. Massive data cleaning and data…
Record linkage is aimed at the accurate and efficient identification of records that represent the same entity within or across disparate databases. It is a fundamental task in data integration and increasingly required for accurate…
Accurate and efficient record linkage is an open challenge of particular relevance to Australian Government Agencies, who recognise that so-called wicked social problems are best tackled by forming partnerships founded on large-scale data…
We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as…
Many systems can be described using graphs, or networks. Detecting communities in these networks can provide information about the underlying structure and functioning of the original systems. Yet this detection is a complex task and a…
To address the shortcomings of real-world datasets, robust learning algorithms have been designed to overcome arbitrary and indiscriminate data corruption. However, practical processes of gathering data may lead to patterns of data…
Standard agglomerative clustering suggests establishing a new reliable linkage at every step. However, in order to provide adaptive, density-consistent and flexible solutions, we study extracting all the reliable linkages at each step,…
In record linkage (RL), or exact file matching, the goal is to identify the links between entities with information on two or more files. RL is an important activity in areas including counting the population, enhancing survey frames and…
Consider the following problem: given a database of records indexed by names (e.g., name of companies, restaurants, businesses, or universities) and a new name, determine whether the new name is in the database, and if so, which record it…
Graph algorithms are central to large-scale applications such as navigation systems, social networks, and data analysis platforms. This thesis studies two important challenges in such systems: robustness to failures and fairness in…
Link prediction problem has increasingly become prominent in many domains such as social network analyses, bioinformatics experiments, transportation networks, criminal investigations and so forth. A variety of techniques has been developed…
There has been substantial recent interest in record linkage, attempting to group the records pertaining to the same entities from a large database lacking unique identifiers. This can be viewed as a type of "microclustering," with few…
Entity alignment has always had significant uses within a multitude of diverse scientific fields. In particular, the concept of matching entities across networks has grown in significance in the world of social science as communicative…
Community detection, which focuses on clustering vertex interactions, plays a significant role in network analysis. However, it also faces numerous challenges like missing data and adversarial attack. How to further improve the performance…
Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data…
A significant problem in analysis of complex network is to reveal community structure, in which network nodes are tightly connected in the same communities, between which there are sparse connections. Previous algorithms for community…
Understanding the similar properties of people involved in group search sessions has the potential to significantly improve collaborative search systems; such systems could be enhanced by information retrieval algorithms and user interface…
In this paper, we focus on exploiting the group structure for large-dimensional factor models, which captures the homogeneous effects of common factors on individuals within the same group. In view of the fact that datasets in…
Data Linkage is an important step that can provide valuable insights for evidence-based decision making, especially for crucial events. Performing sensible queries across heterogeneous databases containing millions of records is a complex…