Related papers: Entity Matching by Pool-based Active Learning

A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching

Entity Matching (EM) is a core data cleaning task, aiming to identify different mentions of the same real-world entity. Active learning is one way to address the challenge of scarce labeled data in practice, by dynamically collecting the…

Databases · Computer Science 2020-03-31 Venkata Vamsikrishna Meduri, Lucian Popa, Prithviraj Sen, Mohamed Sarwat

The Battleship Approach to the Low Resource Entity Matching Problem

Entity matching, a core data integration problem, is the task of deciding whether two data tuples refer to the same real-world entity. Recent advances in deep learning methods, using pre-trained language models, were proposed for resolving…

Databases · Computer Science 2023-11-28 Bar Genossar, Avigdor Gal, Roee Shraga

Active Learning for Entity Alignment

In this work, we propose a novel framework for the labeling of entity alignments in knowledge graph datasets. Different strategies to select informative instances for the human labeler build the core of our framework. We illustrate how the…

Machine Learning · Computer Science 2021-05-27 Max Berrendorf, Evgeniy Faerman, Volker Tresp

Low-resource Deep Entity Resolution with Transfer and Active Learning

Entity resolution (ER) is the task of identifying different representations of the same real-world entities across databases. It is a key step for knowledge base creation and text mining. Recent adaptation of deep learning methods for ER…

Databases · Computer Science 2019-06-20 Jungo Kasai, Kun Qian, Sairam Gurajada, Yunyao Li, Lucian Popa

Neural Networks for Entity Matching: A Survey

Entity matching is the problem of identifying which records refer to the same real-world entity. It has been actively researched for decades, and a variety of different approaches have been developed. Even today, it remains a challenging…

Databases · Computer Science 2021-06-02 Nils Barlaug, Jon Atle Gulla

ALdataset: a benchmark for pool-based active learning

Active learning (AL) is a subfield of machine learning (ML) in which a learning algorithm could achieve good accuracy with less training samples by interactively querying a user/oracle to label new data points. Pool-based AL is…

Machine Learning · Computer Science 2020-10-19 Xueying Zhan, Antoni Bert Chan

Learning from Natural Language Explanations for Generalizable Entity Matching

Entity matching is the task of linking records from different sources that refer to the same real-world entity. Past work has primarily treated entity linking as a standard supervised learning problem. However, supervised entity matching…

Computation and Language · Computer Science 2024-10-01 Somin Wadhwa, Adit Krishnan, Runhui Wang, Byron C. Wallace, Chris Kong

Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching

Entity matching (EM) is a critical step in entity resolution (ER). Recently, entity matching based on large language models (LLMs) has shown great promise. However, current LLM-based entity matching approaches typically follow a binary…

Computation and Language · Computer Science 2024-12-13 Tianshu Wang, Xiaoyang Chen, Hongyu Lin, Xuanang Chen, Xianpei Han, Hao Wang, Zhenyu Zeng, Le Sun

Effective Few-Shot Named Entity Linking by Meta-Learning

Entity linking aims to link ambiguous mentions to their corresponding entities in a knowledge base, which is significant and fundamental for various downstream applications, e.g., knowledge base completion, question answering, and…

Computation and Language · Computer Science 2022-07-20 Xiuxing Li, Zhenyu Li, Zhengyan Zhang, Ning Liu, Haitao Yuan, Wei Zhang, Zhiyuan Liu, Jianyong Wang

MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching

Entity Matching (EM), which aims to identify all entity pairs referring to the same real-world entity from relational tables, is one of the most important tasks in real-world data management systems. Due to the labeling process of EM being…

Databases · Computer Science 2023-08-07 Xiaocan Zeng, Pengfei Wang, Yuren Mao, Lu Chen, Xiaoze Liu, Yunjun Gao

DAME: Domain Adaptation for Matching Entities

Entity matching (EM) identifies data records that refer to the same real-world entity. Despite the effort in the past years to improve the performance in EM, the existing methods still require a huge amount of labeled data in each domain…

Machine Learning · Computer Science 2022-04-21 Mohamed Trabelsi, Jeff Heflin, Jin Cao

Ensemble Semi-supervised Entity Alignment via Cycle-teaching

Entity alignment is to find identical entities in different knowledge graphs. Although embedding-based entity alignment has recently achieved remarkable progress, training data insufficiency remains a critical challenge. Conventional…

Artificial Intelligence · Computer Science 2022-03-15 Kexuan Xin, Zequn Sun, Wen Hua, Bing Liu, Wei Hu, Jianfeng Qu, Xiaofang Zhou

Entity Matching using Large Language Models

Entity matching is the task of deciding whether two entity descriptions refer to the same real-world entity. Entity matching is a central step in most data integration pipelines. Many state-of-the-art entity matching methods rely on…

Computation and Language · Computer Science 2024-10-21 Ralph Peeters, Aaron Steiner, Christian Bizer

Deep Indexed Active Learning for Matching Heterogeneous Entity Representations

Given two large lists of records, the task in entity resolution (ER) is to find the pairs from the Cartesian product of the lists that correspond to the same real world entity. Typically, passive learning methods on such tasks require large…

Databases · Computer Science 2022-01-19 Arjit Jain, Sunita Sarawagi, Prithviraj Sen

AutoBlock: A Hands-off Blocking Framework for Entity Matching

Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record…

Databases · Computer Science 2019-12-10 Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, David Page

Entity Linking Meets Deep Learning: Techniques and Solutions

Entity linking (EL) is the process of linking entity mentions appearing in web text with their corresponding entities in a knowledge base. EL plays an important role in the fields of knowledge engineering and data mining, underlying a…

Computation and Language · Computer Science 2021-09-28 Wei Shen, Yuhan Li, Yinan Liu, Jiawei Han, Jianyong Wang, Xiaojie Yuan

ActiveMatch: End-to-end Semi-supervised Active Representation Learning

Semi-supervised learning (SSL) is an efficient framework that can train models with both labeled and unlabeled data, but may generate ambiguous and non-distinguishable representations when lacking adequate labeled samples. With…

Computer Vision and Pattern Recognition · Computer Science 2022-08-08 Xinkai Yuan, Zilinghan Li, Gaoang Wang

Minimax Active Learning

Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator. Current active learning techniques either rely on model uncertainty to select the most uncertain…

Computer Vision and Pattern Recognition · Computer Science 2021-03-31 Sayna Ebrahimi, William Gan, Dian Chen, Giscard Biamby, Kamyar Salahi, Michael Laielli, Shizhan Zhu, Trevor Darrell

Structured Multi-Step Reasoning for Entity Matching Using Large Language Model

Entity matching is a fundamental task in data cleaning and data integration. With the rapid adoption of large language models (LLMs), recent studies have explored zero-shot and few-shot prompting to improve entity matching accuracy.…

Databases · Computer Science 2025-12-01 Rohan Bopardikar, Jin Wang, Jia Zou

Gradual Machine Learning for Entity Resolution

Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most…

Databases · Computer Science 2019-06-17 Boyi Hou, Qun Chen, Yanyan Wang, Youcef Nafa, Zhanhuai Li