Related papers: Scaling Multiple-Source Entity Resolution using St…

200 papers

Entity resolution (ER) is the task of identifying different representations of the same real-world entities across databases. It is a key step for knowledge base creation and text mining. Recent adaptation of deep learning methods for ER…

Databases · Computer Science 2019-06-20 Jungo Kasai, Kun Qian, Sairam Gurajada, Yunyao Li, Lucian Popa

Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most…

Databases · Computer Science 2019-06-17 Boyi Hou, Qun Chen, Yanyan Wang, Youcef Nafa, Zhanhuai Li

Entity resolution (ER) is a fundamental task in data integration that enables insights from heterogeneous data sources. The primary challenge of ER lies in classifying record pairs as matches or nonmatches, which in multi-source ER (MS-ER)…

Databases · Computer Science 2026-04-10 Victor Christen, Peter Christen

Entity resolution (ER) is the problem of identifying and merging records that refer to the same real-world entity. In many scenarios, raw records are stored in heterogeneous environments. Specifically, the schemas of records may differ…

Databases · Computer Science 2016-11-01 Yiming Lin, Hongzhi Wang, Jianzhong Li, Hong Gao

Accurately identifying different representations of the same real-world entity is an integral part of data cleaning and many methods have been proposed to accomplish it. The challenges of this entity resolution task that demand so much…

Machine Learning · Computer Science 2021-06-02 Alex Bogatu, Norman W. Paton, Mark Douthwaite, Stuart Davie, Andre Freitas

State-of-the-art performance on entity resolution (ER) has been achieved by deep learning. However, deep models are usually trained on large quantities of accurately labeled training data, and cannot be easily tuned towards a target…

Machine Learning · Computer Science 2022-04-12 Zhaoqiang Chen, Qun Chen, Youcef Nafa, Tianyi Duan, Wei Pan, Lijun Zhang, Zhanhuai Li

Transfer learning is an important approach for addressing the challenges posed by limited data availability in various applications. It accomplishes this by transferring knowledge from well-established source domains to a less familiar…

Machine Learning · Statistics 2025-03-03 Yeheng Ge, Xueyu Zhou, Jian Huang

In low-resource settings, model transfer can help to overcome a lack of labeled data for many tasks and domains. However, predicting useful transfer sources is a challenging problem, as even the most similar sources might lead to unexpected…

Computation and Language · Computer Science 2021-11-01 Lukas Lange, Jannik Strötgen, Heike Adel, Dietrich Klakow

Entity resolution (ER) refers to the problem of matching records in one or more relations that refer to the same real-world entity. While supervised machine learning (ML) approaches achieve the state-of-the-art results, they require a large…

Databases · Computer Science 2020-04-07 Renzhi Wu, Sanya Chaba, Saurabh Sawlani, Xu Chu, Saravanan Thirumuruganathan

Deep learning has gained broad interest in remote sensing image scene classification thanks to the effectiveness of deep neural networks in extracting the semantics from complex data. However, deep networks require large amounts of training…

Computer Vision and Pattern Recognition · Computer Science 2025-10-08 Gianmarco Perantoni, Lorenzo Bruzzone

Entity Resolution (ER) is a critical data cleaning task for identifying records that refer to the same real-world entity. In the era of Big Data, traditional batch ER is often infeasible due to volume and velocity constraints, necessitating…

Databases · Computer Science 2026-01-05 Dimitrios Karapiperis, George Papadakis, Vassilios Verykios

Entity resolution (ER) is one of the fundamental problems in data integration, where machine learning (ML) based classifiers often provide the state-of-the-art results. Considerable human effort goes into feature engineering and training…

Multi-source transfer learning has been proven effective when within-target labeled data is scarce. Previous work focuses primarily on exploiting domain similarities and assumes that source domains are richly or at least comparably labeled.…

Machine Learning · Computer Science 2018-07-09 Zirui Wang, Jaime Carbonell

Towards understanding the statistical complexity of learning from heterogeneous sources, we study the problem of multi-distribution learning. Given $k$ data sources, the goal is to output a classifier for each source by exploiting shared…

Machine Learning · Statistics 2026-02-25 Rafael Hanashiro, Abhishek Shetty, Patrick Jaillet

Entity resolution (ER) is the process of identifying records that refer to the same entities within one or across multiple databases. Numerous techniques have been developed to tackle ER challenges over the years, with recent emphasis…

Databases · Computer Science 2023-11-14 George Papadakis, Nishadi Kirielle, Peter Christen, Themis Palpanas

Pathology diagnosis based on EEG signals and decoding brain activity holds immense importance in understanding neurological disorders. With the advancement of artificial intelligence methods and machine learning techniques, the potential…

In this paper, we study transfer learning for high-dimensional factor-augmented sparse linear models, motivated by applications in economics and finance where strongly correlated predictors and latent factor structures pose major challenges…

Methodology · Statistics 2026-03-23 Bo Fu, Dandan Jiang

Deep learning has revolutionized many industries by enabling models to automatically learn complex patterns from raw data, reducing dependence on manual feature engineering. However, deep learning algorithms are sensitive to input data, and…

Machine Learning · Computer Science 2025-07-21 Mert Sehri, Zehui Hua, Francisco de Assis Boldt, Patrick Dumond

Accurate and efficient entity resolution is an open challenge of particular relevance to intelligence organisations that collect large datasets from disparate sources with differing levels of quality and standard. Starting from a…

Databases · Computer Science 2018-03-20 Yuhang Zhang, Kee Siong Ng, Michael Walker, Pauline Chou, Tania Churchill, Peter Christen

Due to high annotation costs, making the best use of existing human-created training data is an important research direction. We therefore carry out a systematic evaluation of the transferability of BERT-based neural ranking models across five…

Information Retrieval · Computer Science 2021-11-23 Iurii Mokrii, Leonid Boytsov, Pavel Braslavski