Related papers: Merging datasets through deep learning

200 papers

Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring…

Machine Learning · Computer Science · 2025-05-23 · Zhixu Silvia Tao, Kasper Vinken, Hao-Wei Yeh, Avi Cooper, Xavier Boix

Entity matching is the problem of identifying which records refer to the same real-world entity. It has been actively researched for decades, and a variety of different approaches have been developed. Even today, it remains a challenging…

Databases · Computer Science · 2021-06-02 · Nils Barlaug, Jon Atle Gulla

Reasoning capabilities represent a critical frontier for large language models (LLMs), but developing them requires extensive proprietary datasets and computational resources. One way to efficiently supplement capabilities is by model…

Artificial Intelligence · Computer Science · 2025-06-26 · Guinan Su, Jonas Geiping

An applied problem facing all areas of data science is harmonizing data sources. Joining data from multiple origins with unmapped and only partially overlapping features is a prerequisite to developing and testing robust, generalizable…

A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be…

Machine Learning · Computer Science · 2019-09-12 · Jonas Mueller, Alex Smola

Scholars studying organizations often work with multiple datasets lacking shared identifiers or covariates. In such situations, researchers usually use approximate string ("fuzzy") matching methods to combine datasets. String matching,…

Social and Information Networks · Computer Science · 2025-09-24 · Brian Libgober, Connor T. Jerzak
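The usual practice this abstract mentions, combining datasets by approximate ("fuzzy") string matching, can be sketched with Python's standard-library `difflib.SequenceMatcher`. This is a generic baseline for illustration, not the authors' method; the helper names `normalize` and `fuzzy_join` and the 0.85 cutoff are assumptions.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    # Lowercase and keep only letters, digits, and spaces so that
    # "Acme Corp." and "ACME Corp" compare as identical strings.
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace()).strip()


def fuzzy_join(left, right, threshold=0.85):
    """Pair each string in `left` with its most similar string in `right`.

    Returns (left_name, matched_right_name_or_None, score) tuples.
    Candidates scoring below `threshold` are treated as no match.
    """
    results = []
    for l_name in left:
        best_name, best_score = None, 0.0
        for r_name in right:
            # ratio() returns a similarity in [0, 1]; 1.0 means identical.
            score = SequenceMatcher(None, normalize(l_name), normalize(r_name)).ratio()
            if score > best_score:
                best_name, best_score = r_name, score
        matched = best_name if best_score >= threshold else None
        results.append((l_name, matched, best_score))
    return results
```

For example, `fuzzy_join(["Acme Corp.", "Globex Inc"], ["ACME Corp", "Initech LLC"])` pairs the first name with `"ACME Corp"` at score 1.0 after normalization and leaves `"Globex Inc"` unmatched. The all-pairs loop is quadratic, so larger linkage jobs usually add a blocking step to limit which pairs are compared.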

In collaborative software development, program merging is the mechanism to integrate changes from multiple programmers. Merge algorithms in modern version control systems report a conflict when changes interfere textually. Merge conflicts…

Software Engineering · Computer Science · 2021-09-08 · Elizabeth Dinella, Todd Mytkowicz, Alexey Svyatkovskiy, Christian Bird, Mayur Naik, Shuvendu K. Lahiri

Record fusion is the task of aggregating multiple records that correspond to the same real-world entity in a database. We can view record fusion as a machine learning problem where the goal is to predict the "correct" value for each…

Machine Learning · Computer Science · 2020-06-19 · Alireza Heidari, George Michalopoulos, Shrinu Kushagra, Ihab F. Ilyas, Theodoros Rekatsinas
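The abstract frames record fusion as predicting the "correct" value for each attribute. The simplest non-learned baseline for that prediction is a per-attribute majority vote, sketched here in Python. It is not the learned model the authors propose; the function name `fuse_records` and the dict-per-record layout are assumptions.

```python
from collections import Counter


def fuse_records(records):
    """Fuse records describing one entity by majority vote per attribute.

    `records` is a list of dicts. None values are ignored; ties go to the
    value seen first, because Counter.most_common keeps first-encountered
    order among equal counts.
    """
    attributes = sorted({key for rec in records for key in rec})
    fused = {}
    for attr in attributes:
        values = [rec[attr] for rec in records if rec.get(attr) is not None]
        # An attribute with no observed value stays None in the fused record.
        fused[attr] = Counter(values).most_common(1)[0][0] if values else None
    return fused
```

Given three records for "Jane Doe" where one misspells the city as "Bostn", the vote returns "Boston". A majority vote cannot resolve cases where the erroneous value is the most frequent one, which is the gap a learned predictor aims to close.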

Deep model fusion/merging is an emerging technique that merges the parameters or predictions of multiple deep learning models into a single one. It combines the abilities of different models to make up for the biases and errors of a single…

Machine Learning · Computer Science · 2023-09-28 · Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
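Merging the parameters of several models, as the abstract describes, is easiest to see in its simplest form: a (weighted) element-wise average of corresponding weights. This Python sketch assumes every model shares one architecture, represented as a hypothetical mapping from parameter name to a flat list of floats. It shows plain weight averaging only, one of many techniques in this area, and omits prediction-level fusion; `merge_by_averaging` is an illustrative name.

```python
def merge_by_averaging(models, weights=None):
    """Merge same-architecture models by (weighted) parameter averaging.

    `models` is a list of dicts mapping parameter names to flat lists of
    floats. `weights` defaults to a uniform average and must sum to 1.
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    if len(weights) != len(models) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per model, summing to 1")
    names = models[0].keys()
    for m in models:
        # Averaging is only defined when parameters line up one-to-one.
        if m.keys() != names or any(len(m[n]) != len(models[0][n]) for n in names):
            raise ValueError("models must share parameter names and shapes")
    merged = {}
    for name in names:
        # zip(*...) walks corresponding entries of the same tensor across models.
        merged[name] = [
            sum(w * v for w, v in zip(weights, entries))
            for entries in zip(*(m[name] for m in models))
        ]
    return merged
```

Uniform weights give a simple average; non-uniform weights let one model dominate. Naive averaging tends to work best when the models were fine-tuned from a common pretrained checkpoint, since otherwise corresponding parameters need not play the same role.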

Named Entity Recognition (NER) is a fundamental task in natural language processing. It remains a research hotspot due to its wide applicability across domains. Although recent advances in deep learning have significantly improved NER…

Computation and Language · Computer Science · 2025-08-12 · Xiaobo Zhang, Congqing He, Ying He, Jian Peng, Dajie Fu, Tien-Ping Tan

Name matching is a key component of systems for entity resolution or record linkage. Alternative spellings of the same names are a common occurrence in many applications. We use the largest collection of genealogy person records in the…

Information Retrieval · Computer Science · 2014-05-09 · Jeffrey Sukharev, Leonid Zhukov, Alexandrin Popescul

Machine-learning from a disparate set of tables, a data lake, requires assembling features by merging and aggregating tables. Data discovery can extend autoML to data tables by automating these steps. We present an in-depth analysis of such…

Databases · Computer Science · 2025-05-20 · Riccardo Cappuzzo, Aimee Coelho, Felix Lefebvre, Paolo Papotti, Gael Varoquaux
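The merge-and-aggregate step this abstract describes can be illustrated with a left join in which one-to-many matches are collapsed into a single feature. This Python sketch is an illustration under stated assumptions, not the authors' pipeline: the table layout (lists of dicts), the join key, the mean aggregation, and the name `join_with_aggregation` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def join_with_aggregation(base, candidate, key, value):
    """Left-join `candidate` onto `base`, averaging one-to-many matches.

    Both tables are lists of dicts. All candidate rows sharing a key are
    collapsed to the mean of `value`, so each base row gains exactly one
    new column, f"mean_{value}" (None when the key has no match).
    """
    groups = defaultdict(list)
    for row in candidate:
        groups[row[key]].append(row[value])
    feature = f"mean_{value}"
    # Membership is tested first so unmatched keys are never inserted into `groups`.
    return [
        {**row, feature: mean(groups[row[key]]) if row[key] in groups else None}
        for row in base
    ]
```

Without the aggregation, a plain one-to-many join would duplicate base rows (a key with two candidate rows would appear twice), which would distort a training table that expects one row per example.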

Most, if not all, modern deep learning systems restrict themselves to a single dataset for neural network training and inference. In this article, we are interested in systematic ways to join datasets that are made for similar purposes.…

Machine Learning · Computer Science · 2021-06-18 · Jake Zhao, Mingfeng Ou, Linji Xue, Yunkai Cui, Sai Wu, Gang Chen

Mergers are an important aspect of galaxy formation and evolution. We aim to test whether deep learning techniques can be used to reproduce visual classification of observations, physical classification of simulations and highlight any…

Astrophysics of Galaxies · Physics · 2019-06-12 · W. J. Pearson, L. Wang, J. W. Trayford, C. E. Petrillo, F. F. S. van der Tak

Model merging is attracting attention as a novel method for creating a new model by combining the weights of different trained models. While previous studies reported that model merging works well for models trained on a single dataset with…

Machine Learning · Computer Science · 2024-09-23 · Masanori Yamada, Tomoya Yamashita, Shin'ya Yamaguchi, Daiki Chijiwa

In artificial intelligence (AI), especially deep learning, data diversity and volume play a pivotal role in model development. However, training a robust deep learning model often faces challenges due to data privacy, regulations, and the…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-15 · Xiao Chen, Shunan Zhang, Eric Z. Chen, Yikang Liu, Lin Zhao, Terrence Chen, Shanhui Sun

Data integration tasks such as the creation and extension of knowledge graphs involve the fusion of heterogeneous entities from many sources. Matching and fusing such entities also requires matching and combining their properties…

Databases · Computer Science · 2020-10-06 · Daniel Ayala, Inma Hernández, David Ruiz, Erhard Rahm

Businesses, governmental bodies and NGOs have an ever-increasing amount of data at their disposal from which they try to extract valuable information. Often, this needs to be done not only accurately but also within a short time frame.…

Machine Learning · Computer Science · 2021-09-16 · Pim Verschuuren, Serena Palazzo, Tom Powell, Steve Sutton, Alfred Pilgrim, Michele Faucci Giannelli

Model merging combines multiple models into a single model with aggregated capabilities, making it a powerful tool for large language model (LLM) development. However, scaling model merging is challenging: performance depends on the choice…

Machine Learning · Computer Science · 2026-02-03 · Oliver Bolton, Aakanksha, Arash Ahmadian, Sara Hooker, Marzieh Fadaee, Beyza Ermis

Master Data Management (MDM) ensures data integrity, consistency, and reliability across an organization's systems. I introduce a novel complex match and merge algorithm optimized for real-time MDM solutions. The proposed method accurately…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-10-24 · Durai Rajamanickam