Related papers: Automatic Integration Issues of Tabular Data for O…

Universal Embeddings of Tabular Data

Tabular data in relational databases represents a significant portion of industrial data. Hence, analyzing and interpreting tabular data is of utmost importance. Application tasks on tabular data are manifold and are often not specified…

Machine Learning · Computer Science 2025-07-09 Astrid Franz, Frederik Hoppe, Marianne Michaelis, Udo Göbel

Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark

Synthetic data serves as an alternative in training machine learning models, particularly when real-world data is limited or inaccessible. However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging…

Machine Learning · Computer Science 2023-10-27 Lasse Hansen, Nabeel Seedat, Mihaela van der Schaar, Andrija Petrovic

TableQA: Question Answering on Tabular Data

Tabular data is difficult to analyze and to search through, calling for new tools and interfaces that would allow even non-tech-savvy users to gain insights from open datasets without resorting to specialized data analysis tools or even…

Information Retrieval · Computer Science 2017-08-31 Svitlana Vakulenko, Vadim Savenkov

Embeddings for Tabular Data: A Survey

Tabular data, comprising rows (samples) with the same set of columns (attributes), is one of the most widely used data types across various industries, including financial services, health care, research, retail, and logistics, to name a few.…

Machine Learning · Computer Science 2023-02-24 Rajat Singh, Srikanta Bedathur

Navigating Tabular Data Synthesis Research: Understanding User Needs and Tool Capabilities

In an era of rapidly advancing data-driven applications, there is a growing demand for data in both research and practice. Synthetic data have emerged as an alternative when no real data is available (e.g., due to privacy regulations).…

Artificial Intelligence · Computer Science 2024-06-03 Maria F. Davila R., Sven Groen, Fabian Panse, Wolfram Wingerath

Synthetic Tabular Data: Methods, Attacks and Defenses

Synthetic data is often positioned as a solution to replace sensitive fixed-size datasets with a source of unlimited matching data, freed from privacy concerns. There has been much progress in synthetic data generation over the last decade,…

Machine Learning · Computer Science 2025-06-09 Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade

Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data

Classical machine learning models, such as linear models and tree-based models, are widely used in industry. These models are sensitive to data distribution, thus feature preprocessing, which transforms features from one distribution to…

Machine Learning · Computer Science 2026-04-16 Danrui Qi, Jinglin Peng, Yongjun He, Jiannan Wang

A Systematic Framework for Tabular Data Disentanglement

Tabular data, widely used in various applications such as industrial control systems, finance, and supply chain, often contains complex interrelationships among its attributes. Data disentanglement seeks to transform such data into latent…

Machine Learning · Computer Science 2026-04-10 Ivan Tjuawinata, Andre Gunawan, Anh Quan Tran, Nitish Kumar, Payal Pote, Harsh Bansal, Chu-Hung Chi, Kwok-Yan Lam, Parventanis Murthy

Auto-completion for Data Cells in Relational Tables

We address the task of auto-completing data cells in relational tables. Such tables describe entities (in rows) with their attributes (in columns). We present the CellAutoComplete framework to tackle several novel aspects of this problem,…

Information Retrieval · Computer Science 2020-02-06 Shuo Zhang, Krisztian Balog

Tabular Data: Is Deep Learning all you need?

Tabular data represent one of the most prevalent data formats in applied machine learning, largely because they accommodate a broad spectrum of real-world problems. Existing literature has studied many of the shortcomings of neural…

Machine Learning · Computer Science 2025-10-07 Guri Zabërgja, Arlind Kadra, Christian M. M. Frey, Josif Grabocka

Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges

Tables have gained significant attention in large language models (LLMs) and multimodal large language models (MLLMs) due to their complex and flexible structure. Unlike linear text inputs, tables are two-dimensional, encompassing formats…

Computation and Language · Computer Science 2025-08-04 Xiaofeng Wu, Alan Ritter, Wei Xu

Tabular Data Augmentation for Machine Learning: Progress and Prospects of Embracing Generative AI

Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality tabular data for model training remains a significant obstacle. Numerous works have focused on tabular data augmentation (TDA) to enhance the original…

Machine Learning · Computer Science 2024-08-01 Lingxi Cui, Huan Li, Ke Chen, Lidan Shou, Gang Chen

A Comprehensive Survey of Synthetic Tabular Data Generation

Tabular data is one of the most prevalent and important data formats in real-world applications such as healthcare, finance, and education. However, its effective use in machine learning is often constrained by data scarcity, privacy…

Machine Learning · Computer Science 2025-07-18 Ruxue Shi, Yili Wang, Mengnan Du, Xu Shen, Yi Chang, Xin Wang

Contextual Graph Embeddings: Accounting for Data Characteristics in Heterogeneous Data Integration

As organizations continue to access diverse datasets, the demand for effective data integration has increased. Key tasks in this process, such as schema matching and entity resolution, are essential but often require significant effort.…

Databases · Computer Science 2025-11-13 Yuka Haruki, Shigeru Ishikura, Kazuya Demachi, Teruaki Hayashi

Improving Schema Matching with Linked Data

With today's public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis. These distributed data sources…

Databases · Computer Science 2012-05-16 Ahmad Assaf, Eldad Louw, Aline Senart, Corentin Follenfant, Raphaël Troncy, David Trastour

AutoG: Towards automatic graph construction from tabular data

Recent years have witnessed significant advancements in graph machine learning (GML), with its applications spanning numerous domains. However, the focus of GML has predominantly been on developing powerful models, often overlooking a…

Machine Learning · Computer Science 2025-11-13 Zhikai Chen, Han Xie, Jian Zhang, Xiang Song, Jiliang Tang, Huzefa Rangwala, George Karypis

An Optimal Tabular Parsing Algorithm

In this paper we relate a number of parsing algorithms which have been developed in very different areas of parsing theory, and which include deterministic algorithms, tabular algorithms, and a parallel algorithm. We show that these…

cmp-lg · Computer Science 2008-02-03 Mark-Jan Nederhof

Handling big tabular data of ICT supply chains: a multi-task, machine-interpretable approach

Due to the characteristics of Information and Communications Technology (ICT) products, the critical information of ICT devices is often summarized in big tabular data shared across supply chains. Therefore, it is critical to automatically…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

Cross-table Synthetic Tabular Data Detection

Detecting synthetic tabular data is essential to prevent the distribution of false or manipulated datasets that could compromise data-driven decision-making. This study explores whether synthetic tabular data can be reliably identified "in…

Machine Learning · Computer Science 2024-12-19 G. Charbel N. Kindji, Lina Maria Rojas-Barahona, Elisa Fromont, Tanguy Urvoy

Synthetic Tabular Data Detection In the Wild

Detecting synthetic tabular data is essential to prevent the distribution of false or manipulated datasets that could compromise data-driven decision-making. This study explores whether synthetic tabular data can be reliably identified…

Machine Learning · Computer Science 2025-03-05 G. Charbel N. Kindji, Elisa Fromont, Lina Maria Rojas-Barahona, Tanguy Urvoy