Related papers: Towards Ground Truth Explainability on Tabular Dat…

Synthetic Data and the Shifting Ground of Truth

The emergence of synthetic data for privacy protection, training data generation, or simply convenient access to quasi-realistic data in any shape or volume complicates the concept of ground truth. Synthetic data mimic real-world…

Computers and Society · Computer Science 2025-09-18 Dietmar Offenhuber

What's Wrong with Your Synthetic Tabular Data? Using Explainable AI to Evaluate Generative Models

Evaluating synthetic tabular data is challenging, since they can differ from the real data in so many ways. There exist numerous metrics of synthetic data quality, ranging from statistical distances to predictive performance, often…

Machine Learning · Computer Science 2025-04-30 Jan Kapar, Niklas Koenen, Martin Jullum

Synthetic Tabular Data: Methods, Attacks and Defenses

Synthetic data is often positioned as a solution to replace sensitive fixed-size datasets with a source of unlimited matching data, freed from privacy concerns. There has been much progress in synthetic data generation over the last decade,…

Machine Learning · Computer Science 2025-06-09 Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade

The Landscape of Causal Discovery Data: Grounding Causal Discovery in Real-World Applications

Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientific disciplines. However, its real-world applications remain limited. Current methods often rely on…

Machine Learning · Computer Science 2025-06-17 Philippe Brouillard, Chandler Squires, Jonas Wahl, Konrad P. Kording, Karen Sachs, Alexandre Drouin, Dhanya Sridhar

Synthetic Tabular Data Generation: A Comparative Survey for Modern Techniques

As privacy regulations become more stringent and access to real-world data becomes increasingly constrained, synthetic data generation has emerged as a vital solution, especially for tabular datasets, which are central to domains like…

Machine Learning · Computer Science 2025-07-17 Raju Challagundla, Mohsen Dorodchi, Pu Wang, Minwoo Lee

The Real Deal Behind the Artificial Appeal: Inferential Utility of Tabular Synthetic Data

Recent advances in generative models facilitate the creation of synthetic data to be made available for research in privacy-sensitive contexts. However, the analysis of synthetic data raises a unique set of methodological challenges. In…

Machine Learning · Computer Science 2024-06-13 Alexander Decruyenaere, Heidelinde Dehaene, Paloma Rabaey, Christiaan Polet, Johan Decruyenaere, Stijn Vansteelandt, Thomas Demeester

Structured Evaluation of Synthetic Tabular Data

Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data;…

Machine Learning · Computer Science 2024-04-01 Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, Patrick Shafto

Robust Detection of Synthetic Tabular Data under Schema Variability

The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, the case of tabular data, despite its ubiquity, has been largely overlooked.…

Machine Learning · Computer Science 2025-12-02 G. Charbel N. Kindji, Elisa Fromont, Lina Maria Rojas-Barahona, Tanguy Urvoy

Generative Modeling of Complex Data

In recent years, several models have improved the capacity to generate synthetic tabular datasets. However, such models focus on synthesizing simple columnar tables and are not useable on real-life data with complex structures. This paper…

Machine Learning · Computer Science 2022-02-07 Luca Canale, Nicolas Grislain, Grégoire Lothe, Johan Leduc

Navigating Tabular Data Synthesis Research: Understanding User Needs and Tool Capabilities

In an era of rapidly advancing data-driven applications, there is a growing demand for data in both research and practice. Synthetic data have emerged as an alternative when no real data is available (e.g., due to privacy regulations).…

Artificial Intelligence · Computer Science 2024-06-03 Maria F. Davila R., Sven Groen, Fabian Panse, Wolfram Wingerath

A Survey on Deep Learning Approaches for Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, Diversity, and Beyond

Generative modelling has become the standard approach for synthesising tabular data. However, different use cases demand synthetic data to comply with different requirements to be useful in practice. In this survey, we review deep…

Machine Learning · Computer Science 2026-03-17 Mihaela Cătălina Stoian, Eleonora Giunchiglia, Thomas Lukasiewicz

Preserving logical and functional dependencies in synthetic tabular data

Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies while generating synthetic data is yet to be explored. In addition to the existing…

Machine Learning · Computer Science 2024-09-27 Chaithra Umesh, Kristian Schultz, Manjunath Mahendra, Saparshi Bej, Olaf Wolkenhauer

Boosting Synthetic Data Generation with Effective Nonlinear Causal Discovery

Synthetic data generation has been widely adopted in software testing, data privacy, imbalanced learning, and artificial intelligence explanation. In all such contexts, it is crucial to generate plausible data samples. A common assumption…

Artificial Intelligence · Computer Science 2024-10-16 Martina Cinquini, Fosca Giannotti, Riccardo Guidotti

A Comprehensive Survey of Synthetic Tabular Data Generation

Tabular data is one of the most prevalent and important data formats in real-world applications such as healthcare, finance, and education. However, its effective use in machine learning is often constrained by data scarcity, privacy…

Machine Learning · Computer Science 2025-07-18 Ruxue Shi, Yili Wang, Mengnan Du, Xu Shen, Yi Chang, Xin Wang

REASONER: An Explainable Recommendation Dataset with Multi-aspect Real User Labeled Ground Truths Towards more Measurable Explainable Recommendation

Explainable recommendation has attracted much attention from the industry and academic communities. It has shown great potential for improving the recommendation persuasiveness, informativeness and user satisfaction. Despite a lot of…

Information Retrieval · Computer Science 2023-03-02 Xu Chen, Jingsen Zhang, Lei Wang, Quanyu Dai, Zhenhua Dong, Ruiming Tang, Rui Zhang, Li Chen, Ji-Rong Wen

On the Usefulness of Synthetic Tabular Data Generation

Despite recent advances in synthetic data generation, the scientific community still lacks a unified consensus on its usefulness. It is commonly believed that synthetic data can be used for both data exchange and boosting machine learning…

Machine Learning · Computer Science 2023-06-28 Dionysis Manousakas, Sergül Aydöre

Putting Things into Context: Rich Explanations for Query Answers using Join Graphs (extended version)

In many data analysis applications, there is a need to explain why a surprising or interesting result was produced by a query. Previous approaches to explaining results have directly or indirectly used data provenance (input tuples…

Databases · Computer Science 2021-03-30 Chenjie Li, Zhengjie Miao, Qitian Zeng, Boris Glavic, Sudeepa Roy

Reevaluating Causal Estimation Methods with Data from a Product Release

Recent developments in causal machine learning methods have made it easier to estimate flexible relationships between confounders, treatments and outcomes, making unconfoundedness assumptions in causal analysis more palatable. How…

Econometrics · Economics 2026-05-22 Justin Young, Eleanor Wiske Dillon

Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis

This article provides a comprehensive synthesis of the recent developments in synthetic data generation via deep generative models, focusing on tabular datasets. We specifically outline the importance of synthetic data generation in the…

Machine Learning · Computer Science 2023-08-29 Conor Hassan, Robert Salomone, Kerrie Mengersen

Improving Explanations: Applying the Feature Understandability Scale for Cost-Sensitive Feature Selection

With the growing pervasiveness of artificial intelligence, the ability to explain the inferences made by machine learning models has become increasingly important. Numerous techniques for model explainability have been proposed, with…

Human-Computer Interaction · Computer Science 2026-04-08 Nicola Rossberg, Bennett Kleinberg, Barry O'Sullivan, Luca Longo, Andrea Visentin