Related papers: Towards Ground Truth Explainability on Tabular Dat…

200 papers

The emergence of synthetic data for privacy protection, training data generation, or simply convenient access to quasi-realistic data in any shape or volume complicates the concept of ground truth. Synthetic data mimic real-world…

Computers and Society · Computer Science 2025-09-18 Dietmar Offenhuber

Evaluating synthetic tabular data is challenging, since they can differ from the real data in so many ways. There exist numerous metrics of synthetic data quality, ranging from statistical distances to predictive performance, often…

Machine Learning · Computer Science 2025-04-30 Jan Kapar, Niklas Koenen, Martin Jullum

Synthetic data is often positioned as a solution to replace sensitive fixed-size datasets with a source of unlimited matching data, freed from privacy concerns. There has been much progress in synthetic data generation over the last decade,…

Machine Learning · Computer Science 2025-06-09 Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade

Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientific disciplines. However, its real-world applications remain limited. Current methods often rely on…

As privacy regulations become more stringent and access to real-world data becomes increasingly constrained, synthetic data generation has emerged as a vital solution, especially for tabular datasets, which are central to domains like…

Machine Learning · Computer Science 2025-07-17 Raju Challagundla, Mohsen Dorodchi, Pu Wang, Minwoo Lee

Recent advances in generative models facilitate the creation of synthetic data to be made available for research in privacy-sensitive contexts. However, the analysis of synthetic data raises a unique set of methodological challenges. In…

Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data;…

Machine Learning · Computer Science 2024-04-01 Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, Patrick Shafto

The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, the case of tabular data, despite its ubiquity, has been largely overlooked.…

Machine Learning · Computer Science 2025-12-02 G. Charbel N. Kindji, Elisa Fromont, Lina Maria Rojas-Barahona, Tanguy Urvoy

In recent years, several models have improved the capacity to generate synthetic tabular datasets. However, such models focus on synthesizing simple columnar tables and are not useable on real-life data with complex structures. This paper…

Machine Learning · Computer Science 2022-02-07 Luca Canale, Nicolas Grislain, Grégoire Lothe, Johan Leduc

In an era of rapidly advancing data-driven applications, there is a growing demand for data in both research and practice. Synthetic data have emerged as an alternative when no real data is available (e.g., due to privacy regulations).…

Artificial Intelligence · Computer Science 2024-06-03 Maria F. Davila R., Sven Groen, Fabian Panse, Wolfram Wingerath

Generative modelling has become the standard approach for synthesising tabular data. However, different use cases demand synthetic data to comply with different requirements to be useful in practice. In this survey, we review deep…

Machine Learning · Computer Science 2026-03-17 Mihaela Cătălina Stoian, Eleonora Giunchiglia, Thomas Lukasiewicz

Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies while generating synthetic data is yet to be explored. In addition to the existing…

Machine Learning · Computer Science 2024-09-27 Chaithra Umesh, Kristian Schultz, Manjunath Mahendra, Saparshi Bej, Olaf Wolkenhauer

Synthetic data generation has been widely adopted in software testing, data privacy, imbalanced learning, and artificial intelligence explanation. In all such contexts, it is crucial to generate plausible data samples. A common assumption…

Artificial Intelligence · Computer Science 2024-10-16 Martina Cinquini, Fosca Giannotti, Riccardo Guidotti

Tabular data is one of the most prevalent and important data formats in real-world applications such as healthcare, finance, and education. However, its effective use in machine learning is often constrained by data scarcity, privacy…

Machine Learning · Computer Science 2025-07-18 Ruxue Shi, Yili Wang, Mengnan Du, Xu Shen, Yi Chang, Xin Wang

Explainable recommendation has attracted much attention from the industry and academic communities. It has shown great potential for improving the recommendation persuasiveness, informativeness and user satisfaction. Despite a lot of…

Information Retrieval · Computer Science 2023-03-02 Xu Chen, Jingsen Zhang, Lei Wang, Quanyu Dai, Zhenhua Dong, Ruiming Tang, Rui Zhang, Li Chen, Ji-Rong Wen

Despite recent advances in synthetic data generation, the scientific community still lacks a unified consensus on its usefulness. It is commonly believed that synthetic data can be used for both data exchange and boosting machine learning…

Machine Learning · Computer Science 2023-06-28 Dionysis Manousakas, Sergül Aydöre

In many data analysis applications, there is a need to explain why a surprising or interesting result was produced by a query. Previous approaches to explaining results have directly or indirectly used data provenance (input tuples…

Databases · Computer Science 2021-03-30 Chenjie Li, Zhengjie Miao, Qitian Zeng, Boris Glavic, Sudeepa Roy

Recent developments in causal machine learning methods have made it easier to estimate flexible relationships between confounders, treatments and outcomes, making unconfoundedness assumptions in causal analysis more palatable. How…

Econometrics · Economics 2026-05-22 Justin Young, Eleanor Wiske Dillon

This article provides a comprehensive synthesis of the recent developments in synthetic data generation via deep generative models, focusing on tabular datasets. We specifically outline the importance of synthetic data generation in the…

Machine Learning · Computer Science 2023-08-29 Conor Hassan, Robert Salomone, Kerrie Mengersen

With the growing pervasiveness of artificial intelligence, the ability to explain the inferences made by machine learning models has become increasingly important. Numerous techniques for model explainability have been proposed, with…

Human-Computer Interaction · Computer Science 2026-04-08 Nicola Rossberg, Bennett Kleinberg, Barry O'Sullivan, Luca Longo, Andrea Visentin