Related papers: A supervised generative optimization approach for …

Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis

This article provides a comprehensive synthesis of the recent developments in synthetic data generation via deep generative models, focusing on tabular datasets. We specifically outline the importance of synthetic data generation in the…

Machine Learning · Computer Science · 2023-08-29 · Conor Hassan, Robert Salomone, Kerrie Mengersen

Synthetic Tabular Data Generation: A Comparative Survey for Modern Techniques

As privacy regulations become more stringent and access to real-world data becomes increasingly constrained, synthetic data generation has emerged as a vital solution, especially for tabular datasets, which are central to domains like…

Machine Learning · Computer Science · 2025-07-17 · Raju Challagundla, Mohsen Dorodchi, Pu Wang, Minwoo Lee

Synthetic Tabular Data: Methods, Attacks and Defenses

Synthetic data is often positioned as a solution to replace sensitive fixed-size datasets with a source of unlimited matching data, freed from privacy concerns. There has been much progress in synthetic data generation over the last decade,…

Machine Learning · Computer Science · 2025-06-09 · Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade

A Comprehensive Survey of Synthetic Tabular Data Generation

Tabular data is one of the most prevalent and important data formats in real-world applications such as healthcare, finance, and education. However, its effective use in machine learning is often constrained by data scarcity, privacy…

Machine Learning · Computer Science · 2025-07-18 · Ruxue Shi, Yili Wang, Mengnan Du, Xu Shen, Yi Chang, Xin Wang

Boosting Data Analytics With Synthetic Volume Expansion

Synthetic data generation, a cornerstone of Generative Artificial Intelligence, promotes a paradigm shift in data science by addressing data scarcity and privacy while enabling unprecedented performance. As synthetic data becomes more…

Machine Learning · Statistics · 2024-03-12 · Xiaotong Shen, Yifei Liu, Rex Shen

Structured Evaluation of Synthetic Tabular Data

Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data;…

Machine Learning · Computer Science · 2024-04-01 · Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, Patrick Shafto

A Framework for Auditable Synthetic Data Generation

Synthetic data has gained significant momentum thanks to sophisticated machine learning tools that enable the synthesis of high-dimensional datasets. However, many generation techniques do not give the data controller control over what…

Cryptography and Security · Computer Science · 2022-11-22 · Florimond Houssiau, Samuel N. Cohen, Lukasz Szpruch, Owen Daniel, Michaela G. Lawrence, Robin Mitra, Henry Wilde, Callum Mole

GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using Macro Data Sources

Individual-level data (microdata) that characterizes a population is essential for studying many real-world problems. However, acquiring such data is not straightforward due to cost and privacy constraints, and access is often limited to…

Machine Learning · Computer Science · 2022-12-13 · Angeela Acharya, Siddhartha Sikdar, Sanmay Das, Huzefa Rangwala

AutoSimulate: (Quickly) Learning Synthetic Data Generation

Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-20 · Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip H. S. Torr, Vibhav Vineet

A Survey on Deep Learning Approaches for Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, Diversity, and Beyond

Generative modelling has become the standard approach for synthesising tabular data. However, different use cases demand synthetic data to comply with different requirements to be useful in practice. In this survey, we review deep…

Machine Learning · Computer Science · 2026-03-17 · Mihaela Cătălina Stoian, Eleonora Giunchiglia, Thomas Lukasiewicz

New Money: A Systematic Review of Synthetic Data Generation for Finance

Synthetic data generation has emerged as a promising approach to address the challenges of using sensitive financial data in machine learning applications. By leveraging generative models, such as Generative Adversarial Networks (GANs) and…

Machine Learning · Computer Science · 2025-10-31 · James Meldrum, Basem Suleiman, Fethi Rabhi, Muhammad Johan Alibasa

Boosting Statistic Learning with Synthetic Data from Pretrained Large Models

The rapid advancement of generative models, such as Stable Diffusion, raises a key question: how can synthetic data from these models enhance predictive modeling? While they can generate vast amounts of datasets, only a subset meaningfully…

Machine Learning · Statistics · 2025-05-09 · Jialong Jiang, Wenkang Hu, Jian Huang, Yuling Jiao, Xu Liu

Synthetic data generation for system identification: leveraging knowledge transfer from similar systems

This paper addresses the challenge of overfitting in the learning of dynamical systems by introducing a novel approach for the generation of synthetic data, aimed at enhancing model generalization and robustness in scenarios characterized…

Machine Learning · Computer Science · 2024-03-11 · Dario Piga, Matteo Rufolo, Gabriele Maroni, Manas Mejari, Marco Forgione

Generating High-quality Privacy-preserving Synthetic Data

Synthetic tabular data enables sharing and analysis of sensitive records, but its practical deployment requires balancing distributional fidelity, downstream utility, and privacy protection. We study a simple, model-agnostic post-processing…

Machine Learning · Computer Science · 2026-02-09 · David Yavo, Richard Khoury, Christophe Pere, Sadoune Ait Kaci Azzou

An evaluation framework for synthetic data generation models

Nowadays, the use of synthetic data has gained popularity as a cost-efficient strategy for enhancing data augmentation to improve machine learning models' performance, as well as for addressing concerns related to sensitive data privacy.…

Machine Learning · Computer Science · 2025-10-27 · Ioannis E. Livieris, Nikos Alimpertis, George Domalis, Dimitris Tsakalidis

Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

Data augmentation is rapidly gaining attention in machine learning. Synthetic data can be generated by simple transformations or through the data distribution. In the latter case, the main challenge is to estimate the label associated with…

Machine Learning · Computer Science · 2019-03-26 · Maria Perez-Ortiz, Peter Tino, Rafal Mantiuk, Cesar Hervas-Martinez

Synthetic data, real errors: how (not) to publish and use synthetic data

Generating synthetic data through generative models is gaining interest in the ML community and beyond, promising a future where datasets can be tailored to individual needs. Unfortunately, synthetic data is usually not perfect, resulting…

Machine Learning · Computer Science · 2023-07-11 · Boris van Breugel, Zhaozhi Qian, Mihaela van der Schaar

Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark

Synthetic data serves as an alternative in training machine learning models, particularly when real-world data is limited or inaccessible. However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging…

Machine Learning · Computer Science · 2023-10-27 · Lasse Hansen, Nabeel Seedat, Mihaela van der Schaar, Andrija Petrovic

On the Usefulness of Synthetic Tabular Data Generation

Despite recent advances in synthetic data generation, the scientific community still lacks a unified consensus on its usefulness. It is commonly believed that synthetic data can be used for both data exchange and boosting machine learning…

Machine Learning · Computer Science · 2023-06-28 · Dionysis Manousakas, Sergül Aydöre

Machine Learning for Synthetic Data Generation: A Review

Machine learning heavily relies on data, but real-world applications often encounter various data-related issues. These include data of poor quality, insufficient data points leading to under-fitting of machine learning models, and…

Machine Learning · Computer Science · 2025-04-07 · Yingzhou Lu, Lulu Chen, Yuanyuan Zhang, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Tianfan Fu, Wenqi Wei