Related papers: Generative Modeling of Complex Data

Structured Evaluation of Synthetic Tabular Data

Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data;…

Machine Learning · Computer Science 2024-04-01 Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, Patrick Shafto

Boosting Synthetic Data Generation with Effective Nonlinear Causal Discovery

Synthetic data generation has been widely adopted in software testing, data privacy, imbalanced learning, and artificial intelligence explanation. In all such contexts, it is crucial to generate plausible data samples. A common assumption…

Artificial Intelligence · Computer Science 2024-10-16 Martina Cinquini, Fosca Giannotti, Riccardo Guidotti

Generative Modeling of Networked Time-Series via Transformer Architectures

Many security and network applications require having large datasets to train the machine learning models. Limited data access is a well-known problem in the security domain. Recent studies have shown the potential of Transformer models to…

Machine Learning · Computer Science 2025-06-10 Yusuf Elnady

Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis

This article provides a comprehensive synthesis of the recent developments in synthetic data generation via deep generative models, focusing on tabular datasets. We specifically outline the importance of synthetic data generation in the…

Machine Learning · Computer Science 2023-08-29 Conor Hassan, Robert Salomone, Kerrie Mengersen

Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark

Synthetic data serves as an alternative in training machine learning models, particularly when real-world data is limited or inaccessible. However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging…

Machine Learning · Computer Science 2023-10-27 Lasse Hansen, Nabeel Seedat, Mihaela van der Schaar, Andrija Petrovic

REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers

Tabular data is a common form of organizing data. Multiple models are available to generate synthetic tabular datasets where observations are independent, but few have the ability to produce relational datasets. Modeling relational data is…

Machine Learning · Computer Science 2023-02-07 Aivin V. Solatorio, Olivier Dupriez

Generating Synthetic Relational Tabular Data via Structural Causal Models

Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al., 2025), which leverages vast…

Machine Learning · Computer Science 2025-07-08 Frederik Hoppe, Astrid Franz, Lars Kleinemeier, Udo Göbel

Hybrid Generative Models for Two-Dimensional Datasets

Two-dimensional array-based datasets are pervasive in a variety of domains. Current approaches for generative modeling have typically been limited to conventional image datasets and performed in the pixel domain which do not explicitly…

Machine Learning · Computer Science 2021-07-12 Hoda Shajari, Jaemoon Lee, Sanjay Ranka, Anand Rangarajan

Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems

Synthetic datasets are important for evaluating and testing machine learning models. When evaluating real-life recommender systems, high-dimensional categorical (and sparse) datasets are often considered. Unfortunately, there are not many…

Information Retrieval · Computer Science 2024-12-11 Miha Malenšek, Blaž Škrlj, Blaž Mramor, Jure Demšar

Disjoint Generative Models

We propose a new framework for generating cross-sectional synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The…

Machine Learning · Computer Science 2025-07-29 Anton Danholt Lautrup, Muhammad Rajabinasab, Tobias Hyrup, Arthur Zimek, Peter Schneider-Kamp

How Realistic Is Your Synthetic Data? Constraining Deep Generative Models for Tabular Data

Deep Generative Models (DGMs) have been shown to be powerful tools for generating tabular data, as they have been increasingly able to capture the complex distributions that characterize them. However, to generate realistic synthetic data,…

Machine Learning · Computer Science 2024-02-08 Mihaela Cătălina Stoian, Salijona Dyrmishi, Maxime Cordy, Thomas Lukasiewicz, Eleonora Giunchiglia

Composable Generative Models

Generative modeling has recently seen many exciting developments with the advent of deep generative architectures such as Variational Auto-Encoders (VAE) or Generative Adversarial Networks (GAN). The ability to draw synthetic i.i.d.…

Machine Learning · Computer Science 2021-02-19 Johan Leduc, Nicolas Grislain

Synthetic Tabular Data: Methods, Attacks and Defenses

Synthetic data is often positioned as a solution to replace sensitive fixed-size datasets with a source of unlimited matching data, freed from privacy concerns. There has been much progress in synthetic data generation over the last decade,…

Machine Learning · Computer Science 2025-06-09 Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade

Boosting Statistic Learning with Synthetic Data from Pretrained Large Models

The rapid advancement of generative models, such as Stable Diffusion, raises a key question: how can synthetic data from these models enhance predictive modeling? While they can generate vast amounts of datasets, only a subset meaningfully…

Machine Learning · Statistics 2025-05-09 Jialong Jiang, Wenkang Hu, Jian Huang, Yuling Jiao, Xu Liu

Compositional Generative Modeling: A Single Model is Not All You Need

Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research. In this paper, we argue that we should instead construct large generative systems by composing smaller…

Machine Learning · Computer Science 2024-06-05 Yilun Du, Leslie Kaelbling

Comparing Synthetic Tabular Data Generation Between a Probabilistic Model and a Deep Learning Model for Education Use Cases

The ability to generate synthetic data has a variety of use cases across different domains. In education research, there is a growing need to have access to synthetic data to test certain concepts and ideas. In recent years, several deep…

Machine Learning · Computer Science 2022-10-18 Herkulaas MvE Combrink, Vukosi Marivate, Benjamin Rosman

A Survey on Deep Learning Approaches for Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, Diversity, and Beyond

Generative modelling has become the standard approach for synthesising tabular data. However, different use cases demand synthetic data to comply with different requirements to be useful in practice. In this survey, we review deep…

Machine Learning · Computer Science 2026-03-17 Mihaela Cătălina Stoian, Eleonora Giunchiglia, Thomas Lukasiewicz

A supervised generative optimization approach for tabular data

Synthetic data generation has emerged as a crucial topic for financial institutions, driven by multiple factors, such as privacy protection and data augmentation. Many algorithms have been proposed for synthetic data generation but reaching…

Machine Learning · Computer Science 2024-05-13 Shinpei Nakamura-Sakai, Fadi Hamad, Saheed Obitayo, Vamsi K. Potluru

Causal-TGAN: Generating Tabular Data Using Causal Generative Adversarial Networks

Synthetic data generation becomes prevalent as a solution to privacy leakage and data shortage. Generative models are designed to generate a realistic synthetic dataset, which can precisely express the data distribution for the real…

Machine Learning · Computer Science 2021-04-22 Bingyang Wen, Luis Oliveros Colon, K. P. Subbalakshmi, R. Chandramouli

Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation

Ensuring the generalisability of clinical machine learning (ML) models across diverse healthcare settings remains a significant challenge due to variability in patient demographics, disease prevalence, and institutional practices. Existing…

Machine Learning · Computer Science 2025-04-30 Bradley Segal, Joshua Fieggen, David Clifton, Lei Clifton