Related papers: TabDiff: a Mixed-type Diffusion Model for Tabular …

FedTabDiff: Federated Learning of Diffusion Probabilistic Models for Synthetic Mixed-Type Tabular Data Generation

Realistic synthetic tabular data generation encounters significant challenges in preserving privacy, especially when dealing with sensitive information in domains like finance and healthcare. In this paper, we introduce \textit{Federated…

Machine Learning · Computer Science 2024-01-15 Timur Sattarov , Marco Schreyer , Damian Borth

CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for…

Computation and Language · Computer Science 2025-06-18 Jia-Chen Zhang , Zheng Zhou , Yu-Jie Xiong , Chun-Ming Xia , Fei Dai

TabRep: Training Tabular Diffusion Models with a Simple and Effective Continuous Representation

Diffusion models have been the predominant generative model for tabular data generation. However, they face the conundrum of modeling under a separate versus a unified data representation. The former encounters the challenge of jointly…

Machine Learning · Computer Science 2025-12-23 Jacob Si , Zijing Ou , Mike Qu , Zhengrui Xiang , Yingzhen Li

TabDDPM: Modelling Tabular Data with Diffusion Models

Denoising diffusion probabilistic models are currently becoming the leading paradigm of generative modeling for many important data modalities. Being the most prevalent in the computer vision community, diffusion models have also recently…

Machine Learning · Computer Science 2024-10-08 Akim Kotelnikov , Dmitry Baranchuk , Ivan Rubachev , Artem Babenko

Federated Diffusion Modeling with Differential Privacy for Tabular Data Synthesis

The increasing demand for privacy-preserving data analytics in various domains necessitates solutions for synthetic data generation that rigorously uphold privacy standards. We introduce the DP-FedTabDiff framework, a novel integration of…

Machine Learning · Computer Science 2025-09-01 Timur Sattarov , Marco Schreyer , Damian Borth

FinDiff: Diffusion Models for Financial Tabular Data Generation

The sharing of microdata, such as fund holdings and derivative instruments, by regulatory institutions presents a unique challenge due to strict data confidentiality and privacy regulations. These challenges often hinder the ability of both…

Machine Learning · Computer Science 2023-09-06 Timur Sattarov , Marco Schreyer , Damian Borth

Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation

We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to…

Machine Learning · Computer Science 2025-12-02 Timur Sattarov , Marco Schreyer , Damian Borth

Continuous Diffusion for Mixed-Type Tabular Data

Score-based generative models, commonly referred to as diffusion models, have proven to be successful at generating text and image data. However, their adaptation to mixed-type tabular data remains underexplored. In this work, we propose…

Machine Learning · Computer Science 2026-03-27 Markus Mueller , Kathrin Gruber , Dennis Fok

Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models

Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias in the training dataset and generate biased synthetic data,…

Machine Learning · Computer Science 2025-03-05 Zeyu Yang , Han Yu , Peikun Guo , Khadija Zanna , Xiaoxue Yang , Akane Sano

AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing

Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis. In this paper, we leverage the power of diffusion model…

Machine Learning · Statistics 2023-11-20 Namjoon Suh , Xiaofeng Lin , Din-Yin Hsieh , Merhdad Honarkhah , Guang Cheng

Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features

Advances in generative modeling have recently been adapted to tabular data containing discrete and continuous features. However, generating mixed-type features that combine discrete states with an otherwise continuous distribution in a…

Machine Learning · Computer Science 2026-05-14 Markus Mueller , Kathrin Gruber , Dennis Fok

TabDLM: Free-Form Tabular Data Generation via Joint Numerical-Language Diffusion

Synthetic tabular data generation has attracted growing attention due to its importance for data augmentation, foundation models, and privacy. However, real-world tabular datasets increasingly contain free-form text fields (e.g., reviews or…

Machine Learning · Computer Science 2026-05-13 Donghong Cai , Jiarui Feng , Yanbo Wang , Da Zheng , Yixin Chen , Muhan Zhang

Diffusion Models for Tabular Data Imputation and Synthetic Data Generation

Data imputation and data generation have important applications for many domains, like healthcare and finance, where incomplete or missing data can hinder accurate analysis and decision-making. Diffusion models have emerged as powerful…

Machine Learning · Computer Science 2025-06-10 Mario Villaizán-Vallelado , Matteo Salvatori , Carlos Segura , Ioannis Arapakis

Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation

Diffusion models are increasingly being utilised to create synthetic tabular and time series data for privacy-preserving augmentation. Tabular Denoising Diffusion Probabilistic Models (TabDDPM) generate high-quality synthetic data from…

Machine Learning · Computer Science 2026-04-08 Umang Dobhal , Christina Garcia , Sozo Inoue

Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space

Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and a blend of data types of tabular…

Machine Learning · Computer Science 2024-05-14 Hengrui Zhang , Jiani Zhang , Balasubramaniam Srinivasan , Zhengyuan Shen , Xiao Qin , Christos Faloutsos , Huzefa Rangwala , George Karypis

Tabular Data Generation using Binary Diffusion

Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data…

Machine Learning · Computer Science 2024-10-30 Vitaliy Kinakh , Slava Voloshynovskiy

MissDDIM: Deterministic and Efficient Conditional Diffusion for Tabular Data Imputation

Diffusion models have recently emerged as powerful tools for missing data imputation by modeling the joint distribution of observed and unobserved variables. However, existing methods, typically based on stochastic denoising diffusion…

Artificial Intelligence · Computer Science 2025-08-06 Youran Zhou , Mohamed Reda Bouadjenek , Sunil Aryal

Diffusion and Flow Matching Models for Tabular Data: A Survey

Deep generative models have made rapid progress in image, text, audio, and video generation, and are increasingly being applied to structured records. For tabular data, however, generative modeling remains difficult: a dataset may contain…

Machine Learning · Computer Science 2026-05-25 Zhong Li , Qi Huang , Lincen Yang , Jiayang Shi , Zhao Yang , Niki van Stein , Thomas Bäck , Matthijs van Leeuwen

LaTable: Towards Large Tabular Models

Tabular data is one of the most ubiquitous modalities, yet the literature on tabular generative foundation models is lagging far behind its text and vision counterparts. Creating such a model is hard, due to the heterogeneous feature spaces…

Machine Learning · Computer Science 2024-06-26 Boris van Breugel , Jonathan Crabbé , Rob Davis , Mihaela van der Schaar

Diffusion-Based Data Augmentation for Image Recognition: A Systematic Analysis and Evaluation

Diffusion-based data augmentation (DiffDA) has emerged as a promising approach to improving classification performance under data scarcity. However, existing works vary significantly in task configurations, model choices, and experimental…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Zekun Li , Yinghuan Shi , Yang Gao , Dong Xu