Related papers: TabDiff: a Mixed-type Diffusion Model for Tabular …

200 papers

Realistic synthetic tabular data generation encounters significant challenges in preserving privacy, especially when dealing with sensitive information in domains like finance and healthcare. In this paper, we introduce \textit{Federated…

Machine Learning · Computer Science 2024-01-15 Timur Sattarov, Marco Schreyer, Damian Borth

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for…

Computation and Language · Computer Science 2025-06-18 Jia-Chen Zhang, Zheng Zhou, Yu-Jie Xiong, Chun-Ming Xia, Fei Dai

Diffusion models have been the predominant generative model for tabular data generation. However, they face the conundrum of modeling under a separate versus a unified data representation. The former encounters the challenge of jointly…

Machine Learning · Computer Science 2025-12-23 Jacob Si, Zijing Ou, Mike Qu, Zhengrui Xiang, Yingzhen Li

Denoising diffusion probabilistic models are currently becoming the leading paradigm of generative modeling for many important data modalities. Being the most prevalent in the computer vision community, diffusion models have also recently…

Machine Learning · Computer Science 2024-10-08 Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, Artem Babenko

The increasing demand for privacy-preserving data analytics in various domains necessitates solutions for synthetic data generation that rigorously uphold privacy standards. We introduce the DP-FedTabDiff framework, a novel integration of…

Machine Learning · Computer Science 2025-09-01 Timur Sattarov, Marco Schreyer, Damian Borth

The sharing of microdata, such as fund holdings and derivative instruments, by regulatory institutions presents a unique challenge due to strict data confidentiality and privacy regulations. These challenges often hinder the ability of both…

Machine Learning · Computer Science 2023-09-06 Timur Sattarov, Marco Schreyer, Damian Borth

We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to…

Machine Learning · Computer Science 2025-12-02 Timur Sattarov, Marco Schreyer, Damian Borth

Score-based generative models, commonly referred to as diffusion models, have proven to be successful at generating text and image data. However, their adaptation to mixed-type tabular data remains underexplored. In this work, we propose…

Machine Learning · Computer Science 2026-03-27 Markus Mueller, Kathrin Gruber, Dennis Fok

Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias in the training dataset and generate biased synthetic data,…

Machine Learning · Computer Science 2025-03-05 Zeyu Yang, Han Yu, Peikun Guo, Khadija Zanna, Xiaoxue Yang, Akane Sano

Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis. In this paper, we leverage the power of diffusion model…

Machine Learning · Statistics 2023-11-20 Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng

Advances in generative modeling have recently been adapted to tabular data containing discrete and continuous features. However, generating mixed-type features that combine discrete states with an otherwise continuous distribution in a…

Machine Learning · Computer Science 2026-05-14 Markus Mueller, Kathrin Gruber, Dennis Fok

Synthetic tabular data generation has attracted growing attention due to its importance for data augmentation, foundation models, and privacy. However, real-world tabular datasets increasingly contain free-form text fields (e.g., reviews or…

Machine Learning · Computer Science 2026-05-13 Donghong Cai, Jiarui Feng, Yanbo Wang, Da Zheng, Yixin Chen, Muhan Zhang

Data imputation and data generation have important applications for many domains, like healthcare and finance, where incomplete or missing data can hinder accurate analysis and decision-making. Diffusion models have emerged as powerful…

Machine Learning · Computer Science 2025-06-10 Mario Villaizán-Vallelado, Matteo Salvatori, Carlos Segura, Ioannis Arapakis

Diffusion models are increasingly being utilised to create synthetic tabular and time series data for privacy-preserving augmentation. Tabular Denoising Diffusion Probabilistic Models (TabDDPM) generate high-quality synthetic data from…

Machine Learning · Computer Science 2026-04-08 Umang Dobhal, Christina Garcia, Sozo Inoue

Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and a blend of data types of tabular…

Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data…

Machine Learning · Computer Science 2024-10-30 Vitaliy Kinakh, Slava Voloshynovskiy

Diffusion models have recently emerged as powerful tools for missing data imputation by modeling the joint distribution of observed and unobserved variables. However, existing methods, typically based on stochastic denoising diffusion…

Artificial Intelligence · Computer Science 2025-08-06 Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal

Deep generative models have made rapid progress in image, text, audio, and video generation, and are increasingly being applied to structured records. For tabular data, however, generative modeling remains difficult: a dataset may contain…

Machine Learning · Computer Science 2026-05-25 Zhong Li, Qi Huang, Lincen Yang, Jiayang Shi, Zhao Yang, Niki van Stein, Thomas Bäck, Matthijs van Leeuwen

Tabular data is one of the most ubiquitous modalities, yet the literature on tabular generative foundation models is lagging far behind its text and vision counterparts. Creating such a model is hard, due to the heterogeneous feature spaces…

Machine Learning · Computer Science 2024-06-26 Boris van Breugel, Jonathan Crabbé, Rob Davis, Mihaela van der Schaar

Diffusion-based data augmentation (DiffDA) has emerged as a promising approach to improving classification performance under data scarcity. However, existing works vary significantly in task configurations, model choices, and experimental…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Zekun Li, Yinghuan Shi, Yang Gao, Dong Xu