Related papers: MissDiff: Training Diffusion Models on Tabular Dat…

Diffusion models for missing value imputation in tabular data

Missing value imputation in machine learning is the task of estimating the missing values in the dataset accurately using available information. In this task, several deep generative modeling methods have been proposed and demonstrated…

Machine Learning · Computer Science 2023-03-14 Shuhan Zheng, Nontawat Charoenphakdee

Incomplete Data, Complete Dynamics: A Diffusion Approach

Learning physical dynamics from data is a fundamental challenge in machine learning and scientific modeling. Real-world observational data are inherently incomplete and irregularly sampled, posing significant challenges for existing…

Machine Learning · Computer Science 2026-05-04 Zihan Zhou, Chenguang Wang, Hongyi Ye, Yongtao Guan, Tianshu Yu

Diffusion and Flow Matching Models for Tabular Data: A Survey

Deep generative models have made rapid progress in image, text, audio, and video generation, and are increasingly being applied to structured records. For tabular data, however, generative modeling remains difficult: a dataset may contain…

Machine Learning · Computer Science 2026-05-25 Zhong Li, Qi Huang, Lincen Yang, Jiayang Shi, Zhao Yang, Niki van Stein, Thomas Bäck, Matthijs van Leeuwen

Diffusion Models for Tabular Data Imputation and Synthetic Data Generation

Data imputation and data generation have important applications for many domains, like healthcare and finance, where incomplete or missing data can hinder accurate analysis and decision-making. Diffusion models have emerged as powerful…

Machine Learning · Computer Science 2025-06-10 Mario Villaizán-Vallelado, Matteo Salvatori, Carlos Segura, Ioannis Arapakis

Ambient Diffusion: Learning Clean Distributions from Corrupted Data

We present the first diffusion-based framework that can learn an unknown distribution using only highly-corrupted samples. This problem arises in scientific applications where access to uncorrupted samples is impossible or expensive to…

Machine Learning · Computer Science 2023-05-31 Giannis Daras, Kulin Shah, Yuval Dagan, Aravind Gollakota, Alexandros G. Dimakis, Adam Klivans

Bootstrapping Diffusion: Diffusion Model Training Leveraging Partial and Corrupted Data

Training diffusion models requires large datasets. However, acquiring large volumes of high-quality data can be challenging, for example, collecting large numbers of high-resolution images and long videos. On the other hand, there are many…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Xudong Ma

MissHDD: Hybrid Deterministic Diffusion for Hetrogeneous Incomplete Data Imputation

Incomplete data are common in real-world tabular applications, where numerical, categorical, and discrete attributes coexist within a single dataset. This heterogeneous structure presents significant challenges for existing diffusion-based…

Machine Learning · Computer Science 2025-11-19 Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal

Learning Data Representations with Joint Diffusion Models

Joint machine learning models that allow synthesizing and classifying data often offer uneven performance between those tasks or are unstable to train. In this work, we depart from a set of empirical observations that indicate the…

Machine Learning · Computer Science 2023-04-06 Kamil Deja, Tomasz Trzcinski, Jakub M. Tomczak

MissDDIM: Deterministic and Efficient Conditional Diffusion for Tabular Data Imputation

Diffusion models have recently emerged as powerful tools for missing data imputation by modeling the joint distribution of observed and unobserved variables. However, existing methods, typically based on stochastic denoising diffusion…

Artificial Intelligence · Computer Science 2025-08-06 Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal

TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation

Synthesizing high-quality tabular data is an important topic in many data science tasks, ranging from dataset augmentation to privacy protection. However, developing expressive generative models for tabular data is challenging due to its…

Machine Learning · Computer Science 2025-02-18 Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, Stefano Ermon, Jure Leskovec

Latent Diffusion for Missing Data

Diffusion models have emerged as powerful generative approaches for missing-data imputation, yet most existing methods operate directly in data space and degrade when training data are heavily incomplete. We investigate whether shifting…

Machine Learning · Computer Science 2026-05-28 Alberte Heering Estad, Ignacio Peis, Jes Frellsen

Diffusion Estimation Over Cooperative Multi-Agent Networks With Missing Data

In many fields, and especially in the medical and social sciences and in recommender systems, data are gathered through clinical studies or targeted surveys. Participants are generally reluctant to respond to all questions in a survey or…

Statistics Theory · Mathematics 2016-11-15 Mohammad Reza Gholami, Magnus Jansson, Erik G. Ström, Ali H. Sayed

Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models

Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias in the training dataset and generate biased synthetic data,…

Machine Learning · Computer Science 2025-03-05 Zeyu Yang, Han Yu, Peikun Guo, Khadija Zanna, Xiaoxue Yang, Akane Sano

GSURE-Based Diffusion Model Training with Corrupted Data

Diffusion models have demonstrated impressive results in both data generation and downstream tasks such as inverse problems, text-based editing, classification, and more. However, training such models usually requires large amounts of clean…

Image and Video Processing · Electrical Eng. & Systems 2024-06-17 Bahjat Kawar, Noam Elata, Tomer Michaeli, Michael Elad

InvDiff: Invariant Guidance for Bias Mitigation in Diffusion Models

As one of the most successful generative models, diffusion models have demonstrated remarkable efficacy in synthesizing high-quality images. These models learn the underlying high-dimensional data distribution in an unsupervised manner.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-12 Min Hou, Yueying Wu, Chang Xu, Yu-Hao Huang, Chenxi Bai, Le Wu, Jiang Bian

Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance

Diffusion models have emerged as a pivotal advancement in generative models, setting new standards to the quality of the generated instances. In the current paper we aim to underscore a discrepancy between conventional training methods and…

Machine Learning · Computer Science 2023-11-03 Niket Patel, Luis Salamanca, Luis Barba

Diffusion Transformers for Imputation: Statistical Efficiency and Uncertainty Quantification

Imputation methods play a critical role in enhancing the quality of practical time-series data, which often suffer from pervasive missing values. Recently, diffusion-based generative imputation methods have demonstrated remarkable success…

Machine Learning · Computer Science 2025-10-03 Zeqi Ye, Minshuo Chen

DiffPuter: Empowering Diffusion Models for Missing Data Imputation

Generative models play an important role in missing data imputation in that they aim to learn the joint distribution of full data. However, applying advanced deep generative models (such as Diffusion models) to missing data imputation is…

Machine Learning · Computer Science 2025-05-27 Hengrui Zhang, Liancheng Fang, Qitian Wu, Philip S. Yu

Diffusion Model with Perceptual Loss

Diffusion models without guidance generate very unrealistic samples. Guidance is used ubiquitously, and previous research has attributed its effect to low-temperature sampling that improves quality by trading off diversity. However, this…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Shanchuan Lin, Xiao Yang

Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value

Diffusion models have achieved remarkable success in generative modeling. Despite more stable training, the loss of diffusion models is not indicative of absolute data-fitting quality, since its optimal value is typically not zero but…

Machine Learning · Computer Science 2026-04-17 Yixian Xu, Shengjie Luo, Liwei Wang, Di He, Chang Liu