English
Related papers

Related papers: DiffImpute: Tabular Data Imputation With Denoising…

200 papers

Diffusion models have recently emerged as powerful tools for missing data imputation by modeling the joint distribution of observed and unobserved variables. However, existing methods, typically based on stochastic denoising diffusion…

Artificial Intelligence · Computer Science 2025-08-06 Youran Zhou , Mohamed Reda Bouadjenek , Sunil Aryal

Generative models play an important role in missing data imputation in that they aim to learn the joint distribution of full data. However, applying advanced deep generative models (such as Diffusion models) to missing data imputation is…

Machine Learning · Computer Science 2025-05-27 Hengrui Zhang , Liancheng Fang , Qitian Wu , Philip S. Yu

Incomplete data are common in real-world tabular applications, where numerical, categorical, and discrete attributes coexist within a single dataset. This heterogeneous structure presents significant challenges for existing diffusion-based…

Machine Learning · Computer Science 2025-11-19 Youran Zhou , Mohamed Reda Bouadjenek , Sunil Aryal

Missing data is a widespread problem in tabular settings. Existing solutions range from simple averaging to complex generative adversarial networks, but due to each method's large variance in performance across real-world domains and…

Machine Learning · Computer Science 2026-02-18 Jacob Feitelberg , Dwaipayan Saha , Kyuseong Choi , Zaid Ahmad , Anish Agarwal , Raaz Dwivedi

Denoising diffusion probabilistic models are currently becoming the leading paradigm of generative modeling for many important data modalities. Being the most prevalent in the computer vision community, diffusion models have also recently…

Machine Learning · Computer Science 2024-10-08 Akim Kotelnikov , Dmitry Baranchuk , Ivan Rubachev , Artem Babenko

The ubiquity of missing data has sparked considerable attention and focus on tabular data imputation methods. Diffusion models, recognized as the cutting-edge technique for data generation, demonstrate significant potential in tabular data…

Machine Learning · Computer Science 2024-07-26 Yixin Liu , Thalaiyasingam Ajanthan , Hisham Husain , Vu Nguyen

Realistic synthetic tabular data generation encounters significant challenges in preserving privacy, especially when dealing with sensitive information in domains like finance and healthcare. In this paper, we introduce \textit{Federated…

Machine Learning · Computer Science 2024-01-15 Timur Sattarov , Marco Schreyer , Damian Borth

Diffusion models are increasingly being utilised to create synthetic tabular and time series data for privacy-preserving augmentation. Tabular Denoising Diffusion Probabilistic Models (TabDDPM) generate high-quality synthetic data from…

Machine Learning · Computer Science 2026-04-08 Umang Dobhal , Christina Garcia , Sozo Inoue

Synthesizing high-quality tabular data is an important topic in many data science tasks, ranging from dataset augmentation to privacy protection. However, developing expressive generative models for tabular data is challenging due to its…

Machine Learning · Computer Science 2025-02-18 Juntong Shi , Minkai Xu , Harper Hua , Hengrui Zhang , Stefano Ermon , Jure Leskovec

Data imputation and data generation have important applications for many domains, like healthcare and finance, where incomplete or missing data can hinder accurate analysis and decision-making. Diffusion models have emerged as powerful…

Machine Learning · Computer Science 2025-06-10 Mario Villaizán-Vallelado , Matteo Salvatori , Carlos Segura , Ioannis Arapakis

Missing value imputation in machine learning is the task of estimating the missing values in the dataset accurately using available information. In this task, several deep generative modeling methods have been proposed and demonstrated…

Machine Learning · Computer Science 2023-03-14 Shuhan Zheng , Nontawat Charoenphakdee

Handling missing values in tabular datasets presents a significant challenge in training and testing artificial intelligence models, an issue usually addressed using imputation techniques. Here we introduce "Not Another Imputation Method"…

Machine Learning · Computer Science 2026-03-13 Camillo Maria Caruso , Paolo Soda , Valerio Guarrasi

Tabular data builds the basis for a wide range of applications, yet real-world datasets are frequently incomplete due to collection errors, privacy restrictions, or sensor failures. As missing values degrade the performance or hinder the…

Class imbalance refers to a situation where certain classes in a dataset have significantly fewer samples than oth- ers, leading to biased model performance. Class imbalance in network intrusion detection using Tabular Denoising Diffusion…

Cryptography and Security · Computer Science 2026-02-02 Aravind B , Anirud R. S. , Sai Surya Teja N , Bala Subrahmanya Sriranga Navaneeth A , Karthika R , Mohankumar N

Missing values of varying patterns and rates in real-world tabular data pose a significant challenge in developing reliable data-driven models. The most commonly used statistical and machine learning methods for missing value imputation may…

Machine Learning · Computer Science 2025-03-26 Ibna Kowsar , Shourav B. Rabbani , Yina Hou , Manar D. Samad

Due to the high complexity and technical requirements of industrial production processes, surface defects will inevitably appear, which seriously affects the quality of products. Although existing lightweight detection networks are highly…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Xuyi Yu

Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but there remain two long-neglected issues to be addressed: (1). Inaccurate Imputation, which arises from inherently sample-diversification-pursuing generative…

Machine Learning · Computer Science 2024-06-25 Zhichao Chen , Haoxuan Li , Fangyikang Wang , Odin Zhang , Hu Xu , Xiaoyu Jiang , Zhihuan Song , Eric H. Wang

Missing data frequently occurs in datasets across various domains, such as medicine, sports, and finance. In many cases, to enable proper and reliable analyses of such data, the missing values are often imputed, and it is necessary that the…

Tabular data are central to many applications, especially longitudinal data in healthcare, where missing values are common, undermining model fidelity and reliability. Prior imputation methods either impose restrictive assumptions or…

Machine Learning · Computer Science 2025-09-30 Dengyi Liu , Honggang Wang , Hua Fang

Denoising Diffusion Probabilistic Models (DDPMs) have significantly advanced generative AI, achieving impressive results in high-quality image and data generation. However, enhancing fidelity without compromising semantic content remains a…

‹ Prev 1 2 3 10 Next ›