
Related papers: Diffusion Models for Tabular Data Imputation and S…

200 papers

Missing value imputation in machine learning is the task of estimating the missing values in the dataset accurately using available information. In this task, several deep generative modeling methods have been proposed and demonstrated…

Machine Learning · Computer Science · 2023-03-14 · Shuhan Zheng, Nontawat Charoenphakdee

Diffusion models have become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language modeling, and speech synthesis. In this paper, we leverage the power of diffusion models…

Machine Learning · Statistics · 2023-11-20 · Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng

Deep generative models have made rapid progress in image, text, audio, and video generation, and are increasingly being applied to structured records. For tabular data, however, generative modeling remains difficult: a dataset may contain…

Machine Learning · Computer Science · 2026-05-25 · Zhong Li, Qi Huang, Lincen Yang, Jiayang Shi, Zhao Yang, Niki van Stein, Thomas Bäck, Matthijs van Leeuwen

Imputation methods play a critical role in enhancing the quality of practical time-series data, which often suffer from pervasive missing values. Recently, diffusion-based generative imputation methods have demonstrated remarkable success…

Machine Learning · Computer Science · 2025-10-03 · Zeqi Ye, Minshuo Chen

Diffusion models have recently emerged as powerful tools for missing data imputation by modeling the joint distribution of observed and unobserved variables. However, existing methods, typically based on stochastic denoising diffusion…

Artificial Intelligence · Computer Science · 2025-08-06 · Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal

Synthesizing high-quality tabular data is an important topic in many data science tasks, ranging from dataset augmentation to privacy protection. However, developing expressive generative models for tabular data is challenging due to its…

Machine Learning · Computer Science · 2025-02-18 · Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, Stefano Ermon, Jure Leskovec

Diffusion models have emerged as powerful generative frameworks by progressively adding noise to data through a forward process and then reversing this process to generate realistic samples. While these models have achieved strong…

Machine Learning · Computer Science · 2025-03-04 · Xingzhuo Guo, Yu Zhang, Baixu Chen, Haoran Xu, Jianmin Wang, Mingsheng Long

Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias in the training dataset and generate biased synthetic data,…

Machine Learning · Computer Science · 2025-03-05 · Zeyu Yang, Han Yu, Peikun Guo, Khadija Zanna, Xiaoxue Yang, Akane Sano

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for…

Computation and Language · Computer Science · 2025-06-18 · Jia-Chen Zhang, Zheng Zhou, Yu-Jie Xiong, Chun-Ming Xia, Fei Dai

Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data…

Machine Learning · Computer Science · 2024-10-30 · Vitaliy Kinakh, Slava Voloshynovskiy

Generative models play an important role in missing data imputation in that they aim to learn the joint distribution of the full data. However, applying advanced deep generative models (such as Diffusion models) to missing data imputation is…

Machine Learning · Computer Science · 2025-05-27 · Hengrui Zhang, Liancheng Fang, Qitian Wu, Philip S. Yu

With the development of Artificial Intelligence, numerous real-world tasks have been accomplished using technology integrated with deep learning. To achieve optimal performance, deep neural networks typically require large volumes of data…

Machine Learning · Computer Science · 2025-05-09 · Yuren Zhang, Zhongnan Pu, Lei Jing

Diffusion models, a powerful and universal generative AI technology, have achieved tremendous success in computer vision, audio, reinforcement learning, and computational biology. In these applications, diffusion models provide flexible…

Machine Learning · Computer Science · 2024-04-12 · Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang

The sharing of microdata, such as fund holdings and derivative instruments, by regulatory institutions presents a unique challenge due to strict data confidentiality and privacy regulations. These challenges often hinder the ability of both…

Machine Learning · Computer Science · 2023-09-06 · Timur Sattarov, Marco Schreyer, Damian Borth

Diffusion models, a family of generative models based on deep learning, have become increasingly prominent in cutting-edge machine learning research. With distinguished performance in generating samples that resemble the observed data,…

Machine Learning · Computer Science · 2023-05-02 · Lequan Lin, Zhengkun Li, Ruikun Li, Xuliang Li, Junbin Gao

Tabular data generation has recently attracted growing interest due to its diverse application scenarios. However, generating time series of tabular data, where each element of the series depends on the others, remains a largely…

Machine Learning · Computer Science · 2025-04-21 · Fabrizio Garuti, Enver Sangineto, Simone Luetto, Lorenzo Forni, Rita Cucchiara

The ability to train generative models that produce realistic, safe and useful tabular data is essential for data privacy, imputation, oversampling, explainability or simulation. However, generating tabular data is not straightforward due…

Machine Learning · Computer Science · 2025-09-18 · G. Charbel N. Kindji, Lina Maria Rojas-Barahona, Elisa Fromont, Tanguy Urvoy

The ubiquity of missing data has drawn considerable attention to tabular data imputation methods. Diffusion models, recognized as a cutting-edge technique for data generation, demonstrate significant potential in tabular data…

Machine Learning · Computer Science · 2024-07-26 · Yixin Liu, Thalaiyasingam Ajanthan, Hisham Husain, Vu Nguyen

Autoregressive models are predominant in natural language generation, while their application to tabular data remains underexplored. We posit that this can be attributed to two factors: 1) tabular data contains heterogeneous data types,…

Machine Learning · Computer Science · 2024-10-30 · Hengrui Zhang, Liancheng Fang, Qitian Wu, Philip S. Yu

Tabular data is one of the most prevalent and important data formats in real-world applications such as healthcare, finance, and education. However, its effective use in machine learning is often constrained by data scarcity, privacy…

Machine Learning · Computer Science · 2025-07-18 · Ruxue Shi, Yili Wang, Mengnan Du, Xu Shen, Yi Chang, Xin Wang