English
Related papers

Related papers: Continuous Diffusion for Mixed-Type Tabular Data

200 papers

Synthesizing high-quality tabular data is an important topic in many data science tasks, ranging from dataset augmentation to privacy protection. However, developing expressive generative models for tabular data is challenging due to its…

Machine Learning · Computer Science 2025-02-18 Juntong Shi , Minkai Xu , Harper Hua , Hengrui Zhang , Stefano Ermon , Jure Leskovec

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for…

Computation and Language · Computer Science 2025-06-18 Jia-Chen Zhang , Zheng Zhou , Yu-Jie Xiong , Chun-Ming Xia , Fei Dai

Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are…

Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias in the training dataset and generate biased synthetic data,…

Machine Learning · Computer Science 2025-03-05 Zeyu Yang , Han Yu , Peikun Guo , Khadija Zanna , Xiaoxue Yang , Akane Sano

Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis. In this paper, we leverage the power of diffusion model…

Machine Learning · Statistics 2023-11-20 Namjoon Suh , Xiaofeng Lin , Din-Yin Hsieh , Merhdad Honarkhah , Guang Cheng

Advances in generative modeling have recently been adapted to tabular data containing discrete and continuous features. However, generating mixed-type features that combine discrete states with an otherwise continuous distribution in a…

Machine Learning · Computer Science 2026-05-14 Markus Mueller , Kathrin Gruber , Dennis Fok

Incomplete data are common in real-world tabular applications, where numerical, categorical, and discrete attributes coexist within a single dataset. This heterogeneous structure presents significant challenges for existing diffusion-based…

Machine Learning · Computer Science 2025-11-19 Youran Zhou , Mohamed Reda Bouadjenek , Sunil Aryal

Missing value imputation in machine learning is the task of estimating the missing values in the dataset accurately using available information. In this task, several deep generative modeling methods have been proposed and demonstrated…

Machine Learning · Computer Science 2023-03-14 Shuhan Zheng , Nontawat Charoenphakdee

Discrete diffusion models are a class of generative models that construct sequences by progressively denoising samples from a categorical noise distribution. Beyond their rapidly growing ability to generate coherent natural language, these…

Computation and Language · Computer Science 2025-12-11 Michael Cardei , Jacob K Christopher , Thomas Hartvigsen , Bhavya Kailkhura , Ferdinando Fioretto

Diffusion-based tabular data synthesis models have yielded promising results. However, when the data dimensionality increases, existing models tend to degenerate and may perform even worse than simpler, non-diffusion-based models. This is…

Machine Learning · Computer Science 2025-11-12 Zuqing Li , Junhao Gan , Jianzhong Qi

Diffusion models have emerged as a promising approach for text generation, with recent works falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using…

Computation and Language · Computer Science 2025-05-29 Bocheng Li , Zhujin Gao , Linli Xu

With growing attention to tabular data these days, the attempt to apply a synthetic table to various tasks has been expanded toward various scenarios. Owing to the recent advances in generative modeling, fake data generated by tabular data…

Machine Learning · Computer Science 2023-09-22 Chaejeong Lee , Jayoung Kim , Noseong Park

Denoising diffusion probabilistic models are currently becoming the leading paradigm of generative modeling for many important data modalities. Being the most prevalent in the computer vision community, diffusion models have also recently…

Machine Learning · Computer Science 2024-10-08 Akim Kotelnikov , Dmitry Baranchuk , Ivan Rubachev , Artem Babenko

We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to…

Machine Learning · Computer Science 2025-12-02 Timur Sattarov , Marco Schreyer , Damian Borth

Deep generative models have made rapid progress in image, text, audio, and video generation, and are increasingly being applied to structured records. For tabular data, however, generative modeling remains difficult: a dataset may contain…

Machine Learning · Computer Science 2026-05-25 Zhong Li , Qi Huang , Lincen Yang , Jiayang Shi , Zhao Yang , Niki van Stein , Thomas Bäck , Matthijs van Leeuwen

Data imputation and data generation have important applications for many domains, like healthcare and finance, where incomplete or missing data can hinder accurate analysis and decision-making. Diffusion models have emerged as powerful…

Machine Learning · Computer Science 2025-06-10 Mario Villaizán-Vallelado , Matteo Salvatori , Carlos Segura , Ioannis Arapakis

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified…

Computer Vision and Pattern Recognition · Computer Science 2024-04-26 Xuehai He , Weixi Feng , Tsu-Jui Fu , Varun Jampani , Arjun Akula , Pradyumna Narayana , Sugato Basu , William Yang Wang , Xin Eric Wang

Recent advancements in 3D generation are predominantly propelled by improvements in 3D-aware image diffusion models. These models are pretrained on Internet-scale image data and fine-tuned on massive 3D data, offering the capability of…

Computer Vision and Pattern Recognition · Computer Science 2024-10-03 Zeyu Yang , Zijie Pan , Chun Gu , Li Zhang

Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented…

Computer Vision and Pattern Recognition · Computer Science 2023-07-12 Subin Kim , Kyungmin Lee , June Suk Choi , Jongheon Jeong , Kihyuk Sohn , Jinwoo Shin

Synthetic tabular data generation has attracted growing attention due to its importance for data augmentation, foundation models, and privacy. However, real-world tabular datasets increasingly contain free-form text fields (e.g., reviews or…

Machine Learning · Computer Science 2026-05-13 Donghong Cai , Jiarui Feng , Yanbo Wang , Da Zheng , Yixin Chen , Muhan Zhang
‹ Prev 1 2 3 10 Next ›