Related papers: DiSK: A Diffusion Model for Structured Knowledge

A Comprehensive Survey on Generative Diffusion Models for Structured Data

In recent years, generative diffusion models have achieved a rapid paradigm shift in deep generative models by showing groundbreaking performance across various applications. Meanwhile, structured data, encompassing tabular and time series…

Machine Learning · Computer Science 2023-07-11 Heejoon Koo, To Eun Kim

Diffusion and Flow Matching Models for Tabular Data: A Survey

Deep generative models have made rapid progress in image, text, audio, and video generation, and are increasingly being applied to structured records. For tabular data, however, generative modeling remains difficult: a dataset may contain…

Machine Learning · Computer Science 2026-05-25 Zhong Li, Qi Huang, Lincen Yang, Jiayang Shi, Zhao Yang, Niki van Stein, Thomas Bäck, Matthijs van Leeuwen

Diffusion Models for Tabular Data Imputation and Synthetic Data Generation

Data imputation and data generation have important applications for many domains, like healthcare and finance, where incomplete or missing data can hinder accurate analysis and decision-making. Diffusion models have emerged as powerful…

Machine Learning · Computer Science 2025-06-10 Mario Villaizán-Vallelado, Matteo Salvatori, Carlos Segura, Ioannis Arapakis

CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for…

Computation and Language · Computer Science 2025-06-18 Jia-Chen Zhang, Zheng Zhou, Yu-Jie Xiong, Chun-Ming Xia, Fei Dai

Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models

Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias in the training dataset and generate biased synthetic data,…

Machine Learning · Computer Science 2025-03-05 Zeyu Yang, Han Yu, Peikun Guo, Khadija Zanna, Xiaoxue Yang, Akane Sano

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

Pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes.…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

TabDLM: Free-Form Tabular Data Generation via Joint Numerical-Language Diffusion

Synthetic tabular data generation has attracted growing attention due to its importance for data augmentation, foundation models, and privacy. However, real-world tabular datasets increasingly contain free-form text fields (e.g., reviews or…

Machine Learning · Computer Science 2026-05-13 Donghong Cai, Jiarui Feng, Yanbo Wang, Da Zheng, Yixin Chen, Muhan Zhang

Diffusion models for missing value imputation in tabular data

Missing value imputation in machine learning is the task of estimating the missing values in the dataset accurately using available information. In this task, several deep generative modeling methods have been proposed and demonstrated…

Machine Learning · Computer Science 2023-03-14 Shuhan Zheng, Nontawat Charoenphakdee

TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation

Synthesizing high-quality tabular data is an important topic in many data science tasks, ranging from dataset augmentation to privacy protection. However, developing expressive generative models for tabular data is challenging due to its…

Machine Learning · Computer Science 2025-02-18 Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, Stefano Ermon, Jure Leskovec

Scalable Syntax-Aware Language Models Using Knowledge Distillation

Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders…

Computation and Language · Computer Science 2019-06-18 Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, Phil Blunsom

AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing

Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis. In this paper, we leverage the power of diffusion model…

Machine Learning · Statistics 2023-11-20 Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng

Syntax-Guided Diffusion Language Models with User-Integrated Personalization

Large language models have made revolutionary progress in generating human-like text, yet their outputs often tend to be generic, exhibiting insufficient structural diversity, which limits personalized expression. Recent advances in…

Computation and Language · Computer Science 2025-10-02 Ruqian Zhang, Yijiao Zhang, Juan Shen, Zhongyi Zhu, Annie Qu

Diffusion Transformers for Tabular Data Time Series Generation

Tabular data generation has recently attracted a growing interest due to its different application scenarios. However, generating time series of tabular data, where each element of the series depends on the others, remains a largely…

Machine Learning · Computer Science 2025-04-21 Fabrizio Garuti, Enver Sangineto, Simone Luetto, Lorenzo Forni, Rita Cucchiara

Analysis Dictionary Learning based Classification: Structure for Robustness

A discriminative structured analysis dictionary is proposed for the classification task. A structure of the union of subspaces (UoS) is integrated into the conventional analysis dictionary learning to enhance the capability of…

Computer Vision and Pattern Recognition · Computer Science 2019-09-17 Wen Tang, Ashkan Panahi, Hamid Krim, Liyi Dai

One Category One Prompt: Dataset Distillation using Diffusion Models

The extensive amounts of data required for training deep neural networks pose significant challenges on storage and transmission fronts. Dataset distillation has emerged as a promising technique to condense the information of massive…

Computer Vision and Pattern Recognition · Computer Science 2024-03-13 Ali Abbasi, Ashkan Shahbazi, Hamed Pirsiavash, Soheil Kolouri

Diffusion-based Data Augmentation for Skin Disease Classification: Impact Across Original Medical Datasets to Fully Synthetic Images

Despite continued advancement in recent years, deep neural networks still rely on large amounts of training data to avoid overfitting. However, labeled training data for real-world applications such as healthcare is limited and difficult to…

Machine Learning · Computer Science 2023-01-13 Mohamed Akrout, Bálint Gyepesi, Péter Holló, Adrienn Poór, Blága Kincső, Stephen Solis, Katrina Cirone, Jeremy Kawahara, Dekker Slade, Latif Abid, Máté Kovács, István Fazekas

A Comprehensive Survey on Knowledge Distillation of Diffusion Models

Diffusion Models (DMs), also referred to as score-based diffusion models, utilize neural networks to specify score functions. Unlike most other probabilistic models, DMs directly model the score functions, which makes them more flexible to…

Machine Learning · Computer Science 2023-04-11 Weijian Luo

Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data

Domain-adaptive text classification is a challenging problem for large-scale pretrained language models because they often require expensive additional labeled data to adapt to new domains. Existing work usually fails to leverage the…

Computation and Language · Computer Science 2022-06-22 Tian Li, Xiang Chen, Zhen Dong, Weijiang Yu, Yijun Yan, Kurt Keutzer, Shanghang Zhang

DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

Recently, diffusion models have emerged as a new paradigm for generative models. Despite the success in domains using continuous signals such as vision and audio, adapting diffusion models to natural language is under-explored due to the…

Computation and Language · Computer Science 2023-02-15 Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory

Dataset distillation enables the training of deep neural networks with comparable performance in significantly reduced time by compressing large datasets into small and representative ones. Although the introduction of generative models has…

Machine Learning · Computer Science 2025-05-27 Mingzhuo Li, Guang Li, Jiafeng Mao, Takahiro Ogawa, Miki Haseyama