Continuous Diffusion for Mixed-Type Tabular Data

Machine Learning, 2026-03-27, v7

Abstract

Score-based generative models, commonly referred to as diffusion models, have proven successful at generating text and image data. However, their adaptation to mixed-type tabular data remains underexplored. In this work, we propose CDTD, a Continuous Diffusion model for mixed-type Tabular Data. CDTD is based on a novel combination of score matching and score interpolation to enforce a unified continuous noise distribution for both continuous and categorical features. We explicitly address the need to homogenize distinct data types through model-specific loss calibration and initialization schemes. To further address the high heterogeneity in mixed-type tabular data, we introduce adaptive feature- or type-specific noise schedules. These ensure balanced generative performance across features and optimize the allocation of model capacity across features and diffusion time. Our experimental results show that CDTD consistently outperforms state-of-the-art benchmark models and captures feature correlations exceptionally well, and that heterogeneity in the noise schedule design boosts sample quality. Replication code is available at https://github.com/muellermarkus/cdtd.
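
To make the idea of a unified continuous noise distribution concrete, the following is a minimal sketch, not the authors' implementation. It assumes that continuous features are perturbed directly with Gaussian noise, that each category is represented by a continuous embedding vector that receives the same kind of noise, and that the score for a categorical feature is obtained by score interpolation as commonly formulated: the probability-weighted average of the category embeddings serves as the denoised estimate. The function names, the toy embeddings, and the closed-form posterior that stands in for a trained classifier are all illustrative assumptions. Loss calibration, initialization, and the adaptive noise schedules described above are not shown.

```python
import math
import random


def add_noise(x0, sigma, rng):
    """Perturb a clean vector with Gaussian noise of standard deviation sigma."""
    return [v + sigma * rng.gauss(0.0, 1.0) for v in x0]


def class_posterior(e_t, emb, sigma):
    """Hypothetical stand-in for a trained classifier.

    Returns the exact posterior over categories given a noisy embedding,
    assuming Gaussian noise and a uniform prior over categories.
    """
    logits = [
        -sum((a - b) ** 2 for a, b in zip(e_t, e_c)) / (2.0 * sigma**2)
        for e_c in emb
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def interpolated_score(e_t, probs, emb, sigma):
    """Score interpolation: (expected clean embedding - noisy embedding) / sigma^2."""
    dim = len(e_t)
    e0_hat = [sum(p * e_c[d] for p, e_c in zip(probs, emb)) for d in range(dim)]
    return [(h - v) / sigma**2 for h, v in zip(e0_hat, e_t)]


rng = random.Random(0)
sigma = 0.7

# One toy row: two continuous features and one categorical feature (assumed values).
x_cont = [0.5, -1.2]
category = 2
emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 categories, 2-dim embeddings

x_t = add_noise(x_cont, sigma, rng)         # noisy continuous features
e_t = add_noise(emb[category], sigma, rng)  # noisy categorical embedding
probs = class_posterior(e_t, emb, sigma)
score_cat = interpolated_score(e_t, probs, emb, sigma)
```

In this sketch both feature types live in the same continuous noise space, so a single noise level `sigma` applies to `x_t` and `e_t` alike; a feature- or type-specific schedule would instead supply a separate `sigma` per feature or per data type.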

Cite

@article{arxiv.2312.10431,
  title   = {Continuous Diffusion for Mixed-Type Tabular Data},
  author  = {Markus Mueller and Kathrin Gruber and Dennis Fok},
  journal = {arXiv preprint arXiv:2312.10431},
  year    = {2026}
}

Comments

Published at ICLR 2025