Modeling Tabular data using Conditional GAN

Machine Learning 2019-10-29 v2 Machine Learning

Abstract

Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes, whereas discrete columns are sometimes imbalanced, which makes modeling difficult. Existing statistical and deep neural network models fail to properly model this type of data. We design TGAN, which uses a conditional generative adversarial network to address these challenges. To aid in a fair and thorough comparison, we design a benchmark with 7 simulated and 8 real datasets and several Bayesian network baselines. TGAN outperforms Bayesian methods on most of the real datasets, whereas other deep learning methods could not.

Cite

@article{arxiv.1907.00503,
  title  = {Modeling Tabular data using Conditional GAN},
  author = {Lei Xu and Maria Skoularidou and Alfredo Cuesta-Infante and Kalyan Veeramachaneni},
  journal= {arXiv preprint arXiv:1907.00503},
  year   = {2019}
}

Comments

Accepted to NeurIPS 2019
