Related papers: TabSCM: A practical Framework for Generating Reali…

Improving TabPFN's Synthetic Data Generation by Integrating Causal Structure

Synthetic tabular data generation addresses data scarcity and privacy constraints in a variety of domains. Tabular Prior-Data Fitted Network (TabPFN), a recent foundation model for tabular data, has been shown capable of generating…

Machine Learning · Computer Science 2026-03-12 Davide Tugnoli, Andrea De Lorenzo, Marco Virgolin, Giovanni Cinà

How Well Does Your Tabular Generator Learn the Structure of Tabular Data?

Heterogeneous tabular data poses unique challenges in generative modelling due to its fundamentally different underlying data structure compared to homogeneous modalities, such as images and text. Although previous research has sought to…

Machine Learning · Computer Science 2025-03-13 Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik

TabGLM: Tabular Graph Language Model for Learning Transferable Representations Through Multi-Modal Consistency Minimization

Handling heterogeneous data in tabular datasets poses a significant challenge for deep learning models. While attention-based architectures and self-supervised learning have achieved notable success, their application to tabular data…

Machine Learning · Computer Science 2025-02-27 Anay Majee, Maria Xenochristou, Wei-Peng Chen

Causal-TGAN: Generating Tabular Data Using Causal Generative Adversarial Networks

Synthetic data generation becomes prevalent as a solution to privacy leakage and data shortage. Generative models are designed to generate a realistic synthetic dataset, which can precisely express the data distribution for the real…

Machine Learning · Computer Science 2021-04-22 Bingyang Wen, Luis Oliveros Colon, K. P. Subbalakshmi, R. Chandramouli

Tabular Foundation Model for Generative Modelling

Generative modelling is a demanding test of foundation models, because it requires robust, holistic representation learning for a given data modality, rather than optimisation for a supervised prediction target alone. While recent work on…

Machine Learning · Computer Science 2026-05-12 Xiangjian Jiang, Mingxuan Liu, Nikola Simidjievski, Tassilo Klein, Mateja Jamnik

Generating Synthetic Relational Tabular Data via Structural Causal Models

Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al., 2025), which leverages vast…

Machine Learning · Computer Science 2025-07-08 Frederik Hoppe, Astrid Franz, Lars Kleinemeier, Udo Göbel

TabStruct: Measuring Structural Fidelity of Tabular Data

Evaluating tabular generators remains a challenging problem, as the unique causal structural prior of heterogeneous tabular data does not lend itself to intuitive human inspection. Recent work has introduced structural fidelity as a…

Machine Learning · Computer Science 2026-03-06 Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik

A Fixed-Point Approach for Causal Generative Modeling

We propose a novel formalism for describing Structural Causal Models (SCMs) as fixed-point problems on causally ordered variables, eliminating the need for Directed Acyclic Graphs (DAGs), and establish the weakest known conditions for their…

Machine Learning · Computer Science 2024-12-16 Meyer Scetbon, Joel Jennings, Agrin Hilmkil, Cheng Zhang, Chao Ma

CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for…

Computation and Language · Computer Science 2025-06-18 Jia-Chen Zhang, Zheng Zhou, Yu-Jie Xiong, Chun-Ming Xia, Fei Dai

Causal Data Augmentation for Robust Fine-Tuning of Tabular Foundation Models

Fine-tuning tabular foundation models (TFMs) under data scarcity is challenging, as early stopping on even scarcer validation data often fails to capture true generalization performance. We propose CausalMixFT, a method that enhances…

Machine Learning · Computer Science 2026-01-22 Magnus Bühler, Lennart Purucker, Frank Hutter

Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features

Advances in generative modeling have recently been adapted to tabular data containing discrete and continuous features. However, generating mixed-type features that combine discrete states with an otherwise continuous distribution in a…

Machine Learning · Computer Science 2026-05-14 Markus Mueller, Kathrin Gruber, Dennis Fok

From Identifiable Causal Representations to Controllable Counterfactual Generation: A Survey on Causal Generative Modeling

Deep generative models have shown tremendous capability in data density estimation and data generation from finite samples. While these models have shown impressive performance by learning correlations among features in the data, some…

Machine Learning · Computer Science 2024-05-24 Aneesh Komanduri, Xintao Wu, Yongkai Wu, Feng Chen

DAGAF: A directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis

Understanding the causal relationships between data variables can provide crucial insights into the construction of tabular datasets. Most existing causality learning methods typically focus on applying a single identifiable causal model,…

Machine Learning · Computer Science 2026-04-07 Hristo Petkov, Calum MacLellan, Feng Dong

Unitless Unrestricted Markov-Consistent SCM Generation: Better Benchmark Datasets for Causal Discovery

Causal discovery aims to extract qualitative causal knowledge in the form of causal graphs from data. Because causal ground truth is rarely known in the real world, simulated data plays a vital role in evaluating the performance of the…

Machine Learning · Computer Science 2025-12-17 Rebecca J. Herman, Jonas Wahl, Urmi Ninad, Jakob Runge

Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors

We consider the problem of learning the structure of a causal directed acyclic graph (DAG) model in the presence of latent variables. We define latent factor causal models (LFCMs) as a restriction on causal DAG models with latent variables,…

Methodology · Statistics 2022-07-06 Chandler Squires, Annie Yun, Eshaan Nichani, Raj Agrawal, Caroline Uhler

Kolmogorov-Arnold causal generative models

Causal generative models provide a principled framework for answering observational, interventional, and counterfactual queries from observational data. However, many deep causal models rely on highly expressive architectures with opaque…

Machine Learning · Computer Science 2026-03-23 Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, Juan Parras

TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation

Synthesizing high-quality tabular data is an important topic in many data science tasks, ranging from dataset augmentation to privacy protection. However, developing expressive generative models for tabular data is challenging due to its…

Machine Learning · Computer Science 2025-02-18 Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, Stefano Ermon, Jure Leskovec

TabDLM: Free-Form Tabular Data Generation via Joint Numerical-Language Diffusion

Synthetic tabular data generation has attracted growing attention due to its importance for data augmentation, foundation models, and privacy. However, real-world tabular datasets increasingly contain free-form text fields (e.g., reviews or…

Machine Learning · Computer Science 2026-05-13 Donghong Cai, Jiarui Feng, Yanbo Wang, Da Zheng, Yixin Chen, Muhan Zhang

Modeling Tabular data using Conditional GAN

Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes…

Machine Learning · Computer Science 2019-10-29 Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni

ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning

Deep generative models can help with data scarcity and privacy by producing synthetic training data, but they struggle in low-data, imbalanced tabular settings to fully learn the complex data distribution. We argue that striving for the…

Machine Learning · Statistics 2026-03-12 Xiaofeng Lin, Seungbae Kim, Zhuoya Li, Zachary DeSoto, Charles Fleming, Guang Cheng