Related papers: TabSCM: A practical Framework for Generating Reali…

200 papers

Synthetic tabular data generation addresses data scarcity and privacy constraints in a variety of domains. Tabular Prior-Data Fitted Network (TabPFN), a recent foundation model for tabular data, has been shown capable of generating…

Machine Learning · Computer Science 2026-03-12 Davide Tugnoli, Andrea De Lorenzo, Marco Virgolin, Giovanni Cinà

Heterogeneous tabular data poses unique challenges in generative modelling due to its fundamentally different underlying data structure compared to homogeneous modalities, such as images and text. Although previous research has sought to…

Machine Learning · Computer Science 2025-03-13 Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik

Handling heterogeneous data in tabular datasets poses a significant challenge for deep learning models. While attention-based architectures and self-supervised learning have achieved notable success, their application to tabular data…

Machine Learning · Computer Science 2025-02-27 Anay Majee, Maria Xenochristou, Wei-Peng Chen

Synthetic data generation becomes prevalent as a solution to privacy leakage and data shortage. Generative models are designed to generate a realistic synthetic dataset, which can precisely express the data distribution for the real…

Machine Learning · Computer Science 2021-04-22 Bingyang Wen, Luis Oliveros Colon, K. P. Subbalakshmi, R. Chandramouli

Generative modelling is a demanding test of foundation models, because it requires robust, holistic representation learning for a given data modality, rather than optimisation for a supervised prediction target alone. While recent work on…

Machine Learning · Computer Science 2026-05-12 Xiangjian Jiang, Mingxuan Liu, Nikola Simidjievski, Tassilo Klein, Mateja Jamnik

Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al., 2025), which leverages vast…

Machine Learning · Computer Science 2025-07-08 Frederik Hoppe, Astrid Franz, Lars Kleinemeier, Udo Göbel

Evaluating tabular generators remains a challenging problem, as the unique causal structural prior of heterogeneous tabular data does not lend itself to intuitive human inspection. Recent work has introduced structural fidelity as a…

Machine Learning · Computer Science 2026-03-06 Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik

We propose a novel formalism for describing Structural Causal Models (SCMs) as fixed-point problems on causally ordered variables, eliminating the need for Directed Acyclic Graphs (DAGs), and establish the weakest known conditions for their…

Machine Learning · Computer Science 2024-12-16 Meyer Scetbon, Joel Jennings, Agrin Hilmkil, Cheng Zhang, Chao Ma

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for…

Computation and Language · Computer Science 2025-06-18 Jia-Chen Zhang, Zheng Zhou, Yu-Jie Xiong, Chun-Ming Xia, Fei Dai

Fine-tuning tabular foundation models (TFMs) under data scarcity is challenging, as early stopping on even scarcer validation data often fails to capture true generalization performance. We propose CausalMixFT, a method that enhances…

Machine Learning · Computer Science 2026-01-22 Magnus Bühler, Lennart Purucker, Frank Hutter

Advances in generative modeling have recently been adapted to tabular data containing discrete and continuous features. However, generating mixed-type features that combine discrete states with an otherwise continuous distribution in a…

Machine Learning · Computer Science 2026-05-14 Markus Mueller, Kathrin Gruber, Dennis Fok

Deep generative models have shown tremendous capability in data density estimation and data generation from finite samples. While these models have shown impressive performance by learning correlations among features in the data, some…

Machine Learning · Computer Science 2024-05-24 Aneesh Komanduri, Xintao Wu, Yongkai Wu, Feng Chen

Understanding the causal relationships between data variables can provide crucial insights into the construction of tabular datasets. Most existing causality learning methods typically focus on applying a single identifiable causal model,…

Machine Learning · Computer Science 2026-04-07 Hristo Petkov, Calum MacLellan, Feng Dong

Causal discovery aims to extract qualitative causal knowledge in the form of causal graphs from data. Because causal ground truth is rarely known in the real world, simulated data plays a vital role in evaluating the performance of the…

Machine Learning · Computer Science 2025-12-17 Rebecca J. Herman, Jonas Wahl, Urmi Ninad, Jakob Runge

We consider the problem of learning the structure of a causal directed acyclic graph (DAG) model in the presence of latent variables. We define latent factor causal models (LFCMs) as a restriction on causal DAG models with latent variables,…

Methodology · Statistics 2022-07-06 Chandler Squires, Annie Yun, Eshaan Nichani, Raj Agrawal, Caroline Uhler

Causal generative models provide a principled framework for answering observational, interventional, and counterfactual queries from observational data. However, many deep causal models rely on highly expressive architectures with opaque…

Machine Learning · Computer Science 2026-03-23 Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, Juan Parras

Synthesizing high-quality tabular data is an important topic in many data science tasks, ranging from dataset augmentation to privacy protection. However, developing expressive generative models for tabular data is challenging due to its…

Machine Learning · Computer Science 2025-02-18 Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, Stefano Ermon, Jure Leskovec

Synthetic tabular data generation has attracted growing attention due to its importance for data augmentation, foundation models, and privacy. However, real-world tabular datasets increasingly contain free-form text fields (e.g., reviews or…

Machine Learning · Computer Science 2026-05-13 Donghong Cai, Jiarui Feng, Yanbo Wang, Da Zheng, Yixin Chen, Muhan Zhang

Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes…

Machine Learning · Computer Science 2019-10-29 Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni

Deep generative models can help with data scarcity and privacy by producing synthetic training data, but they struggle in low-data, imbalanced tabular settings to fully learn the complex data distribution. We argue that striving for the…

Machine Learning · Statistics 2026-03-12 Xiaofeng Lin, Seungbae Kim, Zhuoya Li, Zachary DeSoto, Charles Fleming, Guang Cheng