Related papers: Tabular Foundation Model for Generative Modelling

Does TabPFN Understand Causal Structures?

Causal discovery is fundamental for multiple scientific domains, yet extracting causal information from real world data remains a significant challenge. Given the recent success on real data, we investigate whether TabPFN, a…

Machine Learning · Computer Science 2025-11-11 Omar Swelam, Lennart Purucker, Jake Robertson, Hanne Raum, Joschka Boedecker, Frank Hutter

GAMformer: Bridging Tabular Foundation Models and Interpretable Machine Learning

While interpretability is crucial for machine learning applications in safety-critical domains and for regulatory compliance, existing tabular foundation models like TabPFN lack transparency. Generalized Additive Models (GAMs) provide the…

Machine Learning · Computer Science 2026-02-06 Andreas Mueller, Julien Siems, Harsha Nori, David Salinas, Arber Zela, Rich Caruana, Frank Hutter

How Well Does Your Tabular Generator Learn the Structure of Tabular Data?

Heterogeneous tabular data poses unique challenges in generative modelling due to its fundamentally different underlying data structure compared to homogeneous modalities, such as images and text. Although previous research has sought to…

Machine Learning · Computer Science 2025-03-13 Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik

TabPFGen -- Tabular Data Generation with TabPFN

Advances in deep generative modelling have not translated well to tabular data. We argue that this is caused by a mismatch in structure between popular generative models and discriminative models of tabular data. We thus devise a technique…

Machine Learning · Computer Science 2024-06-11 Junwei Ma, Apoorv Dankar, George Stein, Guangwei Yu, Anthony Caterini

Improving TabPFN's Synthetic Data Generation by Integrating Causal Structure

Synthetic tabular data generation addresses data scarcity and privacy constraints in a variety of domains. Tabular Prior-Data Fitted Network (TabPFN), a recent foundation model for tabular data, has been shown capable of generating…

Machine Learning · Computer Science 2026-03-12 Davide Tugnoli, Andrea De Lorenzo, Marco Virgolin, Giovanni Cinà

Table Foundation Models: on knowledge pre-training for tabular learning

Table foundation models bring high hopes to data science: pre-trained on tabular data to embark knowledge or priors, they should facilitate downstream tasks on tables. One specific challenge is that of data semantics: numerical entries take…

Machine Learning · Computer Science 2025-07-01 Myung Jun Kim, Félix Lefebvre, Gaëtan Brison, Alexandre Perez-Lebel, Gaël Varoquaux

Deep Learning within Tabular Data: Foundations, Challenges, Advances and Future Directions

Tabular data remains one of the most prevalent data types across a wide range of real-world applications, yet effective representation learning for this domain poses unique challenges due to its irregular patterns, heterogeneous feature…

Machine Learning · Computer Science 2025-01-08 Weijieying Ren, Tianxiang Zhao, Yuqing Huang, Vasant Honavar

TabPFN Through The Looking Glass: An interpretability study of TabPFN and its internal representations

Tabular foundational models are pre-trained models designed for a wide range of tabular data tasks. They have shown strong performance across domains, yet their internal representations and learned concepts remain poorly understood. This…

Machine Learning · Computer Science 2026-01-14 Aviral Gupta, Armaan Sethi, Dhruv Kumar

A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities

Tabular datasets are inherently heterogeneous, presenting significant challenges for developing pre-trained foundation models. The recently introduced transformer-based Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented…

Machine Learning · Computer Science 2025-06-12 Han-Jia Ye, Si-Yang Liu, Wei-Lun Chao

TabSCM: A practical Framework for Generating Realistic Tabular Data

Most tabular-data generators match marginal statistics yet ignore causal structure, leading downstream models to learn spurious or unfair patterns. We present TabSCM, a mixed-type generator that preserves those causal dependencies. Starting…

Machine Learning · Computer Science 2026-04-27 Sven Jacob, Bardh Prenkaj, Weijia Shao, Gjergji Kasneci

TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields

While deep learning has achieved remarkable success across many domains, it has historically underperformed on tabular learning tasks, which remain dominated by gradient boosting decision trees. However, recent advancements are paving the…

Machine Learning · Computer Science 2025-10-31 Alan Arazi, Eilam Shapira, Roi Reichart

TabICLv2: A better, faster, scalable, and open tabular foundation model

Tabular foundation models, such as TabPFNv2 and TabICL, have recently dethroned gradient-boosted trees at the top of predictive benchmarks, demonstrating the value of in-context learning for tabular data. We introduce TabICLv2, a new…

Machine Learning · Computer Science 2026-02-12 Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan

Causal Pre-training Under the Fairness Lens: An Empirical Study of TabPFN

Foundation models for tabular data, such as the Tabular Prior-data Fitted Network (TabPFN), are pre-trained on a massive number of synthetic datasets generated by structural causal models (SCM). They leverage in-context learning to offer…

Machine Learning · Computer Science 2026-01-28 Qinyi Liu, Mohammad Khalil, Naman Goel

TabPFN: One Model to Rule Them All?

Hollmann et al. (Nature 637 (2025) 319-326) recently introduced TabPFN, a transformer-based deep learning model for regression and classification on tabular data, which they claim "outperforms all previous methods on datasets with up to…

Machine Learning · Computer Science 2025-12-01 Qiong Zhang, Yan Shuo Tan, Qinglong Tian, Pengfei Li

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

We propose TabTransformer, a novel deep tabular data modeling architecture for supervised and semi-supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of…

Machine Learning · Computer Science 2020-12-15 Xin Huang, Ashish Khetan, Milan Cvitkovic, Zohar Karnin

ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning

Deep generative models can help with data scarcity and privacy by producing synthetic training data, but they struggle in low-data, imbalanced tabular settings to fully learn the complex data distribution. We argue that striving for the…

Machine Learning · Statistics 2026-03-12 Xiaofeng Lin, Seungbae Kim, Zhuoya Li, Zachary DeSoto, Charles Fleming, Guang Cheng

TabularFM: An Open Framework For Tabular Foundational Models

Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task,…

Machine Learning · Computer Science 2024-06-19 Quan M. Tran, Suong N. Hoang, Lam M. Nguyen, Dzung Phan, Hoang Thanh Lam

nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN

Tabular foundation models such as TabPFN have revolutionized predictive machine learning for tabular data. At the same time, the driving factors of this revolution are hard to understand. Existing open-source tabular foundation models are…

Machine Learning · Computer Science 2025-12-19 Alexander Pfefferle, Johannes Hog, Lennart Purucker, Frank Hutter

TabTreeFormer: Tabular Data Generation Using Hybrid Tree-Transformer

Transformers have shown impressive results in tabular data generation. However, they lack domain-specific inductive biases which are critical for preserving the intrinsic characteristics of tabular data. They also suffer from poor…

Machine Learning · Computer Science 2025-05-19 Jiayu Li, Bingyin Zhao, Zilong Zhao, Uzair Javaid, Kevin Yee, Biplab Sikdar

TabFairGAN: Fair Tabular Data Generation with Generative Adversarial Networks

With the increasing reliance on automated decision making, the issue of algorithmic fairness has gained increasing importance. In this paper, we propose a Generative Adversarial Network for tabular data generation. The model includes two…

Machine Learning · Computer Science 2021-09-03 Amirarsalan Rajabi, Ozlem Ozmen Garibay