Related papers: TabularMath: Evaluating Computational Extrapolatio…

Distributional Regression with Tabular Foundation Models: Evaluating Probabilistic Predictions via Proper Scoring Rules

Tabular foundation models such as TabPFN and TabICL already produce full predictive distributions, yet the benchmarks used to evaluate them (TabArena, TALENT, and others) still rely almost exclusively on point-estimate metrics (RMSE,…

Machine Learning · Computer Science · 2026-03-31 · Jonas Landsgesell, Pascal Knoll

ScoringBench: A Benchmark for Evaluating Tabular Foundation Models with Proper Scoring Rules

Tabular foundation models such as TabPFN and TabICL already produce full predictive distributions, yet prevailing regression benchmarks evaluate them almost exclusively via point-estimate metrics (RMSE, $R^2$). This discards precisely the…

Artificial Intelligence · Computer Science · 2026-05-05 · Jonas Landsgesell, Pascal Knoll, Tizian Wenzel

TabDPT: Scaling Tabular Foundation Models on Real Data

Tabular data is one of the most ubiquitous sources of information worldwide, spanning a wide variety of domains. This inherent heterogeneity has slowed the development of Tabular Foundation Models (TFMs) capable of fast generalization to…

Machine Learning · Computer Science · 2026-01-21 · Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Alex Labach, Hamidreza Kamkari, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L. Caterini, Maksims Volkovs

A Comprehensive Benchmark of Machine and Deep Learning Across Diverse Tabular Datasets

The analysis of tabular datasets is highly prevalent both in scientific research and real-world applications of Machine Learning (ML). Unlike many other ML tasks, Deep Learning (DL) models often do not outperform traditional methods in this…

Machine Learning · Computer Science · 2024-08-28 · Assaf Shmuel, Oren Glickman, Teddy Lazebnik

Tabular Data Generation Models: An In-Depth Survey and Performance Benchmarks with Extensive Tuning

The ability to train generative models that produce realistic, safe and useful tabular data is essential for data privacy, imputation, oversampling, explainability or simulation. However, generating tabular data is not straightforward due…

Machine Learning · Computer Science · 2025-09-18 · G. Charbel N. Kindji, Lina Maria Rojas-Barahona, Elisa Fromont, Tanguy Urvoy

TabularMath: Understanding Math Reasoning over Tables with Large Language Models

Mathematical reasoning has long been a key benchmark for evaluating large language models. Although substantial progress has been made on math word problems, the need for reasoning over tabular data in real-world applications has been…

Artificial Intelligence · Computer Science · 2026-04-20 · Shi-Yu Tian, Zhi Zhou, Wei Dong, Kun-Yang Yu, Ming Yang, Zi-Jian Cheng, Lan-Zhe Guo, Yu-Feng Li

TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks

Advances in machine learning research drive progress in real-world applications. To ensure this progress, it is important to understand the potential pitfalls on the way from a novel method's success on academic benchmarks to its practical…

Machine Learning · Computer Science · 2024-10-25 · Ivan Rubachev, Nikolay Kartashev, Yury Gorishniy, Artem Babenko

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data

The long-standing dominance of gradient-boosted decision trees on tabular data is currently challenged by tabular foundation models using In-Context Learning (ICL): setting the training data as context for the test data and predicting in a…

Machine Learning · Computer Science · 2025-05-27 · Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan

TILBench: A Systematic Benchmark for Tabular Imbalanced Learning Across Data Regimes

Imbalanced learning remains a fundamental challenge in tabular data applications. Despite decades of research and numerous proposed algorithms, a systematic empirical understanding of how different imbalanced learning methods behave across…

Machine Learning · Computer Science · 2026-05-15 · Ruizhe Liu, Jiaqi Luo

Tabular Data: Is Deep Learning all you need?

Tabular data represent one of the most prevalent data formats in applied machine learning, largely because they accommodate a broad spectrum of real-world problems. Existing literature has studied many of the shortcomings of neural…

Machine Learning · Computer Science · 2025-10-07 · Guri Zabërgja, Arlind Kadra, Christian M. M. Frey, Josif Grabocka

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning and is competitive with state-of-the-art classification methods. TabPFN performs…

Machine Learning · Computer Science · 2023-09-19 · Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter

Retrieval & Fine-Tuning for In-Context Tabular Models

Tabular data is a pervasive modality spanning a wide range of domains, and the inherent diversity poses a considerable challenge for deep learning. Recent advancements using transformer-based in-context learning have shown promise on…

Machine Learning · Computer Science · 2024-06-11 · Valentin Thomas, Junwei Ma, Rasa Hosseinzadeh, Keyvan Golestan, Guangwei Yu, Maksims Volkovs, Anthony Caterini

On Finetuning Tabular Foundation Models

Foundation models are an emerging research direction in tabular deep learning. Notably, TabPFNv2 recently claimed superior performance over traditional GBDT-based methods on small-scale datasets using an in-context learning paradigm, which…

Machine Learning · Computer Science · 2025-06-12 · Ivan Rubachev, Akim Kotelnikov, Nikolay Kartashev, Artem Babenko

Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data

Foundation models for tabular data, like TabPFN, achieve strong performance on small datasets when pre-trained solely on synthetic data. We show that this performance can be significantly boosted by a targeted continued pre-training phase.…

Machine Learning · Computer Science · 2025-07-08 · Anurag Garg, Muhammad Ali, Noah Hollmann, Lennart Purucker, Samuel Müller, Frank Hutter

An Empirical Study of Machine Learning Robustness and Scalability for Imbalanced Tabular Clinical Data in Emergency and Critical Care

Every year, millions of patients pass through emergency departments and intensive care units, where clinicians must make high-stakes decisions under time pressure and uncertainty. Machine learning could support prediction of deterioration,…

Machine Learning · Computer Science · 2026-05-27 · Yusuf Brima, Marcellin Atemkeng

Stable and Interpretable Deep Learning for Tabular Data: Introducing InterpreTabNet with the Novel InterpreStability Metric

As Artificial Intelligence (AI) integrates deeper into diverse sectors, the quest for powerful models has intensified. While significant strides have been made in boosting model capabilities and their applicability across domains, a glaring…

Machine Learning · Computer Science · 2023-10-05 · Shiyun Wa, Xinai Lu, Minjuan Wang

Benchmarking Multimodal AutoML for Tabular Data with Text Fields

We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well. Here we assemble 18 multimodal data tables that each contain some text…

Machine Learning · Computer Science · 2021-11-05 · Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alexander J. Smola

TabINR: An Implicit Neural Representation Framework for Tabular Data Imputation

Tabular data builds the basis for a wide range of applications, yet real-world datasets are frequently incomplete due to collection errors, privacy restrictions, or sensor failures. As missing values degrade the performance or hinder the…

Machine Learning · Computer Science · 2025-10-02 · Vincent Ochs, Florentin Bieder, Sidaty el Hadramy, Paul Friedrich, Stephanie Taha-Mehlitz, Anas Taha, Philippe C. Cattin

TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data

Prior-Data Fitted networks (PFNs) have been very successful in tabular contexts, handling prediction tasks in context. However, they are designed for single-task inference, meaning that predicting several target values within a context…

Machine Learning · Computer Science · 2026-05-21 · Cormac Cureton, Narges Armanfard

Prior-Aligned Data Cleaning for Tabular Foundation Models

Tabular Foundation Models (TFMs) achieve state-of-the-art zero-shot accuracy on small tabular datasets by meta-learning over synthetic data-generating processes -- making them highly attractive for practitioners who cannot afford large…

Machine Learning · Computer Science · 2026-04-29 · Laure Berti-Equille