Related papers: A Performance-Driven Benchmark for Feature Selecti…

Transfer Learning with Deep Tabular Models

Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models…

Machine Learning · Computer Science 2023-08-08 Roman Levin, Valeriia Cherepanova, Avi Schwarzschild, Arpit Bansal, C. Bayan Bruss, Tom Goldstein, Andrew Gordon Wilson, Micah Goldblum

A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data

Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of models typically consist of…

Machine Learning · Computer Science 2024-12-19 Andrej Tschalzev, Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Feature Selection as Deep Sequential Generative Learning

Feature selection aims to identify the most pattern-discriminative feature subset. In prior literature, filter (e.g., backward elimination) and embedded (e.g., Lasso) methods have hyperparameters (e.g., top-K, score thresholding) and tie to…

Machine Learning · Computer Science 2024-03-07 Wangyang Ying, Dongjie Wang, Haifeng Chen, Yanjie Fu

Tabular Data: Is Deep Learning all you need?

Tabular data represent one of the most prevalent data formats in applied machine learning, largely because they accommodate a broad spectrum of real-world problems. Existing literature has studied many of the shortcomings of neural…

Machine Learning · Computer Science 2025-10-07 Guri Zabërgja, Arlind Kadra, Christian M. M. Frey, Josif Grabocka

Consistent Feature Selection for Analytic Deep Neural Networks

One of the most important steps toward interpretability and explainability of neural network models is feature selection, which aims to identify the subset of relevant features. Theoretical results in the field have mostly focused on the…

Machine Learning · Computer Science 2020-10-19 Vu Dinh, Lam Si Tung Ho

Deep Feature Embedding for Tabular Data

Tabular data learning has extensive applications in deep learning but its existing embedding techniques are limited in numerical and categorical features such as the inability to capture complex relationships and engineering. This paper…

Machine Learning · Computer Science 2024-09-02 Yuqian Wu, Hengyi Luo, Raymond S. T. Lee

A Comprehensive Benchmark of Machine and Deep Learning Across Diverse Tabular Datasets

The analysis of tabular datasets is highly prevalent both in scientific research and real-world applications of Machine Learning (ML). Unlike many other ML tasks, Deep Learning (DL) models often do not outperform traditional methods in this…

Machine Learning · Computer Science 2024-08-28 Assaf Shmuel, Oren Glickman, Teddy Lazebnik

Unlocking the Transferability of Tokens in Deep Models for Tabular Data

Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks. However, such a paradigm becomes particularly challenging with tabular data when there are discrepancies between the feature…

Machine Learning · Computer Science 2023-10-24 Qi-Le Zhou, Han-Jia Ye, Le-Ye Wang, De-Chuan Zhan

TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks

Advances in machine learning research drive progress in real-world applications. To ensure this progress, it is important to understand the potential pitfalls on the way from a novel method's success on academic benchmarks to its practical…

Machine Learning · Computer Science 2024-10-25 Ivan Rubachev, Nikolay Kartashev, Yury Gorishniy, Artem Babenko

Beyond Discrete Selection: Continuous Embedding Space Optimization for Generative Feature Selection

The goal of Feature Selection - comprising filter, wrapper, and embedded approaches - is to find the optimal feature subset for designated downstream tasks. Nevertheless, current feature selection methods are limited by: 1) the selection…

Machine Learning · Computer Science 2023-09-18 Meng Xiao, Dongjie Wang, Min Wu, Pengfei Wang, Yuanchun Zhou, Yanjie Fu

LassoFlexNet: Flexible Neural Architecture for Tabular Data

Despite their dominance in vision and language, deep neural networks often underperform relative to tree-based models on tabular data. To bridge this gap, we incorporate five key inductive biases into deep learning: robustness to irrelevant…

Machine Learning · Statistics 2026-03-24 Kry Yik Chau Lui, Cheng Chi, Kishore Basu, Yanshuai Cao

Consistent feature selection for neural networks via Adaptive Group Lasso

One main obstacle for the wide use of deep learning in medical and engineering sciences is its interpretability. While neural network models are strong tools for making predictions, they often provide little information about which features…

Machine Learning · Statistics 2021-12-06 Vu Dinh, Lam Si Tung Ho

Tabular Data Generation Models: An In-Depth Survey and Performance Benchmarks with Extensive Tuning

The ability to train generative models that produce realistic, safe and useful tabular data is essential for data privacy, imputation, oversampling, explainability or simulation. However, generating tabular data is not straightforward due…

Machine Learning · Computer Science 2025-09-18 G. Charbel N. Kindji, Lina Maria Rojas-Barahona, Elisa Fromont, Tanguy Urvoy

A Closer Look at Deep Learning Methods on Tabular Datasets

Tabular data is prevalent across diverse domains in machine learning. With the rapid progress of deep tabular prediction methods, especially pretrained (foundation) models, there is a growing need to evaluate these methods systematically…

Machine Learning · Computer Science 2025-11-10 Han-Jia Ye, Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, De-Chuan Zhan

Benchmarking Learning Efficiency in Deep Reservoir Computing

It is common to evaluate the performance of a machine learning model by measuring its predictive power on a test dataset. This approach favors complicated models that can smoothly fit complex functions and generalize well from training data…

Machine Learning · Computer Science 2022-10-07 Hugo Cisneros, Josef Sivic, Tomas Mikolov

Feature Selection for Latent Factor Models

Feature selection is crucial for pinpointing relevant features in high-dimensional datasets, mitigating the 'curse of dimensionality,' and enhancing machine learning performance. Traditional feature selection methods for classification use…

Machine Learning · Computer Science 2025-04-08 Rittwika Kansabanik, Adrian Barbu

High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso

The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection…

Machine Learning · Statistics 2019-01-07 Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P. Xing, Masashi Sugiyama

Closing the gap on tabular data with Fourier and Implicit Categorical Features

While Deep Learning has demonstrated impressive results in applications on various data types, it continues to lag behind tree-based methods when applied to tabular data, often referred to as the last "unconquered castle" for neural…

Machine Learning · Computer Science 2026-02-27 Marius Dragoi, Florin Gogianu, Elena Burceanu

Deep Learning within Tabular Data: Foundations, Challenges, Advances and Future Directions

Tabular data remains one of the most prevalent data types across a wide range of real-world applications, yet effective representation learning for this domain poses unique challenges due to its irregular patterns, heterogeneous feature…

Machine Learning · Computer Science 2025-01-08 Weijieying Ren, Tianxiang Zhao, Yuqing Huang, Vasant Honavar

Towards Benchmarking Foundation Models for Tabular Data With Text

Foundation models for tabular data are rapidly evolving, with increasing interest in extending them to support additional modalities such as free-text features. However, existing benchmarks for tabular data rarely include textual columns,…

Machine Learning · Computer Science 2025-07-11 Martin Mráz, Breenda Das, Anshul Gupta, Lennart Purucker, Frank Hutter