English
Related papers

Related papers: A Performance-Driven Benchmark for Feature Selecti…

200 papers

Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models…

Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of models typically consist of…

Machine Learning · Computer Science 2024-12-19 Andrej Tschalzev , Sascha Marton , Stefan Lüdtke , Christian Bartelt , Heiner Stuckenschmidt

Feature selection aims to identify the most pattern-discriminative feature subset. In prior literature, filter (e.g., backward elimination) and embedded (e.g., Lasso) methods have hyperparameters (e.g., top-K, score thresholding) and tie to…

Machine Learning · Computer Science 2024-03-07 Wangyang Ying , Dongjie Wang , Haifeng Chen , Yanjie Fu

Tabular data represent one of the most prevalent data formats in applied machine learning, largely because they accommodate a broad spectrum of real-world problems. Existing literature has studied many of the shortcomings of neural…

Machine Learning · Computer Science 2025-10-07 Guri Zabërgja , Arlind Kadra , Christian M. M. Frey , Josif Grabocka

One of the most important steps toward interpretability and explainability of neural network models is feature selection, which aims to identify the subset of relevant features. Theoretical results in the field have mostly focused on the…

Machine Learning · Computer Science 2020-10-19 Vu Dinh , Lam Si Tung Ho

Tabular data learning has extensive applications in deep learning but its existing embedding techniques are limited in numerical and categorical features such as the inability to capture complex relationships and engineering. This paper…

Machine Learning · Computer Science 2024-09-02 Yuqian Wu , Hengyi Luo , Raymond S. T. Lee

The analysis of tabular datasets is highly prevalent both in scientific research and real-world applications of Machine Learning (ML). Unlike many other ML tasks, Deep Learning (DL) models often do not outperform traditional methods in this…

Machine Learning · Computer Science 2024-08-28 Assaf Shmuel , Oren Glickman , Teddy Lazebnik

Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks. However, such a paradigm becomes particularly challenging with tabular data when there are discrepancies between the feature…

Machine Learning · Computer Science 2023-10-24 Qi-Le Zhou , Han-Jia Ye , Le-Ye Wang , De-Chuan Zhan

Advances in machine learning research drive progress in real-world applications. To ensure this progress, it is important to understand the potential pitfalls on the way from a novel method's success on academic benchmarks to its practical…

Machine Learning · Computer Science 2024-10-25 Ivan Rubachev , Nikolay Kartashev , Yury Gorishniy , Artem Babenko

The goal of Feature Selection - comprising filter, wrapper, and embedded approaches - is to find the optimal feature subset for designated downstream tasks. Nevertheless, current feature selection methods are limited by: 1) the selection…

Machine Learning · Computer Science 2023-09-18 Meng Xiao , Dongjie Wang , Min Wu , Pengfei Wang , Yuanchun Zhou , Yanjie Fu

Despite their dominance in vision and language, deep neural networks often underperform relative to tree-based models on tabular data. To bridge this gap, we incorporate five key inductive biases into deep learning: robustness to irrelevant…

Machine Learning · Statistics 2026-03-24 Kry Yik Chau Lui , Cheng Chi , Kishore Basu , Yanshuai Cao

One main obstacle for the wide use of deep learning in medical and engineering sciences is its interpretability. While neural network models are strong tools for making predictions, they often provide little information about which features…

Machine Learning · Statistics 2021-12-06 Vu Dinh , Lam Si Tung Ho

The ability to train generative models that produce realistic, safe and useful tabular data is essential for data privacy, imputation, oversampling, explainability or simulation. However, generating tabular data is not straightforward due…

Machine Learning · Computer Science 2025-09-18 G. Charbel N. Kindji , Lina Maria Rojas-Barahona , Elisa Fromont , Tanguy Urvoy

Tabular data is prevalent across diverse domains in machine learning. With the rapid progress of deep tabular prediction methods, especially pretrained (foundation) models, there is a growing need to evaluate these methods systematically…

Machine Learning · Computer Science 2025-11-10 Han-Jia Ye , Si-Yang Liu , Hao-Run Cai , Qi-Le Zhou , De-Chuan Zhan

It is common to evaluate the performance of a machine learning model by measuring its predictive power on a test dataset. This approach favors complicated models that can smoothly fit complex functions and generalize well from training data…

Machine Learning · Computer Science 2022-10-07 Hugo Cisneros , Josef Sivic , Tomas Mikolov

Feature selection is crucial for pinpointing relevant features in high-dimensional datasets, mitigating the 'curse of dimensionality,' and enhancing machine learning performance. Traditional feature selection methods for classification use…

Machine Learning · Computer Science 2025-04-08 Rittwika Kansabanik , Adrian Barbu

The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection…

Machine Learning · Statistics 2019-01-07 Makoto Yamada , Wittawat Jitkrittum , Leonid Sigal , Eric P. Xing , Masashi Sugiyama

While Deep Learning has demonstrated impressive results in applications on various data types, it continues to lag behind tree-based methods when applied to tabular data, often referred to as the last "unconquered castle" for neural…

Machine Learning · Computer Science 2026-02-27 Marius Dragoi , Florin Gogianu , Elena Burceanu

Tabular data remains one of the most prevalent data types across a wide range of real-world applications, yet effective representation learning for this domain poses unique challenges due to its irregular patterns, heterogeneous feature…

Machine Learning · Computer Science 2025-01-08 Weijieying Ren , Tianxiang Zhao , Yuqing Huang , Vasant Honavar

Foundation models for tabular data are rapidly evolving, with increasing interest in extending them to support additional modalities such as free-text features. However, existing benchmarks for tabular data rarely include textual columns,…

Machine Learning · Computer Science 2025-07-11 Martin Mráz , Breenda Das , Anshul Gupta , Lennart Purucker , Frank Hutter
‹ Prev 1 2 3 10 Next ›