Related papers: Tree-Regularized Tabular Embeddings

Regularization Learning Networks: Deep Learning for Tabular Datasets

Despite their impressive performance, Deep Neural Networks (DNNs) typically underperform Gradient Boosting Trees (GBTs) on many tabular-dataset learning tasks. We propose that applying a different regularization coefficient to each weight…

Machine Learning · Statistics 2018-10-25 Ira Shavitt, Eran Segal

Topologically Regularized Data Embeddings

Unsupervised feature learning often finds low-dimensional embeddings that capture the structure of complex data. For tasks for which prior expert topological knowledge is available, incorporating this into the learned representation may…

Machine Learning · Computer Science 2022-03-08 Robin Vandaele, Bo Kang, Jefrey Lijffijt, Tijl De Bie, Yvan Saeys

Deep Neural Networks and Tabular Data: A Survey

Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous data sets, deep neural networks have repeatedly shown excellent…

Machine Learning · Computer Science 2023-01-24 Vadim Borisov, Tobias Leemann, Kathrin Seßler, Johannes Haug, Martin Pawelczyk, Gjergji Kasneci

TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

Foundation models have established unified representations for natural language processing, yet this paradigm remains largely unexplored for tabular data. Existing methods face fundamental limitations: LLM-based approaches lack…

Computation and Language · Computer Science 2026-05-07 Minjie Qiang, Mingming Zhang, Xiaoyi Bao, Xing Fu, Yu Cheng, Weiqiang Wang, Zhongqing Wang, Ningtao Wang

Why do tree-based models still outperform deep learning on tabular data?

While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear. We contribute extensive benchmarks of standard and novel deep learning methods as well as tree-based models such…

Machine Learning · Computer Science 2022-07-20 Léo Grinsztajn, Edouard Oyallon, Gaël Varoquaux

Reproducibility of predictive networks for mouse visual cortex

Deep predictive models of neuronal activity have recently enabled several new discoveries about the selectivity and invariance of neurons in the visual cortex. These models learn a shared set of nonlinear basis functions, which are linearly…

Neurons and Cognition · Quantitative Biology 2024-06-19 Polina Turishcheva, Max Burg, Fabian H. Sinz, Alexander Ecker

Towards Better Understanding with Uniformity and Explicit Regularization of Embeddings in Embedding-based Neural Topic Models

Embedding-based neural topic models could explicitly represent words and topics by embedding them to a homogeneous feature space, which shows higher interpretability. However, there are no explicit constraints for the training of…

Computation and Language · Computer Science 2022-06-17 Wei Shao, Lei Huang, Shuqi Liu, Shihua Ma, Linqi Song

Topologically Regularized Data Embeddings

Unsupervised representation learning methods are widely used for gaining insight into high-dimensional, unstructured, or structured data. In some cases, users may have prior topological knowledge about the data, such as a known cluster…

Machine Learning · Computer Science 2023-11-08 Edith Heiter, Robin Vandaele, Tijl De Bie, Yvan Saeys, Jefrey Lijffijt

A Closer Look at Deep Learning Methods on Tabular Datasets

Tabular data is prevalent across diverse domains in machine learning. With the rapid progress of deep tabular prediction methods, especially pretrained (foundation) models, there is a growing need to evaluate these methods systematically…

Machine Learning · Computer Science 2025-11-10 Han-Jia Ye, Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, De-Chuan Zhan

Rapid and Precise Topological Comparison with Merge Tree Neural Networks

Merge trees are a valuable tool in the scientific visualization of scalar fields; however, current methods for merge tree comparisons are computationally expensive, primarily due to the exhaustive matching between tree nodes. To address…

Machine Learning · Computer Science 2024-10-07 Yu Qin, Brittany Terese Fasy, Carola Wenk, Brian Summa

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

Tabular Foundation Models have recently established the state of the art in supervised tabular learning, by leveraging pretraining to learn generalizable representations of numerical and categorical structured data. However, they lack…

Machine Learning · Computer Science 2026-05-12 Alan Arazi, Eilam Shapira, Shoham Grunblat, Mor Ventura, Elad Hoffer, Gioia Blayer, David Holzmüller, Lennart Purucker, Gaël Varoquaux, Frank Hutter, Roi Reichart

Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval

Tables contain valuable knowledge in a structured form. We employ neural language modeling approaches to embed tabular data into vector spaces. Specifically, we consider different table elements, such as caption, column headings, and cells,…

Information Retrieval · Computer Science 2019-06-04 Li Deng, Shuo Zhang, Krisztian Balog

Representation Learning for Tabular Data: A Comprehensive Survey

Tabular data, structured as rows and columns, is among the most prevalent data types in machine learning classification and regression applications. Models for learning from tabular data have continuously evolved, with Deep Neural Networks…

Machine Learning · Computer Science 2025-04-24 Jun-Peng Jiang, Si-Yang Liu, Hao-Run Cai, Qile Zhou, Han-Jia Ye

Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks

Tabular foundation models aim to learn universal representations of tabular data that transfer across tasks and domains, enabling applications such as table retrieval, semantic search and table-based prediction. Despite the growing number…

Machine Learning · Computer Science 2026-04-24 Liane Vogel, Kavitha Srinivas, Niharika D'Souza, Sola Shirai, Oktie Hassanzadeh, Horst Samulowitz

SuperTML: Two-Dimensional Word Embedding for the Precognition on Structured Tabular Data

Tabular data is the most commonly used form of data in industry. Gradient Boosting Trees, Support Vector Machine, Random Forest, and Logistic Regression are typically used for classification tasks on tabular data. DNN models using…

Computer Vision and Pattern Recognition · Computer Science 2019-06-05 Baohua Sun, Lin Yang, Wenhan Zhang, Michael Lin, Patrick Dong, Charles Young, Jason Dong

Improving Deep Tabular Learning

Tabular data remain a dominant form of real-world information but pose persistent challenges for deep learning due to heterogeneous feature types, lack of natural structure, and limited label-preserving augmentations. As a result, ensemble…

Machine Learning · Computer Science 2025-09-23 Sivan Sarafian, Yehudit Aperstein

Graph Neural Network contextual embedding for Deep Learning on Tabular Data

All industries are trying to leverage Artificial Intelligence (AI) based on their existing big data, which is available in so-called tabular form, where each record is composed of a number of heterogeneous continuous and categorical columns…

Machine Learning · Computer Science 2024-02-29 Mario Villaizán-Vallelado, Matteo Salvatori, Belén Carro Martinez, Antonio Javier Sanchez Esguevillas

XBNet : An Extremely Boosted Neural Network

Neural networks have proved to be very robust at processing unstructured data like images, text, videos, and audio. However, it has been observed that their performance is not up to the mark in tabular data; hence tree-based models are…

Machine Learning · Computer Science 2022-04-25 Tushar Sarkar

Neural Reasoning Networks: Efficient Interpretable Neural Networks With Automatic Textual Explanations

Recent advances in machine learning have led to a surge in adoption of neural networks for various tasks, but lack of interpretability remains an issue for many others in which an understanding of the features influencing the prediction is…

Machine Learning · Computer Science 2024-10-11 Stephen Carrow, Kyle Harper Erwin, Olga Vilenskaia, Parikshit Ram, Tim Klinger, Naweed Aghmad Khan, Ndivhuwo Makondo, Alexander Gray

ReConTab: Regularized Contrastive Representation Learning for Tabular Data

Representation learning stands as one of the critical machine learning techniques across various domains. Through the acquisition of high-quality features, pre-trained embeddings significantly reduce input space redundancy, benefiting…

Machine Learning · Computer Science 2023-12-19 Suiyao Chen, Jing Wu, Naira Hovakimyan, Handong Yao