Related papers: Deep Tabular Representation Corrector

LLM Embeddings for Deep Learning on Tabular Data

Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods deal with this heterogeneous nature of tabular data by employing separate…

Machine Learning · Computer Science 2025-02-18 Boshko Koloski, Andrei Margeloiu, Xiangjian Jiang, Blaž Škrlj, Nikola Simidjievski, Mateja Jamnik

Deep Learning within Tabular Data: Foundations, Challenges, Advances and Future Directions

Tabular data remains one of the most prevalent data types across a wide range of real-world applications, yet effective representation learning for this domain poses unique challenges due to its irregular patterns, heterogeneous feature…

Machine Learning · Computer Science 2025-01-08 Weijieying Ren, Tianxiang Zhao, Yuqing Huang, Vasant Honavar

Representation Learning for Tabular Data: A Comprehensive Survey

Tabular data, structured as rows and columns, is among the most prevalent data types in machine learning classification and regression applications. Models for learning from tabular data have continuously evolved, with Deep Neural Networks…

Machine Learning · Computer Science 2025-04-24 Jun-Peng Jiang, Si-Yang Liu, Hao-Run Cai, Qile Zhou, Han-Jia Ye

ReConTab: Regularized Contrastive Representation Learning for Tabular Data

Representation learning stands as one of the critical machine learning techniques across various domains. Through the acquisition of high-quality features, pre-trained embeddings significantly reduce input space redundancy, benefiting…

Machine Learning · Computer Science 2023-12-19 Suiyao Chen, Jing Wu, Naira Hovakimyan, Handong Yao

TURL: Table Understanding through Representation Learning

Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on…

Information Retrieval · Computer Science 2020-12-04 Xiang Deng, Huan Sun, Alyssa Lees, You Wu, Cong Yu

Fine-Tuning the Retrieval Mechanism for Tabular Deep Learning

While interest in tabular deep learning has grown significantly, conventional tree-based models still outperform deep learning methods. To narrow this performance gap, we explore the innovative retrieval mechanism, a methodology that…

Machine Learning · Computer Science 2023-11-14 Felix den Breejen, Sangmin Bae, Stephen Cha, Tae-Young Kim, Seoung Hyun Koh, Se-Young Yun

PTaRL: Prototype-based Tabular Representation Learning via Space Calibration

Tabular data play an important role in diverse real-world fields, such as healthcare, engineering, and finance. With the recent success of deep learning, many tabular machine learning (ML) methods based on deep networks…

Machine Learning · Computer Science 2024-07-16 Hangting Ye, Wei Fan, Xiaozhuang Song, Shun Zheng, He Zhao, Dandan Guo, Yi Chang

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

We propose TabTransformer, a novel deep tabular data modeling architecture for supervised and semi-supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of…

Machine Learning · Computer Science 2020-12-15 Xin Huang, Ashish Khetan, Milan Cvitkovic, Zohar Karnin

Improving Deep Tabular Learning

Tabular data remain a dominant form of real-world information but pose persistent challenges for deep learning due to heterogeneous feature types, lack of natural structure, and limited label-preserving augmentations. As a result, ensemble…

Machine Learning · Computer Science 2025-09-23 Sivan Sarafian, Yehudit Aperstein

Distributionally robust self-supervised learning for tabular data

Machine learning (ML) models trained using Empirical Risk Minimization (ERM) often exhibit systematic errors on specific subpopulations of tabular data, known as error slices. Learning robust representation in presence of error slices is…

Machine Learning · Computer Science 2025-11-10 Shantanu Ghosh, Tiankang Xie, Mikhail Kuznetsov

Rethinking Data Augmentation for Tabular Data in Deep Learning

Tabular data is the most widely used data format in machine learning (ML). While tree-based methods outperform DL-based methods in supervised learning, recent literature reports that self-supervised learning with Transformer-based models…

Machine Learning · Computer Science 2023-05-23 Soma Onishi, Shoya Meguro

Improving the classification of multi-class imbalanced data is more difficult than its two-class counterpart. In this paper, we use deep neural networks to train new representations of tabular multi-class data. Unlike the typically…

Machine Learning · Computer Science 2023-12-19 Damian Horna, Lango Mateusz, Jerzy Stefanowski

TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields

While deep learning has achieved remarkable success across many domains, it has historically underperformed on tabular learning tasks, which remain dominated by gradient boosting decision trees. However, recent advancements are paving the…

Machine Learning · Computer Science 2025-10-31 Alan Arazi, Eilam Shapira, Roi Reichart

Rethinking Pre-Training in Tabular Data: A Neighborhood Embedding Perspective

Pre-training is prevalent in deep learning for vision and text data, leveraging knowledge from other datasets to enhance downstream tasks. However, for tabular data, the inherent heterogeneity in attribute and label spaces across datasets…

Machine Learning · Computer Science 2025-02-13 Han-Jia Ye, Qi-Le Zhou, Huai-Hong Yin, De-Chuan Zhan, Wei-Lun Chao

TabINR: An Implicit Neural Representation Framework for Tabular Data Imputation

Tabular data builds the basis for a wide range of applications, yet real-world datasets are frequently incomplete due to collection errors, privacy restrictions, or sensor failures. As missing values degrade the performance or hinder the…

Machine Learning · Computer Science 2025-10-02 Vincent Ochs, Florentin Bieder, Sidaty el Hadramy, Paul Friedrich, Stephanie Taha-Mehlitz, Anas Taha, Philippe C. Cattin

Unlocking the Transferability of Tokens in Deep Models for Tabular Data

Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks. However, such a paradigm becomes particularly challenging with tabular data when there are discrepancies between the feature…

Machine Learning · Computer Science 2023-10-24 Qi-Le Zhou, Han-Jia Ye, Le-Ye Wang, De-Chuan Zhan

Trompt: Towards a Better Deep Neural Network for Tabular Data

Tabular data is arguably one of the most commonly used data structures in various practical domains, including finance, healthcare and e-commerce. The inherent heterogeneity allows tabular data to store rich information. However, based on a…

Machine Learning · Computer Science 2023-06-01 Kuan-Yu Chen, Ping-Han Chiang, Hsin-Rung Chou, Ting-Wei Chen, Tien-Hao Chang

A Closer Look at Deep Learning Methods on Tabular Datasets

Tabular data is prevalent across diverse domains in machine learning. With the rapid progress of deep tabular prediction methods, especially pretrained (foundation) models, there is a growing need to evaluate these methods systematically…

Machine Learning · Computer Science 2025-11-10 Han-Jia Ye, Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, De-Chuan Zhan

TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction

Predictive modeling on tabular data is the cornerstone of many real-world applications. Although gradient boosting machines and some recent deep models achieve strong performance on tabular data, they often lack interpretability. On the…

Machine Learning · Computer Science 2025-07-01 Tommy Xu, Zhitian Zhang, Xiangyu Sun, Lauren Kelly Zung, Hossein Hajimirsadeghi, Greg Mori

Controlling Neural Networks with Rule Representations

We propose a novel training method that integrates rules into deep learning, in a way the strengths of the rules are controllable at inference. Deep Neural Networks with Controllable Rule Representations (DeepCTRL) incorporates a rule…

Machine Learning · Computer Science 2021-11-18 Sungyong Seo, Sercan O. Arik, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister