Related papers: Distributionally robust self-supervised learning f…

Representation Learning for Tabular Data: A Comprehensive Survey

Tabular data, structured as rows and columns, is among the most prevalent data types in machine learning classification and regression applications. Models for learning from tabular data have continuously evolved, with Deep Neural Networks…

Machine Learning · Computer Science 2025-04-24 Jun-Peng Jiang, Si-Yang Liu, Hao-Run Cai, Qile Zhou, Han-Jia Ye

Learning Representations Robust to Group Shifts and Adversarial Examples

Despite the high performance achieved by deep neural networks on various tasks, extensive studies have demonstrated that small tweaks in the input could fail the model predictions. This issue of deep neural networks has led to a number of…

Machine Learning · Computer Science 2022-02-22 Ming-Chang Chiu, Xuezhe Ma

LLM Embeddings for Deep Learning on Tabular Data

Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods deal with this heterogeneous nature of tabular data by employing separate…

Machine Learning · Computer Science 2025-02-18 Boshko Koloski, Andrei Margeloiu, Xiangjian Jiang, Blaž Škrlj, Nikola Simidjievski, Mateja Jamnik

Deep Learning with Tabular Data: A Self-supervised Approach

We have described a novel approach for training tabular data using the TabTransformer model with self-supervised learning. Traditional machine learning models for tabular data, such as GBDT are being widely used though our paper examines…

Machine Learning · Computer Science 2024-01-30 Tirth Kiranbhai Vyas

Improved Group Robustness via Classifier Retraining on Independent Splits

Deep neural networks trained by minimizing the average risk can achieve strong average performance. Still, their performance for a subgroup may degrade if the subgroup is underrepresented in the overall data population. Group…

Machine Learning · Computer Science 2023-08-01 Thien Hang Nguyen, Hongyang R. Zhang, Huy Le Nguyen

Rethinking Data Augmentation for Tabular Data in Deep Learning

Tabular data is the most widely used data format in machine learning (ML). While tree-based methods outperform DL-based methods in supervised learning, recent literature reports that self-supervised learning with Transformer-based models…

Machine Learning · Computer Science 2023-05-23 Soma Onishi, Shoya Meguro

Deep Feature Embedding for Tabular Data

Tabular data learning has extensive applications in deep learning but its existing embedding techniques are limited in numerical and categorical features such as the inability to capture complex relationships and engineering. This paper…

Machine Learning · Computer Science 2024-09-02 Yuqian Wu, Hengyi Luo, Raymond S. T. Lee

Distributed Robust Learning

We propose a framework for distributed robust statistical learning on big contaminated data. The Distributed Robust Learning (DRL) framework can reduce the computational time of traditional robust learning methods by several orders of…

Machine Learning · Statistics 2015-02-10 Jiashi Feng, Huan Xu, Shie Mannor

Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains

Distribution shifts are ubiquitous in real-world machine learning applications, posing a challenge to the generalization of models trained on one data distribution to another. We focus on scenarios where data distributions vary across…

Machine Learning · Statistics 2024-06-05 Steven Wilkins-Reeves, Xu Chen, Qi Ma, Christine Agarwal, Aude Hofleitner

Deep Tabular Representation Corrector

Tabular data have been playing a mostly important role in diverse real-world fields, such as healthcare, engineering, finance, etc. The recent success of deep learning has fostered many deep networks (e.g., Transformer, ResNet) based…

Machine Learning · Computer Science 2026-03-18 Hangting Ye, Peng Wang, Wei Fan, Xiaozhuang Song, He Zhao, Dandan Gun, Yi Chang

rETF-semiSL: Semi-Supervised Learning for Neural Collapse in Temporal Data

Deep neural networks for time series must capture complex temporal patterns, to effectively represent dynamic data. Self- and semi-supervised learning methods show promising results in pre-training large models, which -- when finetuned for…

Machine Learning · Computer Science 2025-08-15 Yuhan Xie, William Cappelletti, Mahsa Shoaran, Pascal Frossard

Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression

Multimodal large language models (MLLMs) struggle with numerical regression under long-tailed target distributions. Token-level supervised fine-tuning (SFT) and point-wise regression rewards bias learning toward high-density regions,…

Computation and Language · Computer Science 2026-05-12 Yao Du, Shanshan Song, Xiaomeng Li

An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation

The performance of machine learning models under distribution shift has been the focus of the community in recent years. Most of current methods have been proposed to improve the robustness to distribution shift from the algorithmic…

Computer Vision and Pattern Recognition · Computer Science 2022-05-26 Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang Ji, Antoni B. Chan

Data optimization for large batch distributed training of deep neural networks

Distributed training in deep learning (DL) is common practice as data and models grow. The current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale, and…

Machine Learning · Computer Science 2020-12-21 Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar

Small Language Models for Tabular Data

Supervised deep learning is most commonly applied to difficult problems defined on large and often extensively curated datasets. Here we demonstrate the ability of deep representation learning to address problems of classification and…

Machine Learning · Computer Science 2022-11-30 Benjamin L. Badger

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

Model-based reinforcement learning (RL) has proven to be a data efficient approach for learning control tasks but is difficult to utilize in domains with complex observations such as images. In this paper, we present a method for learning…

Machine Learning · Computer Science 2019-06-25 Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, Sergey Levine

ReConTab: Regularized Contrastive Representation Learning for Tabular Data

Representation learning stands as one of the critical machine learning techniques across various domains. Through the acquisition of high-quality features, pre-trained embeddings significantly reduce input space redundancy, benefiting…

Machine Learning · Computer Science 2023-12-19 Suiyao Chen, Jing Wu, Naira Hovakimyan, Handong Yao

Investigating Group Distributionally Robust Optimization for Deep Imbalanced Learning: A Case Study of Binary Tabular Data Classification

One of the most studied machine learning challenges that recent studies have shown the susceptibility of deep neural networks to is the class imbalance problem. While concerted research efforts in this direction have been notable in recent…

Machine Learning · Computer Science 2023-03-07 Ismail. B. Mustapha, Shafaatunnur Hasan, Hatem S Y Nabbus, Mohamed Mostafa Ali Montaser, Sunday Olusanya Olatunji, Siti Maryam Shamsuddin

Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings

Research on adversarial robustness is primarily focused on image and text data. Yet, many scenarios in which lack of robustness can result in serious risks, such as fraud detection, medical diagnosis, or recommender systems often do not…

Machine Learning · Computer Science 2023-12-14 Klim Kireev, Maksym Andriushchenko, Carmela Troncoso, Nicolas Flammarion

Distribution Shift Aware Neural Tabular Learning

Tabular learning transforms raw features into optimized spaces for downstream tasks, but its effectiveness deteriorates under distribution shifts between training and testing data. We formalize this challenge as the Distribution Shift…

Machine Learning · Computer Science 2025-08-28 Wangyang Ying, Nanxu Gong, Dongjie Wang, Xinyuan Wang, Arun Vignesh Malarkkan, Vivek Gupta, Chandan K. Reddy, Yanjie Fu