Transfer Learning with Deep Tabular Models

Machine Learning, 2023-08-08, v2

Abstract

Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees (GBDT) and neural networks. Accuracy aside, a major advantage of neural models is that they learn reusable features and are easily fine-tuned in new domains. This property is often exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we demonstrate that upstream data gives tabular neural networks a decisive advantage over widely used GBDT models. We propose a realistic medical diagnosis benchmark for tabular transfer learning, and we present a how-to guide for using upstream data to boost performance with a variety of tabular neural network architectures. Finally, we propose a pseudo-feature method for cases where the upstream and downstream feature sets differ, a tabular-specific problem widespread in real-world applications. Our code is available at https://github.com/LevinRoman/tabular-transfer-learning.
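The abstract does not spell out how the pseudo-feature method works, so the following is only a rough illustration of the general idea of aligning mismatched feature sets, not the authors' implementation. It assumes the downstream table has a column the upstream table lacks, fits a predictor for that column on the downstream rows, and fills in predicted pseudo-values for the upstream rows. The helper names (`fit_line`, `add_pseudo_feature`), the column names, and the toy numbers are all hypothetical, and a one-variable least-squares fit stands in for whatever model would be used in practice.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y ~ a * x + b for a single input column."""
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    if var == 0:
        # Constant input: fall back to predicting the mean of y.
        return 0.0, y_bar
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = cov / var
    return a, y_bar - a * x_bar


def add_pseudo_feature(upstream_rows, downstream_rows, shared, missing):
    """Return copies of the upstream rows with a predicted `missing` column.

    The predictor is fit only on downstream rows, where `missing` is observed.
    (Assumption: one shared column is enough for this toy illustration.)
    """
    a, b = fit_line(
        [row[shared] for row in downstream_rows],
        [row[missing] for row in downstream_rows],
    )
    return [{**row, missing: a * row[shared] + b} for row in upstream_rows]


# Hypothetical toy data: the downstream table has "bmi", the upstream one does not.
downstream = [
    {"age": 30.0, "bmi": 22.0},
    {"age": 50.0, "bmi": 26.0},
    {"age": 70.0, "bmi": 30.0},
]
upstream = [{"age": 40.0}, {"age": 60.0}]

aligned = add_pseudo_feature(upstream, downstream, shared="age", missing="bmi")
for row in aligned:
    print(row)
```

After this step both tables expose the same columns, so a model pre-trained on the augmented upstream table can be fine-tuned on the downstream table without changing its input layer.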

Cite

@article{arxiv.2206.15306,
  title  = {Transfer Learning with Deep Tabular Models},
  author = {Roman Levin and Valeriia Cherepanova and Avi Schwarzschild and Arpit Bansal and C. Bayan Bruss and Tom Goldstein and Andrew Gordon Wilson and Micah Goldblum},
  journal= {arXiv preprint arXiv:2206.15306},
  year   = {2023}
}