Related papers: Enhancing Tabular Reasoning with Pattern Exploitin…

Revisiting Pretraining Objectives for Tabular Deep Learning

Recent deep learning models for tabular data currently compete with the traditional ML models based on decision trees (GBDT). Unlike GBDT, deep models can additionally benefit from pretraining, which is a workhorse of DL for vision and NLP.…

Machine Learning · Computer Science 2022-07-13 Ivan Rubachev, Artem Alekberov, Yury Gorishniy, Artem Babenko

TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction

Predictive modeling on tabular data is the cornerstone of many real-world applications. Although gradient boosting machines and some recent deep models achieve strong performance on tabular data, they often lack interpretability. On the…

Machine Learning · Computer Science 2025-07-01 Tommy Xu, Zhitian Zhang, Xiangyu Sun, Lauren Kelly Zung, Hossein Hajimirsadeghi, Greg Mori

To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks

Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks. This paper examines the benefits of pretrained models as a function of the number of training…

Computation and Language · Computer Science 2020-06-17 Sinong Wang, Madian Khabsa, Hao Ma

TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data

Recent years have witnessed the burgeoning of pretrained language models (LMs) for text-based natural language (NL) understanding tasks. Such models are typically trained on free-form NL text, hence may not be suitable for tasks like…

Computation and Language · Computer Science 2020-05-19 Pengcheng Yin, Graham Neubig, Wen-tau Yih, Sebastian Riedel

Utilizing Training Data to Improve LLM Reasoning for Tabular Understanding

Automated tabular understanding and reasoning are essential tasks for data scientists. Recently, large language models (LLMs) have become increasingly prevalent in tabular reasoning tasks. Previous work focuses on (1) finetuning LLMs using…

Machine Learning · Computer Science 2025-08-27 Chufan Gao, Jintai Chen, Jimeng Sun

LERT: A Linguistically-motivated Pre-trained Language Model

Pre-trained Language Model (PLM) has become a representative foundation model in the natural language processing field. Most PLMs are trained with linguistic-agnostic pre-training tasks on the surface form of the text, such as the masked…

Computation and Language · Computer Science 2022-11-11 Yiming Cui, Wanxiang Che, Shijin Wang, Ting Liu

PTab: Using the Pre-trained Language Model for Modeling Tabular Data

Tabular data is the foundation of the information age and has been extensively studied. Recent studies show that neural-based models are effective in learning contextual representation for tabular data. The learning of an effective…

Machine Learning · Computer Science 2022-09-19 Guang Liu, Jie Yang, Ledell Wu

ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning

While pre-trained language models (PTLMs) have achieved noticeable success on many NLP tasks, they still struggle for tasks that require event temporal reasoning, which is essential for event-centric applications. We present a continual…

Computation and Language · Computer Science 2021-09-20 Rujun Han, Xiang Ren, Nanyun Peng

Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases

Previous literature shows that pre-trained masked language models (MLMs) such as BERT can achieve competitive factual knowledge extraction performance on some datasets, indicating that MLMs can potentially be a reliable knowledge source. In…

Computation and Language · Computer Science 2021-06-18 Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun, Lingyong Yan, Meng Liao, Tong Xue, Jin Xu

Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining

In the domain of data science, the predictive tasks of classification, regression, and imputation of missing values are commonly encountered challenges associated with tabular data. This research endeavors to apply Large Language Models…

Machine Learning · Computer Science 2026-04-23 Yazheng Yang, Yuqi Wang, Yaxuan Li, Sankalok Sen, Lei Li, Lin Qiu, Qi Liu

TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning

Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improved natural language interaction with such…

Machine Learning · Computer Science 2025-12-29 Saisai Yang, Qingyi Huang, Jing Yuan, Liangyu Zha, Kai Tang, Yuhang Yang, Ning Wang, Yucheng Wei, Liyao Li, Wentao Ye, Hao Chen, Tao Zhang, Junlin Zhou, Haobo Wang, Gang Chen, Junbo Zhao

Maximizing Efficiency of Language Model Pre-training for Learning Representation

Pre-trained language models in the past years have shown exponential growth in model parameters and compute time. ELECTRA is a novel approach for improving the compute efficiency of pre-trained language models (e.g. BERT) based on masked…

Computation and Language · Computer Science 2021-10-14 Junmo Kang, Suwon Shin, Jeonghwan Kim, Jaeyoung Jo, Sung-Hyon Myaeng

Harnessing LLMs Explanations to Boost Surrogate Models in Tabular Data Classification

Large Language Models (LLMs) have shown remarkable ability in solving complex tasks, making them a promising tool for enhancing tabular learning. However, existing LLM-based methods suffer from high resource requirements, suboptimal…

Machine Learning · Computer Science 2025-05-12 Ruxue Shi, Hengrui Gu, Xu Shen, Xin Wang

Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data

Tabular foundation models are becoming increasingly popular for low-resource tabular problems. These models make up for small training datasets by pretraining on large volumes of synthetic data. The prior knowledge obtained via pretraining…

Machine Learning · Computer Science 2026-05-18 George Yakushev, Alina Shutova, Ivan Rubachev, Natalia Bereberdina, Renat Sergazinov, Artem Babenko

Incorporating External Knowledge to Enhance Tabular Reasoning

Reasoning about tabular information presents unique challenges to modern NLP approaches which largely rely on pre-trained contextualized embeddings of text. In this paper, we study these challenges through the problem of tabular natural…

Computation and Language · Computer Science 2021-04-12 J. Neeraja, Vivek Gupta, Vivek Srikumar

Investigating Masking-based Data Generation in Language Models

The current era of natural language processing (NLP) has been defined by the prominence of pre-trained language models since the advent of BERT. A feature of BERT and models with similar architecture is the objective of masked language…

Computation and Language · Computer Science 2023-07-04 Ed S. Ma

On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training Research

This evidence-based position paper critiques current research practices within the language model pre-training literature. Despite rapid recent progress afforded by increasingly better pre-trained language models (PLMs), current PLM…

Computation and Language · Computer Science 2023-06-06 Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Alham Fikri Aji, Genta Indra Winata, Radityo Eko Prasojo, Phil Blunsom, Adhiguna Kuncoro

Bridge the Gap between Language models and Tabular Understanding

The table pretrain-then-finetune paradigm has been proposed and employed at a rapid pace after the success of pre-training in the natural language domain. Despite the promising findings in tabular pre-trained language models (TPLMs), there is…

Computation and Language · Computer Science 2023-02-21 Nuo Chen, Linjun Shou, Ming Gong, Jian Pei, Chenyu You, Jianhui Chang, Daxin Jiang, Jia Li

Unsupervised Pre-training with Structured Knowledge for Improving Natural Language Inference

While recent research on natural language inference has considerably benefited from large annotated datasets, the amount of inference-related knowledge (including commonsense) provided in the annotated data is still rather limited. There…

Computation and Language · Computer Science 2021-09-10 Xiaoyu Yang, Xiaodan Zhu, Zhan Shi, Tianda Li

INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Language Models

A salient characteristic of pre-trained language models (PTLMs) is a remarkable improvement in their generalization capability and emergence of new capabilities with increasing model capacity and pre-training dataset size. Consequently, we…

Computation and Language · Computer Science 2023-10-23 H S V N S Kowndinya Renduchintala, Krishnateja Killamsetty, Sumit Bhatia, Milan Aggarwal, Ganesh Ramakrishnan, Rishabh Iyer, Balaji Krishnamurthy