Related papers: Enhancing Tabular Reasoning with Pattern Exploitin…

200 papers

Recent deep learning models for tabular data currently compete with the traditional ML models based on decision trees (GBDT). Unlike GBDT, deep models can additionally benefit from pretraining, which is a workhorse of DL for vision and NLP.…

Machine Learning · Computer Science 2022-07-13 Ivan Rubachev, Artem Alekberov, Yury Gorishniy, Artem Babenko

Predictive modeling on tabular data is the cornerstone of many real-world applications. Although gradient boosting machines and some recent deep models achieve strong performance on tabular data, they often lack interpretability. On the…

Machine Learning · Computer Science 2025-07-01 Tommy Xu, Zhitian Zhang, Xiangyu Sun, Lauren Kelly Zung, Hossein Hajimirsadeghi, Greg Mori

Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks. This paper examines the benefits of pretrained models as a function of the number of training…

Computation and Language · Computer Science 2020-06-17 Sinong Wang, Madian Khabsa, Hao Ma

Recent years have witnessed the burgeoning of pretrained language models (LMs) for text-based natural language (NL) understanding tasks. Such models are typically trained on free-form NL text, hence may not be suitable for tasks like…

Computation and Language · Computer Science 2020-05-19 Pengcheng Yin, Graham Neubig, Wen-tau Yih, Sebastian Riedel

Automated tabular understanding and reasoning are essential tasks for data scientists. Recently, large language models (LLMs) have become increasingly prevalent in tabular reasoning tasks. Previous work focuses on (1) finetuning LLMs using…

Machine Learning · Computer Science 2025-08-27 Chufan Gao, Jintai Chen, Jimeng Sun

Pre-trained Language Model (PLM) has become a representative foundation model in the natural language processing field. Most PLMs are trained with linguistic-agnostic pre-training tasks on the surface form of the text, such as the masked…

Computation and Language · Computer Science 2022-11-11 Yiming Cui, Wanxiang Che, Shijin Wang, Ting Liu

Tabular data is the foundation of the information age and has been extensively studied. Recent studies show that neural-based models are effective in learning contextual representation for tabular data. The learning of an effective…

Machine Learning · Computer Science 2022-09-19 Guang Liu, Jie Yang, Ledell Wu

While pre-trained language models (PTLMs) have achieved noticeable success on many NLP tasks, they still struggle with tasks that require event temporal reasoning, which is essential for event-centric applications. We present a continual…

Computation and Language · Computer Science 2021-09-20 Rujun Han, Xiang Ren, Nanyun Peng

Previous literature shows that pre-trained masked language models (MLMs) such as BERT can achieve competitive factual knowledge extraction performance on some datasets, indicating that MLMs can potentially be a reliable knowledge source. In…

Computation and Language · Computer Science 2021-06-18 Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun, Lingyong Yan, Meng Liao, Tong Xue, Jin Xu

In the domain of data science, the predictive tasks of classification, regression, and imputation of missing values are commonly encountered challenges associated with tabular data. This research endeavors to apply Large Language Models…

Machine Learning · Computer Science 2026-04-23 Yazheng Yang, Yuqi Wang, Yaxuan Li, Sankalok Sen, Lei Li, Lin Qiu, Qi Liu

Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improved natural language interaction with such…

Pre-trained language models in the past years have shown exponential growth in model parameters and compute time. ELECTRA is a novel approach for improving the compute efficiency of pre-trained language models (e.g. BERT) based on masked…

Computation and Language · Computer Science 2021-10-14 Junmo Kang, Suwon Shin, Jeonghwan Kim, Jaeyoung Jo, Sung-Hyon Myaeng

Large Language Models (LLMs) have shown remarkable ability in solving complex tasks, making them a promising tool for enhancing tabular learning. However, existing LLM-based methods suffer from high resource requirements, suboptimal…

Machine Learning · Computer Science 2025-05-12 Ruxue Shi, Hengrui Gu, Xu Shen, Xin Wang

Tabular foundation models are becoming increasingly popular for low-resource tabular problems. These models make up for small training datasets by pretraining on large volumes of synthetic data. The prior knowledge obtained via pretraining…

Machine Learning · Computer Science 2026-05-18 George Yakushev, Alina Shutova, Ivan Rubachev, Natalia Bereberdina, Renat Sergazinov, Artem Babenko

Reasoning about tabular information presents unique challenges to modern NLP approaches which largely rely on pre-trained contextualized embeddings of text. In this paper, we study these challenges through the problem of tabular natural…

Computation and Language · Computer Science 2021-04-12 J. Neeraja, Vivek Gupta, Vivek Srikumar

The current era of natural language processing (NLP) has been defined by the prominence of pre-trained language models since the advent of BERT. A feature of BERT and models with similar architecture is the objective of masked language…

Computation and Language · Computer Science 2023-07-04 Ed S. Ma

This evidence-based position paper critiques current research practices within the language model pre-training literature. Despite rapid recent progress afforded by increasingly better pre-trained language models (PLMs), current PLM…

The table pretrain-then-finetune paradigm has been proposed and employed at a rapid pace after the success of pre-training in the natural language domain. Despite the promising findings in tabular pre-trained language models (TPLMs), there is…

Computation and Language · Computer Science 2023-02-21 Nuo Chen, Linjun Shou, Ming Gong, Jian Pei, Chenyu You, Jianhui Chang, Daxin Jiang, Jia Li

While recent research on natural language inference has considerably benefited from large annotated datasets, the amount of inference-related knowledge (including commonsense) provided in the annotated data is still rather limited. There…

Computation and Language · Computer Science 2021-09-10 Xiaoyu Yang, Xiaodan Zhu, Zhan Shi, Tianda Li

A salient characteristic of pre-trained language models (PTLMs) is a remarkable improvement in their generalization capability and emergence of new capabilities with increasing model capacity and pre-training dataset size. Consequently, we…
