Related papers: Optimization Techniques for Unsupervised Complex T…

UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining

Tables convey factual and quantitative data with implicit conventions created by humans that are often challenging for machines to parse. Prior work on table recognition (TR) has mainly centered around complex task-specific combinations of…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 ShengYun Peng, Aishwarya Chakravarthy, Seongmin Lee, Xiaojing Wang, Rajarajeswari Balasubramaniyan, Duen Horng Chau

Realistic Data Augmentation Framework for Enhancing Tabular Reasoning

Existing approaches to constructing training data for Natural Language Inference (NLI) tasks, such as for semi-structured table reasoning, are either via crowdsourcing or fully automatic methods. However, the former is expensive and…

Computation and Language · Computer Science 2022-10-25 Dibyakanti Kumar, Vivek Gupta, Soumya Sharma, Shuo Zhang

Leveraging Data Recasting to Enhance Tabular Reasoning

Creating challenging tabular inference data is essential for learning complex reasoning. Prior work has mostly relied on two data generation strategies. The first is human annotation, which yields linguistically diverse data but is…

Computation and Language · Computer Science 2022-11-24 Aashna Jena, Vivek Gupta, Manish Shrivastava, Julian Martin Eisenschlos

Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models

Table reasoning (TR) requires structured reasoning over semi-structured tabular data and remains challenging, particularly for small language models (SLMs, e.g., LLaMA-8B) due to their limited capacity compared to large LMs (LLMs, e.g.,…

Machine Learning · Computer Science 2025-06-09 Rihui Jin, Zheyu Xin, Xing Xie, Zuoyi Li, Guilin Qi, Yongrui Chen, Xinbang Dai, Tongtong Wu, Gholamreza Haffari

PanelTR: Zero-Shot Table Reasoning Framework Through Multi-Agent Scientific Discussion

Table reasoning, including tabular QA and fact verification, often depends on annotated data or complex data augmentation, limiting flexibility and generalization. LLMs, despite their versatility, often underperform compared to simple…

Artificial Intelligence · Computer Science 2025-11-19 Yiran Rex Ma

Deep Contextualized Self-training for Low Resource Dependency Parsing

Neural dependency parsing has proven very effective, achieving state-of-the-art results on numerous domains and languages. Unfortunately, it requires large amounts of labeled data, that is costly and laborious to create. In this paper we…

Computation and Language · Computer Science 2019-11-12 Guy Rotman, Roi Reichart

RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner

The reasoning abilities of large language models (LLMs) have improved with chain-of-thought (CoT) prompting, allowing models to solve complex tasks stepwise. However, training CoT capabilities requires detailed reasoning data, which is…

Artificial Intelligence · Computer Science 2025-04-11 Fu-Chieh Chang, Yu-Ting Lee, Hui-Ying Shih, Yi Hsuan Tseng, Pei-Yuan Wu

Native Reasoning Models: Training Language Models to Reason on Unverifiable Data

The prevailing paradigm for training large reasoning models--combining Supervised Fine-Tuning (SFT) with Reinforcement Learning with Verifiable Rewards (RLVR)--is fundamentally constrained by its reliance on high-quality, human-annotated…

Machine Learning · Computer Science 2026-03-24 Yuanfu Wang, Zhixuan Liu, Xiangtian Li, Chaochao Lu, Chao Yang

TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning

Current Large Language Models (LLMs) exhibit limited ability to understand table structures and to apply precise numerical reasoning, which is crucial for tasks such as table question answering (TQA) and table-based fact verification (TFV).…

Computation and Language · Computer Science 2025-07-11 Xinyuan Lu, Liangming Pan, Yubo Ma, Preslav Nakov, Min-Yen Kan

Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning

Table reasoning, encompassing tasks such as table question answering, fact verification, and text-to-SQL, requires precise understanding of structured tabular data, coupled with numerical computation and code manipulation for effective…

Computation and Language · Computer Science 2025-06-03 Fangyu Lei, Jinxiang Meng, Yiming Huang, Tinghong Chen, Yun Zhang, Shizhu He, Jun Zhao, Kang Liu

Efficient Semi-Supervised Learning for Natural Language Understanding by Optimizing Diversity

Expanding new functionalities efficiently is an ongoing challenge for single-turn task-oriented dialogue systems. In this work, we explore functionality-specific semi-supervised learning via self-training. We consider methods that augment…

Computation and Language · Computer Science 2019-10-11 Eunah Cho, He Xie, John P. Lalor, Varun Kumar, William M. Campbell

Aligning benchmark datasets for table structure recognition

Benchmark datasets for table structure recognition (TSR) must be carefully processed to ensure they are annotated consistently. However, even if a dataset's annotations are self-consistent, there may be significant inconsistency across…

Computer Vision and Pattern Recognition · Computer Science 2023-05-25 Brandon Smock, Rohith Pesala, Robin Abraham

Retrieval-Based Transformer for Table Augmentation

Data preparation, also called data wrangling, is considered one of the most expensive and time-consuming steps when performing analytics or building machine learning models. Preparing data typically involves collecting and merging data from…

Computation and Language · Computer Science 2023-06-22 Michael Glass, Xueqing Wu, Ankita Rajaram Naik, Gaetano Rossiello, Alfio Gliozzo

Improve Event Extraction via Self-Training with Gradient Guidance

Data scarcity has been the main factor that hinders the progress of event extraction. To overcome this issue, we propose a Self-Training with Feedback (STF) framework that leverages the large-scale unlabeled data and acquires feedback for…

Computation and Language · Computer Science 2023-08-03 Zhiyang Xu, Jay-Yoon Lee, Lifu Huang

CAST: Cluster-Aware Self-Training for Tabular Data via Reliable Confidence

Tabular data is one of the most widely used data modalities, encompassing numerous datasets with substantial amounts of unlabeled data. Despite this prevalence, there is a notable lack of simple and versatile methods for utilizing unlabeled…

Machine Learning · Computer Science 2024-08-30 Minwook Kim, Juseong Kim, Ki Beom Kim, Giltae Song

Self-supervised Text-to-SQL Learning with Header Alignment Training

Since we can leverage a large amount of unlabeled data without any human supervision to train a model and transfer the knowledge to target tasks, self-supervised learning is a de-facto component for the recent success of deep learning in…

Computation and Language · Computer Science 2021-03-12 Donggyu Kim, Seanie Lee

Rethinking Data Augmentation for Tabular Data in Deep Learning

Tabular data is the most widely used data format in machine learning (ML). While tree-based methods outperform DL-based methods in supervised learning, recent literature reports that self-supervised learning with Transformer-based models…

Machine Learning · Computer Science 2023-05-23 Soma Onishi, Shoya Meguro

TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning

Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improved natural language interaction with such…

Machine Learning · Computer Science 2025-12-29 Saisai Yang, Qingyi Huang, Jing Yuan, Liangyu Zha, Kai Tang, Yuhang Yang, Ning Wang, Yucheng Wei, Liyao Li, Wentao Ye, Hao Chen, Tao Zhang, Junlin Zhou, Haobo Wang, Gang Chen, Junbo Zhao

STaR: Towards Effective and Stable Table Reasoning via Slow-Thinking Large Language Models

Table reasoning with large language models (LLMs) plays a critical role in building intelligent systems capable of understanding and analyzing tabular data. Despite recent progress, existing methods still face key limitations: their…

Artificial Intelligence · Computer Science 2026-01-27 Huajian Zhang, Mingyue Cheng, Yucong Luo, Xiaoyu Tao

SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning

Self-supervised learning has been shown to be very effective in learning useful representations, and yet much of the success is achieved in data types such as images, audio, and text. The success is mainly enabled by taking advantage of…

Machine Learning · Computer Science 2021-10-28 Talip Ucar, Ehsan Hajiramezanali, Lindsay Edwards