
Related papers: Self-training Language Models for Arithmetic Reaso…

200 papers

Language models have demonstrated remarkable performance in solving reasoning tasks; however, even the strongest models still occasionally make reasoning mistakes. Recently, there has been active research aimed at improving reasoning…

Computation and Language · Computer Science · 2024-08-30 · Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu

Recent successes of reinforcement learning (RL) in training large reasoning models motivate the question of whether self-training (the process where a model learns from its own judgments) can be sustained within RL. In this work, we study…

Machine Learning · Computer Science · 2025-10-10 · Sheikh Shafayat, Fahim Tajwar, Ruslan Salakhutdinov, Jeff Schneider, Andrea Zanette

Effective training of language models (LMs) for mathematical reasoning tasks demands high-quality supervised fine-tuning data. Besides obtaining annotations from human experts, a common alternative is sampling from larger and more powerful…

Computation and Language · Computer Science · 2024-07-26 · Tianduo Wang, Shichen Li, Wei Lu

Can language models improve their reasoning performance without external rewards, using only their own sampled responses for training? We show that they can. We propose Self-evolving Post-Training (SePT), a simple post-training method that…

Machine Learning · Computer Science · 2026-05-18 · Mengqi Li, Lei Zhao, Anthony Man-Cho So, Ruoyu Sun, Xiao Li

Supervised fine-tuning enhances the problem-solving abilities of language models across various mathematical reasoning tasks. To maximize such benefits, existing research focuses on broadening the training set with various data augmentation…

Computation and Language · Computer Science · 2024-10-08 · Zhihan Zhang, Tao Ge, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, Meng Jiang

We examine whether self-supervised language modeling applied to mathematical formulas enables logical reasoning. We suggest several logical reasoning tasks that can be used to evaluate language models trained on formal mathematical…

Machine Learning · Computer Science · 2020-08-13 · Markus N. Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Recent progress in large language models (LLMs) has shown that chain-of-thought prompting strategies improve the reasoning ability of LLMs by encouraging problem solving through multiple steps. Therefore, subsequent research aimed to integrate the…

Computation and Language · Computer Science · 2025-02-21 · Ting-Ruen Wei, Haowei Liu, Xuyang Wu, Yi Fang

Large-scale, high-quality training data is important for improving the performance of models. After being trained on data that has rationales (reasoning steps), models gain reasoning capability. However, the dataset with high-quality rationales…

Computation and Language · Computer Science · 2024-05-01 · Yunlong Feng, Yang Xu, Libo Qin, Yasheng Wang, Wanxiang Che

Mechanisms for continued self-improvement of language models without external supervision remain an open challenge. We propose Peer-Predictive Self-Training (PST), a label-free fine-tuning framework in which multiple language models improve…

Computation and Language · Computer Science · 2026-04-28 · Shi Feng, Hanlin Zhang, Fan Nie, Sham Kakade, Yiling Chen

Recent progress in multimodal large language models has led to strong performance on reasoning tasks, but these improvements largely rely on high-quality annotated data or teacher-model distillation, both of which are costly and difficult…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-25 · Zhengxian Wu, Kai Shi, Chuanrui Zhang, Zirui Liao, Jun Yang, Ni Yang, Qiuying Peng, Luyuan Zhang, Hangrui Xu, Tianhuang Su, Zhenyu Yang, Haonan Lu, Haoqian Wang

Chain-of-thought reasoning, where language models expend additional computation by producing thinking tokens prior to final responses, has driven significant advances in model capabilities. However, training these reasoning models is…

Machine Learning · Computer Science · 2026-03-20 · Nived Rajaraman, Audrey Huang, Miro Dudik, Robert Schapire, Dylan J. Foster, Akshay Krishnamurthy

Large language models are classically trained in stages: pretraining on raw text followed by post-training for instruction following and reasoning. However, this separation creates a fundamental limitation: many desirable behaviors such as…

Self-training is a useful strategy for semi-supervised learning, leveraging raw texts for enhancing model performances. Traditional self-training methods depend on heuristics such as model confidence for instance selection, the manual…

Computation and Language · Computer Science · 2018-04-17 · Chenhua Chen, Yue Zhang

Can language models improve their accuracy without external supervision? Methods such as debate, bootstrap, and internal coherence maximization achieve this surprising feat, even matching golden finetuning performance. Yet why they work…

Machine Learning · Computer Science · 2026-01-21 · Tianyi Qiu, Ahmed Hani Ismail, Zhonghao He, Shi Feng

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we…

The self-training approach for large language models (LLMs) improves reasoning abilities by training the models on their self-generated rationales. Previous approaches have labeled rationales that produce correct answers for a given question as…

Machine Learning · Computer Science · 2025-02-07 · Jaehyeok Lee, Keisuke Sakaguchi, JinYeong Bak

Large Language Models have demonstrated outstanding performance across various downstream tasks and have been widely applied in multiple scenarios. Human-annotated preference data is used for training to further improve LLMs' performance,…

Computation and Language · Computer Science · 2025-03-06 · Shimao Zhang, Xiao Liu, Xin Zhang, Junxiao Liu, Zheheng Luo, Shujian Huang, Yeyun Gong

Training large language models with reinforcement learning (RL) against verifiable rewards significantly enhances their reasoning abilities, yet remains computationally expensive due to inefficient uniform prompt sampling. We introduce…

Machine Learning · Computer Science · 2026-03-06 · Ruiqi Zhang, Daman Arora, Song Mei, Andrea Zanette

Large Language Models (LLMs) have achieved excellent performance in various tasks. However, fine-tuning an LLM requires extensive supervision. Humans, on the other hand, may improve their reasoning abilities by self-thinking without…

Computation and Language · Computer Science · 2022-10-26 · Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han

Training large language models (LLMs) to spend more time thinking and reflecting before responding is crucial for effectively solving complex reasoning tasks in fields such as science, coding, and mathematics. However, the effectiveness of…
