Related papers: Self-training Language Models for Arithmetic Reaso…

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Language models have demonstrated remarkable performance in solving reasoning tasks; however, even the strongest models still occasionally make reasoning mistakes. Recently, there has been active research aimed at improving reasoning…

Computation and Language · Computer Science · 2024-08-30 · Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu

Can Large Reasoning Models Self-Train?

Recent successes of reinforcement learning (RL) in training large reasoning models motivate the question of whether self-training (the process where a model learns from its own judgments) can be sustained within RL. In this work, we study…

Machine Learning · Computer Science · 2025-10-10 · Sheikh Shafayat, Fahim Tajwar, Ruslan Salakhutdinov, Jeff Schneider, Andrea Zanette

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Effective training of language models (LMs) for mathematical reasoning tasks demands high-quality supervised fine-tuning data. Besides obtaining annotations from human experts, a common alternative is sampling from larger and more powerful…

Computation and Language · Computer Science · 2024-07-26 · Tianduo Wang, Shichen Li, Wei Lu

A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning

Can language models improve their reasoning performance without external rewards, using only their own sampled responses for training? We show that they can. We propose Self-evolving Post-Training (SePT), a simple post-training method that…

Machine Learning · Computer Science · 2026-05-18 · Mengqi Li, Lei Zhao, Anthony Man-Cho So, Ruoyu Sun, Xiao Li

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Supervised fine-tuning enhances the problem-solving abilities of language models across various mathematical reasoning tasks. To maximize such benefits, existing research focuses on broadening the training set with various data augmentation…

Computation and Language · Computer Science · 2024-10-08 · Zhihan Zhang, Tao Ge, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, Meng Jiang

Mathematical Reasoning via Self-supervised Skip-tree Training

We examine whether self-supervised language modeling applied to mathematical formulas enables logical reasoning. We suggest several logical reasoning tasks that can be used to evaluate language models trained on formal mathematical…

Machine Learning · Computer Science · 2020-08-13 · Markus N. Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics

Recent progress in large language models (LLMs) has shown that chain-of-thought prompting strategies improve the reasoning ability of LLMs by encouraging problem solving through multiple steps. Therefore, subsequent research aimed to integrate the…

Computation and Language · Computer Science · 2025-02-21 · Ting-Ruen Wei, Haowei Liu, Xuyang Wu, Yi Fang

Improving Language Model Reasoning with Self-motivated Learning

Large-scale, high-quality training data is important for improving the performance of models. After being trained with data that has rationales (reasoning steps), models gain reasoning capability. However, the dataset with high-quality rationales…

Computation and Language · Computer Science · 2024-05-01 · Yunlong Feng, Yang Xu, Libo Qin, Yasheng Wang, Wanxiang Che

Peer-Predictive Self-Training for Language Model Reasoning

Mechanisms for continued self-improvement of language models without external supervision remain an open challenge. We propose Peer-Predictive Self-Training (PST), a label-free fine-tuning framework in which multiple language models improve…

Computation and Language · Computer Science · 2026-04-28 · Shi Feng, Hanlin Zhang, Fan Nie, Sham Kakade, Yiling Chen

When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning

Recent progress in multimodal large language models has led to strong performance on reasoning tasks, but these improvements largely rely on high-quality annotated data or teacher-model distillation, both of which are costly and difficult…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-25 · Zhengxian Wu, Kai Shi, Chuanrui Zhang, Zirui Liao, Jun Yang, Ni Yang, Qiuying Peng, Luyuan Zhang, Hangrui Xu, Tianhuang Su, Zhenyu Yang, Haonan Lu, Haoqian Wang

Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum

Chain-of-thought reasoning, where language models expend additional computation by producing thinking tokens prior to final responses, has driven significant advances in model capabilities. However, training these reasoning models is…

Machine Learning · Computer Science · 2026-03-20 · Nived Rajaraman, Audrey Huang, Miro Dudik, Robert Schapire, Dylan J. Foster, Akshay Krishnamurthy

Self-Improving Pretraining: using post-trained models to pretrain better models

Large language models are classically trained in stages: pretraining on raw text followed by post-training for instruction following and reasoning. However, this separation creates a fundamental limitation: many desirable behaviors such as…

Computation and Language · Computer Science · 2026-04-07 · Ellen Xiaoqing Tan, Jack Lanchantin, Shehzaad Dhuliawala, Danwei Li, Thao Nguyen, Jing Xu, Ping Yu, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Xian Li, Olga Golovneva

Learning How to Self-Learn: Enhancing Self-Training Using Neural Reinforcement Learning

Self-training is a useful strategy for semi-supervised learning, leveraging raw texts for enhancing model performances. Traditional self-training methods depend on heuristics such as model confidence for instance selection, the manual…

Computation and Language · Computer Science · 2018-04-17 · Chenhua Chen, Yue Zhang

Self-Improvement as Coherence Optimization: A Theoretical Account

Can language models improve their accuracy without external supervision? Methods such as debate, bootstrap, and internal coherence maximization achieve this surprising feat, even matching golden finetuning performance. Yet why they work…

Machine Learning · Computer Science · 2026-01-21 · Tianyi Qiu, Ahmed Hani Ismail, Zhonghao He, Shi Feng

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we…

Machine Learning · Computer Science · 2024-04-19 · Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel

Self-Training Meets Consistency: Improving LLMs' Reasoning with Consistency-Driven Rationale Evaluation

The self-training approach for large language models (LLMs) improves reasoning abilities by training the models on their self-generated rationales. Previous approaches have labeled rationales that produce correct answers for a given question as…

Machine Learning · Computer Science · 2025-02-07 · Jaehyeok Lee, Keisuke Sakaguchi, JinYeong Bak

Process-based Self-Rewarding Language Models

Large Language Models have demonstrated outstanding performance across various downstream tasks and have been widely applied in multiple scenarios. Human-annotated preference data is used for training to further improve LLMs' performance,…

Computation and Language · Computer Science · 2025-03-06 · Shimao Zhang, Xiao Liu, Xin Zhang, Junxiao Liu, Zheheng Luo, Shujian Huang, Yeyun Gong

SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning

Training large language models with reinforcement learning (RL) against verifiable rewards significantly enhances their reasoning abilities, yet remains computationally expensive due to inefficient uniform prompt sampling. We introduce…

Machine Learning · Computer Science · 2026-03-06 · Ruiqi Zhang, Daman Arora, Song Mei, Andrea Zanette

Large Language Models Can Self-Improve

Large Language Models (LLMs) have achieved excellent performance in various tasks. However, fine-tuning an LLM requires extensive supervision. Humans, on the other hand, may improve their reasoning abilities by self-thinking without…

Computation and Language · Computer Science · 2022-10-26 · Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Training large language models (LLMs) to spend more time thinking and reflecting before responding is crucial for effectively solving complex reasoning tasks in fields such as science, coding, and mathematics. However, the effectiveness of…

Computation and Language · Computer Science · 2024-11-26 · Zhiheng Xi, Dingwen Yang, Jixuan Huang, Jiafu Tang, Guanyu Li, Yiwen Ding, Wei He, Boyang Hong, Shihan Do, Wenyu Zhan, Xiao Wang, Rui Zheng, Tao Ji, Xiaowei Shi, Yitao Zhai, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Zuxuan Wu, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Yu-Gang Jiang