Related papers: Training Language Models to Reason Efficiently

200 papers

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on…

Computation and Language · Computer Science 2026-01-06 DeepSeek-AI , Daya Guo , Dejian Yang , Haowei Zhang , Junxiao Song , Peiyi Wang , Qihao Zhu , Runxin Xu , Ruoyu Zhang , Shirong Ma , Xiao Bi , Xiaokang Zhang , Xingkai Yu , Yu Wu , Z. F. Wu , Zhibin Gou , Zhihong Shao , Zhuoshu Li , Ziyi Gao , Aixin Liu , Bing Xue , Bingxuan Wang , Bochao Wu , Bei Feng , Chengda Lu , Chenggang Zhao , Chengqi Deng , Chenyu Zhang , Chong Ruan , Damai Dai , Deli Chen , Dongjie Ji , Erhang Li , Fangyun Lin , Fucong Dai , Fuli Luo , Guangbo Hao , Guanting Chen , Guowei Li , H. Zhang , Han Bao , Hanwei Xu , Haocheng Wang , Honghui Ding , Huajian Xin , Huazuo Gao , Hui Qu , Hui Li , Jianzhong Guo , Jiashi Li , Jiawei Wang , Jingchang Chen , Jingyang Yuan , Junjie Qiu , Junlong Li , J. L. Cai , Jiaqi Ni , Jian Liang , Jin Chen , Kai Dong , Kai Hu , Kaige Gao , Kang Guan , Kexin Huang , Kuai Yu , Lean Wang , Lecong Zhang , Liang Zhao , Litong Wang , Liyue Zhang , Lei Xu , Leyi Xia , Mingchuan Zhang , Minghua Zhang , Minghui Tang , Meng Li , Miaojun Wang , Mingming Li , Ning Tian , Panpan Huang , Peng Zhang , Qiancheng Wang , Qinyu Chen , Qiushi Du , Ruiqi Ge , Ruisong Zhang , Ruizhe Pan , Runji Wang , R. J. Chen , R. L. Jin , Ruyi Chen , Shanghao Lu , Shangyan Zhou , Shanhuang Chen , Shengfeng Ye , Shiyu Wang , Shuiping Yu , Shunfeng Zhou , Shuting Pan , S. S. Li , Shuang Zhou , Shaoqing Wu , Shengfeng Ye , Tao Yun , Tian Pei , Tianyu Sun , T. Wang , Wangding Zeng , Wanjia Zhao , Wen Liu , Wenfeng Liang , Wenjun Gao , Wenqin Yu , Wentao Zhang , W. L. Xiao , Wei An , Xiaodong Liu , Xiaohan Wang , Xiaokang Chen , Xiaotao Nie , Xin Cheng , Xin Liu , Xin Xie , Xingchao Liu , Xinyu Yang , Xinyuan Li , Xuecheng Su , Xuheng Lin , X. Q. Li , Xiangyue Jin , Xiaojin Shen , Xiaosha Chen , Xiaowen Sun , Xiaoxiang Wang , Xinnan Song , Xinyi Zhou , Xianzu Wang , Xinxia Shan , Y. K. Li , Y. Q. Wang , Y. X. Wei , Yang Zhang , Yanhong Xu , Yao Li , Yao Zhao , Yaofeng Sun , Yaohui Wang , Yi Yu , Yichao Zhang , Yifan Shi , Yiliang Xiong , Ying He , Yishi Piao , Yisong Wang , Yixuan Tan , Yiyang Ma , Yiyuan Liu , Yongqiang Guo , Yuan Ou , Yuduan Wang , Yue Gong , Yuheng Zou , Yujia He , Yunfan Xiong , Yuxiang Luo , Yuxiang You , Yuxuan Liu , Yuyang Zhou , Y. X. Zhu , Yanhong Xu , Yanping Huang , Yaohui Li , Yi Zheng , Yuchen Zhu , Yunxian Ma , Ying Tang , Yukun Zha , Yuting Yan , Z. Z. Ren , Zehui Ren , Zhangli Sha , Zhe Fu , Zhean Xu , Zhenda Xie , Zhengyan Zhang , Zhewen Hao , Zhicheng Ma , Zhigang Yan , Zhiyu Wu , Zihui Gu , Zijia Zhu , Zijun Liu , Zilin Li , Ziwei Xie , Ziyang Song , Zizheng Pan , Zhen Huang , Zhipeng Xu , Zhongyu Zhang , Zhen Zhang

Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling. While reinforcement…

Machine Learning · Computer Science 2025-06-16 Zhenyu Hou , Xin Lv , Rui Lu , Jiajie Zhang , Yujiang Li , Zijun Yao , Juanzi Li , Jie Tang , Yuxiao Dong

Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to perform complex reasoning tasks, transitioning from fast and intuitive thinking (System 1) to slow and deep reasoning (System 2). While System…

Computation and Language · Computer Science 2025-04-01 Rui Wang , Hongru Wang , Boyang Xue , Jianhui Pang , Shudong Liu , Yi Chen , Jiahao Qiu , Derek Fai Wong , Heng Ji , Kam-Fai Wong

Long chain-of-thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs). However, extensive reasoning traces lead to inefficiencies and increased time-to-first-token (TTFT). We propose a training…

Computation and Language · Computer Science 2026-01-08 Roy Xie , David Qiu , Deepak Gopinath , Dong Lin , Yanchao Sun , Chong Wang , Saloni Potdar , Bhuwan Dhingra

Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but also suffer from heavy computational overhead. To address this issue, efficient reasoning aims to incentivize short yet accurate thinking…

Computation and Language · Computer Science 2026-03-23 Taiqiang Wu , Zenan Xu , Bo Zhou , Ngai Wong

Being prompted to engage in reasoning has emerged as a core technique for using large language models (LLMs), deploying additional inference-time compute to improve task performance. However, as LLMs increase in both size and adoption,…

Computation and Language · Computer Science 2025-06-25 C. Nicolò De Sabbata , Theodore R. Sumers , Badr AlKhamissi , Antoine Bosselut , Thomas L. Griffiths

Recent advancements in large reasoning models (LRMs) have significantly enhanced language models' capabilities in complex problem-solving by emulating human-like deliberative thinking. However, these models often exhibit overthinking (i.e.,…

Artificial Intelligence · Computer Science 2025-06-19 Weixiang Zhao , Jiahe Guo , Yang Deng , Xingyu Sui , Yulin Hu , Yanyan Zhao , Wanxiang Che , Bing Qin , Tat-Seng Chua , Ting Liu

In recent years, training methods centered on Reinforcement Learning (RL) have markedly enhanced the reasoning and alignment performance of Large Language Models (LLMs), particularly in understanding human intents, following user…

Computation and Language · Computer Science 2025-09-23 Keliang Liu , Dingkang Yang , Ziyun Qian , Weijie Yin , Yuchi Wang , Hongsheng Li , Jun Liu , Peng Zhai , Yang Liu , Lihua Zhang

Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models (LLMs) by learning to reason, exhibiting promising performance in solving complex tasks. However, their deliberative reasoning process leads…

Computation and Language · Computer Science 2025-08-14 Yue Liu , Jiaying Wu , Yufei He , Ruihan Gong , Jun Xia , Liang Li , Hongcheng Gao , Hongyu Chen , Baolong Bi , Jiaheng Zhang , Zhiqi Huang , Bryan Hooi , Stan Z. Li , Keqin Li

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning…

Computation and Language · Computer Science 2025-08-25 Yang Sui , Yu-Neng Chuang , Guanchu Wang , Jiamu Zhang , Tianyi Zhang , Jiayi Yuan , Hongyi Liu , Andrew Wen , Shaochen Zhong , Na Zou , Hanjie Chen , Xia Hu

Language has long been conceived as an essential tool for human reasoning. The breakthrough of Large Language Models (LLMs) has sparked significant research interest in leveraging these models to tackle complex reasoning tasks. Researchers…

In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly in addressing complex…

Reward models (RMs) play a critical role in enhancing the reasoning performance of LLMs. For example, they can provide training signals to finetune LLMs during reinforcement learning (RL) and help select the best answer from multiple…

Computation and Language · Computer Science 2025-10-06 Qiyuan Liu , Hao Xu , Xuhong Chen , Wei Chen , Yee Whye Teh , Ning Miao

Large pretrained models are showing increasingly better performance in reasoning and planning tasks across different modalities, opening the possibility to leverage them for complex sequential decision making problems. In this paper, we…

Artificial Intelligence · Computer Science 2024-10-10 Martin Klissarov , Devon Hjelm , Alexander Toshev , Bogdan Mazoure

Reasoning large language models (LLMs) excel in complex tasks, which has drawn significant attention to reinforcement learning (RL) for LLMs. However, existing approaches allocate an equal number of rollouts to all questions during the RL…

Machine Learning · Computer Science 2025-10-21 Mengqi Liao , Xiangyu Xi , Ruinian Chen , Jia Leng , Yangen Hu , Ke Zeng , Shuai Liu , Huaiyu Wan

Recent advancements in reasoning-focused language models such as OpenAI's O1 and DeepSeek-R1 have shown that scaling test-time computation, through chain-of-thought reasoning and iterative exploration, can yield substantial improvements on…

Large language models (LLMs) have demonstrated impressive performance on reasoning-intensive tasks, but enhancing their reasoning abilities typically relies on either reinforcement learning (RL) with verifiable signals or supervised…

Computation and Language · Computer Science 2026-03-17 Yige Yuan , Teng Xiao , Shuchang Tao , Xue Wang , Jinyang Gao , Bolin Ding , Bingbing Xu

Enhancing the reasoning capabilities of large language models (LLMs) typically relies on massive computational resources and extensive datasets, limiting accessibility for resource-constrained settings. Our study investigates the potential…

Machine Learning · Computer Science 2026-01-21 Quy-Anh Dang , Chris Ngo

Reinforcement learning (RL) has emerged as a promising strategy for finetuning small language models (SLMs) to solve targeted tasks such as math and coding. However, RL algorithms tend to be resource-intensive, taking a significant amount…

Machine Learning · Computer Science 2025-10-07 Lianghuan Huang , Sagnik Anupam , Insup Lee , Shuo Li , Osbert Bastani

Large reasoning models (LRMs) have recently shown promise in solving complex math problems when optimized with Reinforcement Learning (RL). But conventional approaches rely on outcome-only rewards that provide sparse feedback, resulting in…

Machine Learning · Computer Science 2025-08-01 Tao He , Rongchuan Mu , Lizi Liao , Yixin Cao , Ming Liu , Bing Qin