Related papers: Stabilizing Efficient Reasoning with Step-Level Ad…

200 papers

Test-time scaling has proven effective in further enhancing the performance of pretrained Large Language Models (LLMs). However, mainstream post-training methods (i.e., reinforcement learning (RL) with chain-of-thought (CoT) reasoning)…

Machine Learning · Computer Science 2025-08-19 Yuyang Xu, Yi Cheng, Haochao Ying, Zhuoyun Du, Renjun Hu, Xing Shi, Wei Lin, Jian Wu

Reinforcement learning shows potential for enhancing the reasoning abilities of large language models, yet it is hard to scale because of low sample efficiency during the rollout phase. Existing methods attempt to improve efficiency by…

Machine Learning · Computer Science 2026-02-02 Deyang Kong, Qi Guo, Xiangyu Xi, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye

Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit pervasive "overthinking", generating unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice…

Artificial Intelligence · Computer Science 2026-04-10 Weiyang Huang, Xuefeng Bai, Kehai Chen, Xinyang Chen, Yibin Chen, Weili Guan, Min Zhang

Large reasoning models improve with more test-time computation, but often overthink, producing unnecessarily long chains-of-thought that raise cost without improving accuracy. Prior reinforcement learning approaches typically rely on a…

Computation and Language · Computer Science 2026-03-03 Xintong Li, Sha Li, Rongmei Lin, Hongye Jin, Linwei Li, Hejie Cui, Sarah Zhang, Chia-Yuan Chang, Kewei Cheng, Besnik Fetahu, Priyanka Nigam, Jingbo Shang, Bing Yin

Despite the significant improvements achieved by large language models (LLMs) in English reasoning tasks, these models continue to struggle with multilingual reasoning. Recent studies leverage a full-parameter and two-stage training…

Computation and Language · Computer Science 2025-01-08 Yuchun Fan, Yongyu Mu, Yilin Wang, Lei Huang, Junhao Ruan, Bei Li, Tong Xiao, Shujian Huang, Xiaocheng Feng, Jingbo Zhu

Large reasoning models (LRMs) achieve strong performance through extended reasoning traces, but they often exhibit overthinking behavior for low-complexity queries. Existing efforts to mitigate this issue are fundamentally limited by…

Machine Learning · Computer Science 2026-02-27 Zihang Xu, Haozhi Xie, Ziqi Miao, Wuxuan Gong, Chen Qian, Lijun Li

Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and efficiency without additional training. However, most…

Machine Learning · Computer Science 2026-01-07 Tuc Nguyen, Thai Le

Reinforcement Learning with Verifiable Rewards (RLVR) elicits long chain-of-thought reasoning in large language models (LLMs), but outcome-based rewards lead to coarse-grained advantage estimation. While existing approaches improve RLVR via…

Computation and Language · Computer Science 2026-01-08 Fei Wu, Zhenrong Zhang, Qikai Chang, Jianshu Zhang, Quan Liu, Jun Du

Large Language Models (LLMs) have demonstrated powerful reasoning capabilities through Chain-of-Thought (CoT) in various tasks, yet the inefficiency of token-by-token generation hinders real-world deployment in latency-sensitive recommender…

Information Retrieval · Computer Science 2026-05-12 Yiwen Chen, Fuwei Zhang, Zehao Chen, Deqing Wang, Hehan Li, Peizhi Xu, Hanmeng Liu, Shuanglong Li, Xin Pei, Fuzhen Zhuang, Zhao Zhang

Training Large Language Models (LLMs) for reasoning tasks is increasingly driven by Reinforcement Learning with Verifiable Rewards (RLVR), where Proximal Policy Optimization (PPO) provides a principled framework for stable policy updates.…

Machine Learning · Computer Science 2026-01-13 Xue Gong, Qi Yi, Ziyuan Nan, Guanhua Huang, Kejiao Li, Yuhao Jiang, Ruibin Xiong, Zenan Xu, Jiaming Guo, Shaohui Peng, Bo Zhou

Large reasoning models (LRMs) achieve impressive reasoning capabilities by generating lengthy chain-of-thoughts, but this "overthinking" incurs high latency and cost without commensurate accuracy gains. In this work, we introduce AALC, a…

Computation and Language · Computer Science 2025-08-11 Ruosen Li, Ziming Luo, Quan Zhang, Ruochen Li, Ben Zhou, Ali Payani, Xinya Du

Long chain-of-thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs). However, extensive reasoning traces lead to inefficiencies and increased time-to-first-token (TTFT). We propose a training…

Computation and Language · Computer Science 2026-01-08 Roy Xie, David Qiu, Deepak Gopinath, Dong Lin, Yanchao Sun, Chong Wang, Saloni Potdar, Bhuwan Dhingra

Many applications of large language models (LLMs) require deductive reasoning, yet models frequently produce incorrect or redundant inference steps. We frame natural language inference as a search problem where the final answer is the valid…

Artificial Intelligence · Computer Science 2026-05-26 Andreas Opedal, Francesco Ignazio Re, Abulhair Saparov, Mrinmaya Sachan, Bernhard Schölkopf, Ryan Cotterell

Large Reasoning Language Models (LRLMs or LRMs) demonstrate remarkable capabilities in complex reasoning tasks, but suffer from significant computational inefficiencies due to overthinking phenomena. Existing efficient reasoning methods…

Artificial Intelligence · Computer Science 2025-10-13 Dongqi Zheng

Large Language Models (LLMs) have demonstrated remarkable capabilities, enabling language agents to excel at single-turn tasks. However, their application to complex, multi-step, and long-horizon tasks remains challenging. While…

Machine Learning · Computer Science 2025-10-24 Jiazheng Li, Yawei Wang, David Yan, Yijun Tian, Zhichao Xu, Huan Song, Panpan Xu, Lin Lee Cheong

Large language models (LLMs) have achieved impressive results on multi-step mathematical reasoning, yet at the cost of high computational overhead. This challenge is particularly acute for test-time scaling methods such as parallel…

Machine Learning · Computer Science 2026-03-24 Yuanlin Chu, Bo Wang, Xiang Liu, Hong Chen, Aiwei Liu, Xuming Hu

The reasoning capabilities of large language models (LLMs) have improved substantially through increased test-time computation, typically in the form of intermediate tokens known as chain-of-thought (CoT). However, CoT often becomes…

Computation and Language · Computer Science 2026-01-07 Nathanaël Carraz Rakotonirina, Ren Pang, Neha Anna John, Michael Bohlke-Schneider, Momchil Hardalov

Recent advancements in large language models (LLMs) have significantly advanced complex reasoning capabilities, particularly through extended chain-of-thought (CoT) reasoning that incorporates mechanisms such as backtracking,…

Computation and Language · Computer Science 2025-10-21 Baohao Liao, Xinyi Chen, Sara Rajaee, Yuhui Xu, Christian Herold, Anders Søgaard, Maarten de Rijke, Christof Monz

Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but also suffer from heavy computational overhead. To address this issue, efficient reasoning aims to incentivize short yet accurate thinking…

Computation and Language · Computer Science 2026-03-23 Taiqiang Wu, Zenan Xu, Bo Zhou, Ngai Wong

Large language models (LLMs) have been shown to be capable of impressive few-shot generalisation to new tasks. However, they still tend to perform poorly on multi-step logical reasoning problems. Here we carry out a comprehensive evaluation…

Artificial Intelligence · Computer Science 2022-05-20 Antonia Creswell, Murray Shanahan, Irina Higgins