Related papers: Stabilizing Efficient Reasoning with Step-Level Ad…

SSPO: Self-traced Step-wise Preference Optimization for Process Supervision and Reasoning Compression

Test-time scaling has proven effective in further enhancing the performance of pretrained Large Language Models (LLMs). However, mainstream post-training methods (i.e., reinforcement learning (RL) with chain-of-thought (CoT) reasoning)…

Machine Learning · Computer Science 2025-08-19 Yuyang Xu, Yi Cheng, Haochao Ying, Zhuoyun Du, Renjun Hu, Xing Shi, Wei Lin, Jian Wu

Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective

Reinforcement learning exhibits potential in enhancing the reasoning abilities of large language models, yet it is hard to scale for the low sample efficiency during the rollout phase. Existing methods attempt to improve efficiency by…

Machine Learning · Computer Science 2026-02-02 Deyang Kong, Qi Guo, Xiangyu Xi, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye

SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking

Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit a pervasive "overthinking", generating unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice…

Artificial Intelligence · Computer Science 2026-04-10 Weiyang Huang, Xuefeng Bai, Kehai Chen, Xinyang Chen, Yibin Chen, Weili Guan, Min Zhang

Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning

Large reasoning models improve with more test-time computation, but often overthink, producing unnecessarily long chains-of-thought that raise cost without improving accuracy. Prior reinforcement learning approaches typically rely on a…

Computation and Language · Computer Science 2026-03-03 Xintong Li, Sha Li, Rongmei Lin, Hongye Jin, Linwei Li, Hejie Cui, Sarah Zhang, Chia-Yuan Chang, Kewei Cheng, Besnik Fetahu, Priyanka Nigam, Jingbo Shang, Bing Yin

SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

Despite the significant improvements achieved by large language models (LLMs) in English reasoning tasks, these models continue to struggle with multilingual reasoning. Recent studies leverage a full-parameter and two-stage training…

Computation and Language · Computer Science 2025-01-08 Yuchun Fan, Yongyu Mu, Yilin Wang, Lei Huang, Junhao Ruan, Bei Li, Tong Xiao, Shujian Huang, Xiaocheng Feng, Jingbo Zhu

Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation

Large reasoning models (LRMs) achieve strong performance through extended reasoning traces, but they often exhibit overthinking behavior for low-complexity queries. Existing efforts to mitigate this issue are fundamentally limited by…

Machine Learning · Computer Science 2026-02-27 Zihang Xu, Haozhi Xie, Ziqi Miao, Wuxuan Gong, Chen Qian, Lijun Li

ATLAS: Adaptive Test-Time Latent Steering with External Verifiers for Enhancing LLMs Reasoning

Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and efficiency without additional training. However, most…

Machine Learning · Computer Science 2026-01-07 Tuc Nguyen, Thai Le

Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) elicits long chain-of-thought reasoning in large language models (LLMs), but outcome-based rewards lead to coarse-grained advantage estimation. While existing approaches improve RLVR via…

Computation and Language · Computer Science 2026-01-08 Fei Wu, Zhenrong Zhang, Qikai Chang, Jianshu Zhang, Quan Liu, Jun Du

LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

Large Language Models (LLMs) have demonstrated powerful reasoning capabilities through Chain-of-Thought (CoT) in various tasks, yet the inefficiency of token-by-token generation hinders real-world deployment in latency-sensitive recommender…

Information Retrieval · Computer Science 2026-05-12 Yiwen Chen, Fuwei Zhang, Zehao Chen, Deqing Wang, Hehan Li, Peizhi Xu, Hanmeng Liu, Shuanglong Li, Xin Pei, Fuzhen Zhuang, Zhao Zhang

Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training

Training Large Language Models (LLMs) for reasoning tasks is increasingly driven by Reinforcement Learning with Verifiable Rewards (RLVR), where Proximal Policy Optimization (PPO) provides a principled framework for stable policy updates.…

Machine Learning · Computer Science 2026-01-13 Xue Gong, Qi Yi, Ziyuan Nan, Guanhua Huang, Kejiao Li, Yuhao Jiang, Ruibin Xiong, Zenan Xu, Jiaming Guo, Shaohui Peng, Bo Zhou

AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control

Large reasoning models (LRMs) achieve impressive reasoning capabilities by generating lengthy chain-of-thoughts, but this "overthinking" incurs high latency and cost without commensurate accuracy gains. In this work, we introduce AALC, a…

Computation and Language · Computer Science 2025-08-11 Ruosen Li, Ziming Luo, Quan Zhang, Ruochen Li, Ben Zhou, Ali Payani, Xinya Du

Interleaved Reasoning for Large Language Models via Reinforcement Learning

Long chain-of-thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs). However, extensive reasoning traces lead to inefficiencies and increased time-to-first-token (TTFT). We propose a training…

Computation and Language · Computer Science 2026-01-08 Roy Xie, David Qiu, Deepak Gopinath, Dong Lin, Yanchao Sun, Chong Wang, Saloni Potdar, Bhuwan Dhingra

Learning to Reason Efficiently with A* Post-Training

Many applications of large language models (LLMs) require deductive reasoning, yet models frequently produce incorrect or redundant inference steps. We frame natural language inference as a search problem where the final answer is the valid…

Artificial Intelligence · Computer Science 2026-05-26 Andreas Opedal, Francesco Ignazio Re, Abulhair Saparov, Mrinmaya Sachan, Bernhard Schölkopf, Ryan Cotterell

ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models

Large Reasoning Language Models (LRLMs or LRMs) demonstrate remarkable capabilities in complex reasoning tasks, but suffer from significant computational inefficiencies due to overthinking phenomena. Existing efficient reasoning methods…

Artificial Intelligence · Computer Science 2025-10-13 Dongqi Zheng

SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph

Large Language Models (LLMs) have demonstrated remarkable capabilities, enabling language agents to excel at single-turn tasks. However, their application to complex, multi-step, and long-horizon tasks remains challenging. While…

Machine Learning · Computer Science 2025-10-24 Jiazheng Li, Yawei Wang, David Yan, Yijun Tian, Zhichao Xu, Huan Song, Panpan Xu, Lin Lee Cheong

SSR: Speculative Parallel Scaling Reasoning in Test-time

Large language models (LLMs) have achieved impressive results on multi-step mathematical reasoning, yet at the cost of high computational overhead. This challenge is particularly acute for test-time scaling methods such as parallel…

Machine Learning · Computer Science 2026-03-24 Yuanlin Chu, Bo Wang, Xiang Liu, Hong Chen, Aiwei Liu, Xuming Hu

Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning

The reasoning capabilities of large language models (LLMs) have improved substantially through increased test-time computation, typically in the form of intermediate tokens known as chain-of-thought (CoT). However, CoT often becomes…

Computation and Language · Computer Science 2026-01-07 Nathanaël Carraz Rakotonirina, Ren Pang, Neha Anna John, Michael Bohlke-Schneider, Momchil Hardalov

Lost at the Beginning of Reasoning

Recent advancements in large language models (LLMs) have significantly advanced complex reasoning capabilities, particularly through extended chain-of-thought (CoT) reasoning that incorporates mechanisms such as backtracking,…

Computation and Language · Computer Science 2025-10-21 Baohao Liao, Xinyi Chen, Sara Rajaee, Yuhui Xu, Christian Herold, Anders Søgaard, Maarten de Rijke, Christof Monz

The Art of Efficient Reasoning: Data, Reward, and Optimization

Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but also suffer from heavy computational overhead. To address this issue, efficient reasoning aims to incentivize short yet accurate thinking…

Computation and Language · Computer Science 2026-03-23 Taiqiang Wu, Zenan Xu, Bo Zhou, Ngai Wong

Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning

Large language models (LLMs) have been shown to be capable of impressive few-shot generalisation to new tasks. However, they still tend to perform poorly on multi-step logical reasoning problems. Here we carry out a comprehensive evaluation…

Artificial Intelligence · Computer Science 2022-05-20 Antonia Creswell, Murray Shanahan, Irina Higgins