Related papers: Improving Latent Generalization Using Test-time Co…

Interleaved Reasoning for Large Language Models via Reinforcement Learning

Long chain-of-thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs). However, extensive reasoning traces lead to inefficiencies and increased time-to-first-token (TTFT). We propose a training…

Computation and Language · Computer Science · 2026-01-08 · Roy Xie, David Qiu, Deepak Gopinath, Dong Lin, Yanchao Sun, Chong Wang, Saloni Potdar, Bhuwan Dhingra

On the generalization of language models from in-context learning and finetuning: a controlled study

Large language models exhibit exciting capabilities, yet can show surprisingly narrow generalization from finetuning. E.g. they can fail to generalize to simple reversals of relations they are trained on, or fail to make simple logical…

Computation and Language · Computer Science · 2025-11-12 · Andrew K. Lampinen, Arslan Chaudhry, Stephanie C. Y. Chan, Cody Wild, Diane Wan, Alex Ku, Jörg Bornschein, Razvan Pascanu, Murray Shanahan, James L. McClelland

Delving into the Reversal Curse: How Far Can Large Language Models Generalize?

While large language models (LLMs) showcase unprecedented capabilities, they also exhibit certain inherent limitations when facing seemingly trivial tasks. A prime example is the recently debated "reversal curse", which surfaces when…

Computation and Language · Computer Science · 2024-11-25 · Zhengkai Lin, Zhihang Fu, Kai Liu, Liang Xie, Binbin Lin, Wenxiao Wang, Deng Cai, Yue Wu, Jieping Ye

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks: given labeled examples in the input context, the LLM learns to perform the task without weight updates. Do models guided via ICL infer the…

Computation and Language · Computer Science · 2024-04-11 · Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen

RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning

Despite rapid advancements in large language models (LLMs), the token-level autoregressive nature constrains their complex reasoning capabilities. To enhance LLM reasoning, inference-time techniques, including…

Artificial Intelligence · Computer Science · 2026-01-28 · Qianyue Hao, Sibo Li, Jian Yuan, Yong Li

Learning vs Retrieval: The Role of In-Context Examples in Regression with Large Language Models

Generative Large Language Models (LLMs) are capable of being in-context learners. However, the underlying mechanism of in-context learning (ICL) is still a major research question, and experimental research results about how models exploit…

Computation and Language · Computer Science · 2025-02-11 · Aliakbar Nafar, Kristen Brent Venable, Parisa Kordjamshidi

Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning

Recent advancements in large language models (LLMs) have demonstrated emergent capabilities in complex reasoning, largely spurred by rule-based Reinforcement Learning (RL) techniques applied during the post-training. This has raised the…

Machine Learning · Computer Science · 2025-07-22 · Sneheel Sarangi, Hanan Salam

Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs

Rote learning is a memorization technique based on repetition. Many researchers argue that rote learning hinders generalization because it encourages verbatim memorization rather than deeper understanding. This concern extends even to…

Computation and Language · Computer Science · 2026-03-03 · Qinyuan Wu, Soumi Das, Mahsa Amani, Bishwamittra Ghosh, Mohammad Aflah Khan, Krishna P. Gummadi, Muhammad Bilal Zafar

Training Language Models to Reason Efficiently

Scaling model size and training data has led to great advances in the performance of Large Language Models (LLMs). However, the diminishing returns of this approach necessitate alternative methods to improve model capabilities, particularly…

Machine Learning · Computer Science · 2025-11-05 · Daman Arora, Andrea Zanette

Latent Thought Models with Variational Bayes Inference-Time Computation

We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive…

Computation and Language · Computer Science · 2025-06-10 · Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu

Measuring Systematic Generalization in Neural Proof Generation with Transformers

We are interested in understanding how well Transformer language models (TLMs) can perform reasoning tasks when trained on knowledge encoded in the form of natural language. We investigate their systematic generalization abilities on a…

Machine Learning · Computer Science · 2020-10-22 · Nicolas Gontier, Koustuv Sinha, Siva Reddy, Christopher Pal

Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs

Enhancing the complex reasoning capabilities of Large Language Models (LLMs) attracts widespread attention. While reinforcement learning (RL) has shown superior performance for improving complex reasoning, its impact on cross-lingual…

Computation and Language · Computer Science · 2025-09-30 · Shulin Huang, Yiran Ding, Junshu Pan, Yue Zhang

Reverse Thinking Enhances Missing Information Detection in Large Language Models

Large Language Models (LLMs) have demonstrated remarkable capabilities in various reasoning tasks, yet they often struggle with problems involving missing information, exhibiting issues such as incomplete responses, factual errors, and…

Artificial Intelligence · Computer Science · 2025-12-12 · Yuxin Liu, Chaojie Gu, Yihang Zhang, Bin Qian, Shibo He

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on…

Computation and Language · Computer Science · 2026-01-06 · DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jiawei Wang, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, J. L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R. J. Chen, R. L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S. S. Li, Shuang Zhou, Shaoqing Wu, Shengfeng Ye, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wanjia Zhao, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W. L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang, Zhen Zhang

Large Reasoning Models are not thinking straight: on the unreliability of thinking trajectories

Large Language Models (LLMs) trained via Reinforcement Learning (RL) have recently achieved impressive results on reasoning benchmarks. Yet, growing evidence shows that these models often generate longer but ineffective chains of thought…

Machine Learning · Computer Science · 2025-07-02 · Jhouben Cuesta-Ramirez, Samuel Beaussant, Mehdi Mounsif

Thinker: Learning to Think Fast and Slow

Recent studies show that the reasoning capabilities of Large Language Models (LLMs) can be improved by applying Reinforcement Learning (RL) to question-answering (QA) tasks in areas such as math and coding. With a long context length, LLMs…

Computation and Language · Computer Science · 2025-10-17 · Stephen Chung, Wenyu Du, Jie Fu

Reverse Thinking Makes LLMs Stronger Reasoners

Reverse thinking plays a crucial role in human reasoning. Humans can reason not only from a problem to a solution but also in reverse, i.e., start from the solution and reason towards the problem. This often enhances overall reasoning…

Computation and Language · Computer Science · 2025-03-11 · Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister

Learning from Failures in Multi-Attempt Reinforcement Learning

Recent advancements in reinforcement learning (RL) for large language models (LLMs), exemplified by DeepSeek R1, have shown that even a simple question-answering task can substantially improve an LLM's reasoning capabilities. In this work,…

Computation and Language · Computer Science · 2025-03-10 · Stephen Chung, Wenyu Du, Jie Fu

How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns

Large Language Models (LLMs) display strikingly different generalization behaviors: supervised fine-tuning (SFT) often narrows capability, whereas reinforcement-learning (RL) tuning tends to preserve it. The reasons behind this divergence…

Machine Learning · Computer Science · 2026-01-01 · Haoyue Bai, Yiyou Sun, Wenjie Hu, Shi Qiu, Maggie Ziyu Huan, Peiyang Song, Robert Nowak, Dawn Song

Large Language Models Can Learn Temporal Reasoning

While large language models (LLMs) have demonstrated remarkable reasoning capabilities, they are not without their flaws and inaccuracies. Recent studies have introduced various methods to mitigate these limitations. Temporal reasoning…

Computation and Language · Computer Science · 2024-10-10 · Siheng Xiong, Ali Payani, Ramana Kompella, Faramarz Fekri