Related papers: Case-Based or Rule-Based: How Do Transformers Do t…

Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs

Length generalization, the ability to solve problems longer than those seen during training, remains a critical challenge for large language models (LLMs). Previous work modifies positional encodings (PEs) and data formats to improve length…

Computation and Language · Computer Science 2025-05-20 Yi Hu, Shijia Kang, Haotong Yang, Haotian Xu, Muhan Zhang

RULEBREAKERS: Challenging LLMs at the Crossroads between Formal Logic and Human-like Reasoning

Formal logic enables computers to reason in natural language by representing sentences in symbolic forms and applying rules to derive conclusions. However, in what our study characterizes as "rulebreaker" scenarios, this method can lead to…

Computation and Language · Computer Science 2025-08-18 Jason Chan, Robert Gaizauskas, Zhixue Zhao

Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability

Investigating the reasoning abilities of transformer models, and discovering new challenging tasks for them, has been a topic of much interest. Recent studies have found these models to be surprisingly strong at performing deductive…

Computation and Language · Computer Science 2021-12-17 Kyle Richardson, Ashish Sabharwal

MetaRuleGPT: Recursive Numerical Reasoning of Language Models Trained with Simple Rules

Recent studies have highlighted the limitations of large language models in mathematical reasoning, particularly their inability to capture the underlying logic. Inspired by meta-learning, we propose that models should acquire not only…

Computation and Language · Computer Science 2024-12-19 Kejie Chen, Lin Wang, Qinghai Zhang, Renjun Xu

Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them

Large language models (LLMs) have been able to perform various forms of reasoning tasks in a wide range of scenarios, but are they truly engaging in task abstraction and rule-based reasoning beyond mere memorization? To answer this…

Machine Learning · Computer Science 2025-12-09 Guanyu Chen, Peiyang Wang, Yizhou Jiang, Yuqian Liu, Chujie Zhao, Ying Fang, Tianren Zhang, Feng Chen

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Math reasoning has become the poster child of progress in large language models (LLMs), with new models rapidly surpassing human-level performance on benchmarks like MATH and AIME. But as math leaderboards improve week by week, it is worth…

Artificial Intelligence · Computer Science 2025-10-21 Maggie Huan, Yuetai Li, Tuney Zheng, Xiaoyu Xu, Seungone Kim, Minxin Du, Radha Poovendran, Graham Neubig, Xiang Yue

LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning

Large Language Models (LLMs) have shown remarkable performance in various natural language processing tasks but face challenges in mathematical reasoning, where complex problem-solving requires both linguistic understanding and mathematical…

Computation and Language · Computer Science 2025-03-20 Shuguang Chen, Guang Lin

When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance

Large Language Models (LLMs) with reasoning capabilities have achieved state-of-the-art performance on a wide range of tasks. Despite its empirical success, the tasks and model scales at which reasoning becomes effective, as well as its…

Computation and Language · Computer Science 2025-09-29 Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Kevin El-Haddad, Céline Hudelot, Pierre Colombo

ReFT: Reasoning with Reinforced Fine-Tuning

One way to enhance the reasoning capability of Large Language Models (LLMs) is to conduct Supervised Fine-Tuning (SFT) using Chain-of-Thought (CoT) annotations. This approach does not show sufficiently strong generalization ability,…

Computation and Language · Computer Science 2024-12-16 Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li

Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

Mathematical reasoning is a challenging task for large language models (LLMs), while the scaling relationship of it with respect to LLM capacity is under-explored. In this paper, we investigate how the pre-training loss, supervised data…

Computation and Language · Computer Science 2023-09-14 Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Keming Lu, Chuanqi Tan, Chang Zhou, Jingren Zhou

Do Large Language Models Truly Grasp Addition? A Rule-Focused Diagnostic Using Two-Integer Arithmetic

Large language models (LLMs) achieve impressive results on advanced mathematics benchmarks but sometimes fail on basic arithmetic tasks, raising the question of whether they have truly grasped fundamental arithmetic rules or are merely…

Computation and Language · Computer Science 2025-09-18 Yang Yan, Yu Lu, Renjun Xu, Zhenzhong Lan

Reasoning Models Reason Well, Until They Don't

Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest complexity. We revisit these findings…

Artificial Intelligence · Computer Science 2025-10-28 Revanth Rameshkumar, Jimson Huang, Yunxin Sun, Fei Xia, Abulhair Saparov

Benchmarking Large Language Models for Math Reasoning Tasks

The use of Large Language Models (LLMs) in mathematical reasoning has become a cornerstone of related research, demonstrating the intelligence of these models and enabling potential practical applications through their advanced performance,…

Computation and Language · Computer Science 2024-12-20 Kathrin Seßler, Yao Rong, Emek Gözlüklü, Enkelejda Kasneci

How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns

Large Language Models (LLMs) display strikingly different generalization behaviors: supervised fine-tuning (SFT) often narrows capability, whereas reinforcement-learning (RL) tuning tends to preserve it. The reasons behind this divergence…

Machine Learning · Computer Science 2026-01-01 Haoyue Bai, Yiyou Sun, Wenjie Hu, Shi Qiu, Maggie Ziyu Huan, Peiyang Song, Robert Nowak, Dawn Song

Do Large Language Models Understand Logic or Just Mimick Context?

Over the past few years, the abilities of large language models (LLMs) have received extensive attention, which have performed exceptionally well in complicated scenarios such as logical reasoning and symbolic inference. A significant…

Computation and Language · Computer Science 2024-02-20 Junbing Yan, Chengyu Wang, Jun Huang, Wei Zhang

JudgeLRM: Large Reasoning Models as a Judge

Large Language Models (LLMs) are increasingly adopted as evaluators, offering a scalable alternative to human annotation. However, existing supervised fine-tuning (SFT) approaches often fall short in domains that demand complex reasoning.…

Computation and Language · Computer Science 2025-11-04 Nuo Chen, Zhiyuan Hu, Qingyun Zou, Jiaying Wu, Qian Wang, Bryan Hooi, Bingsheng He

Linear Reasoning vs. Proof by Cases: Obstacles for Large Language Models in FOL Problem Solving

To comprehensively evaluate the mathematical reasoning capabilities of Large Language Models (LLMs), researchers have introduced abundant mathematical reasoning datasets. However, most existing datasets primarily focus on linear reasoning,…

Computation and Language · Computer Science 2026-02-25 Yuliang Ji, Fuchen Shen, Jian Wu, Qiujie Xie, Yue Zhang

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning capabilities in modern LLMs is far…

Artificial Intelligence · Computer Science 2024-07-18 Liang Zeng, Liangjun Zhong, Liang Zhao, Tianwen Wei, Liu Yang, Jujie He, Cheng Cheng, Rui Hu, Yang Liu, Shuicheng Yan, Han Fang, Yahui Zhou

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

Inference-time scaling can enhance the reasoning capabilities of large language models (LLMs) on complex problems that benefit from step-by-step problem solving. Although lengthening generated scratchpads has proven effective for…

Machine Learning · Computer Science 2025-04-02 Vidhisha Balachandran, Jingya Chen, Lingjiao Chen, Shivam Garg, Neel Joshi, Yash Lara, John Langford, Besmira Nushi, Vibhav Vineet, Yue Wu, Safoora Yousefi

Interpreting and Improving Large Language Models in Arithmetic Calculation

Large language models (LLMs) have demonstrated remarkable potential across numerous applications and have shown an emergent ability to tackle complex reasoning tasks, such as mathematical computations. However, even for the simplest…

Computation and Language · Computer Science 2024-09-04 Wei Zhang, Chaoqun Wan, Yonggang Zhang, Yiu-ming Cheung, Xinmei Tian, Xu Shen, Jieping Ye