Related papers: Can LLMs Reason Structurally? Benchmarking via the…

HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning

Structure reasoning is a fundamental capability of large language models (LLMs), enabling them to reason about structured commonsense and answer multi-hop questions. However, existing benchmarks for structure reasoning mainly focus on…

Computation and Language · Computer Science 2025-03-04 Zhuohang Jiang, Pangjing Wu, Ziran Liang, Peter Q. Chen, Xu Yuan, Ye Jia, Jiancheng Tu, Chen Li, Peter H. F. Ng, Qing Li

DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs

Test-time scaling has significantly improved large language model performance, enabling deeper reasoning to solve complex problems. However, this increased reasoning capability also leads to excessive token generation and unnecessary…

Machine Learning · Computer Science 2025-04-21 Masoud Hashemi, Oluwanifemi Bamgbose, Sathwik Tejaswi Madhusudhan, Jishnu Sethumadhavan Nair, Aman Tiwari, Vikas Yadav

Reasoning Models Reason Well, Until They Don't

Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest complexity. We revisit these findings…

Artificial Intelligence · Computer Science 2025-10-28 Revanth Rameshkumar, Jimson Huang, Yunxin Sun, Fei Xia, Abulhair Saparov

Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges

Mathematical reasoning is essential for problem-solving in education, science, and industry, serving as a crucial benchmark for evaluating artificial intelligence systems. As Large Language Models (LLMs) improve their reasoning…

Computation and Language · Computer Science 2026-05-20 Husnain Amjad, Raja Khurram Shahzad, Aamir Shahzad, Mehwish Fatima

LLMSR@XLLM25: An Empirical Study of LLM for Structural Reasoning

We present Team asdfo123's submission to the LLMSR@XLLM25 shared task, which evaluates large language models on producing fine-grained, controllable, and interpretable reasoning processes. Systems must extract all problem conditions,…

Computation and Language · Computer Science 2025-05-20 Xinye Li, Mingqi Wan, Dianbo Sui

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their…

Artificial Intelligence · Computer Science 2025-11-21 Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar

LLMs for Relational Reasoning: How Far are We?

Large language models (LLMs) have revolutionized many areas (e.g. natural language processing, software engineering, etc.) by achieving state-of-the-art performance on extensive downstream tasks. Aiming to achieve robust and general…

Artificial Intelligence · Computer Science 2024-01-18 Zhiming Li, Yushi Cao, Xiufeng Xu, Junzhe Jiang, Xu Liu, Yon Shin Teo, Shang-wei Lin, Yang Liu

CLR-Bench: Evaluating Large Language Models in College-level Reasoning

Large language models (LLMs) have demonstrated their remarkable performance across various language understanding tasks. While emerging benchmarks have been proposed to evaluate LLMs in various domains such as mathematics and computer…

Artificial Intelligence · Computer Science 2024-10-28 Junnan Dong, Zijin Hong, Yuanchen Bei, Feiran Huang, Xinrun Wang, Xiao Huang

Can LLM Reasoning Be Trusted? A Comparative Study: Using Human Benchmarking on Statistical Tasks

This paper investigates the ability of large language models (LLMs) to solve statistical tasks, as well as their capacity to assess the quality of reasoning. While state-of-the-art LLMs have demonstrated remarkable performance in a range of…

Computation and Language · Computer Science 2026-01-22 Crish Nagarkar, Leonid Bogachev, Serge Sharoff

Unmasking Reasoning Processes: A Process-aware Benchmark for Evaluating Structural Mathematical Reasoning in LLMs

Recent large language models (LLMs) achieve near-saturation accuracy on many established mathematical reasoning benchmarks, raising concerns about their ability to diagnose genuine reasoning competence. This saturation largely stems from…

Artificial Intelligence · Computer Science 2026-02-27 Xiang Zheng, Weiqi Zhai, Wei Wang, Boyu Yang, Wenbo Li, Ruixiang Luo, Haoxiang Sun, Yucheng Wang, Zhengze Li, Meng Wang, Yuetian Du, Guojie Lin, Yaxuan Wang, Xiaoxiao Xu, Yanhu Mo, Xuan Ren, Hu Wei, Bing Zhao

CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge

While large language models (LLMs) have demonstrated impressive capabilities across various natural language processing tasks by acquiring rich factual knowledge from their broad training data, their ability to synthesize and logically…

Computation and Language · Computer Science 2024-07-31 Tianshi Zheng, Jiaxin Bai, Yicheng Wang, Tianqing Fang, Yue Guo, Yauwai Yim, Yangqiu Song

Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems

Large Reasoning Models (LRMs) have advanced rapidly; however, existing benchmarks in mathematics, code, and common-sense reasoning remain limited. They lack long-context evaluation, offer insufficient challenge, and provide answers that are…

Artificial Intelligence · Computer Science 2026-02-09 Qifan Zhang, Jianhao Ruan, Aochuan Chen, Kang Zeng, Nuo Chen, Jing Tang, Jia Li

Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation

Recent advances in large language models (LLMs) have demonstrated impressive reasoning capacities that mirror human-like thinking. However, whether LLMs possess genuine fluid intelligence (i.e., the ability to reason abstractly and…

Artificial Intelligence · Computer Science 2025-09-30 Yue Yang, MingKang Chen, Qihua Liu, Mengkang Hu, Qiguang Chen, Gengrui Zhang, Shuyue Hu, Guangtao Zhai, Yu Qiao, Yu Wang, Wenqi Shao, Ping Luo

Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond

Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge engineering and artificial intelligence. Recently, Large Language Models (LLMs) have emerged as a noteworthy innovation in natural language…

Computation and Language · Computer Science 2024-09-17 Fangzhi Xu, Qika Lin, Jiawei Han, Tianzhe Zhao, Jun Liu, Erik Cambria

From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?

While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information needed to reach a solution. By contrast,…

Machine Learning · Computer Science 2025-06-11 Zhanke Zhou, Xiao Feng, Zhaocheng Zhu, Jiangchao Yao, Sanmi Koyejo, Bo Han

GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs

Geometric spatial reasoning forms the foundation of many applications in artificial intelligence, yet the ability of large language models (LLMs) to operate over geometric spatial information expressed in procedural code remains…

Artificial Intelligence · Computer Science 2026-02-11 Shixian Luo, Zezhou Zhu, Yu Yuan, Yuncheng Yang, Lianlei Shan, Yong Wu

RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models

With the increasing use of large language models (LLMs), ensuring reliable performance in diverse, real-world environments is essential. Despite their remarkable achievements, LLMs often struggle with adversarial inputs, significantly…

Computation and Language · Computer Science 2024-06-18 Yuqing Wang, Yun Zhao

Enhancing Large Language Models through Structured Reasoning

Recent Large Language Models (LLMs) have significantly advanced natural language processing and automated decision-making. However, these models still encounter difficulties when performing complex reasoning tasks involving logical…

Computation and Language · Computer Science 2025-06-26 Yubo Dong, Hehe Fan

Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases

Recent advancements in reasoning-enhanced large language models (LLMs), such as DeepSeek-R1 and OpenAI-o3, have demonstrated significant progress. However, their application in professional medical contexts remains underexplored,…

Computation and Language · Computer Science 2025-03-11 Pengcheng Qiu, Chaoyi Wu, Shuyu Liu, Weike Zhao, Zhuoxia Chen, Hongfei Gu, Chuanjin Peng, Ya Zhang, Yanfeng Wang, Weidi Xie

Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming

Pre-trained large language models (LMs) struggle to perform logical reasoning reliably despite advances in scale and compositionality. In this work, we tackle this challenge through the lens of symbolic programming. We propose DSR-LM, a…

Artificial Intelligence · Computer Science 2023-05-09 Hanlin Zhang, Jiani Huang, Ziyang Li, Mayur Naik, Eric Xing