Related papers: Can LLMs Reason Structurally? Benchmarking via the…

200 papers

Structure reasoning is a fundamental capability of large language models (LLMs), enabling them to reason about structured commonsense and answer multi-hop questions. However, existing benchmarks for structure reasoning mainly focus on…

Computation and Language · Computer Science · 2025-03-04 · Zhuohang Jiang, Pangjing Wu, Ziran Liang, Peter Q. Chen, Xu Yuan, Ye Jia, Jiancheng Tu, Chen Li, Peter H. F. Ng, Qing Li

Test-time scaling has significantly improved large language model performance, enabling deeper reasoning to solve complex problems. However, this increased reasoning capability also leads to excessive token generation and unnecessary…

Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest complexity. We revisit these findings…

Artificial Intelligence · Computer Science · 2025-10-28 · Revanth Rameshkumar, Jimson Huang, Yunxin Sun, Fei Xia, Abulhair Saparov

Mathematical reasoning is essential for problem-solving in education, science, and industry, serving as a crucial benchmark for evaluating artificial intelligence systems. As Large Language Models (LLMs) improve their reasoning…

Computation and Language · Computer Science · 2026-05-20 · Husnain Amjad, Raja Khurram Shahzad, Aamir Shahzad, Mehwish Fatima

We present Team asdfo123's submission to the LLMSR@XLLM25 shared task, which evaluates large language models on producing fine-grained, controllable, and interpretable reasoning processes. Systems must extract all problem conditions,…

Computation and Language · Computer Science · 2025-05-20 · Xinye Li, Mingqi Wan, Dianbo Sui

Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their…

Artificial Intelligence · Computer Science · 2025-11-21 · Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar

Large language models (LLMs) have revolutionized many areas (e.g. natural language processing, software engineering, etc.) by achieving state-of-the-art performance on extensive downstream tasks. Aiming to achieve robust and general…

Artificial Intelligence · Computer Science · 2024-01-18 · Zhiming Li, Yushi Cao, Xiufeng Xu, Junzhe Jiang, Xu Liu, Yon Shin Teo, Shang-wei Lin, Yang Liu

Large language models (LLMs) have demonstrated their remarkable performance across various language understanding tasks. While emerging benchmarks have been proposed to evaluate LLMs in various domains such as mathematics and computer…

Artificial Intelligence · Computer Science · 2024-10-28 · Junnan Dong, Zijin Hong, Yuanchen Bei, Feiran Huang, Xinrun Wang, Xiao Huang

This paper investigates the ability of large language models (LLMs) to solve statistical tasks, as well as their capacity to assess the quality of reasoning. While state-of-the-art LLMs have demonstrated remarkable performance in a range of…

Computation and Language · Computer Science · 2026-01-22 · Crish Nagarkar, Leonid Bogachev, Serge Sharoff

Recent large language models (LLMs) achieve near-saturation accuracy on many established mathematical reasoning benchmarks, raising concerns about their ability to diagnose genuine reasoning competence. This saturation largely stems from…

While large language models (LLMs) have demonstrated impressive capabilities across various natural language processing tasks by acquiring rich factual knowledge from their broad training data, their ability to synthesize and logically…

Computation and Language · Computer Science · 2024-07-31 · Tianshi Zheng, Jiaxin Bai, Yicheng Wang, Tianqing Fang, Yue Guo, Yauwai Yim, Yangqiu Song

Large Reasoning Models (LRMs) have advanced rapidly; however, existing benchmarks in mathematics, code, and common-sense reasoning remain limited. They lack long-context evaluation, offer insufficient challenge, and provide answers that are…

Artificial Intelligence · Computer Science · 2026-02-09 · Qifan Zhang, Jianhao Ruan, Aochuan Chen, Kang Zeng, Nuo Chen, Jing Tang, Jia Li

Recent advances in large language models (LLMs) have demonstrated impressive reasoning capacities that mirror human-like thinking. However, whether LLMs possess genuine fluid intelligence (i.e., the ability to reason abstractly and…

Artificial Intelligence · Computer Science · 2025-09-30 · Yue Yang, MingKang Chen, Qihua Liu, Mengkang Hu, Qiguang Chen, Gengrui Zhang, Shuyue Hu, Guangtao Zhai, Yu Qiao, Yu Wang, Wenqi Shao, Ping Luo

Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge engineering and artificial intelligence. Recently, Large Language Models (LLMs) have emerged as a noteworthy innovation in natural language…

Computation and Language · Computer Science · 2024-09-17 · Fangzhi Xu, Qika Lin, Jiawei Han, Tianzhe Zhao, Jun Liu, Erik Cambria

While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information needed to reach a solution. By contrast,…

Machine Learning · Computer Science · 2025-06-11 · Zhanke Zhou, Xiao Feng, Zhaocheng Zhu, Jiangchao Yao, Sanmi Koyejo, Bo Han

Geometric spatial reasoning forms the foundation of many applications in artificial intelligence, yet the ability of large language models (LLMs) to operate over geometric spatial information expressed in procedural code remains…

Artificial Intelligence · Computer Science · 2026-02-11 · Shixian Luo, Zezhou Zhu, Yu Yuan, Yuncheng Yang, Lianlei Shan, Yong Wu

With the increasing use of large language models (LLMs), ensuring reliable performance in diverse, real-world environments is essential. Despite their remarkable achievements, LLMs often struggle with adversarial inputs, significantly…

Computation and Language · Computer Science · 2024-06-18 · Yuqing Wang, Yun Zhao

Recent Large Language Models (LLMs) have significantly advanced natural language processing and automated decision-making. However, these models still encounter difficulties when performing complex reasoning tasks involving logical…

Computation and Language · Computer Science · 2025-06-26 · Yubo Dong, Hehe Fan

Recent advancements in reasoning-enhanced large language models (LLMs), such as DeepSeek-R1 and OpenAI-o3, have demonstrated significant progress. However, their application in professional medical contexts remains underexplored,…

Computation and Language · Computer Science · 2025-03-11 · Pengcheng Qiu, Chaoyi Wu, Shuyu Liu, Weike Zhao, Zhuoxia Chen, Hongfei Gu, Chuanjin Peng, Ya Zhang, Yanfeng Wang, Weidi Xie

Pre-trained large language models (LMs) struggle to perform logical reasoning reliably despite advances in scale and compositionality. In this work, we tackle this challenge through the lens of symbolic programming. We propose DSR-LM, a…

Artificial Intelligence · Computer Science · 2023-05-09 · Hanlin Zhang, Jiani Huang, Ziyang Li, Mayur Naik, Eric Xing