Related papers: MathConstruct: Challenging LLM Reasoning with Cons…

200 papers

Existing benchmarks for evaluating mathematical reasoning in large language models (LLMs) rely primarily on competition problems, formal proofs, or artificially challenging questions -- failing to capture the nature of mathematics…

Artificial Intelligence · Computer Science 2025-10-21 Jie Zhang, Cezara Petrui, Kristina Nikolić, Florian Tramèr

Although demonstrating remarkable performance on reasoning tasks, Large Language Models (LLMs) still tend to fabricate unreliable responses when confronted with problems that are unsolvable or beyond their capability, severely undermining…

Computation and Language · Computer Science 2025-11-13 Boyang Xue, Qi Zhu, Rui Wang, Sheng Wang, Hongru Wang, Minda Hu, Fei Mi, Yasheng Wang, Lifeng Shang, Qun Liu, Kam-Fai Wong

Mathematical reasoning is essential for problem-solving in education, science, and industry, serving as a crucial benchmark for evaluating artificial intelligence systems. As Large Language Models (LLMs) improve their reasoning…

Computation and Language · Computer Science 2026-05-20 Husnain Amjad, Raja Khurram Shahzad, Aamir Shahzad, Mehwish Fatima

The rapid advancement of large language models (LLMs) demands robust, unbiased, and scalable evaluation methods. However, human annotations are costly to scale, model-based evaluations are susceptible to stylistic biases, and…

We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textbook-style problems as proxies for…

Artificial Intelligence · Computer Science 2026-03-02 Antoine Peyronnet, Fabian Gloeckle, Amaury Hayat

Large Language Models (LLMs) have recently achieved impressive performance in math and reasoning benchmarks. However, they often struggle with logic problems and puzzles that are relatively easy for humans. To further investigate this, we…

Artificial Intelligence · Computer Science 2025-09-16 Nasim Borazjanizadeh, Roei Herzig, Trevor Darrell, Rogerio Feris, Leonid Karlinsky

Large language models (LLMs) have become capable mathematical problem-solvers, often producing correct proofs for challenging problems. However, correctness alone is not sufficient: mathematical proofs should also be clear, concise,…

Computation and Language · Computer Science 2026-05-12 Ivo Petrov, Jasper Dekoninck, Dimitar I. Dimitrov, Martin Vechev

With the rapid progress of Multimodal LLMs, evaluating their mathematical reasoning capabilities has become an increasingly important research direction. In particular, visual-textual mathematical reasoning serves as a key indicator of an…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Hao Liang, Linzhuang Sun, Minxuan Zhou, Zirong Chen, Meiyi Qiang, Mingan Lin, Tianpeng Li, Fan Yang, Zenan Zhou, Wentao Zhang

Large Language Models (LLMs) have shown remarkable success on a wide range of math and reasoning benchmarks. However, we observe that they often struggle when faced with unreasonable math problems. Instead of recognizing these issues,…

Computation and Language · Computer Science 2025-06-03 Jingyuan Ma, Damai Dai, Zihang Yuan, Rui Li, Weilin Luo, Bin Wang, Qun Liu, Lei Sha, Zhifang Sui

Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive capabilities of human intelligence. In recent times, there has been a notable surge in the development of Large Language Models (LLMs) geared towards the…

Computation and Language · Computer Science 2024-09-18 Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, Wenpeng Yin

Large Language Models (LLMs) have demonstrated impressive capabilities in structured reasoning and symbolic tasks, with coding emerging as a particularly successful application. This progress has naturally motivated efforts to extend these…

Artificial Intelligence · Computer Science 2026-02-02 Andrea Asperti, Alberto Naibo, Claudio Sacerdoti Coen

The use of Large Language Models (LLMs) in mathematical reasoning has become a cornerstone of related research, demonstrating the intelligence of these models and enabling potential practical applications through their advanced performance,…

Computation and Language · Computer Science 2024-12-20 Kathrin Seßler, Yao Rong, Emek Gözlüklü, Enkelejda Kasneci

Mathematical reasoning is a hallmark of human intelligence, and whether large language models (LLMs) can meaningfully perform it remains a central question in artificial intelligence and cognitive science. As LLMs are increasingly…

Computation and Language · Computer Science 2026-04-03 Linyang He, Qiyao Yu, Hanze Dong, Baohao Liao, Xinxing Xu, Micah Goldblum, Jiang Bian, Nima Mesgarani

Large Language Models (LLMs) are transformative not only for daily activities but also for engineering tasks. However, current evaluations of LLMs in engineering exhibit two critical shortcomings: (i) the reliance on simplified use cases,…

Artificial Intelligence · Computer Science 2025-05-21 Rene Heesch, Sebastian Eilermann, Alexander Windmann, Alexander Diedrich, Philipp Rosenthal, Oliver Niggemann

Large language models (LLMs) demonstrate impressive capabilities in mathematical reasoning. However, despite these achievements, current evaluations are mostly limited to specific mathematical topics, and it remains unclear whether LLMs are…

Computation and Language · Computer Science 2025-04-01 Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour

Large Language Models (LLMs) have demonstrated strong performance across various natural language processing tasks, yet their proficiency in mathematical reasoning remains a key challenge. Addressing the gap between natural and mathematical…

Artificial Intelligence · Computer Science 2025-02-18 Xuhan Huang, Qingning Shen, Yan Hu, Anningzhe Gao, Benyou Wang

We introduce SATBench, a benchmark for evaluating the logical reasoning capabilities of large language models (LLMs) through logical puzzles derived from Boolean satisfiability (SAT) problems. Unlike prior work that focuses on inference…

Artificial Intelligence · Computer Science 2025-09-23 Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken

Exceptional mathematical reasoning ability is one of the key features that demonstrate the power of large language models (LLMs). How to comprehensively define and evaluate the mathematical abilities of LLMs, and even reflect the user…

Computation and Language · Computer Science 2024-10-10 Zihao Zhou, Shudong Liu, Maizhen Ning, Wei Liu, Jindong Wang, Derek F. Wong, Xiaowei Huang, Qiufeng Wang, Kaizhu Huang

Large language models have demonstrated impressive performance on challenging mathematical reasoning tasks, which has triggered the discussion of whether the performance is achieved by true reasoning capability or memorization. To…
