Related papers: Mathify: Evaluating Large Language Models on Mathe…

Benchmarking Large Language Models for Math Reasoning Tasks

The use of Large Language Models (LLMs) in mathematical reasoning has become a cornerstone of related research, demonstrating the intelligence of these models and enabling potential practical applications through their advanced performance,…

Computation and Language · Computer Science 2024-12-20 Kathrin Seßler , Yao Rong , Emek Gözlüklü , Enkelejda Kasneci

Large Language Models for Mathematical Reasoning: Progresses and Challenges

Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive capabilities of human intelligence. In recent times, there has been a notable surge in the development of Large Language Models (LLMs) geared towards the…

Computation and Language · Computer Science 2024-09-18 Janice Ahn , Rishu Verma , Renze Lou , Di Liu , Rui Zhang , Wenpeng Yin

Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective

The progress of Large Language Models (LLMs) like ChatGPT raises the question of how they can be integrated into education. One hope is that they can support mathematics learning, including word-problem solving. Since LLMs can handle…

Computation and Language · Computer Science 2025-08-12 Anselm R. Strohmaier , Wim Van Dooren , Kathrin Seßler , Brian Greer , Lieven Verschaffel

LLMs for Mathematical Modeling: Towards Bridging the Gap between Natural and Mathematical Languages

Large Language Models (LLMs) have demonstrated strong performance across various natural language processing tasks, yet their proficiency in mathematical reasoning remains a key challenge. Addressing the gap between natural and mathematical…

Artificial Intelligence · Computer Science 2025-02-18 Xuhan Huang , Qingning Shen , Yan Hu , Anningzhe Gao , Benyou Wang

LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning

Large Language Models (LLMs) have shown remarkable performance in various natural language processing tasks but face challenges in mathematical reasoning, where complex problem-solving requires both linguistic understanding and mathematical…

Computation and Language · Computer Science 2025-03-20 Shuguang Chen , Guang Lin

Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?

Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with…

Artificial Intelligence · Computer Science 2026-03-27 Liang Zhang , Yu Fu , Xinyi Jin

RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics

Existing benchmarks for evaluating mathematical reasoning in large language models (LLMs) rely primarily on competition problems, formal proofs, or artificially challenging questions -- failing to capture the nature of mathematics…

Artificial Intelligence · Computer Science 2025-10-21 Jie Zhang , Cezara Petrui , Kristina Nikolić , Florian Tramèr

Large Language Models for Mathematical Analysis

Mathematical problem-solving is a key field in artificial intelligence (AI) and a critical benchmark for evaluating the capabilities of large language models (LLMs). While extensive research has focused on mathematical problem-solving, most…

Computation and Language · Computer Science 2025-01-03 Ziye Chen , Hao Qi

Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange

Large Language Models (LLMs) have demonstrated exceptional capabilities in various natural language tasks, often achieving performances that surpass those of humans. Despite these advancements, the domain of mathematics presents a…

Computation and Language · Computer Science 2024-04-02 Ankit Satpute , Noah Giessing , Andre Greiner-Petter , Moritz Schubotz , Olaf Teschke , Akiko Aizawa , Bela Gipp

MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the…

Computation and Language · Computer Science 2024-06-27 Meng Fang , Xiangpeng Wan , Fei Lu , Fei Xing , Kai Zou

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark

Recent advancements in large language models (LLMs) have showcased significant improvements in mathematics. However, traditional math benchmarks like GSM8k offer a unidimensional perspective, falling short in providing a holistic assessment…

Computation and Language · Computer Science 2024-05-21 Hongwei Liu , Zilong Zheng , Yuxuan Qiao , Haodong Duan , Zhiwei Fei , Fengzhe Zhou , Wenwei Zhang , Songyang Zhang , Dahua Lin , Kai Chen

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far away from…

Computation and Language · Computer Science 2024-05-06 Longhui Yu , Weisen Jiang , Han Shi , Jincheng Yu , Zhengying Liu , Yu Zhang , James T. Kwok , Zhenguo Li , Adrian Weller , Weiyang Liu

Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges

Mathematical reasoning is essential for problem-solving in education, science, and industry, serving as a crucial benchmark for evaluating artificial intelligence systems. As Large Language Models (LLMs) improve their reasoning…

Computation and Language · Computer Science 2026-05-20 Husnain Amjad , Raja Khurram Shahzad , Aamir Shahzad , Mehwish Fatima

Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction

The rapid advancement of Large Language Models (LLMs) in the realm of mathematical reasoning necessitates comprehensive evaluations to gauge progress and inspire future directions. Existing assessments predominantly focus on problem-solving…

Computation and Language · Computer Science 2024-06-05 Xiaoyuan Li , Wenjie Wang , Moxin Li , Junrong Guo , Yang Zhang , Fuli Feng

Teaching LLMs to Think Mathematically: A Critical Study of Decision-Making via Optimization

This paper investigates the capabilities of large language models (LLMs) in formulating and solving decision-making problems using mathematical programming. We first conduct a systematic review and meta-analysis of recent literature to…

Artificial Intelligence · Computer Science 2025-08-26 Mohammad J. Abdel-Rahman , Yasmeen Alslman , Dania Refai , Amro Saleh , Malik A. Abu Loha , Mohammad Yahya Hamed

MathMist: A Parallel Multilingual Benchmark Dataset for Mathematical Problem Solving and Reasoning

Mathematical reasoning remains one of the most challenging domains for large language models (LLMs), requiring not only linguistic understanding but also structured logical deduction and numerical precision. While recent LLMs demonstrate…

Computation and Language · Computer Science 2026-01-27 Mahbub E Sobhani , Md. Faiyaz Abdullah Sayeedi , Tasnim Mohiuddin , Md Mofijul Islam , Swakkhar Shatabda

Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning

Large Language Models (LLMs) have been applied to Math Word Problems (MWPs) with transformative impacts, revolutionizing how these complex problems are approached and solved in various domains including educational settings. However, the…

Computation and Language · Computer Science 2024-06-18 Joykirat Singh , Akshay Nambi , Vibhav Vineet

Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning

Due to the remarkable language understanding and generation abilities of large language models (LLMs), their use in educational applications has been explored. However, little work has been done on investigating the pedagogical ability of…

Computation and Language · Computer Science 2023-10-23 An-Zi Yen , Wei-Ling Hsu

LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics

We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textbook-style problems as proxies for…

Artificial Intelligence · Computer Science 2026-03-02 Antoine Peyronnet , Fabian Gloeckle , Amaury Hayat

CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models

Large language models (LLMs) have obtained promising results in mathematical reasoning, which is a foundational skill for human intelligence. Most previous studies focus on improving and measuring the performance of LLMs based on textual…

Computation and Language · Computer Science 2024-11-04 Wentao Liu , Qianjun Pan , Yi Zhang , Zhuo Liu , Ji Wu , Jie Zhou , Aimin Zhou , Qin Chen , Bo Jiang , Liang He