Related papers: Mathify: Evaluating Large Language Models on Mathe…

200 papers

The use of Large Language Models (LLMs) in mathematical reasoning has become a cornerstone of related research, demonstrating the intelligence of these models and enabling potential practical applications through their advanced performance,…

Computation and Language · Computer Science 2024-12-20 Kathrin Seßler, Yao Rong, Emek Gözlüklü, Enkelejda Kasneci

Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive capabilities of human intelligence. In recent times, there has been a notable surge in the development of Large Language Models (LLMs) geared towards the…

Computation and Language · Computer Science 2024-09-18 Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, Wenpeng Yin

The progress of Large Language Models (LLMs) like ChatGPT raises the question of how they can be integrated into education. One hope is that they can support mathematics learning, including word-problem solving. Since LLMs can handle…

Computation and Language · Computer Science 2025-08-12 Anselm R. Strohmaier, Wim Van Dooren, Kathrin Seßler, Brian Greer, Lieven Verschaffel

Large Language Models (LLMs) have demonstrated strong performance across various natural language processing tasks, yet their proficiency in mathematical reasoning remains a key challenge. Addressing the gap between natural and mathematical…

Artificial Intelligence · Computer Science 2025-02-18 Xuhan Huang, Qingning Shen, Yan Hu, Anningzhe Gao, Benyou Wang

Large Language Models (LLMs) have shown remarkable performance in various natural language processing tasks but face challenges in mathematical reasoning, where complex problem-solving requires both linguistic understanding and mathematical…

Computation and Language · Computer Science 2025-03-20 Shuguang Chen, Guang Lin

Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with…

Artificial Intelligence · Computer Science 2026-03-27 Liang Zhang, Yu Fu, Xinyi Jin

Existing benchmarks for evaluating mathematical reasoning in large language models (LLMs) rely primarily on competition problems, formal proofs, or artificially challenging questions -- failing to capture the nature of mathematics…

Artificial Intelligence · Computer Science 2025-10-21 Jie Zhang, Cezara Petrui, Kristina Nikolić, Florian Tramèr

Mathematical problem-solving is a key field in artificial intelligence (AI) and a critical benchmark for evaluating the capabilities of large language models (LLMs). While extensive research has focused on mathematical problem-solving, most…

Computation and Language · Computer Science 2025-01-03 Ziye Chen, Hao Qi

Large Language Models (LLMs) have demonstrated exceptional capabilities in various natural language tasks, often achieving performances that surpass those of humans. Despite these advancements, the domain of mathematics presents a…

Computation and Language · Computer Science 2024-04-02 Ankit Satpute, Noah Giessing, Andre Greiner-Petter, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp

Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the…

Computation and Language · Computer Science 2024-06-27 Meng Fang, Xiangpeng Wan, Fei Lu, Fei Xing, Kai Zou

Recent advancements in large language models (LLMs) have showcased significant improvements in mathematics. However, traditional math benchmarks like GSM8k offer a unidimensional perspective, falling short in providing a holistic assessment…

Computation and Language · Computer Science 2024-05-21 Hongwei Liu, Zilong Zheng, Yuxuan Qiao, Haodong Duan, Zhiwei Fei, Fengzhe Zhou, Wenwei Zhang, Songyang Zhang, Dahua Lin, Kai Chen

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far away from…

Computation and Language · Computer Science 2024-05-06 Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu

Mathematical reasoning is essential for problem-solving in education, science, and industry, serving as a crucial benchmark for evaluating artificial intelligence systems. As Large Language Models (LLMs) improve their reasoning…

Computation and Language · Computer Science 2026-05-20 Husnain Amjad, Raja Khurram Shahzad, Aamir Shahzad, Mehwish Fatima

The rapid advancement of Large Language Models (LLMs) in the realm of mathematical reasoning necessitates comprehensive evaluations to gauge progress and inspire future directions. Existing assessments predominantly focus on problem-solving…

Computation and Language · Computer Science 2024-06-05 Xiaoyuan Li, Wenjie Wang, Moxin Li, Junrong Guo, Yang Zhang, Fuli Feng

This paper investigates the capabilities of large language models (LLMs) in formulating and solving decision-making problems using mathematical programming. We first conduct a systematic review and meta-analysis of recent literature to…

Artificial Intelligence · Computer Science 2025-08-26 Mohammad J. Abdel-Rahman, Yasmeen Alslman, Dania Refai, Amro Saleh, Malik A. Abu Loha, Mohammad Yahya Hamed

Mathematical reasoning remains one of the most challenging domains for large language models (LLMs), requiring not only linguistic understanding but also structured logical deduction and numerical precision. While recent LLMs demonstrate…

Computation and Language · Computer Science 2026-01-27 Mahbub E Sobhani, Md. Faiyaz Abdullah Sayeedi, Tasnim Mohiuddin, Md Mofijul Islam, Swakkhar Shatabda

Large Language Models (LLMs) have been applied to Math Word Problems (MWPs) with transformative impacts, revolutionizing how these complex problems are approached and solved in various domains including educational settings. However, the…

Computation and Language · Computer Science 2024-06-18 Joykirat Singh, Akshay Nambi, Vibhav Vineet

Due to the remarkable language understanding and generation abilities of large language models (LLMs), their use in educational applications has been explored. However, little work has been done on investigating the pedagogical ability of…

Computation and Language · Computer Science 2023-10-23 An-Zi Yen, Wei-Ling Hsu

We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textbook-style problems as proxies for…

Artificial Intelligence · Computer Science 2026-03-02 Antoine Peyronnet, Fabian Gloeckle, Amaury Hayat

Large language models (LLMs) have obtained promising results in mathematical reasoning, which is a foundational skill for human intelligence. Most previous studies focus on improving and measuring the performance of LLMs based on textual…

Computation and Language · Computer Science 2024-11-04 Wentao Liu, Qianjun Pan, Yi Zhang, Zhuo Liu, Ji Wu, Jie Zhou, Aimin Zhou, Qin Chen, Bo Jiang, Liang He