Related papers: Solving Quantitative Reasoning Problems with Langu…

Reasoning Elicitation in Language Models via Counterfactual Feedback

Despite the increasing effectiveness of language models, their reasoning capabilities remain underdeveloped. In particular, causal reasoning through counterfactual question answering is lacking. This work aims to bridge this gap. We first…

Computation and Language · Computer Science 2025-03-18 Alihan Hüyük, Xinnuo Xu, Jacqueline Maasch, Aditya V. Nori, Javier González

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer…

Computation and Language · Computer Science 2024-10-10 Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee

Multi-Step Reasoning with Large Language Models, a Survey

Large language models (LLMs) with billions of parameters exhibit in-context learning abilities, enabling few-shot learning on tasks that the model was not specifically trained for. Traditional models achieve breakthrough performance on…

Artificial Intelligence · Computer Science 2025-11-04 Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back

MINERVA: Evaluating Complex Video Reasoning

Multimodal LLMs are turning their focus to video benchmarks; however, most video benchmarks only provide outcome supervision, with no intermediate or interpretable reasoning steps. This makes it challenging to assess if models are truly able…

Machine Learning · Computer Science 2025-05-02 Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa Jha, Anja Hauth, Yukun Zhu, Carl Vondrick, Mikhail Sirotenko, Cordelia Schmid, Tobias Weyand

LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning

Large Language Models (LLMs) have shown remarkable performance in various natural language processing tasks but face challenges in mathematical reasoning, where complex problem-solving requires both linguistic understanding and mathematical…

Computation and Language · Computer Science 2025-03-20 Shuguang Chen, Guang Lin

Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

Quantitative reasoning is a critical skill to analyze data, yet the assessment of such ability remains limited. To address this gap, we introduce the Quantitative Reasoning with Data (QRData) benchmark, aiming to evaluate Large Language…

Computation and Language · Computer Science 2024-06-11 Xiao Liu, Zirui Wu, Xueqing Wu, Pan Lu, Kai-Wei Chang, Yansong Feng

Large Language Models for Mathematical Reasoning: Progresses and Challenges

Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive capabilities of human intelligence. In recent times, there has been a notable surge in the development of Large Language Models (LLMs) geared towards the…

Computation and Language · Computer Science 2024-09-18 Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, Wenpeng Yin

Language Models are Multilingual Chain-of-Thought Reasoners

We evaluate the reasoning abilities of large language models in multilingual settings. We introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating 250 grade-school math problems from the GSM8K dataset (Cobbe et…

Computation and Language · Computer Science 2022-10-07 Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei

Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets

Recent advances in reasoning with large language models (LLMs) have demonstrated strong performance on complex mathematical tasks, including combinatorial optimization. Techniques such as Chain-of-Thought and In-Context Learning have…

Artificial Intelligence · Computer Science 2025-09-17 Marylou Fauchard, Florian Carichon, Margarida Carvalho, Golnoosh Farnadi

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far away from…

Computation and Language · Computer Science 2024-05-06 Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu

Beyond Words: How Large Language Models Perform in Quantitative Management Problem-Solving

This study examines how Large Language Models (LLMs) perform when tackling quantitative management decision problems in a zero-shot setting. Drawing on 900 responses generated by five leading models across 20 diverse managerial scenarios,…

Computation and Language · Computer Science 2025-02-25 Jonathan Kuzmanko

Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems

Recent advances in large language models (LLMs) and multimodal LLMs (MLLMs) have led to strong reasoning ability across a wide range of tasks. However, their ability to perform mathematical reasoning from spoken input remains underexplored.…

Computation and Language · Computer Science 2025-05-22 Chengwei Wei, Bin Wang, Jung-jae Kim, Nancy F. Chen

Towards Reasoning in Large Language Models: A Survey

Reasoning is a fundamental aspect of human intelligence that plays a crucial role in activities such as problem solving, decision making, and critical thinking. In recent years, large language models (LLMs) have made significant progress in…

Computation and Language · Computer Science 2023-05-29 Jie Huang, Kevin Chen-Chuan Chang

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao

Addressing Longstanding Challenges in Cognitive Science with Language Models

Cognitive science faces ongoing challenges in research integration, formalization, conceptual clarity, and other areas, in part due to its multifaceted and interdisciplinary nature. Recent advances in artificial intelligence, particularly…

Artificial Intelligence · Computer Science 2026-03-03 Dirk U. Wulff, Rui Mata

The Sensitivity of Language Models and Humans to Winograd Schema Perturbations

Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common sense reasoning ability. We show, however, with a new diagnostic…

Computation and Language · Computer Science 2020-05-08 Mostafa Abdou, Vinit Ravishankar, Maria Barrett, Yonatan Belinkov, Desmond Elliott, Anders Søgaard

Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models

Large Language Models (LLMs) achieve impressive performance in a wide range of tasks, even if they are often trained with the only objective of chatting fluently with users. Among other skills, LLMs show emergent abilities in mathematical…

Computation and Language · Computer Science 2024-06-12 Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language…

Artificial Intelligence · Computer Science 2024-07-31 Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu

VIVA+: Human-Centered Situational Decision-Making

Multimodal Large Language Models (MLLMs) show promising results for embodied agents in operating meaningfully in complex, human-centered environments. Yet, evaluating their capacity for nuanced, human-like reasoning and decision-making…

Computation and Language · Computer Science 2025-09-30 Zhe Hu, Yixiao Ren, Guanzhong Liu, Jing Li, Yu Yin

MINERVA-Cultural: A Benchmark for Cultural and Multilingual Long Video Reasoning

Recent advancements in video models have shown tremendous progress, particularly in long video understanding. However, current benchmarks predominantly feature western-centric data and English as the dominant language, introducing…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 Darshan Singh, Arsha Nagrani, Kawshik Manikantan, Harman Singh, Dinesh Tewari, Tobias Weyand, Cordelia Schmid, Anelia Angelova, Shachi Dave