Related papers: Benchmarking Large Language Models with Integer Se…

200 papers

Language has long been conceived as an essential tool for human reasoning. The breakthrough of Large Language Models (LLMs) has sparked significant research interest in leveraging these models to tackle complex reasoning tasks. Researchers…

Large Language Models (LLMs) have recently achieved impressive performance in math and reasoning benchmarks. However, they often struggle with logic problems and puzzles that are relatively easy for humans. To further investigate this, we…

Artificial Intelligence · Computer Science 2025-09-16 Nasim Borazjanizadeh, Roei Herzig, Trevor Darrell, Rogerio Feris, Leonid Karlinsky

Thinking Large Language Models (LLMs) generate explicit intermediate reasoning traces before final answers, potentially improving transparency, interpretability, and solution accuracy for code generation. However, the quality of these…

Artificial Intelligence · Computer Science 2025-11-11 Haoran Xue, Gias Uddin, Song Wang

We examine the reasoning and planning capabilities of large language models (LLMs) in solving complex tasks. Recent advances in inference-time techniques demonstrate the potential to enhance LLM reasoning without additional training by…

Artificial Intelligence · Computer Science 2025-02-19 Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji

With the emergence of advanced reasoning models like OpenAI o3 and DeepSeek-R1, large language models (LLMs) have demonstrated remarkable reasoning capabilities. However, their ability to perform rigorous logical reasoning remains an open…

Artificial Intelligence · Computer Science 2025-02-14 Hanmeng Liu, Zhizhang Fu, Mengru Ding, Ruoxi Ning, Chaoli Zhang, Xiaozhang Liu, Yue Zhang

Enabling Large Language Models (LLMs) to handle a wider range of complex tasks (e.g., coding, math) has drawn great attention from many researchers. As LLMs continue to evolve, merely increasing the number of model parameters yields…

Benchmarks are critical for measuring Large Language Model (LLM) reasoning capabilities. Some benchmarks have even become the de facto indicator of such capabilities. However, as LLM reasoning capabilities improve, existing widely-used…

Computation and Language · Computer Science 2025-02-26 Stephen Miner, Yoshiki Takashima, Simeng Han, Sam Kouteili, Ferhat Erata, Ruzica Piskac, Scott J Shapiro

This paper investigates the mathematical reasoning capabilities of large language models (LLMs) using 50 newly constructed high-school-level word problems. Unlike prior studies that focus solely on answer correctness, we rigorously analyze…

Artificial Intelligence · Computer Science 2025-02-24 Johan Boye, Birger Moell

Recent developments, particularly OpenAI's O1 model, have demonstrated the remarkable potential of Large Language Models (LLMs) for complex reasoning tasks. Through analysis of O1's outputs and provided sample Chain-of-Thought (CoT)…

Artificial Intelligence · Computer Science 2024-12-09 Toby Simonds, Jey Han Lau, Chaithanya Bandi

Test-time scaling, which leverages additional computation during inference to improve model accuracy, has enabled a new class of Large Language Models (LLMs) that are able to reason through complex problems by understanding the goal,…

Computation and Language · Computer Science 2025-11-25 Shaltiel Shmidman, Asher Fredman, Oleg Sudakov, Meriem Bendris

The rapid advancement of Large Language Models (LLMs) has sparked growing interest in their application to time series analysis tasks. However, their ability to perform complex reasoning over temporal data in real-world application domains…

Machine Learning · Computer Science 2025-09-03 Wen Ye, Jinbo Liu, Defu Cao, Wei Yang, Yan Liu

Large language models (LLMs) can perform reasoning computations both internally within their latent space and externally by generating explicit token sequences like chains of thought. Significant progress in enhancing reasoning abilities…

Computation and Language · Computer Science 2025-04-16 Thilo Hagendorff, Sarah Fabi

Reasoning-enabled large language models (LLMs) excel in logical tasks, yet their utility for evaluating natural language generation remains unexplored. This study systematically compares reasoning LLMs with non-reasoning counterparts across…

Computation and Language · Computer Science 2025-06-02 Daniil Larionov, Sotaro Takeshita, Ran Zhang, Yanran Chen, Christoph Leiter, Zhipin Wang, Christian Greisinger, Steffen Eger

Large Language Models (LLMs) are highly proficient in language-based tasks. Their language capabilities have positioned them at the forefront of the future AGI (Artificial General Intelligence) race. However, on closer inspection, Valmeekam…

Computation and Language · Computer Science 2025-03-17 Dibyanayan Bandyopadhyay, Soham Bhattacharjee, Asif Ekbal

Large Language Models (LLMs) are increasingly utilized in AI-driven educational instruction and assessment, particularly within mathematics education. The capability of LLMs to generate accurate answers and detailed solutions for math…

Artificial Intelligence · Computer Science 2025-08-15 Liang Zhang, Edith Aurora Graf

Large language models (LLMs) achieve impressive performance on complex mathematical benchmarks yet sometimes fail on basic math reasoning while generating unnecessarily verbose responses. In this paper, we present LLMThinkBench, a…

Computation and Language · Computer Science 2026-04-24 Gaurav Srivastava, Aafiya Hussain, Sriram Srinivasan, Xuan Wang

Large Language Models (LLMs) have demonstrated impressive capabilities in natural language processing tasks, such as text generation and semantic understanding. However, their performance on numerical reasoning tasks, such as basic…

Computation and Language · Computer Science 2025-06-04 Haoyang Li, Xuejia Chen, Zhanchao XU, Darian Li, Nicole Hu, Fei Teng, Yiming Li, Luyu Qiu, Chen Jason Zhang, Qing Li, Lei Chen

Large language models (LLMs) such as GPT-5 and Gemini 3 have pushed the frontier of automated reasoning and code generation. Yet current benchmarks emphasize accuracy and output quality, neglecting a critical dimension: efficiency of token…

Computation and Language · Computer Science 2026-02-25 Zheng Du, Hao Kang, Song Han, Tushar Krishna, Ligeng Zhu

Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1…

Large language models (LLMs) have revolutionized many areas (e.g. natural language processing, software engineering, etc.) by achieving state-of-the-art performance on extensive downstream tasks. Aiming to achieve robust and general…

Artificial Intelligence · Computer Science 2024-01-18 Zhiming Li, Yushi Cao, Xiufeng Xu, Junzhe Jiang, Xu Liu, Yon Shin Teo, Shang-wei Lin, Yang Liu