English
Related papers

Related papers: How Robustly do LLMs Understand Execution Semantic…

200 papers

LLMs have made significant progress in the field of mathematical reasoning, but whether they have true the mathematical understanding ability is still controversial. To explore this issue, we propose a new perturbation framework to evaluate…

Artificial Intelligence · Computer Science 2025-11-12 Zhishen Sun , Guang Dai , Ivor Tsang , Haishan Ye

Large language models (LLMs) are being increasingly adopted in the software engineering domain, yet the robustness of their grasp on core software design concepts remains unclear. We conduct an empirical study to systematically evaluate…

Software Engineering · Computer Science 2025-12-30 Mootez Saad , Boqi Chen , José Antonio Hernández López , Dániel Varró , Tushar Sharma

With the increasing use of large language models (LLMs), ensuring reliable performance in diverse, real-world environments is essential. Despite their remarkable achievements, LLMs often struggle with adversarial inputs, significantly…

Computation and Language · Computer Science 2024-06-18 Yuqing Wang , Yun Zhao

Large Language Models (LLMs) have achieved remarkable success in tasks requiring complex reasoning, such as code generation, mathematical problem solving, and algorithmic synthesis -- especially when aided by reasoning tokens and…

Computation and Language · Computer Science 2025-06-13 Jaechul Roh , Varun Gandhi , Shivani Anilkumar , Arin Garg

Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their…

Artificial Intelligence · Computer Science 2025-11-21 Parshin Shojaee , Iman Mirzadeh , Keivan Alizadeh , Maxwell Horton , Samy Bengio , Mehrdad Farajtabar

This study investigates the reasoning robustness of large language models (LLMs) on mathematical problem-solving tasks under systematically introduced input perturbations. Using the GSM8K dataset as a controlled testbed, we evaluate how…

Artificial Intelligence · Computer Science 2025-04-04 Giannis Chatziveroglou , Richard Yun , Maura Kelleher

With the widespread adoption of vibe coding, understanding the reasoning and robustness of Large Language Models (LLMs) is critical for their reliable use in programming tasks. While recent studies assess LLMs' ability to predict program…

Software Engineering · Computer Science 2026-05-08 Pedro Orvalho , Marta Kwiatkowska

While Large Language Models (LLMs) achieve high performance on standard mathematical benchmarks, their problem-solving abilities depend on the context and textual formatting. We introduce the Robust Reasoning Benchmark (RRB), a pipeline of…

Machine Learning · Computer Science 2026-05-22 Pavel Golikov , Evgenii Opryshko , Gennady Pekhimenko , Mark C. Jeffrey

Recent advancements in Large Language Models (LLMs) have showcased striking results on existing logical reasoning benchmarks, with some models even surpassing human performance. However, the true depth of their competencies and robustness…

Computation and Language · Computer Science 2024-11-05 Pengfei Hong , Navonil Majumder , Deepanway Ghosal , Somak Aditya , Rada Mihalcea , Soujanya Poria

In this paper, we present a challenging code reasoning task: vulnerability detection. Large Language Models (LLMs) have shown promising results in natural-language and math reasoning, but state-of-the-art (SOTA) models reported only 54.5%…

Software Engineering · Computer Science 2025-01-09 Benjamin Steenhoek , Md Mahbubur Rahman , Monoshi Kumar Roy , Mirza Sanjida Alam , Hengbo Tong , Swarna Das , Earl T. Barr , Wei Le

Context: In the fast-paced evolution of software development, Large Language Models (LLMs) have become indispensable tools for tasks such as code generation, completion, analysis, and bug fixing. Ensuring the robustness of these models…

Software Engineering · Computer Science 2026-02-13 Yang Liu , Armstrong Foundjem , Xingfang Wu , Heng Li , Foutse Khomh

Inductive reasoning, a cornerstone of human cognition, enables generalization from limited data but hasn't yet been fully achieved by large language models (LLMs). While modern LLMs excel at reasoning tasks, their ability to maintain stable…

Artificial Intelligence · Computer Science 2025-05-29 Chunyang Li , Weiqi Wang , Tianshi Zheng , Yangqiu Song

Understanding a program's runtime reasoning behavior, meaning how intermediate states and control flows lead to final execution results, is essential for reliable code generation, debugging, and automated reasoning. Although large language…

Software Engineering · Computer Science 2025-12-02 Mohammad Abdollahi , Khandaker Rifah Tasnia , Soumit Kanti Saha , Jinqiu Yang , Song Wang , Hadi Hemmati

Large language models (LLMs) demonstrate remarkable breadth of knowledge, yet their ability to reason about computational processes remains poorly understood. Closing this gap matters for practitioners who rely on LLMs to guide algorithm…

Computation and Language · Computer Science 2026-04-07 Sohan Venkatesh , Ashish Mahendran Kurapath , Tejas Melkote

This paper proposes CES, a task to evaluate the abilities of LLMs in simulating program execution and using that reasoning in programming tasks. Besides measuring the correctness of variable predictions during execution simulation, CES…

Software Engineering · Computer Science 2026-04-08 Changshu Liu , Yang Chen , Reyhaneh Jabbarvand

Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest complexity. We revisit these findings…

Artificial Intelligence · Computer Science 2025-10-28 Revanth Rameshkumar , Jimson Huang , Yunxin Sun , Fei Xia , Abulhair Saparov

While large language models (LLMs) excel in mathematical and code reasoning, we observe they struggle with social reasoning tasks, exhibiting cognitive confusion, logical inconsistencies, and conflation between objective world states and…

Computation and Language · Computer Science 2025-10-14 Jialu Du , Guiyang Hou , Yihui Fu , Chen Wu , Wenqi Zhang , Yongliang Shen , Weiming Lu

Language models, characterized by their black-box nature, often hallucinate and display sensitivity to input perturbations, causing concerns about trust. To enhance trust, it is imperative to gain a comprehensive understanding of the…

Computation and Language · Computer Science 2025-01-03 Vatsal Gupta , Pranshu Pandya , Tushar Kataria , Vivek Gupta , Dan Roth

Large Language Models (LLMs) have recently demonstrated strong capabilities in code-related tasks, but their robustness in code reasoning under perturbations remains underexplored. We introduce CodeCrash, a stress-testing framework with…

Artificial Intelligence · Computer Science 2025-10-14 Man Ho Lam , Chaozheng Wang , Jen-tse Huang , Michael R. Lyu

Transformers have been shown to be able to perform deductive reasoning on a logical rulebase containing rules and statements written in English natural language. While the progress is promising, it is currently unclear if these models…

Computation and Language · Computer Science 2022-11-09 Soumya Sanyal , Zeyi Liao , Xiang Ren
‹ Prev 1 2 3 10 Next ›