Related papers: Code Simulation Challenges for Large Language Mode…

Code Simulation as a Proxy for High-order Tasks in Large Language Models

Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. We collect pairs of naturalistic and synthetic reasoning tasks to…

Machine Learning · Computer Science · 2025-07-08 · Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, X. Angelo Huang, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Code provides a general syntactic structure to build complex programs and perform precise computations when paired with a code interpreter - we hypothesize that language models (LMs) can leverage code-writing to improve Chain of Thought…

Computation and Language · Computer Science · 2024-07-31 · Chengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, Brian Ichter

Can Language Models Pretend Solvers? Logic Code Simulation with LLMs

Transformer-based large language models (LLMs) have demonstrated significant potential in addressing logic problems. Capitalizing on the great capabilities of LLMs for code-related activities, several frameworks leveraging logical solvers…

Artificial Intelligence · Computer Science · 2024-03-29 · Minyu Chen, Guoqiang Li, Ling-I Wu, Ruibang Liu, Yuxin Su, Xi Chang, Jianxin Xue

Chain of Methodologies: Scaling Test Time Computation without Training

Large Language Models (LLMs) often struggle with complex reasoning tasks due to insufficient in-depth insights in their training data, which are typically absent in publicly available documents. This paper introduces the Chain of…

Computation and Language · Computer Science · 2025-06-10 · Cong Liu, Jie Wu, Weigang Wu, Xu Chen, Liang Lin, Wei-Shi Zheng

Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation

Large Language Models (LLMs) have achieved remarkable success in tasks requiring complex reasoning, such as code generation, mathematical problem solving, and algorithmic synthesis -- especially when aided by reasoning tokens and…

Computation and Language · Computer Science · 2025-06-13 · Jaechul Roh, Varun Gandhi, Shivani Anilkumar, Arin Garg

Code Execution as Grounded Supervision for LLM Reasoning

Training large language models (LLMs) with chain-of-thought (CoT) supervision has proven effective for enhancing their reasoning abilities. However, obtaining reliable and accurate reasoning supervision remains a significant challenge. We…

Computation and Language · Computer Science · 2025-10-21 · Dongwon Jung, Wenxuan Zhou, Muhao Chen

Evaluating Prompting and Execution-Based Methods for Deterministic Computation in LLMs

Large Language Models (LLMs) have demonstrated strong capabilities in natural language understanding and reasoning. However, their ability to perform exact, deterministic computation remains unclear. In this work, we systematically evaluate…

Artificial Intelligence · Computer Science · 2026-05-08 · Hongkun Yu

CodeMind: Evaluating Large Language Models for Code Reasoning

Large Language Models (LLMs) have been widely used to automate programming tasks. Their capabilities have been evaluated by assessing the quality of generated code through tests or proofs. The extent to which they can reason about code is a…

Software Engineering · Computer Science · 2026-04-08 · Changshu Liu, Yang Chen, Reyhaneh Jabbarvand

Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective

Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly when dealing with complex tasks involving mathematics or reasoning. Despite the…

Machine Learning · Computer Science · 2023-12-27 · Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang

Large Language Models for Code Analysis: Do LLMs Really Do Their Job?

Large language models (LLMs) have demonstrated significant potential in the realm of natural language understanding and programming code processing tasks. Their capacity to comprehend and generate human-like code has spurred research into…

Software Engineering · Computer Science · 2024-03-07 · Chongzhou Fang, Ning Miao, Shaurya Srivastav, Jialin Liu, Ruoyu Zhang, Ruijie Fang, Asmita, Ryan Tsang, Najmeh Nazari, Han Wang, Houman Homayoun

Chain-of-Thought in Neural Code Generation: From and For Lightweight Language Models

Large Language Models (LLMs) have demonstrated remarkable potential in code generation. The integration of Chain of Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require manual writing or…

Software Engineering · Computer Science · 2024-08-06 · Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, Taolue Chen

Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs

Large Language Models (LLMs) have recently made significant advances in code generation through the 'Chain-of-Thought' prompting technique. This technique empowers the model to autonomously devise "solution plans" to tackle intricate…

Software Engineering · Computer Science · 2024-03-21 · Zhihong Sun, Chen Lyu, Bolun Li, Yao Wan, Hongyu Zhang, Ge Li, Zhi Jin

Evaluating Code Reasoning Abilities of Large Language Models Under Real-World Settings

Code reasoning tasks are becoming prevalent in large language model (LLM) assessments. Yet, there is a dearth of studies on the impact of real-world complexities on code reasoning, e.g., inter- or intra-procedural dependencies, API calls,…

Software Engineering · Computer Science · 2026-04-27 · Changshu Liu, Alireza Ghazanfari, Yang Chen, Reyhaneh Jabbarvand

Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science · 2025-04-03 · Nam Huynh, Beiyu Lin

Chain of Thoughtlessness? An Analysis of CoT in Planning

Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution. Previous work has claimed that this can be mitigated with chain of thought prompting, a method of demonstrating solution…

Artificial Intelligence · Computer Science · 2025-03-13 · Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati

Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models

Large language models (LLMs) have scaled up to unlock a wide range of complex reasoning tasks with the aid of various prompting methods. However, current prompting methods generate natural language intermediate steps to help reasoning,…

Computation and Language · Computer Science · 2023-10-10 · Yi Hu, Haotong Yang, Zhouchen Lin, Muhan Zhang

Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning

Despite the remarkable success of large language models (LLMs) on traditional natural language processing tasks, their planning ability remains a critical bottleneck in tackling complex multi-step reasoning tasks. Existing approaches mainly…

Computation and Language · Computer Science · 2024-10-07 · Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, Minlie Huang

How Likely Do LLMs with CoT Mimic Human Reasoning?

Chain-of-thought emerges as a promising technique for eliciting reasoning capabilities from Large Language Models (LLMs). However, it does not always improve task performance or accurately represent reasoning processes, leaving unresolved…

Computation and Language · Computer Science · 2024-12-13 · Guangsheng Bao, Hongbo Zhang, Cunxiang Wang, Linyi Yang, Yue Zhang

Computational Thinking Reasoning in Large Language Models

While large language models (LLMs) have demonstrated remarkable reasoning capabilities, they often struggle with complex tasks that require specific thinking paradigms, such as divide-and-conquer and procedural deduction, etc. Previous…

Software Engineering · Computer Science · 2025-06-05 · Kechi Zhang, Ge Li, Jia Li, Huangzhao Zhang, Jingjing Xu, Hao Zhu, Lecheng Wang, Jia Li, Yihong Dong, Jing Mai, Bin Gu, Zhi Jin

Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities

Security vulnerabilities are increasingly prevalent in modern software and they are widely consequential to our society. Various approaches to defending against these vulnerabilities have been proposed, among which those leveraging deep…

Cryptography and Security · Computer Science · 2024-02-28 · Yu Nong, Mohammed Aldeen, Long Cheng, Hongxin Hu, Feng Chen, Haipeng Cai