Related papers: Benchmarking and Explaining Large Language Model-b…

Understanding Defects in Generated Codes by Language Models

This study investigates the reliability of code generation by Large Language Models (LLMs), focusing on identifying and analyzing defects in the generated code. Despite the advanced capabilities of LLMs in automating code generation,…

Software Engineering · Computer Science 2024-08-27 Ali Mohammadi Esfahani, Nafiseh Kahani, Samuel A. Ajila

Prompting or Fine-tuning? Exploring Large Language Models for Causal Graph Validation

This study explores the capability of Large Language Models (LLMs) to evaluate causality in causal graphs generated by conventional statistical causal discovery methods, a task traditionally reliant on manual assessment by human subject…

Computation and Language · Computer Science 2025-04-16 Yuni Susanti, Nina Holsmoelle

The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code

Causal reasoning, the ability to identify cause-and-effect relationships, is crucial in human thinking. Although large language models (LLMs) succeed in many NLP tasks, it is still challenging for them to conduct complex causal reasoning…

Computation and Language · Computer Science 2023-05-31 Xiao Liu, Da Yin, Chen Zhang, Yansong Feng, Dongyan Zhao

The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code

As Large Language Models (LLMs) are transforming software development, the functional quality of generated code has become a central focus, leaving readability, one of the critical non-functional attributes, understudied. Given that…

Software Engineering · Computer Science 2026-05-14 Hengzhi Ye, Fengyuan Ran, Weiwei Xu, Minghui Zhou

Benchmarking Causal Study to Interpret Large Language Models for Source Code

One of the most common solutions adopted by software researchers to address code generation is to train Large Language Models (LLMs) on massive amounts of source code. Although a number of studies have shown that LLMs have been…

Software Engineering · Computer Science 2023-08-25 Daniel Rodriguez-Cardenas, David N. Palacio, Dipin Khati, Henry Burke, Denys Poshyvanyk

Large Language Models for Code Generation: The Practitioners Perspective

Large Language Models (LLMs) have emerged as coding assistants, capable of generating source code from natural language prompts. With the increasing adoption of LLMs in software development, academic research and industry-based projects are…

Software Engineering · Computer Science 2025-01-29 Zeeshan Rasheed, Muhammad Waseem, Kai Kristian Kemell, Aakash Ahmad, Malik Abdul Sami, Jussi Rasku, Kari Systä, Pekka Abrahamsson

Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization

Large Language Models (LLMs) are nowadays extensively used for various types of software engineering tasks, primarily code generation. Previous research has shown how suitable prompt engineering could help developers in improving their code…

Software Engineering · Computer Science 2026-01-21 Alessandro Midolo, Alessandro Giagnorio, Fiorella Zampetti, Rosalia Tufano, Gabriele Bavota, Massimiliano Di Penta

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. We conduct a "behavioral"…

Artificial Intelligence · Computer Science 2024-08-21 Emre Kıcıman, Robert Ness, Amit Sharma, Chenhao Tan

Code Evolution Graphs: Understanding Large Language Model Driven Design of Algorithms

Large Language Models (LLMs) have demonstrated great promise in generating code, especially when used inside an evolutionary computation framework to iteratively optimize the generated algorithms. However, in some cases they fail to…

Neural and Evolutionary Computing · Computer Science 2025-03-24 Niki van Stein, Anna V. Kononova, Lars Kotthoff, Thomas Bäck

Testing LLMs on Code Generation with Varying Levels of Prompt Specificity

Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing. Among the myriad of applications that benefit from LLMs, automated code generation is increasingly promising. The…

Software Engineering · Computer Science 2023-11-15 Lincoln Murr, Morgan Grainger, David Gao

Causality for Large Language Models

Recent breakthroughs in artificial intelligence have driven a paradigm shift, where large language models (LLMs) with billions or trillions of parameters are trained on vast datasets, achieving unprecedented success across a series of…

Computation and Language · Computer Science 2024-10-22 Anpeng Wu, Kun Kuang, Minqin Zhu, Yingrong Wang, Yujia Zheng, Kairong Han, Baohong Li, Guangyi Chen, Fei Wu, Kun Zhang

Re-Evaluating Code LLM Benchmarks Under Semantic Mutation

In the era of large language models (LLMs), code benchmarks have become an important research area in software engineering and are widely used by practitioners. These benchmarks evaluate the performance of LLMs on specific code-related…

Software Engineering · Computer Science 2025-06-24 Zhiyuan Pan, Xing Hu, Xin Xia, Xiaohu Yang

Code Roulette: How Prompt Variability Affects LLM Code Generation

Code generation is one of the most active areas of application of Large Language Models (LLMs). While LLMs lower barriers to writing code and accelerate the development process, the overall quality of generated programs depends on the quality…

Software Engineering · Computer Science 2026-03-19 Andrei Paleyes, Radzim Sendyka, Diana Robinson, Christian Cabrera, Neil D. Lawrence

The Impact of Prompt Programming on Function-Level Code Generation

Large Language Models (LLMs) are increasingly used by software engineers for code generation. However, limitations of LLMs such as irrelevant or incorrect code have highlighted the need for prompt programming (or prompt engineering) where…

Software Engineering · Computer Science 2025-07-09 Ranim Khojah, Francisco Gomes de Oliveira Neto, Mazen Mohamad, Philipp Leitner

Prompting Techniques for Secure Code Generation: A Systematic Investigation

Large Language Models (LLMs) are gaining momentum in software development with prompt-driven programming enabling developers to create code from natural language (NL) instructions. However, studies have questioned their ability to produce…

Software Engineering · Computer Science 2025-02-27 Catherine Tony, Nicolás E. Díaz Ferreyra, Markus Mutas, Salem Dhiff, Riccardo Scandariato

Assessing, Exploiting, and Mitigating Syntactic Robustness Failures in LLM-Based Code Generation

Rapid advances in the field of Large Language Models (LLMs) have made LLM-based code generation an important area for investigation. An LLM-based code generator takes a prompt as input and produces code that implements the requirements…

Software Engineering · Computer Science 2026-05-11 Laboni Sarker, Mara Downing, Achintya Desai, Tevfik Bultan

Integrating Large Language Model for Improved Causal Discovery

Recovering the structure of causal graphical models from observational data is an essential yet challenging task for causal discovery in scientific scenarios. Domain-specific causal discovery usually relies on expert validation or prior…

Artificial Intelligence · Computer Science 2025-08-27 Taiyu Ban, Lyuzhou Chen, Derui Lyu, Xiangyu Wang, Qinrui Zhu, Qiang Tu, Huanhuan Chen

CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models

Causal reasoning capabilities are essential for large language models (LLMs) in a wide range of applications, such as education and healthcare. But there is still a lack of benchmarks for a better understanding of such capabilities. Current…

Computation and Language · Computer Science 2024-12-25 Ruibo Tu, Hedvig Kjellström, Gustav Eje Henter, Cheng Zhang

CodeSCM: Causal Analysis for Multi-Modal Code Generation

In this paper, we propose CodeSCM, a Structural Causal Model (SCM) for analyzing multi-modal code generation using large language models (LLMs). By applying interventions to CodeSCM, we measure the causal effects of different prompt…

Computation and Language · Computer Science 2025-02-10 Mukur Gupta, Noopur Bhatt, Suman Jana

CausalBench: A Comprehensive Benchmark for Causal Learning Capability of LLMs

The ability to understand causality significantly impacts the competence of large language models (LLMs) in output explanation and counterfactual reasoning, as causality reveals the underlying data distribution. However, the lack of a…

Machine Learning · Computer Science 2024-09-30 Yu Zhou, Xingyu Wu, Beicheng Huang, Jibin Wu, Liang Feng, Kay Chen Tan