English
Related papers

Related papers: Dynamic Stability of LLM-Generated Code

200 papers

Large language models (LLMs) can generate programs that pass unit tests, but passing tests does not guarantee reliable runtime behavior. We find that different correct solutions to the same task can show very different memory and…

Assessing the stability of code generation from large language models (LLMs) is essential for judging their reliability in real-world development. We extend prior "structural-entropy concepts" to the program domain by pairing entropy with…

Software Engineering · Computer Science 2025-08-21 Yewei Song , Tiezhu Sun , Xunzhu Tang , Prateek Rajput , Tegawende F. Bissyande , Jacques Klein

The use of Large Language Models (LLMs) in software engineering tasks is growing, especially in the areas of bug fixing and code generation. Nevertheless, these models often yield unstable results; when executed at different times with the…

Software Engineering · Computer Science 2025-09-09 Mehmet Bilal Er , Nagehan İlhan , Umut Kuran

LLMs show strong performance in code generation, but their outputs lack correctness guarantees. Sample-based uncertainty estimators address this by generating multiple candidate programs and measuring their disagreement. However, existing…

Software Engineering · Computer Science 2026-05-12 Weilin He , Arindam Sharma , Cristina David

Large language models (LLMs) have demonstrated impressive capabilities in code generation, achieving high scores on benchmarks such as HumanEval and MBPP. However, these benchmarks primarily assess functional correctness and neglect broader…

Software Engineering · Computer Science 2025-08-21 Scott Blyth , Sherlock A. Licorish , Christoph Treude , Markus Wagner

Training stability is typically regarded as a prerequisite for reliable optimization in large language models. In this work, we analyze how stabilizing training dynamics affects the induced generation distribution. We show that under…

Artificial Intelligence · Computer Science 2026-02-10 Xianzhe Meng , Qiangsheng Zeng , Ling Luo , Qinghan Yang , Jiarui Hao , Wenbo Wu , Qinyu Wang , Rui Yin , Lin Qi , Renzhi Lu

Robots capable of learning from demonstration (LfD) must exhibit stability while executing learned motion skills. To be effective in the real world, they should also remember multiple skills over time -- a capability lacking in current…

Large language models (LLMs) are increasingly used as decision-support tools in data-constrained scientific workflows, where correctness and validity are critical. However, evaluation practices often emphasize stability or reproducibility…

Machine Learning · Computer Science 2026-03-18 Nazia Riasat

Large Language Models (LLMs) are increasingly deployed for structured data generation, yet output consistency remains critical for production applications. We introduce a comprehensive framework for evaluating and improving consistency in…

Computation and Language · Computer Science 2026-01-01 Guanghui Wang , Jinze Yu , Xing Zhang , Dayuan Jiang , Yin Song , Tomal Deb , Xuefeng Liu , Peiyang He

Large language models (LLMs) are increasingly used to generate requirements specifications, design documents, code, and test cases. In contrast, much less attention has been given to a more difficult assurance problem: statically verifying…

Software Engineering · Computer Science 2026-05-19 Zhi Quan Zhou , Dave Towey , Tsong Yueh Chen

Large Language Models (LLMs) are one of the most promising developments in the field of artificial intelligence, and the software engineering community has readily noticed their potential role in the software development life-cycle.…

Software Engineering · Computer Science 2026-03-16 Greta Dolcetti , Vincenzo Arceri , Eleonora Iotti , Sergio Maffeis , Agostino Cortesi , Enea Zaffanella

We introduce a general stochastic differential equation framework for modelling multiobjective optimization dynamics in iterative Large Language Model (LLM) interactions. Our framework captures the inherent stochasticity of LLM responses…

Machine Learning · Computer Science 2025-10-14 Shivani Shukla , Himanshu Joshi

Large language models (LLMs) are widely used in software development. However, the code generated by LLMs often contains vulnerabilities. Several secure code generation methods have been proposed to address this issue, but their current…

Cryptography and Security · Computer Science 2025-11-14 Shih-Chieh Dai , Jun Xu , Guanhong Tao

Diffusion large language models (dLLMs) enable parallel text generation by iteratively denoising a fully masked sequence, unmasking a subset of masked tokens at each step. Existing decoding strategies rely on static confidence metrics…

Computation and Language · Computer Science 2026-04-21 Yue Wu , Jian Huang

Language models (LMs) have exhibited impressive abilities in generating codes from natural language requirements. In this work, we highlight the diversity of code generated by LMs as a critical criterion for evaluating their code generation…

Software Engineering · Computer Science 2024-08-28 Heejae Chon , Seonghyeon Lee , Jinyoung Yeo , Dongha Lee

Code generation models are widely used in software development, yet their sensitivity to prompt phrasing remains under-examined. Identical requirements expressed with different emotions or communication styles can yield divergent outputs,…

Software Engineering · Computer Science 2025-09-18 Wei Ma , Yixiao Yang , Jingquan Ge , Xiaofei Xie , Lingxiao Jiang

Feature transformation enhances data representation by deriving new features from the original data. Generative AI offers potential for this task, but faces challenges in stable generation (consistent outputs) and valid generation…

Machine Learning · Computer Science 2025-06-12 Xinyuan Wang , Haoyue Bai , Nanxu Gong , Wangyang Ying , Sixun Dong , Xiquan Cui , Yanjie Fu

Code quality is an attribute composed of various metrics, such as complexity, readability, testability, interoperability, reusability, and the use of good or bad practices, among others. Static code analysis tools aim to measure a set of…

Software Engineering · Computer Science 2024-10-07 Igor Regis da Silva Simões , Elaine Venson

Assessing the quality of LLM-generated text remains a fundamental challenge in natural language processing. Current evaluation approaches often rely on isolated metrics or simplistic aggregations that fail to capture the nuanced trade-offs…

Computation and Language · Computer Science 2025-06-25 Esteban Garces Arias , Hannah Blocher , Julian Rodemann , Matthias Aßenmacher , Christoph Jansen

Scientific software relies on high-precision computation, yet finite floating-point representations can introduce precision errors that propagate in safety-critical domains. Despite the growing use of large language models (LLMs) in…

Software Engineering · Computer Science 2026-04-10 Tien Nguyen , Kirshanthan Sundararajah , Muhammad Ali Gulzar
‹ Prev 1 2 3 10 Next ›