Related papers: Correctness isn't Efficiency: Runtime Memory Diverg…

200 papers

LLMs show strong performance in code generation, but their outputs lack correctness guarantees. Sample-based uncertainty estimators address this by generating multiple candidate programs and measuring their disagreement. However, existing…

Software Engineering · Computer Science 2026-05-12 Weilin He, Arindam Sharma, Cristina David

Current evaluations of LLMs for code generation emphasize functional correctness, overlooking the fact that functionally correct solutions can differ significantly in algorithmic complexity. For instance, an $(O(n^2))$ versus $(O(n \log…

The use of Large Language Models (LLMs) in software engineering tasks is growing, especially in the areas of bug fixing and code generation. Nevertheless, these models often yield unstable results; when executed at different times with the…

Software Engineering · Computer Science 2025-09-09 Mehmet Bilal Er, Nagehan İlhan, Umut Kuran

Large Language Models (LLMs) changed the way we design and interact with software systems. Their ability to process and extract information from text has drastically improved productivity in a number of routine tasks. Developers that want…

Machine Learning · Computer Science 2025-08-26 Federico Errica, Giuseppe Siracusano, Davide Sanvito, Roberto Bifulco

While large pretrained language models (PLMs) demonstrate incredible fluency and performance on many natural language tasks, recent work has shown that well-performing PLMs are very sensitive to what prompts are fed into them. Even when…

Computation and Language · Computer Science 2023-04-13 Harsh Raj, Domenic Rosati, Subhabrata Majumdar

Large language models (LLMs) are increasingly used as decision-support tools in data-constrained scientific workflows, where correctness and validity are critical. However, evaluation practices often emphasize stability or reproducibility…

Machine Learning · Computer Science 2026-03-18 Nazia Riasat

Large Language Models (LLMs) are widely used in software engineering to generate, complete, translate, and fix code, improving developer productivity. While most research focuses on the energy consumption and carbon emissions of model…

Software Engineering · Computer Science 2026-04-15 Sabiya Banu Masthan Ali, Oussema Kirmani, Aroosa Hameed, Syed Muhammad Danish, Gautam Srivastava

Large language models (LLMs) are increasingly deployed under diverse numerical precision configurations, including standard floating-point formats (e.g., bfloat16 and float16) and quantized integer formats (e.g., int16 and int8), to meet…

Artificial Intelligence · Computer Science 2026-04-23 Yifei Wang, Tianlin Li, Xiaohan Zhang, Xiaoyu Zhang, Wei Ma, Mingfei Cheng, Li Pan

Large Language Models (LLMs) have demonstrated impressive performance in code generation tasks under idealized conditions, where task descriptions are clear and precise. However, in practice, task descriptions frequently exhibit ambiguity,…

Software Engineering · Computer Science 2025-07-29 Maya Larbi, Amal Akli, Mike Papadakis, Rihab Bouyousfi, Maxime Cordy, Federica Sarro, Yves Le Traon

In this paper, we present a challenging code reasoning task: vulnerability detection. Large Language Models (LLMs) have shown promising results in natural-language and math reasoning, but state-of-the-art (SOTA) models reported only 54.5%…

Software Engineering · Computer Science 2025-01-09 Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Hengbo Tong, Swarna Das, Earl T. Barr, Wei Le

Large Language Models (LLMs) have achieved state-of-the-art performance across software engineering tasks, from code generation to translation. However, we identify and systematically evaluate a critical failure mode: Programming Language…

Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. Nevertheless, their robustness is still far from optimal, as they may generate significantly inconsistent responses…

Computation and Language · Computer Science 2024-03-25 Yukun Zhao, Lingyong Yan, Weiwei Sun, Guoliang Xing, Shuaiqiang Wang, Chong Meng, Zhicong Cheng, Zhaochun Ren, Dawei Yin

Large Language Models (LLMs) have been widely employed in programming language analysis to enhance human productivity. Yet, their reliability can be compromised by various code distribution shifts, leading to inconsistent outputs. While…

Software Engineering · Computer Science 2024-02-12 Yufei Li, Simin Chen, Yanghong Guo, Wei Yang, Yue Dong, Cong Liu

Reasoning failures in large language models (LLMs) are typically measured only at the end of a generation, yet many failures manifest as a process-level breakdown: the model "loses the thread" mid-reasoning. We study whether such breakdowns…

Artificial Intelligence · Computer Science 2026-02-04 Jinkun Chen, Fengxiang Cheng, Sijia Han, Vlado Keselj

Context: In the fast-paced evolution of software development, Large Language Models (LLMs) have become indispensable tools for tasks such as code generation, completion, analysis, and bug fixing. Ensuring the robustness of these models…

Software Engineering · Computer Science 2026-02-13 Yang Liu, Armstrong Foundjem, Xingfang Wu, Heng Li, Foutse Khomh

Memory consistency models (MCMs) are at the heart of concurrent programming. They represent the behaviour of concurrent programs at the chip level. To test these models, small program snippets called litmus tests are generated, which show…

Programming Languages · Computer Science 2018-08-30 Ruth Hoffmann, Özgür Akgün, Susmit Sarkar

Large language models (LLMs) often present answers with high apparent confidence despite lacking an explicit mechanism for reasoning about certainty or truth. While existing benchmarks primarily evaluate single-turn accuracy, truthfulness…

Computation and Language · Computer Science 2026-03-05 Mohammadreza Saadat, Steve Nemzer

Understanding a program's runtime reasoning behavior, meaning how intermediate states and control flows lead to final execution results, is essential for reliable code generation, debugging, and automated reasoning. Although large language…

Software Engineering · Computer Science 2025-12-02 Mohammad Abdollahi, Khandaker Rifah Tasnia, Soumit Kanti Saha, Jinqiu Yang, Song Wang, Hadi Hemmati

The rapid advancement of large language models (LLMs) has shown remarkable progress in complex reasoning tasks. However, a significant disparity exists between benchmark performances and real-world applications. We attribute this gap…

Artificial Intelligence · Computer Science 2025-08-11 Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen

Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a…

Computation and Language · Computer Science 2019-06-14 Mark Braverman, Xinyi Chen, Sham M. Kakade, Karthik Narasimhan, Cyril Zhang, Yi Zhang