Related papers: DiffSpec: Differential Testing with LLMs using Nat…

200 papers

Large language models (LLMs) are increasingly deployed under diverse numerical precision configurations, including standard floating-point formats (e.g., bfloat16 and float16) and quantized integer formats (e.g., int16 and int8), to meet…

Artificial Intelligence · Computer Science 2026-04-23 Yifei Wang, Tianlin Li, Xiaohan Zhang, Xiaoyu Zhang, Wei Ma, Mingfei Cheng, Li Pan

Differential testing offers a promising strategy to alleviate the test oracle problem by comparing the test results between alternative implementations. However, existing differential testing techniques for deep learning (DL) libraries are…

Software Engineering · Computer Science 2025-05-09 Meiziniu Li, Dongze Li, Jianmeng Liu, Jialun Cao, Yongqiang Tian, Shing-Chi Cheung

Compilers are complex, and significant effort has been expended on testing them. Techniques such as random program generation and differential testing have proved highly effective and have uncovered thousands of bugs in production…

Software Engineering · Computer Science 2025-01-03 Davide Italiano, Chris Cummins

We introduce Differential Performance Evaluation (DPE), a framework designed to reliably evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding benchmarks often fail to provide reliable insights into code…

Software Engineering · Computer Science 2024-08-14 Jiawei Liu, Songrun Xie, Junhao Wang, Yuxiang Wei, Yifeng Ding, Lingming Zhang

File systems are critical OS components that require constant evolution to support new hardware and emerging application needs. However, the traditional paradigm of developing features, fixing bugs, and maintaining the system incurs…

Operating Systems · Computer Science 2026-02-11 Qingyuan Liu, Mo Zou, Hengbin Zhang, Dong Du, Yubin Xia, Haibo Chen

Formal specification generation has recently drawn attention in software engineering as a way to improve program correctness without requiring manual annotations. Large Language Models (LLMs) have shown promise in this area, but early…

Software Engineering · Computer Science 2026-04-07 Ragib Shahariar Ayon, Shibbir Ahmed

Effective decision-making often relies on identifying what makes each candidate distinctive. While existing benchmarks for LLMs emphasize retrieving or summarizing information relevant to a given query, they do not evaluate a model's…

Computation and Language · Computer Science 2025-10-02 Seiji Maekawa, Hayate Iso, Nikita Bhutani

Code-documentation inconsistencies are common and undesirable: they can lead to developer misunderstandings and software defects. This paper introduces DocPrism, a multi-language, code-documentation inconsistency detection tool. DocPrism…

Software Engineering · Computer Science 2025-11-04 Xiaomeng Xu, Zahin Wahab, Reid Holmes, Caroline Lemieux

Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential to generate code from natural language descriptions across a wide range of programming tasks. Several benchmarks have recently emerged to evaluate the…

Software Engineering · Computer Science 2023-04-11 Sarah Fakhoury, Saikat Chakraborty, Madan Musuvathi, Shuvendu K. Lahiri

Automated test-generation research overwhelmingly assumes the correctness of focal methods, yet practitioners routinely face non-regression scenarios where the focal method may be defective. A baseline evaluation of EVOSUITE and two leading…

Software Engineering · Computer Science 2026-02-03 Pengyu Xue, Yuxiang Zhang, Zhen Yang, Xiaoxue Ren, Xiang Li, Pengfei Hu, Linhao Wu, Kainan Li

Large language models (LLMs) are being used in many applications and prompts for these models are integrated into software applications as code-like artifacts. These prompts behave much like traditional software in that they take inputs,…

Software Engineering · Computer Science 2026-02-09 Reshabh K Sharma, Jonathan De Halleux, Shraddha Barke, Dan Grossman, Benjamin Zorn

This study investigates the reliability of code generation by Large Language Models (LLMs), focusing on identifying and analyzing defects in the generated code. Despite the advanced capabilities of LLMs in automating code generation,…

Software Engineering · Computer Science 2024-08-27 Ali Mohammadi Esfahani, Nafiseh Kahani, Samuel A. Ajila

Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly…

Software Engineering · Computer Science 2023-11-10 Sungmin Kang, Juyeon Yoon, Nargiz Askarbekkyzy, Shin Yoo

This dissertation presents an evaluation of several language models on software defect datasets. A language model (LM) "can provide word representation and probability indication of word sequences as the core component of an NLP system."…

Software Engineering · Computer Science 2019-09-24 Kailun Wang

With the rapid adoption of large language models (LLMs) in automated code refactoring, assessing and ensuring functional equivalence between LLM-generated refactoring and the original implementation becomes critical. While prior work…

Software Engineering · Computer Science 2026-02-18 Simantika Bhattacharjee Dristi, Matthew B. Dwyer

Deep Learning (DL) library bugs affect downstream DL applications, emphasizing the need for reliable systems. Generating valid input programs for fuzzing DL libraries is challenging due to the need for satisfying both language…

Software Engineering · Computer Science 2023-04-05 Yinlin Deng, Chunqiu Steven Xia, Chenyuan Yang, Shizhuo Dylan Zhang, Shujing Yang, Lingming Zhang

In this paper, we initiate our discussion by demonstrating how Large Language Models (LLMs), when tasked with responding to queries, display a more even probability distribution in their answers if they are more adept, as opposed to their…

Computation and Language · Computer Science 2024-07-10 Tingyu Xia, Bowen Yu, Yuan Wu, Yi Chang, Chang Zhou

Recent frontier large language models (LLMs) have shown strong performance in identifying security vulnerabilities in large, mature open-source systems. As LLM-generated code becomes increasingly common, a natural goal is to prevent such…

Software Engineering · Computer Science 2026-05-13 Zhaorui Li, Chengyu Song

Large language models (LLMs) can generate complicated source code from natural language prompts. However, LLMs can generate output that deviates from what the user wants, requiring supervision and editing. To support this process, we offer…

Software Engineering · Computer Science 2026-01-01 David Gros, Prem Devanbu

Large Language Models (LLMs) have shown impressive proficiency in code generation. Unfortunately, these models share a weakness with their human counterparts: producing code that inadvertently has security vulnerabilities. These…

Cryptography and Security · Computer Science 2024-10-17 Kamel Alrashedy, Abdullah Aljasser, Pradyumna Tambwekar, Matthew Gombolay