Related papers: Next Steps in LLM-Supported Java Verification

Automated Annotation with Generative AI Requires Validation

Generative large language models (LLMs) can be a powerful tool for augmenting text annotation procedures, but their performance varies across annotation tasks due to prompt quality, text data idiosyncrasies, and conceptual difficulty.…

Computation and Language · Computer Science 2023-06-02 Nicholas Pangakis, Samuel Wolken, Neil Fasching

A Short Survey on Formalising Software Requirements using Large Language Models

This paper presents a focused literature survey on the use of large language models (LLM) to assist in writing formal specifications for software. A summary of thirty-five key papers is presented, including examples for specifying programs…

Software Engineering · Computer Science 2025-06-16 Arshad Beg, Diarmuid O'Donoghue, Rosemary Monahan

Natural Language based Specification and Verification

Recent frontier large language models (LLMs) have shown strong performance in identifying security vulnerabilities in large, mature open-source systems. As LLM-generated code becomes increasingly common, a natural goal is to prevent such…

Software Engineering · Computer Science 2026-05-13 Zhaorui Li, Chengyu Song

Leveraging LLMs for Formal Software Requirements -- Challenges and Prospects

Software correctness is ensured mathematically through formal verification, which involves the resources of generating formal requirement specifications and having an implementation that must be verified. Tools such as model-checkers and…

Software Engineering · Computer Science 2025-08-29 Arshad Beg, Diarmuid O'Donoghue, Rosemary Monahan

(Security) Assertions by Large Language Models

The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications on a system, there is a need for techniques to support security verification activities.…

Cryptography and Security · Computer Science 2024-07-11 Rahul Kande, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Shailja Thakur, Ramesh Karri, Jeyavijayan Rajendran

Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?

Large Language Models (LLMs) have shown potential for solving mathematical tasks. We show that LLMs can be utilized to generate proofs by induction for hardware verification and thereby replace some of the manual work done by Formal…

Logic in Computer Science · Computer Science 2026-05-04 Romy Peled, Daniel Kroening, Michael Tautschnig, Yakir Vizel

Validating Formal Specifications with LLM-generated Test Cases

Validation is a central activity when developing formal specifications. Similarly to coding, a possible validation technique is to define upfront test cases or scenarios that a future specification should satisfy or not. Unfortunately,…

Software Engineering · Computer Science 2026-02-19 Alcino Cunha, Nuno Macedo

OLAF: Towards Robust LLM-Based Annotation Framework in Empirical Software Engineering

Large Language Models (LLMs) are increasingly used in empirical software engineering (ESE) to automate or assist annotation tasks such as labeling commits, issues, and qualitative artifacts. Yet the reliability and reproducibility of such…

Software Engineering · Computer Science 2026-01-27 Mia Mohammad Imran, Tarannum Shaila Zaman

A Case Study on the Effectiveness of LLMs in Verification with Proof Assistants

Large language models (LLMs) can potentially help with verification using proof assistants by automating proofs. However, it is unclear how effective LLMs are in this task. In this paper, we perform a case study based on two mature Rocq…

Programming Languages · Computer Science 2025-08-27 Barış Bayazıt, Yao Li, Xujie Si

Generation of Programmatic Rules for Document Forgery Detection Using Large Language Models

Document forgery poses a growing threat to legal, economic, and governmental processes, requiring increasingly sophisticated verification mechanisms. One approach involves the use of plausibility checks, rule-based procedures that assess…

Artificial Intelligence · Computer Science 2025-12-23 Valentin Schmidberger, Manuel Eberhardinger, Setareh Maghsudi, Johannes Maucher

RE-oriented Model Development with LLM Support and Deduction-based Verification

The requirements engineering (RE) phase is pivotal in developing high-quality software. Integrating advanced modelling techniques with large language models (LLMs) and formal verification in a logical style can significantly enhance this…

Software Engineering · Computer Science 2025-06-11 Radoslaw Klimek

Combining LLM Code Generation with Formal Specifications and Reactive Program Synthesis

In the past few years, Large Language Models (LLMs) have exploded in usefulness and popularity for code generation tasks. However, LLMs still struggle with accuracy and are unsuitable for high-risk applications without additional oversight…

Software Engineering · Computer Science 2024-10-29 William Murphy, Nikolaus Holzer, Feitong Qiao, Leyi Cui, Raven Rothkopf, Nathan Koenig, Mark Santolucito

Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection

Large language models have achieved remarkable success on final-answer mathematical problems, largely due to the ease of applying reinforcement learning with verifiable rewards. However, the reasoning underlying these solutions is often…

Artificial Intelligence · Computer Science 2025-11-18 Sadegh Mahdavi, Branislav Kisacanin, Shubham Toshniwal, Wei Du, Ivan Moshkov, George Armstrong, Renjie Liao, Christos Thrampoulidis, Igor Gitman

Large Language Models for Unit Test Generation: Achievements, Challenges, and Opportunities

Automated unit test generation is critical for software quality but traditional structure-driven methods often lack the semantic understanding required to produce realistic inputs and oracles. Large language models (LLMs) address this…

Software Engineering · Computer Science 2026-01-01 Bei Chu, Yang Feng, Kui Liu, Zhaoqiang Guo, Yichi Zhang, Hange Shi, Zifan Nan, Baowen Xu

Benchmarking Large Language Models for Automated Verilog RTL Code Generation

Automating hardware design could obviate a significant amount of human error from the engineering process and lead to fewer errors. Verilog is a popular hardware description language to model and design digital systems, thus generating…

Programming Languages · Computer Science 2022-12-22 Shailja Thakur, Baleegh Ahmad, Zhenxing Fan, Hammond Pearce, Benjamin Tan, Ramesh Karri, Brendan Dolan-Gavitt, Siddharth Garg

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Despite various approaches being employed to detect vulnerabilities, the number of reported vulnerabilities shows an upward trend over the years. This suggests the problems are not caught before the code is released, which could be caused…

Cryptography and Security · Computer Science 2025-02-14 Karl Tamberg, Hayretdin Bahsi

Are LLMs Reliable Code Reviewers? Systematic Overcorrection in Requirement Conformance Judgement

Large language models (LLMs) have become essential tools in software development, widely used for requirements engineering, code generation and review tasks. Software engineers often rely on LLMs to verify if code implementation satisfy…

Software Engineering · Computer Science 2026-03-03 Haolin Jin, Huaming Chen

Are We Testing or Being Tested? Exploring the Practical Applications of Large Language Models in Software Testing

A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content, including grammatically precise sentences, human-like paragraphs, and syntactically accurate code snippets. LLMs can play…

Software Engineering · Computer Science 2023-12-11 Robson Santos, Italo Santos, Cleyton Magalhaes, Ronnie de Souza Santos

Formal Methods Meets Readability: Auto-Documenting JML Java Code

This paper investigates whether formal specifications using Java Modeling Language (JML) can enhance the quality of Large Language Model (LLM)-generated Javadocs. While LLMs excel at producing documentation from code alone, we hypothesize…

Software Engineering · Computer Science 2025-06-12 Juan Carlos Recio Abad, Ruben Saborido, Francisco Chicano

Do LLMs generate test oracles that capture the actual or the expected program behaviour?

Software testing is an essential part of the software development cycle to improve the code quality. Typically, a unit test consists of a test prefix and a test oracle which captures the developer's intended behaviour. A known limitation of…

Software Engineering · Computer Science 2024-10-29 Michael Konstantinou, Renzo Degiovanni, Mike Papadakis