Related papers: Evaluating LLM-driven User-Intent Formalization fo…

Dafny as Verification-Aware Intermediate Language for Code Generation

Using large language models (LLMs) to generate source code from natural language prompts is a popular and promising idea with a wide range of applications. One of its limitations is that the generated code can be faulty at times, often in a…

Software Engineering · Computer Science 2025-01-14 Yue Chen Li, Stefan Zetzsche, Siva Somayyajula

A Short Survey on Formalising Software Requirements using Large Language Models

This paper presents a focused literature survey on the use of large language models (LLM) to assist in writing formal specifications for software. A summary of thirty-five key papers is presented, including examples for specifying programs…

Software Engineering · Computer Science 2025-06-16 Arshad Beg, Diarmuid O'Donoghue, Rosemary Monahan

From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification

Large Language Models (LLMs) show promise in automated software engineering, yet their guarantee of correctness is frequently undermined by erroneous or hallucinated code. To enforce model honesty, formal verification requires LLMs to…

Software Engineering · Computer Science 2026-04-27 Md Erfan, Md Kamal Hossain Chowdhury, Ahmed Ryan, Md Rayhanur Rahman

Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

Existing informal language-based (e.g., human language) Large Language Models (LLMs) trained with Reinforcement Learning (RL) face a significant challenge: their verification processes, which provide crucial training signals, are neither…

Computation and Language · Computer Science 2025-10-14 Chuanhao Yan, Fengdi Che, Xuhan Huang, Xu Xu, Xin Li, Yizhi Li, Xingwei Qu, Jingzhe Shi, Chenghua Lin, Yaodong Yang, Binhang Yuan, Hang Zhao, Yu Qiao, Bowen Zhou, Jie Fu

Leveraging LLMs for Formal Software Requirements -- Challenges and Prospects

Software correctness is ensured mathematically through formal verification, which involves the resources of generating formal requirement specifications and having an implementation that must be verified. Tools such as model-checkers and…

Software Engineering · Computer Science 2025-08-29 Arshad Beg, Diarmuid O'Donoghue, Rosemary Monahan

Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents

Agentic AI systems can now generate code with remarkable fluency, but a fundamental question remains: does the generated code actually do what the user intended? The gap between informal natural language requirements and precise…

Software Engineering · Computer Science 2026-03-19 Shuvendu K. Lahiri

Can Large Language Models Help Students Prove Software Correctness? An Experimental Study with Dafny

Students in computing education increasingly use large language models (LLMs) such as ChatGPT. Yet, the role of LLMs in supporting cognitively demanding tasks, like deductive program verification, remains poorly understood. This paper…

Software Engineering · Computer Science 2025-09-09 Carolina Carreira, Álvaro Silva, Alexandre Abreu, Alexandra Mendes

Natural Language based Specification and Verification

Recent frontier large language models (LLMs) have shown strong performance in identifying security vulnerabilities in large, mature open-source systems. As LLM-generated code becomes increasingly common, a natural goal is to prevent such…

Software Engineering · Computer Science 2026-05-13 Zhaorui Li, Chengyu Song

Case studies of development of verified programs with Dafny for accessibility assessment

Formal verification techniques aim at formally proving the correctness of a computer program with respect to a formal specification, but the expertise and effort required for applying formal specification and verification techniques and…

Software Engineering · Computer Science 2023-01-10 João Pascoal Faria, Rui Abreu

A benchmark for vericoding: formally verified program synthesis

We present and test the largest benchmark for vericoding, LLM-generation of formally verified code from formal specifications - in contrast to vibe coding, which generates potentially buggy code from a natural language description. Our…

Software Engineering · Computer Science 2025-09-30 Sergiu Bursuc, Theodore Ehrenborg, Shaowei Lin, Lacramioara Astefanoaei, Ionel Emilian Chiosa, Jure Kukovec, Alok Singh, Oliver Butterley, Adem Bizid, Quinn Dougherty, Miranda Zhao, Max Tan, Max Tegmark

Leveraging Large Language Models to Boost Dafny's Developers Productivity

This research idea paper proposes leveraging Large Language Models (LLMs) to enhance the productivity of Dafny developers. Although the use of verification-aware languages, such as Dafny, has increased considerably in the last decade, these…

Software Engineering · Computer Science 2024-01-03 Álvaro Silva, Alexandra Mendes, João F. Ferreira

Assured Automatic Programming via Large Language Models

With the advent of AI-based coding engines, it is possible to convert natural language requirements to executable code in standard programming languages. However, AI-generated code can be unreliable, and the natural language requirements…

Software Engineering · Computer Science 2024-11-06 Martin Mirchev, Andreea Costea, Abhishek Kr Singh, Abhik Roychoudhury

Automatic Generation of Formal Specification and Verification Annotations Using LLMs and Test Oracles

Recent verification tools aim to make formal verification more accessible to software engineers by automating most of the verification process. However, annotating conventional programs with the formal specification and verification…

Software Engineering · Computer Science 2026-01-21 João Pascoal Faria, Emanuel Trigo, Vinicius Honorato, Rui Abreu

dafny-annotator: AI-Assisted Verification of Dafny Programs

Formal verification has the potential to drastically reduce software bugs, but its high additional cost has hindered large-scale adoption. While Dafny presents a promise to significantly reduce the effort to write verified programs, users…

Software Engineering · Computer Science 2024-11-26 Gabriel Poesia, Chloe Loughridge, Nada Amin

DafnyBench: A Benchmark for Formal Software Verification

We introduce DafnyBench, the largest benchmark of its kind for training and evaluating machine learning systems for formal software verification. We test the ability of LLMs such as GPT-4 and Claude 3 to auto-generate enough hints for the…

Software Engineering · Computer Science 2024-06-13 Chloe Loughridge, Qinyi Sun, Seth Ahrenbach, Federico Cassano, Chuyue Sun, Ying Sheng, Anish Mudide, Md Rakib Hossain Misu, Nada Amin, Max Tegmark

Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving

Large Language Models (LLMs) have demonstrated formidable capabilities in solving mathematical problems, yet they may still commit logical reasoning and computational errors during the problem-solving process. Thus, this paper proposes a…

Artificial Intelligence · Computer Science 2025-05-28 Kuo Zhou, Lu Zhang

Can Large Language Models Model Programs Formally?

In the digital age, ensuring the correctness, safety, and reliability of software through formal verification is paramount, particularly as software increasingly underpins critical infrastructure. Formal verification, split into theorem…

Software Engineering · Computer Science 2026-04-03 Zhiyong Chen, Jialun Cao, Jiarong Wu, Chang Xu, Shing-Chi Cheung

Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference

Large Language Models (LLMs) are increasingly being used to automate programming tasks. Yet, LLMs' capabilities in reasoning about program semantics are still inadequately studied, leaving significant potential for further exploration. This…

Programming Languages · Computer Science 2025-05-30 Thanh Le-Cong, Bach Le, Toby Murray

Can LLMs Enable Verification in Mainstream Programming?

Although formal methods are capable of producing reliable software, they have seen minimal adoption in everyday programming. Automatic code generation using large language models is becoming increasingly widespread, but it rarely considers…

Software Engineering · Computer Science 2025-03-19 Aleksandr Shefer, Igor Engel, Stanislav Alekseev, Daniil Berezun, Ekaterina Verbitskaia, Anton Podkopaev

Towards AI-Assisted Synthesis of Verified Dafny Methods

Large language models show great promise in many domains, including programming. A promise is easy to make but hard to keep, and language models often fail to keep their promises, generating erroneous code. A promising avenue to keep models…

Software Engineering · Computer Science 2024-06-12 Md Rakib Hossain Misu, Cristina V. Lopes, Iris Ma, James Noble