Related papers: AutoReSpec: A Framework for Generating Specificati…

Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification

Formal verification provides a rigorous and systematic approach to ensure the correctness and reliability of software systems. Yet, constructing specifications for the full proof relies on domain expertise and non-trivial manpower. In view…

Software Engineering · Computer Science 2024-04-03 Cheng Wen, Jialun Cao, Jie Su, Zhiwu Xu, Shengchao Qin, Mengda He, Haokun Li, Shing-Chi Cheung, Cong Tian

SpecGen: Automated Generation of Formal Program Specifications via Large Language Models

Formal program specifications play a crucial role in various stages of software development. However, manually crafting formal program specifications is rather difficult, making the job time-consuming and labor-intensive. It is even more…

Software Engineering · Computer Science 2025-02-26 Lezhi Ma, Shangqing Liu, Yi Li, Xiaofei Xie, Lei Bu

Synthesizing Precise Protocol Specs from Natural Language for Effective Test Generation

Safety- and security-critical systems have to be thoroughly tested against their specifications. The state of practice is to have _natural language_ specifications, from which test cases are derived manually - a process that is slow,…

Software Engineering · Computer Science 2025-11-25 Kuangxiangzi Liu, Dhiman Chakraborty, Alexander Liggesmeyer, Andreas Zeller

How Effective are Large Language Models in Generating Software Specifications?

Software specifications are essential for many Software Engineering (SE) tasks such as bug detection and test generation. Many existing approaches are proposed to extract the specifications defined in natural language form (e.g., comments)…

Software Engineering · Computer Science 2025-02-11 Danning Xie, Byungwoo Yoo, Nan Jiang, Mijung Kim, Lin Tan, Xiangyu Zhang, Judy S. Lee

AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, with code generation emerging as a key area of focus. While numerous benchmarks have been proposed to evaluate their code generation abilities,…

Computation and Language · Computer Science 2025-08-13 Jason Chou, Ao Liu, Yuchi Deng, Zhiying Zeng, Tao Zhang, Haotian Zhu, Jianwei Cai, Yue Mao, Chenchen Zhang, Lingyun Tan, Ziyan Xu, Bohui Zhai, Hengyi Liu, Speed Zhu, Wiggin Zhou, Fengzong Lian

SpecSyn: LLM-based Synthesis and Refinement of Formal Specifications for Real-world Program Verification

Program verification is a formal technique to rigorously ensure the correctness and fault-freeness of software systems. However, constructing comprehensive interprocedural specifications for full verification obligations is time-consuming…

Software Engineering · Computer Science 2026-04-24 Lezhi Ma, Shangqing Liu, Yi Li, Qiong Wu, Han Wang, Lei Bu

Towards Specification-Driven LLM-Based Generation of Embedded Automotive Software

The paper studies how code generation by LLMs can be combined with formal verification to produce critical embedded software. The first contribution is a general framework, spec2code, in which LLMs are combined with different types of…

Software Engineering · Computer Science 2024-11-21 Minal Suresh Patil, Gustav Ung, Mattias Nyberg

Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference

Large Language Models (LLMs) are increasingly being used to automate programming tasks. Yet, LLMs' capabilities in reasoning about program semantics are still inadequately studied, leaving significant potential for further exploration. This…

Programming Languages · Computer Science 2025-05-30 Thanh Le-Cong, Bach Le, Toby Murray

RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. Existing benchmarks often fall short by relying on synthetic…

Cryptography and Security · Computer Science 2026-02-02 Yanlin Wang, Ziyao Zhang, Chong Wang, Xinyi Xu, Mingwei Liu, Yong Wang, Jiachi Chen, Zibin Zheng

VerifyThisBench: Generating Code, Specifications, and Proofs All at Once

Large language models (LLMs) have demonstrated remarkable progress in code generation, but many existing benchmarks are approaching saturation and offer little guarantee on the trustworthiness of the generated programs. To improve…

Software Engineering · Computer Science 2025-10-08 Xun Deng, Sicheng Zhong, Barış Bayazıt, Andreas Veneris, Fan Long, Xujie Si

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

As Large Language Models (LLMs) become integral to software development workflows, their ability to generate structured outputs has become critically important. We introduce StructEval, a comprehensive benchmark for evaluating LLMs'…

Software Engineering · Computer Science 2026-04-06 Jialin Yang, Dongfu Jiang, Lipeng He, Sherman Siu, Yuxuan Zhang, Disen Liao, Zhuofeng Li, Huaye Zeng, Yiming Jia, Haozhe Wang, Benjamin Schneider, Chi Ruan, Wentao Ma, Zhiheng Lyu, Yifei Wang, Yi Lu, Quy Duc Do, Ziyan Jiang, Ping Nie, Wenhu Chen

Doc2Spec: Synthesizing Formal Programming Specifications from Natural Language via Grammar Induction

Ensuring that API implementations and usage comply with natural language programming rules is critical for software correctness, security, and reliability. Formal verification can provide strong guarantees but requires precise…

Programming Languages · Computer Science 2026-02-06 Shihao Xia, Mengting He, Haomin Jia, Linhai Song

CorrectBench: Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design

Functional simulation is an essential step in digital hardware design. Recently, there has been a growing interest in leveraging Large Language Models (LLMs) for hardware testbench generation tasks. However, the inherent instability…

Software Engineering · Computer Science 2024-11-14 Ruidi Qiu, Grace Li Zhang, Rolf Drechsler, Ulf Schlichtmann, Bing Li

Assessing Large Language Models in Generating RTL Design Specifications

As IC design grows more complex, automating comprehension and documentation of RTL code has become increasingly important. Engineers currently should manually interpret existing RTL code and write specifications, a slow and error-prone…

Hardware Architecture · Computer Science 2025-12-02 Hung-Ming Huang, Yu-Hsin Yang, Fu-Chieh Chang, Yun-Chia Hsu, Yin-Yu Lin, Ming-Fang Tsai, Chun-Chih Yang, Pei-Yuan Wu

Evaluating LLMs on Sequential API Call Through Automated Test Generation

By integrating tools from external APIs, Large Language Models (LLMs) have expanded their promising capabilities in a diverse spectrum of complex real-world tasks. However, testing, evaluation, and analysis of LLM tool use remain in their…

Software Engineering · Computer Science 2025-12-03 Yuheng Huang, Jiayang Song, Da Song, Zhenlan Ji, Wenhan Wang, Shuai Wang, Lei Ma

LiveFMBench: Unveiling the Power and Limits of Agentic Workflows in Specification Generation

Formal specification is essential for rigorous program verification, yet writing correct specifications remains costly and difficult to automate. Although large language models (LLMs) and agents have shown promising progress, their true…

Software Engineering · Computer Science 2026-05-05 Dong Xu, Jialun Cao, Guozhao Mo, Junjie Hu, Cheng Wen, Hongyu Lin, Xianpei Han, Shengchao Qin, Cong Tian, Shing-Chi Cheung, Le Sun, Yaojie Lu

Intent-aligned Formal Specification Synthesis via Traceable Refinement

Large language models are increasingly used to generate code from natural language, but ensuring correctness remains challenging. Formal verification offers a principled way to obtain such guarantees by proving that a program satisfies a…

Machine Learning · Computer Science 2026-04-14 Zhe Ye, Aidan Z. H. Yang, Huangyuan Su, Zhenyu Liao, Samuel Tenka, Zhizhen Qin, Udaya Ghai, Dawn Song, Soonho Kong

Breaking the Myth: Can Small Models Infer Postconditions Too?

Formal specifications are essential for ensuring software correctness, yet manually writing them is tedious and error-prone. Large Language Models (LLMs) have shown promise in generating such specifications from natural language intents,…

Software Engineering · Computer Science 2025-07-15 Gehao Zhang, Zhenting Wang, Juan Zhai

AutoSpec: Automated Generation of Neural Network Specifications

The increasing adoption of neural networks in learning-augmented systems highlights the importance of model safety and robustness, particularly in safety-critical domains. Despite progress in the formal verification of neural networks,…

Machine Learning · Computer Science 2024-10-25 Shuowei Jin, Francis Y. Yan, Cheng Tan, Anuj Kalia, Xenofon Foukas, Z. Morley Mao

CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation

Large language models (LLMs) can generate code from natural language, but the extent to which they capture intended program behavior remains unclear. Executable behavioral specifications, defined via preconditions and postconditions,…

Software Engineering · Computer Science 2026-04-15 Zaoyu Chen, Jianbo Dai, Boyu Zhu, Jingdong Wang, Huiming Wang, Xin Xu, Haoyang Yuan, Zhijiang Guo, Xiao-Ming Wu