Related papers: Software Model Checking via Large-Block Encoding
Program representation learning is a fundamental task in software engineering applications. With the availability of "big code" and the development of deep learning techniques, various program representation learning models have been…
The emergence of Large Code Models (LCMs) has transformed software engineering (SE) automation, driving significant advancements in tasks such as code generation, source code documentation, code review, and bug fixing. However, these…
Large Language Models (LLMs) are transforming a wide range of domains, yet verifying their outputs remains a significant challenge, especially for complex open-ended tasks such as consolidation, summarization, and knowledge extraction. To…
Static analysis is the analysis of a program without executing it, usually carried out by an automated tool. Symbolic execution is a popular static analysis technique used both in program verification and in bug detection software. It works…
Boolean symbolic reasoning for gate-level netlists is a critical step in verification, logic and datapath synthesis, and hardware security. Specifically, reasoning datapath and adder tree in bit-blasted Boolean networks is particularly…
Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of…
Large language models can generate plausible code, but remain brittle for formal verification in proof assistants such as Lean. A central scalability challenge is that verified synthesis requires consistent artifacts across several coupled…
Large Language Models (LLMs) are powerful candidates for complex decision-making, leveraging vast encoded knowledge and remarkable zero-shot abilities. However, their adoption in high-stakes environments is hindered by their opacity; their…
Predicate abstraction is a key enabling technology for applying finite-state model checkers to programs written in mainstream languages. It has been used very successfully for debugging sequential system-level C code. Although model…
Large language models (LLMs) are increasingly used to generate software artifacts across many software engineering (SE) tasks, yet ensuring the semantic validity of these artifacts remains a fundamental challenge. Existing constrained…
Speculative decoding is a technique to leverage hardware concurrency in order to enable multiple steps of token generation in a single forward pass, thus improving the efficiency of large-scale autoregressive (AR) Transformer models.…
As large language models (LLMs) become increasingly capable and widely adopted, benchmarks play a central role in assessing their practical utility. For example, SWE-Bench Verified has emerged as a critical benchmark for evaluating LLMs'…
Large language models (LLMs) are increasingly used for program verification, and yet little is known about \emph{how} they reason about program semantics during this process. In this work, we focus on abstract interpretation based-reasoning…
Formal verification offers a path to provably correct software, but writing verified code remains expensive enough that the technique is rarely used in production. Recent large language models can accelerate this work, and recent benchmarks…
Bounded model checking is among the most efficient techniques for the automatic verification of concurrent programs. However, encoding all possible interleavings often requires a huge and complex formula, which significantly limits the…
The complexity of software in embedded systems has increased significantly over the last years so that software verification now plays an important role in ensuring the overall product quality. In this context, SAT-based bounded model…
We present a technique for the automated verification of abstract models of multithreaded programs providing fresh name generation, name mobility, and unbounded control. As high level specification language we adopt here an extension of…
Large language models (LLMs) can generate code from natural language descriptions. Their performance is typically evaluated using programming benchmarks that simulate real-world tasks. These benchmarks provide specifications in the form of…
Due to the limitations of SCADE models in expressing and verifying abstract specifications in safety-critical systems, this study proposes a formal verification framework based on the B-Method. By establishing a semantic equivalence…
Large language models (LLMs) have demonstrated remarkable performance, yet their diverse strengths and weaknesses prevent any single LLM from achieving dominance across all tasks. Ensembling multiple LLMs is a promising approach to generate…