Related papers: Explaining Counterexamples with Giant-Step Asserti…
Applying deductive verification to formally prove that a program respects its formal specification is a very complex and time-consuming task due in particular to the lack of feedback in case of proof failures. Along with a non-compliance…
Satisfiability solving is a common technique for formal verification forming the basis of many proof and model checking systems. Failure to show a proof obligation will produce a counterexample or failure trace with typically many thousands…
In software verification, a successful automated program proof is the ultimate triumph. The road to such success is, however, paved with many failed proof attempts. The message produced by the prover when a proof fails is often obscure,…
A reliable technique for deductive program verification should be proven sound with respect to the semantics of the programming language. For each different language, the construction of a separate soundness proof is often a laborious…
Test or prove? These two approaches to software verification have long been presented as opposites. One is dynamic, the other static: a test executes the program, a proof only analyzes the program text. A different perspective is emerging,…
Satisfiability-based automated reasoning is an approach that is being successfully used in software engineering to validate complex software, including for safety-critical systems. Such reasoning underlies many validation activities, from…
Software verification is a tedious process that involves the analysis of multiple failed verification attempts, and adjustments of the program or specification. This is especially the case for complex requirements, e.g., regarding security…
Designing correct replicated data types (RDTs) is challenging because replicas evolve independently and must be merged while preserving application intent. A promising approach is correct-by-construction development in a proof-oriented…
Programmers often use an iterative process of hypothesis generation ("perhaps this function is called twice?") and hypothesis testing ("let's count how many times this breakpoint fires") to understand the behavior of unfamiliar or…
Software artifacts are ubiquitous in our lives being an essential part of home appliances, cars, cel phones, and even in more critical activities like aeronautics and health sciences. In this context software failures may produce enormous…
Statistical methods have been widely misused and misinterpreted in various scientific fields, raising significant concerns about the integrity of scientific research. To mitigate this problem, we propose a tool-assisted method for formally…
Recently, given the docstring for the target problem and the target function signature, large language models (LLMs) have been used not only to generate source code, but also to generate test cases, consisting of test inputs and assertions…
We present a tool for verification of deterministic programs with shared mutable references against specifications such as assertions, preconditions, postconditions, and read/write effects. We implement our tool by encoding programs with…
The challenge of automatically determining the correctness of test executions is referred to as the test oracle problem and is one of the key remaining issues for automated testing. The goal in this paper is to solve the test oracle problem…
Even competent programmers make mistakes. Automatic verification can detect errors, but leaves the frustrating task of finding the erroneous line of code to the user. This paper presents an automatic approach for identifying potential error…
Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they faithfully execute the procedure specified in a prompt. We introduce a controlled diagnostic…
In Multi-Criteria Decision Analysis, Rank Reversals are a serious problem that can greatly affect the results of a Multi-Criteria Decision Method against a particular set of alternatives. It is therefore useful to have a mechanism that…
Fundamentally, every static program analyser searches for a proof through a combination of heuristics providing candidate solutions and a candidate validation technique. Essentially, the heuristic reduces a second-order problem to a…
Outcome-rewarded Large Language Models (LLMs) have demonstrated remarkable success in mathematical problem-solving. However, this success often masks a critical issue: models frequently achieve correct answers through fundamentally unsound…
Although the halting problem is undecidable, imperfect testers that fail on some instances are possible. Such instances are called hard for the tester. One variant of imperfect testers replies "I don't know" on hard instances, another…