Related papers: Explaining Counterexamples with Giant-Step Asserti…

Your Proof Fails? Testing Helps to Find the Reason

Applying deductive verification to formally prove that a program respects its formal specification is a very complex and time-consuming task due in particular to the lack of feedback in case of proof failures. Along with a non-compliance…

Software Engineering · Computer Science 2015-08-10 Guillaume Petiot, Nikolai Kosmatov, Bernard Botella, Alain Giorgetti, Jacques Julliand

Why does it fail? Explanation of verification failures

Satisfiability solving is a common technique for formal verification forming the basis of many proof and model checking systems. Failure to show a proof obligation will produce a counterexample or failure trace with typically many thousands…

Logic in Computer Science · Computer Science 2026-03-24 Lars-Henrik Eriksson

Improving Counterexample Quality from Failed Program Verification

In software verification, a successful automated program proof is the ultimate triumph. The road to such success is, however, paved with many failed proof attempts. The message produced by the prover when a proof fails is often obscure,…

Software Engineering · Computer Science 2022-08-29 Li Huang, Bertrand Meyer, Manuel Oriol

Reasoning about Iteration and Recursion Uniformly based on Big-step Semantics

A reliable technique for deductive program verification should be proven sound with respect to the semantics of the programming language. For each different language, the construction of a separate soundness proof is often a laborious…

Programming Languages · Computer Science 2021-08-05 Ximeng Li, Qianying Zhang, Guohui Wang, Zhiping Shi, Yong Guan

Combining Tests and Proofs for Better Software Verification

Test or prove? These two approaches to software verification have long been presented as opposites. One is dynamic, the other static: a test executes the program, a proof only analyzes the program text. A different perspective is emerging,…

Software Engineering · Computer Science 2026-02-10 Li Huang, Bertrand Meyer, Manuel Oriol

Diagnosis via Proofs of Unsatisfiability for First-Order Logic with Relational Objects

Satisfiability-based automated reasoning is an approach that is being successfully used in software engineering to validate complex software, including for safety-critical systems. Such reasoning underlies many validation activities, from…

Software Engineering · Computer Science 2024-09-17 Nick Feng, Lina Marsso, Marsha Chechik

Understanding Counterexamples for Relational Properties with DIbugger

Software verification is a tedious process that involves the analysis of multiple failed verification attempts, and adjustments of the program or specification. This is especially the case for complex requirements, e.g., regarding security…

Software Engineering · Computer Science 2019-07-10 Mihai Herda, Michael Kirsten, Etienne Brunner, Joana Plewnia, Ulla Scheler, Chiara Staudenmaier, Benedikt Wagner, Pascal Zwick, Bernhard Beckert

Sal: Multi-modal Verification of Replicated Data Types

Designing correct replicated data types (RDTs) is challenging because replicas evolve independently and must be merged while preserving application intent. A promising approach is correct-by-construction development in a proof-oriented…

Programming Languages · Computer Science 2026-03-31 Pranav Ramesh, Vimala Soundarapandian, KC Sivaramakrishnan

Tracers for debugging and program exploration

Programmers often use an iterative process of hypothesis generation ("perhaps this function is called twice?") and hypothesis testing ("let's count how many times this breakpoint fires") to understand the behavior of unfamiliar or…

Programming Languages · Computer Science 2026-04-14 Shardul Chiplunkar, Clément Pit-Claudel

Integrating deduction and model finding in a language independent setting

Software artifacts are ubiquitous in our lives, being an essential part of home appliances, cars, cell phones, and even of more critical activities like aeronautics and health sciences. In this context, software failures may produce enormous…

Software Engineering · Computer Science 2022-06-16 Carlos Gustavo Lopez Pombo, Agustín Eloy Martinez Suñé

StatWhy: Formal Verification Tool for Statistical Hypothesis Testing Programs

Statistical methods have been widely misused and misinterpreted in various scientific fields, raising significant concerns about the integrity of scientific research. To mitigate this problem, we propose a tool-assisted method for formally…

Software Engineering · Computer Science 2025-06-05 Yusuke Kawamoto, Kentaro Kobayashi, Kohei Suenaga

DeCon: Detecting Incorrect Assertions via Postconditions Generated by a Large Language Model

Recently, given the docstring for the target problem and the target function signature, large language models (LLMs) have been used not only to generate source code, but also to generate test cases, consisting of test inputs and assertions…

Software Engineering · Computer Science 2025-01-07 Hao Yu, Tianyu Chen, Jiaming Huang, Zongyang Li, Dezhi Ran, Xinyu Wang, Ying Li, Assaf Marron, David Harel, Yuan Xie, Tao Xie

Proving and Disproving Programs with Shared Mutable Data

We present a tool for verification of deterministic programs with shared mutable references against specifications such as assertions, preconditions, postconditions, and read/write effects. We implement our tool by encoding programs with…

Logic in Computer Science · Computer Science 2021-03-16 Georg Schmid, Viktor Kunčak

Learning to Encode and Classify Test Executions

The challenge of automatically determining the correctness of test executions is referred to as the test oracle problem and is one of the key remaining issues for automated testing. The goal in this paper is to solve the test oracle problem…

Software Engineering · Computer Science 2023-10-03 Foivos Tsimpourlas, Ajitha Rajan, Miltiadis Allamanis

Automatic Error Localization for Software using Deductive Verification

Even competent programmers make mistakes. Automatic verification can detect errors, but leaves the frustrating task of finding the erroneous line of code to the user. This paper presents an automatic approach for identifying potential error…

Logic in Computer Science · Computer Science 2014-09-17 Robert Koenighofer, Ronald Toegl, Roderick Bloem

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

Large language models (LLMs) often achieve strong performance on reasoning benchmarks, but final-answer accuracy alone does not show whether they faithfully execute the procedure specified in a prompt. We introduce a controlled diagnostic…

Computation and Language · Computer Science 2026-05-26 Sailesh Panda, Pritam Kadasi, Abhishek Upperwal, Mayank Singh

Algorithmic Detection of Rank Reversals, Transitivity Violations, and Decomposition Inconsistencies in Multi-Criteria Decision Analysis

In Multi-Criteria Decision Analysis, Rank Reversals are a serious problem that can greatly affect the results of a Multi-Criteria Decision Method against a particular set of alternatives. It is therefore useful to have a mechanism that…

Artificial Intelligence · Computer Science 2025-08-04 Agustín Borda, Juan Bautista Cabral, Gonzalo Giarda, Diego Nicolás Gimenez Irusta, Paula Pacheco, Alvaro Roy Schachner

Second-Order Propositional Satisfiability

Fundamentally, every static program analyser searches for a proof through a combination of heuristics providing candidate solutions and a candidate validation technique. Essentially, the heuristic reduces a second-order problem to a…

Logic in Computer Science · Computer Science 2015-01-20 Cristina David, Daniel Kroening, Matt Lewis

Right Is Not Enough: The Pitfalls of Outcome Supervision in Training LLMs for Math Reasoning

Outcome-rewarded Large Language Models (LLMs) have demonstrated remarkable success in mathematical problem-solving. However, this success often masks a critical issue: models frequently achieve correct answers through fundamentally unsound…

Computation and Language · Computer Science 2025-06-25 Jiaxing Guo, Wenjie Yang, Shengzhong Zhang, Tongshan Xu, Lun Du, Da Zheng, Zengfeng Huang

Asymptotic Proportion of Hard Instances of the Halting Problem

Although the halting problem is undecidable, imperfect testers that fail on some instances are possible. Such instances are called hard for the tester. One variant of imperfect testers replies "I don't know" on hard instances, another…

Logic in Computer Science · Computer Science 2014-12-01 Antti Valmari