Related papers: Learning to Format Coq Code Using Language Models
Coding conventions for naming, spacing, and other essentially stylistic properties are necessary for developers to effectively understand, review, and modify source code in large software projects. Consistent conventions in verification…
For performance and verification in machine learning, new methods have recently been proposed that optimise learning systems to satisfy formally expressed logical properties. Among these methods, differentiable logics (DLs) are used to…
In the realm of formal theorem proving, the Coq proof assistant stands out for its rigorous approach to verifying mathematical assertions and software correctness. Despite the advances in artificial intelligence and machine learning, the…
Matching logic is a formalism for specifying, and reasoning about, mathematical structures, using patterns and pattern matching. Growing in popularity, it has been used to define many logical systems such as separation logic with recursive…
CoqQ is a framework for reasoning about quantum programs in the Coq proof assistant. Its main components are: a deeply embedded quantum programming language, in which classic quantum algorithms are easily expressed, and an expressive…
Dependently typed programming languages such as Coq, Agda, Idris, and F* allow programmers to write detailed specifications of their programs and prove their programs meet these specifications. However, these specifications can be violated…
Standard Large Language Model (LLM) pre-training typically treats corpora as flattened token sequences, often overlooking the real-world context that humans naturally rely on to contextualize information. To bridge this gap, we introduce…
Source code is usually formatted with elements like indentation and newlines to improve readability for human developers. However, these visual aids do not seem to be beneficial for large language models (LLMs) in the same way since the…
General-purpose language models are trained to produce varied natural language outputs, but for some tasks, like annotation or classification, we need more specific output formats. LLM systems increasingly support structured output, which…
While loops are present in virtually all imperative programming languages. They are important both for practical reasons (performing a number of iterations not known in advance) and theoretical reasons (achieving Turing completeness). In…
To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code…
What provides the highest level of assurance for correctness of execution within a programming language? One answer to this problem, and our solution in particular, is to provide a formalization for, if it exists, the denotational semantics…
Programmable Logic Controllers (PLCs) and their programming standard IEC 61131-3 are widely used in embedded systems for the industrial automation domain. We propose a framework for the formal treatment of PLC based on the IEC 61131-3…
Large language models that enhance software development tasks, such as code generation, code completion, and code question answering (QA), have been extensively studied in both academia and industry. The models are integrated into…
Code corpora, as observed in large software systems, are now known to be far more repetitive and predictable than natural language corpora. But why? Does the difference simply arise from the syntactic limitations of programming languages?…
Finding code given a natural language query is beneficial to the productivity of software developers. Future progress towards better semantic matching between query and code requires richer supervised training resources. To remedy this, we…
We describe a formalization of higher-order rewriting theory and formally prove that an AFS is strongly normalizing if it can be interpreted in a well-founded domain. To do so, we use Coq, which is a proof assistant based on dependent type…
Recent advancements in Unified Multimodal Models (UMMs) have significantly advanced text-to-image (T2I) generation, particularly through the integration of Chain-of-Thought (CoT) reasoning. However, existing CoT-based T2I methods largely…
Dependently typed languages such as Coq are used to specify and verify the full functional correctness of source programs. Type-preserving compilation can be used to preserve these specifications and proofs of correctness through…
Context: Software programs can be written in different but functionally equivalent ways. Even though previous research has compared specific formatting elements to find out which alternatives affect code legibility, seeing the bigger…