Related papers: SAT-DIFF: A Tree Diffing Framework Using SAT Solvi…
Code generation is increasingly critical for real-world applications. Still, diffusion-based large language models continue to struggle with this demand. Unlike free-form text, code requires syntactic precision; even minor structural…
Computing the differences between two versions of the same program is an essential task for software development and software evolution research. AST differencing is the most advanced way of doing so, and an active research area. Yet, AST…
Software undergoes constant changes to support new requirements, address bugs, enhance performance, and ensure maintainability. Thus, developers spend a great portion of their workday trying to understand and review the code changes of…
This paper describes diff-SAT, an Answer Set and SAT solver which combines regular solving with the capability to use probabilistic clauses, facts and rules, and to sample an optimal world-view (multiset of satisfying Boolean variable…
In this paper we propose, implement, and test the first practical decomposition algorithms for the width parameters treecut width and treedepth. These two parameters have recently gained a lot of attention in the theoretical research…
Answer Set Programming (ASP) is a paradigm for modeling and solving problems for knowledge representation and reasoning. There are plenty of results dedicated to studying the hardness of (fragments of) ASP. So far, these studies resulted in…
A source code difference (diff) indicates changes made by comparing new and old source codes, and it can be utilized in code reviews to help developers understand the changes made to the code. Although many diff generation methods have been…
The boolean satisfiability (SAT) problem asks whether there exists an assignment of boolean values to the variables of an arbitrary boolean formula making the formula evaluate to True. It is well-known that all NP-problems can be coded as…
We present a novel algorithm for the minimum-depth elimination tree problem, which is equivalent to the optimal treedepth decomposition problem. Our algorithm makes use of two cheaply-computed lower bound functions to prune the search tree,…
This paper addresses the Data-Diff problem: given a dataset and a subsequent version of the dataset, find the shortest sequence of operations that transforms the dataset to the subsequent version, under a restricted family of operations. We…
Decision tree models, including random forests and gradient-boosted decision trees, are widely used in machine learning due to their high predictive performance. However, their complex structures often make them difficult to interpret,…
In this study, we present an incremental machine learning framework called Adaptive Decision Forest (ADF), which produces a decision forest to classify new records. Based on our two novel theorems, we introduce a new splitting strategy…
Over the past few years, we have witnessed remarkable advancements in Code Pre-trained Models (CodePTMs). These models achieved excellent representation capabilities by designing structure-based pre-training tasks for code. However, how to…
Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and to significantly improve clustering accuracy. Previous work has…
Decision trees are a widely used method for classification, both by themselves and as the building blocks of multiple different ensemble learning methods. The Max-Cut decision tree involves novel modifications to a standard, baseline model…
Weighted Max-SAT is the optimization version of SAT and many important problems can be naturally encoded as such. Solving weighted Max-SAT is an important problem from both a theoretical and a practical point of view. In recent years, there…
Phylogenetic trees are leaf-labelled trees used to model the evolution of species. In practice it is not uncommon to obtain two topologically distinct trees for the same set of species, and this motivates the use of distance measures to…
Computer programs, so-called solvers, for solving the well-known Boolean satisfiability problem (Sat) have been improving for decades. Among the reasons, why these solvers are so fast, is the implicit usage of the formula's structural…
Bridging logical reasoning and deep learning is crucial for advanced AI systems. In this work, we present a new framework that addresses this goal by generating interpretable and verifiable logical rules through differentiable learning,…
Semi-supervised image segmentation has attracted great attention recently. The key is how to leverage unlabeled images in the training process. Most methods maintain consistent predictions of the unlabeled images under variations (e.g.,…