Related papers: Term Orders for Optimistic Lambda-Superposition
We introduce a reduction order called the weighted path order (WPO) that subsumes many existing reduction orders. WPO compares weights of terms as in the Knuth-Bendix order (KBO), while WPO allows weights to be computed by a wide class of…
The superposition calculus for reasoning in first-order logic with equality relies on simplification orderings on terms. Modern saturation provers use the Knuth-Bendix order (KBO) and the lexicographic path order (LPO) for discovering…
The $\lambda$-superposition calculus is a successful approach to proving higher-order formulas. However, some parts of the calculus are extremely explosive, notably due to the higher-order unifier enumeration and the functional…
We introduce refutationally complete superposition calculi for intentional and extensional clausal $\lambda$-free higher-order logic, two formalisms that allow partial application and applied variables. The calculi are parameterized by a…
This paper presents two new approaches to prove termination of rewrite systems with the Knuth-Bendix order efficiently. The constraints for the weight function and for the precedence are encoded in (pseudo-)propositional logic and the…
To adapt large language models (LLMs) to ranking tasks, existing list-wise methods, represented by list-wise Direct Preference Optimization (DPO), focus on optimizing partial-order or full-order list ranking consistency for LLMs to enhance…
We introduce two extensions of the $\lambda$-calculus with a probabilistic choice operator, $\Lambda_\oplus^{cbv}$ and $\Lambda_\oplus^{cbn}$, modeling respectively call-by-value and call-by-name probabilistic computation. We prove that…
In this paper we present a novel termination order the {\em predicative lexicographic path order} (PLPO for short), a syntactic restriction of the lexicographic path order. As well as lexicographic path orders, several non-trivial primitive…
Recent alignment methods based on Direct Preference Optimization (DPO) reformulate preference learning as supervised optimization over pairwise comparisons, offering improved efficiency and stability over reinforcement learning from human…
Group Relative Policy Optimization(GRPO) has become a cornerstone of modern reinforcement learning alignment, prized for its efficacy in foregoing an explicit value-critic by leveraging reward normalization across sampled trajectory…
Robust optimization (RO) is one of the key paradigms for solving optimization problems affected by uncertainty. Two principal approaches for RO, the robust counterpart method and the adversarial approach, potentially lead to excessively…
We designed a superposition calculus for a clausal fragment of extensional polymorphic higher-order logic that includes anonymous functions but excludes Booleans. The inference rules work on $\beta\eta$-equivalence classes of…
This seminar report is concerned with expressing LPO-termination of term rewrite systems as a satisfiability problem in propositional logic. After relevant algorithms are explained, experimental results are reported.
At the forefront of state-of-the-art human alignment methods are preference optimization methods (*PO). Prior research has often concentrated on identifying the best-performing method, typically involving a grid search over hyperparameters,…
Lexicographic multi-objective problems, which impose a lexicographic importance order over the objectives, arise in many real-life scenarios. Existing Reinforcement Learning work directly addressing lexicographic tasks has been scarce. The…
First, we extend Leifer-Milner RPO theory, by giving general conditions to obtain IPO labelled transition systems (and bisimilarities) with a reduced set of transitions, and possibly finitely branching. Moreover, we study the weak variant…
We define sound and adequate denotational and operational semantics for the stochastic lambda calculus. These two semantic approaches build on previous work that used similar techniques to reason about higher-order probabilistic programs,…
Large language models (LLMs) require careful alignment to balance competing objectives - factuality, safety, conciseness, proactivity, and diversity. Existing studies focus on individual techniques or specific dimensions, lacking a holistic…
This paper aims at carrying out termination proofs for simply typed higher-order calculi automatically by using ordering comparisons. To this end, we introduce the computability path ordering (CPO), a recursive relation on terms obtained by…
The rapid development of large language model (LLM) alignment algorithms has resulted in a complex and fragmented landscape, with limited clarity on the effectiveness of different methods and their inter-connections. This paper introduces…