Related papers: GLISp-r: A preference-based optimization algorithm…

C-GLISp: Preference-Based Global Optimization under Unknown Constraints with Applications to Controller Calibration

Preference-based global optimization algorithms minimize an unknown objective function only based on whether the function is better, worse, or similar for given pairs of candidate optimization vectors. Such optimization problems arise in…

Optimization and Control · Mathematics 2021-12-21 Mengjia Zhu, Dario Piga, Alberto Bemporad

Regularized GLISp for sensor-guided human-in-the-loop optimization

Human-in-the-loop calibration is often addressed via preference-based optimization, where algorithms learn from pairwise comparisons rather than explicit cost evaluations. While effective, methods such as Preferential Bayesian Optimization…

Machine Learning · Computer Science 2025-11-10 Matteo Cercola, Michele Lomuscio, Dario Piga, Simone Formentin

Preference-based MPC calibration

Automating the calibration of the parameters of a control policy by means of global optimization requires quantifying a closed-loop performance function. As this can be impractical in many situations, in this paper we suggest a…

Optimization and Control · Mathematics 2021-05-27 Mengjia Zhu, Alberto Bemporad, Dario Piga

A unified surrogate-based scheme for black-box and preference-based optimization

Black-box and preference-based optimization algorithms are global optimization procedures that aim to find the global solutions of an optimization problem using, respectively, the least amount of function evaluations or sample comparisons…

Optimization and Control · Mathematics 2022-02-04 Davide Previtali, Mirko Mazzoleni, Antonio Ferramosca, Fabio Previdi

Active preference learning based on radial basis functions

This paper proposes a method for solving optimization problems in which the decision-maker cannot evaluate the objective function, but rather can only express a preference such as "this is better than that" between two candidate decision…

Machine Learning · Computer Science 2019-10-01 Alberto Bemporad, Dario Piga

Experience in Engineering Complex Systems: Active Preference Learning with Multiple Outcomes and Certainty Levels

Black-box optimization refers to the optimization problem whose objective function and/or constraint sets are either unknown, inaccessible, or non-existent. In many applications, especially with the involvement of humans, the only way to…

Machine Learning · Computer Science 2023-03-01 Le Anh Dao, Loris Roveda, Marco Maccarini, Matteo Lavit Nicora, Marta Mondellini, Matteo Meregalli Falerni, Palaniappan Veerappan, Lorenzo Mantovani, Dario Piga, Simone Formentin, Matteo Malosio

Preference-based optimization from noisy pairwise comparisons

In interactive systems, feedback is often provided in the form of preference between queried options rather than precise scores, which motivates optimization methods to learn from such comparisons. In this work, we propose a…

Optimization and Control · Mathematics 2025-12-23 Siyi Wang, Zifan Wang, Karl Henrik Johansson

Towards Fast Algorithms for the Preference Consistency Problem Based on Hierarchical Models

In this paper, we construct and compare algorithmic approaches to solve the Preference Consistency Problem for preference statements based on hierarchical models. Instances of this problem contain a set of preference statements that are…

Logic in Computer Science · Computer Science 2024-11-01 Anne-Marie George, Nic Wilson, Barry O'Sullivan

Learning Optimal and Near-Optimal Lexicographic Preference Lists

We consider learning problems of an intuitive and concise preference model, called lexicographic preference lists (LP-lists). Given a set of examples that are pairwise ordinal preferences over a universe of objects built of attributes of…

Artificial Intelligence · Computer Science 2019-09-20 Ahmed Moussa, Xudong Liu

Computation and Complexity of Preference Inference Based on Hierarchical Models

Preference Inference involves inferring additional user preferences from elicited or observed preferences, based on assumptions regarding the form of the user's preference relation. In this paper we consider a situation in which…

Logic in Computer Science · Computer Science 2024-09-18 Nic Wilson, Anne-Marie George, Barry O'Sullivan

Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e., determining…

Machine Learning · Computer Science 2018-02-22 Luisa M Zintgraf, Diederik M Roijers, Sjoerd Linders, Catholijn M Jonker, Ann Nowé

Leveraging Robust Optimization for LLM Alignment under Distribution Shifts

Preference alignment methods are increasingly critical for steering large language models (LLMs) to generate outputs consistent with human values. While recent approaches often rely on synthetic data generated by LLMs for scalability and…

Computation and Language · Computer Science 2025-10-21 Mingye Zhu, Yi Liu, Zheren Fu, Yongdong Zhang, Zhendong Mao

A tutorial on learning from preferences and choices with Gaussian Processes

Preference modelling lies at the intersection of economics, decision theory, machine learning and statistics. By understanding individuals' preferences and how they make choices, we can build products that closely match their expectations,…

Machine Learning · Computer Science 2026-05-19 Alessio Benavoli, Dario Azzimonti

Discovering Preference Optimization Algorithms with and for Large Language Models

Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. Typically, preference optimization is approached as an offline supervised learning task using manually-crafted…

Machine Learning · Computer Science 2024-11-05 Chris Lu, Samuel Holt, Claudio Fanconi, Alex J. Chan, Jakob Foerster, Mihaela van der Schaar, Robert Tjarko Lange

Direct Preference Optimization with Rating Information: Practical Algorithms and Provable Gains

The class of direct preference optimization (DPO) algorithms has emerged as a promising approach for solving the alignment problem in foundation models. These algorithms work with very limited feedback in the form of pairwise preferences…

Machine Learning · Computer Science 2026-02-03 Luca Viano, Ruida Zhou, Yifan Sun, Mahdi Namazifar, Volkan Cevher, Shoham Sabach, Mohammad Ghavamzadeh

Global and Preference-based Optimization with Mixed Variables using Piecewise Affine Surrogates

Optimization problems involving mixed variables (i.e., variables of numerical and categorical nature) can be challenging to solve, especially in the presence of mixed-variable constraints. Moreover, when the objective function is the result…

Optimization and Control · Mathematics 2024-12-12 Mengjia Zhu, Alberto Bemporad

Preference Optimization for Combinatorial Optimization Problems

Reinforcement Learning (RL) has emerged as a powerful tool for neural combinatorial optimization, enabling models to learn heuristics that solve complex problems without requiring expert knowledge. Despite significant progress, existing RL…

Machine Learning · Computer Science 2025-05-14 Mingjun Pan, Guanquan Lin, You-Wei Luo, Bin Zhu, Zhien Dai, Lijun Sun, Chun Yuan

Preference Inference from Demonstration in Multi-objective Multi-agent Decision Making

It is challenging to quantify numerical preferences for different objectives in a multi-objective decision-making problem. However, the demonstrations of a user are often accessible. We propose an algorithm to infer linear preference…

Artificial Intelligence · Computer Science 2023-04-28 Junlin Lu

ROPO: Robust Preference Optimization for Large Language Models

Preference alignment is pivotal for empowering large language models (LLMs) to generate helpful and harmless responses. However, the performance of preference alignment is highly sensitive to the prevalent noise in the preference data.…

Machine Learning · Computer Science 2024-05-29 Xize Liang, Chao Chen, Shuang Qiu, Jie Wang, Yue Wu, Zhihang Fu, Zhihao Shi, Feng Wu, Jieping Ye

What is the Alignment Objective of GRPO?

In this note, we examine the aggregation of preferences achieved by the Group Policy Optimisation (GRPO) algorithm, a reinforcement learning method used to train advanced artificial intelligence models such as DeepSeek-R1-Zero and…

Machine Learning · Computer Science 2025-03-14 Milan Vojnovic, Se-Young Yun