Related papers: Preference-based optimization from noisy pairwise …

How Well Can Preference Optimization Generalize Under Noisy Feedback?

As large language models (LLMs) advance their capabilities, aligning these models with human preferences has become crucial. Preference optimization, which trains models to distinguish between preferred and non-preferred responses based on…

Machine Learning · Computer Science 2026-02-02 Shawn Im, Sharon Li

ComPO: Preference Alignment via Comparison Oracles

Direct alignment methods are increasingly used for aligning large language models (LLMs) with human preferences. However, these methods suffer from the issues of verbosity and likelihood displacement, which can be driven by the noisy…

Computation and Language · Computer Science 2025-10-28 Peter Chen, Xi Chen, Wotao Yin, Tianyi Lin

On Symmetric Losses for Robust Policy Optimization with Noisy Preferences

Optimizing policies based on human preferences is key to aligning language models with human intent. This work focuses on reward modeling, a core component in reinforcement learning from human feedback (RLHF), and offline preference…

Machine Learning · Computer Science 2025-06-02 Soichiro Nishimori, Yu-Jie Zhang, Thanawat Lodkaew, Masashi Sugiyama

GLISp-r: A preference-based optimization algorithm with convergence guarantees

Preference-based optimization algorithms are iterative procedures that seek the optimal calibration of a decision vector based only on comparisons between couples of different tunings. At each iteration, a human decision-maker expresses a…

Optimization and Control · Mathematics 2023-10-03 Davide Previtali, Mirko Mazzoleni, Antonio Ferramosca, Fabio Previdi

On Preference Learning Based on Sequential Bayesian Optimization with Pairwise Comparison

User preference learning is generally a hard problem. Individual preferences are typically unknown even to users themselves, while the space of choices is infinite. Here we study user preference learning from information-theoretic…

Machine Learning · Computer Science 2023-11-27 Tanya Ignatenko, Kirill Kondrashov, Marco Cox, Bert de Vries

Human-in-the-loop: Real-time Preference Optimization

Optimization with preference feedback is an active research area with many applications in engineering systems where humans play a central role, such as building control and autonomous vehicles. While most existing studies focus on…

Optimization and Control · Mathematics 2026-03-31 Wenbin Wang, Wenjie Xu, Colin N. Jones

Preference Inference from Demonstration in Multi-objective Multi-agent Decision Making

It is challenging to quantify numerical preferences for different objectives in a multi-objective decision-making problem. However, the demonstrations of a user are often accessible. We propose an algorithm to infer linear preference…

Artificial Intelligence · Computer Science 2023-04-28 Junlin Lu

Faster Convergence with Multiway Preferences

We address the problem of convex optimization with preference feedback, where the goal is to minimize a convex function given a weaker form of comparison queries. Each query consists of two points and the dueling feedback returns a (noisy)…

Machine Learning · Computer Science 2023-12-20 Aadirupa Saha, Vitaly Feldman, Tomer Koren, Yishay Mansour

Time-Varying Optimization of Networked Systems with Human Preferences

This paper considers a time-varying optimization problem associated with a network of systems, with each of the systems shared by (and affecting) a number of individuals. The objective is to minimize cost functions associated with the…

Optimization and Control · Mathematics 2022-03-15 Ana M. Ospina, Andrea Simonetto, Emiliano Dall'Anese

Online Learning with Preference Feedback

We propose a new online learning model for learning with preference feedback. The model is especially suited for applications like web search and recommender systems, where preference data is readily available from implicit user feedback…

Machine Learning · Computer Science 2011-11-04 Pannagadatta K. Shivaswamy, Thorsten Joachims

A Finite Time Analysis of Thompson Sampling for Bayesian Optimization with Preferential Feedback

Preference feedback, in the form of pairwise comparisons rather than scalar scores, has seen increasing use in applications such as human-, laboratory-, and expert-in-the-loop design, as well as scientific discovery. We propose a Thompson…

Machine Learning · Statistics 2026-04-29 Joseph Lazzaro, Davide Buffelli, Da-shan Shiu, Sattar Vakili

Impact of Preference Noise on the Alignment Performance of Generative Language Models

A key requirement in developing Generative Language Models (GLMs) is to have their values aligned with human values. Preference-based alignment is a widely used paradigm for this purpose, in which preferences over generation pairs are first…

Computation and Language · Computer Science 2024-04-16 Yang Gao, Dana Alon, Donald Metzler

Stochastic Iterative Methods for Online Rank Aggregation from Pairwise Comparisons

In this paper, we consider large-scale ranking problems where one is given a set of (possibly non-redundant) pairwise comparisons and the underlying ranking explained by those comparisons is desired. We show that stochastic gradient descent…

Optimization and Control · Mathematics 2024-07-04 Benjamin Jarman, Lara Kassab, Deanna Needell, Alexander Sietsema

Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment

A preference-based subjective evaluation is a key method for evaluating generative media reliably. However, its huge combinations of pairs prohibit it from being applied to large-scale evaluation using crowdsourcing. To address this issue,…

Human-Computer Interaction · Computer Science 2024-03-12 Yusuke Yasuda, Tomoki Toda

Anchor-Based Heteroscedastic Noise for Preferential Bayesian Optimization

Preferential Bayesian optimization (PBO) learns latent utilities from pairwise comparisons, but most existing methods assume homoscedastic comparison noise. This is inadequate in human-in-the-loop settings, where a user may compare some…

Machine Learning · Computer Science 2026-05-19 Marshal Arijona Sinaga, Julien Martinelli, Samuel Kaski

Minimax Rates and Efficient Algorithms for Noisy Sorting

There has been a recent surge of interest in studying permutation-based models for ranking from pairwise comparison data. Despite being structurally richer and more robust than parametric ranking models, permutation-based models are less…

Machine Learning · Statistics 2017-10-31 Cheng Mao, Jonathan Weed, Philippe Rigollet

Consecutive Preferential Bayesian Optimization

Preferential Bayesian optimization allows optimization of objectives that are either expensive or difficult to measure directly, by relying on a minimal number of comparative evaluations done by a human expert. Generating candidate…

Machine Learning · Computer Science 2025-11-10 Aras Erarslan, Carlos Sevilla Salcedo, Ville Tanskanen, Anni Nisov, Eero Päiväkumpu, Heikki Aisala, Kaisu Honkapää, Arto Klami, Petrus Mikkola

Direct Preference Optimization with Rating Information: Practical Algorithms and Provable Gains

The class of direct preference optimization (DPO) algorithms has emerged as a promising approach for solving the alignment problem in foundation models. These algorithms work with very limited feedback in the form of pairwise preferences…

Machine Learning · Computer Science 2026-02-03 Luca Viano, Ruida Zhou, Yifan Sun, Mahdi Namazifar, Volkan Cevher, Shoham Sabach, Mohammad Ghavamzadeh

Noisy Optimization: Fast Convergence Rates with Comparison-Based Algorithms

Derivative Free Optimization is known to be an efficient and robust method to tackle the black-box optimization problem. When it comes to noisy functions, classical comparison-based algorithms are slower than gradient-based algorithms. For…

Optimization and Control · Mathematics 2016-04-29 Marie-Liesse Cauwet, Olivier Teytaud

Preference Measurement Error, Concentration in Recommendation Systems, and Persuasion

Algorithmic recommendation based on noisy preference measurement is prevalent in recommendation systems. This paper discusses the consequences of such recommendation on market concentration and inequality. Binary types denoting a…

Theoretical Economics · Economics 2025-10-21 Andreas Haupt