Related papers: Preference-based optimization from noisy pairwise …

200 papers

As large language models (LLMs) advance their capabilities, aligning these models with human preferences has become crucial. Preference optimization, which trains models to distinguish between preferred and non-preferred responses based on…

Machine Learning · Computer Science 2026-02-02 Shawn Im, Sharon Li

Direct alignment methods are increasingly used for aligning large language models (LLMs) with human preferences. However, these methods suffer from the issues of verbosity and likelihood displacement, which can be driven by the noisy…

Computation and Language · Computer Science 2025-10-28 Peter Chen, Xi Chen, Wotao Yin, Tianyi Lin

Optimizing policies based on human preferences is key to aligning language models with human intent. This work focuses on reward modeling, a core component in reinforcement learning from human feedback (RLHF), and offline preference…

Machine Learning · Computer Science 2025-06-02 Soichiro Nishimori, Yu-Jie Zhang, Thanawat Lodkaew, Masashi Sugiyama

Preference-based optimization algorithms are iterative procedures that seek the optimal calibration of a decision vector based only on comparisons between couples of different tunings. At each iteration, a human decision-maker expresses a…

Optimization and Control · Mathematics 2023-10-03 Davide Previtali, Mirko Mazzoleni, Antonio Ferramosca, Fabio Previdi

User preference learning is generally a hard problem. Individual preferences are typically unknown even to users themselves, while the space of choices is infinite. Here we study user preference learning from information-theoretic…

Machine Learning · Computer Science 2023-11-27 Tanya Ignatenko, Kirill Kondrashov, Marco Cox, Bert de Vries

Optimization with preference feedback is an active research area with many applications in engineering systems where humans play a central role, such as building control and autonomous vehicles. While most existing studies focus on…

Optimization and Control · Mathematics 2026-03-31 Wenbin Wang, Wenjie Xu, Colin N. Jones

It is challenging to quantify numerical preferences for different objectives in a multi-objective decision-making problem. However, the demonstrations of a user are often accessible. We propose an algorithm to infer linear preference…

Artificial Intelligence · Computer Science 2023-04-28 Junlin Lu

We address the problem of convex optimization with preference feedback, where the goal is to minimize a convex function given a weaker form of comparison queries. Each query consists of two points and the dueling feedback returns a (noisy)…

Machine Learning · Computer Science 2023-12-20 Aadirupa Saha, Vitaly Feldman, Tomer Koren, Yishay Mansour

This paper considers a time-varying optimization problem associated with a network of systems, with each of the systems shared by (and affecting) a number of individuals. The objective is to minimize cost functions associated with the…

Optimization and Control · Mathematics 2022-03-15 Ana M. Ospina, Andrea Simonetto, Emiliano Dall'Anese

We propose a new online learning model for learning with preference feedback. The model is especially suited for applications like web search and recommender systems, where preference data is readily available from implicit user feedback…

Machine Learning · Computer Science 2011-11-04 Pannagadatta K. Shivaswamy, Thorsten Joachims

Preference feedback, in the form of pairwise comparisons rather than scalar scores, has seen increasing use in applications such as human-, laboratory-, and expert-in-the-loop design, as well as scientific discovery. We propose a Thompson…

Machine Learning · Statistics 2026-04-29 Joseph Lazzaro, Davide Buffelli, Da-shan Shiu, Sattar Vakili

A key requirement in developing Generative Language Models (GLMs) is to have their values aligned with human values. Preference-based alignment is a widely used paradigm for this purpose, in which preferences over generation pairs are first…

Computation and Language · Computer Science 2024-04-16 Yang Gao, Dana Alon, Donald Metzler

In this paper, we consider large-scale ranking problems where one is given a set of (possibly non-redundant) pairwise comparisons and the underlying ranking explained by those comparisons is desired. We show that stochastic gradient descent…

Optimization and Control · Mathematics 2024-07-04 Benjamin Jarman, Lara Kassab, Deanna Needell, Alexander Sietsema

A preference-based subjective evaluation is a key method for evaluating generative media reliably. However, its huge combinations of pairs prohibit it from being applied to large-scale evaluation using crowdsourcing. To address this issue,…

Human-Computer Interaction · Computer Science 2024-03-12 Yusuke Yasuda, Tomoki Toda

Preferential Bayesian optimization (PBO) learns latent utilities from pairwise comparisons, but most existing methods assume homoscedastic comparison noise. This is inadequate in human-in-the-loop settings, where a user may compare some…

Machine Learning · Computer Science 2026-05-19 Marshal Arijona Sinaga, Julien Martinelli, Samuel Kaski

There has been a recent surge of interest in studying permutation-based models for ranking from pairwise comparison data. Despite being structurally richer and more robust than parametric ranking models, permutation-based models are less…

Machine Learning · Statistics 2017-10-31 Cheng Mao, Jonathan Weed, Philippe Rigollet

Preferential Bayesian optimization allows optimization of objectives that are either expensive or difficult to measure directly, by relying on a minimal number of comparative evaluations done by a human expert. Generating candidate…

The class of direct preference optimization (DPO) algorithms has emerged as a promising approach for solving the alignment problem in foundation models. These algorithms work with very limited feedback in the form of pairwise preferences…

Machine Learning · Computer Science 2026-02-03 Luca Viano, Ruida Zhou, Yifan Sun, Mahdi Namazifar, Volkan Cevher, Shoham Sabach, Mohammad Ghavamzadeh

Derivative Free Optimization is known to be an efficient and robust method to tackle the black-box optimization problem. When it comes to noisy functions, classical comparison-based algorithms are slower than gradient-based algorithms. For…

Optimization and Control · Mathematics 2016-04-29 Marie-Liesse Cauwet, Olivier Teytaud

Algorithmic recommendation based on noisy preference measurement is prevalent in recommendation systems. This paper discusses the consequences of such recommendation on market concentration and inequality. Binary types denoting a…

Theoretical Economics · Economics 2025-10-21 Andreas Haupt