Related papers: Online Learning with Preference Feedback

Active Reward Learning from Online Preferences

Robot policies need to adapt to human preferences and/or new environments. Human experts may have the domain knowledge required to help robots achieve this adaptation. However, existing works often require costly offline re-training on…

Machine Learning · Computer Science · 2023-02-28 · Vivek Myers, Erdem Bıyık, Dorsa Sadigh

Learning Preferences for Manipulation Tasks from Online Coactive Feedback

We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots. The preferences we learn are more intricate than simple geometric constraints on trajectories; they…

Robotics · Computer Science · 2016-01-06 · Ashesh Jain, Shikhar Sharma, Thorsten Joachims, Ashutosh Saxena

Preference-based Search using Example-Critiquing with Suggestions

We consider interactive tools that help users search for their most preferred item in a large collection of options. In particular, we examine example-critiquing, a technique for enabling users to incrementally construct preference models…

Artificial Intelligence · Computer Science · 2011-10-04 · B. Faltings, P. Pu, P. Viappiani

Representation Learning and Pairwise Ranking for Implicit Feedback in Recommendation Systems

In this paper, we propose a novel ranking framework for collaborative filtering with the overall aim of learning user preferences over items by minimizing a pairwise ranking loss. We show the minimization problem involves dependent random…

Machine Learning · Statistics · 2021-09-15 · Sumit Sidana, Mikhail Trofimov, Oleg Horodnitskii, Charlotte Laclau, Yury Maximov, Massih-Reza Amini

Learning Trajectory Preferences for Manipulators via Iterative Improvement

We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a co-active online…

Robotics · Computer Science · 2015-01-30 · Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena

Online Reciprocal Recommendation with Theoretical Performance Guarantees

A reciprocal recommendation problem is one where the goal of learning is not just to predict a user's preference towards a passive item (e.g., a book), but to recommend the targeted user on one side another user from the other side such…

Machine Learning · Computer Science · 2018-06-05 · Fabio Vitale, Nikos Parotsidis, Claudio Gentile

Loss Aversion in Recommender Systems: Utilizing Negative User Preference to Improve Recommendation Quality

Negative user preference is an important context that is not sufficiently utilized by many existing recommender systems. This context is especially useful in scenarios where the cost of negative items is high for the users. In this work, we…

Information Retrieval · Computer Science · 2021-02-19 · Bibek Paudel, Sandro Luck, Abraham Bernstein

Debiasing Online Preference Learning via Preference Feature Preservation

Recent preference learning frameworks for large language models (LLMs) simplify human preferences with binary pairwise comparisons and scalar rewards. This simplification could make LLMs' responses biased to mostly preferred features, and…

Machine Learning · Computer Science · 2025-06-16 · Dongyoung Kim, Jinsung Yoon, Jinwoo Shin, Jaehyung Kim

Benchmarks and Algorithms for Offline Preference-Based Reward Learning

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the…

Machine Learning · Computer Science · 2023-01-05 · Daniel Shin, Anca D. Dragan, Daniel S. Brown

Offline Preference-Based Apprenticeship Learning

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the…

Machine Learning · Computer Science · 2022-02-18 · Daniel Shin, Daniel S. Brown, Anca D. Dragan

Coactive Critiquing: Elicitation of Preferences and Features

When faced with complex choices, users refine their own preference criteria as they explore the catalogue of options. In this paper we propose an approach to preference elicitation suited for this scenario. We extend Coactive Learning,…

Artificial Intelligence · Computer Science · 2016-12-07 · Stefano Teso, Paolo Dragone, Andrea Passerini

Learning Reward Functions from Scale Feedback

Today's robots are increasingly interacting with people and need to efficiently learn inexperienced users' preferences. A common framework is to iteratively query the user about which of two presented robot trajectories they prefer. While…

Robotics · Computer Science · 2021-10-04 · Nils Wilde, Erdem Bıyık, Dorsa Sadigh, Stephen L. Smith

Optimal Design for Human Preference Elicitation

Learning of preference models from human feedback has been central to recent advances in artificial intelligence. Motivated by the cost of obtaining high-quality human annotations, we study efficient human preference elicitation for…

Machine Learning · Computer Science · 2026-02-17 · Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton

Make The Most of Prior Data: A Solution for Interactive Text Summarization with Preference Feedback

For summarization, human preference is critical to tame outputs of the summarizer in favor of human interests, as ground-truth summaries are scarce and ambiguous. Practical settings require dynamic exchanges between human and AI agent…

Artificial Intelligence · Computer Science · 2022-05-13 · Duy-Hung Nguyen, Nguyen Viet Dung Nghiem, Bao-Sinh Nguyen, Dung Tien Le, Shahab Sabahi, Minh-Tien Nguyen, Hung Le

Online Policy Learning from Offline Preferences

In preference-based reinforcement learning (PbRL), a reward function is learned from a type of human feedback called preference. To expedite preference collection, recent works have leveraged offline preferences, which are…

Machine Learning · Computer Science · 2024-03-18 · Guoxi Zhang, Han Bao, Hisashi Kashima

Online Learning to Rank with Top-k Feedback

We consider two settings of online learning to rank where feedback is restricted to top-ranked items. The problem is cast as an online game between a learner and a sequence of users over $T$ rounds. In both settings, the learner's objective…

Machine Learning · Computer Science · 2016-08-24 · Sougata Chaudhuri, Ambuj Tewari

Online Learning to Rank with Features

We introduce a new model for online ranking in which the click probability factors into an examination and attractiveness function and the attractiveness function is a linear function of a feature vector and an unknown parameter. Only…

Machine Learning · Statistics · 2019-05-28 · Shuai Li, Tor Lattimore, Csaba Szepesvári

A First Look at Selection Bias in Preference Elicitation for Recommendation

Preference elicitation explicitly asks users what kind of recommendations they would like to receive. It is a popular technique for conversational recommender systems to deal with cold-starts. Previous work has studied selection bias in…

Information Retrieval · Computer Science · 2024-05-02 · Shashank Gupta, Harrie Oosterhuis, Maarten de Rijke

Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design

We study reinforcement learning from human feedback in general Markov decision processes, where agents learn from trajectory-level preference comparisons. A central challenge in this setting is to design algorithms that select informative…

Machine Learning · Computer Science · 2025-12-05 · Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour

Online Learning and Profit Maximization from Revealed Preferences

We consider the problem of learning from revealed preferences in an online setting. In our framework, each period a consumer buys an optimal bundle of goods from a merchant according to her (linear) utility function and current prices,…

Data Structures and Algorithms · Computer Science · 2014-12-02 · Kareem Amin, Rachel Cummings, Lili Dworkin, Michael Kearns, Aaron Roth