Related papers: Preference-based Teaching

Preference-Based Batch and Sequential Teaching

Algorithmic machine teaching studies the interaction between a teacher and a learner where the teacher selects labeled examples aiming at teaching a target hypothesis. In a quest to lower teaching complexity, several teaching models and…

Machine Learning · Computer Science 2020-10-21 Farnam Mansouri, Yuxin Chen, Ara Vartanian, Xiaojin Zhu, Adish Singla

Quadratic Upper Bound for Recursive Teaching Dimension of Finite VC Classes

In this work we study the quantitative relation between the recursive teaching dimension (RTD) and the VC dimension (VCD) of concept classes of finite sizes. The RTD of a concept class $\mathcal C \subseteq \{0, 1\}^n$, introduced by Zilles…

Machine Learning · Computer Science 2017-02-21 Lunjia Hu, Ruihan Wu, Tianhong Li, Liwei Wang

Preference-Based Batch and Sequential Teaching: Towards a Unified View of Models

Algorithmic machine teaching studies the interaction between a teacher and a learner where the teacher selects labeled examples aiming at teaching a target hypothesis. In a quest to lower teaching complexity and to achieve more natural…

Machine Learning · Computer Science 2019-10-25 Farnam Mansouri, Yuxin Chen, Ara Vartanian, Xiaojin Zhu, Adish Singla

Online Learning with Preference Feedback

We propose a new online learning model for learning with preference feedback. The model is especially suited for applications like web search and recommender systems, where preference data is readily available from implicit user feedback…

Machine Learning · Computer Science 2011-11-04 Pannagadatta K. Shivaswamy, Thorsten Joachims

Finite Biased Teaching with Infinite Concept Classes

We investigate the teaching of infinite concept classes through the effect of the learning bias (which is used by the learner to prefer some concepts over others and by the teacher to devise the teaching examples) and the sampling bias…

Artificial Intelligence · Computer Science 2018-04-20 Jose Hernandez-Orallo, Jan Arne Telle

Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data

Large language models (LLMs) generally utilize a consistent data distribution throughout the pretraining process. However, as the model's capability improves, it is intuitive that its data preferences dynamically change, indicating the need…

Computation and Language · Computer Science 2025-02-18 Xuemiao Zhang, Liangyu Xu, Feiyu Duan, Yongwei Zhou, Sirui Wang, Rongxiang Weng, Jingang Wang, Xunliang Cai

Teaching and compressing for low VC-dimension

In this work we study the quantitative relation between VC-dimension and two other basic parameters related to learning and teaching. Namely, the quality of sample compression schemes and of teaching sets for classes of low VC-dimension.…

Machine Learning · Computer Science 2016-11-28 Shay Moran, Amir Shpilka, Avi Wigderson, Amir Yehudayoff

The complexity of unsupervised learning of lexicographic preferences

This paper considers the task of learning users' preferences on a combinatorial set of alternatives, as generally used by online configurators, for example. In many settings, only a set of selected alternatives during past interactions is…

Artificial Intelligence · Computer Science 2022-09-26 Hélène Fargier, Pierre-François Gimenez, Jérôme Mengin, Bao Ngoc Le Nguyen

Preference-based Reinforcement Learning with Finite-Time Guarantees

Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning by preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or…

Machine Learning · Computer Science 2020-10-27 Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

Recognising Multidimensional Euclidean Preferences

Euclidean preferences are a widely studied preference model, in which decision makers and alternatives are embedded in d-dimensional Euclidean space. Decision makers prefer those alternatives closer to them. This model, also known as…

Computer Science and Game Theory · Computer Science 2016-02-29 Dominik Peters

The Sample Complexity of Teaching-by-Reinforcement on Q-Learning

We study the sample complexity of teaching, termed as "teaching dimension" (TDim) in the literature, for the teaching-by-reinforcement paradigm, where the teacher guides the student through rewards. This is distinct from the…

Machine Learning · Computer Science 2021-03-09 Xuezhou Zhang, Shubham Kumar Bharti, Yuzhe Ma, Adish Singla, Xiaojin Zhu

What Does Preference Learning Recover from Pairwise Comparison Data?

Pairwise preference learning is central to machine learning, with recent applications in aligning language models with human preferences. A typical dataset consists of triplets $(x, y^+, y^-)$, where response $y^+$ is preferred over…

Machine Learning · Computer Science 2026-02-12 Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar

Preference-based Conditional Treatment Effects and Policy Learning

We introduce a new preference-based framework for conditional treatment effect estimation and policy learning, built on the Conditional Preference-based Treatment Effect (CPTE). CPTE requires only that outcomes be ranked under a preference…

Machine Learning · Statistics 2026-02-04 Dovid Parnas, Mathieu Even, Julie Josse, Uri Shalit

A Systematic Examination of Preference Learning through the Lens of Instruction-Following

Preference learning is a widely adopted post-training technique that aligns large language models (LLMs) to human preferences and improves specific downstream task capabilities. In this work we systematically investigate how specific…

Computation and Language · Computer Science 2024-12-23 Joongwon Kim, Anirudh Goyal, Aston Zhang, Bo Xiong, Rui Hou, Melanie Kambadur, Dhruv Mahajan, Hannaneh Hajishirzi, Liang Tan

Classifying the Arithmetical Complexity of Teaching Models

This paper classifies the complexity of various teaching models by their position in the arithmetical hierarchy. In particular, we determine the arithmetical complexity of the index sets of the following classes: (1) the class of uniformly…

Logic · Mathematics 2016-10-28 Achilles A. Beros, Ziyuan Gao, Sandra Zilles

Probabilistic Parameterized Polynomial Time

We examine a parameterized complexity class for randomized computation where only the error bound and not the full runtime is allowed to depend more than polynomially on the parameter, based on a proposal by Kwisthout in [15,16]. We prove…

Computational Complexity · Computer Science 2018-11-06 Nils Donselaar

PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning

Preference-based reinforcement learning (PbRL) has emerged as a promising approach for learning behaviors from human feedback without predefined reward functions. However, current PbRL methods face a critical challenge in effectively…

Artificial Intelligence · Computer Science 2025-06-17 Brahim Driss, Alex Davey, Riad Akrour

Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences

Preference-based reinforcement learning (PBRL) learns directly from the preferences of human teachers regarding agent behaviors without needing meticulously designed reward functions. However, existing PBRL methods often learn primarily…

Machine Learning · Computer Science 2024-10-16 Ziang Liu, Junjie Xu, Xingjiao Wu, Jing Yang, Liang He

Lower Bounds for Greedy Teaching Set Constructions

A fundamental open problem in learning theory is to characterize the best-case teaching dimension $\operatorname{TS}_{\min}$ of a concept class $\mathcal{C}$ with finite VC dimension $d$. Resolving this problem will, in particular, settle…

Machine Learning · Statistics 2025-05-07 Spencer Compton, Chirag Pabbaraju, Nikita Zhivotovskiy

A model of discrete choice based on reinforcement learning under short-term memory

A family of models of individual discrete choice is constructed by means of statistical averaging of choices made by a subject in a reinforcement learning process, where the subject has a short, k-term memory span. The choice probabilities…

Econometrics · Economics 2019-08-20 Misha Perepelitsa