Related papers: Mode Estimation with Partial Feedback

Contextual bandits with entropy-based human feedback

In recent years, preference-based human feedback mechanisms have become essential for enhancing model performance across diverse applications, including conversational AI systems such as ChatGPT. However, existing approaches often neglect…

Artificial Intelligence · Computer Science 2025-02-14 Raihan Seraj, Lili Meng, Tristan Sylvain

Sample-efficient estimation of entanglement entropy through supervised learning

We explore a supervised machine learning approach to estimate the entanglement entropy of multi-qubit systems from few experimental samples. We put a particular focus on estimating both aleatoric and epistemic uncertainty of the network's…

Quantum Physics · Physics 2024-01-04 Maximilian Rieger, Moritz Reh, Martin Gärttner

Efficient learning by implicit exploration in bandit problems with side observations

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition…

Machine Learning · Computer Science 2026-04-28 Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos

Efficient Online Conformal Selection with Limited Feedback

We address the problem of conformal selection, where an agent must select a minimal subset of options to ensure that at least one "success" is identified with a pre-specified target probability $\phi$. While traditional online conformal…

Machine Learning · Computer Science 2026-05-15 Sreenivas Gollapudi, Kostas Kollias, Kamesh Munagala, Ali Sinop

Being Patient and Persistent: Optimizing An Early Stopping Strategy for Deep Learning in Profiled Attacks

The absence of an algorithm that effectively monitors deep learning models used in side-channel attacks increases the difficulty of evaluation. If the attack is unsuccessful, the question is if we are dealing with a resistant implementation…

Cryptography and Security · Computer Science 2021-11-30 Servio Paguada, Lejla Batina, Ileana Buhan, Igor Armendariz

Robust Class-Conditional Distribution Alignment for Partial Domain Adaptation

Unwanted samples from private source categories in the learning objective of a partial domain adaptation setup can lead to negative transfer and reduce classification performance. Existing methods, such as re-weighting or aggregating target…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Sandipan Choudhuri, Arunabha Sen

Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications

Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models. We survey and extend recent…

Machine Learning · Computer Science 2023-11-15 Johannes Kirschner, Tor Lattimore, Andreas Krause

Human Activity Learning and Segmentation using Partially Hidden Discriminative Models

Learning and understanding the typical patterns in the daily activities and routines of people from low-level sensory data is an important problem in many application domains such as building smart environments, or providing intelligent…

Machine Learning · Computer Science 2014-08-14 Truyen Tran, Hung Bui, Svetha Venkatesh

EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control

Long-term training of large language models (LLMs) requires maintaining stable exploration to prevent the model from collapsing into sub-optimal behaviors. Entropy is crucial in this context, as it controls exploration and helps avoid…

Machine Learning · Computer Science 2026-02-03 Kai Yang, Xin Xu, Yangkun Chen, Weijie Liu, Jiafei Lyu, Zichuan Lin, Deheng Ye, Saiyong Yang

Online Variance Reduction for Stochastic Optimization

Modern stochastic optimization methods often rely on uniform sampling which is agnostic to the underlying characteristics of the data. This might degrade the convergence by yielding estimates that suffer from a high variance. A possible…

Machine Learning · Statistics 2018-06-07 Zalán Borsos, Andreas Krause, Kfir Y. Levy

Stochastic Online Conformal Prediction with Semi-Bandit Feedback

Conformal prediction has emerged as an effective strategy for uncertainty quantification by modifying a model to output sets of labels instead of a single label. These prediction sets come with the guarantee that they contain the true label…

Machine Learning · Computer Science 2025-05-28 Haosen Ge, Hamsa Bastani, Osbert Bastani

Causal Bandits: Learning Good Interventions via Causal Inference

We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-arm bandits and causal inference to model a novel type of bandit…

Machine Learning · Statistics 2016-06-13 Finnian Lattimore, Tor Lattimore, Mark D. Reid

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

We present and study a partial-information model of online learning, where a decision maker repeatedly chooses from a finite set of actions, and observes some subset of the associated losses. This naturally models several situations where…

Machine Learning · Computer Science 2014-10-01 Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

Dynamic Feedback Engines: Layer-Wise Control for Self-Regulating Continual Learning

Continual learning aims to acquire new tasks while preserving performance on previously learned ones, but most methods struggle with catastrophic forgetting. Existing approaches typically treat all layers uniformly, often trading stability…

Machine Learning · Computer Science 2025-12-29 Hengyi Wu, Zhenyi Wang, Heng Huang

IRL with Partial Observations using the Principle of Uncertain Maximum Entropy

The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while constrained to match empirically estimated feature expectations. However, in many real-world…

Machine Learning · Computer Science 2022-08-16 Kenneth Bogert, Yikang Gui, Prashant Doshi

Online Conformal Prediction with Adversarial Semi-bandit Feedback via Regret Minimization

Uncertainty quantification is crucial in safety-critical systems, where decisions must be made under uncertainty. In particular, we consider the problem of online uncertainty quantification, where data points arrive sequentially. Online…

Machine Learning · Computer Science 2026-04-21 Junyoung Yang, Kyungmin Kim, Sangdon Park

Utility-based Dueling Bandits as a Partial Monitoring Game

Partial monitoring is a generic framework for sequential decision-making with incomplete feedback. It encompasses a wide class of problems such as dueling bandits, learning with expert advice, dynamic pricing, dark pools, and label…

Machine Learning · Computer Science 2024-06-27 Pratik Gajane, Tanguy Urvoy

Semantic Segmentation with Active Semi-Supervised Learning

Using deep learning, we now have the ability to create exceptionally good semantic segmentation systems; however, collecting the prerequisite pixel-wise annotations for training images remains expensive and time-consuming. Therefore, it…

Computer Vision and Pattern Recognition · Computer Science 2022-10-19 Aneesh Rangnekar, Christopher Kanan, Matthew Hoffman

Non-stochastic Bandits With Evolving Observations

We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the…

Machine Learning · Computer Science 2024-05-28 Yogev Bar-On, Yishay Mansour

Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget

We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is…

Machine Learning · Computer Science 2022-10-17 Jasmin Brandt, Viktor Bengs, Björn Haddenhorst, Eyke Hüllermeier