Related papers: Sequential Learning without Feedback

Online Algorithm for Unsupervised Sensor Selection

In many security and healthcare systems, the detection and diagnosis systems use a sequence of sensors/tests. Each test outputs a prediction of the latent state and carries an inherent cost. However, the correctness of the predictions…

Machine Learning · Computer Science 2019-03-05 Arun Verma , Manjesh K. Hanawal , Csaba Szepesvári , Venkatesh Saligrama

Online Algorithm for Unsupervised Sequential Selection with Contextual Information

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated…

Machine Learning · Computer Science 2020-10-26 Arun Verma , Manjesh K. Hanawal , Csaba Szepesvári , Venkatesh Saligrama

Sequential Decision Problems with Weak Feedback

This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…

Machine Learning · Computer Science 2022-12-23 Arun Verma

Synopsis: Sequential Decision Problems with Weak Feedback

This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…

Machine Learning · Computer Science 2023-01-30 Arun Verma

Thompson Sampling for Unsupervised Sequential Selection

Thompson Sampling has generated significant interest due to its better empirical performance than upper confidence bound based algorithms. In this paper, we study a Thompson Sampling based algorithm for Unsupervised Sequential Selection (USS)…

Machine Learning · Computer Science 2020-09-17 Arun Verma , Manjesh K. Hanawal , Nandyala Hemachandra

Weak Signal Asymptotics for Sequentially Randomized Experiments

We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between…

Statistics Theory · Mathematics 2023-06-26 Xu Kuang , Stefan Wager

Towards Costless Model Selection in Contextual Bandits: A Bias-Variance Perspective

Model selection in supervised learning provides costless guarantees as if the model that best balances bias and variance was known a priori. We study the feasibility of similar guarantees for cumulative regret minimization in the stochastic…

Machine Learning · Computer Science 2023-10-25 Sanath Kumar Krishnamurthy , Adrienne Margaret Propp , Susan Athey

A Sensing Policy Based on Confidence Bounds and a Restless Multi-Armed Bandit Model

A sensing policy for the restless multi-armed bandit problem with stationary but unknown reward distributions is proposed. The work is presented in the context of cognitive radios in which the bandit problem arises when deciding which parts…

Information Theory · Computer Science 2012-11-20 Jan Oksanen , Visa Koivunen , H. Vincent Poor

Bayesian Design Principles for Frequentist Sequential Learning

We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization…

Machine Learning · Computer Science 2024-02-12 Yunbei Xu , Assaf Zeevi

Logarithmic Regret for Nonlinear Control

We address the problem of learning to control an unknown nonlinear dynamical system through sequential interactions. Motivated by high-stakes applications in which mistakes can be catastrophic, such as robotics and healthcare, we study…

Machine Learning · Computer Science 2025-04-14 James Wang , Bruce D. Lee , Ingvar Ziemann , Nikolai Matni

Bandit algorithms: Letting go of logarithmic regret for statistical robustness

We study regret minimization in a stochastic multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness. Considering broad classes of underlying arms'…

Machine Learning · Computer Science 2020-06-23 Kumar Ashutosh , Jayakrishnan Nair , Anmol Kagrecha , Krishna Jagannathan

Learning When Not to Learn: Risk-Sensitive Abstention in Bandits with Unbounded Rewards

In high-stakes AI applications, even a single action can cause irreparable damage. However, nearly all of sequential decision-making theory assumes that all errors are recoverable (e.g., by bounding rewards). Standard bandit algorithms that…

Machine Learning · Computer Science 2026-04-14 Sarah Liaw , Benjamin Plaut

Linear Bandits with Partially Observable Features

We study the linear bandit problem that accounts for partially observable features. Without proper handling, unobserved features can lead to linear regret in the decision horizon $T$, as their influence on rewards is unknown. To tackle this…

Machine Learning · Statistics 2025-08-19 Wonyoung Kim , Sungwoo Park , Garud Iyengar , Assaf Zeevi , Min-hwan Oh

Data Consistency for Weakly Supervised Learning

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie , Bert Huang

Cheap Bandits

We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of the…

Machine Learning · Computer Science 2015-06-22 Manjesh Kumar Hanawal , Venkatesh Saligrama , Michal Valko , Rémi Munos

Norm-Agnostic Linear Bandits

Linear bandits have a wide variety of applications including recommendation systems yet they make one strong assumption: the algorithms must know an upper bound $S$ on the norm of the unknown parameter $\theta^*$ that governs the reward…

Machine Learning · Statistics 2022-05-04 Spencer Gales , Sunder Sethuraman , Kwang-Sung Jun

Parameter-Free Dynamic Regret for Unconstrained Linear Bandits

We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators…

Machine Learning · Computer Science 2026-03-30 Alberto Rumi , Andrew Jacobsen , Nicolò Cesa-Bianchi , Fabio Vitale

Sequential Transfer in Multi-armed Bandit with Finite Set of Models

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly…

Machine Learning · Statistics 2013-07-29 Mohammad Gheshlaghi Azar , Alessandro Lazaric , Emma Brunskill

Non-Stochastic Control with Bandit Feedback

We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the loss function itself is unknown. For this problem, with either a…

Machine Learning · Computer Science 2020-08-14 Paula Gradu , John Hallman , Elad Hazan

Corruption-robust exploration in episodic reinforcement learning

We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system extending recent results for the special case of stochastic…

Machine Learning · Computer Science 2023-11-02 Thodoris Lykouris , Max Simchowitz , Aleksandrs Slivkins , Wen Sun