Related papers: Sequential Learning without Feedback
In many security and healthcare systems, the detection and diagnosis systems use a sequence of sensors/tests. Each test outputs a prediction of the latent state and carries an inherent cost. However, the correctness of the predictions…
In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated…
This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…
This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…
Thompson Sampling has generated significant interest due to its better empirical performance than upper confidence bound based algorithms. In this paper, we study Thompson Sampling based algorithm for Unsupervised Sequential Selection (USS)…
We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between…
Model selection in supervised learning provides costless guarantees as if the model that best balances bias and variance was known a priori. We study the feasibility of similar guarantees for cumulative regret minimization in the stochastic…
A sensing policy for the restless multi-armed bandit problem with stationary but unknown reward distributions is proposed. The work is presented in the context of cognitive radios in which the bandit problem arises when deciding which parts…
We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization…
We address the problem of learning to control an unknown nonlinear dynamical system through sequential interactions. Motivated by high-stakes applications in which mistakes can be catastrophic, such as robotics and healthcare, we study…
We study regret minimization in a stochastic multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness. Considering broad classes of underlying arms'…
In high-stakes AI applications, even a single action can cause irreparable damage. However, nearly all of sequential decision-making theory assumes that all errors are recoverable (e.g., by bounding rewards). Standard bandit algorithms that…
We study the linear bandit problem that accounts for partially observable features. Without proper handling, unobserved features can lead to linear regret in the decision horizon $T$, as their influence on rewards is unknown. To tackle this…
In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…
We consider stochastic sequential learning problems where the learner can observe the \textit{average reward of several actions}. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of the…
Linear bandits have a wide variety of applications including recommendation systems yet they make one strong assumption: the algorithms must know an upper bound $S$ on the norm of the unknown parameter $\theta^*$ that governs the reward…
We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators…
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly…
We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the loss function itself is unknown. For this problem, with either a…
We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system extending recent results for the special case of stochastic…