Related papers: Sequential Learning without Feedback

200 papers

In many security and healthcare systems, the detection and diagnosis systems use a sequence of sensors/tests. Each test outputs a prediction of the latent state and carries an inherent cost. However, the correctness of the predictions…

Machine Learning · Computer Science 2019-03-05 Arun Verma , Manjesh K. Hanawal , Csaba Szepesvári , Venkatesh Saligrama

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated…

Machine Learning · Computer Science 2020-10-26 Arun Verma , Manjesh K. Hanawal , Csaba Szepesvári , Venkatesh Saligrama

This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…

Machine Learning · Computer Science 2022-12-23 Arun Verma

This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…

Machine Learning · Computer Science 2023-01-30 Arun Verma

Thompson Sampling has generated significant interest due to its better empirical performance than upper confidence bound based algorithms. In this paper, we study a Thompson Sampling based algorithm for Unsupervised Sequential Selection (USS)…

Machine Learning · Computer Science 2020-09-17 Arun Verma , Manjesh K. Hanawal , Nandyala Hemachandra

We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between…

Statistics Theory · Mathematics 2023-06-26 Xu Kuang , Stefan Wager

Model selection in supervised learning provides costless guarantees as if the model that best balances bias and variance was known a priori. We study the feasibility of similar guarantees for cumulative regret minimization in the stochastic…

Machine Learning · Computer Science 2023-10-25 Sanath Kumar Krishnamurthy , Adrienne Margaret Propp , Susan Athey

A sensing policy for the restless multi-armed bandit problem with stationary but unknown reward distributions is proposed. The work is presented in the context of cognitive radios in which the bandit problem arises when deciding which parts…

Information Theory · Computer Science 2012-11-20 Jan Oksanen , Visa Koivunen , H. Vincent Poor

We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization…

Machine Learning · Computer Science 2024-02-12 Yunbei Xu , Assaf Zeevi

We address the problem of learning to control an unknown nonlinear dynamical system through sequential interactions. Motivated by high-stakes applications in which mistakes can be catastrophic, such as robotics and healthcare, we study…

Machine Learning · Computer Science 2025-04-14 James Wang , Bruce D. Lee , Ingvar Ziemann , Nikolai Matni

We study regret minimization in a stochastic multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness. Considering broad classes of underlying arms'…

Machine Learning · Computer Science 2020-06-23 Kumar Ashutosh , Jayakrishnan Nair , Anmol Kagrecha , Krishna Jagannathan

In high-stakes AI applications, even a single action can cause irreparable damage. However, nearly all of sequential decision-making theory assumes that all errors are recoverable (e.g., by bounding rewards). Standard bandit algorithms that…

Machine Learning · Computer Science 2026-04-14 Sarah Liaw , Benjamin Plaut

We study the linear bandit problem that accounts for partially observable features. Without proper handling, unobserved features can lead to linear regret in the decision horizon $T$, as their influence on rewards is unknown. To tackle this…

Machine Learning · Statistics 2025-08-19 Wonyoung Kim , Sungwoo Park , Garud Iyengar , Assaf Zeevi , Min-hwan Oh

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie , Bert Huang

We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of the…

Machine Learning · Computer Science 2015-06-22 Manjesh Kumar Hanawal , Venkatesh Saligrama , Michal Valko , Rémi Munos

Linear bandits have a wide variety of applications including recommendation systems yet they make one strong assumption: the algorithms must know an upper bound $S$ on the norm of the unknown parameter $\theta^*$ that governs the reward…

Machine Learning · Statistics 2022-05-04 Spencer Gales , Sunder Sethuraman , Kwang-Sung Jun

We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators…

Machine Learning · Computer Science 2026-03-30 Alberto Rumi , Andrew Jacobsen , Nicolò Cesa-Bianchi , Fabio Vitale

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly…

Machine Learning · Statistics 2013-07-29 Mohammad Gheshlaghi Azar , Alessandro Lazaric , Emma Brunskill

We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the loss function itself is unknown. For this problem, with either a…

Machine Learning · Computer Science 2020-08-14 Paula Gradu , John Hallman , Elad Hazan

We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system extending recent results for the special case of stochastic…

Machine Learning · Computer Science 2023-11-02 Thodoris Lykouris , Max Simchowitz , Aleksandrs Slivkins , Wen Sun