Related papers: Parameterized Exploration

Exploration via linearly perturbed loss minimisation

We introduce exploration via linear loss perturbations (EVILL), a randomised exploration method for structured stochastic bandit problems that works by solving for the minimiser of a linearly perturbed regularised negative log-likelihood…

Machine Learning · Computer Science 2024-03-07 David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári

Perturbed-History Exploration in Stochastic Multi-Armed Bandits

We propose an online algorithm for cumulative regret minimization in a stochastic multi-armed bandit. The algorithm adds $O(t)$ i.i.d. pseudo-rewards to its history in round $t$ and then pulls the arm with the highest average reward in its…

Machine Learning · Computer Science 2019-11-06 Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier

Non-Asymptotic Pure Exploration by Solving Games

Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes and take few samples. Lower bounds (for multi-armed…

Machine Learning · Statistics 2019-06-26 Rémy Degenne, Wouter M. Koolen, Pierre Ménard

Sequential Design of Experiments via Linear Programming

The celebrated multi-armed bandit problem in decision theory models the basic trade-off between exploration, or learning about the state of a system, and exploitation, or utilizing the system. In this paper we study the variant of the…

Data Structures and Algorithms · Computer Science 2013-06-19 Sudipto Guha, Kamesh Munagala

Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design

We study reinforcement learning from human feedback in general Markov decision processes, where agents learn from trajectory-level preference comparisons. A central challenge in this setting is to design algorithms that select informative…

Machine Learning · Computer Science 2025-12-05 Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour

Parameterized Complexity Analysis of Randomized Search Heuristics

This chapter compiles a number of results that apply the theory of parameterized algorithmics to the running-time analysis of randomized search heuristics such as evolutionary algorithms. The parameterized approach articulates the running…

Neural and Evolutionary Computing · Computer Science 2020-01-16 Frank Neumann, Andrew M. Sutton

Hyper-parameter Tuning for the Contextual Bandit

We study the problem of learning the exploration-exploitation trade-off in the contextual bandit problem with a linear reward function. In the traditional algorithms that solve the contextual bandit problem, the exploration is a…

Machine Learning · Computer Science 2020-05-06 Djallel Bouneffouf, Emmanuelle Claeys

Navigating to the Best Policy in Markov Decision Processes

We investigate the classical active pure exploration problem in Markov Decision Processes, where the agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible. We…

Machine Learning · Statistics 2021-10-26 Aymen Al Marjani, Aurélien Garivier, Alexandre Proutiere

Optimization of Epsilon-Greedy Exploration

Modern recommendation systems rely on exploration to learn user preferences for new items, typically implementing uniform exploration policies (e.g., epsilon-greedy) due to their simplicity and compatibility with machine learning (ML)…

Machine Learning · Computer Science 2025-06-05 Ethan Che, Hakan Ceylan, James McInerney, Nathan Kallus

An Extended Treatment of Uncertainty Constrained Robotic Exploration: An Integrated Exploration Planner

Efficient robotic exploration of unknown, sensor limited, global-information-deficient environments poses unique challenges to path planning algorithms. In these difficult environments, no deterministic guarantees on path completion and…

Robotics · Computer Science 2017-05-01 Alexander Ivanov, Mark Campbell

Rebounding Bandits for Modeling Satiation Effects

Psychological research shows that enjoyment of many goods is subject to satiation, with short-term satisfaction declining after repeated exposures to the same item. Nevertheless, proposed algorithms for powering recommender systems seldom…

Machine Learning · Computer Science 2021-10-28 Liu Leqi, Fatma Kilinc-Karzan, Zachary C. Lipton, Alan L. Montgomery

Meta-Learning for Contextual Bandit Exploration

We describe MELEE, a meta-learning algorithm for learning a good exploration policy in the interactive contextual bandit setting. Here, an algorithm must take actions based on contexts, and learn based only on a reward signal from the…

Machine Learning · Computer Science 2019-01-25 Amr Sharaf, Hal Daumé

Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy

Multi-armed bandits (MAB) are commonly used in sequential online decision-making when the reward of each decision is an unknown random variable. In practice however, the typical goal of maximizing total reward may be less important than…

Machine Learning · Computer Science 2025-12-22 Ishank Juneja, Carlee Joe-Wong, Osman Yağan

Systematic Analysis for Pretrained Language Model Priming for Parameter-Efficient Fine-tuning

Parameter-efficient (PE) methods (like Prompts or Adapters) for adapting pre-trained language models (PLM) to downstream tasks have been popular recently. However, hindrances still prevent these methods from reaching their full potential.…

Computation and Language · Computer Science 2024-05-31 Shih-Cheng Huang, Shih-Heng Wang, Min-Han Shih, Saurav Sahay, Hung-yi Lee

Multiplier Bootstrap-based Exploration

Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty. In this paper, we propose Multiplier Bootstrap-based…

Machine Learning · Computer Science 2023-02-06 Runzhe Wan, Haoyu Wei, Branislav Kveton, Rui Song

Bounded Optimal Exploration in MDP

Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration. However,…

Artificial Intelligence · Computer Science 2016-04-06 Kenji Kawaguchi

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized…

Machine Learning · Computer Science 2020-03-23 Tian Tan, Zhihan Xiong, Vikranth R. Dwaracherla

SEA-PARAM: Exploring Schedulers in Parametric MDPs

We study parametric Markov decision processes (PMDPs) and their reachability probabilities "independent" of the parameters. Different to existing work on parameter synthesis (implemented in the tools PARAM and PRISM), our main focus is on…

Logic in Computer Science · Computer Science 2017-07-14 Sebastian Arming, Ezio Bartocci, Ana Sokolova

Probabilistic Exploration in Planning while Learning

Sequential decision tasks with incomplete information are characterized by the exploration problem; namely the trade-off between further exploration for learning more about the environment and immediate exploitation of the accrued…

Artificial Intelligence · Computer Science 2013-02-21 Grigoris I. Karakoulas

Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models

We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models. Our algorithm guarantees safety by leveraging Lipschitz-continuity to ensure that no unsafe states are visited during…

Robotics · Computer Science 2020-06-05 Erdem Bıyık, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh