Related papers: Off-policy Learning for Remote Electrical Tilt Opt…

Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach

Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity. In this paper, we devise algorithms learning optimal tilt control policies from existing data (in the…

Machine Learning · Computer Science · 2022-01-07 · Filippo Vannella, Alexandre Proutiere, Yassir Jedra, Jaeseong Jeong
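The abstract above is truncated and does not show the authors' algorithm. The sketch below is only an illustration of what learning a tilt policy "from existing data" under a linear contextual-bandit model can look like. It uses the generic direct method: fit one ridge-regression reward model per action on logged (context, action, reward) triples, then act greedily. The helper names `fit_per_arm_ridge` and `greedy_policy`, the penalty `lam`, and the assumption that rewards are linear in the context features are all hypothetical choices made for this sketch. None of them come from the paper.

```python
import numpy as np


def fit_per_arm_ridge(contexts, actions, rewards, n_arms, lam=1.0):
    """Fit one ridge-regression reward model per arm from logged data.

    Hypothetical linear-reward assumption: E[r | x, a] = x @ theta_a.
    Returns an array of shape (n_arms, d) whose row a estimates theta_a.
    """
    X = np.asarray(contexts, dtype=float)
    A = np.asarray(actions)
    R = np.asarray(rewards, dtype=float)
    d = X.shape[1]
    thetas = np.zeros((n_arms, d))
    for a in range(n_arms):
        Xa, Ra = X[A == a], R[A == a]
        # Closed form: theta_a = (Xa^T Xa + lam * I)^{-1} Xa^T Ra.
        # The lam * I term keeps the system solvable even for rarely logged arms.
        thetas[a] = np.linalg.solve(Xa.T @ Xa + lam * np.eye(d), Xa.T @ Ra)
    return thetas


def greedy_policy(thetas, x):
    """Pick the arm with the highest predicted reward for context x."""
    return int(np.argmax(thetas @ np.asarray(x, dtype=float)))


if __name__ == "__main__":
    # Synthetic logged data: 2-dimensional contexts, 3 arms, uniformly random logging policy.
    rng = np.random.default_rng(0)
    true_thetas = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    X = rng.normal(size=(500, 2))
    A = rng.integers(0, 3, size=500)
    R = np.sum(X * true_thetas[A], axis=1) + 0.1 * rng.normal(size=500)
    thetas = fit_per_arm_ridge(X, A, R, n_arms=3, lam=1.0)
    print(greedy_policy(thetas, [2.0, -1.0]))
```

This direct-method sketch ignores how the logging policy chose its actions. The ridge penalty only keeps rarely logged actions numerically stable and does not correct for the resulting selection bias.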

Remote Electrical Tilt Optimization via Safe Reinforcement Learning

Remote Electrical Tilt (RET) optimization is an efficient method for adjusting the vertical tilt angle of Base Stations (BSs) antennas in order to optimize Key Performance Indicators (KPIs) of the network. Reinforcement Learning (RL)…

Machine Learning · Computer Science · 2021-01-18 · Filippo Vannella, Grigorios Iakovidis, Ezeddin Al Hakim, Erik Aumayr, Saman Feghhi

Remote Contextual Bandits

We consider a remote contextual multi-armed bandit (CMAB) problem, in which the decision-maker observes the context and the reward, but must communicate the actions to be taken by the agents over a rate-limited communication channel. This…

Information Theory · Computer Science · 2022-02-11 · Francesco Pase, Deniz Gündüz, Michele Zorzi

Multi-Agent Reinforcement Learning with Common Policy for Antenna Tilt Optimization

This paper presents a method for optimizing wireless networks by adjusting cell parameters that affect both the performance of the cell being optimized and the surrounding cells. The method uses multiple reinforcement learning agents that…

Systems and Control · Electrical Eng. & Systems · 2023-05-25 · Adriano Mendo, Jose Outes-Carnero, Yak Ng-Molina, Juan Ramiro-Moreno

Rate-Constrained Remote Contextual Bandits

We consider a rate-constrained contextual multi-armed bandit (RC-CMAB) problem, in which a group of agents are solving the same contextual multi-armed bandit (CMAB) problem. However, the contexts are observed by a remotely connected entity,…

Machine Learning · Computer Science · 2022-04-28 · Francesco Pase, Deniz Gündüz, Michele Zorzi

Optimal Regret for Policy Optimization in Contextual Bandits

We present the first high-probability optimal regret bound for a policy optimization technique applied to the problem of stochastic contextual multi-armed bandit (CMAB) with general offline function approximation. Our algorithm is both…

Machine Learning · Computer Science · 2026-02-17 · Orin Levy, Yishay Mansour

Offline Reinforcement Learning for Mobility Robustness Optimization

In this work we revisit the Mobility Robustness Optimisation (MRO) algorithm and study the possibility of learning the optimal Cell Individual Offset tuning using offline Reinforcement Learning. Such methods make use of collected offline…

Networking and Internet Architecture · Computer Science · 2025-07-01 · Pegah Alizadeh, Anastasios Giovanidis, Pradeepa Ramachandra, Vasileios Koutsoukis, Osama Arouk

Evaluation-Time Policy Switching for Offline Reinforcement Learning

Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting as they…

Machine Learning · Computer Science · 2025-03-18 · Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana

Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation

Recently, robust reinforcement learning (RL) methods designed to handle adversarial input observations have received significant attention, motivated by RL's inherent vulnerabilities. While existing approaches have demonstrated reasonable…

Machine Learning · Computer Science · 2025-06-23 · Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui, Shin Ishii

A Safe Reinforcement Learning Architecture for Antenna Tilt Optimisation

Safe interaction with the environment is one of the most challenging aspects of Reinforcement Learning (RL) when applied to real-world problems. This is particularly important when unsafe actions have a high or irreversible negative impact…

Machine Learning · Computer Science · 2021-10-22 · Erik Aumayr, Saman Feghhi, Filippo Vannella, Ezeddin Al Hakim, Grigorios Iakovidis

Offline Contextual Bandits for Wireless Network Optimization

The explosion in mobile data traffic together with the ever-increasing expectations for higher quality of service call for the development of AI algorithms for wireless network optimization. In this paper, we investigate how to learn…

Artificial Intelligence · Computer Science · 2021-11-17 · Miguel Suau, Alexandros Agapitos, David Lynch, Derek Farrell, Mingqi Zhou, Aleksandar Milenovic

Offline Learning for Combinatorial Multi-armed Bandits

The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making framework, extensively studied over the past decade. However, existing work primarily focuses on the online setting, overlooking the substantial costs…

Machine Learning · Computer Science · 2025-05-30 · Xutong Liu, Xiangxiang Dai, Jinhang Zuo, Siwei Wang, Carlee Joe-Wong, John C. S. Lui, Wei Chen

Restless Bandits with Individual Penalty Constraints: Near-Optimal Indices and Deep Reinforcement Learning

This paper investigates the Restless Multi-Armed Bandit (RMAB) framework under individual penalty constraints to address resource allocation challenges in dynamic wireless networked environments. Unlike conventional RMAB models, our model…

Machine Learning · Computer Science · 2026-04-20 · Nida Zamir, I-Hong Hou

Consistent On-Line Off-Policy Evaluation

The problem of on-line off-policy evaluation (OPE) has been actively studied in the last decade due to its importance both as a stand-alone problem and as a module in a policy improvement scheme. However, most Temporal Difference (TD) based…

Machine Learning · Statistics · 2017-02-24 · Assaf Hallak, Shie Mannor

On Minimax Optimal Offline Policy Evaluation

This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the multi-armed bandit case, establish a minimax…

Artificial Intelligence · Computer Science · 2014-09-15 · Lihong Li, Rémi Munos, Csaba Szepesvári
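The sketch below makes the off-policy evaluation setup in this abstract concrete for the multi-armed bandit case. It uses the textbook inverse-propensity-scoring (IPS) estimator, which reweights each logged reward by π(a)/μ(a): the target policy's probability of the logged arm divided by the logging policy's probability of that arm. IPS appears here only to illustrate the problem. It is not presented as the estimator the paper analyzes or proves minimax optimal. The sketch assumes the logging probabilities μ(a) are known and strictly positive for every arm the target policy can choose, and `ips_value` is a hypothetical helper name.

```python
import random
from typing import Sequence


def ips_value(actions: Sequence[int], rewards: Sequence[float],
              behavior_probs: Sequence[float],
              target_probs: Sequence[float]) -> float:
    """Inverse-propensity-scoring estimate of a target policy's mean reward.

    actions[i], rewards[i]: arm pulled and reward observed in logged round i.
    behavior_probs[a]: probability mu(a) that the logging policy pulls arm a.
    target_probs[a]: probability pi(a) that the target policy pulls arm a.
    """
    if len(actions) != len(rewards) or len(actions) == 0:
        raise ValueError("need equally long, non-empty action/reward logs")
    total = 0.0
    for a, r in zip(actions, rewards):
        # Importance weight pi(a) / mu(a) corrects for the logging policy's choices.
        total += target_probs[a] / behavior_probs[a] * r
    return total / len(actions)


if __name__ == "__main__":
    # Synthetic log: uniform logging policy over 3 Bernoulli arms (assumed means).
    rng = random.Random(0)
    means = [0.2, 0.5, 0.8]
    mu = [1 / 3, 1 / 3, 1 / 3]
    acts = [rng.randrange(3) for _ in range(10_000)]
    rews = [1.0 if rng.random() < means[a] else 0.0 for a in acts]
    # The target policy always pulls arm 2, so its true value is 0.8.
    print(ips_value(acts, rews, mu, [0.0, 0.0, 1.0]))
```

IPS is unbiased under these assumptions, but its variance grows with the largest weight π(a)/μ(a). That variance is why estimator choice matters once the logging and target policies differ sharply.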

Deep Reinforcement Learning Policies for Underactuated Satellite Attitude Control

Autonomy is a key challenge for future space exploration endeavours. Deep Reinforcement Learning holds the promises for developing agents able to learn complex behaviours simply by interacting with their environment. This paper investigates…

Robotics · Computer Science · 2025-05-02 · Matteo El Hariry, Andrea Cini, Giacomo Mellone, Alessandro Balossino

Exploring Offline Policy Evaluation for the Continuous-Armed Bandit Problem

The (contextual) multi-armed bandit problem (MAB) provides a formalization of sequential decision-making which has many applications. However, validly evaluating MAB policies is challenging; we either resort to simulations which inherently…

Machine Learning · Computer Science · 2019-08-22 · Jules Kruijswijk, Petri Parvinen, Maurits Kaptein

A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback

We investigate the problem of stochastic, combinatorial multi-armed bandits where the learner only has access to bandit feedback and the reward function can be non-linear. We provide a general framework for adapting discrete offline…

Machine Learning · Computer Science · 2023-10-13 · Guanyu Nie, Yididiya Y Nadew, Yanhui Zhu, Vaneet Aggarwal, Christopher John Quinn

Non-Stationary Off-Policy Optimization

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to…

Machine Learning · Computer Science · 2021-04-06 · Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we…

Machine Learning · Computer Science · 2023-03-31 · Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth