Related papers: Learning in Networked Control Systems
We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices $A$ and $B$ may be initially unknown. We devise an online learning algorithm and provide guarantees…
We study the problem of adaptively controlling a known discrete-time nonlinear system subject to unmodeled disturbances. We prove the first finite-time regret bounds for adaptive nonlinear control with matched uncertainty in the stochastic…
This paper investigates the challenges of optimal online policy learning under missing data. State-of-the-art algorithms implicitly assume that rewards are always observable. I show that when rewards are missing at random, the Upper…
We propose an online learning algorithm that adaptively designs a decentralized linear quadratic regulator when the system model is unknown a priori and new data samples from a single system trajectory become progressively available. The…
We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions. The challenges of this problem arise from the tension among safety, exploration,…
By leveraging the representation power of deep neural networks, neural upper confidence bound (UCB) algorithms have shown success in contextual bandits. To further balance the exploration and exploitation, we propose…
TWe establish regret lower bounds for adaptively controlling an unknown linear Gaussian system with quadratic costs. We combine ideas from experiment design, estimation theory and a perturbation bound of certain information matrices to…
We study a collaborative multi-agent stochastic linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward…
We study continuous-time online learning where data are generated by a diffusion process with unknown coefficients. The learner employs a two-layer neural network, continuously updating its parameters in a non-anticipative manner. The…
Understanding how to efficiently learn while adhering to safety constraints is essential for using online reinforcement learning in practical applications. However, proving rigorous regret bounds for safety-constrained reinforcement…
We study an asynchronous online learning setting with a network of agents. At each time step, some of the agents are activated, requested to make a prediction, and pay the corresponding loss. The loss function is then revealed to these…
A Network Control System (NCS) consists of control components that interact with the plant over a shared network. The system dynamics of a NCS could be subject to noise arising from randomness in the times at which the data is transmitted…
We address the problem of simultaneously learning and control in an online receding horizon control setting. We consider the control of an unknown linear dynamical system with general cost functions and affine constraints on the control…
This paper studies adaptive targeting under network interference in a bandit setting, where treatments applied to one individual may affect others through spillover effects. We consider a linear model in a sparse regime, where each…
We study the problem of online non-stochastic control (ONC), which is the control of a linear system under adversarial disturbances and adversarial cost functions, with the aim of minimizing the total cost incurred. A recent line of…
Adaptive experimentation under unknown network interference requires solving two coupled problems: (i) learning the underlying dynamics of interference among units and (ii) using these dynamics to inform treatment allocation in order to…
This paper deals with the design of scheduling logics for networked control systems (NCSs) whose communication networks have limited capacity and are prone to data losses. Our contributions are twofold. First, we present a probabilistic…
We address the problem of learning to control an unknown nonlinear dynamical system through sequential interactions. Motivated by high-stakes applications in which mistakes can be catastrophic, such as robotics and healthcare, we study…
We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain…
This paper investigates the problem of controlling a linear system under possibly unbounded stochastic noise with unknown convex cost functions, known as an online control problem. In contrast to the existing work, which assumes the…