Related papers: Learning MDPs from Features: Predict-Then-Optimize…
Many real-world decision processes are modeled by optimization problems whose defining parameters are unknown and must be inferred from observable data. The Predict-Then-Optimize framework uses machine learning models to predict unknown…
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from exogenous features before solving. This setting is common to many real-world decision processes, and recently it…
The predict-then-optimize framework arises in a wide variety of applications where the unknown cost coefficients of an optimization problem are first predicted based on contextual features and then used to solve the problem. In this work,…
We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm. Previous studies showed that robust MDPs, based on a minimax approach to handle uncertainty, can be solved using dynamic…
We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing…
Contextual optimization, also known as predict-then-optimize or prescriptive analytics, considers an optimization problem with the presence of covariates (context or side information). The goal is to learn a prediction model (from the…
With the decreasing cost of data collection, the space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially. Therefore, identifying the most characterizing features…
We present an integrated prediction-optimization (PredOpt) framework to efficiently solve sequential decision-making problems by predicting the values of binary decision variables in an optimal solution. We address the key issues of…
In the last years decision-focused learning framework, also known as predict-and-optimize, have received increasing attention. In this setting, the predictions of a machine learning model are used as estimated cost coefficients in the…
We study a class of sequential decision-making problems with augmented predictions, potentially provided by a machine learning algorithm. In this setting, the decision-maker receives prediction intervals for unknown parameters that become…
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small…
Markov decision processes (MDPs) are used to model a wide variety of applications ranging from game playing over robotics to finance. Their optimal policy typically maximizes the expected sum of rewards given at each step of the decision…
Creating impact in real-world settings requires artificial intelligence techniques to span the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is…
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which…
Bayesian approaches developed to solve the optimal design of sequential experiments are mathematically elegant but computationally challenging. Recently, techniques using amortization have been proposed to make these Bayesian approaches…
Optimal designs are usually model-dependent and likely to be sub-optimal if the postulated model is not correctly specified. In practice, it is common that a researcher has a list of candidate models at hand and a design has to be found…
In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take…
Many sequential decision-making problems that are currently automated, such as those in manufacturing or recommender systems, operate in an environment where there is either little uncertainty, or zero risk of catastrophe. As companies and…
We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs to the…
Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task that greatly depends on the data available in the learning phase. Sometimes the dynamics of the model is invariant with respect to some…