Related papers: IMO$^3$: Interactive Multi-Objective Off-Policy Op…

Pessimistic Off-Policy Multi-Objective Optimization

Multi-objective optimization is a type of decision-making problem in which multiple conflicting objectives are optimized. We study offline optimization of multi-objective policies from data collected by an existing policy. We propose a…

Machine Learning · Computer Science 2023-10-31 Shima Alizadeh, Aniruddha Bhargava, Karthick Gopalswamy, Lalit Jain, Branislav Kveton, Ge Liu

Offline Multi-Objective Optimization

Offline optimization aims to maximize a black-box objective function with a static dataset and has wide applications. In addition to the objective function being black-box and expensive to evaluate, numerous complex real-world problems…

Machine Learning · Computer Science 2024-06-07 Ke Xue, Rong-Xi Tan, Xiaobin Huang, Chao Qian

Optimizing Interactive Systems via Data-Driven Objectives

Effective optimization is essential for real-world interactive systems to provide a satisfactory user experience in response to changing user behavior. However, it is often challenging to find an objective to optimize for interactive…

Artificial Intelligence · Computer Science 2020-06-24 Ziming Li, Julia Kiseleva, Alekh Agarwal, Maarten de Rijke, Ryen W. White

Detection of Hidden Objectives and Interactive Objective Reduction

In multi-objective optimization problems, there might exist hidden objectives that are important to the decision-maker but are not being optimized. On the other hand, there might also exist irrelevant objectives that are being optimized but…

Optimization and Control · Mathematics 2022-06-06 Seyed Mahdi Shavarani, Manuel López-Ibáñez, Richard Allmendinger

Transductive Off-policy Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

Logging Policy Design for Off-Policy Evaluation

Off-policy evaluation (OPE) estimates the value of a target treatment policy (e.g., a recommender system) using data collected by a different logging policy. It enables high-stakes experimentation without live deployment, yet in practice…

Machine Learning · Statistics 2026-05-18 Connor Douglas, Joel Persson, Foster Provost

Explicit Multi-objective Model Predictive Control for Nonlinear Systems Under Uncertainty

In real-world problems, uncertainties (e.g., errors in the measurement, precision errors) often lead to poor performance of numerical algorithms when not explicitly taken into account. This is also the case for control problems, where…

Optimization and Control · Mathematics 2020-12-18 Carlos Ignacio Hernández Castellanos, Sina Ober-Blöbaum, Sebastian Peitz

A Distributional View on Multi-Objective Policy Optimization

Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over…

Machine Learning · Computer Science 2020-05-18 Abbas Abdolmaleki, Sandy H. Huang, Leonard Hasenclever, Michael Neunert, H. Francis Song, Martina Zambelli, Murilo F. Martins, Nicolas Heess, Raia Hadsell, Martin Riedmiller

Objective Function Designing Led by User Preferences Acquisition

Many real-world problems can be defined as optimisation problems in which the aim is to maximise an objective function. The quality of the obtained solution is directly linked to the pertinence of the objective function used. However, designing…

Machine Learning · Computer Science 2012-04-24 Patrick Taillandier, Julien Gaffuri

Selected multi-criteria decision-making methods and their applications to product and system design

Optimization has found numerous applications in engineering, particularly since the 1960s. Many optimization applications in engineering have more than one objective (or performance criterion). Such applications require multi-objective (or…

Chemical Physics · Physics 2024-07-16 Zhiyuan Wang, Seyed Reza Nabavi, Gade Pandu Rangaiah

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

In the field of reinforcement learning, because of the high cost and risk of policy training in the real world, policies are trained in a simulation environment and transferred to the corresponding real-world environment. However, the…

Machine Learning · Computer Science 2023-01-12 Takumi Tanabe, Rei Sato, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto

$\Delta\text{-}{\rm OPE}$: Off-Policy Estimation with Pairs of Policies

The off-policy paradigm casts recommendation as a counterfactual decision-making task, allowing practitioners to unbiasedly estimate online metrics using offline data. This leads to effective evaluation metrics, as well as learning…

Machine Learning · Computer Science 2024-09-17 Olivier Jeunen, Aleksei Ustimenko

Multi-objective Active Control Policy Design for Commensurate and Incommensurate Fractional Order Chaotic Financial Systems

In this paper, an active control policy design for a fractional order (FO) financial system is attempted, considering multiple conflicting objectives. An active control template as a nonlinear state feedback mechanism is developed and the…

Optimization and Control · Mathematics 2016-11-30 Indranil Pan, Saptarshi Das, Shantanu Das

Benchmarks for Deep Off-Policy Evaluation

Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making. The ability to learn offline is particularly important in many…

Machine Learning · Computer Science 2021-04-01 Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine

Non-Stationary Off-Policy Optimization

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to…

Machine Learning · Computer Science 2021-04-06 Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can…

Machine Learning · Statistics 2023-02-03 Yang Xu, Jin Zhu, Chengchun Shi, Shikai Luo, Rui Song

Off-Policy Optimization of Portfolio Allocation Policies under Constraints

The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a…

Artificial Intelligence · Computer Science 2020-12-23 Nymisha Bandi, Theja Tulabandhula

Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It

There has been a growing interest in off-policy evaluation in the literature such as recommender systems and personalized medicine. We have so far seen significant progress in developing estimators aimed at accurately estimating the…

Machine Learning · Computer Science 2024-04-24 Yuta Saito, Masahiro Nomura

Learning Data-Driven Objectives to Optimize Interactive Systems

Effective optimization is essential for interactive systems to provide a satisfactory user experience. However, it is often challenging to find an objective to optimize for. Generally, such objectives are manually crafted and rarely capture…

Artificial Intelligence · Computer Science 2019-12-17 Ziming Li, Julia Kiseleva, Alekh Agarwal, Maarten de Rijke

Offline Multi-Action Policy Learning: Generalization and Optimization

In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as…

Machine Learning · Statistics 2018-11-20 Zhengyuan Zhou, Susan Athey, Stefan Wager