Related papers: Matching-Based Policy Learning
The ability to exploit prior experience to solve novel problems rapidly is a hallmark of biological learning systems and of great practical importance for artificial ones. In the meta reinforcement learning literature much recent work has…
Policy learning can be used to extract individualized treatment regimes from observational data in healthcare, civics, e-commerce, and beyond. One big hurdle to policy learning is a commonplace lack of overlap in the data for different…
Conventional treatment policies map patient covariates to a single recommended intervention in order to maximize expected clinical outcomes. Although a rich body of causal inference methods has been developed to estimate such policies,…
We introduce a new preference-based framework for conditional treatment effect estimation and policy learning, built on the Conditional Preference-based Treatment Effect (CPTE). CPTE requires only that outcomes be ranked under a preference…
Causal inference methods are widely applied in the fields of medicine, policy, and economics. Central to these applications is the estimation of treatment effects to make decisions. Current methods make binary yes-or-no decisions based on…
We present a new approach to the problems of evaluating and learning personalized decision policies from observational data of past contexts, decisions, and outcomes. Only the outcome of the enacted decision is available and the historical…
Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern in doing so is that, even if the randomized trial was well-executed…
This study proposes an end-to-end algorithm for policy learning in causal inference. We observe data consisting of covariates, treatment assignments, and outcomes, where only the outcome corresponding to the assigned treatment is observed.…
Empirical researchers and decision-makers spanning various domains frequently seek profound insights into the long-term impacts of interventions. While the significance of long-term outcomes is undeniable, an overemphasis on them may…
Efficiently allocating treatments with a budget constraint constitutes an important challenge across various domains. In marketing, for example, the use of promotions to target potential customers and boost conversions is limited by the…
Many policies allocate harms or benefits that are uncertain in nature: they produce distributions over the population in which individuals have different probabilities of incurring harm or benefit. Comparing different policies thus involves…
Many social programs attempt to allocate scarce resources to people with the greatest need. Indeed, public services increasingly use algorithmic risk assessments motivated by this goal. However, targeting the highest-need recipients often…
Policy evaluation estimates the performance of a policy by (1) collecting data from the environment and (2) processing raw data into a meaningful estimate. Due to the sequential nature of reinforcement learning, any improper data-collecting…
Machine learning (ML) estimates of conditional average treatment effects (CATE) can guide policy decisions, either by allowing targeting of individuals with beneficial CATE estimates, or as inputs to decision trees that optimise overall…
Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in…
Transfer learning of prediction models has been extensively studied, while the corresponding policy learning approaches are rarely discussed. In this paper, we propose principled approaches for learning the optimal policy in the target…
Preference-based learning of reward functions, where the reward function is learned using comparison data, has been well studied for complex robotic tasks such as autonomous driving. Existing algorithms have focused on learning reward…
There are many economic contexts where the productivity and welfare performance of institutions and policies depend on who matches with whom. Examples include caseworkers and job seekers in job search assistance programs, medical doctors…
Learning optimal policies from historical data enables personalization in a wide variety of applications including healthcare, digital recommendations, and online education. The growing policy learning literature focuses on settings where…
Policy learning utilizing observational data is pivotal across various domains, with the objective of learning the optimal treatment assignment policy while adhering to specific constraints such as fairness, budget, and simplicity. This…