Related papers: Offline A/B testing for Recommender Systems

Debiased Off-Policy Evaluation for Recommendation Systems

Efficient methods to evaluate new algorithms are critical for improving interactive bandit and reinforcement learning systems such as recommendation systems. A/B tests are reliable, but are time- and money-consuming, and entail a risk of…

Machine Learning · Computer Science · 2021-08-04 · Yusuke Narita, Shota Yasui, Kohei Yata

Counterfactually Evaluating Explanations in Recommender Systems

Modern recommender systems face an increasing need to explain their recommendations. Despite considerable progress in this area, evaluating the quality of explanations remains a significant challenge for researchers and practitioners. Prior…

Artificial Intelligence · Computer Science · 2022-11-18 · Yuanshun Yao, Chong Wang, Hang Li

Off-Policy Evaluation and Counterfactual Methods in Dynamic Auction Environments

Counterfactual estimators are critical for learning and refining policies using logged data, a process known as Off-Policy Evaluation (OPE). OPE allows researchers to assess new policies without costly experiments, speeding up the…

Artificial Intelligence · Computer Science · 2025-01-10 · Ritam Guha, Nilavra Pathak

Bridging Offline-Online Evaluation with a Time-dependent and Popularity Bias-free Offline Metric for Recommenders

The evaluation of recommendation systems is a complex task. The offline and online evaluation metrics for recommender systems are ambiguous in their true objectives. The majority of recently published papers benchmark their methods using…

Information Retrieval · Computer Science · 2023-08-15 · Petr Kasalický, Rodrigo Alves, Pavel Kordík

Estimating Error and Bias in Offline Evaluation Results

Offline evaluations of recommender systems attempt to estimate users' satisfaction with recommendations using static data from prior user interactions. These evaluations provide researchers and developers with first approximations of the…

Information Retrieval · Computer Science · 2020-01-28 · Mucun Tian, Michael D. Ekstrand

Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation

Both in academic and industry-based research, online evaluation methods are seen as the gold standard for interactive applications like recommendation systems. Naturally, the reason for this is that we can directly measure utility metrics…

Information Retrieval · Computer Science · 2022-09-20 · Imad Aouali, Amine Benhalloum, Martin Bompaire, Benjamin Heymann, Olivier Jeunen, David Rohde, Otmane Sakhi, Flavian Vasile

Counterfactual Inference under Thompson Sampling

Recommender systems exemplify sequential decision-making under uncertainty, strategically deciding what content to serve to users, to optimise a range of potential objectives. To balance the explore-exploit trade-off successfully, Thompson…

Information Retrieval · Computer Science · 2025-07-09 · Olivier Jeunen

Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking

Evaluation plays a crucial role in the development of ranking algorithms on search and recommender systems. It enables online platforms to create user-friendly features that drive commercial success in a steady and effective manner. The…

Information Retrieval · Computer Science · 2025-08-04 · Qing Zhang, Alex Deng, Michelle Du, Huiji Gao, Liwei He, Sanjeev Katariya

Offline Policy Optimization with Eligible Actions

Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in…

Machine Learning · Computer Science · 2022-07-05 · Yao Liu, Yannis Flet-Berliac, Emma Brunskill

Effective Evaluation using Logged Bandit Feedback from Multiple Loggers

Accurately evaluating new policies (e.g. ad-placement models, ranking functions, recommendation functions) is one of the key prerequisites for improving interactive systems. While the conventional approach to evaluation relies on online A/B…

Machine Learning · Computer Science · 2017-06-27 · Aman Agarwal, Soumya Basu, Tobias Schnabel, Thorsten Joachims

Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation

In applying reinforcement learning (RL) to high-stakes domains, quantitative and qualitative evaluation using observational data can help practitioners understand the generalization performance of new policies. However, this type of…

Machine Learning · Computer Science · 2023-10-27 · Shengpu Tang, Jenna Wiens

Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model

A critical need for industrial recommender systems is the ability to evaluate recommendation policies offline, before deploying them to production. Unfortunately, widely used off-policy evaluation methods either make strong assumptions…

Machine Learning · Computer Science · 2022-10-19 · Alexander Buchholz, Ben London, Giuseppe di Benedetto, Thorsten Joachims

Widespread Flaws in Offline Evaluation of Recommender Systems

Even though offline evaluation is just an imperfect proxy of online performance -- due to the interactive nature of recommenders -- it will probably remain the primary way of evaluation in recommender systems research for the foreseeable…

Information Retrieval · Computer Science · 2023-07-28 · Balázs Hidasi, Ádám Tibor Czapp

A comparative study of counterfactual estimators

We provide a comparative study of several widely used off-policy estimators (Empirical Average, Basic Importance Sampling and Normalized Importance Sampling), detailing the different regimes where they are individually suboptimal. We then…

Machine Learning · Statistics · 2019-01-30 · Thomas Nedelec, Nicolas Le Roux, Vianney Perchet

CAB: Continuous Adaptive Blending Estimator for Policy Evaluation and Learning

The ability to perform offline A/B-testing and off-policy learning using logged contextual bandit feedback is highly desirable in a broad range of applications, including recommender systems, search engines, ad placement, and personalized…

Machine Learning · Computer Science · 2019-08-30 · Yi Su, Lequn Wang, Michele Santacatterina, Thorsten Joachims

On the Reliability of Sampling Strategies in Offline Recommender Evaluation

Offline evaluation plays a central role in benchmarking recommender systems when online testing is impractical or risky. However, it is susceptible to two key sources of bias: exposure bias, where users only interact with items they are…

Information Retrieval · Computer Science · 2025-08-12 · Bruno L. Pereira, Alan Said, Rodrygo L. T. Santos

Practical Improvements of A/B Testing with Off-Policy Estimation

We address the problem of A/B testing, a widely used protocol for evaluating the potential improvement achieved by a new decision system compared to a baseline. This protocol segments the population into two subgroups, each exposed to a…

Machine Learning · Statistics · 2025-06-16 · Otmane Sakhi, Alexandre Gilotte, David Rohde

Study of a bias in the offline evaluation of a recommendation algorithm

Recommendation systems have been integrated into the majority of large online systems to filter and rank information according to user profiles. They thus influence the way users interact with the system and, as a consequence, bias the…

Information Retrieval · Computer Science · 2015-11-05 · Arnaud De Myttenaere, Boris Golden, Bénédicte Le Grand, Fabrice Rossi

Online Evaluation Methods for the Causal Effect of Recommendations

Evaluating the causal effect of recommendations is an important objective because the causal effect on user interactions can directly lead to an increase in sales and user engagement. To select an optimal recommendation model, it is common…

Machine Learning · Computer Science · 2021-07-16 · Masahiro Sato

Online and Offline Evaluations of Collaborative Filtering and Content Based Recommender Systems

Recommender systems are widely used AI applications designed to help users efficiently discover relevant items. The effectiveness of such systems is tied to the satisfaction of both users and providers. However, user satisfaction is complex…

Information Retrieval · Computer Science · 2024-11-05 · Ali Elahi, Armin Zirak