Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning

Tao Liu; Qi Xu; Wei Shi; Zhigang Hua; Shuang Yang

Machine Learning 2025-01-13 v1

Abstract

Session-level dynamic ad load optimization aims to personalize the density and types of delivered advertisements in real time during a user's online session by dynamically balancing user experience quality and ad monetization. Traditional causal learning-based approaches struggle with key technical challenges, especially in handling confounding bias and distribution shifts. In this paper, we develop an offline deep Q-network (DQN)-based framework that effectively mitigates confounding bias in dynamic systems and demonstrates more than 80% offline gains compared to the best causal learning-based production baseline. Moreover, to improve the framework's robustness against unanticipated distribution shifts, we further enhance it with a novel offline robust dueling DQN approach. This approach achieves more stable rewards on multiple OpenAI-Gym datasets as perturbations increase, and provides an additional 5% offline gain on real-world ad delivery data. Deployed across multiple production systems, our approach has achieved outsized topline gains. Post-launch online A/B tests have shown double-digit improvements in the engagement-ad score trade-off efficiency, significantly enhancing our platform's capability to serve both consumers and advertisers.

Keywords

reinforcement learning; hyperparameter optimization; online advertising

Cite

@article{arxiv.2501.05591,
  title  = {Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning},
  author = {Tao Liu and Qi Xu and Wei Shi and Zhigang Hua and Shuang Yang},
  journal= {arXiv preprint arXiv:2501.05591},
  year   = {2025}
}

Comments

Will appear in KDD 2025
