Data Augmentation for Imbalanced Regression

Machine Learning 2023-02-21 v1 Machine Learning Methodology

Abstract

In this work, we consider the problem of imbalanced data in a regression framework where the imbalance concerns continuous or discrete covariates. Such imbalance can bias the estimates. To address it, we propose a data augmentation algorithm that combines a weighted resampling (WR) procedure with a data augmentation (DA) procedure. In the first step, the DA procedure explores a wider support than the initial one. In the second step, the WR method drives the distribution of the exogenous variables toward a target distribution. We discuss the choice of the DA procedure through a numerical study that illustrates the advantages of this approach. Finally, we study an actuarial application.
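The two steps described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the DA generator, the weights, or the target, so everything concrete here is an assumption made for illustration. The DA step is assumed to be naive Gaussian jitter of each observation, the target covariate distribution is assumed to be uniform on a chosen interval, and the WR weights are assumed to be the ratio of the target density to a histogram estimate of the augmented covariate density. The names `augment`, `histogram_density`, and `weighted_resample` are hypothetical.

```python
import random


def augment(data, n_copies=5, sigma_x=0.3, sigma_y=0.1, rng=random):
    """DA step (assumption: Gaussian jitter). Keeps the original points and
    adds n_copies noisy copies of each, which widens the covariate support."""
    out = list(data)
    for x, y in data:
        for _ in range(n_copies):
            out.append((x + rng.gauss(0, sigma_x), y + rng.gauss(0, sigma_y)))
    return out


def histogram_density(xs, lo, hi, n_bins):
    """Histogram estimate of the density of xs on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in xs:
        if lo <= x <= hi:
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    return [c / (len(xs) * width) for c in counts], width


def weighted_resample(data, lo, hi, n_bins=10, size=None, rng=random):
    """WR step (assumption: uniform target on [lo, hi]). Each point is drawn
    with probability proportional to target density / estimated density."""
    xs = [x for x, _ in data]
    dens, width = histogram_density(xs, lo, hi, n_bins)
    target = 1.0 / (hi - lo)
    weights = []
    for x in xs:
        if lo <= x <= hi:
            idx = min(int((x - lo) / width), n_bins - 1)
            weights.append(target / dens[idx])  # bin contains x, so dens > 0
        else:
            weights.append(0.0)  # outside the target support
    return rng.choices(data, weights=weights, k=size or len(data))


# Toy imbalanced sample: the covariate x is concentrated near 0.
rng = random.Random(0)
data = []
for _ in range(500):
    x = rng.expovariate(2.0)
    data.append((x, 2.0 * x + rng.gauss(0, 0.1)))

augmented = augment(data, rng=rng)                               # step 1: DA
balanced = weighted_resample(augmented, 0.0, 2.0, rng=rng)       # step 2: WR
```

A regression fitted on `balanced` sees the covariate roughly evenly over the interval instead of concentrated near 0. Independent jitter on both coordinates is only a placeholder; it can distort the relationship between x and y, which is one reason the choice of DA procedure matters.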

Cite

@article{arxiv.2302.09288,
  title  = {Data Augmentation for Imbalanced Regression},
  author = {Samuel Stocksieker and Denys Pommeret and Arthur Charpentier},
  journal= {arXiv preprint arXiv:2302.09288},
  year   = {2023}
}

Comments

Paper accepted at the AISTATS 2023 conference; to be published in PMLR (Proceedings of Machine Learning Research).