Regularized Data Programming with Automated Bayesian Prior Selection

Jacqueline R. M. A. Maasch; Hao Zhang; Qian Yang; Fei Wang; Volodymyr Kuleshov

Machine Learning 2023-10-26 v2

Abstract

The cost of manual data labeling can be a significant obstacle in supervised learning. Data programming (DP) offers a weakly supervised solution for training dataset creation, wherein the outputs of user-defined programmatic labeling functions (LFs) are reconciled through unsupervised learning. However, DP can fail to outperform an unweighted majority vote in some scenarios, including low-data contexts. This work introduces a Bayesian extension of classical DP that mitigates failures of unsupervised learning by augmenting the DP objective with regularization terms. Regularized learning is achieved through maximum a posteriori estimation with informative priors. Majority vote is proposed as a proxy signal for automated prior parameter selection. Results suggest that regularized DP improves performance relative to maximum likelihood and majority voting, confers greater interpretability, and bolsters performance in low-data regimes.
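The ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes binary labeling functions that output +1, -1, or 0 (abstain), computes an unweighted majority vote, and uses each LF's agreement with that vote as the mode of a Beta prior for a MAP estimate of a Bernoulli accuracy parameter. The function names, the `strength` parameter, and the Beta-Bernoulli model are all assumptions made for illustration.

```python
import numpy as np

def majority_vote(L):
    """Unweighted majority vote over LF outputs in {-1, 0, +1}.

    L has shape (n_examples, n_lfs); 0 means the LF abstained.
    Ties (including all-abstain rows) map to 0.
    """
    return np.sign(L.sum(axis=1)).astype(int)

def mv_agreement(L, mv):
    """Fraction of each LF's non-abstain votes that match the majority vote.

    Rows where the majority vote is a tie (0) are ignored.
    """
    voted = (L != 0) & (mv[:, None] != 0)
    agree = (L == mv[:, None]) & voted
    return agree.sum(axis=0) / np.maximum(voted.sum(axis=0), 1)

def beta_map(successes, trials, prior_mode, strength=10.0):
    """MAP estimate of a Bernoulli parameter under a Beta(a, b) prior.

    The prior is parameterized (hypothetically) so that its mode equals
    prior_mode and `strength` acts like a pseudo-count:
        a = 1 + strength * prior_mode
        b = 1 + strength * (1 - prior_mode)
    which gives MAP = (successes + strength * prior_mode) / (trials + strength).
    """
    a = 1.0 + strength * prior_mode
    b = 1.0 + strength * (1.0 - prior_mode)
    return (successes + a - 1.0) / (trials + a + b - 2.0)
```

In this toy model, an LF observed to be correct on 2 of 2 examples has a maximum likelihood accuracy of 1.0, while `beta_map(2, 2, prior_mode=0.75, strength=10.0)` gives 9.5 / 12, pulled toward the majority-vote proxy. The fewer observations there are, the more the prior dominates, which is the sense in which an informative prior regularizes estimation in low-data settings.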

Keywords

regularization; semi-supervised learning; machine learning theory

Cite

@article{arxiv.2210.08677,
  title  = {Regularized Data Programming with Automated Bayesian Prior Selection},
  author = {Jacqueline R. M. A. Maasch and Hao Zhang and Qian Yang and Fei Wang and Volodymyr Kuleshov},
  journal= {arXiv preprint arXiv:2210.08677},
  year   = {2023}
}
