
Semi-supervised Active Regression

Machine Learning 2021-06-15 v1

Abstract

Labelled data often comes at a high cost, as it may require recruiting human labelers or running costly experiments. At the same time, in many practical scenarios, one already has access to a partially labelled, potentially biased dataset that can help with the learning task at hand. Motivated by such settings, we formally initiate a study of semi-supervised active learning through the frame of linear regression. In this setting, the learner has access to a dataset $X \in \mathbb{R}^{(n_1+n_2) \times d}$ which is composed of $n_1$ unlabelled examples that an algorithm can actively query, and $n_2$ examples labelled a priori. Concretely, denoting the true labels by $Y \in \mathbb{R}^{n_1 + n_2}$, the learner's objective is to find $\widehat{\beta} \in \mathbb{R}^d$ such that \begin{equation} \| X \widehat{\beta} - Y \|_2^2 \le (1 + \epsilon) \min_{\beta \in \mathbb{R}^d} \| X \beta - Y \|_2^2 \end{equation} while making as few additional label queries as possible. In order to bound the label queries, we introduce an instance-dependent parameter called the reduced rank, denoted by $R_X$, and propose an efficient algorithm with query complexity $O(R_X/\epsilon)$. This result directly implies improved upper bounds for two important special cases: (i) active ridge regression, and (ii) active kernel ridge regression, where the reduced rank equates to the statistical dimension $sd_\lambda$ and the effective dimension $d_\lambda$ of the problem, respectively, and $\lambda \ge 0$ denotes the regularization parameter. For active ridge regression we also prove a matching lower bound of $O(sd_\lambda / \epsilon)$ on the query complexity of any algorithm. This subsumes prior work that only considered the unregularized case, i.e., $\lambda = 0$.
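To make the objective concrete, the following is a minimal NumPy sketch of the $(1+\epsilon)$ criterion and of the statistical dimension. It is not the paper's algorithm. The function names `is_eps_approx` and `statistical_dimension` are hypothetical, and the formula $sd_\lambda = \sum_i \sigma_i^2 / (\sigma_i^2 + \lambda)$, with $\sigma_i$ the singular values of $X$, is the standard definition, assumed here because the abstract does not state it.

```python
import numpy as np


def is_eps_approx(X, Y, beta_hat, eps):
    """Check ||X beta_hat - Y||^2 <= (1 + eps) * min_beta ||X beta - Y||^2.

    Uses all labels in Y, so it only verifies a candidate solution;
    it says nothing about how many labels were queried to obtain it.
    """
    beta_star, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares optimum
    opt = np.sum((X @ beta_star - Y) ** 2)
    err = np.sum((X @ beta_hat - Y) ** 2)
    return bool(err <= (1.0 + eps) * opt)


def statistical_dimension(X, lam):
    """Assumed definition: sum_i sigma_i^2 / (sigma_i^2 + lam)."""
    s2 = np.linalg.svd(X, compute_uv=False) ** 2
    s2 = s2[s2 > 0]  # drop zero singular values to avoid 0/0 when lam == 0
    return float(np.sum(s2 / (s2 + lam)))
```

Under this assumed definition, `statistical_dimension(X, 0.0)` equals the rank of $X$ and the value shrinks as $\lambda$ grows, which matches the abstract's remark that $\lambda = 0$ recovers the unregularized case.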

Cite

@article{arxiv.2106.06676,
  title  = {Semi-supervised Active Regression},
  author = {Fnu Devvrit and Nived Rajaraman and Pranjal Awasthi},
  journal= {arXiv preprint arXiv:2106.06676},
  year   = {2021}
}