Online Active Regression

Machine Learning 2022-08-31 v2 Data Structures and Algorithms Machine Learning

Abstract

Active regression considers a linear regression problem where the learner receives a large number of data points but can only observe a small number of labels. Since online algorithms can deal with incremental training data and take advantage of low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately decides whether it should collect the corresponding labels. The goal is to efficiently maintain the regression of received data points with a small budget of label queries. We propose novel algorithms for this problem under $\ell_p$ loss where $p\in[1,2]$. To achieve a $(1+\epsilon)$-approximate solution, our proposed algorithms only require $\tilde{\mathcal{O}}(\epsilon^{-1} d \log(n\kappa))$ queries of labels, where $n$ is the number of data points and $\kappa$ is a quantity, called the condition number, of the data points. The numerical results verify our theoretical results and show that our methods have comparable performance with offline active regression algorithms.
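The online setting described in the abstract can be illustrated with a minimal sketch for the special case $p = 2$. It uses online ridge leverage score sampling, a standard technique swapped in here for illustration; it is not claimed to be the authors' algorithm, and it carries none of the paper's guarantees. The function name `online_active_l2`, the callback `query_label`, and the parameters `oversample` and `ridge` are all hypothetical.

```python
import numpy as np

def online_active_l2(A, query_label, oversample=4.0, ridge=1e-3, seed=0):
    """Stream the rows of A; query a label only when the row is sampled.

    Illustrative sketch (assumption): online ridge leverage score sampling
    for the p = 2 case, not the algorithm from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    gram = ridge * np.eye(d)  # regularized Gram matrix of the rows seen so far
    kept_rows, kept_labels = [], []
    for i in range(n):
        a = A[i]
        gram += np.outer(a, a)
        # Online ridge leverage score: depends only on rows 0..i, lies in [0, 1).
        tau = float(a @ np.linalg.solve(gram, a))
        prob = min(1.0, oversample * tau)
        # The decision is made immediately and never revisited.
        if rng.random() < prob:
            # Reweight so the sampled squared loss is an unbiased estimate
            # of the full squared loss.
            scale = 1.0 / np.sqrt(prob)
            kept_rows.append(scale * a)
            kept_labels.append(scale * query_label(i))
    x_hat, *_ = np.linalg.lstsq(
        np.array(kept_rows), np.array(kept_labels), rcond=None
    )
    return x_hat, len(kept_rows)

# Synthetic usage: labels stay hidden unless the learner asks for them.
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true  # noise-free labels (an assumption made to keep the demo simple)
x_hat, num_queries = online_active_l2(A, lambda i: b[i])
```

Here `num_queries` counts how many labels were actually requested, which is the quantity the abstract's query bound refers to. In this sketch the sampling probabilities shrink as more rows arrive, so most later points are skipped without their labels ever being read.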

Cite

@article{arxiv.2207.05945,
  title  = {Online Active Regression},
  author = {Cheng Chen and Yi Li and Yiming Sun},
  journal= {arXiv preprint arXiv:2207.05945},
  year   = {2022}
}

Comments

A preliminary version appeared in the Proceedings of the 39th International Conference on Machine Learning (ICML 2022), PMLR 162, pp. 3320--3335, 2022. v2: optimal dependence on $\epsilon$ in query complexity.