Linear regression with unmatched data: a deconvolution perspective

Mona Azadkia; Fadoua Balabdaoui

Subjects: Statistics Theory; Methodology. Version 3, 2023-09-19.

Abstract

Consider the regression problem where the response $Y\in\mathbb{R}$ and the covariate $X\in\mathbb{R}^d$ for $d\geq 1$ are unmatched. In this scenario, we do not have access to pairs of observations from the distribution of $(X, Y)$; instead, we have separate datasets $\{Y_i\}_{i=1}^n$ and $\{X_j\}_{j=1}^m$, possibly collected from different sources. We study this problem assuming that the regression function is linear and the noise distribution is known or can be estimated. We introduce an estimator of the regression vector based on deconvolution and establish its consistency and asymptotic normality under an identifiability assumption. In the general case, we show that our estimator (DLSE: Deconvolution Least Squared Estimator) is consistent in terms of an extended $\ell_2$ norm. Building on this result, we devise a method for semi-supervised learning, that is, for the setting where a small sample of matched pairs $(X_k, Y_k)$ is also available. Several applications with synthetic and real datasets are considered to illustrate the theory.
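The abstract does not spell out the DLSE criterion, so the following is only a minimal sketch of the deconvolution idea under stated assumptions: the noise is $N(0, \sigma^2)$ with $\sigma$ known, and $\beta$ is chosen to minimize a grid approximation of the squared $L_2$ distance between the empirical CDF of the $Y_i$ and the CDF of $\beta^\top X + \varepsilon$ estimated from the separate $X_j$ sample. The covariate design, the helper names (`draw_covariates`, `deconv_ls_objective`, `fit_dlse_sketch`, `resolve_signs_with_matched`) and the sign-flip selection used for the matched pairs are hypothetical illustrations, not the authors' procedure. Because $X_2$ is symmetric in this design, $\beta$ and $(\beta_1, -\beta_2)$ induce the same distribution of $\beta^\top X$, so the unmatched samples alone cannot resolve the sign of $\beta_2$; this is an instance of the non-identifiability that an identifiability assumption is meant to exclude.

```python
import itertools

import numpy as np
from scipy.optimize import minimize
from scipy.special import ndtr  # standard normal CDF


def draw_covariates(rng, size):
    """Hypothetical design: X1 ~ Exp(1) (skewed), X2 ~ Uniform(-1, 1) (symmetric)."""
    return np.column_stack([rng.exponential(1.0, size), rng.uniform(-1.0, 1.0, size)])


def deconv_ls_objective(beta, y, x, sigma, grid):
    """Grid approximation of int (F_n(t) - F_beta(t))^2 dt, assuming N(0, sigma^2) noise.

    F_n is the empirical CDF of the unmatched responses y, and
    F_beta(t) = (1/m) * sum_j Phi((t - x_j' beta) / sigma) is the CDF of beta'X + noise
    estimated from the separate covariate sample x.
    """
    f_n = np.searchsorted(np.sort(y), grid, side="right") / y.size
    f_beta = ndtr((grid[:, None] - x @ beta) / sigma).mean(axis=1)
    return np.sum((f_n - f_beta) ** 2) * (grid[1] - grid[0])


def fit_dlse_sketch(y, x, sigma, starts, n_grid=200):
    """Multi-start Nelder-Mead minimisation of the objective above (non-convex in beta)."""
    grid = np.linspace(y.min() - 1.0, y.max() + 1.0, n_grid)
    best = None
    for b0 in starts:
        res = minimize(deconv_ls_objective, np.asarray(b0, dtype=float),
                       args=(y, x, sigma, grid), method="Nelder-Mead",
                       options={"xatol": 1e-4, "fatol": 1e-10, "maxiter": 600})
        if best is None or res.fun < best.fun:
            best = res
    return best.x


def resolve_signs_with_matched(beta_hat, x_lab, y_lab):
    """Hypothetical semi-supervised step: among coordinate-wise sign flips of beta_hat,
    return the one with the smallest mean squared error on the few matched pairs."""
    flips = itertools.product((1.0, -1.0), repeat=beta_hat.size)
    candidates = [beta_hat * np.array(s) for s in flips]
    errors = [np.mean((y_lab - x_lab @ b) ** 2) for b in candidates]
    return candidates[int(np.argmin(errors))]


rng = np.random.default_rng(0)
beta_true, sigma, n, m = np.array([1.0, -3.0]), 0.5, 1000, 1000
x = draw_covariates(rng, m)  # covariate sample {X_j}
# responses {Y_i}, generated from an independent covariate draw (unmatched with x)
y = draw_covariates(rng, n) @ beta_true + sigma * rng.standard_normal(n)
beta_hat = fit_dlse_sketch(y, x, sigma, starts=[(0.5, 1.0), (0.5, -1.0)])

x_lab = draw_covariates(rng, 20)  # small matched sample (X_k, Y_k)
y_lab = x_lab @ beta_true + sigma * rng.standard_normal(20)
beta_ss = resolve_signs_with_matched(beta_hat, x_lab, y_lab)
print("deconvolution fit:", beta_hat, "after matched-pair sign selection:", beta_ss)
```

Several starting points are used because the criterion is non-convex (here it has two symmetric minima). With only a handful of matched pairs, the correct sign pattern is typically easy to pick out, since a wrong sign inflates the residuals by a term proportional to $|\beta_2|$.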

Keywords

nonparametric regression, statistical estimation, covariance estimation

Cite

@article{arxiv.2207.06320,
  title  = {Linear regression with unmatched data: a deconvolution perspective},
  author = {Mona Azadkia and Fadoua Balabdaoui},
  journal= {arXiv preprint arXiv:2207.06320},
  year   = {2023}
}

Comments

57 pages, 3 tables, 9 figures