Related papers: Semi-Supervised linear regression

Semi-supervised linear regression: enhancing efficiency and robustness in high dimensions

In semi-supervised learning, the prevailing understanding suggests that observing additional unlabeled samples improves estimation accuracy for linear parameters only in the case of model misspecification. In this work, we challenge such a…

Methodology · Statistics 2025-09-03 Kai Chen, Yuqian Zhang

Efficient and Adaptive Linear Regression in Semi-Supervised Settings

We consider the linear regression problem under semi-supervised settings wherein the available data typically consists of: (i) a small or moderate sized 'labeled' data, and (ii) a much larger sized 'unlabeled' data. Such data arises…

Methodology · Statistics 2018-07-02 Abhishek Chakrabortty, Tianxi Cai

Efficient semi-supervised inference for logistic regression under case-control studies

Semi-supervised learning has received increasing attention in statistics and machine learning. In semi-supervised learning settings, a labeled data set with both outcomes and covariates and an unlabeled data set with covariates only are…

Machine Learning · Statistics 2024-02-26 Zhuojun Quan, Yuanyuan Lin, Kani Chen, Wen Yu

Semi-supervised Inference: General Theory and Estimation of Means

We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of…

Methodology · Statistics 2018-08-15 Anru Zhang, Lawrence D. Brown, T. Tony Cai

Semi-supervised learning

Semi-supervised learning deals with the problem of how, if possible, to take advantage of a huge amount of not classified data, to perform classification, in situations when, typically, the labelled data are few. Even though this is not…

Statistics Theory · Mathematics 2017-12-18 Alejandro Cholaquidis, Ricardo Fraiman, Mariela Sued

Optimal and Safe Estimation for High-Dimensional Semi-Supervised Learning

We consider the estimation problem in high-dimensional semi-supervised learning. Our goal is to investigate when and how the unlabeled data can be exploited to improve the estimation of the regression parameters of the linear model in light of…

Methodology · Statistics 2023-03-21 Siyi Deng, Yang Ning, Jiwei Zhao, Heping Zhang

Semi-Supervised learning with Density-Ratio Estimation

In this paper, we study statistical properties of semi-supervised learning, which is considered as an important problem in the community of machine learning. In the standard supervised learning, only the labeled data is observed. The…

Machine Learning · Statistics 2012-04-19 Masanori Kawakita, Takafumi Kanamori

Semi-supervised learning for linear extremile regression

Extremile regression, as a least squares analog of quantile regression, is a potentially useful tool for modeling and understanding the extreme tails of a distribution. However, existing extremile regression methods, as nonparametric…

Methodology · Statistics 2025-07-03 Rong Jiang, Keming Yu, Jiangfeng Wang

Semi-supervised linear regression with missing covariates

Missing values in datasets are common in applied statistics. For regression problems, theoretical work thus far has largely considered the issue of missing covariates as distinct from missing responses. However, in practice, many datasets…

Statistics Theory · Mathematics 2026-02-17 Benedict M. Risebrow, Thomas B. Berrett

Improvability Through Semi-Supervised Learning: A Survey of Theoretical Results

Semi-supervised learning is a setting in which one has labeled and unlabeled data available. In this survey we explore different types of theoretical results when one uses unlabeled data in classification and regression tasks. Most methods…

Machine Learning · Computer Science 2020-07-31 Alexander Mey, Marco Loog

Linear regression with unmatched data: a deconvolution perspective

Consider the regression problem where the response $Y\in\mathbb{R}$ and the covariate $X\in\mathbb{R}^d$ for $d\geq 1$ are \textit{unmatched}. Under this scenario, we do not have access to pairs of observations from the distribution of $(X,…

Statistics Theory · Mathematics 2023-09-19 Mona Azadkia, Fadoua Balabdaoui

Semi-supervised Active Regression

Labelled data often comes at a high cost as it may require recruiting human labelers or running costly experiments. At the same time, in many practical scenarios, one already has access to a partially labelled, potentially biased dataset…

Machine Learning · Computer Science 2021-06-15 Fnu Devvrit, Nived Rajaraman, Pranjal Awasthi

Semi-supervised learning in unmatched linear regression using an empirical likelihood approach

Knowing the link between observed predictive variables and outcomes is crucial for making inference in any regression model. When this link is missing, partially or completely, classical estimation methods fail in recovering the true…

Statistics Theory · Mathematics 2026-01-28 Fadoua Balabdaoui, Jinyu Chen

On semi-supervised learning

Semi-supervised learning deals with the problem of how, if possible, to take advantage of a huge amount of unclassified data, to perform a classification in situations when, typically, there is little labeled data. Even though this is not…

Machine Learning · Statistics 2020-12-11 Alejandro Cholaquidis, Ricardo Fraiman, Mariela Sued

Dependable Exploitation of High-Dimensional Unlabeled Data in an Assumption-Lean Framework

Semi-supervised learning has attracted significant attention due to the proliferation of applications featuring limited labeled data but abundant unlabeled data. In this paper, we examine the statistical inference problem in an…

Methodology · Statistics 2026-03-31 Chao Ying, Siyi Deng, Yang Ning, Jiwei Zhao, Heping Zhang

Semi-supervised learning using copula-based regression and model averaging

The available data in semi-supervised learning usually consists of relatively small sized labeled data and much larger sized unlabeled data. How to effectively exploit unlabeled data is the key issue. In this paper, we write the regression…

Methodology · Statistics 2024-11-13 Ziwen Gao, Huihang Liu, Xinyu Zhang

Semi-Supervised Empirical Risk Minimization: Using unlabeled data to improve prediction

We present a general methodology for using unlabeled data to design semi-supervised learning (SSL) variants of the Empirical Risk Minimization (ERM) learning process. Focusing on generalized linear regression, we analyze of the…

Machine Learning · Statistics 2022-03-08 Oren Yuval, Saharon Rosset

Semi-supervised learning and the question of true versus estimated propensity scores

A straightforward application of semi-supervised machine learning to the problem of treatment effect estimation would be to consider data as "unlabeled" if treatment assignment and covariates are observed but outcomes are unobserved.…

Methodology · Statistics 2020-09-15 Andrew Herren, P. Richard Hahn

Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications

This paper considers statistical inference for the explained variance $\beta^{\intercal}\Sigma \beta$ under the high-dimensional linear model $Y=X\beta+\epsilon$ in the semi-supervised setting, where $\beta$ is the regression vector and…

Methodology · Statistics 2020-12-01 T. Tony Cai, Zijian Guo

Semi-supervised logistic discrimination via labeled data and unlabeled data from different sampling distributions

This article addresses the problem of classification method based on both labeled and unlabeled data, where we assume that a density function for labeled data is different from that for unlabeled data. We propose a semi-supervised logistic…

Machine Learning · Statistics 2014-02-20 Shuichi Kawano