Related papers: On semi-supervised estimation using exponential ti…

Semi-supervised Logistic Learning Based on Exponential Tilt Mixture Models

Consider semi-supervised learning for classification, where both labeled and unlabeled data are available for training. The goal is to exploit both datasets to achieve higher prediction accuracy than just using labeled data alone. We…

Machine Learning · Statistics 2019-06-20 Xinwei Zhang, Zhiqiang Tan

On the Semi-supervised Expectation Maximization

The Expectation Maximization (EM) algorithm is widely used as an iterative modification to maximum likelihood estimation when the data is incomplete. We focus on a semi-supervised case to learn the model from labeled and unlabeled samples.…

Machine Learning · Computer Science 2023-01-26 Erixhen Sula, Lizhong Zheng

Semi-supervised logistic discrimination via labeled data and unlabeled data from different sampling distributions

This article addresses the problem of classification method based on both labeled and unlabeled data, where we assume that a density function for labeled data is different from that for unlabeled data. We propose a semi-supervised logistic…

Machine Learning · Statistics 2014-02-20 Shuichi Kawano

Semi-Supervised learning with Density-Ratio Estimation

In this paper, we study statistical properties of semi-supervised learning, which is considered as an important problem in the community of machine learning. In the standard supervised learning, only the labeled data is observed. The…

Machine Learning · Statistics 2012-04-19 Masanori Kawakita, Takafumi Kanamori

Efficient semi-supervised inference for logistic regression under case-control studies

Semi-supervised learning has received increasing attention in statistics and machine learning. In semi-supervised learning settings, a labeled data set with both outcomes and covariates and an unlabeled data set with covariates only are…

Machine Learning · Statistics 2024-02-26 Zhuojun Quan, Yuanyuan Lin, Kani Chen, Wen Yu

Semi-supervised Learning with the EM Algorithm: A Comparative Study between Unstructured and Structured Prediction

Semi-supervised learning aims to learn prediction models from both labeled and unlabeled samples. There has been extensive research in this area. Among existing work, generative mixture models with Expectation-Maximization (EM) is a popular…

Machine Learning · Computer Science 2020-08-31 Wenchong He, Zhe Jiang

Semi-supervised linear regression: enhancing efficiency and robustness in high dimensions

In semi-supervised learning, the prevailing understanding suggests that observing additional unlabeled samples improves estimation accuracy for linear parameters only in the case of model misspecification. In this work, we challenge such a…

Methodology · Statistics 2025-09-03 Kai Chen, Yuqian Zhang

Semi-Supervised Empirical Risk Minimization: Using unlabeled data to improve prediction

We present a general methodology for using unlabeled data to design semi-supervised learning (SSL) variants of the Empirical Risk Minimization (ERM) learning process. Focusing on generalized linear regression, we analyze the…

Machine Learning · Statistics 2022-03-08 Oren Yuval, Saharon Rosset

Estimation of Classification Rules from Partially Classified Data

We consider the situation where the observed sample contains some observations whose class of origin is known (that is, they are classified with respect to the g underlying classes of interest), and where the remaining observations in the…

Machine Learning · Statistics 2020-04-15 Geoffrey J. McLachlan, Daniel Ahfock

Semi-supervised learning and the question of true versus estimated propensity scores

A straightforward application of semi-supervised machine learning to the problem of treatment effect estimation would be to consider data as "unlabeled" if treatment assignment and covariates are observed but outcomes are unobserved.…

Methodology · Statistics 2020-09-15 Andrew Herren, P. Richard Hahn

A Unified Framework for Semiparametrically Efficient Semi-Supervised Learning

We consider statistical inference under a semi-supervised setting where we have access to both a labeled dataset consisting of pairs $\{X_i, Y_i \}_{i=1}^n$ and an unlabeled dataset $\{ X_i \}_{i=n+1}^{n+N}$. We ask the question: under what…

Statistics Theory · Mathematics 2025-03-20 Zichun Xu, Daniela Witten, Ali Shojaie

On missing label patterns in semi-supervised learning

We investigate model based classification with partially labelled training data. In many biostatistical applications, labels are manually assigned by experts, who may leave some observations unlabelled due to class uncertainty. We analyse…

Methodology · Statistics 2019-04-08 Daniel Ahfock, Geoffrey J. McLachlan

Informative missingness and its implications in semi-supervised learning

Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-intensive, together with unlabelled data to enhance…

Machine Learning · Statistics 2025-12-29 Jinran Wu, You-Gan Wang, Geoffrey J. McLachlan

Accuracy of Latent-Variable Estimation in Bayesian Semi-Supervised Learning

Hierarchical probabilistic models, such as Gaussian mixture models, are widely used for unsupervised learning tasks. These models consist of observable and latent variables, which represent the observable data and the underlying…

Machine Learning · Statistics 2015-03-26 Keisuke Yamazaki

Transfer Learning under Group-Label Shift: A Semiparametric Exponential Tilting Approach

We propose a new framework for binary classification in transfer learning settings where both covariate and label distributions may shift between source and target domains. Unlike traditional covariate shift or label shift assumptions, we…

Methodology · Statistics 2025-09-29 Manli Cheng, Subha Maity, Qinglong Tian, Pengfei Li

Semiparametric Mixture Regression with Unspecified Error Distributions

In fitting a mixture of linear regression models, normal assumption is traditionally used to model the error and then regression parameters are estimated by the maximum likelihood estimators (MLE). This procedure is not valid if the normal…

Methodology · Statistics 2018-11-06 Yanyuan Ma, Shaoli Wang, Lin Xu, Weixin Yao

Efficient and Adaptive Linear Regression in Semi-Supervised Settings

We consider the linear regression problem under semi-supervised settings wherein the available data typically consists of: (i) a small or moderate sized 'labeled' data, and (ii) a much larger sized 'unlabeled' data. Such data arises…

Methodology · Statistics 2018-07-02 Abhishek Chakrabortty, Tianxi Cai

Improvability Through Semi-Supervised Learning: A Survey of Theoretical Results

Semi-supervised learning is a setting in which one has labeled and unlabeled data available. In this survey we explore different types of theoretical results when one uses unlabeled data in classification and regression tasks. Most methods…

Machine Learning · Computer Science 2020-07-31 Alexander Mey, Marco Loog

Are labels informative in semi-supervised learning? -- Estimating and leveraging the missing-data mechanism

Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models, but it can be affected by the presence of "informative" labels, which occur when some classes are more likely to be labeled…

Machine Learning · Statistics 2023-02-16 Aude Sportisse, Hugo Schmutz, Olivier Humbert, Charles Bouveyron, Pierre-Alexandre Mattei

Asymptotic Bayes risk for Gaussian mixture in a semi-supervised setting

Semi-supervised learning (SSL) uses unlabeled data for training and has been shown to greatly improve performance when compared to a supervised approach on the labeled data available. This claim depends both on the amount of labeled data…

Machine Learning · Computer Science 2019-10-01 Marc Lelarge, Leo Miolane