Related papers: The Use of Unlabeled Data in Predictive Modeling

Theoretical Foundations of Representation Learning using Unlabeled Data: Statistics and Optimization

Representation learning from unlabeled data has been extensively studied in statistics, data science and signal processing with a rich literature on techniques for dimension reduction, compression, multi-dimensional scaling among others.…

Machine Learning · Computer Science 2025-10-03 Pascal Esser, Maximilian Fleissner, Debarghya Ghoshdastidar

Dependable Exploitation of High-Dimensional Unlabeled Data in an Assumption-Lean Framework

Semi-supervised learning has attracted significant attention due to the proliferation of applications featuring limited labeled data but abundant unlabeled data. In this paper, we examine the statistical inference problem in an…

Methodology · Statistics 2026-03-31 Chao Ying, Siyi Deng, Yang Ning, Jiwei Zhao, Heping Zhang

Exploring The Contribution of Unlabeled Data in Financial Sentiment Analysis

With the proliferation of its applications in various industries, sentiment analysis by using publicly available web data has become an active research area in text classification during these years. It is argued by researchers that…

Computation and Language · Computer Science 2013-08-06 Jimmy SJ. Ren, Wei Wang, Jiawei Wang, Stephen Shaoyi Liao

Semi-supervised linear regression: enhancing efficiency and robustness in high dimensions

In semi-supervised learning, the prevailing understanding suggests that observing additional unlabeled samples improves estimation accuracy for linear parameters only in the case of model misspecification. In this work, we challenge such a…

Methodology · Statistics 2025-09-03 Kai Chen, Yuqian Zhang

Semi-Supervised learning with Density-Ratio Estimation

In this paper, we study statistical properties of semi-supervised learning, which is considered as an important problem in the community of machine learning. In the standard supervised learning, only the labeled data is observed. The…

Machine Learning · Statistics 2012-04-19 Masanori Kawakita, Takafumi Kanamori

Improvability Through Semi-Supervised Learning: A Survey of Theoretical Results

Semi-supervised learning is a setting in which one has labeled and unlabeled data available. In this survey we explore different types of theoretical results when one uses unlabeled data in classification and regression tasks. Most methods…

Machine Learning · Computer Science 2020-07-31 Alexander Mey, Marco Loog

Semi-supervised learning and the question of true versus estimated propensity scores

A straightforward application of semi-supervised machine learning to the problem of treatment effect estimation would be to consider data as "unlabeled" if treatment assignment and covariates are observed but outcomes are unobserved.…

Methodology · Statistics 2020-09-15 Andrew Herren, P. Richard Hahn

Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e., semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques…

Machine Learning · Computer Science 2011-09-12 N. V. Chawla, Grigoris Karakoulas

Semi-Supervised Learning: the Case When Unlabeled Data is Equally Useful

Semi-supervised learning algorithms attempt to take advantage of relatively inexpensive unlabeled data to improve learning performance. In this work, we consider statistical models where the data distributions can be characterized by…

Machine Learning · Computer Science 2023-07-18 Jingge Zhu

Towards Utilizing Unlabeled Data in Federated Learning: A Survey and Prospective

Federated Learning (FL) proposed in recent years has received significant attention from researchers in that it can bring separate data sources together and build machine learning models in a collaborative but private manner. Yet, in most…

Machine Learning · Computer Science 2020-05-12 Yilun Jin, Xiguang Wei, Yang Liu, Qiang Yang

Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination

A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair. While research is already underway to formalize a machine-learning concept of fairness and to design frameworks for…

Machine Learning · Computer Science 2020-09-28 Tao Zhang, Tianqing Zhu, Jing Li, Mengde Han, Wanlei Zhou, Philip S. Yu

Mixed Semi-Supervised Generalized-Linear-Regression with Applications to Deep-Learning and Interpolators

We present a methodology for using unlabeled data to design semi-supervised learning (SSL) methods that improve the predictive performance of supervised learning for regression tasks. The main idea is to design different mechanisms for…

Methodology · Statistics 2025-11-18 Oren Yuval, Saharon Rosset

When can unlabeled data improve the learning rate?

In semi-supervised classification, one is given access both to labeled and unlabeled data. As unlabeled data is typically cheaper to acquire than labeled data, this setup becomes advantageous as soon as one can exploit the unlabeled data in…

Machine Learning · Computer Science 2022-02-10 Christina Göpfert, Shai Ben-David, Olivier Bousquet, Sylvain Gelly, Ilya Tolstikhin, Ruth Urner

How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis

Self-training, a semi-supervised learning algorithm, leverages a large amount of unlabeled data to improve learning when the labeled data are limited. Despite empirical successes, its theoretical characterization remains elusive. To the…

Machine Learning · Computer Science 2022-02-15 Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong

Impact of Strategic Sampling and Supervision Policies on Semi-supervised Learning

In semi-supervised representation learning frameworks, when the number of labelled data is very scarce, the quality and representativeness of these samples become increasingly important. Existing literature on semi-supervised learning…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Shuvendu Roy, Ali Etemad

Semi-Supervised Sparse Gaussian Classification: Provable Benefits of Unlabeled Data

The premise of semi-supervised learning (SSL) is that combining labeled and unlabeled data yields significantly more accurate models. Despite empirical successes, the theoretical understanding of SSL is still far from complete. In this…

Machine Learning · Statistics 2024-09-06 Eyar Azar, Boaz Nadler

Are labels informative in semi-supervised learning? -- Estimating and leveraging the missing-data mechanism

Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models, but it can be affected by the presence of "informative" labels, which occur when some classes are more likely to be labeled…

Machine Learning · Statistics 2023-02-16 Aude Sportisse, Hugo Schmutz, Olivier Humbert, Charles Bouveyron, Pierre-Alexandre Mattei

Reliable Semi-Supervised Learning when Labels are Missing at Random

Semi-supervised learning methods are motivated by the availability of large datasets with unlabeled features in addition to labeled data. Unlabeled data is, however, not guaranteed to improve classification performance and has in fact been…

Machine Learning · Statistics 2019-10-25 Xiuming Liu, Dave Zachariah, Johan Wågberg, Thomas B. Schön

Semi-supervised Deep Learning for Image Classification with Distribution Mismatch: A Survey

Deep learning methodologies have been employed in several different fields, with an outstanding success in image recognition applications, such as material quality control, medical imaging, autonomous driving, etc. Deep learning models rely…

Computer Vision and Pattern Recognition · Computer Science 2022-03-11 Saul Calderon-Ramirez, Shengxiang Yang, David Elizondo

Uncoupled Regression from Pairwise Comparison Data

Uncoupled regression is the problem of learning a model from unlabeled data and the set of target values when the correspondence between them is unknown. Such a situation arises in predicting anonymized targets that involve sensitive…

Machine Learning · Computer Science 2019-06-04 Liyuan Xu, Junya Honda, Gang Niu, Masashi Sugiyama