Related papers: Reference Distance Estimator

Enhancing Classification with Semi-Supervised Deep Learning Using Distance-Based Sample Weights

Recent advancements in semi-supervised deep learning have introduced effective strategies for leveraging both labeled and unlabeled data to improve classification performance. This work proposes a semi-supervised framework that utilizes a…

Machine Learning · Computer Science 2025-05-21 Aydin Abedinia, Shima Tabakhi, Vahid Seydi

Semi-Supervised learning with Density-Ratio Estimation

In this paper, we study statistical properties of semi-supervised learning, which is considered an important problem in the machine learning community. In the standard supervised learning, only the labeled data is observed. The…

Machine Learning · Statistics 2012-04-19 Masanori Kawakita, Takafumi Kanamori

SLADE: A Self-Training Framework For Distance Metric Learning

Most existing distance metric learning approaches use fully labeled data to learn the sample similarities in an embedding space. We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional…

Computer Vision and Pattern Recognition · Computer Science 2021-03-31 Jiali Duan, Yen-Liang Lin, Son Tran, Larry S. Davis, C.-C. Jay Kuo

Reliable Semi-Supervised Learning when Labels are Missing at Random

Semi-supervised learning methods are motivated by the availability of large datasets with unlabeled features in addition to labeled data. Unlabeled data is, however, not guaranteed to improve classification performance and has in fact been…

Machine Learning · Statistics 2019-10-25 Xiuming Liu, Dave Zachariah, Johan Wågberg, Thomas B. Schön

Adaptive Semisupervised Inference

Semisupervised methods inevitably invoke some assumption that links the marginal distribution of the features to the regression function of the label. Most commonly, the cluster or manifold assumptions are used which imply that the…

Statistics Theory · Mathematics 2011-12-02 Martin Azizyan, Aarti Singh, Larry Wasserman

Semisupervised Classifier Evaluation and Recalibration

How many labeled examples are needed to estimate a classifier's performance on a new dataset? We study the case where data is plentiful, but labels are expensive. We show that by making a few reasonable assumptions on the structure of the…

Machine Learning · Computer Science 2012-10-09 Peter Welinder, Max Welling, Pietro Perona

Meta-Learning for Neural Relation Classification with Distant Supervision

Distant supervision provides a means to create a large amount of weakly labeled data at low cost for relation classification. However, the resulting labeled instances are very noisy, containing data with wrong labels. Many approaches have…

Computation and Language · Computer Science 2020-10-27 Zhenzhen Li, Jian-Yun Nie, Benyou Wang, Pan Du, Yuhan Zhang, Lixin Zou, Dongsheng Li

On Supervised Classification of Feature Vectors with Independent and Non-Identically Distributed Elements

In this paper, we investigate the problem of classifying feature vectors with mutually independent but non-identically distributed elements. First, we show the importance of this problem. Next, we propose a classifier and derive an…

Machine Learning · Computer Science 2021-09-01 Farzad Shahrivari, Nikola Zlatanov

Pattern Recognition for Conditionally Independent Data

In this work, we consider the task of relaxing the i.i.d. assumption in pattern recognition (or classification), aiming to make existing learning algorithms applicable to a wider range of tasks. Pattern recognition is guessing a discrete…

Machine Learning · Computer Science 2012-02-28 Daniil Ryabko

One Size Does Not Fit All: Exploring Variable Thresholds for Distance-Based Multi-Label Text Classification

Distance-based unsupervised text classification is a method within text classification that leverages the semantic similarity between a label and a text to determine label relevance. This method provides numerous benefits, including fast…

Computation and Language · Computer Science 2025-10-14 Jens Van Nooten, Andriy Kosar, Guy De Pauw, Walter Daelemans

Evaluating multiple models using labeled and unlabeled data

It remains difficult to evaluate machine learning classifiers in the absence of a large, labeled dataset. While labeled data can be prohibitively expensive or impossible to obtain, unlabeled data is plentiful. Here, we introduce…

Machine Learning · Computer Science 2025-10-15 Divya Shanmugam, Shuvom Sadhuka, Manish Raghavan, John Guttag, Bonnie Berger, Emma Pierson

Joint Use of Node Attributes and Proximity for Semi-Supervised Classification on Graphs

The task of node classification is to infer unknown node labels, given the labels for some of the nodes along with the network structure and other node attributes. Typically, approaches for this task assume homophily, whereby neighboring…

Social and Information Networks · Computer Science 2021-09-15 Arpit Merchant, Michael Mathioudakis

Distance-based Positive and Unlabeled Learning for Ranking

Learning to rank -- producing a ranked list of items specific to a query and with respect to a set of supervisory items -- is a problem of general interest. The setting we consider is one in which no analytic description of what constitutes…

Machine Learning · Computer Science 2022-09-29 Hayden S. Helm, Amitabh Basu, Avanti Athreya, Youngser Park, Joshua T. Vogelstein, Carey E. Priebe, Michael Winding, Marta Zlatic, Albert Cardona, Patrick Bourke, Jonathan Larson, Marah Abdin, Piali Choudhury, Weiwei Yang, Christopher W. White

Semi-supervised learning method based on predefined evenly-distributed class centroids

Compared to supervised learning, semi-supervised learning reduces the dependence of deep learning on a large number of labeled samples. In this work, we use a small number of labeled samples and perform data augmentation on unlabeled…

Machine Learning · Computer Science 2020-01-14 Qiuyu Zhu, Tiantian Li

PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks

Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, compared to sophisticated deep learning architectures…

Computation and Language · Computer Science 2015-08-04 Jian Tang, Meng Qu, Qiaozhu Mei

Estimating the Accuracies of Multiple Classifiers Without Labeled Data

In various situations, one is given only the predictions of multiple classifiers over a large unlabeled test dataset. This scenario raises the following questions: Without any labeled data and without any a priori knowledge about the…

Machine Learning · Statistics 2014-10-31 Ariel Jaffe, Boaz Nadler, Yuval Kluger

Semi-supervised learning and the question of true versus estimated propensity scores

A straightforward application of semi-supervised machine learning to the problem of treatment effect estimation would be to consider data as "unlabeled" if treatment assignment and covariates are observed but outcomes are unobserved.…

Methodology · Statistics 2020-09-15 Andrew Herren, P. Richard Hahn

Nearest Labelset Using Double Distances for Multi-label Classification

Multi-label classification is a type of supervised learning where an instance may belong to multiple labels simultaneously. Predicting each label independently has been criticized for not exploiting any correlation between labels. In this…

Machine Learning · Statistics 2023-10-25 Hyukjun Gweon, Matthias Schonlau, Stefan Steiner

Improvability Through Semi-Supervised Learning: A Survey of Theoretical Results

Semi-supervised learning is a setting in which one has labeled and unlabeled data available. In this survey we explore different types of theoretical results when one uses unlabeled data in classification and regression tasks. Most methods…

Machine Learning · Computer Science 2020-07-31 Alexander Mey, Marco Loog

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning

Existing semi-supervised learning (SSL) algorithms use a single weight to balance the loss of labeled and unlabeled examples, i.e., all unlabeled examples are equally weighted. But not all unlabeled data are equal. In this paper we study…

Machine Learning · Computer Science 2020-10-30 Zhongzheng Ren, Raymond A. Yeh, Alexander G. Schwing