Related papers: From Caesar Cipher to Unsupervised Learning: A New…
We consider learning a sequence classifier without labeled data by using sequential output statistics. The problem is highly valuable since obtaining labels in training data is often costly, while the sequential output statistics (e.g.,…
Classification is an important task in many fields, including biomedical research and machine learning. Traditionally, a classification rule is constructed based on a set of labeled data. Recently, due to technological innovation and…
Sequential sensor data is generated in a wide variety of practical applications. A fundamental challenge involves learning effective classifiers for such sequential data. While deep learning has led to impressive performance gains in recent…
We consider the problem of training speech recognition systems without using any labeled data, under the assumption that the learner can access only the input utterances and a phoneme language model estimated from a non-overlapping…
Continual learning aims to learn new tasks incrementally using less computation and memory instead of retraining the model from scratch whenever a new task arrives. However, existing approaches are designed in a supervised fashion…
Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. Semi-supervised fine-tuning strategies under the pre-training framework, however, remain…
A central goal of unsupervised learning is to acquire representations from unlabeled data or experience that can be used for more effective learning of downstream tasks from modest amounts of labeled data. Many prior unsupervised learning…
Semi-supervised learning methods are motivated by the availability of large datasets with unlabeled features in addition to labeled data. Unlabeled data is, however, not guaranteed to improve classification performance and has in fact been…
A key challenge for machine intelligence is to learn new visual concepts without forgetting the previously acquired knowledge. Continual learning is aimed towards addressing this challenge. However, there is a gap between existing…
In recent years, deep discriminative models have achieved extraordinary performance on supervised learning tasks, significantly outperforming their generative counterparts. However, their success relies on the presence of a large amount of…
Unsupervised machine learning, and in particular data clustering, is a powerful approach for the analysis of datasets and identification of characteristic features occurring throughout a dataset. It is gaining popularity across scientific…
Unsupervised speech recognition is the task of training a speech recognition model with unpaired data. To determine when and how unsupervised speech recognition can succeed, and how classification error relates to candidate training…
We consider the unsupervised learning problem of assigning labels to unlabeled data. A naive approach is to use clustering methods, but this works well only when data is properly clustered and each cluster corresponds to an underlying…
Semisupervised learning is a learning paradigm that studies how computers and natural systems such as human beings acquire knowledge in the presence of both labeled and unlabeled data. Semisupervised learning based methods…
Unsupervised learning is the most challenging problem in machine learning and especially in deep learning. Among many scenarios, we study an unsupervised learning problem of high economic value --- learning to predict without costly pairing…
Background: Most of the existing machine learning models for security tasks, such as spam detection, malware detection, or network intrusion detection, are built on supervised machine learning algorithms. In such a paradigm, models need a…
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair. While research is already underway to formalize a machine-learning concept of fairness and to design frameworks for…
In this paper we revisit the idea of pseudo-labeling in the context of semi-supervised learning where a learning algorithm has access to a small set of labeled samples and a large set of unlabeled samples. Pseudo-labeling works by applying…
The paradigm of data programming, which uses weak supervision in the form of rules/labelling functions, and semi-supervised learning, which augments small amounts of labelled data with a large unlabelled dataset, have shown great promise in…
The problem with fully supervised classification is that it requires a tremendous amount of annotated data; in many datasets, however, a large portion of the data is unlabeled. To alleviate this problem, semi-supervised learning (SSL) leverages the…