Related papers: Unsupervised Learning for Lexicon-Based Classifica…

Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data

Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can…

Machine Learning · Computer Science 2019-07-01 Jessa Bekker, Pieter Robberechts, Jesse Davis

Text Classification Using Label Names Only: A Language Model Self-Training Approach

Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled…

Computation and Language · Computer Science 2020-10-15 Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han

Learning from Positive and Unlabeled Data under the Selected At Random Assumption

For many interesting tasks, such as medical diagnosis and web page classification, a learner only has access to some positively labeled examples and many unlabeled examples. Learning from this type of data requires making assumptions about…

Machine Learning · Computer Science 2018-08-28 Jessa Bekker, Jesse Davis

Unsupervised Ranking and Aggregation of Label Descriptions for Zero-Shot Classifiers

Zero-shot text classifiers based on label descriptions embed an input text and a set of labels into the same space: measures such as cosine similarity can then be used to select the most similar label description to the input text as the…

Computation and Language · Computer Science 2022-05-25 Angelo Basile, Marc Franco-Salvador, Paolo Rosso

LST: Lexicon-Guided Self-Training for Few-Shot Text Classification

Self-training provides an effective means of using an extremely small amount of labeled data to create pseudo-labels for unlabeled data. Many state-of-the-art self-training approaches hinge on different regularization methods to prevent…

Computation and Language · Computer Science 2022-02-08 Hazel Kim, Jaeman Son, Yo-Sub Han

Estimating the Accuracies of Multiple Classifiers Without Labeled Data

In various situations one is given only the predictions of multiple classifiers over a large unlabeled test set. This scenario raises the following questions: Without any labeled data and without any a-priori knowledge about the…

Machine Learning · Statistics 2014-10-31 Ariel Jaffe, Boaz Nadler, Yuval Kluger

Learning from Hard Labels with Additional Supervision on Non-Hard-Labeled Classes

In scenarios where training data is limited due to observation costs or data scarcity, enriching the label information associated with each instance becomes crucial for building high-accuracy classification models. In such contexts, it is…

Machine Learning · Computer Science 2025-07-25 Kosuke Sugiyama, Masato Uchida

A Comparison of Techniques for Sentiment Classification of Film Reviews

We undertake the task of comparing lexicon-based sentiment classification of film reviews with machine learning approaches. We look at existing methodologies and attempt to emulate and improve on them using a 'given' lexicon and a…

Computation and Language · Computer Science 2019-05-14 Milan Gritta

Semi-supervised Classification for Natural Language Processing

Semi-supervised classification is an interesting idea where classification models are learned from both labeled and unlabeled data. It has several advantages over supervised classification in the natural language processing domain. For…

Computation and Language · Computer Science 2014-09-29 Rushdi Shams

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

Recent empirical works have successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding…

Machine Learning · Computer Science 2019-02-26 Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, Nikunj Saunshi

Semantic-enriched Visual Vocabulary Construction in a Weakly Supervised Context

One of the prevalent learning tasks involving images is content-based image classification. This is a difficult task especially because the low-level features used to digitally describe images usually capture little information about the…

Computer Vision and Pattern Recognition · Computer Science 2015-12-16 Marian-Andrei Rizoiu, Julien Velcin, Stéphane Lallich

Embedding Semantic Relations into Word Representations

Learning representations for semantic relations is important for various tasks such as analogy detection, relational search, and relation classification. Although there have been several proposals for learning representations for individual…

Computation and Language · Computer Science 2015-05-04 Danushka Bollegala, Takanori Maehara, Ken-ichi Kawarabayashi

A probabilistic methodology for multilabel classification

Multilabel classification is a relatively recent subfield of machine learning. Unlike the classical approach, where instances are labeled with only one category, in multilabel classification, an arbitrary number of categories is chosen…

Artificial Intelligence · Computer Science 2013-03-01 Alfonso E. Romero, Luis M. de Campos

Probabilistic Decoupling of Labels in Classification

In this paper we develop a principled, probabilistic, unified approach to non-standard classification tasks, such as semi-supervised, positive-unlabelled, multi-positive-unlabelled and noisy-label learning. We train a classifier on the…

Machine Learning · Computer Science 2020-06-17 Jeppe Nørregaard, Lars Kai Hansen

Lex2Sent: A bagging approach to unsupervised sentiment analysis

Unsupervised text classification, with its most common form being sentiment analysis, used to be performed by counting words in a text that were stored in a lexicon, which assigns each word to one class or as a neutral word. In recent…

Computation and Language · Computer Science 2025-06-26 Kai-Robin Lange, Jonas Rieger, Carsten Jentsch

Unsupervised Label Refinement Improves Dataless Text Classification

Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description. While promising, it crucially relies on accurate descriptions of the label…

Computation and Language · Computer Science 2020-12-09 Zewei Chu, Karl Stratos, Kevin Gimpel

Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents

Automatically highlighting words that cause semantic differences between two documents could be useful for a wide range of applications. We formulate recognizing semantic differences (RSD) as a token-level regression task and study three…

Computation and Language · Computer Science 2023-10-23 Jannis Vamvas, Rico Sennrich

Supervised Graph Contrastive Pretraining for Text Classification

Contrastive pretraining techniques for text classification have been largely studied in an unsupervised setting. However, oftentimes labeled data from related tasks which share label semantics with the current task is available. We hypothesize…

Computation and Language · Computer Science 2021-12-22 Samujjwal Ghosh, Subhadeep Maji, Maunendra Sankar Desarkar

Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical

Complementary-label learning is a weakly supervised learning problem in which each training example is associated with one or multiple complementary labels indicating the classes to which it does not belong. Existing consistent approaches…

Machine Learning · Computer Science 2024-10-14 Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, Masashi Sugiyama

Learning to Impute: A General Framework for Semi-supervised Learning

Recent semi-supervised learning methods have been shown to achieve comparable results to their supervised counterparts while using only a small portion of labels in image classification tasks thanks to their regularization strategies. In this…

Machine Learning · Computer Science 2020-09-25 Wei-Hong Li, Chuan-Sheng Foo, Hakan Bilen