Related papers: Unsupervised Label Refinement Improves Dataless Te…

Improve Text Classification Accuracy with Intent Information

Text classification, a core component of task-oriented dialogue systems, attracts continuous research from both the research and industry community, and has resulted in tremendous progress. However, existing method does not consider the use…

Computation and Language · Computer Science 2022-12-16 Yifeng Xie

Unsupervised Ranking and Aggregation of Label Descriptions for Zero-Shot Classifiers

Zero-shot text classifiers based on label descriptions embed an input text and a set of labels into the same space: measures such as cosine similarity can then be used to select the most similar label description to the input text as the…

Computation and Language · Computer Science 2022-05-25 Angelo Basile , Marc Franco-Salvador , Paolo Rosso

In real-world applications, as data availability increases, obtaining labeled data for machine learning (ML) projects remains challenging due to the high costs and intensive efforts required for data annotation. Many ML projects,…

Machine Learning · Computer Science 2024-12-24 Ismail Hakki Karaman , Gulser Koksal , Levent Eriskin , Salih Salihoglu

Universal Cross-Lingual Text Classification

Text classification, an integral task in natural language processing, involves the automatic categorization of text into predefined classes. Creating supervised labeled datasets for low-resource languages poses a considerable challenge.…

Computation and Language · Computer Science 2024-06-18 Riya Savant , Anushka Shelke , Sakshi Todmal , Sanskruti Kanphade , Ananya Joshi , Raviraj Joshi

Clustering Unclustered Data: Unsupervised Binary Labeling of Two Datasets Having Different Class Balances

We consider the unsupervised learning problem of assigning labels to unlabeled data. A naive approach is to use clustering methods, but this works well only when data is properly clustered and each cluster corresponds to an underlying…

Machine Learning · Computer Science 2013-05-02 Marthinus Christoffel du Plessis , Masashi Sugiyama

The Benefits of Label-Description Training for Zero-Shot Text Classification

Pretrained language models have improved zero-shot text classification by allowing the transfer of semantic knowledge from the training data in order to classify among specific label sets in downstream tasks. We propose a simple way to…

Computation and Language · Computer Science 2023-10-24 Lingyu Gao , Debanjan Ghosh , Kevin Gimpel

Text Classification Using Label Names Only: A Language Model Self-Training Approach

Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled…

Computation and Language · Computer Science 2020-10-15 Yu Meng , Yunyi Zhang , Jiaxin Huang , Chenyan Xiong , Heng Ji , Chao Zhang , Jiawei Han

Labeled Data Selection for Category Discovery

Category discovery methods aim to find novel categories in unlabeled visual data. At training time, a set of labeled and unlabeled images are provided, where the labels correspond to the categories present in the images. The labeled data…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Bingchen Zhao , Nico Lang , Serge Belongie , Oisin Mac Aodha

Unsupervised Identification of Translationese

Translated texts are distinctively different from original ones, to the extent that supervised text classification methods can distinguish between them with high accuracy. These differences were proven useful for statistical machine…

Computation and Language · Computer Science 2016-09-13 Ella Rabinovich , Shuly Wintner

Label Agnostic Pre-training for Zero-shot Text Classification

Conventional approaches to text classification typically assume the existence of a fixed set of predefined labels to which a given text can be classified. However, in real-world applications, there exists an infinite label space for…

Computation and Language · Computer Science 2023-05-29 Christopher Clarke , Yuzhao Heng , Yiping Kang , Krisztian Flautner , Lingjia Tang , Jason Mars

Estimating the Accuracies of Multiple Classifiers Without Labeled Data

In various situations one is given only the predictions of multiple classifiers over a large unlabeled test data. This scenario raises the following questions: Without any labeled data and without any a-priori knowledge about the…

Machine Learning · Statistics 2014-10-31 Ariel Jaffe , Boaz Nadler , Yuval Kluger

Text segmentation on multilabel documents: A distant-supervised approach

Segmenting text into semantically coherent segments is an important task with applications in information retrieval and text summarization. Developing accurate topical segmentation requires the availability of training data with ground…

Computation and Language · Computer Science 2019-04-16 Saurav Manchanda , George Karypis

One Size Does Not Fit All: Exploring Variable Thresholds for Distance-Based Multi-Label Text Classification

Distance-based unsupervised text classification is a method within text classification that leverages the semantic similarity between a label and a text to determine label relevance. This method provides numerous benefits, including fast…

Computation and Language · Computer Science 2025-10-14 Jens Van Nooten , Andriy Kosar , Guy De Pauw , Walter Daelemans

Description Based Text Classification with Reinforcement Learning

The task of text classification is usually divided into two stages: {\it text feature extraction} and {\it classification}. In this standard formalization categories are merely represented as indexes in the label vocabulary, and the model…

Computation and Language · Computer Science 2020-06-05 Duo Chai , Wei Wu , Qinghong Han , Fei Wu , Jiwei Li

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these…

Computation and Language · Computer Science 2022-12-16 Tingyu Xia , Yue Wang , Yuan Tian , Yi Chang

Semi-supervised Text Categorization Using Recursive K-means Clustering

In this paper, we present a semi-supervised learning algorithm for classification of text documents. A method of labeling unlabeled text documents is presented. The presented method is based on the principle of divide and conquer strategy.…

Machine Learning · Computer Science 2017-06-27 Harsha S. Gowda , Mahamad Suhil , D. S. Guru , Lavanya Narayana Raju

DocSCAN: Unsupervised Text Classification via Learning from Neighbors

We introduce DocSCAN, a completely unsupervised text classification approach using Semantic Clustering by Adopting Nearest-Neighbors (SCAN). For each document, we obtain semantically informative vectors from a large pre-trained language…

Computation and Language · Computer Science 2022-10-05 Dominik Stammbach , Elliott Ash

Iterative Data Programming for Expanding Text Classification Corpora

Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data…

Machine Learning · Computer Science 2020-02-05 Neil Mallinar , Abhishek Shah , Tin Kam Ho , Rajendra Ugrani , Ayush Gupta

Iterative Label Improvement: Robust Training by Confidence Based Filtering and Dataset Partitioning

State-of-the-art, high capacity deep neural networks not only require large amounts of labelled training data, they are also highly susceptible to label errors in this data, typically resulting in large efforts and costs and therefore…

Machine Learning · Computer Science 2020-07-20 Christian Haase-Schütz , Rainer Stal , Heinz Hertlein , Bernhard Sick

Text Categorization via Similarity Search: An Efficient and Effective Novel Algorithm

We present a supervised learning algorithm for text categorization which has brought the team of authors the 2nd place in the text categorization division of the 2012 Cybersecurity Data Mining Competition (CDMC'2012) and a 3rd prize…

Information Retrieval · Computer Science 2013-07-11 Hubert Haoyang Duan , Vladimir Pestov , Varun Singla