Related papers: LIME: Weakly-Supervised Text Classification Withou…

PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training

Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts. Most existing methods first use the label names as static…

Computation and Language · Computer Science 2023-10-23 Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han

Seed Word Selection for Weakly-Supervised Text Classification with Unsupervised Error Estimation

Weakly-supervised text classification aims to induce text classifiers from only a few user-provided seed words. The vast majority of previous work assumes high-quality seed words are given. However, the expert-annotated seed words are…

Computation and Language · Computer Science 2021-04-21 Yiping Jin, Akshay Bhatia, Dittaya Wanvarie

Debiasing Made State-of-the-art: Revisiting the Simple Seed-based Weak Supervision for Text Classification

Recent advances in weakly supervised text classification mostly focus on designing sophisticated methods to turn high-level human heuristics into quality pseudo-labels. In this paper, we revisit the seed matching-based method, which is…

Computation and Language · Computer Science 2023-10-24 Chengyu Dong, Zihan Wang, Jingbo Shang

Weakly-Supervised Neural Text Classification

Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural text classification…

Information Retrieval · Computer Science 2018-09-13 Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

Weakly Supervised Prototype Topic Model with Discriminative Seed Words: Modifying the Category Prior by Self-exploring Supervised Signals

Dataless text classification, i.e., a new paradigm of weakly supervised learning, refers to the task of learning with unlabeled documents and a few predefined representative words of categories, known as seed words. The recent generative…

Computation and Language · Computer Science 2021-12-07 Bing Wang, Yue Wang, Ximing Li, Jihong Ouyang

RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules

Weakly supervised text classification (WSTC), also called zero-shot or dataless text classification, has attracted increasing attention due to its applicability in classifying a mass of texts within the dynamic and open Web environment,…

Computation and Language · Computer Science 2024-04-26 Miaomiao Li, Jiaqi Zhu, Yang Wang, Yi Yang, Yilin Li, Hongan Wang

Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks

Text categorization is an essential task in Web content analysis. Considering the ever-evolving Web data and new emerging categories, instead of the laborious supervised setting, in this paper, we focus on the minimally-supervised setting…

Computation and Language · Computer Science 2021-02-24 Xinyang Zhang, Chenwei Zhang, Luna Xin Dong, Jingbo Shang, Jiawei Han

Weakly-Supervised Hierarchical Text Classification

Hierarchical text classification, which aims to classify text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for text classification due…

Computation and Language · Computer Science 2019-01-01 Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

X-Class: Text Classification with Extremely Weak Supervision

In this paper, we explore text classification with extremely weak supervision, i.e., only relying on the surface text of class names. This is a more challenging setting than the seed-driven weak supervision, which allows a few seed words…

Computation and Language · Computer Science 2022-02-09 Zihan Wang, Dheeraj Mekala, Jingbo Shang

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these…

Computation and Language · Computer Science 2022-12-16 Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang

A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches

Extremely Weakly Supervised Text Classification (XWS-TC) refers to text classification based on minimal high-level human guidance, such as a few label-indicative seed words or classification instructions. There are two mainstream approaches…

Computation and Language · Computer Science 2023-05-23 Zihan Wang, Tianle Wang, Dheeraj Mekala, Jingbo Shang

Denoising Multi-Source Weak Supervision for Neural Text Classification

We study the problem of learning neural text classifiers without using any labeled data, but only easy-to-provide rules as multiple weak supervision sources. This problem is challenging because rule-induced weak labels are often noisy and…

Computation and Language · Computer Science 2021-03-12 Wendi Ren, Yinghao Li, Hanting Su, David Kartchner, Cassie Mitchell, Chao Zhang

TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision

Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy, which is a fundamental web text mining task with broad applications such as web content analysis and semantic indexing. Most…

Computation and Language · Computer Science 2025-02-06 Yunyi Zhang, Ruozhen Yang, Xueqiang Xu, Rui Li, Jinfeng Xiao, Jiaming Shen, Jiawei Han

Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification

For extremely weak-supervised text classification, pioneer research generates pseudo labels by mining texts similar to the class names from the raw corpus, which may end up with very limited or even no samples for the minority classes.…

Computation and Language · Computer Science 2024-06-18 Letian Peng, Yi Gu, Chengyu Dong, Zihan Wang, Jingbo Shang

LUMI: Unsupervised Intent Clustering with Multiple Pseudo-Labels

In this paper, we propose an intuitive, training-free and label-free method for intent clustering in conversational search. Current approaches to short text clustering use LLM-generated pseudo-labels to enrich text representations or to…

Computation and Language · Computer Science 2026-02-26 I-Fan Lin, Faegheh Hasibi, Suzan Verberne

Label-template based Few-Shot Text Classification with Contrastive Learning

As an algorithmic framework for learning to learn, meta-learning provides a promising solution for few-shot text classification. However, most existing research fails to give enough attention to class labels. Traditional basic framework…

Computation and Language · Computer Science 2024-12-16 Guanghua Hou, Shuhui Cao, Deqiang Ouyang, Ning Wang

Weakly Supervised Label Learning Flows

Supervised learning usually requires a large amount of labelled data. However, attaining ground-truth labels is costly for many tasks. Alternatively, weakly supervised methods learn with cheap weak signals that only approximately label some…

Machine Learning · Computer Science 2024-11-26 You Lu, Wenzhuo Song, Chidubem Arachie, Bert Huang

A common classification task situation is where one has a large amount of data available for training, but only a small portion is annotated with class labels. The goal of semi-supervised training, in this context, is to improve…

Computer Vision and Pattern Recognition · Computer Science 2022-07-01 Zijian Hu, Zhengyu Yang, Xuefeng Hu, Ram Nevatia

Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers

Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords).…

Computation and Language · Computer Science 2023-10-24 Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han

BERT-Flow-VAE: A Weakly-supervised Model for Multi-Label Text Classification

Multi-label Text Classification (MLTC) is the task of categorizing documents into one or more topics. Considering the large volumes of data and varying domains of such tasks, fully supervised learning requires manually fully annotated…

Computation and Language · Computer Science 2022-10-28 Ziwen Liu, Josep Grau-Bove, Scott Allan Orr