Related papers: BERT-Flow-VAE: A Weakly-supervised Model for Multi…

200 papers

The escalating volume of collected healthcare textual data presents a unique challenge for automated Multi-Label Text Classification (MLTC), which is primarily due to the scarcity of annotated texts for training and their nuanced nature.…

Computation and Language · Computer Science 2025-03-04 Hajar Sakai, Sarah S. Lam

Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text, and has a wide range of application domains. Most existing approaches require an enormous amount of annotated data to learn a classifier and/or…

Computation and Language · Computer Science 2023-09-26 Muberra Ozmen, Joseph Cotnareanu, Mark Coates

Weakly supervised text classification (WSTC), also called zero-shot or dataless text classification, has attracted increasing attention due to its applicability in classifying a mass of texts within the dynamic and open Web environment,…

Computation and Language · Computer Science 2024-04-26 Miaomiao Li, Jiaqi Zhu, Yang Wang, Yi Yang, Yilin Li, Hongan Wang

Text categorization is an essential task in Web content analysis. Considering the ever-evolving Web data and new emerging categories, instead of the laborious supervised setting, in this paper, we focus on the minimally-supervised setting…

Computation and Language · Computer Science 2021-02-24 Xinyang Zhang, Chenwei Zhang, Luna Xin Dong, Jingbo Shang, Jiawei Han

A major challenge of multi-label text classification (MLTC) is to stimulatingly exploit possible label differences and label correlations. In this paper, we tackle this challenge by developing Label-Wise Pre-Training (LW-PT) method to get a…

Computation and Language · Computer Science 2020-08-18 Han Liu, Caixia Yuan, Xiaojie Wang

In this paper, we consider the task of retrieving documents with predefined topics from an unlabeled document dataset using an unsupervised approach. The proposed unsupervised approach requires only a small number of keywords describing the…

Computation and Language · Computer Science 2022-10-13 Tim Schopf, Daniel Braun, Florian Matthes

Unsupervised representation learning algorithms such as word2vec and ELMo improve the accuracy of many supervised NLP models, mainly because they can take advantage of large amounts of unlabeled text. However, the supervised models only…

Computation and Language · Computer Science 2018-09-25 Kevin Clark, Minh-Thang Luong, Christopher D. Manning, Quoc V. Le

In multi-label text classification (MLTC), each given document is associated with a set of correlated labels. To capture label correlations, previous classifier-chain and sequence-to-sequence models transform MLTC to a sequence prediction…

Computation and Language · Computer Science 2021-06-08 Ximing Zhang, Qian-Wen Zhang, Zhao Yan, Ruifang Liu, Yunbo Cao

Although BERT is widely used by the NLP community, little is known about its inner workings. Several attempts have been made to shed light on certain aspects of BERT, often with contradicting conclusions. A much raised concern focuses on…

Computation and Language · Computer Science 2020-10-13 Nikolaos Manginas, Ilias Chalkidis, Prodromos Malakasiotis

Supervised learning usually requires a large amount of labelled data. However, attaining ground-truth labels is costly for many tasks. Alternatively, weakly supervised methods learn with cheap weak signals that only approximately label some…

Machine Learning · Computer Science 2024-11-26 You Lu, Wenzhuo Song, Chidubem Arachie, Bert Huang

BERT is inefficient for sentence-pair tasks such as clustering or semantic search as it needs to evaluate combinatorially many sentence pairs which is very time-consuming. Sentence BERT (SBERT) attempted to solve this challenge by learning…

Computation and Language · Computer Science 2021-02-08 Yan Zhang, Ruidan He, Zuozhu Liu, Kwan Hui Lim, Lidong Bing

Extreme multi-label text classification (XMTC) is an important problem in the era of big data, for tagging a given text with the most relevant multiple labels from an extremely large-scale label set. XMTC can be found in many applications,…

Computation and Language · Computer Science 2019-11-05 Ronghui You, Zihan Zhang, Ziye Wang, Suyang Dai, Hiroshi Mamitsuka, Shanfeng Zhu

As an algorithmic framework for learning to learn, meta-learning provides a promising solution for few-shot text classification. However, most existing research fails to give enough attention to class labels. Traditional basic framework…

Computation and Language · Computer Science 2024-12-16 Guanghua Hou, Shuhui Cao, Deqiang Ouyang, Ning Wang

In multi-label text classification, each textual document can be assigned with one or more labels. Due to this nature, the multi-label text classification task is often considered to be more challenging compared to the binary or multi-class…

Information Retrieval · Computer Science 2019-07-02 Jingcheng Du, Qingyu Chen, Yifan Peng, Yang Xiang, Cui Tao, Zhiyong Lu

This is the first work to investigate the effectiveness of BERT-based contextual embeddings in active learning (AL) tasks on cold-start scenarios, where traditional fine-tuning is infeasible due to the absence of labeled data. Our primary…

Machine Learning · Computer Science 2024-07-25 Fabiano Belém, Washington Cunha, Celso França, Claudio Andrade, Leonardo Rocha, Marcos André Gonçalves

We consider Large-Scale Multi-Label Text Classification (LMTC) in the legal domain. We release a new dataset of 57k legislative documents from EURLEX, annotated with ~4.3k EUROVOC labels, which is suitable for LMTC, few- and zero-shot…

Computation and Language · Computer Science 2019-06-07 Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos

In weakly-supervised text classification, only label names act as sources of supervision. Predominant approaches to weakly-supervised text classification utilize a two-phase framework, where test samples are first assigned pseudo-labels and…

Computation and Language · Computer Science 2022-10-14 Seongmin Park, Jihwa Lee

Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords).…

Computation and Language · Computer Science 2023-10-24 Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han

Supervised object detection and semantic segmentation require object or even pixel level annotations. When there exist image level labels only, it is challenging for weakly supervised algorithms to achieve accurate predictions. The accuracy…

Computer Vision and Pattern Recognition · Computer Science 2018-03-06 Weifeng Ge, Sibei Yang, Yizhou Yu

Multi-label text classification (MLTC) aims to assign multiple labels to each sample in the dataset. The labels usually have internal correlations. However, traditional methods tend to ignore the correlations between labels. In order to…

Computation and Language · Computer Science 2018-09-11 Pengcheng Yang, Shuming Ma, Yi Zhang, Junyang Lin, Qi Su, Xu Sun