Related papers: Statistical Topic Models for Multi-Label Document …

SGM: Sequence Generation Model for Multi-label Classification

Multi-label classification is an important yet challenging task in natural language processing. It is more complex than single-label classification in that the labels tend to be correlated. Existing methods tend to ignore the correlations…

Computation and Language · Computer Science 2018-06-18 Pengcheng Yang , Xu Sun , Wei Li , Shuming Ma , Wei Wu , Houfeng Wang

Topic Model Based Multi-Label Classification from the Crowd

Multi-label classification is a common supervised machine learning problem where each instance is associated with multiple classes. The key challenge in this problem is learning the correlations between the classes. An additional challenge…

Machine Learning · Computer Science 2016-04-05 Divya Padmanabhan , Satyanath Bhat , Shirish Shevade , Y. Narahari

Learning Supervised Topic Models for Classification and Regression from Crowds

The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on…

Machine Learning · Statistics 2018-08-20 Filipe Rodrigues , Mariana Lourenço , Bernardete Ribeiro , Francisco Pereira

Multi-label Dataless Text Classification with Topic Modeling

Manually labeling documents is tedious and expensive, but it is essential for training a traditional text classifier. In recent years, a few dataless text classification techniques have been proposed to address this problem. However,…

Information Retrieval · Computer Science 2017-11-07 Daochen Zha , Chenliang Li

A probabilistic methodology for multilabel classification

Multilabel classification is a relatively recent subfield of machine learning. Unlike the classical approach, where instances are labeled with only one category, in multilabel classification, an arbitrary number of categories is chosen…

Artificial Intelligence · Computer Science 2013-03-01 Alfonso E. Romero , Luis M. de Campos

Automatic Generation of Topic Labels

Topic modelling is a popular unsupervised method for identifying the underlying themes in document collections that has many applications in information retrieval. A topic is usually represented by a list of terms ranked by their…

Information Retrieval · Computer Science 2020-06-02 Areej Alokaili , Nikolaos Aletras , Mark Stevenson

Retrieval-augmented Multi-label Text Classification

Multi-label text classification (MLC) is a challenging task in settings of large label sets, where label support follows a Zipfian distribution. In this paper, we address this problem through retrieval augmentation, aiming to improve the…

Computation and Language · Computer Science 2023-05-23 Ilias Chalkidis , Yova Kementchedjhieva

Variational Sequential Labelers for Semi-Supervised Learning

We introduce a family of multitask variational methods for semi-supervised sequence labeling. Our model family consists of a latent-variable generative model and a discriminative labeler. The generative models use latent variables to define…

Computation and Language · Computer Science 2019-06-25 Mingda Chen , Qingming Tang , Karen Livescu , Kevin Gimpel

Are We Really Making Much Progress in Text Classification? A Comparative Review

We analyze various methods for single-label and multi-label text classification across well-known datasets, categorizing them into bag-of-words, sequence-based, graph-based, and hierarchical approaches. Despite the surge in methods like…

Computation and Language · Computer Science 2025-01-22 Lukas Galke , Ansgar Scherp , Andor Diera , Fabian Karl , Bao Xin Lin , Bhakti Khera , Tim Meuser , Tushar Singhal

Multilingual Hierarchical Attention Networks for Document Classification

Hierarchical attention networks have recently achieved remarkable performance for document classification in a given language. However, when multilingual document collections are considered, training such models separately for each language…

Computation and Language · Computer Science 2017-09-18 Nikolaos Pappas , Andrei Popescu-Belis

In real-world applications, as data availability increases, obtaining labeled data for machine learning (ML) projects remains challenging due to the high costs and intensive efforts required for data annotation. Many ML projects,…

Machine Learning · Computer Science 2024-12-24 Ismail Hakki Karaman , Gulser Koksal , Levent Eriskin , Salih Salihoglu

Balancing Methods for Multi-label Text Classification with Long-Tailed Class Distribution

Multi-label text classification is a challenging task because it requires capturing label dependencies. It becomes even more challenging when class distribution is long-tailed. Resampling and re-weighting are common approaches used for…

Computation and Language · Computer Science 2021-10-19 Yi Huang , Buse Giledereli , Abdullatif Köksal , Arzucan Özgür , Elif Ozkirimli

Multilabel Classification through Random Graph Ensembles

We present new methods for multilabel classification, relying on ensemble learning on a collection of random output graphs imposed on the multilabel and a kernel-based structured output learner as the base classifier. For ensemble learning,…

Machine Learning · Computer Science 2013-11-19 Hongyu Su , Juho Rousu

Generation of Consistent Sets of Multi-Label Classification Rules with a Multi-Objective Evolutionary Algorithm

Multi-label classification consists in classifying an instance into two or more classes simultaneously. It is a very challenging task present in many real-world applications, such as classification of biology, image, video, audio, and text.…

Machine Learning · Computer Science 2020-04-03 Thiago Zafalon Miranda , Diorge Brognara Sardinha , Márcio Porto Basgalupp , Yaochu Jin , Ricardo Cerri

Dealing with Difficult Minority Labels in Imbalanced Mutilabel Data Sets

Multilabel classification is an emergent data mining task with a broad range of real-world applications. Learning from imbalanced multilabel data has been studied intensively in recent years, and several resampling methods have been proposed in the…

Machine Learning · Computer Science 2018-02-15 Francisco Charte , Antonio J. Rivera , María J. del Jesus , Francisco Herrera

Improving the Accuracy and Efficiency of Legal Document Tagging with Large Language Models and Instruction Prompts

Legal multi-label classification is a critical task for organizing and accessing the vast amount of legal documentation. Despite its importance, it faces challenges such as the complexity of legal language, intricate label dependencies, and…

Computation and Language · Computer Science 2025-04-15 Emily Johnson , Xavier Holt , Noah Wilson

Variational Deep Semantic Hashing for Text Documents

As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of large-scale information retrieval systems. A popular strategy is to represent original…

Information Retrieval · Computer Science 2017-08-14 Suthee Chaidaroon , Yi Fang

Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples

Deep neural network models have demonstrated their effectiveness in classifying multi-label data from various domains. Typically, they employ a training mode that combines mini-batches with optimizers, where each sample is randomly selected…

Machine Learning · Computer Science 2024-03-28 Ao Zhou , Bin Liu , Jin Wang , Grigorios Tsoumakas

A Multivariate Bernoulli-Based Sampling Method for Multi-Label Data with Application to Meta-Research

Datasets may contain observations with multiple labels. If the labels are not mutually exclusive, and if the labels vary greatly in frequency, obtaining a sample that includes sufficient observations with scarcer labels to make inferences…

Machine Learning · Computer Science 2026-05-27 Simon Chung , Colby J. Vorland , Donna L. Maney , Andrew W. Brown

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by…

Computation and Language · Computer Science 2016-08-09 Shaohua Li , Tat-Seng Chua , Jun Zhu , Chunyan Miao