Related papers: Iterative Data Programming for Expanding Text Clas…

Iterative Label Improvement: Robust Training by Confidence Based Filtering and Dataset Partitioning

State-of-the-art, high capacity deep neural networks not only require large amounts of labelled training data, they are also highly susceptible to label errors in this data, typically resulting in large efforts and costs and therefore…

Machine Learning · Computer Science 2020-07-20 Christian Haase-Schütz, Rainer Stal, Heinz Hertlein, Bernhard Sick

Not Enough Data? Deep Learning to the Rescue!

Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially…

Computation and Language · Computer Science 2019-11-28 Ateret Anaby-Tavor, Boaz Carmeli, Esther Goldbraich, Amir Kantor, George Kour, Segev Shlomov, Naama Tepper, Naama Zwerdling

Learning from Multiple Noisy Partial Labelers

Programmatic weak supervision creates models without hand-labeled training data by combining the outputs of heuristic labelers. Existing frameworks make the restrictive assumption that labelers output a single class label. Enabling users to…

Machine Learning · Computer Science 2022-03-28 Peilin Yu, Tiffany Ding, Stephen H. Bach

Data Programming: Creating Large Training Sets, Quickly

Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive…

Machine Learning · Statistics 2018-12-10 Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré

Enhancement of Short Text Clustering by Iterative Classification

Short text clustering is a challenging task due to the lack of signal contained in such short texts. In this work, we propose iterative classification as a method to boost the clustering quality (e.g., accuracy) of short texts. Given a…

Information Retrieval · Computer Science 2020-02-03 Md Rashadul Hasan Rakib, Norbert Zeh, Magdalena Jankowska, Evangelos Milios

Unsupervised Label Refinement Improves Dataless Text Classification

Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description. While promising, it crucially relies on accurate descriptions of the label…

Computation and Language · Computer Science 2020-12-09 Zewei Chu, Karl Stratos, Kevin Gimpel

Improving Probabilistic Models in Text Classification via Active Learning

Social scientists often classify text documents to use the resulting labels as an outcome or a predictor in empirical research. Automated text classification has become a standard tool, since it requires less human coding. However, scholars…

Computation and Language · Computer Science 2025-05-14 Mitchell Bosley, Saki Kuzushima, Ted Enamorado, Yuki Shiraito

The Word is Mightier than the Label: Learning without Pointillistic Labels using Data Programming

Most advanced supervised Machine Learning (ML) models rely on vast amounts of point-by-point labelled training examples. Hand-labelling vast amounts of data may be tedious, expensive, and error-prone. Recently, some studies have explored…

Machine Learning · Computer Science 2021-08-27 Chufan Gao, Mononito Goswami

The Power of Communities: A Text Classification Model with Automated Labeling Process Using Network Community Detection

Text classification is one of the most critical areas in machine learning and artificial intelligence research. It has been actively adopted in many business applications such as conversational intelligence systems, news articles…

Computation and Language · Computer Science 2019-11-15 Minjun Kim, Hiroki Sayama

A cost-reducing partial labeling estimator in text classification problem

We propose a new approach to address the text classification problems when learning with partial labels is beneficial. Instead of offering each training sample a set of candidate labels, we assign negative-oriented labels to the ambiguous…

Machine Learning · Statistics 2019-06-11 Jiangning Chen, Zhibo Dai, Juntao Duan, Qianli Hu, Ruilin Li, Heinrich Matzinger, Ionel Popescu, Haoyan Zhai

Active and Incremental Learning with Weak Supervision

Large amounts of labeled training data are one of the main contributors to the great success that deep models have achieved in the past. Label acquisition for tasks other than benchmarks can pose a challenge due to requirements of both…

Computer Vision and Pattern Recognition · Computer Science 2020-01-22 Clemens-Alexander Brust, Christoph Käding, Joachim Denzler

Improve Text Classification Accuracy with Intent Information

Text classification, a core component of task-oriented dialogue systems, attracts continuous research from both the research and industry community, and has resulted in tremendous progress. However, existing methods do not consider the use…

Computation and Language · Computer Science 2022-12-16 Yifeng Xie

Exploiting All Samples in Low-Resource Sentence Classification: Early Stopping and Initialization Parameters

To improve deep-learning performance in low-resource settings, many researchers have redesigned model architectures or applied additional data (e.g., external resources, unlabeled samples). However, there have been relatively few…

Computation and Language · Computer Science 2024-07-26 Hongseok Choi, Hyunju Lee

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models

Active learning is an iterative labeling process that is used to obtain a small labeled subset, despite the absence of labeled data, thereby enabling to train a model for supervised tasks such as text classification. While active learning…

Computation and Language · Computer Science 2024-10-07 Christopher Schröder, Gerhard Heyer

An iterative method for classification of binary data

In today's data driven world, storing, processing, and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference…

Machine Learning · Statistics 2018-09-11 Denali Molitor, Deanna Needell

Self-Training for Class-Incremental Semantic Segmentation

In class-incremental semantic segmentation, we have no access to the labeled data of previous tasks. Therefore, when incrementally learning new classes, deep neural networks suffer from catastrophic forgetting of previously learned…

Computer Vision and Pattern Recognition · Computer Science 2022-03-14 Lu Yu, Xialei Liu, Joost van de Weijer

Sequential Targeting: an incremental learning approach for data imbalance in text classification

Classification tasks require a balanced distribution of data to ensure the learner to be trained to generalize over all classes. In real-world datasets, however, the number of instances vary substantially among classes. This typically leads…

Machine Learning · Computer Science 2020-11-24 Joel Jang, Yoonjeon Kim, Kyoungho Choi, Sungho Suh

Data Wrangling Task Automation Using Code-Generating Language Models

Ensuring data quality in large tabular datasets is a critical challenge, typically addressed through data wrangling tasks. Traditional statistical methods, though efficient, cannot often understand the semantic context and deep learning…

Machine Learning · Computer Science 2025-02-25 Ashlesha Akella, Krishnasuri Narayanam

Moving Towards Open Set Incremental Learning: Readily Discovering New Authors

The classification of textual data often yields important information. Most classifiers work in a closed world setting where the classifier is trained on a known corpus, and then it is tested on unseen examples that belong to one of the…

Machine Learning · Computer Science 2022-12-27 Justin Leo, Jugal Kalita

Data Programming using Continuous and Quality-Guided Labeling Functions

Scarcity of labeled data is a bottleneck for supervised learning models. A paradigm that has evolved for dealing with this problem is data programming. An existing data programming paradigm allows human supervision to be provided as a set…

Machine Learning · Computer Science 2019-11-25 Oishik Chatterjee, Ganesh Ramakrishnan, Sunita Sarawagi