Related papers: Adaptive Ranking-based Sample Selection for Weakly…

Reference-based Weak Supervision for Answer Sentence Selection using Web Data

Answer sentence selection (AS2) modeling requires annotated data, i.e., hand-labeled question-answer pairs. We present a strategy to collect weakly supervised answers for a question based on its reference to improve AS2 modeling.…

Computation and Language · Computer Science 2021-04-20 Vivek Krishnamurthy , Thuy Vu , Alessandro Moschitti

Weakly Supervised Label Smoothing

We study Label Smoothing (LS), a widely used regularization technique, in the context of neural learning to rank (L2R) models. LS combines the ground-truth labels with a uniform distribution, encouraging the model to be less confident in…

Information Retrieval · Computer Science 2020-12-17 Gustavo Penha , Claudia Hauff
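The label-smoothing mixture described in the abstract above can be sketched in a few lines. This is a minimal sketch that assumes the common formulation y_smooth = (1 - epsilon) * y + epsilon / K over the K entries of a label vector (classes, or candidate documents in a ranking list). The helper name `smooth_labels` and the value epsilon = 0.1 are hypothetical illustration choices, not taken from the paper.

```python
def smooth_labels(labels, epsilon=0.1):
    """Mix ground-truth labels with a uniform distribution (hypothetical helper).

    Assumed formulation: y_smooth = (1 - epsilon) * y + epsilon / K,
    where K is the number of entries in `labels`.
    """
    k = len(labels)
    return [(1.0 - epsilon) * y + epsilon / k for y in labels]


# Usage: a ranking list of four candidates where only the first is relevant.
smoothed = smooth_labels([1.0, 0.0, 0.0, 0.0], epsilon=0.1)
print(smoothed)
```

The relevant candidate keeps most of the probability mass while every other candidate receives epsilon / K, which is what discourages the model from becoming overconfident.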

AutoWS: Automated Weak Supervision Framework for Text Classification

Creating large, good-quality labeled data has become one of the major bottlenecks for developing machine learning applications. Multiple techniques have been developed to either decrease the dependence on labeled data (zero/few-shot…

Computation and Language · Computer Science 2023-02-08 Abhinav Bohra , Huy Nguyen , Devashish Khatwani

ART: Adaptive Resampling-based Training for Imbalanced Classification

Traditional resampling methods for handling class imbalance typically use fixed distributions, undersampling the majority or oversampling the minority. These static strategies ignore changes in class-wise learning difficulty, which can…

Machine Learning · Computer Science 2026-02-17 Arjun Basandrai , Shourya Jain , K. Ilanthenral
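For contrast with the adaptive approach, the static baseline that the ART abstract criticizes can be sketched as plain random oversampling. This is a minimal sketch, not the paper's method: it assumes the target is "every class matches the majority-class count", fixed once before training, and the helper name `random_oversample` is hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Static random oversampling (hypothetical helper).

    Duplicates randomly chosen minority-class samples until every class
    matches the majority-class count. The target distribution is fixed
    up front and never reacts to how hard each class is to learn.
    """
    rng = random.Random(seed)
    target = max(Counter(labels).values())
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


X = ["a1", "a2", "a3", "a4", "b1"]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Because the resampled distribution is computed once, it stays the same no matter how training progresses, which is the limitation the abstract points to.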

Label Augmentation with Reinforced Labeling for Weak Supervision

Weak supervision (WS) is an alternative to traditional supervised learning that addresses the need for ground truth. Data programming is a practical WS approach that allows programmatic labeling of data samples using labeling functions (LFs)…

Machine Learning · Computer Science 2022-04-14 Gürkan Solmaz , Flavio Cirillo , Fabio Maresca , Anagha Gode Anil Kumar
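The labeling-function idea mentioned in the abstract above can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the three LFs target a hypothetical spam task, each returns a label or abstains, and their votes are combined by simple majority vote. Data programming frameworks usually fit a learned label model instead of majority vote, and this sketch is not the paper's reinforced-labeling method.

```python
from collections import Counter

ABSTAIN = -1
SPAM, HAM = 1, 0


# Hypothetical labeling functions: each returns SPAM, HAM, or ABSTAIN.
def lf_contains_link(text):
    return SPAM if "http" in text else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text):
    return HAM if len(text.split()) < 4 else ABSTAIN


LFS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def majority_vote(text, lfs=LFS):
    """Combine non-abstaining LF votes; return ABSTAIN if none fire or on a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


print(majority_vote("Claim your prize at http://example.com"))
print(majority_vote("ok thanks"))
```

Samples on which every LF abstains stay unlabeled, which is the coverage gap that label-augmentation methods try to close.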

From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data

Recent progress in speech recognition has relied on models trained on vast amounts of labeled data. However, classroom Automatic Speech Recognition (ASR) faces the real-world challenge of abundant weak transcripts paired with only a small…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-24 Ahmed Adel Attia , Dorottya Demszky , Jing Liu , Carol Espy-Wilson

Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples

Deep neural network models have demonstrated their effectiveness in classifying multi-label data from various domains. Typically, they employ a training mode that combines mini-batches with optimizers, where each sample is randomly selected…

Machine Learning · Computer Science 2024-03-28 Ao Zhou , Bin Liu , Jin Wang , Grigorios Tsoumakas

Weakly Supervised-Based Oversampling for High Imbalance and High Dimensionality Data Classification

With the abundance of industrial datasets, imbalanced classification has become a common problem in several application domains. Oversampling is an effective method to solve imbalanced classification. One of the main challenges of the…

Machine Learning · Computer Science 2022-07-18 Min Qian , Yan-Fu Li

Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

Learning with noisy labels has gained increasing attention because the inevitably imperfect labels in real-world scenarios can substantially hurt deep model performance. Recent studies tend to regard low-loss samples as clean ones and…

Machine Learning · Computer Science 2024-02-20 Huafeng Liu , Mengmeng Sheng , Zeren Sun , Yazhou Yao , Xian-Sheng Hua , Heng-Tao Shen
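The small-loss criterion mentioned in the abstract above ("regard low-loss samples as clean ones") can be sketched as follows. This is a minimal sketch, assuming per-sample losses are already computed and a fixed keep ratio. The `per_class=True` option is a hypothetical illustrative variant that applies the ratio within each class; it is not the cited paper's method, and the helper name `small_loss_select` is also hypothetical.

```python
def small_loss_select(losses, labels, keep_ratio=0.5, per_class=False):
    """Return sorted indices of samples treated as clean (hypothetical helper).

    per_class=False: keep the keep_ratio fraction of lowest-loss samples overall.
    per_class=True: apply the same ratio inside each class separately
    (illustrative variant, not the cited paper's method).
    """
    if per_class:
        groups = {}
        for i, y in enumerate(labels):
            groups.setdefault(y, []).append(i)
    else:
        groups = {None: list(range(len(losses)))}
    selected = []
    for idx in groups.values():
        idx_sorted = sorted(idx, key=lambda i: losses[i])
        n_keep = max(1, int(round(keep_ratio * len(idx_sorted))))
        selected.extend(idx_sorted[:n_keep])
    return sorted(selected)


# Toy data: class 1 is the minority and has higher losses overall.
losses = [0.1, 0.2, 0.3, 0.4, 1.5, 1.8]
labels = [0, 0, 0, 0, 1, 1]
print(small_loss_select(losses, labels, keep_ratio=0.5))
print(small_loss_select(losses, labels, keep_ratio=0.5, per_class=True))
```

On this toy data the global threshold keeps only class-0 samples, while the per-class variant retains one class-1 sample, showing how a single loss threshold can skew selection against harder or rarer classes.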

Adaptive Active Learning for Regression via Reinforcement Learning

Active learning for regression reduces labeling costs by selecting the most informative samples. Improved Greedy Sampling is a prominent method that balances feature-space diversity and output-space uncertainty using a static,…

Machine Learning · Statistics 2026-03-12 Simon D. Nguyen , Troy Russo , Kentaro Hoffman , Tyler H. McCormick

Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification

Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data…

Instrumentation and Methods for Astrophysics · Physics 2015-05-28 Joseph W. Richards , Dan L. Starr , Henrik Brink , Adam A. Miller , Joshua S. Bloom , Nathaniel R. Butler , J. Berian James , James P. Long , John Rice

Neural Ranking Models with Weak Supervision

Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the…

Information Retrieval · Computer Science 2017-05-30 Mostafa Dehghani , Hamed Zamani , Aliaksei Severyn , Jaap Kamps , W. Bruce Croft

ALLWAS: Active Learning on Language models in WASserstein space

Active learning has emerged as a standard paradigm in areas with a scarcity of labeled training data, such as the medical domain. Language models have emerged as the prevalent choice for several natural language tasks due to the performance…

Computation and Language · Computer Science 2021-09-07 Anson Bastos , Manohar Kaul

Learning to Rank from Samples of Variable Quality

Training deep neural networks requires many training samples, but in practice, training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other…

Information Retrieval · Computer Science 2018-06-25 Mostafa Dehghani , Jaap Kamps

Extrapolated Markov Chain Oversampling Method for Imbalanced Text Classification

Text classification is the task of automatically assigning text documents correct labels from a predefined set of categories. In real-life (text) classification tasks, observations and misclassification costs are often unevenly distributed…

Machine Learning · Computer Science 2025-09-03 Aleksi Avela , Pauliina Ilmonen

RADS: Reinforcement Learning-Based Sample Selection Improves Transfer Learning in Low-resource and Imbalanced Clinical Settings

A common strategy in transfer learning is few-shot fine-tuning, but its success is highly dependent on the quality of samples selected as training examples. Active learning methods such as uncertainty sampling and diversity sampling can…

Computation and Language · Computer Science 2026-04-23 Wei Han , David Martinez , Anna Khanina , Lawrence Cavedon , Karin Verspoor

Energy Score-based Pseudo-Label Filtering and Adaptive Loss for Imbalanced Semi-supervised SAR target recognition

Automatic target recognition (ATR) is an important use case for synthetic aperture radar (SAR) image interpretation. Recent years have seen significant advancements in SAR ATR technology based on semi-supervised learning. However, existing…

Computer Vision and Pattern Recognition · Computer Science 2024-11-07 Xinzheng Zhang , Yuqing Luo , Guopeng Li

Identifying Wrongly Predicted Samples: A Method for Active Learning

State-of-the-art machine learning models require access to a significant amount of annotated data in order to achieve the desired level of performance. While unlabelled data can be largely available and even abundant, the annotation process can…

Machine Learning · Computer Science 2020-10-15 Rahaf Aljundi , Nikolay Chumerin , Daniel Olmeda Reino

Imbalanced Semi-supervised Learning with Bias Adaptive Classifier

Pseudo-labeling has proven to be a promising semi-supervised learning (SSL) paradigm. Existing pseudo-labeling methods commonly assume that the class distributions of training data are balanced. However, such an assumption is far from…

Machine Learning · Computer Science 2023-03-03 Renzhen Wang , Xixi Jia , Quanziang Wang , Yichen Wu , Deyu Meng

Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages

For high-resource languages like English, text classification is a well-studied task. Modern NLP models easily achieve an accuracy of more than 90% on many standard datasets for text classification in English (Xie et…

Computation and Language · Computer Science 2022-06-06 Dawei Zhu , Michael A. Hedderich , Fangzhou Zhai , David Ifeoluwa Adelani , Dietrich Klakow