Related papers: Adaptive Ranking-based Sample Selection for Weakly…

200 papers

Answer sentence selection (AS2) modeling requires annotated data, i.e., hand-labeled question-answer pairs. We present a strategy to collect weakly supervised answers for a question based on its reference to improve AS2 modeling.…

Computation and Language · Computer Science 2021-04-20 Vivek Krishnamurthy, Thuy Vu, Alessandro Moschitti

We study Label Smoothing (LS), a widely used regularization technique, in the context of neural learning to rank (L2R) models. LS combines the ground-truth labels with a uniform distribution, encouraging the model to be less confident in…

Information Retrieval · Computer Science 2020-12-17 Gustavo Penha, Claudia Hauff

Creating large, good-quality labeled data has become one of the major bottlenecks for developing machine learning applications. Multiple techniques have been developed to either decrease the dependence on labeled data (zero/few-shot…

Computation and Language · Computer Science 2023-02-08 Abhinav Bohra, Huy Nguyen, Devashish Khatwani

Traditional resampling methods for handling class imbalance typically use fixed distributions, undersampling the majority or oversampling the minority. These static strategies ignore changes in class-wise learning difficulty, which can…

Machine Learning · Computer Science 2026-02-17 Arjun Basandrai, Shourya Jain, K. Ilanthenral

Weak supervision (WS) is an alternative to traditional supervised learning that addresses the need for ground truth. Data programming is a practical WS approach that allows programmatically labeling data samples using labeling functions (LFs)…

Machine Learning · Computer Science 2022-04-14 Gürkan Solmaz, Flavio Cirillo, Fabio Maresca, Anagha Gode Anil Kumar

Recent progress in speech recognition has relied on models trained on vast amounts of labeled data. However, classroom Automatic Speech Recognition (ASR) faces the real-world challenge of abundant weak transcripts paired with only a small…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-24 Ahmed Adel Attia, Dorottya Demszky, Jing Liu, Carol Espy-Wilson

Deep neural network models have demonstrated their effectiveness in classifying multi-label data from various domains. Typically, they employ a training mode that combines mini-batches with optimizers, where each sample is randomly selected…

Machine Learning · Computer Science 2024-03-28 Ao Zhou, Bin Liu, Jin Wang, Grigorios Tsoumakas

With the abundance of industrial datasets, imbalanced classification has become a common problem in several application domains. Oversampling is an effective method to solve imbalanced classification. One of the main challenges of the…

Machine Learning · Computer Science 2022-07-18 Min Qian, Yan-Fu Li

Learning with noisy labels has gained increasing attention because the inevitable imperfect labels in real-world scenarios can substantially hurt deep model performance. Recent studies tend to regard low-loss samples as clean ones and…

Machine Learning · Computer Science 2024-02-20 Huafeng Liu, Mengmeng Sheng, Zeren Sun, Yazhou Yao, Xian-Sheng Hua, Heng-Tao Shen

Active learning for regression reduces labeling costs by selecting the most informative samples. Improved Greedy Sampling is a prominent method that balances feature-space diversity and output-space uncertainty using a static,…

Machine Learning · Statistics 2026-03-12 Simon D. Nguyen, Troy Russo, Kentaro Hoffman, Tyler H. McCormick

Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data…

Instrumentation and Methods for Astrophysics · Physics 2015-05-28 Joseph W. Richards, Dan L. Starr, Henrik Brink, Adam A. Miller, Joshua S. Bloom, Nathaniel R. Butler, J. Berian James, James P. Long, John Rice

Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the…

Information Retrieval · Computer Science 2017-05-30 Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft

Active learning has emerged as a standard paradigm in areas with a scarcity of labeled training data, such as the medical domain. Language models have emerged as the prevalent choice for several natural language tasks due to the performance…

Computation and Language · Computer Science 2021-09-07 Anson Bastos, Manohar Kaul

Training deep neural networks requires many training samples, but in practice, training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other…

Information Retrieval · Computer Science 2018-06-25 Mostafa Dehghani, Jaap Kamps

Text classification is the task of automatically assigning text documents correct labels from a predefined set of categories. In real-life (text) classification tasks, observations and misclassification costs are often unevenly distributed…

Machine Learning · Computer Science 2025-09-03 Aleksi Avela, Pauliina Ilmonen

A common strategy in transfer learning is few-shot fine-tuning, but its success is highly dependent on the quality of samples selected as training examples. Active learning methods such as uncertainty sampling and diversity sampling can…

Computation and Language · Computer Science 2026-04-23 Wei Han, David Martinez, Anna Khanina, Lawrence Cavedon, Karin Verspoor

Automatic target recognition (ATR) is an important use case for synthetic aperture radar (SAR) image interpretation. Recent years have seen significant advancements in SAR ATR technology based on semi-supervised learning. However, existing…

Computer Vision and Pattern Recognition · Computer Science 2024-11-07 Xinzheng Zhang, Yuqing Luo, Guopeng Li

State-of-the-art machine learning models require access to a significant amount of annotated data in order to achieve the desired level of performance. While unlabelled data can be largely available and even abundant, the annotation process can…

Machine Learning · Computer Science 2020-10-15 Rahaf Aljundi, Nikolay Chumerin, Daniel Olmeda Reino

Pseudo-labeling has proven to be a promising semi-supervised learning (SSL) paradigm. Existing pseudo-labeling methods commonly assume that the class distributions of training data are balanced. However, such an assumption is far from…

Machine Learning · Computer Science 2023-03-03 Renzhen Wang, Xixi Jia, Quanziang Wang, Yichen Wu, Deyu Meng

For high-resource languages like English, text classification is a well-studied task. Modern NLP models easily achieve an accuracy of more than 90% on many standard datasets for text classification in English (Xie et…

Computation and Language · Computer Science 2022-06-06 Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow