Related papers: SPEAR: Semi-supervised Data Programming in Python

200 papers

The paradigm of data programming, which uses weak supervision in the form of rules/labelling functions, and semi-supervised learning, which augments small amounts of labelled data with a large unlabelled dataset, have shown great promise in…

Machine Learning · Computer Science 2021-06-15 Ayush Maheshwari, Oishik Chatterjee, KrishnaTeja Killamsetty, Ganesh Ramakrishnan, Rishabh Iyer

Sparse coding approximates the data sample as a sparse linear combination of some basic codewords and uses the sparse codes as new representations. In this paper, we investigate learning discriminative sparse codes by sparse coding in a…

Machine Learning · Statistics 2015-01-19 Jim Jing-Yan Wang, Xin Gao

In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose a computationally efficient algorithm that, under mild assumptions on the data, provably achieves…

Machine Learning · Computer Science 2023-06-08 Francesco Pinto, Yaxi Hu, Fanny Yang, Amartya Sanyal

SHallow REcurrent Decoders (SHRED) provide a deep learning strategy for modeling high-dimensional dynamical systems and/or spatiotemporal data from dynamical system snapshot observations. PySHRED is a Python package that implements SHRED…

Machine Learning · Computer Science 2025-07-29 David Ye, Jan Williams, Mars Gao, Stefano Riva, Matteo Tomasetto, David Zoro, J. Nathan Kutz

We present skweak, a versatile, Python-based software toolkit enabling NLP developers to apply weak supervision to a wide range of NLP tasks. Weak supervision is an emerging machine learning paradigm based on a simple idea: instead of…

Computation and Language · Computer Science 2021-08-18 Pierre Lison, Jeremy Barnes, Aliaksandr Hubin

Semi-supervised anomaly detection is a common problem, as often the datasets containing anomalies are partially labeled. We propose a canonical framework: Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE) that isn't…

Machine Learning · Computer Science 2022-12-02 Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Tomas Pfister

Dataset pruning reduces the storage and training costs of deep learning by selecting an informative subset from a large dataset. However, most existing pruning methods require fully labeled data, which limits their applicability in…

Machine Learning · Computer Science 2026-05-25 Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

Background: Most of the existing machine learning models for security tasks, such as spam detection, malware detection, or network intrusion detection, are built on supervised machine learning algorithms. In such a paradigm, models need a…

Cryptography and Security · Computer Science 2022-05-03 Rui Shu, Tianpei Xia, Huy Tu, Laurie Williams, Tim Menzies

We motivate weakly supervised learning as an effective learning paradigm for problems where curating perfectly annotated datasets is expensive and may require domain expertise such as fine-grained classification. We focus on Partial Label…

Computer Vision and Pattern Recognition · Computer Science 2025-06-09 Darshana Saravanan, Naresh Manwani, Vineet Gandhi

Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. Semi-supervised fine-tuning strategies under the pre-training framework, however, remain…

Sound · Computer Science 2022-06-28 Bowen Zhang, Songjun Cao, Xiaoming Zhang, Yike Zhang, Long Ma, Takahiro Shinozaki

Self-supervised learning (SSL) has significantly advanced acoustic representation learning. However, most existing models are optimised for either speech or audio event understanding, resulting in a persistent gap between these two domains.…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-05 Xiaoyu Yang, Yifan Yang, Zengrui Jin, Ziyun Cui, Wen Wu, Baoxiang Li, Chao Zhang, Phil Woodland

Semi-supervised learning (SSL) can reduce the need for large labelled datasets by incorporating unlabelled data into the training. This is particularly interesting for semantic segmentation, where labelling data is very costly and…

Computer Vision and Pattern Recognition · Computer Science 2022-10-20 Sebastian Scherer, Robin Schön, Rainer Lienhart

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie, Bert Huang

Acquiring accurate labels on large-scale datasets is both time consuming and expensive. To reduce the dependency of deep learning models on learning from clean labeled data, several recent research efforts are focused on learning with noisy…

Computer Vision and Pattern Recognition · Computer Science 2022-01-27 Arushi Goel, Yunlong Jiao, Jordan Massiah

Labeled data is a critical resource for training and evaluating machine learning models. However, many real-life datasets are only partially labeled. We propose a semi-supervised machine learning training strategy to improve event detection…

Computer Vision and Pattern Recognition · Computer Science 2022-10-05 Florian Dubost, Erin Hong, Nandita Bhaskhar, Siyi Tang, Daniel Rubin, Christopher Lee-Messer

The scarcity of labeled data is a critical obstacle to deep learning. Semi-supervised learning (SSL) provides a promising way to leverage unlabeled data by pseudo labels. However, when the size of labeled data is very small (say a few…

Computer Vision and Pattern Recognition · Computer Science 2021-10-27 Yi Xu, Jiandong Ding, Lu Zhang, Shuigeng Zhou

Semi-supervised learning methods are usually employed in the classification of data sets where only a small subset of the data items is labeled. In these scenarios, label noise is a crucial issue, since the noise may easily spread to a…

Machine Learning · Computer Science 2020-02-14 Fabricio Aparecido Breve, Liang Zhao, Marcos Gonçalves Quiles

A common classification task situation is where one has a large amount of data available for training, but only a small portion is annotated with class labels. The goal of semi-supervised training, in this context, is to improve…

Computer Vision and Pattern Recognition · Computer Science 2022-07-01 Zijian Hu, Zhengyu Yang, Xuefeng Hu, Ram Nevatia

Partial Label (PL) learning refers to the task of learning from the partially labeled data, where each training instance is ambiguously equipped with a set of candidate labels but only one is valid. Advances in the recent deep PL learning…

Machine Learning · Computer Science 2022-12-01 Ximing Li, Yuanzhi Jiang, Changchun Li, Yiyuan Wang, Jihong Ouyang

State-of-the-art deep neural networks require large-scale labeled training data that is often expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such…

Computation and Language · Computer Science 2021-04-13 Giannis Karamanolakis, Subhabrata Mukherjee, Guoqing Zheng, Ahmed Hassan Awadallah