Related papers: Deep Bayesian Self-Training
Human annotations are an important source of information in the development of natural language understanding approaches. As under the pressure of productivity annotators can assign different labels to a given text, the quality of produced…
While learning from synthetic training data has recently gained an increased attention, in real-world robotic applications, there are still performance deficiencies due to the so-called Sim-to-Real gap. In practice, this gap is hard to…
Training deep neural networks is challenging when large and annotated datasets are unavailable. Extensive manual annotation of data samples is time-consuming, expensive, and error-prone, notably when it needs to be done by experts. To…
Data augmentation is an essential part of the training process applied to deep learning models. The motivation is that a robust training process for deep learning models depends on large annotated datasets, which are expensive to be…
Active learning frameworks offer efficient data annotation without remarkable accuracy degradation. In other words, active learning starts training the model with a small size of labeled data while exploring the space of unlabeled data in…
Deep learning has proven effective for various application tasks, but its applicability is limited by the reliance on annotated examples. Self-supervised learning has emerged as a promising direction to alleviate the supervision bottleneck,…
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe. Their performance is strictly connected with the quality of labels used in training. Unfortunately, acquiring good-quality…
Recent success of large-scale pre-trained language models crucially hinge on fine-tuning them on large amounts of labeled data for the downstream task, that are typically expensive to acquire. In this work, we study self-training as one of…
Auto-annotation by ensemble of models is an efficient method of learning on unlabeled data. Wrong or inaccurate annotations generated by the ensemble may lead to performance degradation of the trained model. To deal with this problem we…
Annotating training data for sequence tagging of texts is usually very time-consuming. Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the…
Many machine learning systems today are trained on large amounts of human-annotated data. Data annotation tasks that require a high level of competency make data acquisition expensive, while the resulting labels are often subjective,…
As the adoption of deep learning techniques in industrial applications grows with increasing speed and scale, successful deployment of deep learning models often hinges on the availability, volume, and quality of annotated data. In this…
Build accurate DNN models requires training on large labeled, context specific datasets, especially those matching the target scenario. We believe advances in wireless localization, working in unison with cameras, can produce automated…
Data collection from manual labeling provides domain-specific and task-aligned supervision for data-driven approaches, and a critical mass of well-annotated resources is required to achieve reasonable performance in natural language…
Training a deep neural network heavily relies on a large amount of training data with accurate annotations. To alleviate this problem, various methods have been proposed to annotate the data automatically. However, automatically generating…
Deep Neural Networks are well known for efficiently fitting training data, yet experiencing poor generalization capabilities whenever some kind of bias dominates over the actual task labels, resulting in models learning "shortcuts". In…
Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that…
Digital data collected over the decades and data currently being produced with use of information technology is vastly the unlabeled data or data without description. The unlabeled data is relatively easy to acquire but expensive to label…
Data is the engine of modern computer vision, which necessitates collecting large-scale datasets. This is expensive, and guaranteeing the quality of the labels is a major challenge. In this paper, we investigate efficient annotation…
Supervised deep learning requires a large amount of training samples with annotations (e.g. label class for classification task, pixel- or voxel-wised label map for segmentation tasks), which are expensive and time-consuming to obtain.…