Related papers: Info-Coevolution: An Efficient Framework for Data …
Active learning, a powerful paradigm in machine learning, aims at reducing labeling costs by selecting the most informative samples from an unlabeled dataset. However, the traditional active learning process often demands extensive…
When we can not assume a large amount of annotated data , active learning is a good strategy. It consists in learning a model on a small amount of annotated data (annotation budget) and in choosing the best set of points to annotate in…
Many recent approaches to natural language tasks are built on the remarkable abilities of large language models. Large language models can perform in-context learning, where they learn a new task from a few task demonstrations, without any…
Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques. However, manual annotations are costly. We propose a tool that enables researchers to create large, high-quality,…
A major challenge in Natural Language Processing is obtaining annotated data for supervised learning. An option is the use of crowdsourcing platforms for data annotation. However, crowdsourcing introduces issues related to the annotator's…
Annotating datasets for question answering (QA) tasks is very costly, as it requires intensive manual labor and often domain-specific knowledge. Yet strategies for annotating QA datasets in a cost-effective manner are scarce. To provide a…
Most machine learning and data analytics applications, including performance engineering in software systems, require a large number of annotations and labelled data, which might not be available in advance. Acquiring annotations often…
Image segmentation is a fundamental problem in biomedical image analysis. Recent advances in deep learning have achieved promising results on many biomedical image segmentation benchmarks. However, due to large variations in biomedical…
Many machine learning systems today are trained on large amounts of human-annotated data. Data annotation tasks that require a high level of competency make data acquisition expensive, while the resulting labels are often subjective,…
Automated object detection has become increasingly valuable across diverse applications, yet efficient, high-quality annotation remains a persistent challenge. In this paper, we present the development and evaluation of a platform designed…
The splendid success of convolutional neural networks (CNNs) in computer vision is largely attributable to the availability of massive annotated datasets, such as ImageNet and Places. However, in medical imaging, it is challenging to create…
Dynamic data selection aims to accelerate training with lossless performance. However, reducing training data inherently limits data diversity, potentially hindering generalization. While data augmentation is widely used to enhance…
Supervised deep learning requires a large amount of training samples with annotations (e.g. label class for classification task, pixel- or voxel-wised label map for segmentation tasks), which are expensive and time-consuming to obtain.…
Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that…
Deep learning has been successfully applied to OCT segmentation. However, for data from different manufacturers and imaging protocols, and for different regions of interest (ROIs), it requires laborious and time-consuming data annotation…
Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. Another paradigm to reduce the annotation effort is self-training that learns from a…
Many recent delineation techniques owe much of their increased effectiveness to path classification algorithms that make it possible to distinguish promising paths from others. The downside of this development is that they require annotated…
We have seen significant leapfrog advancement in machine learning in recent decades. The central idea of machine learnability lies on constructing learning algorithms that learn from good data. The availability of more data being made…
Supervised Deep Learning has been highly successful in recent years, achieving state-of-the-art results in most tasks. However, with the ongoing uptake of such methods in industrial applications, the requirement for large amounts of…
State-of-the-art machine learning models require access to significant amount of annotated data in order to achieve the desired level of performance. While unlabelled data can be largely available and even abundant, annotation process can…