English
Related papers

Related papers: Data Programming: Creating Large Training Sets, Qu…

200 papers

Scarcity of labeled data is a bottleneck for supervised learning models. A paradigm that has evolved for dealing with this problem is data programming. An existing data programming paradigm allows human supervision to be provided as a set…

Machine Learning · Computer Science 2019-11-25 Oishik Chatterjee , Ganesh Ramakrishnan , Sunita Sarawagi

Most advanced supervised Machine Learning (ML) models rely on vast amounts of point-by-point labelled training examples. Hand-labelling vast amounts of data may be tedious, expensive, and error-prone. Recently, some studies have explored…

Machine Learning · Computer Science 2021-08-27 Chufan Gao , Mononito Goswami

Although large language models (LLMs) have advanced the state-of-the-art in NLP significantly, deploying them for downstream applications is still challenging due to cost, responsiveness, control, or concerns around privacy and security. As…

Computation and Language · Computer Science 2023-11-01 Dong-Ho Lee , Jay Pujara , Mohit Sewak , Ryen W. White , Sujay Kumar Jauhar

The paradigm of data programming, which uses weak supervision in the form of rules/labelling functions, and semi-supervised learning, which augments small amounts of labelled data with a large unlabelled dataset, have shown great promise in…

Machine Learning · Computer Science 2021-06-15 Ayush Maheshwari , Oishik Chatterjee , KrishnaTeja Killamsetty , Ganesh Ramakrishnan , Rishabh Iyer

Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data…

Machine Learning · Computer Science 2020-02-05 Neil Mallinar , Abhishek Shah , Tin Kam Ho , Rajendra Ugrani , Ayush Gupta

As the number of applications that use machine learning algorithms increases, the need for labeled data useful for training such algorithms intensifies. Getting labels typically involves employing humans to do the annotation, which directly…

Machine Learning · Computer Science 2013-07-16 Alexandros Ntoulas , Omar Alonso , Vasilis Kandylas

Recently, deep learning models have been widely applied in program understanding tasks, and these models achieve state-of-the-art results on many benchmark datasets. A major challenge of deep learning for program understanding is that the…

Software Engineering · Computer Science 2024-01-02 Wenhan Wang , Yanzhou Li , Anran Li , Jian Zhang , Wei Ma , Yang Liu

Large Deep Neural Networks (DNNs) are often data hungry and need high-quality labeled data in copious amounts for learning to converge. This is a challenge in the field of medicine since high quality labeled data is often scarce. Data…

Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports…

The recent success of deep learning is mostly due to the availability of big datasets with clean annotations. However, gathering a cleanly annotated dataset is not always feasible due to practical challenges. As a result, label noise is a…

Computer Vision and Pattern Recognition · Computer Science 2020-03-25 Görkem Algan , İlkay Ulusoy

Despite the success of deep neural networks (DNNs) in image classification tasks, the human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect. There…

Machine Learning · Computer Science 2019-04-15 Junnan Li , Yongkang Wong , Qi Zhao , Mohan Kankanhalli

Modern machine learning models require large labelled datasets to achieve good performance, but manually labelling large datasets is expensive and time-consuming. The data programming paradigm enables users to label large datasets…

Machine Learning · Computer Science 2024-02-12 Naiqing Guan , Nick Koudas

Collecting large-scale data with clean labels for supervised training of neural networks is practically challenging. Although noisy labels are usually cheap to acquire, existing methods suffer a lot from label noise. This paper targets at…

Machine Learning · Computer Science 2020-06-16 Zizhao Zhang , Han Zhang , Sercan O. Arik , Honglak Lee , Tomas Pfister

State-of-the-art, high capacity deep neural networks not only require large amounts of labelled training data, they are also highly susceptible to label errors in this data, typically resulting in large efforts and costs and therefore…

Machine Learning · Computer Science 2020-07-20 Christian Haase-Schütz , Rainer Stal , Heinz Hertlein , Bernhard Sick

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie , Bert Huang

The scalability of the labeling process and the attainable quality of labels have become limiting factors for many applications of machine learning. The programmatic creation of labeled datasets via the synthesis of noisy heuristics…

Machine Learning · Computer Science 2019-12-18 Benedikt Boecking , Artur Dubrawski

Collecting large training datasets, annotated with high-quality labels, is costly and time-consuming. This paper proposes a novel framework for training deep convolutional neural networks from noisy labeled datasets that can be obtained…

Machine Learning · Computer Science 2017-11-06 Arash Vahdat

A critical bottleneck in supervised machine learning is the need for large amounts of labeled data which is expensive and time consuming to obtain. However, it has been shown that a small amount of labeled data, while insufficient to…

In industry deep learning application, our manually labeled data has a certain number of noisy data. To solve this problem and achieve more than 90 score in dev dataset, we present a simple method to find the noisy data and re-label the…

Machine Learning · Computer Science 2025-03-20 Tong Guo

While mislabeled or ambiguously-labeled samples in the training set could negatively affect the performance of deep models, diagnosing the dataset and identifying mislabeled samples helps to improve the generalization power. Training…

Computer Vision and Pattern Recognition · Computer Science 2022-12-21 Qingrui Jia , Xuhong Li , Lei Yu , Jiang Bian , Penghao Zhao , Shupeng Li , Haoyi Xiong , Dejing Dou
‹ Prev 1 2 3 10 Next ›