Related papers: Multi-Task Self-Supervised Learning for Disfluency…

Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection

Most existing approaches to disfluency detection rely heavily on human-annotated corpora, which are expensive to obtain in practice. There have been several proposals to alleviate this issue with, for instance, self-supervised learning…

Computation and Language · Computer Science 2020-10-30 Shaolei Wang, Zhongyuan Wang, Wanxiang Che, Ting Liu

Auxiliary Sequence Labeling Tasks for Disfluency Detection

Detecting disfluencies in spontaneous speech is an important preprocessing step in natural language processing and speech recognition applications. Existing works for disfluency detection have focused on designing a single objective only…

Computation and Language · Computer Science 2021-04-06 Dongyub Lee, Byeongil Ko, Myeong Cheol Shin, Taesun Whang, Daniel Lee, Eun Hwa Kim, EungGyun Kim, Jaechoon Jo

Improving Disfluency Detection by Self-Training a Self-Attentive Model

Self-attentive neural syntactic parsers using contextualized word embeddings (e.g. ELMo or BERT) currently produce state-of-the-art results in joint parsing and disfluency detection in speech transcripts. Since the contextualized word…

Computation and Language · Computer Science 2020-04-30 Paria Jamshid Lou, Mark Johnson

Self-semi-supervised Learning to Learn from NoisyLabeled Data

The remarkable success of today's deep neural networks highly depends on a massive number of correctly labeled data. However, it is rather costly to obtain high-quality human-labeled data, leading to the active research area of training…

Machine Learning · Computer Science 2020-11-04 Jiacheng Wang, Yue Ma, Shuang Gao

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

Existing approaches in disfluency detection focus on solving a token-level classification task for identifying and removing disfluencies in text. Moreover, most works focus on leveraging only contextual information captured by the linear…

Computation and Language · Computer Science 2022-04-19 Sreyan Ghosh, Sonal Kumar, Yaman Kumar Singla, Rajiv Ratn Shah, S. Umesh

Learning Multiple Dense Prediction Tasks from Partially Annotated Data

Despite the recent advances in multi-task learning of dense prediction problems, most methods rely on expensive labelled datasets. In this paper, we present a label efficient approach and look at jointly learning of multiple dense…

Computer Vision and Pattern Recognition · Computer Science 2022-05-05 Wei-Hong Li, Xialei Liu, Hakan Bilen

Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

Training deep neural networks requires massive amounts of training data, but for many tasks only limited labeled data is available. This makes weak supervision attractive, using weak or noisy signals like the output of heuristic methods or…

Machine Learning · Computer Science 2017-12-08 Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

Efficient Semi-Supervised Learning for Natural Language Understanding by Optimizing Diversity

Expanding new functionalities efficiently is an ongoing challenge for single-turn task-oriented dialogue systems. In this work, we explore functionality-specific semi-supervised learning via self-training. We consider methods that augment…

Computation and Language · Computer Science 2019-10-11 Eunah Cho, He Xie, John P. Lalor, Varun Kumar, William M. Campbell

Self-training Improves Pre-training for Natural Language Understanding

Unsupervised pre-training has led to much recent progress in natural language understanding. In this paper, we study self-training as another way to leverage unlabeled data through semi-supervised learning. To obtain additional data for a…

Computation and Language · Computer Science 2020-10-06 Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur Celebi, Michael Auli, Ves Stoyanov, Alexis Conneau

Multi-Task Self-Supervised Pre-Training for Music Classification

Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from the limited labeled data problem, as human annotations are costly to acquire, and…

Sound · Computer Science 2021-02-08 Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

Beyond without Forgetting: Multi-Task Learning for Classification with Disjoint Datasets

Multi-task Learning (MTL) for classification with disjoint datasets aims to explore MTL when one task only has one labeled dataset. In existing methods, for each task, the unlabeled datasets are not fully exploited to facilitate this task.…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Yan Hong, Li Niu, Jianfu Zhang, Liqing Zhang

Weakly Supervised Multi-task Learning for Concept-based Explainability

In ML-aided decision-making tasks, such as fraud detection or medical diagnosis, the human-in-the-loop, usually a domain-expert without technical ML knowledge, prefers high-level concept-based explanations instead of low-level explanations…

Machine Learning · Computer Science 2021-04-27 Catarina Belém, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro

Boosting Disfluency Detection with Large Language Model as Disfluency Generator

Current disfluency detection methods heavily rely on costly and scarce human-annotated data. To tackle this issue, some approaches employ heuristic or statistical features to generate disfluent sentences, partially improving detection…

Computation and Language · Computer Science 2024-08-07 Zhenrong Cheng, Jiayan Guo, Hao Sun, Yan Zhang

Semi-Supervised Learning with Scarce Annotations

While semi-supervised learning (SSL) algorithms provide an efficient way to make use of both labelled and unlabelled data, they generally struggle when the number of annotated samples is very small. In this work, we consider the problem of…

Computer Vision and Pattern Recognition · Computer Science 2020-04-23 Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Kai Han, Andrea Vedaldi, Andrew Zisserman

Disjoint Multi-task Learning between Heterogeneous Human-centric Tasks

Human behavior understanding is arguably one of the most important mid-level components in artificial intelligence. In order to efficiently make use of data, multi-task learning has been studied in diverse computer vision tasks including…

Computer Vision and Pattern Recognition · Computer Science 2018-02-15 Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, Youngjin Yoon, In So Kweon

Doubly Robust Self-Training

Self-training is an important technique for solving semi-supervised learning problems. It leverages unlabeled data by generating pseudo-labels and combining them with a limited labeled dataset for training. The effectiveness of…

Machine Learning · Computer Science 2023-11-06 Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao

Boosting Supervised Learning Performance with Co-training

Deep learning perception models require a massive amount of labeled training data to achieve good performance. While unlabeled data is easy to acquire, the cost of labeling is prohibitive and could create a tremendous burden on companies or…

Computer Vision and Pattern Recognition · Computer Science 2021-11-19 Xinnan Du, William Zhang, Jose M. Alvarez

Semi-supervised Sequence Learning

We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing.…

Machine Learning · Computer Science 2015-11-05 Andrew M. Dai, Quoc V. Le

Meta-learning of semi-supervised learning from tasks with heterogeneous attribute spaces

We propose a meta-learning method for semi-supervised learning that learns from multiple tasks with heterogeneous attribute spaces. The existing semi-supervised meta-learning methods assume that all tasks share the same attribute space,…

Machine Learning · Computer Science 2023-11-10 Tomoharu Iwata, Atsutoshi Kumagai

A novel multimodal dynamic fusion network for disfluency detection in spoken utterances

Disfluency, though originating from human spoken utterances, is primarily studied as a uni-modal text-based Natural Language Processing (NLP) task. Based on early-fusion and self-attention-based multimodal interaction between text and…

Computation and Language · Computer Science 2022-11-29 Sreyan Ghosh, Utkarsh Tyagi, Sonal Kumar, Manan Suri, Rajiv Ratn Shah