Related papers: Learning a Neural Diff for Speech Models

Transfer Learning for Speech Recognition on a Budget

End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory,…

Machine Learning · Computer Science · 2017-06-02 · Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, Sebastian Stober

Meta-Transfer Learning for Code-Switched Speech Recognition

An increasing number of people in the world today speak a mixed-language as a result of being multilingual. However, building a speech recognition system for code-switching remains difficult due to the limited availability of resources and…

Computation and Language · Computer Science · 2020-04-30 · Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Peng Xu, Pascale Fung

Knowing When to Quit: Probabilistic Early Exits for Speech Separation

In recent years, deep learning-based single-channel speech separation has improved considerably, in large part driven by increasingly compute- and parameter-efficient neural network architectures. Most such architectures are, however,…

Machine Learning · Computer Science · 2026-03-05 · Kenny Falkær Olsen, Mads Østergaard, Karl Ulbæk, Søren Føns Nielsen, Rasmus Malik Høegh Lindrup, Bjørn Sand Jensen, Morten Mørup

Exploring Benefits of Transfer Learning in Neural Machine Translation

Neural machine translation is known to require large numbers of parallel training sentences, which generally prevent it from excelling on low-resource language pairs. This thesis explores the use of cross-lingual transfer learning on neural…

Computation and Language · Computer Science · 2020-01-07 · Tom Kocmi

Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning

Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep…

Sound · Computer Science · 2023-05-12 · Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller

Communication-Efficient Separable Neural Network for Distributed Inference on Edge Devices

The inference of Neural Networks is usually restricted by the resources (e.g., computing power, memory, bandwidth) on edge devices. In addition to improving the hardware design and deploying efficient models, it is possible to aggregate the…

Machine Learning · Computer Science · 2021-11-05 · Jun-Liang Lin, Sheng-De Wang

Neural2Speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction

Reconstructing natural speech from neural activity is vital for enabling direct communication via brain-computer interfaces. Previous efforts have explored the conversion of neural recordings into speech using complex deep neural network…

Sound · Computer Science · 2024-02-01 · Jiawei Li, Chunxu Guo, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

Towards efficient models for real-time deep noise suppression

With recent research advancements, deep learning models are becoming attractive and powerful choices for speech enhancement in real-time applications. While state-of-the-art models can achieve outstanding results in terms of speech quality…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-05-20 · Sebastian Braun, Hannes Gamper, Chandan K. A. Reddy, Ivan Tashev

Efficient Transfer Learning Schemes for Personalized Language Modeling using Recurrent Neural Network

In this paper, we propose efficient transfer learning methods for training a personalized language model using a recurrent neural network with long short-term memory architecture. With our proposed fast transfer learning schemes, a…

Computation and Language · Computer Science · 2017-10-11 · Seunghyun Yoon, Hyeongu Yun, Yuna Kim, Gyu-tae Park, Kyomin Jung

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

Deep neural networks and huge language models are becoming omnipresent in natural language applications. As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in…

Computation and Language · Computer Science · 2021-04-12 · Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow

An Unsupervised Autoregressive Model for Speech Representation Learning

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is…

Computation and Language · Computer Science · 2019-06-20 · Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

Convolutional-Recurrent Neural Networks for Speech Enhancement

We propose an end-to-end model based on convolutional and recurrent neural networks for speech enhancement. Our model is purely data-driven and does not make any assumptions about the type or the stationarity of the noise. In contrast to…

Sound · Computer Science · 2018-05-03 · Han Zhao, Shuayb Zarar, Ivan Tashev, Chin-Hui Lee

Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers

End-to-end speech recognition is a promising technology for enabling compact automatic speech recognition (ASR) systems since it can unify the acoustic and language model into a single neural network. However, as a drawback, training of…

Computation and Language · Computer Science · 2022-02-17 · Yotaro Kubo, Shigeki Karita, Michiel Bacchiani

Optimizing Speech Recognition For The Edge

While most deployed speech recognition systems today still run on servers, we are in the midst of a transition towards deployments on edge devices. This leap to the edge is powered by the progression from traditional speech recognition…

Computation and Language · Computer Science · 2020-02-10 · Yuan Shangguan, Jian Li, Qiao Liang, Raziel Alvarez, Ian McGraw

Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition

To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared and…

Computation and Language · Computer Science · 2021-04-21 · Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney

Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data

Most state-of-the-art speech systems are using Deep Neural Networks (DNNs). Those systems require a large amount of data to be learned. Hence, learning state-of-the-art frameworks on under-resourced speech languages/problems is a difficult…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-03-10 · Vincent Roger, Jérôme Farinas, Julien Pinquier

Incremental Adaptation Strategies for Neural Network Language Models

It is today acknowledged that neural network language models outperform backoff language models in applications like speech recognition or statistical machine translation. However, training these models on large amounts of data can take…

Neural and Evolutionary Computing · Computer Science · 2015-07-08 · Aram Ter-Sarkisov, Holger Schwenk, Loic Barrault, Fethi Bougares

Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary

Cross-lingual model transfer is a compelling and popular method for predicting annotations in a low-resource language, whereby parallel corpora provide a bridge to a high-resource language and its associated annotated corpora. However,…

Computation and Language · Computer Science · 2017-05-02 · Meng Fang, Trevor Cohn

Multilingual acoustic word embedding models for processing zero-resource languages

Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. In settings where unlabelled speech is the only available resource, such embeddings can be used in "zero-resource" speech search, indexing…

Computation and Language · Computer Science · 2020-02-24 · Herman Kamper, Yevgen Matusevych, Sharon Goldwater

Real-time low-resource phoneme recognition on edge devices

While speech recognition has seen a surge in interest and research over the last decade, most machine learning models for speech recognition either require large training datasets or lots of storage and memory. Combined with the prominence…

Computation and Language · Computer Science · 2021-03-26 · Yonatan Alon