Related papers: Learning a Neural Diff for Speech Models

200 papers

End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory,…

Machine Learning · Computer Science 2017-06-02 Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, Sebastian Stober

An increasing number of people in the world today speak a mixed-language as a result of being multilingual. However, building a speech recognition system for code-switching remains difficult due to the availability of limited resources and…

Computation and Language · Computer Science 2020-04-30 Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Peng Xu, Pascale Fung

In recent years, deep learning-based single-channel speech separation has improved considerably, in large part driven by increasingly compute- and parameter-efficient neural network architectures. Most such architectures are, however,…

Neural machine translation is known to require large numbers of parallel training sentences, which generally prevent it from excelling on low-resource language pairs. This thesis explores the use of cross-lingual transfer learning on neural…

Computation and Language · Computer Science 2020-01-07 Tom Kocmi

Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep…

Sound · Computer Science 2023-05-12 Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller

The inference of Neural Networks is usually restricted by the resources (e.g., computing power, memory, bandwidth) on edge devices. In addition to improving the hardware design and deploying efficient models, it is possible to aggregate the…

Machine Learning · Computer Science 2021-11-05 Jun-Liang Lin, Sheng-De Wang

Reconstructing natural speech from neural activity is vital for enabling direct communication via brain-computer interfaces. Previous efforts have explored the conversion of neural recordings into speech using complex deep neural network…

Sound · Computer Science 2024-02-01 Jiawei Li, Chunxu Guo, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

With recent research advancements, deep learning models are becoming attractive and powerful choices for speech enhancement in real-time applications. While state-of-the-art models can achieve outstanding results in terms of speech quality…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-20 Sebastian Braun, Hannes Gamper, Chandan K. A. Reddy, Ivan Tashev

In this paper, we propose efficient transfer learning methods for training a personalized language model using a recurrent neural network with long short-term memory architecture. With our proposed fast transfer learning schemes, a…

Computation and Language · Computer Science 2017-10-11 Seunghyun Yoon, Hyeongu Yun, Yuna Kim, Gyu-tae Park, Kyomin Jung

Deep neural networks and huge language models are becoming omnipresent in natural language applications. As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in…

Computation and Language · Computer Science 2021-04-12 Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is…

Computation and Language · Computer Science 2019-06-20 Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

We propose an end-to-end model based on convolutional and recurrent neural networks for speech enhancement. Our model is purely data-driven and does not make any assumptions about the type or the stationarity of the noise. In contrast to…

Sound · Computer Science 2018-05-03 Han Zhao, Shuayb Zarar, Ivan Tashev, Chin-Hui Lee

End-to-end speech recognition is a promising technology for enabling compact automatic speech recognition (ASR) systems since it can unify the acoustic and language model into a single neural network. However, as a drawback, training of…

Computation and Language · Computer Science 2022-02-17 Yotaro Kubo, Shigeki Karita, Michiel Bacchiani

While most deployed speech recognition systems today still run on servers, we are in the midst of a transition towards deployments on edge devices. This leap to the edge is powered by the progression from traditional speech recognition…

Computation and Language · Computer Science 2020-02-10 Yuan Shangguan, Jian Li, Qiao Liang, Raziel Alvarez, Ian McGraw

To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared and…

Computation and Language · Computer Science 2021-04-21 Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney

Most state-of-the-art speech systems are using Deep Neural Networks (DNNs). Those systems require a large amount of data to be learned. Hence, learning state-of-the-art frameworks on under-resourced speech languages/problems is a difficult…

Audio and Speech Processing · Electrical Eng. & Systems 2020-03-10 Vincent Roger, Jérôme Farinas, Julien Pinquier

It is today acknowledged that neural network language models outperform backoff language models in applications like speech recognition or statistical machine translation. However, training these models on large amounts of data can take…

Neural and Evolutionary Computing · Computer Science 2015-07-08 Aram Ter-Sarkisov, Holger Schwenk, Loic Barrault, Fethi Bougares

Cross-lingual model transfer is a compelling and popular method for predicting annotations in a low-resource language, whereby parallel corpora provide a bridge to a high-resource language and its associated annotated corpora. However,…

Computation and Language · Computer Science 2017-05-02 Meng Fang, Trevor Cohn

Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. In settings where unlabelled speech is the only available resource, such embeddings can be used in "zero-resource" speech search, indexing…

Computation and Language · Computer Science 2020-02-24 Herman Kamper, Yevgen Matusevych, Sharon Goldwater

While speech recognition has seen a surge in interest and research over the last decade, most machine learning models for speech recognition either require large training datasets or lots of storage and memory. Combined with the prominence…

Computation and Language · Computer Science 2021-03-26 Yonatan Alon