Related papers: Fully Convolutional Speech Recognition

Raw Waveform-based Speech Enhancement by Fully Convolutional Networks

This study proposes a fully convolutional network (FCN) model for raw waveform-based speech enhancement. The proposed system performs speech enhancement in an end-to-end (i.e., waveform-in and waveform-out) manner, which differs from most…

Machine Learning · Statistics 2017-06-16 Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai

Attention Based Fully Convolutional Network for Speech Emotion Recognition

Speech emotion recognition is a challenging task for three main reasons: 1) human emotion is abstract, which means it is hard to distinguish; 2) in general, human emotion can only be detected in some specific moments during a long…

Sound · Computer Science 2019-05-03 Yuanyuan Zhang, Jun Du, Zirui Wang, Jianshu Zhang

Very Deep Convolutional Networks for End-to-End Speech Recognition

Sequence-to-sequence models have shown success in end-to-end speech recognition. However, these models have only used shallow acoustic encoder networks. In our work, we successively train very deep convolutional networks to add more…

Computation and Language · Computer Science 2016-10-11 Yu Zhang, William Chan, Navdeep Jaitly

Learning Waveform-Based Acoustic Models using Deep Variational Convolutional Neural Networks

We investigate the potential of stochastic neural networks for learning effective waveform-based acoustic models. The waveform-based setting, inherent to fully end-to-end speech recognition systems, is motivated by several comparative…

Machine Learning · Statistics 2021-08-17 Dino Oglic, Zoran Cvetkovic, Peter Sollich

End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks

Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers, fed with highly tuned features such as MFCC or PLP features. Recent advances in "deep learning" approaches questioned such systems, but…

Machine Learning · Computer Science 2013-12-10 Dimitri Palaz, Ronan Collobert, Mathew Magimai-Doss

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

This paper presents a simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and a graph decoding. It is trained to output letters, with transcribed speech, without the need for force…

Machine Learning · Computer Science 2016-09-14 Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve

Speech Denoising with Deep Feature Losses

We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly. Given input audio containing speech corrupted by an additive background signal, the system aims to produce a processed…

Audio and Speech Processing · Electrical Eng. & Systems 2018-09-18 Francois G. Germain, Qifeng Chen, Vladlen Koltun

Understanding Audio Pattern Using Convolutional Neural Network From Raw Waveforms

One key step in audio signal processing is to transform the raw signal into representations that are efficient for encoding the original information. Traditionally, people transform the audio into spectral representations, as a function of…

Sound · Computer Science 2016-11-30 Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das

Speech and Speaker Recognition from Raw Waveform with SincNet

Deep neural networks can learn complex and abstract representations that are progressively obtained by combining simpler ones. A recent trend in speech and speaker recognition consists in discovering these representations starting from raw…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-26 Mirco Ravanelli, Yoshua Bengio

QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions

We propose a new end-to-end neural acoustic model for automatic speech recognition. The model is composed of multiple blocks with residual connections between them. Each block consists of one or more modules with 1D time-channel separable…

Audio and Speech Processing · Electrical Eng. & Systems 2019-10-24 Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang

Densely Connected Convolutional Networks for Speech Recognition

This paper presents our latest investigation on Densely Connected Convolutional Networks (DenseNets) for acoustic modelling (AM) in automatic speech recognition. DenseNets are very deep, compact convolutional neural networks, which have…

Computation and Language · Computer Science 2018-08-13 Chia Yu Li, Ngoc Thang Vu

Convolutional-Recurrent Neural Networks for Speech Enhancement

We propose an end-to-end model based on convolutional and recurrent neural networks for speech enhancement. Our model is purely data-driven and does not make any assumptions about the type or the stationarity of the noise. In contrast to…

Sound · Computer Science 2018-05-03 Han Zhao, Shuayb Zarar, Ivan Tashev, Chin-Hui Lee

RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification

Recently, direct modeling of raw waveforms using deep neural networks has been widely studied for a number of tasks in audio domains. In speaker verification, however, utilization of raw waveforms is in its preliminary phase, requiring…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-18 Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu

Speaker Recognition from Raw Waveform with SincNet

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have been recently obtained with Convolutional Neural Networks (CNNs) when fed by raw speech samples directly.…

Audio and Speech Processing · Electrical Eng. & Systems 2019-08-12 Mirco Ravanelli, Yoshua Bengio

Very Deep Convolutional Neural Networks for Raw Waveforms

Learning acoustic models directly from the raw waveform data with minimal processing is challenging. Current waveform-based models have generally used very few (~2) convolutional layers, which might be insufficient for building high-level…

Sound · Computer Science 2016-10-04 Wei Dai, Chia Dai, Shuhui Qu, Juncheng Li, Samarjit Das

Learning Environmental Sounds with Multi-scale Convolutional Neural Network

Deep learning has dramatically improved the performance of sound recognition. However, learning acoustic models directly from the raw waveform is still challenging. Current waveform-based models generally use time-domain convolutional…

Sound · Computer Science 2018-03-29 Boqing Zhu, Changjian Wang, Feng Liu, Jin Lei, Zengquan Lu, Yuxing Peng

Learning linearly separable features for speech recognition using convolutional neural networks

Automatic speech recognition systems usually rely on spectral-based features, such as MFCC or PLP. These features are extracted based on prior knowledge such as speech perception and/or speech production. Recently, convolutional neural…

Machine Learning · Computer Science 2015-04-17 Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert

Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

Speech recognition in noisy and channel distorted scenarios is often challenging as the current acoustic modeling schemes are not adaptive to the changes in the signal distribution in the presence of noise. In this work, we develop a novel…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-03 Purvi Agrawal, Sriram Ganapathy

Effects of Number of Filters of Convolutional Layers on Speech Recognition Model Accuracy

Inspired by the progress of the end-to-end approach [1], this paper systematically studies the effects of the number of filters in convolutional layers on the model prediction accuracy of CNN+RNN (Convolutional Neural Networks adding to…

Machine Learning · Computer Science 2021-02-05 James Mou, Jun Li

An Empirical Analysis of Deep Audio-Visual Models for Speech Recognition

In this project, we worked on speech recognition, specifically predicting individual words based on both the video frames and audio. Empowered by convolutional neural networks, the recent speech recognition and lip reading models are…

Computer Vision and Pattern Recognition · Computer Science 2018-12-27 Devesh Walawalkar, Yihui He, Rohit Pillai