Related papers: WEnets: A Convolutional Framework for Evaluating A…

Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities

Wideband Audio Waveform Evaluation Networks (WAWEnets) are convolutional neural networks that operate directly on wideband audio waveforms in order to produce evaluations of those waveforms. In the present work these evaluations give…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-21 Andrew Catellier, Stephen Voran

Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders

Unsupervised representation learning of speech has been of keen interest in recent years, which is for example evident in the wide interest of the ZeroSpeech challenges. This work presents a new method for learning frame level…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-18 Mingjie Chen, Thomas Hain

Learning Environmental Sounds with Multi-scale Convolutional Neural Network

Deep learning has dramatically improved the performance of sound recognition. However, learning acoustic models directly from the raw waveform is still challenging. Current waveform-based models generally use time-domain convolutional…

Sound · Computer Science 2018-03-29 Boqing Zhu, Changjian Wang, Feng Liu, Jin Lei, Zengquan Lu, Yuxing Peng

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones;…

Sound · Computer Science 2016-09-20 Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu

WSNet: Compact and Efficient Networks Through Weight Sampling

We present a new approach and a novel architecture, termed WSNet, for learning compact and efficient deep neural networks. Existing approaches conventionally learn full model parameters independently and then compress them via ad hoc…

Computer Vision and Pattern Recognition · Computer Science 2018-05-23 Xiaojie Jin, Yingzhen Yang, Ning Xu, Jianchao Yang, Nebojsa Jojic, Jiashi Feng, Shuicheng Yan

Perceptual audio loss function for deep learning

PESQ and POLQA are standards for automated assessment of voice quality of speech as experienced by human beings. The predictions of those objective measures should come as close as possible to subjective quality scores as…

Sound · Computer Science 2017-08-22 Dan Elbaz, Michael Zibulevsky

RawNet: Fast End-to-End Neural Vocoder

Neural network-based vocoders have recently demonstrated the powerful ability to synthesize high-quality speech. These models usually generate samples by conditioning on spectral features, such as Mel-spectrogram and fundamental frequency,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-13 Yunchao He, Yujun Wang

Unsupervised speech representation learning using WaveNet autoencoders

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content…

Machine Learning · Computer Science 2019-09-12 Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification

We propose SpeakerNet - a new neural architecture for speaker recognition and speaker verification tasks. It is composed of residual blocks with 1D depth-wise separable convolutions, batch-normalization, and ReLU layers. This architecture…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-27 Nithin Rao Koluguri, Jason Li, Vitaly Lavrukhin, Boris Ginsburg

A Wavenet for Speech Denoising

Currently, most speech processing techniques use magnitude spectrograms as front-end and are therefore by default discarding part of the signal: the phase. In order to overcome this limitation, we propose an end-to-end learning method for…

Sound · Computer Science 2018-02-01 Dario Rethage, Jordi Pons, Xavier Serra

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

Convolutional neural networks (CNN) have shown promising results for end-to-end speech recognition, albeit still behind other state-of-the-art methods in performance. In this paper, we study how to bridge this gap and go beyond with a novel…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

Optimizing Basis Function Selection in Constructive Wavelet Neural Networks and Its Applications

Wavelet neural network (WNN), which learns an unknown nonlinear mapping from the data, has been widely used in signal processing and time-series analysis. However, challenges in constructing accurate wavelet bases and high computational…

Machine Learning · Computer Science 2025-07-15 Dunsheng Huang, Dong Shen, Lei Lu, Ying Tan

Speaker Recognition from Raw Waveform with SincNet

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have been recently obtained with Convolutional Neural Networks (CNNs) when fed by raw speech samples directly.…

Audio and Speech Processing · Electrical Eng. & Systems 2019-08-12 Mirco Ravanelli, Yoshua Bengio

Toward end-to-end interpretable convolutional neural networks for waveform signals

This paper introduces a novel convolutional neural network (CNN) framework tailored for end-to-end audio deep learning models, presenting advancements in efficiency and explainability. By benchmarking experiments on three standard speech…

Sound · Computer Science 2024-05-06 Linh Vu, Thu Tran, Wern-Han Lim, Raphael Phan

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

This paper presents a refinement framework of WaveNet vocoders for variational autoencoder (VAE) based voice conversion (VC), which reduces the quality distortion caused by the mismatch between the training data and testing data.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-09 Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Do WaveNets Dream of Acoustic Waves?

Various sources have reported the WaveNet deep learning architecture being able to generate high-quality speech, but to our knowledge there haven't been studies on the interpretation or visualization of trained WaveNets. This study…

Sound · Computer Science 2018-02-26 Kanru Hua

WaveNet's Precision in EEG Classification

This study introduces a WaveNet-based deep learning model designed to automate the classification of intracranial electroencephalography (iEEG) signals into physiological activity, pathological (epileptic) activity, power-line noise, and…

Machine Learning · Computer Science 2026-01-14 Casper van Laar, Khubaib Ahmed

WaDeNet: Wavelet Decomposition based CNN for Speech Processing

Existing speech processing systems consist of different modules, individually optimized for a specific task such as acoustic modelling or feature extraction. In addition to not assuring optimality of the system, the disjoint nature of…

Sound · Computer Science 2020-11-12 Prithvi Suresh, Abhijith Ragav

WaveletNet: Logarithmic Scale Efficient Convolutional Neural Networks for Edge Devices

We present a logarithmic-scale efficient convolutional neural network architecture for edge devices, named WaveletNet. Our model is based on the well-known depthwise convolution, and on two new layers, which we introduce in this work: a…

Machine Learning · Computer Science 2018-11-29 Li Jing, Rumen Dangovski, Marin Soljacic

A Multi-Head Relevance Weighting Framework For Learning Raw Waveform Audio Representations

In this work, we propose a multi-head relevance weighting framework to learn audio representations from raw waveforms. The audio waveform, split into windows of short duration, is processed with a 1-D convolutional layer of cosine…

Audio and Speech Processing · Electrical Eng. & Systems 2021-08-02 Debottam Dutta, Purvi Agrawal, Sriram Ganapathy