Related papers: Understanding Audio Pattern Using Convolutional Ne…

Raw Waveform-based Audio Classification Using Sample-level CNN Architectures

Music, speech, and acoustic scene sound are often handled separately in the audio domain because of their different signal characteristics. However, as the image domain grows rapidly by versatile image classification models, it is necessary…

Sound · Computer Science 2017-12-05 Jongpil Lee, Taejun Kim, Jiyoung Park, Juhan Nam

Speech and Speaker Recognition from Raw Waveform with SincNet

Deep neural networks can learn complex and abstract representations that are progressively obtained by combining simpler ones. A recent trend in speech and speaker recognition consists in discovering these representations starting from raw…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-26 Mirco Ravanelli, Yoshua Bengio

Very Deep Convolutional Neural Networks for Raw Waveforms

Learning acoustic models directly from the raw waveform data with minimal processing is challenging. Current waveform-based models have generally used very few (~2) convolutional layers, which might be insufficient for building high-level…

Sound · Computer Science 2016-10-04 Wei Dai, Chia Dai, Shuhui Qu, Juncheng Li, Samarjit Das

Speaker Recognition from Raw Waveform with SincNet

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have been recently obtained with Convolutional Neural Networks (CNNs) when fed by raw speech samples directly.…

Audio and Speech Processing · Electrical Eng. & Systems 2019-08-12 Mirco Ravanelli, Yoshua Bengio

Raw Audio Classification with Cosine Convolutional Neural Network (CosCovNN)

This study explores the field of audio classification from raw waveform using Convolutional Neural Networks (CNNs), a method that eliminates the need for extracting specialised features in the pre-processing step. Unlike recent trends in…

Sound · Computer Science 2024-12-03 Kazi Nazmul Haque, Rajib Rana, Tasnim Jarin, Bjorn W. Schuller

Raw Waveform-based Speech Enhancement by Fully Convolutional Networks

This study proposes a fully convolutional network (FCN) model for raw waveform-based speech enhancement. The proposed system performs speech enhancement in an end-to-end (i.e., waveform-in and waveform-out) manner, which differs from most…

Machine Learning · Statistics 2017-06-16 Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai

Fully Convolutional Speech Recognition

Current state-of-the-art speech recognition systems build on recurrent neural networks for acoustic and/or language modeling, and rely on feature extraction pipelines to extract mel-filterbanks or cepstral coefficients. In this paper we…

Computation and Language · Computer Science 2019-04-10 Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones;…

Sound · Computer Science 2016-09-20 Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu

Audio Transformers

Over the past two decades, CNN architectures have produced compelling models of sound perception and cognition, learning hierarchical organizations of features. Analogous to successes in computer vision, audio feature classification can be…

Sound · Computer Science 2025-05-13 Prateek Verma, Jonathan Berger

A Generative Model for Raw Audio Using Transformer Architectures

This paper proposes a novel way of doing audio synthesis at the waveform level using Transformer architectures. We propose a deep neural network for generating waveforms, similar to WaveNet. This is fully probabilistic, auto-regressive, and…

Sound · Computer Science 2021-07-09 Prateek Verma, Chris Chafe

Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms

Recently, the end-to-end approach that learns hierarchical representations from raw data using deep convolutional neural networks has been successfully explored in the image, text and speech domains. This approach was applied to musical…

Sound · Computer Science 2017-05-23 Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim, Juhan Nam

Direct Modelling of Speech Emotion from Raw Speech

Speech emotion recognition is a challenging task and heavily depends on hand-engineered acoustic features, which are typically crafted to echo human perception of speech signals. However, a filter bank that is designed from perceptual…

Sound · Computer Science 2020-07-29 Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps

Phase-Aware Deep Learning with Complex-Valued CNNs for Audio Signal Applications

This study explores the design and application of Complex-Valued Convolutional Neural Networks (CVCNNs) in audio signal processing, with a focus on preserving and utilizing phase information often neglected in real-valued networks. We begin…

Machine Learning · Computer Science 2025-10-14 Naman Agrawal

Interpretable Convolutional Filters with SincNet

Deep learning is currently playing a crucial role toward higher levels of artificial intelligence. This paradigm allows neural networks to learn complex and abstract representations that are progressively obtained by combining simpler…

Audio and Speech Processing · Electrical Eng. & Systems 2019-08-12 Mirco Ravanelli, Yoshua Bengio

Explaining Deep Convolutional Neural Networks on Music Classification

Deep convolutional neural networks (CNNs) have been actively adopted in the field of music information retrieval, e.g. genre classification, mood detection, and chord recognition. However, the process of learning and prediction is little…

Machine Learning · Computer Science 2016-07-11 Keunwoo Choi, George Fazekas, Mark Sandler

End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network

In this paper, we present an end-to-end approach for environmental sound classification based on a 1D Convolution Neural Network (CNN) that learns a representation directly from the audio signal. Several convolutional layers are used to…

Sound · Computer Science 2019-04-22 Sajjad Abdoli, Patrick Cardinal, Alessandro Lameiras Koerich

Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms

Recent work has shown that the end-to-end approach using convolutional neural network (CNN) is effective in various types of machine learning tasks. For audio signals, the approach takes raw waveforms as input using a 1-D convolution…

Sound · Computer Science 2018-02-15 Taejun Kim, Jongpil Lee, Juhan Nam

Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models

The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers…

Sound · Computer Science 2023-10-25 Florian Schmid, Khaled Koutini, Gerhard Widmer

CGCNN: Complex Gabor Convolutional Neural Network on raw speech

Convolutional Neural Networks (CNN) have been used in Automatic Speech Recognition (ASR) to learn representations directly from the raw signal instead of hand-crafted acoustic features, providing a richer and lossless input signal. Recent…

Sound · Computer Science 2020-02-12 Paul-Gauthier Noé, Titouan Parcollet, Mohamed Morchid

Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features

Motivated by the fact that characteristics of different sound classes are highly diverse in different temporal scales and hierarchical levels, a novel deep convolutional neural network (CNN) architecture is proposed for the environmental…

Sound · Computer Science 2018-06-15 Boqing Zhu, Kele Xu, Dezhi Wang, Lilun Zhang, Bo Li, Yuxing Peng