Related papers: Raw Waveform-based Audio Classification Using Samp…

Understanding Audio Pattern Using Convolutional Neural Network From Raw Waveforms

One key step in audio signal processing is to transform the raw signal into representations that are efficient for encoding the original information. Traditionally, people transform the audio into spectral representations, as a function of…

Sound · Computer Science 2016-11-30 Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das

Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms

Recently, the end-to-end approach that learns hierarchical representations from raw data using deep convolutional neural networks has been successfully explored in the image, text and speech domains. This approach was applied to musical…

Sound · Computer Science 2017-05-23 Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim, Juhan Nam

Multi-Level and Multi-Scale Feature Aggregation Using Sample-level Deep Convolutional Neural Networks for Music Classification

Music tag words that describe music audio by text have different levels of abstraction. Taking this issue into account, we propose a music classification approach that aggregates multi-level and multi-scale features using pre-trained…

Sound · Computer Science 2017-06-22 Jongpil Lee, Juhan Nam

Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features

Motivated by the fact that characteristics of different sound classes are highly diverse in different temporal scales and hierarchical levels, a novel deep convolutional neural network (CNN) architecture is proposed for the environmental…

Sound · Computer Science 2018-06-15 Boqing Zhu, Kele Xu, Dezhi Wang, Lilun Zhang, Bo Li, Yuxing Peng

Learning Environmental Sounds with Multi-scale Convolutional Neural Network

Deep learning has dramatically improved the performance of sound recognition. However, learning acoustic models directly from the raw waveform is still challenging. Current waveform-based models generally use time-domain convolutional…

Sound · Computer Science 2018-03-29 Boqing Zhu, Changjian Wang, Feng Liu, Jin Lei, Zengquan Lu, Yuxing Peng

Speech and Speaker Recognition from Raw Waveform with SincNet

Deep neural networks can learn complex and abstract representations, that are progressively obtained by combining simpler ones. A recent trend in speech and speaker recognition consists in discovering these representations starting from raw…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-26 Mirco Ravanelli, Yoshua Bengio

Audio Transformers

Over the past two decades, CNN architectures have produced compelling models of sound perception and cognition, learning hierarchical organizations of features. Analogous to successes in computer vision, audio feature classification can be…

Sound · Computer Science 2025-05-13 Prateek Verma, Jonathan Berger

An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy

Audio classification can distinguish different kinds of sounds, which is helpful for intelligent applications in daily life. However, it remains a challenging task since the sound events in an audio clip are probably multiple, even…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-22 Jiaxu Chen, Jing Hao, Kai Chen, Di Xie, Shicai Yang, Shiliang Pu

Very Deep Convolutional Neural Networks for Raw Waveforms

Learning acoustic models directly from the raw waveform data with minimal processing is challenging. Current waveform-based models have generally used very few (~2) convolutional layers, which might be insufficient for building high-level…

Sound · Computer Science 2016-10-04 Wei Dai, Chia Dai, Shuhui Qu, Juncheng Li, Samarjit Das

Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms

Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms. However, the emergence of deep neural networks as efficient feature extractors…

Sound · Computer Science 2021-02-16 Kleanthis Avramidis, Agelos Kratimenos, Christos Garoufis, Athanasia Zlatintsi, Petros Maragos

Raw Audio Classification with Cosine Convolutional Neural Network (CosCovNN)

This study explores the field of audio classification from raw waveform using Convolutional Neural Networks (CNNs), a method that eliminates the need for extracting specialised features in the pre-processing step. Unlike recent trends in…

Sound · Computer Science 2024-12-03 Kazi Nazmul Haque, Rajib Rana, Tasnim Jarin, Bjorn W. Schuller

Speaker Recognition from Raw Waveform with SincNet

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have been recently obtained with Convolutional Neural Networks (CNNs) when fed by raw speech samples directly.…

Audio and Speech Processing · Electrical Eng. & Systems 2019-08-12 Mirco Ravanelli, Yoshua Bengio

Deep Learning for Audio Signal Processing

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered…

Sound · Computer Science 2019-05-28 Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-yiin Chang, Tara Sainath

Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method

Audio source separation is often used as preprocessing of various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with the varieties of audio signals. Since sampling frequency, one of…

Sound · Computer Science 2021-05-11 Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Yuma Koizumi, Hiroshi Saruwatari

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones;…

Sound · Computer Science 2016-09-20 Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu

A Generative Model for Raw Audio Using Transformer Architectures

This paper proposes a novel way of doing audio synthesis at the waveform level using Transformer architectures. We propose a deep neural network for generating waveforms, similar to wavenet. This is fully probabilistic, auto-regressive, and…

Sound · Computer Science 2021-07-09 Prateek Verma, Chris Chafe

RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification

Recently, direct modeling of raw waveforms using deep neural networks has been widely studied for a number of tasks in audio domains. In speaker verification, however, utilization of raw waveforms is in its preliminary phase, requiring…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-18 Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu

End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network

In this paper, we present an end-to-end approach for environmental sound classification based on a 1D Convolution Neural Network (CNN) that learns a representation directly from the audio signal. Several convolutional layers are used to…

Sound · Computer Science 2019-04-22 Sajjad Abdoli, Patrick Cardinal, Alessandro Lameiras Koerich

Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms

Recent work has shown that the end-to-end approach using convolutional neural network (CNN) is effective in various types of machine learning tasks. For audio signals, the approach takes raw waveforms as input using a 1-D convolution…

Sound · Computer Science 2018-02-15 Taejun Kim, Jongpil Lee, Juhan Nam

Acoustic Model Adaptation from Raw Waveforms with SincNet

Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features. SincNet has been…

Audio and Speech Processing · Electrical Eng. & Systems 2019-10-01 Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals