Related papers: Rethinking CNN Models for Audio Classification

FrequentNet: A Novel Interpretable Deep Learning Model for Image Classification

This paper has proposed a new baseline deep learning model of more benefits for image classification. Different from the convolutional neural network (CNN) practice where filters are trained by back propagation to represent different…

Computer Vision and Pattern Recognition · Computer Science 2021-08-13 Yifei Li, Kuangyan Song, Yiming Sun, Liao Zhu

CNN Architectures for Large-Scale Audio Classification

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with…

Sound · Computer Science 2017-01-11 Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson

Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data

This study assesses deep learning models for audio classification in a clinical setting with the constraint of small datasets reflecting real-world prospective data collection. We analyze CNNs, including DenseNet and ConvNeXt, alongside…

Sound · Computer Science 2024-04-09 Hamza Mahdi, Eptehal Nashnoush, Rami Saab, Arjun Balachandar, Rishit Dagli, Lucas X. Perri, Houman Khosravani

SpectNet : End-to-End Audio Signal Classification Using Learnable Spectrograms

Pattern recognition from audio signals is an active research topic encompassing audio tagging, acoustic scene classification, music classification, and other areas. Spectrogram and mel-frequency cepstral coefficients (MFCC) are among the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-18 Md. Istiaq Ansari, Taufiq Hasan

Utilizing Domain Knowledge in End-to-End Audio Processing

End-to-end neural network based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that shows the feasibility of training the first layers…

Sound · Computer Science 2017-12-04 Tycho Max Sylvester Tax, Jose Luis Diez Antich, Hendrik Purwins, Lars Maaløe

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

The ability of deep convolutional neural networks (CNN) to learn discriminative spectro-temporal patterns makes them well suited to environmental sound classification. However, the relative scarcity of labeled data has impeded the…

Sound · Computer Science 2017-04-05 Justin Salamon, Juan Pablo Bello

Cross-Domain Knowledge Transfer for Underwater Acoustic Classification Using Pre-trained Models

Transfer learning is commonly employed to leverage large, pre-trained models and perform fine-tuning for downstream tasks. The most prevalent pre-trained models are initially trained using ImageNet. However, their ability to generalize can…

Sound · Computer Science 2025-03-19 Amirmohammad Mohammadi, Tejashri Kelhe, Davelle Carreiro, Alexandra Van Dine, Joshua Peeples

Randomly weighted CNNs for (music) audio classification

The computer vision literature shows that randomly weighted neural networks perform reasonably as feature extractors. Following this idea, we study how non-trained (randomly weighted) convolutional neural networks perform as feature…

Sound · Computer Science 2019-02-18 Jordi Pons, Xavier Serra

Fidelity Estimation Improves Noisy-Image Classification With Pretrained Networks

Image classification has significantly improved using deep learning. This is mainly due to convolutional neural networks (CNNs) that are capable of learning rich feature extractors from large datasets. However, most deep learning…

Computer Vision and Pattern Recognition · Computer Science 2021-10-06 Xiaoyu Lin, Deblina Bhattacharjee, Majed El Helou, Sabine Süsstrunk

Analyzing deep CNN-based utterance embeddings for acoustic model adaptation

We explore why deep convolutional neural networks (CNNs) with small two-dimensional kernels, primarily used for modeling spatial relations in images, are also effective in speech recognition. We analyze the representations learned by deep…

Computation and Language · Computer Science 2018-11-13 Joanna Rownicka, Peter Bell, Steve Renals

Learning to Learn Parameterized Classification Networks for Scalable Input Images

Convolutional Neural Networks (CNNs) do not have a predictable recognition behavior with respect to the input resolution change. This prevents the feasibility of deployment on different input image resolutions for a specific model. To…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Duo Li, Anbang Yao, Qifeng Chen

An Effective Label Noise Model for DNN Text Classification

Because large, human-annotated datasets suffer from labeling errors, it is crucial to be able to train deep neural networks in the presence of label noise. While training image classification models with label noise have received much…

Machine Learning · Computer Science 2019-03-19 Ishan Jindal, Daniel Pressel, Brian Lester, Matthew Nokleby

LEAN: Light and Efficient Audio Classification Network

Over the past few years, audio classification task on large-scale dataset such as AudioSet has been an important research area. Several deeper Convolution-based Neural networks have shown compelling performance notably Vggish, YAMNet, and…

Sound · Computer Science 2023-05-23 Shwetank Choudhary, CR Karthik, Punuru Sri Lakshmi, Sumit Kumar

Densely Connected Convolutional Networks for Speech Recognition

This paper presents our latest investigation on Densely Connected Convolutional Networks (DenseNets) for acoustic modelling (AM) in automatic speech recognition. DenseNets are very deep, compact convolutional neural networks, which have…

Computation and Language · Computer Science 2018-08-13 Chia Yu Li, Ngoc Thang Vu

Adapting a ConvNeXt model to audio classification on AudioSet

In computer vision, convolutional neural networks (CNN) such as ConvNeXt, have been able to surpass state-of-the-art transformers, partly thanks to depthwise separable convolutions (DSC). DSC, as an approximation of the regular convolution,…

Sound · Computer Science 2023-06-02 Thomas Pellegrini, Ismail Khalfaoui-Hassani, Etienne Labbé, Timothée Masquelier

Unsupervised Discriminative Learning of Sounds for Audio Event Classification

Recent progress in network-based audio event classification has shown the benefit of pre-training models on visual data such as ImageNet. While this process allows knowledge transfer across different domains, training a model on large-scale…

Sound · Computer Science 2021-05-21 Sascha Hornauer, Ke Li, Stella X. Yu, Shabnam Ghaffarzadegan, Liu Ren

Data-Efficient Classification of Birdcall Through Convolutional Neural Networks Transfer Learning

Deep learning Convolutional Neural Network (CNN) models are powerful classification models but require a large amount of training data. In niche domains such as bird acoustics, it is expensive and difficult to obtain a large number of…

Computer Vision and Pattern Recognition · Computer Science 2019-09-18 Dina B. Efremova, Mangalam Sankupellay, Dmitry A. Konovalov

Trainingless Adaptation of Pretrained Models for Environmental Sound Classification

Deep neural network (DNN)-based models for environmental sound classification are not robust against a domain to which training data do not belong, that is, out-of-distribution or unseen data. To utilize pretrained models for the unseen…

Sound · Computer Science 2024-12-24 Noriyuki Tonami, Wataru Kohno, Keisuke Imoto, Yoshiyuki Yajima, Sakiko Mishima, Reishi Kondo, Tomoyuki Hino

Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network

Acoustic scene classification is an intricate problem for a machine. As an emerging field of research, deep Convolutional Neural Networks (CNN) achieve convincing results. In this paper, we explore the use of multi-scale Dense connected…

Computer Vision and Pattern Recognition · Computer Science 2018-06-13 Dawei Feng, Kele Xu, Haibo Mi, Feifan Liao, Yan Zhou

Phase-Aware Deep Learning with Complex-Valued CNNs for Audio Signal Applications

This study explores the design and application of Complex-Valued Convolutional Neural Networks (CVCNNs) in audio signal processing, with a focus on preserving and utilizing phase information often neglected in real-valued networks. We begin…

Machine Learning · Computer Science 2025-10-14 Naman Agrawal