Related papers: Audio Spectrogram Representations for Processing w…

Spectrogram features for audio and speech analysis

Spectrogram-based representations have grown to dominate the feature space for deep learning audio analysis systems, and are often adopted for speech analysis also. Initially, the primary motivator for spectrogram-based representations was…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-17 Ian McLoughlin, Lam Pham, Yan Song, Xiaoxiao Miao, Huy Phan, Pengfei Cai, Qing Gu, Jiang Nan, Haoyu Song, Donny Soh

Applying Visual Domain Style Transfer and Texture Synthesis Techniques to Audio - Insights and Challenges

Style transfer is a technique for combining two images based on the activations and feature statistics in a deep learning neural network architecture. This paper studies the analogous task in the audio domain and takes a critical look at…

Sound · Computer Science 2020-08-10 M. Huzaifah, L. Wyse

Neural Style Transfer for Audio Spectograms

There has been fascinating work on creating artistic transformations of images by Gatys. This was revolutionary in how we can in some sense alter the 'style' of an image while generally preserving its 'content'. In our work, we present a…

Sound · Computer Science 2024-12-24 Prateek Verma, Julius O. Smith

Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks

Modern day audio signal classification techniques lack the ability to classify low feature audio signals in the form of spectrographic temporal frequency data representations. Additionally, currently utilized techniques rely on full diverse…

Sound · Computer Science 2024-10-30 Noel Elias

Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks

Convolutional neural networks (CNNs) are widely used in computer vision. They can be used not only for conventional digital image material to recognize patterns, but also for feature extraction from digital imagery representing spectral and…

Sound · Computer Science 2025-09-16 Friedrich Wolf-Monheim

Deep Learning for Audio Signal Processing

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered…

Sound · Computer Science 2019-05-28 Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-yiin Chang, Tara Sainath

Neuralogram: A Deep Neural Network Based Representation for Audio Signals

We propose the Neuralogram -- a deep neural network based representation for understanding audio signals which, as the name suggests, transforms an audio signal to a dense, compact representation based upon embeddings learned via a neural…

Sound · Computer Science 2019-04-11 Prateek Verma, Chris Chafe, Jonathan Berger

Audio representations for deep learning in sound synthesis: A review

The rise of deep learning algorithms has led many researchers to withdraw from using classic signal processing methods for sound generation. Deep learning models have achieved expressive voice synthesis, realistic sound textures, and…

Sound · Computer Science 2022-01-10 Anastasia Natsiou, Sean O'Leary

Explaining Spectrograms in Machine Learning: A Study on Neural Networks for Speech Classification

This study investigates discriminative patterns learned by neural networks for accurate speech classification, with a specific focus on vowel classification tasks. By examining the activations and features of neural networks for vowel…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-25 Jesin James, Balamurali B. T., Binu Abeysinghe, Junchen Liu

Acoustic Scene Classification with Spectrogram Processing Strategies

Recently, convolutional neural networks (CNN) have achieved the state-of-the-art performance in acoustic scene classification (ASC) task. The audio data is often transformed into two-dimensional spectrogram representations, which are then…

Sound · Computer Science 2020-07-09 Helin Wang, Yuexian Zou, Dading Chong

Hypernetworks build Implicit Neural Representations of Sounds

Implicit Neural Representations (INRs) are nowadays used to represent multimedia signals across various real-life applications, including image super-resolution, image compression, or 3D rendering. Existing methods that leverage INRs are…

Machine Learning · Computer Science 2023-06-21 Filip Szatkowski, Karol J. Piczak, Przemysław Spurek, Jacek Tabor, Tomasz Trzciński

Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks

Recent successful applications of convolutional neural networks (CNNs) to audio classification and speech recognition have motivated the search for better input representations for more efficient training. Visual displays of an audio…

Computer Vision and Pattern Recognition · Computer Science 2017-06-23 M. Huzaifah

Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?

When convolutional neural networks are used to tackle learning problems based on music or, more generally, time series data, raw one-dimensional data are commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients, which…

Machine Learning · Computer Science 2018-09-20 Monika Doerfler, Thomas Grill, Roswitha Bammer, Arthur Flexer

Improving Machine Hearing on Limited Data Sets

Convolutional neural network (CNN) architectures have originated and revolutionized machine learning for images. In order to take advantage of CNNs in predictive modeling with audio data, standard FFT-based signal processing methods are…

Sound · Computer Science 2025-02-20 Pavol Harar, Roswitha Bammer, Anna Breger, Monika Dörfler, Zdenek Smekal

Investigating Map-Based Path Loss Models: A Study of Feature Representations in Convolutional Neural Networks

Path loss prediction is a beneficial tool for efficient use of the radio frequency spectrum. Building on prior research on high-resolution map-based path loss models, this paper studies convolutional neural network input representations in…

Machine Learning · Computer Science 2026-02-05 Ryan G. Dempsey, Jonathan Ethier, Halim Yanikomeroglu

Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

Acoustic recognition has emerged as a prominent task in deep learning research, frequently utilizing spectral feature extraction techniques such as the spectrogram from the Short-Time Fourier Transform and the scalogram from the Wavelet…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-01 Dang Thoai Phan

Understanding Audio Pattern Using Convolutional Neural Network From Raw Waveforms

One key step in audio signal processing is to transform the raw signal into representations that are efficient for encoding the original information. Traditionally, people transform the audio into spectral representations, as a function of…

Sound · Computer Science 2016-11-30 Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das

SubSpectral Normalization for Neural Audio Data Processing

Convolutional Neural Networks are widely used in various machine learning domains. In image processing, the features can be obtained by applying 2D convolution to all spatial dimensions of the input. However, in the audio case, frequency…

Sound · Computer Science 2021-03-26 Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack Yun, Kyuwoong Hwang

SpectNet : End-to-End Audio Signal Classification Using Learnable Spectrograms

Pattern recognition from audio signals is an active research topic encompassing audio tagging, acoustic scene classification, music classification, and other areas. Spectrogram and mel-frequency cepstral coefficients (MFCC) are among the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-18 Md. Istiaq Ansari, Taufiq Hasan

Leveraging Neural Representations for Audio Manipulation

We investigate applying audio manipulations using pretrained neural network-based autoencoders as an alternative to traditional signal processing methods, since the former may provide greater semantic or perceptual organization. To…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-11 Scott H. Hawley, Christian J. Steinmetz