
Related papers: Audio Spectrogram Representations for Processing w…

200 papers

Spectrogram-based representations have grown to dominate the feature space for deep learning audio analysis systems, and are often adopted for speech analysis also. Initially, the primary motivator for spectrogram-based representations was…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-17 Ian McLoughlin, Lam Pham, Yan Song, Xiaoxiao Miao, Huy Phan, Pengfei Cai, Qing Gu, Jiang Nan, Haoyu Song, Donny Soh

Style transfer is a technique for combining two images based on the activations and feature statistics in a deep learning neural network architecture. This paper studies the analogous task in the audio domain and takes a critical look at…

Sound · Computer Science 2020-08-10 M. Huzaifah, L. Wyse

There has been fascinating work on creating artistic transformations of images by Gatys. This was revolutionary in how we can in some sense alter the 'style' of an image while generally preserving its 'content'. In our work, we present a…

Sound · Computer Science 2024-12-24 Prateek Verma, Julius O. Smith

Modern day audio signal classification techniques lack the ability to classify low feature audio signals in the form of spectrographic temporal frequency data representations. Additionally, currently utilized techniques rely on full diverse…

Sound · Computer Science 2024-10-30 Noel Elias

Convolutional neural networks (CNNs) are widely used in computer vision. They can be used not only for conventional digital image material to recognize patterns, but also for feature extraction from digital imagery representing spectral and…

Sound · Computer Science 2025-09-16 Friedrich Wolf-Monheim

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered…

Sound · Computer Science 2019-05-28 Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-yiin Chang, Tara Sainath

We propose the Neuralogram -- a deep neural network based representation for understanding audio signals which, as the name suggests, transforms an audio signal to a dense, compact representation based upon embeddings learned via a neural…

Sound · Computer Science 2019-04-11 Prateek Verma, Chris Chafe, Jonathan Berger

The rise of deep learning algorithms has led many researchers to withdraw from using classic signal processing methods for sound generation. Deep learning models have achieved expressive voice synthesis, realistic sound textures, and…

Sound · Computer Science 2022-01-10 Anastasia Natsiou, Sean O'Leary

This study investigates discriminative patterns learned by neural networks for accurate speech classification, with a specific focus on vowel classification tasks. By examining the activations and features of neural networks for vowel…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-25 Jesin James, Balamurali B. T., Binu Abeysinghe, Junchen Liu

Recently, convolutional neural networks (CNN) have achieved the state-of-the-art performance in acoustic scene classification (ASC) task. The audio data is often transformed into two-dimensional spectrogram representations, which are then…

Sound · Computer Science 2020-07-09 Helin Wang, Yuexian Zou, Dading Chong

Implicit Neural Representations (INRs) are nowadays used to represent multimedia signals across various real-life applications, including image super-resolution, image compression, or 3D rendering. Existing methods that leverage INRs are…

Machine Learning · Computer Science 2023-06-21 Filip Szatkowski, Karol J. Piczak, Przemysław Spurek, Jacek Tabor, Tomasz Trzciński

Recent successful applications of convolutional neural networks (CNNs) to audio classification and speech recognition have motivated the search for better input representations for more efficient training. Visual displays of an audio…

Computer Vision and Pattern Recognition · Computer Science 2017-06-23 M. Huzaifah

When convolutional neural networks are used to tackle learning problems based on music or, more generally, time series data, raw one-dimensional data are commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients, which…

Machine Learning · Computer Science 2018-09-20 Monika Doerfler, Thomas Grill, Roswitha Bammer, Arthur Flexer

Convolutional neural network (CNN) architectures have originated and revolutionized machine learning for images. In order to take advantage of CNNs in predictive modeling with audio data, standard FFT-based signal processing methods are…

Sound · Computer Science 2025-02-20 Pavol Harar, Roswitha Bammer, Anna Breger, Monika Dörfler, Zdenek Smekal

Path loss prediction is a beneficial tool for efficient use of the radio frequency spectrum. Building on prior research on high-resolution map-based path loss models, this paper studies convolutional neural network input representations in…

Machine Learning · Computer Science 2026-02-05 Ryan G. Dempsey, Jonathan Ethier, Halim Yanikomeroglu

Acoustic recognition has emerged as a prominent task in deep learning research, frequently utilizing spectral feature extraction techniques such as the spectrogram from the Short-Time Fourier Transform and the scalogram from the Wavelet…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-01 Dang Thoai Phan

One key step in audio signal processing is to transform the raw signal into representations that are efficient for encoding the original information. Traditionally, people transform the audio into spectral representations, as a function of…

Sound · Computer Science 2016-11-30 Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das

Convolutional Neural Networks are widely used in various machine learning domains. In image processing, the features can be obtained by applying 2D convolution to all spatial dimensions of the input. However, in the audio case, frequency…

Sound · Computer Science 2021-03-26 Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack Yun, Kyuwoong Hwang

Pattern recognition from audio signals is an active research topic encompassing audio tagging, acoustic scene classification, music classification, and other areas. Spectrogram and mel-frequency cepstral coefficients (MFCC) are among the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-18 Md. Istiaq Ansari, Taufiq Hasan

We investigate applying audio manipulations using pretrained neural network-based autoencoders as an alternative to traditional signal processing methods, since the former may provide greater semantic or perceptual organization. To…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-11 Scott H. Hawley, Christian J. Steinmetz