Related papers: Spectrogram features for audio and speech analysis

Audio Spectrogram Representations for Processing with Convolutional Neural Networks

One of the decisions that arise when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, a neural network. For audio, the choice is less obvious than…

Sound · Computer Science 2017-06-30 L. Wyse

Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

Acoustic recognition has emerged as a prominent task in deep learning research, frequently utilizing spectral feature extraction techniques such as the spectrogram from the Short-Time Fourier Transform and the scalogram from the Wavelet…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-01 Dang Thoai Phan

A Survey of Deep Learning for Complex Speech Spectrograms

Recent advancements in deep learning have significantly impacted the field of speech signal processing, particularly in the analysis and manipulation of complex spectrograms. This survey provides a comprehensive overview of the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-06 Yuying Xie, Zheng-Hua Tan

Neuralogram: A Deep Neural Network Based Representation for Audio Signals

We propose the Neuralogram -- a deep neural network based representation for understanding audio signals which, as the name suggests, transforms an audio signal to a dense, compact representation based upon embeddings learned via a neural…

Sound · Computer Science 2019-04-11 Prateek Verma, Chris Chafe, Jonathan Berger

SpectNet: End-to-End Audio Signal Classification Using Learnable Spectrograms

Pattern recognition from audio signals is an active research topic encompassing audio tagging, acoustic scene classification, music classification, and other areas. Spectrogram and mel-frequency cepstral coefficients (MFCC) are among the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-18 Md. Istiaq Ansari, Taufiq Hasan

Feature Learning from Spectrograms for Assessment of Personality Traits

Several methods have recently been proposed to analyze speech and automatically infer the personality of the speaker. These methods often rely on prosodic and other hand crafted speech processing features extracted with off-the-shelf…

Computer Vision and Pattern Recognition · Computer Science 2022-05-10 Marc-André Carbonneau, Eric Granger, Yazid Attabi, Ghyslain Gagnon

Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering

Speech separation has been very successful with deep learning techniques. Substantial effort has been reported based on approaches over spectrogram, which is well known as the standard time-and-frequency cross-domain representation for…

Sound · Computer Science 2019-04-17 Gene-Ping Yang, Chao-I Tuan, Hung-Yi Lee, Lin-shan Lee

Log Complex Color for Visual Pattern Recognition of Total Sound

While traditional audio visualization methods depict amplitude intensities vs. time, such as in a time-frequency spectrogram, and while some may use complex phase information to augment the amplitude representation, such as in a reassigned…

Sound · Computer Science 2019-07-24 Stephen Wedekind, P. Fraundorf

Acoustic To Articulatory Speech Inversion Using Multi-Resolution Spectro-Temporal Representations Of Speech Signals

Multi-resolution spectro-temporal features of a speech signal represent how the brain perceives sounds by tuning cortical cells to different spectral and temporal modulations. These features produce a higher dimensional representation of…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-28 Rahil Parikh, Nadee Seneviratne, Ganesh Sivaraman, Shihab Shamma, Carol Espy-Wilson

Deep Learning for Audio Signal Processing

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered…

Sound · Computer Science 2019-05-28 Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-yiin Chang, Tara Sainath

Speech Recognition Front End Without Information Loss

Speech representation and modelling in high-dimensional spaces of acoustic waveforms, or a linear transformation thereof, is investigated with the aim of improving the robustness of automatic speech recognition to additive noise. The…

Computation and Language · Computer Science 2015-03-31 Matthew Ager, Zoran Cvetkovic, Peter Sollich

Recognition Performance of a Structured Language Model

A new language model for speech recognition inspired by linguistic analysis is presented. The model develops hidden hierarchical structure incrementally and uses it to extract meaningful information from the word history - thus enabling the…

Computation and Language · Computer Science 2007-05-23 Ciprian Chelba, Frederick Jelinek

Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks

Modern day audio signal classification techniques lack the ability to classify low feature audio signals in the form of spectrographic temporal frequency data representations. Additionally, currently utilized techniques rely on full diverse…

Sound · Computer Science 2024-10-30 Noel Elias

Time-Frequency Audio Features for Speech-Music Classification

Distinct striation patterns are observed in the spectrograms of speech and music. This motivated us to propose three novel time-frequency features for speech-music classification. These features are extracted in two stages. First, a preset…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-06 Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha

Applying Visual Domain Style Transfer and Texture Synthesis Techniques to Audio - Insights and Challenges

Style transfer is a technique for combining two images based on the activations and feature statistics in a deep learning neural network architecture. This paper studies the analogous task in the audio domain and takes a critical look at…

Sound · Computer Science 2020-08-10 M. Huzaifah, L. Wyse

Spectral Analysis Network for Deep Representation Learning and Image Clustering

Deep representation learning is a crucial procedure in multimedia analysis and attracts increasing attention. Most of the popular techniques rely on convolutional neural network and require a large amount of labeled data in the training…

Computer Vision and Pattern Recognition · Computer Science 2020-09-14 Jinghua Wang, Adrian Hilton, Jianmin Jiang

Dual-Quaternions: Theory and Applications in Sound

Sound is a fundamental and rich source of information; playing a key role in many areas from humanities and social sciences through to engineering and mathematics. Sound is more than just data 'signals'. It encapsulates physical, sensorial…

Sound · Computer Science 2023-03-24 Benjamin Kenwright

Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints

Audio source separation is often achieved by estimating the magnitude spectrogram of each source, and then applying a phase recovery (or spectrogram inversion) algorithm to retrieve time-domain signals. Typically, spectrogram inversion is…

Sound · Computer Science 2023-07-03 Paul Magron, Tuomas Virtanen

Images that Sound: Composing Images and Sounds on a Single Canvas

Spectrograms are 2D representations of sound that look very different from the images found in our visual world. And natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to…

Computer Vision and Pattern Recognition · Computer Science 2025-02-06 Ziyang Chen, Daniel Geng, Andrew Owens

Explaining Spectrograms in Machine Learning: A Study on Neural Networks for Speech Classification

This study investigates discriminative patterns learned by neural networks for accurate speech classification, with a specific focus on vowel classification tasks. By examining the activations and features of neural networks for vowel…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-25 Jesin James, Balamurali B. T., Binu Abeysinghe, Junchen Liu
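
Many of the papers listed above start from a short-time Fourier transform (STFT) magnitude spectrogram. As a point of reference, here is a minimal, dependency-free sketch of that computation. It is not the method of any listed paper. The periodic Hann window, the 64-sample frame length, the 16-sample hop, the optional dB compression, and all function names are illustrative assumptions. A naive DFT is used for readability; a real pipeline would use an FFT.

```python
import cmath
import math


def hann_window(n: int) -> list[float]:
    """Periodic Hann window of length n (an assumed, commonly used STFT window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(frame: list[float]) -> list[complex]:
    """Naive O(N^2) discrete Fourier transform, kept simple for illustration."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def magnitude_spectrogram(signal: list[float], frame_len: int = 64, hop: int = 16) -> list[list[float]]:
    """Return |STFT| as a list of frames, each with frame_len // 2 + 1 bins.

    Frames start every `hop` samples. Trailing samples that do not fill a
    whole frame are dropped (no padding is applied).
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Apply the window to one frame of the signal.
        segment = [signal[start + k] * window[k] for k in range(frame_len)]
        spectrum = dft(segment)
        # A real-valued input has a symmetric spectrum, so keep only bins 0..N/2.
        frames.append([abs(c) for c in spectrum[: frame_len // 2 + 1]])
    return frames


def log_compress(spec: list[list[float]], eps: float = 1e-10) -> list[list[float]]:
    """Convert magnitudes to decibels; eps avoids log10(0) on silent bins."""
    return [[20 * math.log10(m + eps) for m in frame] for frame in spec]
```

For a 1 kHz sine sampled at 8 kHz, 64-sample frames give a bin spacing of 8000 / 64 = 125 Hz. Every frame therefore peaks at bin 1000 / 125 = 8, and the Hann window spreads a little energy into bins 7 and 9.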