Related papers: Spectrogram features for audio and speech analysis

200 papers

One of the decisions that arise when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, a neural network. For audio, the choice is less obvious than…

Sound · Computer Science 2017-06-30 L. Wyse

Acoustic recognition has emerged as a prominent task in deep learning research, frequently utilizing spectral feature extraction techniques such as the spectrogram from the Short-Time Fourier Transform and the scalogram from the Wavelet…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-01 Dang Thoai Phan

Recent advancements in deep learning have significantly impacted the field of speech signal processing, particularly in the analysis and manipulation of complex spectrograms. This survey provides a comprehensive overview of the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-06 Yuying Xie, Zheng-Hua Tan

We propose the Neuralogram -- a deep neural network based representation for understanding audio signals which, as the name suggests, transforms an audio signal to a dense, compact representation based upon embeddings learned via a neural…

Sound · Computer Science 2019-04-11 Prateek Verma, Chris Chafe, Jonathan Berger

Pattern recognition from audio signals is an active research topic encompassing audio tagging, acoustic scene classification, music classification, and other areas. Spectrogram and mel-frequency cepstral coefficients (MFCC) are among the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-18 Md. Istiaq Ansari, Taufiq Hasan

Several methods have recently been proposed to analyze speech and automatically infer the personality of the speaker. These methods often rely on prosodic and other hand-crafted speech processing features extracted with off-the-shelf…

Computer Vision and Pattern Recognition · Computer Science 2022-05-10 Marc-André Carbonneau, Eric Granger, Yazid Attabi, Ghyslain Gagnon

Speech separation has been very successful with deep learning techniques. Substantial effort has been reported based on approaches over spectrogram, which is well known as the standard time-and-frequency cross-domain representation for…

Sound · Computer Science 2019-04-17 Gene-Ping Yang, Chao-I Tuan, Hung-Yi Lee, Lin-shan Lee

While traditional audio visualization methods depict amplitude intensities vs. time, such as in a time-frequency spectrogram, and while some may use complex phase information to augment the amplitude representation, such as in a reassigned…

Sound · Computer Science 2019-07-24 Stephen Wedekind, P. Fraundorf

Multi-resolution spectro-temporal features of a speech signal represent how the brain perceives sounds by tuning cortical cells to different spectral and temporal modulations. These features produce a higher dimensional representation of…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-28 Rahil Parikh, Nadee Seneviratne, Ganesh Sivaraman, Shihab Shamma, Carol Espy-Wilson

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered…

Sound · Computer Science 2019-05-28 Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-yiin Chang, Tara Sainath

Speech representation and modelling in high-dimensional spaces of acoustic waveforms, or a linear transformation thereof, is investigated with the aim of improving the robustness of automatic speech recognition to additive noise. The…

Computation and Language · Computer Science 2015-03-31 Matthew Ager, Zoran Cvetkovic, Peter Sollich

A new language model for speech recognition inspired by linguistic analysis is presented. The model develops hidden hierarchical structure incrementally and uses it to extract meaningful information from the word history - thus enabling the…

Computation and Language · Computer Science 2007-05-23 Ciprian Chelba, Frederick Jelinek

Modern day audio signal classification techniques lack the ability to classify low feature audio signals in the form of spectrographic temporal frequency data representations. Additionally, currently utilized techniques rely on full diverse…

Sound · Computer Science 2024-10-30 Noel Elias

Distinct striation patterns are observed in the spectrograms of speech and music. This motivated us to propose three novel time-frequency features for speech-music classification. These features are extracted in two stages. First, a preset…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-06 Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha

Style transfer is a technique for combining two images based on the activations and feature statistics in a deep learning neural network architecture. This paper studies the analogous task in the audio domain and takes a critical look at…

Sound · Computer Science 2020-08-10 M. Huzaifah, L. Wyse

Deep representation learning is a crucial procedure in multimedia analysis and attracts increasing attention. Most of the popular techniques rely on convolutional neural network and require a large amount of labeled data in the training…

Computer Vision and Pattern Recognition · Computer Science 2020-09-14 Jinghua Wang, Adrian Hilton, Jianmin Jiang

Sound is a fundamental and rich source of information, playing a key role in many areas from humanities and social sciences through to engineering and mathematics. Sound is more than just data 'signals'. It encapsulates physical, sensorial…

Sound · Computer Science 2023-03-24 Benjamin Kenwright

Audio source separation is often achieved by estimating the magnitude spectrogram of each source, and then applying a phase recovery (or spectrogram inversion) algorithm to retrieve time-domain signals. Typically, spectrogram inversion is…

Sound · Computer Science 2023-07-03 Paul Magron, Tuomas Virtanen

Spectrograms are 2D representations of sound that look very different from the images found in our visual world. And natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to…

Computer Vision and Pattern Recognition · Computer Science 2025-02-06 Ziyang Chen, Daniel Geng, Andrew Owens

This study investigates discriminative patterns learned by neural networks for accurate speech classification, with a specific focus on vowel classification tasks. By examining the activations and features of neural networks for vowel…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-25 Jesin James, Balamurali B. T., Binu Abeysinghe, Junchen Liu