Related papers: FAST: Fast Audio Spectrogram Transformer

200 papers

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels. To…

Sound · Computer Science 2021-07-12 Yuan Gong, Yu-An Chung, James Glass

Transformers have rapidly overtaken CNN-based architectures as the new standard in audio classification. Transformer-based models, such as the Audio Spectrogram Transformers (AST), also inherit the fixed-size input paradigm from CNNs.…

Sound · Computer Science 2024-07-12 Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak

Audio classification models, particularly the Audio Spectrogram Transformer (AST), play a crucial role in efficient audio analysis. However, optimizing their efficiency without compromising accuracy remains a challenge. In this paper, we…

Sound · Computer Science 2024-06-13 Swarup Ranjan Behera, Abhishek Dhiman, Karthik Gowda, Aalekhya Satya Narayani

Recently, neural networks based purely on self-attention, such as the Vision Transformer (ViT), have been shown to outperform deep learning models constructed with convolutional neural networks (CNNs) on various vision tasks, thus extending…

Sound · Computer Science 2022-02-14 Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

Accurate sound localization in a reverberation environment is essential for human auditory perception. Recently, Convolutional Neural Networks (CNNs) have been utilized to model the binaural human auditory pathway. However, CNN shows…

Sound · Computer Science 2024-08-08 Sheng Kuang, Jie Shi, Kiki van der Heijden, Siamak Mehrkanoon

Audio event has a hierarchical architecture in both time and frequency and can be grouped together to construct more abstract semantic audio classes. In this work, we develop a multiscale audio spectrogram Transformer (MAST) that employs…

Sound · Computer Science 2023-03-21 Wentao Zhu, Mohamed Omar

In recent years, Sound AI is being increasingly used to predict machine failures. By attaching a microphone to the machine of interest, one can get real time data on machine behavior from the field. Traditionally, Convolutional Neural Net…

Sound · Computer Science 2026-04-15 Kiran Voderhobli Holla

The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers…

Sound · Computer Science 2023-10-25 Florian Schmid, Khaled Koutini, Gerhard Widmer

Respiratory sound analysis is a crucial tool for screening asthma and other pulmonary pathologies, yet traditional auscultation remains subjective and experience-dependent. Our prior research established a CNN baseline using DenseNet201,…

Sound · Computer Science 2026-01-21 Theodore Aptekarev, Vladimir Sokolovsky, Gregory Furman

We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector). Different from recent advanced text detectors that used complicated post-processing and hand-crafted network…

Computer Vision and Pattern Recognition · Computer Science 2023-01-12 Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu

The rapid advancement of generative artificial intelligence has spurred innovative approaches to semantic communication, giving rise to a new paradigm known as generative semantic communication (GSC). The integration of flexible cross-modal…

Signal Processing · Electrical Eng. & Systems 2025-11-03 Yiru Wang, Wanting Yang, Fangli Mou, Zehui Xiong, Zide Fan, Shiwen Mao, Tony Q. S. Quek

Parameter-efficient transfer learning (PETL) methods have emerged as a solid alternative to the standard full fine-tuning approach. They only train a few extra parameters for each downstream task, without sacrificing performance and…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-16 Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, Mirco Ravanelli

Traffic congestion remains a pressing urban challenge, requiring intelligent transportation systems for real-time management. We present a hybrid framework that combines deep learning and reinforcement learning for acoustic vehicle speed…

Sound · Computer Science 2025-09-03 Yuli Zhang, Pengfei Fan, Ruiyuan Jiang, Hankang Gu, Dongyao Jia, Xinheng Wang

Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the conformer architecture for efficient training and inference, we carefully redesigned Conformer with a…

Over the past two decades, CNN architectures have produced compelling models of sound perception and cognition, learning hierarchical organizations of features. Analogous to successes in computer vision, audio feature classification can be…

Sound · Computer Science 2025-05-13 Prateek Verma, Jonathan Berger

This paper presents a context-aware framework for feature selection and classification procedures to realize a fast and accurate audio event annotation and classification. The context-aware design starts with exploring feature extraction…

Sound · Computer Science 2023-03-08 M. Mehrdad Morsali, Hoda Mohammadzade, Saeed Bagheri Shouraki

Audio Sentiment Analysis is a popular research area which extends the conventional text-based sentiment analysis to depend on the effectiveness of acoustic features extracted from speech. However, current progress on audio sentiment…

Audio and Speech Processing · Electrical Eng. & Systems 2019-08-01 Feiyang Chen, Ziqian Luo

Audio Spectrogram Transformer models rule the field of Audio Tagging, outrunning previously dominating Convolutional Neural Networks (CNNs). Their superiority is based on the ability to scale up and exploit large-scale datasets such as…

Sound · Computer Science 2023-06-26 Florian Schmid, Khaled Koutini, Gerhard Widmer

Autoregressive convolutional neural networks (CNNs) have been widely exploited for sequence generation tasks such as audio synthesis, language modeling and neural machine translation. WaveNet is a deep autoregressive CNN composed of several…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-13 Shehzeen Hussain, Mojan Javaheripi, Paarth Neekhara, Ryan Kastner, Farinaz Koushanfar

FullSubNet is our recently proposed real-time single-channel speech enhancement network that achieves outstanding performance on the Deep Noise Suppression (DNS) Challenge dataset. A number of variants of FullSubNet have been proposed, but…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-08 Xiang Hao, Xiaofei Li