Related papers: ASM: Audio Spectrogram Mixer

200 papers

Respiratory sound analysis is a crucial tool for screening asthma and other pulmonary pathologies, yet traditional auscultation remains subjective and experience-dependent. Our prior research established a CNN baseline using DenseNet201,…

Sound · Computer Science 2026-01-21 Theodore Aptekarev, Vladimir Sokolovsky, Gregory Furman

Audio events have a hierarchical architecture in both time and frequency and can be grouped together to construct more abstract semantic audio classes. In this work, we develop a multiscale audio spectrogram Transformer (MAST) that employs…

Sound · Computer Science 2023-03-21 Wentao Zhu, Mohamed Omar

We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST). Given an input audio spectrogram, we first patchify…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-19 Sreyan Ghosh, Ashish Seth, S. Umesh, Dinesh Manocha

Recently, MLP structures have regained popularity, with MLP-Mixer standing out as a prominent example. In the field of computer vision, MLP-Mixer is noted for its ability to extract data information from both channel and token perspectives,…

Machine Learning · Computer Science 2024-03-05 Qingfeng Ji, Yuxin Wang, Letong Sun

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels. To…

Sound · Computer Science 2021-07-12 Yuan Gong, Yu-An Chung, James Glass

The rapid advancement of artificial intelligence (AI) has enabled sophisticated audio generation and voice cloning technologies, posing significant security risks for applications reliant on voice authentication. While existing datasets and…

Sound · Computer Science 2025-05-22 Kunyang Huang, Bin Hu

Respiratory sound classification is hindered by the limited size, high noise levels, and severe class imbalance of benchmark datasets like ICBHI 2017. While Transformer-based models offer powerful feature extraction capabilities, they are…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-30 Atakan Işık, Selin Vulga Işık, Ahmet Feridun Işık, Mahşuk Taylan

Audio classification models, particularly the Audio Spectrogram Transformer (AST), play a crucial role in efficient audio analysis. However, optimizing their efficiency without compromising accuracy remains a challenge. In this paper, we…

Sound · Computer Science 2024-06-13 Swarup Ranjan Behera, Abhishek Dhiman, Karthik Gowda, Aalekhya Satya Narayani

Recently, neural networks based purely on self-attention, such as the Vision Transformer (ViT), have been shown to outperform deep learning models constructed with convolutional neural networks (CNNs) on various vision tasks, thus extending…

Sound · Computer Science 2022-02-14 Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

In audio classification, developing efficient and robust models is critical for real-time applications. Inspired by the design principles of MobileViT, we present FAST (Fast Audio Spectrogram Transformer), a new architecture that combines…

Sound · Computer Science 2025-04-21 Anugunj Naman, Gaibo Zhang

Although deep learning algorithms are widely used for improving speech enhancement (SE) performance, the performance remains limited under highly challenging conditions, such as unseen noise or noise signals having low signal-to-noise…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-10 Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Xugang Lu, Yu Tsao

Over the last two decades, language modeling has experienced a shift from the use of predominantly recurrent architectures that process tokens sequentially during training and inference to non-recurrent models that process sequence elements…

Computation and Language · Computer Science 2026-05-20 Benjamin L. Badger

In multi-channel speech enhancement and robust automatic speech recognition (ASR), beamforming can typically improve the signal-to-noise ratio (SNR) of the target speaker and produce reliable enhancement with little distortion to target…

Audio and Speech Processing · Electrical Eng. & Systems 2025-07-22 Zhong-Qiu Wang, Ruizhe Pang

Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been a growing interest in contact-free medical care based on electronic stethoscopes. To this end,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-30 Sangmin Bae, June-Woo Kim, Won-Yang Cho, Hyerim Baek, Soyoun Son, Byungjo Lee, Changwan Ha, Kyongpil Tae, Sungnyun Kim, Se-Young Yun

In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing…

Sound · Computer Science 2024-02-09 Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

Transformers have rapidly overtaken CNN-based architectures as the new standard in audio classification. Transformer-based models, such as the Audio Spectrogram Transformers (AST), also inherit the fixed-size input paradigm from CNNs.…

Sound · Computer Science 2024-07-12 Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak

Transformers have become central to recent advances in audio classification. However, training an audio spectrogram transformer, e.g. AST, from scratch can be resource and time-intensive. Furthermore, the complexity of transformers heavily…

Sound · Computer Science 2024-01-17 Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak

Connecting audio encoders with large language models (LLMs) allows the LLM to perform various audio understanding tasks, such as automatic speech recognition (ASR) and audio captioning (AC). Most research focuses on training an adapter…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-22 Weiqiao Shan, Yuang Li, Yuhao Zhang, Yingfeng Luo, Chen Xu, Xiaofeng Zhao, Long Meng, Yunfei Lu, Min Zhang, Hao Yang, Tong Xiao, Jingbo Zhu

Autism Spectrum Disorder (ASD) is a complex neuro-developmental challenge, presenting a spectrum of difficulties in social interaction, communication, and the expression of repetitive behaviors in different situations. This increasing…

Machine Learning · Computer Science 2025-06-16 Mohd Mujtaba Akhtar, Girish, Muskaan Singh, Orchid Chetia Phukan

Deep learning architectures have made significant progress in terms of performance in many research areas. The automatic speech recognition (ASR) field has thus benefited from these scientific and technological advances, particularly for…

Sound · Computer Science 2024-03-01 Quentin Raymondaud, Mickael Rouvier, Richard Dufour