Related papers: Fast Audio Codec Identification Using Overlapping …

200 papers

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modeling techniques to audio data. However, audio codecs often…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-19 Edresson Casanova, Ryan Langman, Paarth Neekhara, Shehzeen Hussain, Jason Li, Subhankar Ghosh, Ante Jukić, Sang-gil Lee

We frame the problem of selecting an optimal audio encoding scheme as a supervised learning task. Through uniform convergence theory, we guarantee approximately optimal codec selection while controlling for selection bias. We present…

Sound · Computer Science 2018-12-20 Clayton Sanford, Cyrus Cousins, Eli Upfal

Neural audio codecs were initially introduced to compress audio data into compact codes to reduce transmission latency. Researchers recently discovered the potential of codecs as suitable tokenizers for converting continuous audio into…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-21 Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-yi Lee

Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation. Its interest has recently been demonstrated in computer…

Sound · Computer Science 2023-11-23 Ismail Khalfaoui-Hassani, Timothée Masquelier, Thomas Pellegrini

With the rise of multimodal large language models (LLMs), audio codecs play an increasingly vital role in encoding audio into discrete tokens, enabling integration of audio into text-based LLMs. Current audio codecs capture two types of…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-29 Ruifan Deng, Yitian Gong, Qinghui Gao, Luozhijie Jin, Qinyuan Cheng, Zhaoye Fei, Shimin Li, Xipeng Qiu

Sparse coding is an unsupervised learning algorithm that learns a succinct high-level representation of the inputs given only unlabeled data; it represents each input as a sparse linear combination of a set of basis functions. Originally…

Machine Learning · Computer Science 2012-06-26 Roger Grosse, Rajat Raina, Helen Kwong, Andrew Y. Ng

Xampling generalizes compressed sensing (CS) to reduced-rate sampling of analog signals. A unified framework is introduced for low rate sampling and processing of signals lying in a union of subspaces. Xampling consists of two main blocks:…

Information Theory · Computer Science 2015-03-19 Moshe Mishali, Yonina C. Eldar

We present a work on low-complexity acoustic scene classification (ASC) with multiple devices, namely the subtask A of Task 1 of the DCASE2021 challenge. This subtask focuses on classifying audio samples of multiple devices with a…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-06 Yanxiong Li, Wenchang Cao, Wei Xie, Qisheng Huang, Wenfeng Pang, Qianhua He

Large Audio Language Models (LALMs) demonstrate impressive performance across diverse tasks, ranging from speech recognition to general audio understanding. However, their scalability is limited by the quadratic complexity of attention and…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-27 Saurabhchand Bhati, Samuel Thomas, Hilde Kuehne, Rogerio Feris, James Glass

Phonetic speech transcription is crucial for fine-grained linguistic analysis and downstream speech applications. While Connectionist Temporal Classification (CTC) is a widely used approach for such tasks due to its efficiency, it often…

Audio coding is an essential module in the real-time communication system. Neural audio codecs can compress audio samples with a low bitrate due to the strong modeling and generative capabilities of deep neural networks. To address the poor…

Sound · Computer Science 2023-10-18 Wenzhe Liu, Wei Xiao, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu

One of the biggest challenges of acoustic scene classification (ASC) is to find proper features to better represent and characterize environmental sounds. Environmental sounds generally involve more sound sources while exhibiting less…

Sound · Computer Science 2019-04-11 Hongwei Song, Jiqing Han, Shiwen Deng

Most widely used modern audio codecs, such as Ogg Vorbis and MP3, as well as more recent "neural" codecs like Meta's EnCodec or the Descript Audio Codec, are based on block-coding; audio is divided into overlapping, fixed-size "frames" which…

Sound · Computer Science 2025-05-12 John Vinyard

This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances which are mostly fully…

Sound · Computer Science 2020-05-08 Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li

Identifying acoustic events from a continuously streaming audio source is of interest for many applications including environmental monitoring for basic research. In this scenario neither different event classes are known nor what…

Computer Vision and Pattern Recognition · Computer Science 2017-12-12 Matthias Meyer, Jan Beutel, Lothar Thiele

Identifying bird species from audio recordings is a challenging task due to the presence of multiple species in the same recording, background noise, and long recording durations. Besides, choosing a proper acoustic feature…

Sound · Computer Science 2022-01-04 Nahian Ibn Hasan

Segmenting audio into homogeneous sections such as music and speech helps us understand the content of audio. It is useful as a pre-processing step to index, store, and modify audio recordings, radio broadcasts and TV programmes. Deep…

Frequently misclassified pairs of classes that share many common acoustic properties exist in acoustic scene classification (ASC). To distinguish such pairs of classes, trivial details scattered throughout the data could be vital clues.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-10 Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-jin Yu

We introduce BANC, a neural binaural audio codec designed for efficient speech compression in single and two-speaker scenarios while preserving the spatial location information of each speaker. Our key contributions are as follows: 1) The…

Sound · Computer Science 2024-11-26 Anton Ratnarajah, Shi-Xiong Zhang, Dong Yu

Complex-valued processing brought deep learning-based speech enhancement and signal extraction to a new level. Typically, the noise reduction process is based on a time-frequency (TF) mask which is applied to a noisy spectrogram. Complex…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-24 Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Andreas Maier