
Related papers: OpenACE: An Open Benchmark for Evaluating Audio Co…

200 papers

A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e., the bitrate that is required to transmit the signal should be as low as possible; (2) latency, i.e.…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-29 Yi-Chiao Wu, Israel D. Gebru, Dejan Marković, Alexander Richard

With the rise of multimodal large language models (LLMs), audio codecs play an increasingly vital role in encoding audio into discrete tokens, enabling integration of audio into text-based LLMs. Current audio codecs capture two types of…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-29 Ruifan Deng, Yitian Gong, Qinghui Gao, Luozhijie Jin, Qinyuan Cheng, Zhaoye Fei, Shimin Li, Xipeng Qiu

Estimation of perceptual quality in audio and speech is possible using a variety of methods. The combined v3 release of ViSQOL and ViSQOLAudio (for speech and audio, respectively) provides improvements upon previous versions, in terms of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-22 Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, Andrew Hines

Automated Audio Captioning is a multimodal task that aims to convert audio content into natural language. The assessment of audio captioning systems is typically based on quantitative metrics applied to text data. Previous studies have…

Sound · Computer Science 2024-03-28 Gijs Wijngaard, Elia Formisano, Bruno L. Giordano, Michel Dumontier

Automated audio captioning aims at generating textual descriptions for an audio clip. To evaluate the quality of generated audio captions, previous works directly adopt image captioning metrics like SPICE and CIDEr, without justifying their…

Sound · Computer Science 2022-01-28 Zelin Zhou, Zhiling Zhang, Xuenan Xu, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

This study compares the performances of different algorithms for coding speech at low bit rates. In addition to widely deployed traditional vocoders, a selection of recently developed generative-model-based coders at different bit rates are…

Audio and Speech Processing · Electrical Eng. & Systems 2020-03-27 Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

The Automated Audio Captioning (AAC) task aims to describe an audio signal using natural language. To evaluate machine-generated captions, the metrics should take into account audio events, acoustic scenes, paralinguistics, signal…

Sound · Computer Science 2024-11-06 Satvik Dixit, Soham Deshmukh, Bhiksha Raj

We introduce AudioCapBench, a benchmark for evaluating audio captioning capabilities of large multimodal models. AudioCapBench covers three distinct audio domains, including environmental sound, music, and speech, with 1,000 curated evaluation…

Neural codecs have become crucial to recent speech and audio generation research. In addition to signal compression capabilities, discrete codecs have also been found to enhance downstream training efficiency and compatibility with…

We present a novel benchmark for voice cloning text-to-speech models. The benchmark consists of an evaluation protocol, an open-source library for assessing the performance of voice cloning models, and an accompanying leaderboard. The paper…

Computation and Language · Computer Science 2025-09-18 Iwona Christop, Tomasz Kuczyński, Marek Kubis

Recent speech and audio coding standards such as 3GPP Enhanced Voice Services match the foreseeable needs and requirements in transmission of speech and audio, when using current transmission infrastructure and applications. Trends in…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-15 Tom Bäckström

We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-19 Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-yi Lee

Neural audio codecs (NACs) have made significant advancements in recent years and are rapidly being adopted in many audio processing pipelines. However, they can introduce audio distortions which degrade speaker verification (SV)…

Sound · Computer Science 2025-09-04 Nirmalya Mallick Thakur, Jia Qi Yip, Eng Siong Chng

Automatic audio captioning is essential for audio understanding, enabling applications such as accessibility and content indexing. However, evaluating the quality of audio captions remains a major challenge, especially in reference-free…

Sound · Computer Science 2025-12-12 Tianyu Guo, Hongyu Chen, Hao Liang, Meiyi Qiang, Bohan Zeng, Linzhuang Sun, Bin Cui, Wentao Zhang

We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs. SoundStream relies on a model architecture composed of a fully…

Sound · Computer Science 2021-07-08 Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi

Speech codecs serve as a crucial bridge in unifying speech and text language models. Existing codec methods face several challenges in semantic encoding, such as residual paralinguistic information (e.g., timbre, emotion), insufficient…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-06 Chunyu Qiang, Haoyu Wang, Cheng Gong, Tianrui Wang, Ruibo Fu, Tao Wang, Ruilong Chen, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Longbiao Wang, Jianwu Dang, Jianhua Tao

Previous DCASE challenges contributed to an increase in the performance of acoustic scene classification systems. State-of-the-art classifiers demand significant processing capabilities and memory, which is challenging for…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-10 Nagashree K. S. Rao, Nils Peters

Masked token prediction has emerged as a powerful pre-training objective across language, vision, and speech, offering the potential to unify these diverse modalities through a single pre-training task. However, its application for general…

Objective speech-quality metrics are widely used to assess codec performance. However, for neural codecs, it is often unclear which metrics provide reliable quality estimates. To address this, we evaluated 45 objective metrics by…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-30 Wolfgang Mack, Nezih Topaloglu, Laura Lechler, Ivana Balić, Alexandra Craciun, Mansur Yesilbursa, Kamil Wojcicki

We introduce X-ARES (eXtensive Audio Representation and Evaluation Suite), a novel open-source benchmark designed to systematically assess audio encoder performance across diverse domains. By encompassing tasks spanning speech,…

Sound · Computer Science 2025-05-28 Junbo Zhang, Heinrich Dinkel, Yadong Niu, Chenyu Liu, Si Cheng, Anbei Zhao, Jian Luan