Related papers

Related papers: libACA, pyACA, and ACA-Code: Audio Content Analysi…

200 papers

Preprint for a book chapter introducing Audio Content Analysis. With a focus on Music Information Retrieval systems, this chapter defines musical audio content, introduces the general process of audio content analysis, and surveys basic…

Audio and Speech Processing · Electrical Eng. & Systems 2021-01-05 Alexander Lerch

Digital audio processing tools offer music researchers the opportunity to examine both non-notated music and music as performance. This chapter summarises the types of information that can be extracted from audio as well as currently…

Sound · Computer Science 2021-11-10 Johanna Devaney

With the rise of multimodal large language models (LLMs), audio codecs play an increasingly vital role in encoding audio into discrete tokens, enabling the integration of audio into text-based LLMs. Current audio codecs capture two types of…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-29 Ruifan Deng, Yitian Gong, Qinghui Gao, Luozhijie Jin, Qinyuan Cheng, Zhaoye Fei, Shimin Li, Xipeng Qiu

We present pyroomacoustics, a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components: an intuitive Python object-oriented…

Sound · Computer Science 2019-05-08 Robin Scheibler, Eric Bezzam, Ivan Dokmanić

Deductive coding is a widely used qualitative research method for determining the prevalence of themes across documents. While useful, deductive coding is often burdensome and time consuming since it requires researchers to read, interpret,…

Computation and Language · Computer Science 2023-06-28 Robert Chew, John Bollenbacher, Michael Wenger, Jessica Speer, Annice Kim

Automated audio captioning (AAC) is an audio-to-text task to describe audio contents in natural language. Recently, the advancements in large language models (LLMs), with improvements in training approaches for audio encoders, have opened…

Sound · Computer Science 2024-06-26 Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang

Auditory scene analysis (ASA) aims to retrieve information from the acoustic environment, by carrying out three main tasks: sound source location, separation, and classification. These tasks are traditionally executed with a linear data…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-21 Caleb Rascon, Luis Gato-Diaz, Eduardo García-Alarcón

A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved…

Current music information retrieval systems face challenges in managing linguistic diversity and integrating various musical modalities. These limitations reduce their effectiveness in a global, multimodal music environment. To…

A major challenge for video captioning is to combine audio and visual cues. Existing multi-modal fusion methods have shown encouraging results in video understanding. However, the temporal structures of multiple modalities at different…

Computation and Language · Computer Science 2018-04-17 Xin Wang, Yuan-Fang Wang, William Yang Wang

A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e., the bitrate that is required to transmit the signal should be as low as possible; (2) latency, i.e.…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-29 Yi-Chiao Wu, Israel D. Gebru, Dejan Marković, Alexander Richard

This paper presents a method for detecting mispronunciations with the aim of improving Computer Assisted Language Learning (CALL) tools used by foreign language learners. The algorithm is based on Principal Component Analysis (PCA). It is…

Sound · Computer Science 2016-02-29 Zhenhao Ge, Sudhendu R. Sharma, Mark J. T. Smith

The Automated Audio Captioning (AAC) task asks models to generate natural language descriptions of an audio input. Evaluating these machine-generated audio captions is a complex task that requires considering diverse factors, among them,…

Computation and Language · Computer Science 2025-08-12 Tsung-Han Wu, Joseph E. Gonzalez, Trevor Darrell, David M. Chan

pyAMPACT (Python-based Automatic Music Performance Analysis and Comparison Toolkit) links symbolic and audio music representations to facilitate score-informed estimation of performance data in audio as well as general linking of symbolic…

Sound · Computer Science 2026-01-06 Johanna Devaney, Daniel McKemie, Alex Morgan

The analysis, processing, and extraction of meaningful information from sounds all around us is the subject of the broader area of audio analytics. Audio captioning is a recent addition to the domain of audio analytics, a cross-modal…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-04 Sandeep Kothinti, Dimitra Emmanouilidou

Audio data are widely exchanged over telecommunications networks. Due to the limitations of network resources, these data are typically compressed before transmission. Various methods are available for compressing audio data. To access such…

Multimedia · Computer Science 2025-02-12 Farzane Jafari

Automated audio captioning (AAC) has developed rapidly in recent years, involving acoustic signal processing and natural language processing to generate human-readable sentences for audio clips. The current models are generally based on the…

Sound · Computer Science 2021-10-13 Zhongjie Ye, Helin Wang, Dongchao Yang, Yuexian Zou

In this paper, we propose SemanticAC, a semantics-assisted framework for Audio Classification to better leverage the semantic information. Unlike conventional audio classification methods that treat class labels as discrete vectors, we…

Sound · Computer Science 2023-02-14 Yicheng Xiao, Yue Ma, Shuyan Li, Hantao Zhou, Ran Liao, Xiu Li

Neural audio codecs (NACs) provide compact representations that can be leveraged in many downstream applications, in particular large language models. Yet most NACs encode mixtures of multiple sources in an entangled manner, which may…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-21 Ryo Aihara, Yoshiki Masuyama, Francesco Paissan, François G. Germain, Gordon Wichern, Jonathan Le Roux

To address the calibration and procedural challenges inherent in remote audiogram assessment for rehabilitative audiology, this study investigated whether calibration-independent adaptive categorical loudness scaling (ACALOS) data can be…

Sound · Computer Science 2026-04-07 Chen Xu, Lena Schell-Majoor, Birger Kollmeier