Related papers: Evaluating Compositional Structure in Audio Repres…

Measuring Compositionality in Representation Learning

Many machine learning algorithms represent input data with vector embeddings or discrete codes. When inputs exhibit compositional structure (e.g. objects built from parts or procedures from subroutines), it is natural to ask whether this…

Machine Learning · Computer Science 2019-04-09 Jacob Andreas

Compositional Audio Representation Learning

Human auditory perception is compositional in nature -- we identify auditory streams from auditory scenes with multiple sound events. However, such auditory scenes are typically represented using clip-level representations that do not…

Sound · Computer Science 2025-03-04 Sripathi Sridhar, Mark Cartwright

How do Transformer Embeddings Represent Compositions? A Functional Analysis

Compositionality is a key aspect of human intelligence, essential for reasoning and generalization. While transformer-based models have become the de facto standard for many language modeling tasks, little is known about how they represent…

Computation and Language · Computer Science 2025-06-03 Aishik Nagar, Ishaan Singh Rawal, Mansi Dhanania, Cheston Tan

A Survey on Compositional Learning of AI Models: Theoretical and Experimental Practices

Compositional learning, mastering the ability to combine basic concepts and construct more intricate ones, is crucial for human cognition, especially in human language comprehension and visual perception. This notion is tightly connected to…

Artificial Intelligence · Computer Science 2024-11-22 Sania Sinha, Tanawan Premsri, Parisa Kordjamshidi

What makes Models Compositional? A Theoretical View: With Supplement

Compositionality is thought to be a key component of language, and various compositional benchmarks have been developed to empirically probe the compositional generalization of existing sequence processing models. These benchmarks often…

Machine Learning · Computer Science 2024-05-07 Parikshit Ram, Tim Klinger, Alexander G. Gray

A Complexity-Based Theory of Compositionality

Compositionality is believed to be fundamental to intelligence. In humans, it underlies the structure of thought, language, and higher-level reasoning. In AI, compositional representations can enable a powerful form of out-of-distribution…

Computation and Language · Computer Science 2025-06-04 Eric Elmoznino, Thomas Jiralerspong, Yoshua Bengio, Guillaume Lajoie

A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification

One of the biggest challenges of acoustic scene classification (ASC) is to find proper features to better represent and characterize environmental sounds. Environmental sounds generally involve more sound sources while exhibiting less…

Sound · Computer Science 2019-04-11 Hongwei Song, Jiqing Han, Shiwen Deng

Learning to Compose: Improving Object Centric Learning by Injecting Compositionality

Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Whie Jung, Jaehoon Yoo, Sungjin Ahn, Seunghoon Hong

Contextual Joint Factor Acoustic Embeddings

Embedding acoustic information into fixed length representations is of interest for a whole range of applications in speech and audio technology. Two novel unsupervised approaches to generate acoustic embeddings by modelling of acoustic…

Computation and Language · Computer Science 2021-02-08 Yanpei Shi, Thomas Hain

CREPE: Can Vision-Language Foundation Models Reason Compositionally?

A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that: across 7 architectures…

Computation and Language · Computer Science 2023-05-17 Zixian Ma, Jerry Hong, Mustafa Omer Gul, Mona Gandhi, Irena Gao, Ranjay Krishna

ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds

Automated Audio Captioning is a multimodal task that aims to convert audio content into natural language. The assessment of audio captioning systems is typically based on quantitative metrics applied to text data. Previous studies have…

Sound · Computer Science 2024-03-28 Gijs Wijngaard, Elia Formisano, Bruno L. Giordano, Michel Dumontier

What You Hear Is What You See: Audio Quality Metrics From Image Quality Metrics

In this study, we investigate the feasibility of utilizing state-of-the-art image perceptual metrics for evaluating audio signals by representing them as spectrograms. The encouraging outcome of the proposed approach is based on the…

Sound · Computer Science 2023-08-31 Tashi Namgyal, Alexander Hepburn, Raul Santos-Rodriguez, Valero Laparra, Jesus Malo

X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance

We introduce X-ARES (eXtensive Audio Representation and Evaluation Suite), a novel open-source benchmark designed to systematically assess audio encoder performance across diverse domains. By encompassing tasks spanning speech,…

Sound · Computer Science 2025-05-28 Junbo Zhang, Heinrich Dinkel, Yadong Niu, Chenyu Liu, Si Cheng, Anbei Zhao, Jian Luan

A Deep Representation for Invariance And Music Classification

Representations in the auditory cortex might be based on mechanisms similar to the visual ventral stream; modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we propose…

Sound · Computer Science 2016-11-17 Chiyuan Zhang, Georgios Evangelopoulos, Stephen Voinea, Lorenzo Rosasco, Tomaso Poggio

Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics

The analysis, processing, and extraction of meaningful information from sounds all around us is the subject of the broader area of audio analytics. Audio captioning is a recent addition to the domain of audio analytics, a cross-modal…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-04 Sandeep Kothinti, Dimitra Emmanouilidou

Play It Back: Iterative Attention for Audio Recognition

A key function of auditory cognition is the association of characteristic sounds with their corresponding semantics over time. Humans attempting to discriminate between fine-grained audio categories often replay the same discriminative…

Sound · Computer Science 2023-03-14 Alexandros Stergiou, Dima Damen

Machine Learning Framework for Audio-Based Equipment Condition Monitoring: A Comparative Study of Classification Algorithms

Audio-based equipment condition monitoring suffers from a lack of standardized methodologies for algorithm selection, hindering reproducible research. This paper addresses this gap by introducing a comprehensive framework for the systematic…

Machine Learning · Computer Science 2026-03-20 Srijesh Pillai, Yodhin Agarwal, Zaheeruddin Ahmed

Evaluation of Audio Compression Codecs

Perceptual quality of audio is the combination of aural accuracy and listener-perceived sound fidelity. It is how humans respond to the accuracy, intelligibility, and fidelity of aural media. Today this fidelity is also heavily influenced…

Sound · Computer Science 2026-03-12 Thien T. Duong, Jan P. Springer

Automated Audio Captioning using Audio Event Clues

Audio captioning is an important research area that aims to generate meaningful descriptions for audio clips. Most of the existing research extracts acoustic features of audio clips as input to encoder-decoder and transformer architectures…

Sound · Computer Science 2022-04-20 Ayşegül Özkaya Eren, Mustafa Sert

Geometry of Compositionality

This paper proposes a simple, context-specific test for the compositionality (i.e., literal usage) of a word or phrase. The test is computationally simple, relying on no external resources and using only a set of trained word vectors.…

Computation and Language · Computer Science 2016-11-30 Hongyu Gong, Suma Bhat, Pramod Viswanath