Related papers: Evaluating Compositional Structure in Audio Repres…

200 papers

Many machine learning algorithms represent input data with vector embeddings or discrete codes. When inputs exhibit compositional structure (e.g. objects built from parts or procedures from subroutines), it is natural to ask whether this…

Machine Learning · Computer Science 2019-04-09 Jacob Andreas

Human auditory perception is compositional in nature -- we identify auditory streams from auditory scenes with multiple sound events. However, such auditory scenes are typically represented using clip-level representations that do not…

Sound · Computer Science 2025-03-04 Sripathi Sridhar, Mark Cartwright

Compositionality is a key aspect of human intelligence, essential for reasoning and generalization. While transformer-based models have become the de facto standard for many language modeling tasks, little is known about how they represent…

Computation and Language · Computer Science 2025-06-03 Aishik Nagar, Ishaan Singh Rawal, Mansi Dhanania, Cheston Tan

Compositional learning, mastering the ability to combine basic concepts and construct more intricate ones, is crucial for human cognition, especially in human language comprehension and visual perception. This notion is tightly connected to…

Artificial Intelligence · Computer Science 2024-11-22 Sania Sinha, Tanawan Premsri, Parisa Kordjamshidi

Compositionality is thought to be a key component of language, and various compositional benchmarks have been developed to empirically probe the compositional generalization of existing sequence processing models. These benchmarks often…

Machine Learning · Computer Science 2024-05-07 Parikshit Ram, Tim Klinger, Alexander G. Gray

Compositionality is believed to be fundamental to intelligence. In humans, it underlies the structure of thought, language, and higher-level reasoning. In AI, compositional representations can enable a powerful form of out-of-distribution…

Computation and Language · Computer Science 2025-06-04 Eric Elmoznino, Thomas Jiralerspong, Yoshua Bengio, Guillaume Lajoie

One of the biggest challenges of acoustic scene classification (ASC) is to find proper features to better represent and characterize environmental sounds. Environmental sounds generally involve more sound sources while exhibiting less…

Sound · Computer Science 2019-04-11 Hongwei Song, Jiqing Han, Shiwen Deng

Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Whie Jung, Jaehoon Yoo, Sungjin Ahn, Seunghoon Hong

Embedding acoustic information into fixed length representations is of interest for a whole range of applications in speech and audio technology. Two novel unsupervised approaches to generate acoustic embeddings by modelling of acoustic…

Computation and Language · Computer Science 2021-02-08 Yanpei Shi, Thomas Hain

A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that: across 7 architectures…

Computation and Language · Computer Science 2023-05-17 Zixian Ma, Jerry Hong, Mustafa Omer Gul, Mona Gandhi, Irena Gao, Ranjay Krishna

Automated Audio Captioning is a multimodal task that aims to convert audio content into natural language. The assessment of audio captioning systems is typically based on quantitative metrics applied to text data. Previous studies have…

Sound · Computer Science 2024-03-28 Gijs Wijngaard, Elia Formisano, Bruno L. Giordano, Michel Dumontier

In this study, we investigate the feasibility of utilizing state-of-the-art image perceptual metrics for evaluating audio signals by representing them as spectrograms. The encouraging outcome of the proposed approach is based on the…

Sound · Computer Science 2023-08-31 Tashi Namgyal, Alexander Hepburn, Raul Santos-Rodriguez, Valero Laparra, Jesus Malo

We introduce X-ARES (eXtensive Audio Representation and Evaluation Suite), a novel open-source benchmark designed to systematically assess audio encoder performance across diverse domains. By encompassing tasks spanning speech,…

Sound · Computer Science 2025-05-28 Junbo Zhang, Heinrich Dinkel, Yadong Niu, Chenyu Liu, Si Cheng, Anbei Zhao, Jian Luan

Representations in the auditory cortex might be based on mechanisms similar to the visual ventral stream; modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we propose…

The analysis, processing, and extraction of meaningful information from sounds all around us is the subject of the broader area of audio analytics. Audio captioning is a recent addition to the domain of audio analytics, a cross-modal…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-04 Sandeep Kothinti, Dimitra Emmanouilidou

A key function of auditory cognition is the association of characteristic sounds with their corresponding semantics over time. Humans attempting to discriminate between fine-grained audio categories often replay the same discriminative…

Sound · Computer Science 2023-03-14 Alexandros Stergiou, Dima Damen

Audio-based equipment condition monitoring suffers from a lack of standardized methodologies for algorithm selection, hindering reproducible research. This paper addresses this gap by introducing a comprehensive framework for the systematic…

Machine Learning · Computer Science 2026-03-20 Srijesh Pillai, Yodhin Agarwal, Zaheeruddin Ahmed

Perceptual quality of audio is the combination of aural accuracy and listener-perceived sound fidelity. It is how humans respond to the accuracy, intelligibility, and fidelity of aural media. Today this fidelity is also heavily influenced…

Sound · Computer Science 2026-03-12 Thien T. Duong, Jan P. Springer

Audio captioning is an important research area that aims to generate meaningful descriptions for audio clips. Most of the existing research extracts acoustic features of audio clips as input to encoder-decoder and transformer architectures…

Sound · Computer Science 2022-04-20 Ayşegül Özkaya Eren, Mustafa Sert

This paper proposes a simple test for compositionality (i.e., literal usage) of a word or phrase in a context-specific way. The test is computationally simple, relying on no external resources and only uses a set of trained word vectors.…

Computation and Language · Computer Science 2016-11-30 Hongyu Gong, Suma Bhat, Pramod Viswanath