Related papers: A quick search method for audio signals based on a…
A naive approach for finding similar audio items would be to compare each entry from the feature vector of the test example with each feature vector of the candidates in a k-nearest neighbors fashion. There are already two problems with…
We introduce an audiovisual method for long-range text-to-video retrieval. Unlike previous approaches designed for short video retrieval (e.g., 5-15 seconds in duration), our approach aims to retrieve minute-long videos that capture complex…
The automatic summarization of multimedia sources is an important task that facilitates the understanding of an individual by condensing the source while maintaining relevant information. In this paper we focus on audio summarization based…
Modern day audio signal classification techniques lack the ability to classify low feature audio signals in the form of spectrographic temporal frequency data representations. Additionally, currently utilized techniques rely on full diverse…
Traditional Query-by-Example (QbE) speech search approaches usually use methods based on frame-level features, while state-of-the-art approaches tend to use models based on acoustic word embeddings (AWEs) to transform variable length audio…
This paper describes an effective unsupervised method for query-by-example speaker retrieval. We suppose that only one speaker is in each audio file or in audio segment. The audio data are modeled using a common universal codebook. The…
Machine hearing or listening represents an emerging area. Conventional approaches rely on the design of handcrafted features specialized to a specific audio task and that can hardly generalized to other audio fields. For example,…
In this paper, we proposed a robust music genre classification method based on a sparse FFT based feature extraction method which extracted with discriminating power of spectral analysis of non-stationary audio signals, and the capability…
This work proposes a novel feature selection algorithm to classify Songs into different groups. Classification of musical content is often a non-trivial job and still relatively less explored area. The main idea conveyed in this article is…
The aim of this paper is to introduce a new schema, based on a Compressive Sampling technique, for the recovery of lost data in multimedia streaming. The audio streaming data are encapsuled in different packets by using an interleaving…
This work introduces a feature extracted from stereophonic/binaural audio signals aiming to represent a measure of perceived quality degradation in processed spatial auditory scenes. The feature extraction technique is based on a simplified…
In this paper, we propose an enhanced audio-visual deep detection method. Recent methods in audio-visual deepfake detection mostly assess the synchronization between audio and visual features. Although they have shown promising results,…
Most of existing audio fingerprinting systems have limitations to be used for high-specific audio retrieval at scale. In this work, we generate a low-dimensional representation from a short unit segment of audio, and couple this fingerprint…
This study presents a novel Deep Learning-based and lightweight approach for the automated detection of sports highlights (HLs) from audio and video sources. HL detection is a key task in sports video analysis, traditionally requiring…
This paper addresses the problem of cross-modal musical piece identification and retrieval: finding the appropriate recording(s) from a database given a sheet music query, and vice versa, working directly with audio and scanned sheet music…
Partially spoofed audio detection is a challenging task, lying in the need to accurately locate the authenticity of audio at the frame level. To address this issue, we propose a fine-grained partially spoofed audio detection method, namely…
Identifying acoustic events from a continuously streaming audio source is of interest for many applications including environmental monitoring for basic research. In this scenario neither different event classes are known nor what…
In this work, we address the problem of audio-based near-duplicate video retrieval. We propose the Audio Similarity Learning (AuSiL) approach that effectively captures temporal patterns of audio similarity between video pairs. For the…
The $k$-segmentation of a video stream is used to partition it into $k$ piecewise-linear segments, so that each linear segment has a meaningful interpretation. Such segmentation may be used to summarize large videos using a small set of…
Precise video retrieval requires multi-modal correlations to handle unseen vocabulary and scenes, becoming more complex for lengthy videos where models must perform effectively without prior training on a specific dataset. We introduce a…