Related papers: Bird Vocalization Embedding Extraction Using Self-…

Directional Embedding Based Semi-supervised Framework For Bird Vocalization Segmentation

This paper proposes a data-efficient, semi-supervised, two-pass framework for segmenting bird vocalizations. The framework utilizes a binary classification model to categorize frames of an input audio recording into the background or bird…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-27 Anshul Thakur, Padmanabhan Rajan

Estimating the Repertoire Size in Birds using Unsupervised Clustering techniques

Birds produce multiple types of vocalizations that, together, constitute a vocal repertoire. For some species, the repertoire size is of importance because it informs us about their brain capacity, territory size or social behaviour.…

Quantitative Methods · Quantitative Biology 2023-03-21 Joachim Poutaraud

Parsing Birdsong with Deep Audio Embeddings

Monitoring of bird populations has played a vital role in conservation efforts and in understanding biodiversity loss. The automation of this process has been facilitated by both sensing technologies, such as passive acoustic monitoring,…

Machine Learning · Computer Science 2021-08-23 Irina Tolkova, Brian Chu, Marcel Hedman, Stefan Kahl, Holger Klinck

Deductive Refinement of Species Labelling in Weakly Labelled Birdsong Recordings

Many approaches have been used in bird species classification from their sound in order to provide labels for the whole of a recording. However, a more precise classification of each bird vocalization would be of great importance to the use…

Sound · Computer Science 2016-03-24 Veronica Morfi, Dan Stowell

Emotion-Disentangled Embedding Alignment for Noise-Robust and Cross-Corpus Speech Emotion Recognition

Effectiveness of speech emotion recognition in real-world scenarios is often hindered by noisy environments and variability across datasets. This paper introduces a two-step approach to enhance the robustness and generalization of speech…

Sound · Computer Science 2025-10-13 Upasana Tiwari, Rupayan Chakraborty, Sunil Kumar Kopparapu

Global birdsong embeddings enable superior transfer learning for bioacoustic classification

Automated bioacoustic analysis aids understanding and protection of both marine and terrestrial animals and their habitats across extensive spatiotemporal scales, and typically involves analyzing vast collections of acoustic data. With the…

Audio and Speech Processing · Electrical Eng. & Systems 2023-12-22 Burooj Ghani, Tom Denton, Stefan Kahl, Holger Klinck

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

This paper focuses on leveraging deep representation learning (DRL) for speech enhancement (SE). In general, the performance of the deep neural network (DNN) is heavily dependent on the learning of data representation. However, the DRL's…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-28 Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Deep learning for detection of bird vocalisations

This work focuses on reliable detection of bird sound emissions as recorded in the open field. Acoustic detection of avian sounds can be used for the automatized monitoring of multiple bird taxa and querying in long-term recordings for…

Sound · Computer Science 2016-09-28 Ilyas Potamitis

A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder

Recently, variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods only apply VAE to the model speech…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-25 Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

A deep representation learning speech enhancement method using $\beta$-VAE

In previous work, we proposed a variational autoencoder-based (VAE) Bayesian permutation training speech enhancement (SE) method (PVAE) which indicated that the SE performance of the traditional deep neural network-based (DNN) method could…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-12 Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Learning Disentangled Representations with Semi-Supervised Deep Generative Models

Variational autoencoders (VAEs) learn representations of data by jointly training a probabilistic encoder and decoder network. Typically these models encode all features of the data into a single variable. Here we are interested in learning…

Machine Learning · Statistics 2017-11-15 N. Siddharth, Brooks Paige, Jan-Willem van de Meent, Alban Desmaison, Noah D. Goodman, Pushmeet Kohli, Frank Wood, Philip H. S. Torr

Emotional Styles Hide in Deep Speaker Embeddings: Disentangle Deep Speaker Embeddings for Speaker Clustering

Speaker clustering is the task of identifying the unique speakers in a set of audio recordings (each belonging to exactly one speaker) without knowing who and how many speakers are present in the entire data, which is essential for speaker…

Sound · Computer Science 2025-09-30 Chaohao Lin, Xu Zheng, Kaida Wu, Peihao Xiang, Ou Bai

Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning

Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to…

Sound · Computer Science 2014-07-14 Dan Stowell, Mark D. Plumbley

Retrieval-based Disentangled Representation Learning with Natural Language Supervision

Disentangled representation learning remains challenging as the underlying factors of variation in the data do not naturally exist. The inherent complexity of real-world data makes it unfeasible to exhaustively enumerate and encapsulate all…

Computation and Language · Computer Science 2024-02-13 Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Lei Chen

Deep Conditional Representation Learning for Drum Sample Retrieval by Vocalisation

Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities. For this reason, there is an increasing interest in…

Sound · Computer Science 2022-04-12 Alejandro Delgado, Charalampos Saitis, Emmanouil Benetos, Mark Sandler

Deep Networks tag the location of bird vocalisations on audio spectrograms

This work focuses on reliable detection and segmentation of bird vocalizations as recorded in the open field. Acoustic detection of avian sounds can be used for the automatized monitoring of multiple bird taxa and querying in long-term…

Audio and Speech Processing · Electrical Eng. & Systems 2017-11-20 Lefteris Fanioudakis, Ilyas Potamitis

Identifying birdsong syllables without labelled data

Identifying sequences of syllables within birdsongs is key to tackling a wide array of challenges, including bird individual identification and better understanding of animal communication and sensory-motor learning. Recently, machine…

Sound · Computer Science 2025-09-27 Mélisande Teng, Julien Boussard, David Rolnick, Hugo Larochelle

Self-Supervised Learning for Few-Shot Bird Sound Classification

Self-supervised learning (SSL) in audio holds significant potential across various domains, particularly in situations where abundant, unlabeled data is readily available at no cost. This is pertinent in bioacoustics, where biologists…

Sound · Computer Science 2024-02-12 Ilyass Moummad, Romain Serizel, Nicolas Farrugia

A Deep Representation Learning-based Speech Enhancement Method Using Complex Convolution Recurrent Variational Autoencoder

Generally, the performance of deep neural networks (DNNs) heavily depends on the quality of data representation learning. Our preliminary work has emphasized the significance of deep representation learning (DRL) in the context of speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-12-18 Yang Xiang, Jingguang Tian, Xinhui Hu, Xinkang Xu, ZhaoHui Yin

Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics

Animal sounds can be recognised automatically by machine learning, and this has an important role to play in biodiversity monitoring. Yet despite increasingly impressive capabilities, bioacoustic species classifiers still exhibit imbalanced…

Sound · Computer Science 2024-09-25 Burooj Ghani, Vincent J. Kalkman, Bob Planqué, Willem-Pier Vellinga, Lisa Gill, Dan Stowell