Related papers: Directional Embedding Based Semi-supervised Framew…

Deductive Refinement of Species Labelling in Weakly Labelled Birdsong Recordings

Many approaches have been used in bird species classification from their sound in order to provide labels for the whole of a recording. However, a more precise classification of each bird vocalization would be of great importance to the use…

Sound · Computer Science 2016-03-24 Veronica Morfi, Dan Stowell

Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning

This paper addresses the extraction of the bird vocalization embedding from the whole song level using disentangled representation learning (DRL). Bird vocalization embeddings are necessary for large-scale bioacoustic tasks, and…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-31 Runwu Shi, Katsutoshi Itoyama, Kazuhiro Nakadai

Conv-codes: Audio Hashing For Bird Species Classification

In this work, we propose a supervised, convex representation based audio hashing framework for bird species classification. The proposed framework utilizes archetypal analysis, a matrix factorization technique, to obtain convex-sparse…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-08 Anshul Thakur, Pulkit Sharma, Vinayak Abrol, Padmanabhan Rajan

Parsing Birdsong with Deep Audio Embeddings

Monitoring of bird populations has played a vital role in conservation efforts and in understanding biodiversity loss. The automation of this process has been facilitated by both sensing technologies, such as passive acoustic monitoring,…

Machine Learning · Computer Science 2021-08-23 Irina Tolkova, Brian Chu, Marcel Hedman, Stefan Kahl, Holger Klinck

Bird Species Classification And Acoustic Features Selection Based on Distributed Neural Network with Two Stage Windowing of Short-Term Features

Identification of bird species from audio records is one of the challenging tasks due to the existence of multiple species in the same recording, noise in the background, and long-term recording. Besides, choosing a proper acoustic feature…

Sound · Computer Science 2022-01-04 Nahian Ibn Hasan

Deep Networks tag the location of bird vocalisations on audio spectrograms

This work focuses on reliable detection and segmentation of bird vocalizations as recorded in the open field. Acoustic detection of avian sounds can be used for the automatized monitoring of multiple bird taxa and querying in long-term…

Audio and Speech Processing · Electrical Eng. & Systems 2017-11-20 Lefteris Fanioudakis, Ilyas Potamitis

Deep learning for detection of bird vocalisations

This work focuses on reliable detection of bird sound emissions as recorded in the open field. Acoustic detection of avian sounds can be used for the automatized monitoring of multiple bird taxa and querying in long-term recordings for…

Sound · Computer Science 2016-09-28 Ilyas Potamitis

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

Visual bird's eye view (BEV) semantic segmentation helps autonomous vehicles understand the surrounding environment only from images, including static elements (e.g., roads) and dynamic elements (e.g., vehicles, pedestrians). However, the…

Computer Vision and Pattern Recognition · Computer Science 2024-02-27 Junyu Zhu, Lina Liu, Yu Tang, Feng Wen, Wanlong Li, Yong Liu

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

Overlapping speech diarization has been traditionally treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding multiple binary labels into a single label with…

Sound · Computer Science 2022-04-01 Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

An empirical investigation into audio pipeline approaches for classifying bird species

This paper is an investigation into aspects of an audio classification pipeline that will be appropriate for the monitoring of bird species on edge devices. These aspects include transfer learning, data augmentation and model optimization.…

Sound · Computer Science 2021-08-11 David Behr, Ciira wa Maina, Vukosi Marivate

Semi-supervised classification of bird vocalizations

Changes in bird populations can indicate broader changes in ecosystems, making birds one of the most important animal groups to monitor. Combining machine learning and passive acoustics enables continuous monitoring over extended periods…

Sound · Computer Science 2025-02-20 Simen Hexeberg, Mandar Chitre, Matthias Hoffmann-Kuhnt, Bing Wen Low

DNN Speaker Tracking with Embeddings

In multi-speaker applications it is common to have pre-computed models from enrolled speakers. Using these models to identify the instances in which these speakers intervene in a recording is the task of speaker tracking. In this paper, we…

Sound · Computer Science 2020-07-21 Carlos Rodrigo Castillo-Sanchez, Leibny Paola Garcia-Perera, Anabel Martin-Gonzalez

A Multi-modal Deep Neural Network approach to Bird-song identification

We present a multi-modal Deep Neural Network (DNN) approach for bird song identification. The presented approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) using four…

Sound · Computer Science 2018-11-13 Botond Fazeka, Alexander Schindler, Thomas Lidy, Andreas Rauber

BirdSoundsDenoising: Deep Visual Audio Denoising for Bird Sounds

Audio denoising has been explored for decades using both traditional and deep learning-based methods. However, these methods are still limited to either manually added artificial noise or lower denoised audio quality. To overcome these…

Sound · Computer Science 2022-10-20 Youshan Zhang, Jialu Li

Deep clustering: Discriminative embeddings for segmentation and separation

We address the problem of acoustic source separation in a deep learning framework we call "deep clustering." Rather than directly estimating signals or masking functions, we train a deep network to produce spectrogram embeddings that are…

Neural and Evolutionary Computing · Computer Science 2015-08-19 John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe

ASGIR: Audio Spectrogram Transformer Guided Classification And Information Retrieval For Birds

Recognition and interpretation of bird vocalizations are pivotal in ornithological research and ecological conservation efforts due to their significance in understanding avian behaviour, performing habitat assessment and judging ecological…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-30 Yashwardhan Chaudhuri, Paridhi Mundra, Arnesh Batra, Orchid Chetia Phukan, Arun Balaji Buduru

Estimating the Repertoire Size in Birds using Unsupervised Clustering techniques

Birds produce multiple types of vocalizations that, together, constitute a vocal repertoire. For some species, the repertoire size is of importance because it informs us about their brain capacity, territory size or social behaviour.…

Quantitative Methods · Quantitative Biology 2023-03-21 Joachim Poutaraud

Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions

In this paper, we tackle the singing voice phoneme segmentation problem in the singing training scenario by using language-independent information -- onset and prior coarse duration. We propose a two-step method. In the first step, we…

Sound · Computer Science 2018-06-06 Rong Gong, Xavier Serra

Unsupervised classification to improve the quality of a bird song recording dataset

Open audio databases such as Xeno-Canto are widely used to build datasets to explore bird song repertoire or to train models for automatic bird sound classification by deep learning algorithms. However, such databases suffer from the fact…

Machine Learning · Computer Science 2023-02-16 Félix Michaud, Jérôme Sueur, Maxime Le Cesne, Sylvain Haupert

Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers

We propose a new method for speaker diarization that can handle overlapping speech with 2+ people. Our method is based on compositional embeddings [1]: Like standard speaker embedding methods such as x-vector [2], compositional embedding…

Sound · Computer Science 2021-02-11 Zeqian Li, Jacob Whitehill