
Related papers: VANPY: Voice Analysis Framework

200 papers

Various voice synthesis applications have been developed independently, even though they all share a common output: voice. In addition, most voice synthesis models still require a large amount of audio data paired…

Sound · Computer Science 2022-11-18 Hyeong-Seok Choi, Jinhyeok Yang, Juheon Lee, Hyeongju Kim

We introduce Vox-Profile, a comprehensive benchmark to characterize rich speaker and speech traits using speech foundation models. Unlike existing works that focus on a single dimension of speaker traits, Vox-Profile provides holistic and…

Speech deepfake detection is a well-established research field with different models, datasets, and training strategies. However, the lack of standardized implementations and evaluation protocols limits reproducibility, benchmarking, and…

The rapid advancement of generative AI has made audio deepfakes increasingly indistinguishable from authentic human vocals, posing significant threats to persons-of-interest (POI) such as public figures. Current detection systems primarily…

Sound · Computer Science 2026-05-19 Jun Xue, Tong Zhang, Zhuolin Yi, Yihuan Huang, Yi Chai, Yiyang Zhang, Yanzhen Ren

SpeechPy is an open-source Python package that contains speech preprocessing techniques, speech features, and important post-processing operations. It provides the most frequently used speech features, including MFCCs and filterbank energies…

Sound · Computer Science 2018-07-25 Amirsina Torfi

Speaker anonymization is the task of modifying a speech recording such that the original speaker cannot be identified anymore. Since the first Voice Privacy Challenge in 2020, along with the release of a framework, the popularity of this…

Sound · Computer Science 2023-12-25 Sarina Meyer, Xiaoxiao Miao, Ngoc Thang Vu

We introduce Shennong, a Python toolbox and command-line utility for speech feature extraction. It implements a wide range of well-established, state-of-the-art algorithms, including spectro-temporal filters such as Mel-Frequency Cepstral…

Computation and Language · Computer Science 2023-02-09 Mathieu Bernard, Maxime Poli, Julien Karadayi, Emmanuel Dupoux

Employing a voice-based emotion recognition function in artificial intelligence (AI) products will improve the user experience. Most research to date has focused only on speech collected under controlled conditions. The…

Audio and Speech Processing · Electrical Eng. & Systems 2018-03-06 Fei Tao, Gang Liu, Qingen Zhao

We present a neural analysis and synthesis (NANSY) framework that can manipulate the voice, pitch, and speed of an arbitrary speech signal. Most previous work has focused on using an information bottleneck to disentangle analysis features…

Sound · Computer Science 2021-10-29 Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Hwan Lee, Hoon Heo, Kyogu Lee

Emotion is essential in spoken communication, yet most existing frameworks in speech emotion modeling rely on predefined categories or low-dimensional continuous attributes, which offer limited expressive capacity. Recent advances in speech…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-07 Tianhua Qi, Wenming Zheng, Björn W. Schuller, Zhaojie Luo, Haizhou Li

QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation), written in Python. Quantification is the task of training quantifiers via supervised learning, where a quantifier is a predictor that…

Machine Learning · Computer Science 2021-06-22 Alejandro Moreo, Andrea Esuli, Fabrizio Sebastiani

Automatic speaker recognition algorithms typically characterize speech audio using short-term spectral features that encode the physiological and anatomical aspects of speech production. Such algorithms do not fully capitalize on…

Sound · Computer Science 2021-02-16 Anurag Chowdhury, Arun Ross, Prabu David

In light of the growing interest in type inference research for Python, both researchers and practitioners require a standardized process to assess the performance of various type inference techniques. This paper introduces TypeEvalPy, a…

Software Engineering · Computer Science 2024-01-03 Ashwin Prasad Shivarpatna Venkatesh, Samkutty Sabu, Jiawei Wang, Amir M. Mir, Li Li, Eric Bodden

Non-verbal Vocalizations (NVs), such as laughter and sighs, are vital for conveying emotion and intention in human speech, yet most existing speech systems neglect them, which severely compromises communicative richness and emotional…

Sound · Computer Science 2026-01-14 Runchuan Ye, Yixuan Zhou, Renjie Yu, Zijian Lin, Kehan Li, Xiang Li, Xin Liu, Guoyang Zeng, Zhiyong Wu

Disentangled representation learning in speech processing has lagged behind other domains, largely due to the lack of datasets with annotated generative factors for robust evaluation. To address this, we propose SynSpeech, a novel…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-14 Yusuf Brima, Ulf Krumnack, Simone Pika, Gunther Heidemann

Speech audio in the wild is often processed by post-production effects, but existing speech datasets rarely provide precise annotations of effects and parameters, limiting systematic study. We introduce VoxEffects, a speech audio effects…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-15 Zhe Zhang, Yigitcan Özer, Junichi Yamagishi

Deep Audio Analyzer is an open-source speech framework that aims to simplify the research and development process of neural speech processing pipelines, allowing users to conceive, compare and share results in a fast and reproducible…

Sound · Computer Science 2023-10-31 Valerio Francesco Puglisi, Oliver Giudice, Sebastiano Battiato

In recent years, data-driven models have enabled significant advances in medicine. Simultaneously, voice has shown potential for analysis in precision medicine as a biomarker for screening illnesses. There has been a growing trend to pursue…

Audio and Speech Processing · Electrical Eng. & Systems 2019-06-20 Larry Zhang, Xiaotong Chen, Abbad Vakil, Ali Byott, Reza Hosseini Ghomi

Vocal pitch is an important high-level feature in music audio processing. However, extracting vocal pitch in polyphonic music is more challenging due to the presence of accompaniment. To eliminate the influence of the accompaniment, most…

Sound · Computer Science 2024-01-09 Haojie Wei, Xueke Cao, Tangpeng Dan, Yueguo Chen

Voice-based technologies have the potential to bridge digital accessibility gaps; however, existing datasets fail to capture the linguistic and regional diversity of Indic languages. We present Project VAANI, a large-scale multimodal…
