Related papers: Parallel and Limited Data Voice Conversion Using S…

200 papers

Large amounts of labeled data are typically required to train deep learning models. For many real-world problems, however, acquiring additional data can be expensive or even impossible. We present semi-supervised deep kernel learning…

Machine Learning · Computer Science 2019-03-05 Neal Jean , Sang Michael Xie , Stefano Ermon

Deep kernel learning combines the non-parametric flexibility of kernel methods with the inductive biases of deep learning architectures. We propose a novel deep kernel learning model and stochastic variational inference procedure which…

Machine Learning · Statistics 2016-11-03 Andrew Gordon Wilson , Zhiting Hu , Ruslan Salakhutdinov , Eric P. Xing

Deep kernel learning (DKL) leverages the connection between Gaussian process (GP) and neural networks (NN) to build an end-to-end, hybrid model. It combines the capability of NN to learn rich representations under massive data and the…

Machine Learning · Statistics 2020-08-20 Haitao Liu , Yew-Soon Ong , Xiaomo Jiang , Xiaofang Wang

So far, many of the deep learning approaches for voice conversion produce good quality speech by using a large amount of training data. This paper presents a Deep Bidirectional Long Short-Term Memory (DBLSTM) based voice conversion…

Audio and Speech Processing · Electrical Eng. & Systems 2018-09-27 Mingyang Zhang , Berrak Sisman , Sai Sirisha Rallabandi , Haizhou Li , Li Zhao

This paper presents a variational Bayesian kernel selection (VBKS) algorithm for sparse Gaussian process regression (SGPR) models. In contrast to existing GP kernel selection algorithms that aim to select only one kernel with the highest…

Machine Learning · Computer Science 2019-12-06 Tong Teng , Jie Chen , Yehong Zhang , Kian Hsiang Low

Labeled speech data from patients with Parkinson's disease (PD) are scarce, and the statistical distributions of training and test data differ significantly in the existing datasets. To solve these problems, dimensional reduction and sample…

Machine Learning · Computer Science 2020-02-11 Xiaoheng Zhang , Yongming Li , Pin Wang , Xiaoheng Tan , Yuchuan Liu

Digital twins require computationally-efficient reduced-order models (ROMs) that can accurately describe complex dynamics of physical assets. However, constructing ROMs from noisy high-dimensional data is challenging. In this work, we…

Machine Learning · Computer Science 2024-11-12 Nicolò Botteghi , Paolo Motta , Andrea Manzoni , Paolo Zunino , Mengwu Guo

We study the problem of cross-lingual voice conversion in non-parallel speech corpora and a one-shot learning setting. Most prior work requires either parallel speech corpora or a sufficient amount of training data from a target speaker. However, we…

Sound · Computer Science 2018-08-17 Seyed Hamidreza Mohammadi , Taehwan Kim

While much research effort has been dedicated to scaling up sparse Gaussian process (GP) models based on inducing variables for big data, little attention is afforded to the other less explored class of low-rank GP approximations that…

Machine Learning · Statistics 2016-11-21 Quang Minh Hoang , Trong Nghia Hoang , Kian Hsiang Low

We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-26 Yin-Jyun Luo , Chin-Chen Hsu , Kat Agres , Dorien Herremans

This work proposes a Stochastic Variational Deep Kernel Learning method for the data-driven discovery of low-dimensional dynamical models from high-dimensional noisy data. The framework is composed of an encoder that compresses…

Machine Learning · Computer Science 2023-06-28 Nicolò Botteghi , Mengwu Guo , Christoph Brune

Nowadays, neural vocoders can generate very high-fidelity speech when a bunch of training data is available. Although a speaker-dependent (SD) vocoder usually outperforms a speaker-independent (SI) vocoder, it is impractical to collect a…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-11 Yi-Chiao Wu , Cheng-Hung Hu , Hung-Shin Lee , Yu-Huai Peng , Wen-Chin Huang , Yu Tsao , Hsin-Min Wang , Tomoki Toda

Voice conversion aims to generate new speech with the source content and a target voice style. In this paper, we focus on one general setting, i.e., non-parallel many-to-many voice conversion, which is close to the real-world scenario. As…

Sound · Computer Science 2022-07-28 Jian Ma , Zhedong Zheng , Hao Fei , Feng Zheng , Tat-seng Chua , Yi Yang

Large-scale distributed training of deep acoustic models plays an important role in today's high-performance automatic speech recognition (ASR). In this paper we investigate a variety of asynchronous decentralized distributed training…

Computation and Language · Computer Science 2021-10-22 Xiaodong Cui , Wei Zhang , Abdullah Kayi , Mingrui Liu , Ulrich Finkler , Brian Kingsbury , George Saon , David Kung

Voice conversion (VC) modifies voice characteristics while preserving linguistic content. This paper presents the Stepback network, a novel model for converting speaker identity using non-parallel data. Unlike traditional VC methods that…

Sound · Computer Science 2025-01-28 Qian Yang , Calbert Graham

In this paper, we focus on improving the performance of the text-dependent speaker verification system in the scenario of limited training data. The deep learning based text-dependent speaker verification system generally needs a large…

Sound · Computer Science 2020-11-24 Xiaoyi Qin , Yaogen Yang , Lin Yang , Xuyang Wang , Junjie Wang , Ming Li

Speaker Verification (SV) systems involve mainly two individual stages: feature extraction and classification. In this paper, we explore these two modules with the aim of improving the performance of a speaker verification system under…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-06 Kerlos Atia Abdalmalak , Ascensión Gallardo-Antolín

We investigate the potential of stochastic neural networks for learning effective waveform-based acoustic models. The waveform-based setting, inherent to fully end-to-end speech recognition systems, is motivated by several comparative…

Machine Learning · Statistics 2021-08-17 Dino Oglic , Zoran Cvetkovic , Peter Sollich

This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework, which enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech. We previously…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-10 Hirokazu Kameoka , Wen-Chin Huang , Kou Tanaka , Takuhiro Kaneko , Nobukatsu Hojo , Tomoki Toda

In this work, we introduce a spatio-temporal kernel for Gaussian process (GP) regression-based sound field estimation. Notably, GPs have the attractive property that the sound field is a linear function of the measurements, allowing the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-08 David Sundström , Shoichi Koyama , Andreas Jakobsson