Related papers: Pretraining Techniques for Sequence-to-Sequence Vo…

200 papers

We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC) model based on the Transformer architecture with text-to-speech (TTS) pretraining. Seq2seq VC models are attractive owing to their ability to convert prosody. While…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-17 Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda

This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice conversion challenge (VCC) 2020. We consider a naive approach for voice conversion (VC), which is to first transcribe the input speech with an automatic…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-07 Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda

This paper describes a method based on a sequence-to-sequence learning (Seq2Seq) with attention and context preservation mechanism for voice conversion (VC) tasks. Seq2Seq has been outstanding at numerous tasks involving sequence modeling…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-13 Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo

This paper proposes a novel voice conversion (VC) method based on non-autoregressive sequence-to-sequence (NAR-S2S) models. Inspired by the great success of NAR-S2S models such as FastSpeech in text-to-speech (TTS), we extend the…

Sound · Computer Science 2021-04-15 Tomoki Hayashi, Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda

In voice conversion (VC), an approach showing promising results in the latest voice conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) model to transcribe the source speech into the underlying linguistic…

Sound · Computer Science 2021-07-21 Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda

Sequence-to-sequence models have been widely used in end-to-end speech processing, for example, automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS). This paper focuses on an emergent sequence-to-sequence…

This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework, which enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech. We previously…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-10 Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda

This paper proposes a voice conversion (VC) method using sequence-to-sequence (seq2seq or S2S) learning, which flexibly converts not only the voice characteristics but also the pitch contour and duration of input speech. The proposed…

Sound · Computer Science 2020-10-08 Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo

We present a novel approach to any-to-one (A2O) voice conversion (VC) in a sequence-to-sequence (seq2seq) framework. A2O VC aims to convert any speaker, including those unseen during training, to a fixed target speaker. We utilize…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-26 Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi, Tomoki Toda

Voice conversion (VC) is the task of transforming a person's voice into a different style while preserving the linguistic content. The previous state of the art in VC is based on sequence-to-sequence (seq2seq) models, which could mislead linguistic…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-28 Tae-Ho Kim, Sungjae Cho, Shinkook Choi, Sejik Park, Soo-Young Lee

Voice conversion (VC) using sequence-to-sequence learning of context posterior probabilities is proposed. Conventional VC using shared context posterior probabilities predicts target speech parameters from the context posterior…

Sound · Computer Science 2017-08-08 Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

We present a voice conversion solution using recurrent sequence to sequence modeling for DNNs. Our solution takes advantage of recent advances in attention based modeling in the fields of Neural Machine Translation (NMT), Text-to-Speech…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-19 Praveen Narayanan, Punarjay Chakravarty, Francois Charette, Gint Puskorius

End-to-end speech recognition is a promising technology for enabling compact automatic speech recognition (ASR) systems since it can unify the acoustic and language model into a single neural network. However, as a drawback, training of…

Computation and Language · Computer Science 2022-02-17 Yotaro Kubo, Shigeki Karita, Michiel Bacchiani

In this paper, a neural network named Sequence-to-sequence ConvErsion NeTwork (SCENT) is presented for acoustic modeling in voice conversion. At training stage, a SCENT model is estimated by aligning the feature sequences of source and…

Sound · Computer Science 2020-01-14 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Juan Liu, Yuan Jiang, Li-Rong Dai

We present a method for transferring pre-trained self-supervised (SSL) speech representations to multiple languages. There is an abundance of unannotated speech, so creating self-supervised representations from raw audio and fine-tuning on…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-08 Samuel Kessler, Bethan Thomas, Salah Karout

Recently, sequence-to-sequence models with attention have been successfully applied in Text-to-speech (TTS). These models can generate near-human speech with a large accurately-transcribed speech corpus. However, preparing such a large…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-12 Haitong Zhang, Yue Lin

This work proposes a novel approach based on sequence-to-sequence (seq2seq) models for context-aware conversational systems. Existing seq2seq models have been shown to be good for generating natural responses in a data-driven…

Computation and Language · Computer Science 2018-05-23 Silje Christensen, Simen Johnsrud, Massimiliano Ruocco, Heri Ramampiaro

This paper presents a method of sequence-to-sequence (seq2seq) voice conversion using non-parallel training data. In this method, disentangled linguistic and speaker representations are extracted from acoustic features, and voice conversion…

Audio and Speech Processing · Electrical Eng. & Systems 2020-01-14 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech…

Computation and Language · Computer Science 2022-05-03 Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi

In the domain of air traffic control (ATC) systems, efforts to train a practical automatic speech recognition (ASR) model always face the problem of small training samples since the collection and annotation of speech samples are expert-…

Sound · Computer Science 2021-02-17 Yi Lin, Qin Li, Bo Yang, Zhen Yan, Huachun Tan, Zhengmao Chen