Related papers: MAPGN: MAsked Pointer-Generator Network for sequen…

MaskGAN: Better Text Generation via Filling in the______

Neural text generation models are often autoregressive language models or seq2seq models. These models generate text by sampling words sequentially, with each word conditioned on the previous word, and are state-of-the-art for several…

Machine Learning · Statistics 2018-03-02 William Fedus, Ian Goodfellow, Andrew M. Dai

Mask-Align: Self-Supervised Neural Word Alignment

Word alignment, which aims to align translationally equivalent words between source and target sentences, plays an important role in many natural language processing tasks. Current unsupervised neural alignment methods focus on inducing…

Computation and Language · Computer Science 2021-05-18 Chi Chen, Maosong Sun, Yang Liu

Rethinking Visual Prompt Learning as Masked Visual Token Modeling

Prompt learning has achieved great success in efficiently exploiting large-scale pre-trained models in natural language processing (NLP). It reformulates the downstream tasks as the generative pre-training ones to achieve consistency, thus…

Computer Vision and Pattern Recognition · Computer Science 2023-12-18 Ning Liao, Bowen Shi, Xiaopeng Zhang, Min Cao, Junchi Yan, Qi Tian

Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting

In this paper, we generalize text infilling (e.g., masked language models) by proposing Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective. SSR provides more fine-grained learning…

Computation and Language · Computer Science 2021-09-27 Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, Furu Wei

Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision

This paper presents methods of making use of text supervision to improve the performance of sequence-to-sequence (seq2seq) voice conversion. Compared with conventional frame-to-frame voice conversion approaches, the seq2seq acoustic…

Sound · Computer Science 2020-01-14 Jing-Xuan Zhang, Zhen-Hua Ling, Yuan Jiang, Li-Juan Liu, Chen Liang, Li-Rong Dai

Get To The Point: Summarization with Pointer-Generator Networks

Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two…

Computation and Language · Computer Science 2017-04-26 Abigail See, Peter J. Liu, Christopher D. Manning

Unsupervised Pretraining for Sequence to Sequence Learning

This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models. In our method, the weights of the encoder and decoder of a seq2seq model are initialized with the pretrained weights…

Computation and Language · Computer Science 2018-02-23 Prajit Ramachandran, Peter J. Liu, Quoc V. Le

May the Force Be with Your Copy Mechanism: Enhanced Supervised-Copy Method for Natural Language Generation

Recent neural sequence-to-sequence models with a copy mechanism have achieved remarkable progress in various text generation tasks. These models addressed out-of-vocabulary problems and facilitated the generation of rare words. However, the…

Computation and Language · Computer Science 2021-12-21 Sanghyuk Choi, Jeong-in Hwang, Hyungjong Noh, Yeonsoo Lee

SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks

Speech representations learned from Self-supervised learning (SSL) models can benefit various speech processing tasks. However, utilizing SSL representations usually requires fine-tuning the pre-trained models or designing task-specific…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-12 Kai-Wei Chang, Wei-Cheng Tseng, Shang-Wen Li, Hung-yi Lee

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

The recent large-scale text-to-speech (TTS) systems are usually grouped as autoregressive and non-autoregressive systems. The autoregressive systems implicitly model duration but exhibit certain deficiencies in robustness and lack of…

Sound · Computer Science 2024-10-22 Yuancheng Wang, Haoyue Zhan, Liwei Liu, Ruihong Zeng, Haotian Guo, Jiachen Zheng, Qiang Zhang, Xueyao Zhang, Shunsi Zhang, Zhizheng Wu

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning. The current approach to training them consists of maximizing the…

Machine Learning · Computer Science 2015-09-24 Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer

Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following

Agents that can follow language instructions are expected to be useful in a variety of situations such as navigation. However, training neural network-based agents requires numerous paired trajectories and languages. This paper proposes…

Machine Learning · Computer Science 2023-01-03 Kei Akuzawa, Yusuke Iwasawa, Yutaka Matsuo

Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages

Neural text-to-speech (TTS) models can synthesize natural human speech when trained on large amounts of transcribed speech. However, collecting such large-scale transcribed data is expensive. This paper proposes an unsupervised pre-training…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-29 Seongyeon Park, Myungseo Song, Bohyung Kim, Tae-Hyun Oh

token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text

Self-supervised pre-training has been successful in both text and speech processing. Speech and text offer different but complementary information. The question is whether we are able to perform a speech-text joint pre-training on unpaired…

Computation and Language · Computer Science 2022-11-01 Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data

Unsupervised clustering on speakers is becoming increasingly important for its potential uses in semi-supervised learning. In reality, we are often presented with enormous amounts of unlabeled data from multi-party meetings and discussions.…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-26 Fuchuan Tong, Siqi Zheng, Min Zhang, Yafeng Chen, Hongbin Suo, Qingyang Hong, Lin Li

GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram

Recent advances in neural network-based text-to-speech have reached human level naturalness in synthetic speech. The present sequence-to-sequence models can directly map text to mel-spectrogram acoustic features, which are convenient for…

Audio and Speech Processing · Electrical Eng. & Systems 2019-06-27 Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku

Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS

Recurrent Neural Networks (RNNs) have become the standard modeling technique for sequence data, and are used in a number of novel text-to-speech models. However, training a TTS model including RNN components has certain requirements for GPU…

Computation and Language · Computer Science 2023-04-18 Ziqi Liang

Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets

Self-supervised learning has emerged as a powerful approach for leveraging large-scale unlabeled data to improve model performance in various domains. In this paper, we explore masked self-supervised pre-training for text recognition…

Computer Vision and Pattern Recognition · Computer Science 2025-03-31 Martin Kišš, Michal Hradiš

Sequential Copying Networks

Copying mechanism shows effectiveness in sequence-to-sequence based neural network models for text generation tasks, such as abstractive sentence summarization and question generation. However, existing works on modeling copying or pointing…

Computation and Language · Computer Science 2018-07-09 Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining

We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC) model based on the Transformer architecture with text-to-speech (TTS) pretraining. Seq2seq VC models are attractive owing to their ability to convert prosody. While…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-17 Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda