Regularizing Transformers With Deep Probabilistic Layers

Computation and Language 2021-08-25 v1 Machine Learning

Abstract

Language models (LMs) have grown non-stop over the last decade, from sequence-to-sequence architectures to the state-of-the-art, fully attention-based Transformers. In this work, we demonstrate how including deep generative models within BERT can yield more versatile models, able to impute missing or noisy words with richer text or even improve the BLEU score. More precisely, we use a Gaussian Mixture Variational Autoencoder (GMVAE) as a regularizer layer and show its effectiveness not only in Transformers but also in the most relevant encoder-decoder-based LMs, namely seq2seq with and without attention.

Cite

@article{arxiv.2108.10764,
  title  = {Regularizing Transformers With Deep Probabilistic Layers},
  author = {Aurora Cobo Aguilera and Pablo Martínez Olmos and Antonio Artés-Rodríguez and Fernando Pérez-Cruz},
  journal= {arXiv preprint arXiv:2108.10764},
  year   = {2021}
}