Probabilistic Transformers
Machine Learning
2021-04-14 (v3)
Abstract
We show that Transformers are Maximum Posterior Probability estimators for Mixtures of Gaussian Models. This brings a probabilistic point of view to Transformers and suggests extensions to other probabilistic cases.
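The link between attention and Gaussian mixtures can be illustrated numerically. The sketch below is not taken from the paper; it checks only one standard identity, under assumptions chosen for this illustration: equal mixture priors, a shared isotropic covariance `sigma2 * I`, and keys normalized to unit length. Under those assumptions the posterior probability of each mixture component given a query equals the softmax attention weight over the keys. All function and variable names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gmm_posterior(query, keys, sigma2):
    """Posterior over mixture components for a GMM whose means are the keys.

    Assumes equal priors and a shared isotropic covariance sigma2 * I, so the
    log-posterior of component j is -||query - key_j||^2 / (2 * sigma2) + const.
    """
    logits = [
        -sum((q - k) ** 2 for q, k in zip(query, key)) / (2.0 * sigma2)
        for key in keys
    ]
    return softmax(logits)


def attention_weights(query, keys, sigma2):
    """Softmax attention weights with dot-product scores scaled by 1 / sigma2."""
    logits = [sum(q * k for q, k in zip(query, key)) / sigma2 for key in keys]
    return softmax(logits)


def unit(vector):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


random.seed(0)
dim, n_keys = 4, 5

# Assumption: all keys share the same norm, so the ||key||^2 term is a
# constant that cancels inside the softmax.
keys = [unit([random.gauss(0, 1) for _ in range(dim)]) for _ in range(n_keys)]
query = [random.gauss(0, 1) for _ in range(dim)]

# Choosing sigma2 = sqrt(dim) reproduces the usual 1 / sqrt(dim) scaling.
sigma2 = math.sqrt(dim)

posterior = gmm_posterior(query, keys, sigma2)
attention = attention_weights(query, keys, sigma2)
max_gap = max(abs(p - a) for p, a in zip(posterior, attention))
print("largest difference between posterior and attention:", max_gap)
```

Expanding the squared distance gives `-||q||^2 / (2*sigma2) + q.k / sigma2 - ||k||^2 / (2*sigma2)`. The first term is the same for every key, and the last is constant when the keys share a norm, so both cancel in the softmax and only the scaled dot product remains. If the keys have different norms, the two distributions differ by that per-key norm term.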
Cite
@article{arxiv.2010.15583,
title = {Probabilistic Transformers},
author = {Javier R. Movellan and Prasad Gabbur},
journal = {arXiv preprint arXiv:2010.15583},
year = {2021}
}