Related papers: Probabilistic Transformers

200 papers

Most expressivity results for transformers treat them as language recognizers -- devices that accept or reject strings -- rather than as they are used in practice: as language models that generate strings autoregressively and…

Computation and Language · Computer Science 2026-05-26 Andy Yang, Anej Svete, Jiaoda Li, Anthony Widjaja Lin, Jonathan Rawski, Ryan Cotterell, David Chiang

Mixture models arise in many regression problems, but most methods have seen limited adoption partly due to these algorithms' highly-tailored and model-specific nature. On the other hand, transformers are flexible, neural sequence models…

Machine Learning · Computer Science 2023-11-15 Reese Pathak, Rajat Sen, Weihao Kong, Abhimanyu Das

Time series forecasting is crucial for many fields, such as disaster warning, weather prediction, and energy consumption. The Transformer-based models are considered to have revolutionized the field of sequence modeling. However, the…

Machine Learning · Computer Science 2022-11-01 Junlong Tong, Liping Xie, Wankou Yang, Kanjian Zhang

We propose and study properties of maximum likelihood estimators in the class of conditional transformation models. Based on a suitable explicit parameterisation of the unconditional or conditional transformation function, we establish a…

Methodology · Statistics 2019-10-22 Torsten Hothorn, Lisa Möst, Peter Bühlmann

Maximum a posteriori and Bayes estimators are two common methods of point estimation in Bayesian Statistics. It is commonly accepted that maximum a posteriori estimators are a limiting case of Bayes estimators with 0-1 loss. In this paper,…

Statistics Theory · Mathematics 2018-02-23 Robert Bassett, Julio Deride

Gaussian mixture models are central to classical statistics, widely used in the information sciences, and have a rich mathematical structure. We examine their maximum likelihood estimates through the lens of algebraic statistics. The MLE is…

Statistics Theory · Mathematics 2019-04-19 Carlos Améndola, Mathias Drton, Bernd Sturmfels

Transformers are deep architectures that define "in-context maps" which enable predicting new tokens based on a given set of tokens (such as a prompt in NLP applications or a set of patches for a vision transformer). In previous work, we…

Computation and Language · Computer Science 2025-10-01 Takashi Furuya, Maarten V. de Hoop, Matti Lassas

Transformers play a central role in the inner workings of large language models. We develop a mathematical framework for analyzing Transformers based on their interpretation as interacting particle systems, which reveals that clusters…

Machine Learning · Computer Science 2025-08-22 Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet

Expanding a lower-dimensional problem to a higher-dimensional space and then projecting back is often beneficial. This article rigorously investigates this perspective in the context of finite mixture models, namely how to improve inference…

Methodology · Statistics 2014-11-10 Andrea Mercatanti, Fan Li, Fabrizia Mealli

Pattern recognition based on a high-dimensional predictor is considered. A classifier is defined which is based on a Transformer encoder. The rate of convergence of the misclassification probability of the classifier towards the optimal…

Statistics Theory · Mathematics 2021-11-30 Iryna Gurevych, Michael Kohler, Gözde Gül Sahin

The main challenge in Bayesian models is to determine the posterior for the model parameters. Already, in models with only one or few parameters, the analytical posterior can only be determined in special settings. In Bayesian neural…

Machine Learning · Statistics 2021-06-02 Sefan Hörtling, Daniel Dold, Oliver Dürr, Beate Sick

The main approach to inference for multivariate extremes consists in approximating the joint upper tail of the observations by a parametric family arising in the limit for extreme events. The latter may be expressed in terms of…

Methodology · Statistics 2015-06-17 Raphaël Huser, Anthony C. Davison, Marc G. Genton

Seemingly unrelated linear regression models are introduced in which the distribution of the errors is a finite mixture of Gaussian components. Identifiability conditions are provided. The score vector and the Hessian matrix are derived.…

Methodology · Statistics 2014-03-18 Giuliano Galimberti, Elena Scardovi, Gabriele Soffritti

We establish the consistency of a nonparametric maximum likelihood estimator for a class of stochastic inverse problems. We proceed by embedding the framework into the general settings of early results of Pfanzagl related to mixtures.

Statistics Theory · Mathematics 2007-10-08 Djalil Chafai, Jean-Michel Loubes

The Bayesian approach to machine learning amounts to computing posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables.…

Logic in Computer Science · Computer Science 2015-07-01 Johannes Borgström, Andrew D Gordon, Michael Greenberg, James Margetson, Jurgen Van Gael

Parameter estimation is one of the most important tasks in statistics, and is key to helping people understand the distribution behind a sample of observations. Traditionally parameter estimation is done either by closed-form solutions…

Machine Learning · Computer Science 2024-03-04 Xiaoxin Yin, David S. Yin

Transformers have dominated empirical machine learning models of natural language processing. In this paper, we introduce basic concepts of Transformers and present key techniques that form the recent advances of these models. This includes…

Computation and Language · Computer Science 2023-11-30 Tong Xiao, Jingbo Zhu

The identification of nonlinear dynamics from observations is essential for the alignment of the theoretical ideas and experimental data. The last, in turn, is often corrupted by the side effects and noise of different natures, so…

Machine Learning · Computer Science 2020-06-08 Anna Shalova, Ivan Oseledets

The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points. The transformer has driven recent advances in natural language processing, computer vision, and…

Machine Learning · Computer Science 2026-01-21 Richard E. Turner

In order to rigorously define maximum-a-posteriori estimators for nonparametric Bayesian inverse problems for general Banach space valued parameters, we derive and prove certain previously postulated but unproven bounds on small ball…

Probability · Mathematics 2022-07-07 Philipp Wacker