Related papers: TraDE: Transformers for Density Estimation

A Deep and Tractable Density Estimator

The Neural Autoregressive Distribution Estimator (NADE) and its real-valued version RNADE are competitive density models of multidimensional data across a variety of domains. These models use a fixed, arbitrary ordering of the data…

Machine Learning · Statistics 2014-01-14 Benigno Uria, Iain Murray, Hugo Larochelle

Designing Robust Transformers using Robust Kernel Density Estimation

Recent advances in Transformer architectures have empowered their empirical success in a variety of tasks across different domains. However, existing works mainly focus on predictive accuracy and computational cost, without considering…

Machine Learning · Computer Science 2023-11-09 Xing Han, Tongzheng Ren, Tan Minh Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho

RNADE: The real-valued neural autoregressive density-estimator

We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared…

Machine Learning · Statistics 2014-01-10 Benigno Uria, Iain Murray, Hugo Larochelle

Conditional Transfer with Dense Residual Attention: Synthesizing traffic signs from street-view imagery

Object detection and classification of traffic signs in street-view imagery is an essential element for asset management, map making and autonomous driving. However, some traffic signs occur rarely and consequently, they are difficult to…

Computer Vision and Pattern Recognition · Computer Science 2018-09-06 Clint Sebastian, Ries Uittenbogaard, Julien Vijverberg, Bas Boom, Peter H. N. de With

Iterative Neural Autoregressive Distribution Estimator (NADE-k)

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data. We propose a new model that extends this inference scheme to multiple steps, arguing that…

Machine Learning · Statistics 2014-12-09 Tapani Raiko, Li Yao, Kyunghyun Cho, Yoshua Bengio

Neural Autoregressive Distribution Estimation

We present Neural Autoregressive Distribution Estimation (NADE) models, which are neural network architectures applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a…

Machine Learning · Computer Science 2016-05-30 Benigno Uria, Marc-Alexandre Côté, Karol Gregor, Iain Murray, Hugo Larochelle

Tensor-Train Density Estimation

Estimation of probability density function from samples is one of the central problems in statistics and machine learning. Modern neural network-based models can learn high dimensional distributions but have problems with hyperparameter…

Machine Learning · Computer Science 2022-02-28 Georgii S. Novikov, Maxim E. Panov, Ivan V. Oseledets

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Auto-regressive sequence generative models trained by Maximum Likelihood Estimation suffer the exposure bias problem in practical finite sample scenarios. The crux is that the number of training samples for Maximum Likelihood Estimation is…

Machine Learning · Statistics 2020-07-14 Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li

Autoregressive Energy Machines

Neural density estimators are flexible families of parametric models which have seen widespread use in unsupervised machine learning in recent years. Maximum-likelihood training typically dictates that these models be constrained to specify…

Machine Learning · Statistics 2019-04-12 Charlie Nash, Conor Durkan

Study of Training Dynamics for Memory-Constrained Fine-Tuning

Memory-efficient training of deep neural networks has become increasingly important as models grow larger while deployment environments impose strict resource constraints. We propose TraDy, a novel transfer learning scheme leveraging two…

Machine Learning · Computer Science 2026-02-23 Aël Quélennec, Nour Hezbri, Pavlo Mozharovskyi, Van-Tam Nguyen, Enzo Tartaglione

Autoregressive Models: What Are They Good For?

Autoregressive (AR) models have become a popular tool for unsupervised learning, achieving state-of-the-art log likelihood estimates. We investigate the use of AR models as density estimators in two settings -- as a learning signal for…

Machine Learning · Computer Science 2019-10-18 Murtaza Dalal, Alexander C. Li, Rohan Taori

OCDE: Odds Conditional Density Estimator

Conditional density estimation (CDE) models can be useful for many statistical applications, especially because the full conditional density is estimated instead of traditional regression point estimates, revealing more information about…

Methodology · Statistics 2021-07-12 Alex Akira Okuno, Felipe Maia Polo

The DEformer: An Order-Agnostic Distribution Estimating Transformer

Order-agnostic autoregressive distribution (density) estimation (OADE), i.e., autoregressive distribution estimation where the features can occur in an arbitrary order, is a challenging problem in generative machine learning. Prior work on…

Machine Learning · Computer Science 2021-07-13 Michael A. Alcorn, Anh Nguyen

GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data

Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific…

Machine Learning · Computer Science 2024-03-13 Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

A Unified Perspective on the Dynamics of Deep Transformers

Transformers, which are state-of-the-art in most machine learning tasks, represent the data as sequences of vectors called tokens. This representation is then exploited by the attention function, which learns dependencies between tokens and…

Machine Learning · Computer Science 2025-01-31 Valérie Castin, Pierre Ablin, José Antonio Carrillo, Gabriel Peyré

Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers

We propose a novel framework, Continuous-Time Attention, which infuses partial differential equations (PDEs) into the Transformer's attention mechanism to address the challenges of extremely long input sequences. Instead of relying solely…

Machine Learning · Computer Science 2025-12-30 Yukun Zhang, Xueqing Zhou

Data-driven deep density estimation

Density estimation plays a crucial role in many data analysis tasks, as it infers a continuous probability density function (PDF) from discrete samples. Thus, it is used in tasks as diverse as analyzing population data, spatial locations in…

Machine Learning · Computer Science 2021-07-26 Patrik Puchert, Pedro Hermosilla, Tobias Ritschel, Timo Ropinski

Cloze-driven Pretraining of Self-attention Networks

We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems. Our model solves a cloze-style word reconstruction task, where…

Computation and Language · Computer Science 2019-03-20 Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli

DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

Pre-training has been investigated to improve the efficiency and performance of training neural operators in data-scarce settings. However, it is largely in its infancy due to the inherent complexity and diversity, such as long…

Machine Learning · Computer Science 2024-05-08 Zhongkai Hao, Chang Su, Songming Liu, Julius Berner, Chengyang Ying, Hang Su, Anima Anandkumar, Jian Song, Jun Zhu

Meta-Learning for Relative Density-Ratio Estimation

The ratio of two probability densities, called a density-ratio, is a vital quantity in machine learning. In particular, a relative density-ratio, which is a bounded extension of the density-ratio, has received much attention due to its…

Machine Learning · Statistics 2021-07-05 Atsutoshi Kumagai, Tomoharu Iwata, Yasuhiro Fujiwara