Related papers: Nonparametric logistic regression with deep learni…
We study mixture of linear regression (random coefficient) models, which capture population heterogeneity by allowing the regression coefficients to follow an unknown distribution $G^*$. In contrast to common parametric methods that fix the…
This paper investigates the expected excess risk of in-context learning (ICL) for multiclass classification. We formalize each task as a sequence of labeled examples followed by a query input; a pretrained model then estimates the query's…
In this work, we investigate Gaussian Mixture Models (GMM) and the related problem of nonparametric maximum likelihood estimation (NPMLE) from the perspective of statistical mechanics. In particular, we establish…
Estimating the Kullback-Leibler (KL) divergence between random variables is a fundamental problem in statistical analysis. For continuous random variables, traditional information-theoretic estimators scale poorly with dimension and/or…
The Kullback-Leibler divergence, the Kullback-Leibler variation, and the Bernstein "norm" are used to quantify discrepancies among probability distributions in likelihood models such as nonparametric maximum likelihood and nonparametric…
This paper provides a unified perspective for the Kullback-Leibler (KL)-divergence and the integral probability metrics (IPMs) from the perspective of maximum likelihood density-ratio estimation (DRE). Both the KL-divergence and the IPMs…
Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there…
We consider the problem of estimating a mixture of power series distributions with infinite support, to which belong very well-known models such as Poisson, Geometric, Logarithmic or Negative Binomial probability mass functions. We consider…
We study the maximum likelihood estimator of density of $n$ independent observations, under the assumption that it is well approximated by a mixture with a large number of components. The main focus is on statistical properties with respect…
The forward Kullback-Leibler (KL) divergence is a ubiquitous objective for fitting a parameterized distribution to samples due to its tractability and equivalence to maximum likelihood estimation (MLE). Its inherent asymmetry, however, may…
Logistic regression is a classical model for describing the probabilistic dependence of binary responses on multivariate covariates. We consider the predictive performance of the maximum likelihood estimator (MLE) for logistic regression,…
Protesting mildly against the notion of an exactly correct parametric model, the view is adopted that the logistic regression equation is merely an approximation to the underlying, true function. The behaviour of likelihood-based estimators…
We revisit the problem of the existence of the maximum likelihood estimate for multi-class logistic regression. We show that one method of ensuring its existence is by assigning positive probability to every class in the sample dataset. The…
We consider the parameter estimation problem of a probabilistic generative model prescribed using a natural exponential family of distributions. For this problem, the typical maximum likelihood estimator usually overfits under limited…
For a parametric model of distributions, the closest distribution in the model to the true distribution located outside the model is considered. Measuring the closeness between two distributions with the Kullback-Leibler (K-L) divergence,…
We consider the fundamental problem of estimating a discrete distribution on a domain of size $K$ with high probability in Kullback-Leibler divergence. We provide upper and lower bounds on the minimax estimation rate, which show that the…
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method. From a distributional view, MLE in fact minimizes the Kullback-Leibler divergence (KLD) between the distribution of the…
Nonparametric empirical Bayes methods provide a flexible and attractive approach to high-dimensional data analysis. One particularly elegant empirical Bayes methodology, involving the Kiefer-Wolfowitz nonparametric maximum likelihood…
Accelerated algorithms for maximum likelihood image reconstruction are essential for emerging applications such as 3D tomography, dynamic tomographic imaging, and other high dimensional inverse problems. In this paper, we introduce and…
We employ a parameter-free distribution estimation framework where estimators are random distributions and utilize the Kullback-Leibler (KL) divergence as a loss function. Wu and Vos [J. Statist. Plann. Inference 142 (2012) 1525-1536] show…