Related papers: Equations of States in Singular Statistical Estima…
Random matrix theory has proven to be a valuable tool in analyzing the generalization of linear models. However, the generalization properties of even two-layer neural networks trained by gradient descent remain poorly understood. To…
The most effective differentially private machine learning algorithms in practice rely on an additional source of purportedly public data. This paradigm is most interesting when the two sources combine to be more than the sum of their…
Learning under one-sided feedback (i.e., where we only observe the labels for examples we predicted positively on) is a fundamental problem in machine learning -- applications include lending and recommendation systems. Despite this, there…
Literatures in state space models focus on parametric inference and prediction, which fail if the state space model is not fully specified and the maximum likelihood estimation does not work. In this paper, we assume the state transition…
Modern applications routinely collect high-dimensional data, leading to statistical models having more parameters than there are samples available. A common solution is to impose sparsity in parameter estimation, often using penalized…
We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, that is, the hypothesis that minimizes the training error, our algorithm predicts with a weighted…
We establish in-expectation and tail bounds on the generalization error of representation learning type algorithms. The bounds are in terms of the relative entropy between the distribution of the representations extracted from the training…
For a set of binary response variables, conditional mean models characterize the expected value of a response variable given the others and are popularly applied in longitudinal and network data analyses. The quadratic exponential binary…
Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...). In fact, the very nature of missing values usually…
Being able to reliably assess not only the \emph{accuracy} but also the \emph{uncertainty} of models' predictions is an important endeavour in modern machine learning. Even if the model generating the data and labels is known, computing the…
Bernstein's condition is a key assumption that guarantees fast rates in machine learning. For example, the Gibbs algorithm with prior $\pi$ has an excess risk in $O(d_{\pi}/n)$, as opposed to the standard $O(\sqrt{d_{\pi}/n})$, where $n$…
Optimization is widely used in statistics, and often efficiently delivers point estimates on useful spaces involving structural constraints or combinatorial structure. To quantify uncertainty, Gibbs posterior exponentiates the negative loss…
We consider the problem of estimating the error variance in a general linear model when the error distribution is assumed to be spherically symmetric, but not necessary Gaussian. In particular we study the case of a scale mixture of…
Standard Bayesian learning is known to have suboptimal generalization capabilities under misspecification and in the presence of outliers. PAC-Bayes theory demonstrates that the free energy criterion minimized by Bayesian learning is a…
While there have been a lot of recent developments in the context of Bayesian model selection and variable selection for high dimensional linear models, there is not much work in the presence of change point in literature, unlike the…
There are many practical difficulties in the calibration of computer models to experimental data. One such complication is the fact that certain combinations of the calibration inputs can cause the code to output data lacking fundamental…
The vast majority of statistical theory on binary classification characterizes performance in terms of accuracy. However, accuracy is known in many cases to poorly reflect the practical consequences of classification error, most famously in…
Bayesian inference promises a framework for principled uncertainty quantification of neural network predictions. Barriers to adoption include the difficulty of fully characterizing posterior distributions on network parameters and the…
For three decades statistical mechanics has been providing a framework to analyse neural networks. However, the theoretically tractable models, e.g., perceptrons, random features models and kernel machines, or multi-index models and…
High complexity models are notorious in machine learning for overfitting, a phenomenon in which models well represent data but fail to generalize an underlying data generating process. A typical procedure for circumventing overfitting…