Statistics Theory
Out-of-distribution generalization is key to building models that remain reliable across diverse environments. Recent causality-based methods address this challenge by learning invariant causal relationships in the underlying…
In multivariate statistics, estimating the covariance matrix is essential for understanding the interdependence among variables. In high-dimensional settings, where the number of covariates increases with the sample size, it is well known…
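The sketch below makes the high-dimensional difficulty concrete. It is an illustration chosen for this note, not the estimator studied in the abstract, which is truncated. It draws $n$ Gaussian vectors with identity covariance and $p/n = 1/2$. Every population eigenvalue equals 1, yet the sample eigenvalues spread over roughly the Marchenko-Pastur interval $[(1-\sqrt{p/n})^2, (1+\sqrt{p/n})^2]$. The function name `sample_covariance_spectrum` is hypothetical.

```python
import numpy as np

# Illustrative sketch (not the paper's method): eigenvalue spread of the sample
# covariance when the dimension p is comparable to the sample size n.
def sample_covariance_spectrum(n: int, p: int, seed: int = 0) -> np.ndarray:
    """Eigenvalues of the sample covariance of n i.i.d. N(0, I_p) vectors."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))      # rows are observations
    S = np.cov(X, rowvar=False)          # unbiased sample covariance, shape (p, p)
    return np.linalg.eigvalsh(S)         # eigenvalues in ascending order

n, p = 400, 200
eig = sample_covariance_spectrum(n, p)
ratio = p / n
print(f"sample eigenvalues in [{eig.min():.2f}, {eig.max():.2f}]")
print(f"Marchenko-Pastur edges: [{(1 - ratio**0.5)**2:.2f}, {(1 + ratio**0.5)**2:.2f}]")
```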
Nonlinear time series models with exogenous regressors are essential in econometrics, queuing theory, and machine learning, though their statistical analysis remains incomplete. Key results, such as the law of large numbers and the…
Two key tasks in high-dimensional regularized regression are tuning the regularization strength for accurate predictions and estimating the out-of-sample risk. It is known that the standard approach, $k$-fold cross-validation, is…
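For reference, here is a sketch of the standard $k$-fold procedure the abstract refers to, applied to ridge regression. It assumes a proportional regime with $p/n = 1/2$, a Gaussian design, and a log-spaced grid of regularization strengths. The helper names `ridge_fit` and `kfold_cv_risk` are hypothetical, and the sketch says nothing about the alternative the paper proposes.

```python
import numpy as np

# Illustrative sketch: plain k-fold cross-validation for ridge (not the paper's proposal).
def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def kfold_cv_risk(X, y, lam, k=5, seed=0):
    """Average held-out squared error of ridge over k folds."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)              # all indices outside the current fold
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
n, p = 200, 100                                      # proportional regime: p/n = 0.5
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_true + rng.standard_normal(n)           # noise variance 1

grid = np.logspace(-2, 3, 20)
cv = [kfold_cv_risk(X, y, lam) for lam in grid]
lam_hat = grid[int(np.argmin(cv))]
print(f"selected lambda = {lam_hat:.3g}, CV risk = {min(cv):.3f}")
```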
Stochastic gradient descent (SGD) has emerged as the quintessential method in a data scientist's toolbox. Using SGD for high-stakes applications requires, however, careful quantification of the associated uncertainty. Towards that end, in…
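As a point of reference, the sketch below runs one pass of SGD on a least-squares loss with Polyak-Ruppert averaging. The averaged iterate is a common object of study in SGD-based inference. The step-size schedule $\eta_t = \eta_0 t^{-\alpha}$ with $\eta_0 = 0.1$ and $\alpha = 0.6$, the linear model, and the function name are all assumptions made for illustration. The abstract is truncated before it describes its own uncertainty-quantification procedure.

```python
import numpy as np

# Illustrative sketch: averaged SGD for least squares (not the paper's inference method).
def averaged_sgd_least_squares(X, y, eta0=0.1, alpha=0.6):
    """One pass of SGD on the loss 0.5 * (x'theta - y)^2 with Polyak-Ruppert averaging.

    Step size eta_t = eta0 * t**(-alpha) with alpha in (1/2, 1).
    Returns the last iterate and the running average of all iterates.
    """
    n, d = X.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for t in range(1, n + 1):
        x_t, y_t = X[t - 1], y[t - 1]
        grad = (x_t @ theta - y_t) * x_t             # stochastic gradient at sample t
        theta = theta - eta0 * t ** (-alpha) * grad
        theta_bar += (theta - theta_bar) / t         # running mean of theta_1, ..., theta_t
    return theta, theta_bar

rng = np.random.default_rng(0)
n, d = 20000, 3
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((n, d))
y = X @ theta_true + rng.standard_normal(n)
theta_last, theta_bar = averaged_sgd_least_squares(X, y)
print("last iterate:", np.round(theta_last, 3))
print("averaged iterate:", np.round(theta_bar, 3))
```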
In probabilistic principal component analysis (PPCA), an observed vector is modeled as a linear transformation of a low-dimensional Gaussian factor plus isotropic noise. We generalize PPCA to tensors by constraining the loading operator to…
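The classical (vector) PPCA model the abstract starts from is $\mathbf{x} = W\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\varepsilon}$ with $\mathbf{z}\sim N(0, I_q)$ and $\boldsymbol{\varepsilon}\sim N(0, \sigma^2 I_d)$. It has a closed-form maximum-likelihood fit due to Tipping and Bishop: $\hat\sigma^2$ is the average of the $d-q$ smallest sample-covariance eigenvalues, and $\hat W = U_q(\Lambda_q - \hat\sigma^2 I)^{1/2}$ up to a rotation. A minimal NumPy sketch on simulated data follows. The tensor-structured loading operator of the paper is not reproduced, and the names are hypothetical.

```python
import numpy as np

# Illustrative sketch: closed-form ML fit of ordinary (vector) PPCA, not the tensor model.
def ppca_mle(X, q):
    """Maximum-likelihood PPCA fit (Tipping & Bishop closed form).

    Model: x = W z + mu + eps, z ~ N(0, I_q), eps ~ N(0, sigma2 * I_d).
    Returns (mu, W, sigma2); W is identified only up to a q x q rotation.
    """
    n, d = X.shape
    mu = X.mean(axis=0)
    S = (X - mu).T @ (X - mu) / n                    # ML (biased) sample covariance
    evals, evecs = np.linalg.eigh(S)
    evals, evecs = evals[::-1], evecs[:, ::-1]       # sort eigenpairs in descending order
    sigma2 = evals[q:].mean()                        # average of the discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))  # scale each column
    return mu, W, sigma2

rng = np.random.default_rng(0)
n, d, q, sigma_true = 5000, 10, 2, 0.5
W_true = rng.standard_normal((d, q))
Z = rng.standard_normal((n, q))
X = Z @ W_true.T + sigma_true * rng.standard_normal((n, d))
mu, W, sigma2 = ppca_mle(X, q)
print(f"sigma^2 estimate {sigma2:.3f} (true {sigma_true**2})")
```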
We present a systematic analysis of estimation errors for a class of optimal transport based algorithms for filtering and data assimilation. Along the way, we extend previous error analyses of Brenier maps to the case of conditional Brenier…
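In one dimension, the Brenier map from $\mu$ to $\nu$ is the monotone rearrangement $T = F_\nu^{-1}\circ F_\mu$, so an empirical estimate only needs ranks and quantiles. The sketch below maps $N(0,1)$ samples onto $N(2,9)$ samples, where the exact map is $T(x) = 2 + 3x$. It is an illustration under that one-dimensional, unconditional assumption, not the conditional Brenier maps or filtering algorithms of the paper. The function name is hypothetical.

```python
import numpy as np

# Illustrative sketch: unconditional 1-D empirical Brenier map (not the paper's algorithm).
def empirical_monotone_map(x_src, y_tgt):
    """Empirical 1-D optimal transport (Brenier) map evaluated at the source samples.

    In one dimension the Brenier map is T = F_target^{-1} o F_source;
    both CDFs are replaced here by their empirical versions.
    """
    n = len(x_src)
    ranks = np.argsort(np.argsort(x_src))    # rank of each source point, 0..n-1
    levels = (ranks + 0.5) / n               # mid-point empirical CDF levels
    return np.quantile(y_tgt, levels)        # empirical target quantiles at those levels

rng = np.random.default_rng(0)
x = rng.standard_normal(20000)                   # source samples: N(0, 1)
y = 2.0 + 3.0 * rng.standard_normal(20000)       # target samples: N(2, 9)
T_hat = empirical_monotone_map(x, y)
err = np.abs(T_hat - (2.0 + 3.0 * x))            # exact map is T(x) = 2 + 3x
print(f"median |T_hat - T| = {np.median(err):.3f}")
```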
If you tell a learning model that you prefer an alternative $a$ over another alternative $b$, then you probably expect the model to be monotone, that is, the valuation of $a$ increases, and that of $b$ decreases. Yet, perhaps surprisingly,…
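The failure of monotonicity can be reproduced with a tiny, hypothetical model. Valuations are linear, $u = \mathbf{w}^\top\phi$, and the model takes one gradient step on the Bradley-Terry loss $-\log\sigma(u_a - u_b)$. Both the model and the feature vectors are assumptions chosen for illustration. Because $\phi(b)$ points in the same direction as $\phi(a)$ with a larger norm, the update widens the margin $u_a - u_b$ while lowering $u_a$ itself.

```python
import numpy as np

# Illustrative sketch with hypothetical features: one preference update can lower u_a.
def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def preference_step(w, phi_a, phi_b, lr=0.5):
    """One gradient step on the Bradley-Terry loss -log sigma(u_a - u_b), u = w . phi."""
    margin = w @ (phi_a - phi_b)
    grad = -(1.0 - logistic(margin)) * (phi_a - phi_b)   # gradient of the loss w.r.t. w
    return w - lr * grad

w = np.array([1.0, 1.0])
phi_a = np.array([1.0, 0.0])   # hypothetical features of the preferred alternative a
phi_b = np.array([2.0, 0.0])   # b shares a's feature direction, with a larger norm

u_a_before, u_b_before = w @ phi_a, w @ phi_b
w_new = preference_step(w, phi_a, phi_b)
u_a_after, u_b_after = w_new @ phi_a, w_new @ phi_b
print(f"u_a: {u_a_before:.3f} -> {u_a_after:.3f}")
print(f"u_b: {u_b_before:.3f} -> {u_b_after:.3f}")
```

In this linear model, a single step changes $u_a$ by a positive multiple of $\|\phi(a)\|^2 - \phi(a)^\top\phi(b)$. It therefore lowers $u_a$ whenever $\phi(a)^\top\phi(b) > \|\phi(a)\|^2$, even though the margin always grows.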
We study non-linear Bayesian inverse problems arising from semilinear partial differential equations (PDEs) that can be transformed into linear Bayesian inverse problems. We are then able to extend the early stopping for Ensemble…
We revisit the problem of parameter estimation for discrete probability distributions with values in $\mathbb{Z}^d$. To this end, we adapt a technique called Stein's Method of Moments to discrete distributions, which often gives closed-form…
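Take the Poisson distribution on $\mathbb{Z}_{\ge 0}$, a one-dimensional special case chosen here as an illustration. The Stein identity $\mathbb{E}[\lambda f(X+1)] = \mathbb{E}[X f(X)]$ holds for suitable test functions $f$. Replacing expectations by sample means and solving for $\lambda$ gives the closed-form estimator $\hat\lambda_f = \overline{X f(X)}\,/\,\overline{f(X+1)}$. The sketch below evaluates it for two hypothetical choices of $f$. The paper's multivariate setting and its choice of test functions are not reproduced.

```python
import numpy as np

# Illustrative sketch: Stein method of moments for a 1-D Poisson rate (hypothetical test functions).
def poisson_stein_mom(x, f):
    """Solve the empirical Poisson Stein identity mean(lambda * f(X + 1)) = mean(X * f(X))."""
    x = np.asarray(x, dtype=float)
    return np.mean(x * f(x)) / np.mean(f(x + 1))

rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=100_000)
lam_const = poisson_stein_mom(x, lambda t: np.ones_like(t))  # f = 1 recovers the sample mean
lam_lin = poisson_stein_mom(x, lambda t: t)                   # f(x) = x: mean(X^2) / mean(X + 1)
print(lam_const, lam_lin)
```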
For statistical models on circles, we investigate the performance of estimators defined as the projections of the empirical distribution with respect to the Wasserstein distance. We develop algorithms for computing the Wasserstein projection…
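A building block for such projections is the Wasserstein distance itself. For $W_1$ on the circle $[0,1)$, a classical identity gives $W_1(\mu,\nu) = \min_{\alpha\in\mathbb{R}} \int_0^1 |F_\mu(t) - F_\nu(t) - \alpha|\,dt$, which is minimized at a median of $F_\mu - F_\nu$. The sketch below applies this identity to histograms on a uniform grid. The order $p=1$, the discretization, and the function name are assumptions, and the projection algorithms of the paper are not shown.

```python
import numpy as np

# Illustrative sketch: W1 on the circle for histograms on n equal bins (not the paper's algorithm).
def circular_w1(p, q):
    """W1 distance between two histograms on n equal bins of the circle [0, 1).

    Uses W1 = min_alpha integral |F(t) - G(t) - alpha| dt, minimised at a median
    of F - G; on a uniform grid the integral becomes an average over bins.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    d = np.cumsum(p - q)          # F - G on each bin
    alpha = np.median(d)          # optimal shift (choice of cut point on the circle)
    return float(np.mean(np.abs(d - alpha)))

n = 100
p = np.zeros(n); p[10] = 1.0      # point mass at 0.1
q = np.zeros(n); q[90] = 1.0      # point mass at 0.9
print(circular_w1(p, q))                            # wraps around the circle: 0.2
print(float(np.mean(np.abs(np.cumsum(p - q)))))     # line W1, no wrap-around: 0.8
```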
The inference of evolutionary histories is a central problem in evolutionary biology. The analysis of a sample of phylogenetic trees can be conducted in Billera-Holmes-Vogtmann tree space, which is a CAT(0) metric space of phylogenetic…
The consistency of posterior distributions in density estimation is at the core of Bayesian statistical theory. Classical work established sufficient conditions, typically combining KL support with complexity bounds on sieves of high prior…
Bayesian linear inverse problems aim to recover an unknown signal from noisy observations, incorporating prior knowledge. This paper analyses a data-dependent method to choose the scale parameter of a Gaussian prior. The method we study…
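One common data-dependent way to choose the prior scale is empirical Bayes, that is, maximizing the marginal likelihood. The abstract is truncated before it names its method, so this choice is an assumption. Work in a diagonal (singular-value) basis with $y_i = \kappa_i x_i + \sigma\varepsilon_i$ and prior $x_i \sim N(0,\tau^2)$. The data are then marginally $y_i \sim N(0, \tau^2\kappa_i^2 + \sigma^2)$, and $\tau$ can be chosen by grid search. The decay $\kappa_i = i^{-1/4}$, the noise level, and the names below are hypothetical.

```python
import numpy as np

# Illustrative sketch: empirical-Bayes choice of the prior scale tau (assumed method).
def marginal_loglik(tau, y, kappa, sigma):
    """Log marginal likelihood of y_i = kappa_i * x_i + sigma * eps_i with x_i ~ N(0, tau^2).

    Marginally y_i ~ N(0, tau^2 kappa_i^2 + sigma^2), independently (diagonal model).
    """
    v = tau**2 * kappa**2 + sigma**2
    return -0.5 * np.sum(np.log(2 * np.pi * v) + y**2 / v)

rng = np.random.default_rng(0)
n, sigma, tau_true = 2000, 0.1, 1.0
kappa = np.arange(1, n + 1) ** -0.25          # hypothetical, slowly decaying singular values
x = tau_true * rng.standard_normal(n)
y = kappa * x + sigma * rng.standard_normal(n)

taus = np.linspace(0.05, 3.0, 600)
ll = [marginal_loglik(t, y, kappa, sigma) for t in taus]
tau_hat = taus[int(np.argmax(ll))]
print(f"tau_hat = {tau_hat:.3f}")
```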
Selecting the best regularization parameter in inverse problems is a classical and yet challenging problem. Recently, data-driven approaches have become popular to tackle this challenge. These approaches are appealing since they do require…
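A minimal version of the data-driven idea picks the $\lambda$ that minimizes the average reconstruction error over a training set. This assumes access to training pairs $(x_j, y_j)$ and uses Tikhonov regularization; both are assumptions, since the abstract is truncated. With Gaussian signals $x\sim N(0,I)$ and noise $N(0,\sigma^2 I)$, the population-optimal value is $\lambda=\sigma^2$, because the Tikhonov solution then equals the posterior mean. This gives a sanity check. The names are hypothetical.

```python
import numpy as np

# Illustrative sketch: supervised choice of the Tikhonov parameter (assumed setting).
def tikhonov(A, y, lam):
    """Tikhonov-regularised solution argmin_x ||A x - y||^2 + lam ||x||^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

def learn_lambda(A, xs, ys, grid):
    """Pick lam minimising the average reconstruction error over training pairs (x_j, y_j)."""
    risks = [np.mean([np.sum((tikhonov(A, y, lam) - x) ** 2) for x, y in zip(xs, ys)])
             for lam in grid]
    return grid[int(np.argmin(risks))], risks

rng = np.random.default_rng(0)
m, d, sigma = 30, 20, 0.1
A = rng.standard_normal((m, d)) / np.sqrt(m)
xs = rng.standard_normal((200, d))                     # training signals, x ~ N(0, I)
ys = xs @ A.T + sigma * rng.standard_normal((200, m))  # noisy observations y = A x + noise
grid = np.logspace(-5, 2, 36)
lam_hat, risks = learn_lambda(A, xs, ys, grid)
print(f"learned lambda = {lam_hat:.3g}; Bayes-optimal sigma^2 = {sigma**2}")
```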
We consider Gaussian and bootstrap approximations for the supremum of additive functionals of aperiodic Harris recurrent Markov chains. The supremum is taken over a function class that may depend on the sample size, which allows for…
We consider the estimation of a parameter $\mathbf{x}$ lying in a cone from nonlinear observations of the form $\{y_i=f_i(\langle\mathbf{a}_i,\mathbf{x}\rangle)\}_{i=1}^m$. We develop a unified approach that first constructs a…
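Consider the special case where each $f_i$ is the one-bit map $\operatorname{sign}$ and the $\mathbf{a}_i$ are standard Gaussian. A simple unconstrained estimator is then the normalized average $\frac{1}{m}\sum_i y_i\mathbf{a}_i$, since $\mathbb{E}[\operatorname{sign}(\langle\mathbf{a},\mathbf{x}\rangle)\mathbf{a}] = \sqrt{2/\pi}\,\mathbf{x}$ for unit $\mathbf{x}$. The sketch below illustrates this case. The cone constraint and the unified approach of the paper are not reproduced, and the names are hypothetical.

```python
import numpy as np

# Illustrative sketch: correlation estimator for 1-bit Gaussian measurements (special case only).
def correlation_estimator(A, y):
    """Direction estimate proportional to (1/m) sum_i y_i a_i, normalised to unit length."""
    v = A.T @ y / A.shape[0]
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
m, n = 20000, 10
x = rng.standard_normal(n)
x /= np.linalg.norm(x)              # only the direction is identifiable under sign(.)
A = rng.standard_normal((m, n))     # Gaussian measurement vectors a_i as rows
y = np.sign(A @ x)                  # 1-bit observations, f_i = sign
x_hat = correlation_estimator(A, y)
print(f"cosine similarity = {x_hat @ x:.4f}")
```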
The problem of the mean-square optimal linear estimation of the functional $A\xi=\int_{R^s}a(t)\xi(-t)\,dt$, which depends on the unknown values of a stationary stochastic process $\xi(t)$, from observations of the process…
The problem of optimal estimation of the functionals $A\xi=\sum_{k=0}^{\infty}a(k)\xi(k)$ and $A_N\xi=\sum_{k=0}^{N}a(k)\xi(k)$, which depend on the unknown values of a stochastic sequence $\xi(k)$ with stationary…
Some improved estimators of the location parameters of several exponential distributions under an order restriction are derived and compared numerically using Monte Carlo simulations. Note that the two-parameter exponential distribution is…
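As a baseline for such comparisons, consider two populations with densities $\sigma_i^{-1}e^{-(x-\mu_i)/\sigma_i}$ for $x\ge\mu_i$, known scales, and the order restriction $\mu_1\le\mu_2$. The unrestricted MLE of $\mu_1$ is the sample minimum $X_{1(1)}$, and the order-restricted MLE is $\min(X_{1(1)}, X_{2(1)})$. The Monte Carlo sketch below compares their mean squared errors. The known-scale assumption, the sample sizes, and the names are hypothetical, and the improved estimators of the paper are not reproduced.

```python
import numpy as np

# Illustrative sketch: MLE vs order-restricted MLE of mu1 (known scales assumed).
def mc_mse_mu1(mu1, mu2, n1, n2, sigma1=1.0, sigma2=1.0, reps=20000, seed=0):
    """Monte Carlo MSE for estimating mu1 when mu1 <= mu2 is known.

    Compares the unrestricted MLE min(X_1) with the order-restricted MLE
    min(min(X_1), min(X_2)) for two-parameter exponential samples.
    """
    rng = np.random.default_rng(seed)
    m1 = mu1 + rng.exponential(sigma1, size=(reps, n1)).min(axis=1)  # sample minima, population 1
    m2 = mu2 + rng.exponential(sigma2, size=(reps, n2)).min(axis=1)  # sample minima, population 2
    mle = m1
    restricted = np.minimum(m1, m2)
    return np.mean((mle - mu1) ** 2), np.mean((restricted - mu1) ** 2)

for gap in [0.0, 0.05, 0.5]:
    mse_mle, mse_res = mc_mse_mu1(mu1=0.0, mu2=gap, n1=10, n2=10)
    print(f"mu2 - mu1 = {gap}: MSE(MLE) = {mse_mle:.4f}, MSE(restricted) = {mse_res:.4f}")
```

Because $\mu_1 \le \min(X_{1(1)}, X_{2(1)}) \le X_{1(1)}$ under the restriction, the restricted estimator is never farther from $\mu_1$ than the unrestricted one, so its simulated MSE can only be smaller or equal.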