Statistics Theory
This paper focuses on estimating the invariant density function $f_X$ of the strongly mixing stationary process $X_t$ in the multiplicative measurement errors model $Y_t = X_t U_t$, where $U_t$ is also a strongly mixing stationary process.…
We employ random matrix theory to establish consistency of generalized cross validation (GCV) for estimating prediction risks of sketched ridge regression ensembles, enabling efficient and consistent tuning of regularization and sketching…
A seminal result in the ICA literature states that for $AY = \varepsilon$, if the components of $\varepsilon$ are independent and at most one is Gaussian, then $A$ is identified up to sign and permutation of its rows (Comon, 1994). In this…
This work addresses the problem of high-dimensional classification by exploring the generalized Bayesian logistic regression method under a sparsity-inducing prior distribution. The method involves utilizing a fractional power of the…
This paper proposes a regression tree procedure to estimate conditional copulas. The associated algorithm determines classes of observations based on covariate values and fits a simple parametric copula model on each class. The association…
Graphical models find numerous applications in biology, chemistry, sociology, neuroscience, etc. While substantial progress has been made in graph estimation, it remains largely unexplored how to select significant graph signals with…
In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions. In particular, we design and analyze two computationally efficient algorithms to partition data…
In this paper we develop a novel nonparametric framework to test the independence of two random variables $\mathbf{X}$ and $\mathbf{Y}$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dx dy)$, based on {\it…
Sobol' sensitivity index estimators for stochastic models are functions of nested Monte Carlo estimators, which are estimators built from two nested Monte Carlo loops. The outer loop explores the input space and, for each of the…
Non-Euclidean data are becoming more prevalent in practice, necessitating the development of a framework for statistical inference analogous to that for Euclidean data. The quantile is one of the most important concepts in traditional statistical…
Categorical responses arise naturally within various scientific disciplines. In many circumstances, there is no predetermined order for the response categories, and the response has to be modeled as nominal. In this study, we regard the…
Identifiability of discrete statistical models with latent variables is known to be challenging to study, yet crucial to a model's interpretability and reliability. This work presents a general algebraic technique to investigate…
For statistical inference on an infinite-dimensional Hilbert space $\H$ with no moment conditions, we introduce a new class of energy distances on the space of probability measures on $\H$. The proposed distances consist of the integrated…
This paper deals with a projection least squares estimator of the drift function of a jump diffusion process $X$ computed from multiple independent copies of $X$ observed on $[0,T]$. Risk bounds are established on this estimator and on an…
We prove that kernel density estimation on symmetric spaces of non-compact type, whose $L^2$-risk was bounded above in previous work (Asta, 2021), in fact achieves a minimax rate of convergence. With this result, the story for kernel density…
A variety of interesting parameters may depend on high dimensional regressions. Machine learning can be used to estimate such parameters. However, estimators based on machine learners can be severely biased by regularization and/or model…
Understanding the distributions of spectral estimators in low-rank random matrix models, also known as signal-plus-noise matrix models, is fundamentally important in various statistical learning problems, including network analysis, matrix…
In logistic regression modeling, Firth's modified estimator is widely used to address the issue of data separation, which results in the nonexistence of the maximum likelihood estimate. Firth's modified estimator can be formulated as a…
Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under weak assumptions and can…
We study when low coordinate degree functions (LCDF) -- linear combinations of functions depending on small subsets of entries of a vector -- can hypothesis test between high-dimensional probability measures. These functions are a…