Statistical Theory
In this paper we introduce two procedures for variable selection in cluster analysis and classification rules. One is mainly oriented toward detecting noisy, non-informative variables, while the other also deals with multicollinearity. A…
In this short note, we present a refined approximation for the log-ratio of the density of the von Mises$(\mu,\kappa)$ distribution (also called the circular normal distribution) to the standard (linear) normal distribution when the…
Despite numerous years of research into the merits and trade-offs of various model selection criteria, obtaining robust results that elucidate the behavior of cross-validation remains a challenging endeavor. In this paper, we highlight the…
In this paper, we introduce a joint central limit theorem (CLT) for specific bilinear forms, encompassing the resolvent of the sample covariance matrix under an elliptical distribution. Through an exhaustive exploration of our theoretical…
Preferential attachment models of network growth are bivariate heavy tailed models for in- and out-degree with limit measures which either concentrate on a ray of positive slope from the origin or on all of the positive quadrant depending…
We provide an overview of recent progress in statistical inverse problems with random experimental design, covering both linear and nonlinear inverse problems. Different regularization schemes have been studied to produce robust and stable…
Researchers often hold the belief that random forests are "the cure to the world's ills" (Bickel, 2010). But how exactly do they achieve this? Focused on the recently introduced causal forests (Athey and Imbens, 2016; Wager and Athey,…
We consider the concept of Bayes risk in the context of finite-dimensional ill-posed linear inverse problems with Gaussian prior and noise models. In this note, we rederive the following well-known result: in the present Gaussian linear…
We consider finite-dimensional Bayesian linear inverse problems with Gaussian priors and additive Gaussian noise models. The goal of this note is to present a simple derivation of the well-known fact that solving the Bayesian D-optimal…
This paper develops a general asymptotic theory of local polynomial (LP) regression for spatial data observed at irregularly spaced locations in a sampling region $R_n \subset \mathbb{R}^d$. We adopt a stochastic sampling design that can…
Estimation of heterogeneous causal effects - i.e., how effects of policies and treatments vary across subjects - is a fundamental task in causal inference. Many methods for estimating conditional average treatment effects (CATEs) have been…
We study a minimax risk of estimating inverse functions on a plane while requiring that the estimator also be invertible. Learning invertibility from data and exploiting invertible estimators are common in many domains, such as statistics,…
That science and other domains are now largely data-driven means virtually unlimited opportunities for statisticians. With great power comes responsibility, so it's imperative that statisticians ensure that the methods being developed to…
Classical mathematical statistics deals with models that are parametrized by a Euclidean, i.e. finite dimensional, parameter. Quite often such models have been and still are chosen in practical situations for their mathematical simplicity…
This work considers the asymptotic behavior of the distance between two sample covariance matrices (SCM). A general result is provided for a class of functionals that can be expressed as sums of traces of functions that are separately…
This paper studies the asymptotic spectral properties of the sample covariance matrix for high dimensional compositional data, including the limiting spectral distribution, the limit of extreme eigenvalues, and the central limit theorem for…
We study the use of a deep Gaussian process (DGP) prior in a general nonlinear inverse problem satisfying certain regularity conditions. We prove that when the data arises from a true parameter $\theta^*$ with a compositional structure, the…
Fisher's fiducial argument is widely viewed as a failed version of Neyman's theory of confidence limits. But Fisher's goal -- Bayesian-like probabilistic uncertainty quantification without priors -- was more ambitious than Neyman's, and…
Dependence is undoubtedly a central concept in statistics. Yet it proves difficult to locate in the literature a formal definition that goes beyond the self-evident 'dependence = non-independence'. This absence has allowed the term…
We investigate unbiased high-dimensional mean estimators in differential privacy. We consider differentially private mechanisms whose expected output equals the mean of the input dataset, for every dataset drawn from a fixed bounded…