Statistics Theory
The problem of optimal estimation of the linear functional $A_N \xi = \sum_{k=0}^{N} a(k)\xi(k)$, which depends on the unknown values of a stochastic sequence $\xi(m)$ with stationary $n$-th increments, from observations of the…
We develop a formal statistical framework for classical multidimensional scaling (CMDS) applied to noisy dissimilarity data. We establish distributional convergence results for the embeddings produced by CMDS for various noise models, which…
We study how large an $\ell^2$ ellipsoid is by introducing type-$\tau$ integrals that capture the average decay of its semi-axes. These integrals turn out to be closely related to standard complexity measures: we show that the metric…
We investigate the online detection of changepoints in the distribution of a sequence of observations using degenerate U-statistic-type processes. We study weighted versions of an ordinary CUSUM-type scheme, a Page-CUSUM-type scheme, and…
We study a class of degenerate diffusion generators that arise in sequential testing and quickest detection problems with partial information. The observation process is driven by $k$ independent Brownian motions, while the hidden state…
In this paper, we explore the asymptotically optimal tuning parameter choice in ridge regression for estimating nuisance functions of a statistical functional that has recently gained prominence in conditional independence testing and…
Statistical inference for non-stationary data is hindered by the failure of classical central limit theorems (CLTs), not least because there is no fixed Gaussian limit to converge to. To resolve this, we introduce relative weak convergence,…
We investigate stochastic interpolation, a recently introduced framework for high dimensional sampling which bears many similarities to diffusion modeling. Stochastic interpolation generates a data sample by first randomly initializing a…
Selective classification is a powerful tool for automated decision-making in high-risk scenarios, allowing classifiers to act only when confident and abstain when uncertainty is high. Given a target accuracy, our goal is to minimize…
We propose a method for comparing survival data based on the higher criticism of p-values obtained from multiple exact hypergeometric tests. The method accommodates non-informative right-censorship and is sensitive to hazard differences in…
Given $n$ noisy samples with $p$ dimensions, where $n \ll p$, we show that the multi-step thresholding procedure based on the Lasso, which we call the Thresholded Lasso, can accurately estimate a sparse vector $\beta \in \mathbb{R}^p$…
This paper introduces an innovative and intuitive finite population sampling method that has been developed using a unique graphical framework. In this approach, first-order inclusion probabilities are represented as bars on a…
This paper explores hypothesis testing for the parametric forms of the mean and variance functions in regression models under diverging-dimension settings. To mitigate the curse of dimensionality, we introduce weighted residual empirical…
Gaussian process regression is used throughout statistics and machine learning for prediction and uncertainty quantification. A Gaussian process is specified by its mean and covariance functions. Many covariance functions, including…
Rényi entropy is an important measure in information theory that generalizes Shannon entropy. It is often used to quantify uncertainty in the dynamical behaviour of stochastic processes. In…
Performative predictions are forecasts which influence the outcomes they aim to predict, undermining the existence of correct forecasts and standard methods of elicitation and estimation. We show that conditioning forecasts on covariates…
This work addresses the interpolation of probability measures within a spatial statistics framework. We develop a Kriging approach in the Wasserstein space, leveraging the quantile function representation of the one-dimensional Wasserstein…
What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace…
Datasets displaying temporal dependencies abound in science and engineering applications, with Markov models representing a simplified and popular view of the temporal dependence structure. In this paper, we consider Bayesian settings that…
We establish the validity of bootstrap methods for empirical likelihood (EL) inference under the density ratio model (DRM). In particular, we prove that the bootstrap maximum EL estimators share the same limiting distribution as their…