Statistics Theory
We introduce a novel statistical framework for the analysis of replicated point processes that allows for the study of point pattern variability at a population level. By treating point process realizations as random measures, we adopt a…
Extremal graphical models encode the conditional independence structure of multivariate extremes. Key statistics for learning extremal graphical structures are empirical extremal variograms, for which we prove non-asymptotic concentration…
Phase-rectified signal averaging (PRSA) is a widely used algorithm to analyze nonstationary biomedical time series. The method operates by identifying hinge points in the time series according to prescribed rules, extracting segments…
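The three PRSA steps named above (hinge points, segment extraction, averaging) can be sketched in a few lines. This is a minimal sketch under assumptions the abstract does not state. A sample is treated as a hinge point when it exceeds its predecessor, and every segment spans `L` samples before and after the hinge. The function name `prsa` and the window half-width `L` are illustrative.

```python
import random


def prsa(x, L):
    """Phase-rectified signal average of the sequence x.

    Assumed hinge rule (illustrative): index i is a hinge point when
    x[i] > x[i - 1]. Each hinge contributes the segment x[i - L : i + L],
    so offset L of the returned list is the aligned hinge point.
    """
    if L < 1:
        raise ValueError("L must be at least 1")
    # Keep only hinges whose full window fits inside the series.
    hinges = [i for i in range(L, len(x) - L + 1) if x[i] > x[i - 1]]
    if not hinges:
        raise ValueError("no hinge points with a complete window")
    avg = [0.0] * (2 * L)
    for i in hinges:
        for k, value in enumerate(x[i - L:i + L]):
            avg[k] += value
    # Divide the accumulated sums by the number of aligned segments.
    return [s / len(hinges) for s in avg]


if __name__ == "__main__":
    random.seed(0)
    series = [random.gauss(0.0, 1.0) for _ in range(2000)]
    curve = prsa(series, 5)
    # By construction of the hinge rule, curve[5] exceeds curve[4].
    print(curve[4], curve[5])
```

Aligning every segment on its hinge is what "rectifies" the phase. Fluctuations that are not locked to the hinge tend to average out, while the step at the hinge survives in the averaged curve.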
We investigate parameter estimation in subcritical continuous-time birth-and-death processes with multiple births. We show that the classical maximum likelihood estimators for the model parameters, based on the continuous observation of a…
In this paper, we study stochastic ordering results between two finite mixtures with single and multiple outliers, assuming subpopulations follow general exponentiated location-scale distributions. For single-outlier mixtures, several…
A statistical hypothesis test for long range dependence (LRD) in functional time series in manifolds has been formulated in Ruiz-Medina and Crujeiras (2025) in the spectral domain for fully observed functional data. The asymptotic Gaussian…
Motivated by practical applications, I present a novel and comprehensive framework for operator-valued positive definite kernels. This framework is applied to both operator theory and stochastic processes. The first application focuses on…
We consider a bandit problem where the budget is smaller than the number of arms, which may be infinite. In this regime, the usual objective in the literature is to minimize simple regret. To analyze broad classes of distributions with…
In many applications of statistical estimation via sampling, one may wish to sample from a high-dimensional target distribution that evolves adaptively in response to the samples already seen. We study an example of such dynamics, given by a…
Motivated by an application to empirical Bayes learning in high-dimensional regression, we study a class of Langevin diffusions in a system with random disorder, where the drift coefficient is driven by a parameter that continuously adapts…
In modern scientific research, small-scale studies with limited participants are increasingly common. However, interpreting individual outcomes can be challenging, making it standard practice to combine data across studies using random…
Recent work has used optimal transport ideas to generalize the notion of (center-outward) quantiles to dimension $d\geq 2$. We study the robustness properties of these transport-based quantiles by deriving their breakdown point, roughly,…
We show that the stochastic independence of real-valued random variables is equivalent to the conditional uncorrelation, where the conditioning takes place over the Cartesian products of intervals. Next, we express the mutual independence…
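The characterization above can be illustrated numerically. The sketch below is our own illustration, not the paper's procedure. It computes the sample covariance of (X, Y) restricted to rectangles [a, b] × [c, d] built from a hypothetical finite grid of cut points `cuts`, and it reports the largest absolute value. For Y = X² with X uniform on [-1, 1], X and Y are uncorrelated but dependent. Restricting to [0, 1] × [0, 1] exposes a covariance near 1/12.

```python
import random


def conditional_cov(xs, ys, a, b, c, d):
    """Sample covariance of the pairs (x, y) with a <= x <= b and c <= y <= d.

    Returns None when fewer than two pairs fall in the rectangle.
    """
    pts = [(x, y) for x, y in zip(xs, ys) if a <= x <= b and c <= y <= d]
    n = len(pts)
    if n < 2:
        return None
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    return sum((x - mx) * (y - my) for x, y in pts) / (n - 1)


def max_abs_conditional_cov(xs, ys, cuts):
    """Largest |conditional covariance| over rectangles whose sides are
    intervals [cuts[i], cuts[j]] with i < j (a hypothetical finite grid)."""
    intervals = [(cuts[i], cuts[j]) for i in range(len(cuts))
                 for j in range(i + 1, len(cuts))]
    best = 0.0
    for a, b in intervals:
        for c, d in intervals:
            cov = conditional_cov(xs, ys, a, b, c, d)
            if cov is not None:
                best = max(best, abs(cov))
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
    dependent = [x * x for x in xs]                        # uncorrelated with xs, yet dependent
    independent = [rng.uniform(-1.0, 1.0) for _ in xs]     # genuinely independent
    cuts = [-1.0, 0.0, 1.0]
    print(conditional_cov(xs, dependent, -1, 1, -1, 1))    # close to 0
    print(max_abs_conditional_cov(xs, dependent, cuts))    # close to 1/12
    print(max_abs_conditional_cov(xs, independent, cuts))  # close to 0
```

A coarse grid can only reveal dependence, never certify independence. The equivalence in the abstract concerns all products of intervals, not a finite selection of them.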
The stability of persistent homology has led to wide applications of the persistence diagram as a trusted topological descriptor in the presence of noise. However, with the increasing demand for high-dimension and low-sample-size data…
Permutation tests have been proposed by Albert et al. (2015) to detect dependence between point processes, modeling in particular spike trains, that is, the times of occurrence of action potentials emitted by neurons. Our present work focuses…
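A trial-shuffling permutation test of the general kind referenced above can be sketched as follows. The statistic is an illustrative choice: a total count of near-coincident spikes within a hypothetical tolerance `delta`. It is not claimed to match the statistic of Albert et al. (2015). The p-value uses the usual (1 + count) / (1 + n_perm) correction.

```python
import random


def coincidences(a, b, delta):
    """Number of spike pairs (s, t), s in a and t in b, with |s - t| <= delta."""
    return sum(1 for s in a for t in b if abs(s - t) <= delta)


def total_coincidences(trials_x, trials_y, pairing, delta):
    """Sum of coincidence counts when trial i of X is matched with trial pairing[i] of Y."""
    return sum(coincidences(trials_x[i], trials_y[j], delta)
               for i, j in enumerate(pairing))


def permutation_pvalue(trials_x, trials_y, delta, n_perm=999, seed=0):
    """One-sided trial-shuffling permutation test for dependence.

    Under independence between the two processes, re-pairing the trials
    of Y at random does not change the distribution of the statistic.
    """
    rng = random.Random(seed)
    pairing = list(range(len(trials_x)))
    observed = total_coincidences(trials_x, trials_y, pairing, delta)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pairing)
        if total_coincidences(trials_x, trials_y, pairing, delta) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = random.Random(1)
    # 30 trials, 10 spikes each on [0, 1]; Y jitters every spike of X slightly.
    xs = [sorted(rng.uniform(0, 1) for _ in range(10)) for _ in range(30)]
    ys = [[t + rng.gauss(0, 0.001) for t in trial] for trial in xs]
    print(permutation_pvalue(xs, ys, delta=0.005, n_perm=199))  # small p-value
```

Shuffling whole trials, rather than individual spike times, preserves the within-trial structure of each process. Only the pairing between the two processes is broken, which is what the null hypothesis of independence allows.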
It is well known that Empirical Risk Minimization (ERM) may attain minimax suboptimal rates in terms of the mean squared error (Birgé and Massart, 1993). In this paper, we prove that, under relatively mild assumptions, the suboptimality…
It is well known that an $n \times n$ Wishart matrix with $d$ degrees of freedom is close to the appropriately centered and scaled Gaussian Orthogonal Ensemble (GOE) if $d$ is large enough. Recent work of Bubeck, Ding, Eldan, and Racz, and…
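One way to make "appropriately centered and scaled" concrete is a moment computation. It assumes the Wishart matrix is formed as $W = GG^\top$, with $G$ an $n \times d$ matrix of i.i.d. $N(0,1)$ entries, and that the GOE is normalized with off-diagonal variance 1 and diagonal variance 2. Conventions vary, so this is one common choice rather than the normalization of the cited work.

```latex
W_{ij} = \sum_{k=1}^{d} G_{ik} G_{jk}, \qquad
\mathbb{E}[W_{ij}] = d\,\delta_{ij}, \qquad
\operatorname{Var}(W_{ij}) =
\begin{cases}
  d,  & i \neq j,\\
  2d, & i = j \quad (W_{ii} \sim \chi^2_d),
\end{cases}

\text{so}\quad
M = \frac{W - d\,I_n}{\sqrt{d}}
\quad\text{has}\quad
\mathbb{E}[M_{ij}] = 0, \qquad
\operatorname{Var}(M_{ij}) = 1 + \delta_{ij}.
```

Each entry of $M$ is a normalized sum of $d$ i.i.d. terms, so it is approximately Gaussian for large $d$ by the central limit theorem. Matching entrywise moments does not by itself make the whole matrix close to GOE. The line of work cited above quantifies how large $d$ must be relative to $n$ for that.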
Smoothing methods find signals in noisy data. A challenge for statistical inference is the choice of smoothing parameter. SiZer addressed this challenge in one dimension by detecting significant slopes across multiple scales, but was not a…
We study the Gaussian sequence model, i.e. $X \sim N(\mathbf{\theta}, I_\infty)$, where $\mathbf{\theta} \in \Gamma \subset \ell_2$ is assumed to be convex and compact. We show that goodness-of-fit testing sample complexity is lower bounded…
We consider the problem of clustering data points coming from sub-Gaussian mixtures. Existing methods that provably achieve the optimal mislabeling error, such as the Lloyd algorithm, are usually vulnerable to outliers. In contrast,…
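The Lloyd algorithm named above alternates an assignment step and a mean-update step. The pure-Python sketch below assumes the caller supplies the initial centers, since the abstract says nothing about initialization. It also runs a fixed number of iterations. The names `lloyd` and `sq_dist` are illustrative. The usage example shows the kind of outlier sensitivity the abstract mentions: a single far-away point captures a center of its own and forces two genuine clusters to merge.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, n_iter=20):
    """Lloyd iterations for k-means from the given initial centers.

    Returns (labels, centers) after n_iter assignment/update rounds.
    """
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its points;
        # an empty cluster keeps its previous center.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    rng = random.Random(0)
    cluster_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    cluster_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
    init = [(1.0, 1.0), (9.0, 9.0)]

    labels, centers = lloyd(cluster_a + cluster_b, init)
    print(centers)  # roughly (0, 0) and (10, 10)

    # One gross outlier takes a center for itself; clusters A and B merge.
    labels_out, centers_out = lloyd(cluster_a + cluster_b + [(1000.0, 1000.0)], init)
    print(set(labels_out[:100]), labels_out[100])
```

The mean update is what makes the method fragile. A single point can move a center arbitrarily far, and that in turn changes every subsequent assignment.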