Statistics Theory
The application of semiparametric efficient estimators, particularly those that leverage machine learning, is rapidly expanding within epidemiology and causal inference. This literature is increasingly invoking the Riesz representation…
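The sketch below shows how a Riesz representer enters such an estimator. It uses the average treatment effect (ATE), a functional chosen here for illustration; the abstract does not say which functional it uses. For $\psi = E[m(1,X) - m(0,X)]$ the Riesz representer is $\alpha(a,x) = a/\pi(x) - (1-a)/(1-\pi(x))$, where $\pi$ is the propensity score. The one-step (AIPW) estimator adds the $\alpha$-weighted residual to the plug-in. The function names are hypothetical, and the nuisance fits are taken as given.

```python
# Hypothetical helpers; assumed setting: ATE functional with given nuisance fits.

def riesz_representer_ate(a, pi_x):
    """Riesz representer of the ATE functional at treatment a in {0, 1} and propensity pi_x."""
    return a / pi_x - (1 - a) / (1 - pi_x)


def one_step_ate(a, y, mu0, mu1, pi):
    """One-step (AIPW) ATE estimate.

    a: treatments in {0, 1}; y: outcomes; mu0, mu1: fitted m(0, x_i), m(1, x_i);
    pi: fitted propensity scores P(A = 1 | x_i).
    """
    total = 0.0
    for ai, yi, m0, m1, p in zip(a, y, mu0, mu1, pi):
        fitted = m1 if ai == 1 else m0            # outcome regression at the observed treatment
        plug_in = m1 - m0                         # m(1, x_i) - m(0, x_i)
        correction = riesz_representer_ate(ai, p) * (yi - fitted)
        total += plug_in + correction
    return total / len(y)
```

When the outcome fits are exact, the correction vanishes and the estimate reduces to the plug-in average. When the outcome fits are poor, the Riesz-weighted residuals carry the correction.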
The posterior predictive $p$-value (ppp) is widely used in Bayesian model evaluation. However, due to the double use of the data, the ppp may not be a valid $p$-value even in large samples: the asymptotic null distribution of the ppp can be…
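The "double use of the data" can be made concrete with a Monte Carlo ppp under an assumed toy model. Take $y_i \sim N(\mu, \sigma^2)$ with known $\sigma$ and a flat prior, so that $\mu \mid y \sim N(\bar y, \sigma^2/n)$. The data are used once to form the posterior and again as the reference value $T(y)$. The model and function name are illustrative, not taken from the paper.

```python
import random
import statistics

def posterior_predictive_pvalue(y, sigma, discrepancy, n_draws=20000, seed=0):
    """Monte Carlo ppp = P(T(y_rep) >= T(y) | y).

    Assumed toy model: y_i ~ N(mu, sigma^2) with sigma known and a flat prior on mu,
    so that mu | y ~ N(ybar, sigma^2 / n).
    """
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    t_obs = discrepancy(y)
    exceed = 0
    for _ in range(n_draws):
        mu = rng.gauss(ybar, sigma / n ** 0.5)              # posterior draw (first use of y)
        y_rep = [rng.gauss(mu, sigma) for _ in range(n)]    # replicated data set
        if discrepancy(y_rep) >= t_obs:                     # compared with y (second use)
            exceed += 1
    return exceed / n_draws
```

Use the sample mean as the discrepancy. Given the data, $\bar y^{rep} - \bar y \sim N(0, 2\sigma^2/n)$, so the exact ppp is 0.5 whatever the data. This is an extreme case of the conservativeness that keeps the ppp from being uniform under the null.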
A statistical model is said to be calibrated if the resulting mean estimates perfectly match the true means of the underlying responses. Perfect calibration is often not achievable in practice, as one has to deal with finite samples of…
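The definition can be checked empirically by binning. The sketch assumes predictions lie in $[0, 1]$ (for example, predicted probabilities). It groups observations by predicted value and compares the mean prediction with the mean response in each bin. Equal-width bins and the function name are illustrative choices, not the paper's method.

```python
def binned_calibration(pred, y, n_bins=5):
    """Per-bin (mean prediction, mean response, count) for predictions assumed to lie in [0, 1].

    Under perfect calibration the two means agree in every bin up to sampling noise.
    Equal-width bins are an illustrative assumption.
    """
    bins = [[] for _ in range(n_bins)]
    for p, r in zip(pred, y):
        k = min(int(p * n_bins), n_bins - 1)   # put p == 1.0 in the last bin
        bins[k].append((p, r))
    table = []
    for b in bins:
        if b:                                  # skip empty bins
            mean_p = sum(p for p, _ in b) / len(b)
            mean_r = sum(r for _, r in b) / len(b)
            table.append((mean_p, mean_r, len(b)))
    return table
```

With finite samples the per-bin gaps are never exactly zero for real data. The table shows where the model deviates, which is the practical reason exact calibration is replaced by approximate targets.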
A spherical $t$-design is a finite subset $X$ of the unit sphere such that every polynomial of degree at most $t$ has the same average over $X$ as it does over the entire sphere. Determining the minimum possible size of spherical designs,…
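The definition can be checked directly on the simplest sphere, the unit circle $S^1$; this restriction is made here for brevity. A polynomial of degree at most $t$ is a combination of monomials $x^a y^b$ with $a + b \le t$, so it suffices to compare monomial averages. On the circle, the average of $x^a y^b$ is $(a-1)!!\,(b-1)!!/(a+b)!!$ when $a$ and $b$ are both even and $0$ otherwise. A regular $n$-gon is a classical example of an $(n-1)$-design that is not an $n$-design. The function names are hypothetical.

```python
import math
from itertools import product

def circle_moment(a, b):
    """Average of x^a * y^b over the unit circle (uniform measure)."""
    if a % 2 or b % 2:
        return 0.0
    def dfact(k):  # double factorial with (-1)!! = 0!! = 1
        return 1 if k <= 0 else k * dfact(k - 2)
    return dfact(a - 1) * dfact(b - 1) / dfact(a + b)

def is_circle_design(points, t, tol=1e-9):
    """True if every monomial of degree <= t averages over `points` as over the circle."""
    for a, b in product(range(t + 1), repeat=2):
        if a + b > t:
            continue
        avg = sum(x ** a * y ** b for x, y in points) / len(points)
        if abs(avg - circle_moment(a, b)) > tol:
            return False
    return True

def regular_polygon(n):
    """Vertices of the regular n-gon inscribed in the unit circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
```

For the square, the monomial $x^4$ averages to $1/2$ over the four vertices but to $3/8$ over the circle. The square is therefore a 3-design and not a 4-design.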
Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with John Tukey, is among the most popular. Tukey's depth has found applications in robust statistics,…
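In the plane, Tukey's (halfspace) depth of a point $p$ is the smallest number of data points in a closed halfplane whose boundary passes through $p$. The count changes only at directions orthogonal to some $x_i - p$. It is therefore enough to test one direction strictly inside each arc between consecutive critical angles. This is a minimal exact sketch, assuming 2-D data in general position (near-coincident directions are only handled by a small tolerance). The function name is illustrative.

```python
import math

def tukey_depth_2d(p, data):
    """Exact halfspace depth: min over unit u of #{x in data : <u, x - p> >= 0}."""
    vecs = [(x - p[0], y - p[1]) for x, y in data]
    at_p = sum(1 for vx, vy in vecs if vx == 0 and vy == 0)   # points equal to p always count
    others = [(vx, vy) for vx, vy in vecs if vx != 0 or vy != 0]
    if not others:
        return at_p
    crit = []
    for vx, vy in others:
        phi = math.atan2(vy, vx)
        # Directions u orthogonal to v, where v enters or leaves the halfplane.
        crit.extend((phi + s * math.pi / 2) % (2 * math.pi) for s in (1, -1))
    crit.sort()
    best = len(data)
    for i, c in enumerate(crit):
        nxt = crit[i + 1] if i + 1 < len(crit) else crit[0] + 2 * math.pi
        if nxt - c < 1e-12:                        # skip zero-length arcs (duplicate angles)
            continue
        theta = (c + nxt) / 2                      # a direction strictly inside the arc
        ux, uy = math.cos(theta), math.sin(theta)
        count = at_p + sum(1 for vx, vy in others if ux * vx + uy * vy >= 0)
        best = min(best, count)
    return best
```

For the four corners of the unit square plus its center, the center has depth 3 and each corner has depth 1. This is the center-outward ordering that depth induces.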
Testing the equality of mean vectors across $g$ different groups plays an important role in many scientific fields. In regular frameworks, likelihood-based statistics under the normality assumption offer a general solution to this task.…
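In the regular normal framework, the likelihood-ratio statistic for equal mean vectors across $g$ groups is a power of Wilks' $\Lambda = \det(W)/\det(W + B)$. Here $W$ and $B$ are the within-group and between-group sums-of-squares-and-cross-products matrices, and small $\Lambda$ is evidence against equal means. The sketch shows only this classical statistic, not the paper's proposal. It assumes enough observations per group for $W$ to be nonsingular, and the names are illustrative.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for j in range(n):
        piv = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[piv][j] == 0:
            return 0.0
        if piv != j:
            a[j], a[piv] = a[piv], a[j]
            d = -d                                  # a row swap flips the sign
        d *= a[j][j]
        for i in range(j + 1, n):
            f = a[i][j] / a[j][j]
            for k in range(j, n):
                a[i][k] -= f * a[j][k]
    return d

def wilks_lambda(groups):
    """Wilks' Lambda = det(W) / det(W + B) for groups of p-dimensional observations."""
    p = len(groups[0][0])
    all_obs = [x for g in groups for x in g]
    grand = [sum(x[k] for x in all_obs) / len(all_obs) for k in range(p)]
    W = [[0.0] * p for _ in range(p)]
    B = [[0.0] * p for _ in range(p)]
    for g in groups:
        mean = [sum(x[k] for x in g) / len(g) for k in range(p)]
        for x in g:                                 # within-group scatter
            for r in range(p):
                for c in range(p):
                    W[r][c] += (x[r] - mean[r]) * (x[c] - mean[c])
        for r in range(p):                          # between-group scatter
            for c in range(p):
                B[r][c] += len(g) * (mean[r] - grand[r]) * (mean[c] - grand[c])
    T = [[W[r][c] + B[r][c] for c in range(p)] for r in range(p)]
    return det(W) / det(T)
```

Groups with identical sample means give $B = 0$ and hence $\Lambda = 1$. Well-separated groups drive $\Lambda$ toward 0.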
As network data has become ubiquitous in the sciences, there has been growing interest in network models whose structure is driven by latent node-level variables in a (typically low-dimensional) latent geometric space. These "latent…
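One common concrete instance of such a model is the latent distance model of Hoff, Raftery and Handcock (2002), in which nodes closer in the latent space are more likely to connect. It is used here only as an assumed example; the paper may use a different link or geometry. The sketch draws latent positions from a standard Gaussian and edges independently with a logistic link. The intercept `alpha` and the function name are illustrative.

```python
import math
import random

def sample_latent_distance_network(n, dim=2, alpha=1.0, seed=0):
    """Sample an undirected graph from an assumed latent distance model.

    z_i ~ N(0, I_dim); given z, edges are independent with
    P(A_ij = 1) = 1 / (1 + exp(-(alpha - ||z_i - z_j||))).
    """
    rng = random.Random(seed)
    z = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(z[i], z[j])                    # Euclidean latent distance
            prob = 1.0 / (1.0 + math.exp(-(alpha - dist)))
            A[i][j] = A[j][i] = int(rng.random() < prob)    # symmetric, no self-loops
    return z, A
```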
We formulate hypothesis testing problems for circular data, in which observations take values on the unit circle and may contain a hidden, phase-coherent structure. Under the null, the data are independent and uniformly distributed on the unit circle; under…
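Under the stated null (i.i.d. uniform angles), a classical baseline is the Rayleigh test, which is based on the mean resultant length $\bar R$. Under the null, $2n\bar R^2$ is asymptotically $\chi^2_2$, so the large-sample $p$-value is $e^{-n\bar R^2}$. The test is powerful against unimodal concentration. It is not claimed to be the test studied in the paper, which targets hidden phase-coherent structure. The function name is illustrative.

```python
import math

def rayleigh_test(angles):
    """Rayleigh test of uniformity on the circle.

    Returns (R_bar, p_approx): the mean resultant length and the first-order
    large-sample p-value exp(-n * R_bar^2) from the chi-squared(2) limit of 2 n R_bar^2.
    """
    n = len(angles)
    c = sum(math.cos(t) for t in angles) / n
    s = sum(math.sin(t) for t in angles) / n
    r_bar = math.hypot(c, s)
    return r_bar, math.exp(-n * r_bar ** 2)
```

Evenly spaced angles give $\bar R \approx 0$ and a $p$-value near 1. Angles bunched within about two radians give a tiny $p$-value.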
This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function…
We consider estimation of the drift parameter $\vartheta>0$ in a partially observed Ornstein–Uhlenbeck type model driven by a mixed fractional Brownian noise. Our framework extends the partially observed model of…
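As a simplified reference point (not the paper's estimator), consider a fully observed Ornstein–Uhlenbeck path driven by standard Brownian motion, $dX_t = -\vartheta X_t\,dt + dW_t$. Its MLE is $\hat\vartheta_T = -\int_0^T X_t\,dX_t \big/ \int_0^T X_t^2\,dt$. The sketch discretizes both integrals along an Euler–Maruyama path. Partial observation and mixed fractional noise, as in the paper, need different machinery. The names and the discretization are assumptions.

```python
import math
import random

# Hypothetical helpers; assumed: fully observed path, standard Brownian noise.

def simulate_ou(theta, T, dt, x0=0.0, seed=0):
    """Euler-Maruyama path of dX_t = -theta X_t dt + dW_t on [0, T]."""
    rng = random.Random(seed)
    n = int(round(T / dt))
    x = [x0]
    for _ in range(n):
        x.append(x[-1] - theta * x[-1] * dt + rng.gauss(0.0, math.sqrt(dt)))
    return x

def drift_mle(x, dt):
    """Discretized MLE: -sum X_i (X_{i+1} - X_i) / (sum X_i^2 * dt)."""
    num = sum(x[i] * (x[i + 1] - x[i]) for i in range(len(x) - 1))
    den = sum(x[i] ** 2 for i in range(len(x) - 1)) * dt
    return -num / den
```

With $T = 500$ the estimator's standard deviation is roughly $\sqrt{2\vartheta/T}$, about 0.06 for $\vartheta = 1$, so estimates land close to the true drift.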
While ridges in the scalogram, determined by the squared modulus of the analytic wavelet transform (AWT), are a widely accepted concept utilized in nonstationary time series analysis, their behavior in noisy environments remains…
From the observation of a diffusion path $(X_t)_{t\in [0,T]}$ on a compact connected $d$-dimensional manifold $\mathcal{M}$ without boundary, we consider the problem of estimating the stationary measure $\mu$ of the process. Wang and Zhu…
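The most direct estimator of $\mu$ from a path is the occupation measure, that is, the fraction of time spent in each region. The sketch uses the simplest compact manifold, the circle, and assumes dynamics $dX_t = b(X_t)\,dt + dW_t$ simulated by Euler–Maruyama and wrapped mod $2\pi$. For $b = -V'$ the stationary density is proportional to $e^{-2V}$. This is a plain histogram and is not claimed to be the estimator studied in the paper. All names are hypothetical.

```python
import math
import random

def occupation_histogram(drift, T, dt, n_bins=24, x0=0.0, seed=0):
    """Fraction of time an Euler-Maruyama path of dX = drift(X) dt + dW, wrapped onto
    [0, 2*pi), spends in each of n_bins equal arcs (empirical occupation measure).

    Assumed toy setting: the circle as the manifold, unit diffusion coefficient.
    """
    rng = random.Random(seed)
    two_pi = 2.0 * math.pi
    x, counts = x0 % two_pi, [0] * n_bins
    n_steps = int(round(T / dt))
    for _ in range(n_steps):
        x = (x + drift(x) * dt + rng.gauss(0.0, math.sqrt(dt))) % two_pi
        counts[min(int(x / two_pi * n_bins), n_bins - 1)] += 1   # guard against x == 2*pi
    return [c / n_steps for c in counts]
```

With drift $\sin x$ (that is, $V = \cos x$), the stationary density is proportional to $e^{-2\cos x}$. The arc just after $\pi$ should therefore collect far more time than the arc just after $0$.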
In a mixed generalized linear model, the goal is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two…
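The simplest instance of this problem is two-component mixed linear regression. A common heuristic, alternating minimization, is swapped in here for illustration and is not necessarily the estimator the paper studies. It assigns each sample to the better-fitting signal, then refits each signal by least squares. The sketch uses scalar slopes and a user-supplied initialization, and the names are hypothetical.

```python
def alt_min_mixed_regression(x, y, beta_init, n_iter=20):
    """Alternating minimization for y_i ~ beta_{z_i} * x_i with two unknown scalar slopes.

    Illustrative heuristic: hard-assign each sample to the slope with the smaller
    squared residual, then refit each slope by least squares on its samples.
    """
    b = list(beta_init)
    for _ in range(n_iter):
        groups = ([], [])
        for xi, yi in zip(x, y):
            k = 0 if (yi - b[0] * xi) ** 2 <= (yi - b[1] * xi) ** 2 else 1
            groups[k].append((xi, yi))
        for k in (0, 1):
            sxx = sum(xi * xi for xi, _ in groups[k])
            if sxx > 0:                       # keep the old slope if a group is empty
                b[k] = sum(xi * yi for xi, yi in groups[k]) / sxx
    return b
```

On noiseless data with slopes 2 and -1 and a reasonable starting point, the first assignment is already correct and the refit recovers both slopes exactly. Poor initializations can stall, which is one reason the theory of such problems is delicate.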
This paper establishes convergence rates for learning elliptic pseudo-differential operators, a fundamental operator class in partial differential equations and mathematical physics. In a wavelet-Galerkin framework, we formulate learning…
We establish sharp non-asymptotic probabilistic bounds for the star discrepancy of double-infinite random matrices, a canonical model for sequences of random point sets in high dimensions. By integrating the recently proved…
We present two main contributions to the expected star discrepancy theory. First, we derive a sharper expected upper bound for jittered sampling, improving the leading constants and logarithmic terms compared to the state-of-the-art [Doerr,…
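Both objects in the abstract can be computed for small planar examples. The sketch generates a jittered sample, with one uniform point in each cell of an $m \times m$ grid. It then evaluates the star discrepancy exactly by brute force over anchored boxes. Their corners can be restricted to the point coordinates together with 1, comparing the volume with both the open and the closed point counts. This cubic-time check is only for illustration, and the names are hypothetical.

```python
import random

def jittered_sample(m, seed=0):
    """Jittered sampling in [0,1]^2: one uniform point in each of the m*m grid cells."""
    rng = random.Random(seed)
    return [((i + rng.random()) / m, (j + rng.random()) / m) for i in range(m) for j in range(m)]

def star_discrepancy_2d(points):
    """Exact star discrepancy of a planar point set by brute force over critical boxes.

    Only corners (a, b) with a in {x_i} U {1} and b in {y_i} U {1} are needed;
    'volume minus open count' and 'closed count minus volume' are both checked.
    O(n^3) time, so suitable for small n only.
    """
    n = len(points)
    xs = sorted({p[0] for p in points} | {1.0})
    ys = sorted({p[1] for p in points} | {1.0})
    worst = 0.0
    for a in xs:
        for b in ys:
            open_cnt = sum(1 for x, y in points if x < a and y < b)
            closed_cnt = sum(1 for x, y in points if x <= a and y <= b)
            vol = a * b
            worst = max(worst, vol - open_cnt / n, closed_cnt / n - vol)
    return worst
```

A single point at $(0.5, 0.5)$ has star discrepancy $0.75$, attained by the closed box $[0, 0.5]^2$. Comparing `star_discrepancy_2d` on jittered and i.i.d. uniform samples of equal size gives a quick empirical feel for the expected bounds.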
Despite ongoing theoretical research on cross-validation (CV), many theoretical questions remain wide open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice of the number of folds…
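The sketch shows how the number of folds $K$ enters, using $K$-fold CV for a deliberately simple rule. Each held-out response is predicted by the training-fold mean, an illustrative choice, and folds are contiguous rather than randomly split. For $K = n$ (leave-one-out) this rule has the closed form $\big(\tfrac{n}{n-1}\big)^2 \frac{1}{n}\sum_i (y_i - \bar y)^2$, which the example checks. The function name is hypothetical.

```python
def kfold_cv_mse(y, k):
    """K-fold CV estimate of squared prediction error for the 'predict the training mean' rule.

    Folds are contiguous blocks (an illustrative assumption); each point is held out once.
    """
    n = len(y)
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start, sq_err = 0, 0.0
    for size in sizes:
        test = y[start:start + size]
        train = y[:start] + y[start + size:]
        mean = sum(train) / len(train)
        sq_err += sum((t - mean) ** 2 for t in test)
        start += size
    return sq_err / n
```

For `y = [1.0, 2.0, 4.0, 7.0]`, leave-one-out gives $28/3 \approx 9.33$ while $K = 2$ gives $17.25$. Even for this trivial rule, the CV estimate depends strongly on $K$.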
Latin hypercube sampling (LHS) is a widely used stratified sampling method in computer experiments. In this work, we extend the existing convergence results for the sample mean under LHS to the broader class of $Z$-estimators, estimators…
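A Latin hypercube sample stratifies every coordinate. With $n$ points, each interval $[k/n, (k+1)/n)$ contains exactly one value per coordinate. A $Z$-estimator solves $\sum_i \psi(x_i, \theta) = 0$. The sketch combines the two, solving by bisection and assuming $\sum_i \psi(x_i, \theta)$ is decreasing in a scalar $\theta$. The function names and the example $\psi$ are illustrative.

```python
import random

def latin_hypercube(n, d, seed=0):
    """Latin hypercube sample of n points in [0,1]^d (one point per stratum in every coordinate)."""
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)                       # independent random permutation per coordinate
        cols.append([(perm[i] + rng.random()) / n for i in range(n)])
    return [tuple(col[i] for col in cols) for i in range(n)]

def z_estimate(psi, sample, lo, hi, tol=1e-10):
    """Solve sum_i psi(x_i, theta) = 0 by bisection, assuming the sum decreases in theta on [lo, hi]."""
    def total(th):
        return sum(psi(x, th) for x in sample)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With $\psi(x, \theta) = x_1 + x_2 - \theta$, the $Z$-estimator is a sample mean, the case covered by the existing LHS results. Stratification forces each coordinate's sample mean within $1/(2n)$ of $1/2$, so here the estimate lies within $1/n$ of the true value 1.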
We study the Gibbs posterior distribution for sparse deep neural nets in a nonparametric regression setting. The posterior can be accessed via Metropolis-adjusted Langevin algorithms. Using a mixture over uniform priors on sparse sets of…
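The sketch gives intuition for the sampler with MALA for a Gibbs posterior $\pi_\lambda(\theta) \propto \exp(-\lambda R_n(\theta))\,\pi_0(\theta)$ in a deliberately simplified setting. A scalar linear model with empirical squared-error risk $R_n$ and a Gaussian prior is swapped in for the sparse deep nets and uniform-mixture priors of the paper. Each step draws a Langevin proposal $\theta' = \theta + \tfrac{h}{2}\nabla\log\pi_\lambda(\theta) + \sqrt{h}\,\xi$ and corrects it with a Metropolis–Hastings accept/reject. The data, $\lambda$, the step size $h$, and all names are illustrative.

```python
import math
import random

def mala(log_target, grad_log_target, theta0, step, n_samples, seed=0):
    """Metropolis-adjusted Langevin algorithm for a scalar parameter; returns (chain, acceptance rate)."""
    rng = random.Random(seed)

    def log_q(to, frm):
        # Log density (up to a constant) of the Langevin proposal frm -> to.
        mean = frm + 0.5 * step * grad_log_target(frm)
        return -((to - mean) ** 2) / (2.0 * step)

    theta, chain, accepted = theta0, [], 0
    for _ in range(n_samples):
        prop = theta + 0.5 * step * grad_log_target(theta) + math.sqrt(step) * rng.gauss(0.0, 1.0)
        log_alpha = (log_target(prop) + log_q(theta, prop)) - (log_target(theta) + log_q(prop, theta))
        if rng.random() < math.exp(min(0.0, log_alpha)):   # Metropolis-Hastings correction
            theta, accepted = prop, accepted + 1
        chain.append(theta)
    return chain, accepted / n_samples

# Assumed toy Gibbs posterior: exp(-LAM * mean squared error) times a N(0, TAU^2) prior.
X = [0.5, 1.0, 1.5, 2.0, 2.5]
Y = [1.1, 1.9, 3.2, 3.9, 5.1]
LAM, TAU = 50.0, 10.0

def log_gibbs(theta):
    risk = sum((y - theta * x) ** 2 for x, y in zip(X, Y)) / len(X)
    return -LAM * risk - theta ** 2 / (2.0 * TAU ** 2)

def grad_log_gibbs(theta):
    return 2.0 * LAM * sum(x * (y - theta * x) for x, y in zip(X, Y)) / len(X) - theta / TAU ** 2
```

Because this toy log-target is quadratic, the Gibbs posterior is Gaussian with a closed-form mean. After burn-in, the chain average should match that mean, which makes the toy a convenient correctness check before moving to non-Gaussian targets.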
This paper investigates the nonparametric estimation of a heteroskedastic variance function on the sphere in a regression framework, assuming the variance belongs to a Besov regularity class. A needlet-based estimator is proposed, combining…