Statistics Theory
We consider a fractional Brownian motion with unknown linear drift such that the drift coefficient has a prior normal distribution and construct a sequential test for the hypothesis that the drift is positive versus the alternative that it…
We define nearest-neighbour point processes on graphs with Euclidean edges and linear networks. They can be seen as the analogues of renewal processes on the real line. We show that the Delaunay neighbourhood relation on a tree satisfies…
In this paper, we consider a stochastic process that mixes in time, according to an unobserved stationary Markov selection process, two separate sources of randomness: i) a stationary process whose distribution is accessible (gold standard);…
We establish a general concentration result for the 1-Wasserstein distance between the empirical measure of a sequence of random variables and its expectation. Unlike standard results that rely on independence (e.g., Sanov's theorem) or…
In this work, non-monotonic ageing notions are viewed as extensions of the corresponding monotonic ageing notions. In particular, the New Better than Used in Expectation (NBUE) and the corresponding non-monotonic analogue New Worse…
Score-based diffusion models have become a powerful framework for generative modeling, with score estimation as a central statistical bottleneck. Existing guarantees for score estimation largely focus on light-tailed targets or rely on…
We study estimation of the intercept parameter in an integrated Galton-Watson process, a basic building block for many count-valued time series models. In this unit root setting, the ordinary least squares estimator is inconsistent, whereas…
This paper establishes the theoretical foundations for the asymptotic separability of Gaussian Mixture Models (GMMs) in high dimensions by extending the classical Feldman-Hájek theorem. We first prove that a countable mixture of Gaussian…
Understanding and quantifying causal relationships between variables is essential for reasoning about the physical world. In this work, we develop a resource-theoretic framework to do so. Here, we focus on the simplest nontrivial setting --…
This paper revisits the classical problem of interval estimation of a binomial proportion under Huber contamination. Our main result derives the rate of optimal interval length when the contamination proportion is unknown under a local…
We study a sequential Monte Carlo algorithm to sample from the Gibbs measure with a non-convex energy function at a low temperature. We use the practical and popular geometric annealing schedule, and use a Langevin diffusion at each…
The application of semiparametric efficient estimators, particularly those that leverage machine learning, is rapidly expanding within epidemiology and causal inference. This literature increasingly invokes the Riesz representation…
The posterior predictive $p$-value (ppp) is widely used in Bayesian model evaluation. However, due to double use of the data, the ppp may not be a valid $p$-value even in large samples: The asymptotic null distribution of the ppp can be…
A statistical model is said to be calibrated if the resulting mean estimates perfectly match the true means of the underlying responses. Aiming for calibration is often not achievable in practice as one has to deal with finite samples of…
A spherical $t$-design is a finite subset $X$ of the unit sphere such that every polynomial of degree at most $t$ has the same average over $X$ as it does over the entire sphere. Determining the minimum possible size of spherical designs,…
Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with John Tukey, is among the most popular. Tukey's depth has found applications in robust statistics,…
Testing the equality of mean vectors across $g$ different groups plays an important role in many scientific fields. In regular frameworks, likelihood-based statistics under the normality assumption offer a general solution to this task.…
As network data has become ubiquitous in the sciences, there has been growing interest in network models whose structure is driven by latent node-level variables in a (typically low-dimensional) latent geometric space. These "latent…
Hypothesis testing problems for circular data are formulated, where observations take values on the unit circle and may contain a hidden, phase-coherent structure. Under the null, the data are independent uniform on the unit circle; under…
This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function…