Statistics Theory
We study a stochastic heat equation with piecewise constant diffusivity $\theta$ having a jump at a hypersurface $\Gamma$ that splits the underlying space $[0,1]^d$, $d\geq 2$, into two disjoint sets $\Lambda_-$ and $\Lambda_+$. Based on…
We study discrete random fields $\{X_t: t\in \mathbb{Z}^d\}$ parameterized on the $d$-dimensional integer lattice $\mathbb{Z}^d$. For a fixed threshold $u$, the excursion set $\{t \in \mathbb{Z}^d : X_t > u\}$ decomposes into connected…
We introduce a new dependence order, termed the conditional convex order, whose minimal and maximal elements characterize independence and perfect dependence. Moreover, it characterizes conditional independence, satisfies information…
Given a probability measure with density, Fermat distances and density-driven metrics are conformal transformations of the Euclidean metric that shrink distances in high density areas and enlarge distances in low density areas. Although…
We introduce a general framework for testing temporal symmetries in time series based on the distribution of ordinal patterns. While previous approaches have focused on specific forms of asymmetry, such as time reversal, our method provides…
Correlation analysis is a fundamental problem in statistics. In this paper, we consider the correlation detection problem between a pair of Erdős-Rényi graphs. Specifically, the problem is formulated as a hypothesis testing problem: under…
This study examines generalized cross-validation for tuning parameter selection in ridge regression under high-dimensional misspecified linear models. The set of candidates for the tuning parameter includes not only positive values but…
We develop a statistical proxy framework for retrieval-augmented generation (RAG), designed to formalize how a language model (LM) should balance its own predictions with retrieved evidence. For each query $x$, the system combines a frozen…
We consider non-linear regression models corrupted by generic noise when the regression functions form a non-linear subspace of $L^2$, relevant in non-linear PDE inverse problems and data assimilation. We show that when the score of the model…
We develop a data-driven algorithm for automatically selecting the regularisation parameter in Bayesian inversion under random tree Besov priors. One of the key challenges in Bayesian inversion is the construction of priors that are both…
We consider the problem of learning the network of mutual excitations (i.e., the dependency graph) in a non-stationary, multivariate Hawkes process. We consider a general setting where baseline rates at each node are time-varying and delay…
Algorithmic stability is a central concept in statistics and learning theory that measures how sensitive an algorithm's output is to small changes in the training data. Stability plays a crucial role in understanding generalization,…
Randomness (in the sense of being generated in an IID fashion) and exchangeability are standard assumptions in nonparametric statistics and machine learning, and relations between them have been a popular topic of research. This short paper…
We study the problem of coincidence detection in time series data, where we aim to determine whether the appearance of simultaneous or near-simultaneous events in two time series is indicative of some shared underlying signal or…
We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln\left(\sum_{i=1}^d\exp(x_i)\right)$ provides a classical smoothing, differing from the max function in value…
Instrumental variable regression is a foundational tool for causal analysis across the social and biomedical sciences. Recent advances use kernel methods to estimate nonparametric causal relationships, with general data types, while…
Text watermarking plays a crucial role in ensuring the traceability and accountability of large language model (LLM) outputs and mitigating misuse. While promising, most existing methods assume perfect pseudorandomness. In practice,…
This paper presents several situations that lead to the observation of multiple correlated copies of a drifted process, and then establishes non-asymptotic risk bounds for nonparametric estimators of the drift function $b_0$ and its…
The validity of classical hypothesis testing requires that the significance level $\alpha$ be fixed before any statistical analysis takes place. This is a stringent requirement. For instance, it prohibits updating $\alpha$ during (or after) an…
A common object to describe the extremal dependence of a $d$-variate random vector $X$ is the stable tail dependence function $L$. Various parametric models have emerged, with a popular subclass consisting of those stable tail dependence…