Statistical Theory
We study design-unbiased estimation of the finite-population total $\sum_{i=1}^N y_i$ when each outcome satisfies known bounds $y_i\in[a_i,b_i]$. For any sampling design with inclusion probabilities $\pi_i>0$, we prove a sharp lower bound…
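For context, the classical design-unbiased estimator of $\sum_{i=1}^N y_i$ under a design with $\pi_i>0$ is the Horvitz-Thompson estimator $\sum_{i\in S} y_i/\pi_i$. The sketch below implements only that baseline. It does not use the bounds $[a_i,b_i]$ and is not the method of the abstract. The function name is a hypothetical choice for illustration.

```python
def horvitz_thompson_total(sample_y, sample_pi):
    """Horvitz-Thompson estimate of the finite-population total sum_i y_i.

    sample_y  -- outcomes y_i of the sampled units
    sample_pi -- their first-order inclusion probabilities pi_i > 0
    """
    if len(sample_y) != len(sample_pi):
        raise ValueError("sample_y and sample_pi must have the same length")
    if any(p <= 0 for p in sample_pi):
        raise ValueError("inclusion probabilities must be positive")
    # Weighting each sampled y_i by 1/pi_i makes the estimator
    # design-unbiased: E[sum_{i in S} y_i / pi_i] = sum_i y_i.
    return sum(y / p for y, p in zip(sample_y, sample_pi))
```

As a check, take the population $(1,2,3,4)$ under simple random sampling of size 2 without replacement ($\pi_i=1/2$). Averaging the estimate over all six possible samples gives exactly the total, 10.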
Classical kernel density estimation usually derives the AMISE and the optimal bandwidth from a pointwise Taylor expansion, which requires the density to be twice continuously differentiable. This assumption is stronger than necessary and excludes natural…
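The classical result referred to here is $\mathrm{AMISE}(h)=R(K)/(nh)+h^4\mu_2(K)^2R(f'')/4$, minimized at $h^*=\{R(K)/(\mu_2(K)^2R(f'')\,n)\}^{1/5}$, where $R(g)=\int g^2$. The term $R(f'')$ is where the second derivative of the density enters. The sketch below evaluates these formulas for a Gaussian kernel and a normal reference density. The function and constant names are illustrative assumptions.

```python
import math

def amise(h, n, roughness_k, mu2_k, roughness_f2):
    """AMISE(h) = R(K)/(n h) + h^4 mu2(K)^2 R(f'') / 4."""
    return roughness_k / (n * h) + h ** 4 * mu2_k ** 2 * roughness_f2 / 4.0

def amise_optimal_bandwidth(n, roughness_k, mu2_k, roughness_f2):
    """Minimizer of AMISE: h* = (R(K) / (mu2(K)^2 R(f'') n))^(1/5)."""
    return (roughness_k / (mu2_k ** 2 * roughness_f2 * n)) ** 0.2

# Gaussian kernel constants: R(K) = 1/(2 sqrt(pi)), mu2(K) = 1.
GAUSS_R_K = 1.0 / (2.0 * math.sqrt(math.pi))
GAUSS_MU2 = 1.0

def normal_roughness_f2(sigma):
    """R(f'') = integral of f''(x)^2 dx for the N(0, sigma^2) density."""
    return 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)
```

With these constants, $h^*=(4/3)^{1/5}\sigma n^{-1/5}\approx 1.06\,\sigma n^{-1/5}$, which is the familiar normal-reference rule.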
The expected signature uniquely determines the law of a random rough path under a moment-growth condition, yet finite-sample bounds for estimating it from a single long dependent trajectory have been lacking. We study a stationary…
We construct and analyze generative diffusions that transport a point mass to a prescribed target distribution over a finite time horizon using the stochastic interpolant framework. The drift is expressed as a conditional expectation that…
Wasserstein distances are widely used in modern data analysis but pose significant computational and statistical challenges in high dimensions. The sliced Wasserstein distance alleviates these challenges by leveraging one-dimensional…
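The one-dimensional reduction can be sketched as a Monte Carlo estimate. Draw random directions $\theta$ uniformly on the sphere, project both samples onto $\theta$, and compute each 1D Wasserstein distance by matching sorted values. This minimal sketch assumes two empirical measures with the same number of equally weighted points. The function names, the number of projections, and the seed are illustrative assumptions.

```python
import math
import random

def _random_direction(dim, rng):
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0:
            return [c / norm for c in v]

def wasserstein_1d_pow(xs, ys, p):
    # For equal-size empirical measures on the line, the optimal
    # coupling matches the sorted values; returns W_p^p.
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)

def sliced_wasserstein(X, Y, p=2, n_projections=200, seed=0):
    """Monte Carlo estimate of SW_p between two point clouds of equal size."""
    if len(X) != len(Y):
        raise ValueError("this sketch assumes equally sized samples")
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_projections):
        theta = _random_direction(dim, rng)
        px = [sum(t * c for t, c in zip(theta, x)) for x in X]
        py = [sum(t * c for t, c in zip(theta, y)) for y in Y]
        total += wasserstein_1d_pow(px, py, p)
    # Average W_p^p over directions, then take the p-th root.
    return (total / n_projections) ** (1.0 / p)
```

Each projection costs one sort, so the cost grows like $O(Ln\log n)$ for $L$ directions and $n$ points, instead of requiring a full optimal transport solve.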
We consider a process $X^\varepsilon$ that solves a stochastic Volterra equation with an unknown parameter $\theta^\star$ in the drift function. The Volterra kernel is singular and includes, as an example, $K_0(u)=c\,u^{\alpha-1/2}\mathbf{1}_{\{u>0\}}$ with…
This paper re-examines the limit theorems of Abadie and Imbens for nearest-neighbor matching estimators of average treatment effects with a fixed number of matches. We establish, for the first time, a non-normalized central limit theorem…
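For reference, the Abadie-Imbens estimator with a fixed number $M$ of matches imputes each unit's missing potential outcome as the average outcome of its $M$ nearest neighbors in the opposite treatment arm, with replacement. It then averages the imputed unit-level differences. The sketch below assumes a scalar covariate and absolute-value distance. Unlike the original definition, which keeps all tied neighbors, it breaks ties by index order. The function name is hypothetical.

```python
def matching_ate(x, y, w, m=1):
    """Nearest-neighbor matching estimate of the ATE with m matches per unit.

    x -- scalar covariates, y -- observed outcomes, w -- treatment (0 or 1)
    """
    n = len(x)
    treated = [j for j in range(n) if w[j] == 1]
    control = [j for j in range(n) if w[j] == 0]
    if len(treated) < m or len(control) < m:
        raise ValueError("each arm needs at least m units")
    total = 0.0
    for i in range(n):
        pool = control if w[i] == 1 else treated
        # m closest units in the opposite arm (stable sort: ties by index).
        nearest = sorted(pool, key=lambda j: abs(x[j] - x[i]))[:m]
        imputed = sum(y[j] for j in nearest) / m
        # Unit-level effect estimate: Y_i(1) - Y_i(0) with one side imputed.
        total += (y[i] - imputed) if w[i] == 1 else (imputed - y[i])
    return total / n
```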
Independent component (IC) models are a standard tool for representing multivariate data in statistics, signal processing, and machine learning. Despite the extensive use of IC models, much less attention has been given to goodness-of-fit…
Importance sampling with data-driven proposal distributions is widely used in practice. A common workflow first generates an auxiliary sample of size $N$ from an approximation of the target distribution, constructs a density estimate $\hat…
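The workflow described here can be sketched in one dimension. Fit a Gaussian kernel density estimate $\hat q$ to the auxiliary sample, draw from $\hat q$ by picking an auxiliary point and adding kernel noise, then average $g(X)\,p(X)/\hat q(X)$. The sketch assumes a normalized target density $p$ and uses ordinary (not self-normalized) weights. The Gaussian kernel, the function names, and the bandwidth are illustrative assumptions. A Gaussian-kernel proposal can have lighter tails than the target, so the weights $p/\hat q$ may be large far out in the tails.

```python
import math
import random

def gaussian_kde(points, bandwidth):
    """Return q(x) = (1/N) sum_j phi((x - X_j)/h) / h for a Gaussian kernel."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    def q(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return q

def kde_importance_sampling(target_pdf, g, aux_sample, bandwidth, m, rng):
    """Ordinary IS estimate of E_p[g(X)] with a KDE of aux_sample as proposal."""
    q = gaussian_kde(aux_sample, bandwidth)
    total = 0.0
    for _ in range(m):
        # Sample from the KDE: a random auxiliary point plus N(0, h^2) noise.
        x = rng.choice(aux_sample) + rng.gauss(0.0, bandwidth)
        total += g(x) * target_pdf(x) / q(x)
    return total / m
```

For example, with a standard normal target, an auxiliary sample of size 200 drawn from the rough approximation $N(0.1, 1.1^2)$, and $g(x)=x^2$, the estimate should be close to $E[X^2]=1$.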
This paper studies a uniform projection criterion for space-filling designs under the stratified $L_2$-discrepancy. The criterion, denoted by $\Phi_{SD}$, is the average squared stratified $L_2$-discrepancy over all two-dimensional…
We study the problem of estimating a monotone function $f:\{0,1\}^d\to[0,1]$ from noisy observations at uniformly random vertices of the Boolean hypercube. As a measure of complexity for the target $f$, we use the total $L^1$-influence…
We study high-dimensional inference in correlated two-view models, focusing on spectral methods for strong detection and weak recovery. We introduce a general framework, motivated by a TAP-type heuristic from statistical physics, that…
This paper generalises inference functions (Godambe, 1960) to distributional statistical models, in which each probability measure is represented by a distribution--kernel pair $(T_\theta, \varphi) \in \mathcal S'(\mathbb R) \times \mathcal…
Given an observation $\mathbf Y \in \mathbb{R}^{d_1\times d_2}$ from the model $\mathbf Y = \mathbf X + \mathbf E$ where $\mathbf X$ is constant and $\mathbf E$ has i.i.d. $N(0,1)$ entries, we consider the problem of detecting a planted…
We introduce entropic strict minimum message length (SMML), a risk-sensitive generalization of strict minimum message length coding. The proposed criterion replaces expected two-part codelength under the prior predictive distribution with…
The asymptotic properties of multivariate Szász-Mirakyan estimators for cumulative distribution functions (cdfs) supported on the nonnegative orthant are investigated. Explicit bias and variance expansions are derived on compact subsets…
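In one dimension, a Szász-Mirakyan cdf estimator smooths the empirical cdf $F_n$ with Poisson weights: $\hat F_m(x)=\sum_{k\ge 0}F_n(k/m)\,e^{-mx}(mx)^k/k!$ for $x\ge 0$. In the tensor-product construction, the multivariate version on the nonnegative orthant replaces this weight with a product of such weights, one per coordinate. The sketch below covers only the univariate case and truncates the Poisson series far into its upper tail. The function names and the truncation rule are illustrative assumptions.

```python
import bisect
import math

def empirical_cdf(sample):
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect.bisect_right(xs, t) / n

def szasz_mirakyan_cdf(sample, m):
    """Univariate Szasz-Mirakyan smoothing of the empirical cdf on [0, inf)."""
    f_n = empirical_cdf(sample)
    def f_hat(x):
        if x < 0:
            return 0.0
        lam = m * x
        if lam == 0.0:
            return f_n(0.0)  # Poisson(0) puts all its mass at k = 0
        # Truncate the Poisson(lam) series about 10 standard deviations out.
        k_max = int(lam + 10.0 * math.sqrt(lam) + 20)
        total = 0.0
        for k in range(k_max + 1):
            # Poisson weight computed in log space to avoid underflow.
            log_w = -lam + k * math.log(lam) - math.lgamma(k + 1)
            total += f_n(k / m) * math.exp(log_w)
        return total
    return f_hat
```

Here $m$ acts as an inverse smoothing parameter. Larger $m$ concentrates the Poisson weights near $x$ and brings $\hat F_m$ closer to $F_n$.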
We consider a process $X^\varepsilon$ that solves a stochastic Volterra equation with an unknown parameter $\theta^\star$ in the drift function. The Volterra kernel is singular near zero, exhibiting a behavior comparable to $K_0(u)=cu^{\alpha-1}…
The rank correlation $\xi(X,Y)$, recently introduced by Sourav Chatterjee and already popular in the statistics literature, takes values in $[0,1]$, where $0$ characterizes independence of $X$ and $Y$, and $1$ characterizes perfect dependence of $Y$ on…
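For concreteness, with no ties, the sample version is computed as follows. Sort the pairs by $X$ and let $r_i$ be the rank of the $i$-th rearranged $Y$ value. Then $\xi_n=1-3\sum_{i=1}^{n-1}|r_{i+1}-r_i|/(n^2-1)$. This minimal sketch assumes no ties in $X$ or $Y$; Chatterjee's general definition handles ties separately. The function name is hypothetical.

```python
def chatterjee_xi(x, y):
    """Sample version xi_n(X, Y) of Chatterjee's rank correlation (no ties)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    # Rearrange the y values so that the paired x values are increasing.
    order = sorted(range(n), key=lambda j: x[j])
    y_sorted = [y[j] for j in order]
    # r[i] = 1-based rank of y_sorted[i] among all y values.
    r = [0] * n
    for rank, j in enumerate(sorted(range(n), key=lambda k: y_sorted[k]), start=1):
        r[j] = rank
    s = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * s / (n * n - 1)
```

For $Y=X$ and for $Y=-X$ alike, the statistic equals $1-3/(n+1)$, which tends to 1. This reflects that $\xi$ measures how well $Y$ is determined by $X$, not the direction of dependence.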
High-dimensional linear regression has been thoroughly studied in the context of independent and identically distributed data. We propose to investigate high-dimensional regression models for independent but non-identically distributed…
Randomized experiments are the gold standard for estimating treatment effects, and randomization serves as a reasoned basis for inference. In widely used stratified randomized experiments, randomization-based finite-population asymptotic…
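The standard point estimator in stratified randomized experiments, around which randomization-based asymptotics are typically built, is the stratified difference in means $\hat\tau=\sum_h (N_h/N)(\bar Y_{1h}-\bar Y_{0h})$. The minimal sketch below assumes 0/1 treatment indicators and at least one treated and one control unit per stratum. The function name is hypothetical.

```python
from collections import defaultdict

def stratified_difference_in_means(strata, treatment, outcome):
    """tau_hat = sum_h (N_h / N) * (mean treated in h - mean control in h)."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for h, z, y in zip(strata, treatment, outcome):
        groups[h][z].append(y)
    n = len(outcome)
    est = 0.0
    for h, arms in groups.items():
        if not arms[0] or not arms[1]:
            raise ValueError(f"stratum {h!r} needs treated and control units")
        n_h = len(arms[0]) + len(arms[1])
        diff = sum(arms[1]) / len(arms[1]) - sum(arms[0]) / len(arms[0])
        # Weight each within-stratum contrast by the stratum's share N_h / N.
        est += (n_h / n) * diff
    return est
```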