统计理论
As for other latent-variable problems, exact Bayesian analysis is typically not practicable for mixture problems and approximate methods have been developed. Variational Bayes tends to produce approximate posterior distributions for…
When $K$ models are evaluated on the same validation set of size $n$, the selected winner's apparent performance is biased upward. Suppose $K$ models are evaluated on a shared sequence of i.i.d. observations $X_1,\dots, X_n$, where model…
We prove finite-sample concentration and anti-concentration bounds for dimension estimation using Gaussian kernel sums. Our bounds provide explicit dependence on sample size, bandwidth, and local geometric and distributional parameters,…
In differential privacy, random noise is introduced to privatize summary statistics of a sensitive dataset before releasing them. The noise level determines the privacy loss, which quantifies how easily an adversary can detect a target…
This paper formally derives the asymptotic distribution of a goodness-of-fit test based on the Kernel Stein Discrepancy introduced in (Oscar Key et al., "Composite Goodness-of-fit Tests with Kernels", Journal of Machine Learning Research…
This article demonstrates how recent developments in the theory of empirical processes allow us to construct a new family of asymptotically distribution-free smooth tests. Their distribution-free property is preserved even when the…
Sparsity in a regression context makes the model itself an object of interest, pointing to a confidence set of models as the appropriate presentation of evidence. A difficulty in areas such as genomics, where the number of candidate…
In recent years, there has been a growing interest in information measures that quantify inaccuracy and uncertainty in systems. In this paper, we introduce a novel concept called the Weighted Fractional Cumulative Residual Inaccuracy…
We consider the problem of estimating quantile treatment effects without assuming strict overlap , i.e., we do not assume that the propensity score is bounded away from zero. More specifically, we consider an inverse probability weighting…
The Grenander estimator is a well-studied procedure for univariate nonparametric density estimation. It is usually defined as the Maximum Likelihood Estimator (MLE) over the class of all non-increasing densities on the positive real line.…
We consider the sparse estimation for stochastic processes with possibly infinite-dimensional nuisance parameters, by using the Dantzig selector which is a sparse estimation method similar to $Z$-estimation. When a consistent estimator for…
The integrated density of states (IDS) is a fundamental spectral quantity for quantum Hamiltonians modeling condensed matter systems, describing how densely energy levels are distributed. It can be interpreted as a volume-averaged spectral…
Transfer learning aims to improve inference in a target domain by leveraging information from related source domains, but its effectiveness critically depends on how cross-domain heterogeneity is modeled and controlled. When the conditional…
The global clustering coefficient serves as a powerful metric for the structural analysis and comparison of complex networks. Random geometric graphs offer a realistic framework for representing the spatial constraints and geometry often…
We introduce a novel class of non-stationary covariance functions for random fields on linear networks that allows both the variance and the correlation range of the random field to vary spatially. The proposed covariance functions are…
We provide a rigorous random matrix theory analysis of spiked cross-covariance models where the signals across two high-dimensional data channels are partially aligned. These models are motivated by multi-modal learning and form the…
We study one-sided and $\alpha$-correct sequential hypothesis testing for data generated by an ergodic Markov chain. The null hypothesis is that the unknown transition matrix belongs to a prescribed set $P$ of stochastic matrices, and the…
Self-distillation (SD) is the process of retraining a student on a mixture of ground-truth labels and the teacher's own predictions using the same architecture and training data. Although SD has been empirically shown to often improve…
Consider the sequential testing of binary outcomes. The a posteriori belief process and its objective conditional-probability counterpart generally differ but converge to the same result in well-defined tests. We show that unless the two…
Changepoint localization is the problem of estimating the index at which a change occurred in the data generating distribution of an ordered list of data, or declaring that no change occurred. We present the broadly applicable MCP…