Statistics Theory
Efron et al. (2004) introduced least angle regression (LAR) as an algorithm for linear predictions, intended as an alternative to forward selection with connections to penalized regression. However, LAR has remained somewhat of a "black…
Testing whether two multivariate samples exhibit the same extremal behavior is an important problem in various fields including environmental and climate sciences. While several ad-hoc approaches exist in the literature, they often lack…
Federated learning enables institutions to train predictive models collaboratively without sharing raw data, addressing privacy and regulatory constraints. In the standard horizontal setting, clients hold disjoint cohorts of individuals and…
We tackle the natural question of whether it is possible to estimate conditional distributions via Sklar's theorem by separately estimating the conditional distributions of the underlying copula and the marginals. Working with so-called…
Contaminated mixture of experts (MoE) is motivated by transfer learning methods where a pre-trained model, acting as a frozen expert, is integrated with an adapter model, functioning as a trainable expert, in order to learn a new task.…
We consider the efficient inference of finite dimensional parameters arising in the context of inverse problems. Our setup is the observation of a transformation of an unknown infinite dimensional signal $f$ corrupted by statistical noise,…
We consider the heat equation with absorption in a bounded domain of $\mathbb{R}^d$, where both the scalar diffusivity and the absorption function are unknown. We investigate a Bayesian approach for recovering the diffusivity from a noisy…
We study the asymptotic behavior of the spectra of matrices of the form $S_n = \frac{1}{n}XX^*$, where $X = \sum_{r=1}^K X_r$ with $X_r = A_r^{1/2} Z_r B_r^{1/2}$, $K \in \mathbb{N}$, and $A_r, B_r$ are sequences of positive…
Finite mixture models are ubiquitous in modern statistical modeling, and a recurring practical issue is choosing the model order. In \citet[Sankhy\=a Series A, \textbf{62}, pp. 49--66]{keribin2000consistent}, the Bayesian information…
Species sampling processes have long served as the fundamental framework for modeling random discrete distributions and exchangeable sequences. However, data arising from distinct but related sources require a broader notion of…
Conditional Feature Importance (CFI) is a classical variable importance measure that accounts for the relationship between the studied feature and the others. However, CFI has not yet been studied from a theoretical perspective because the…
We present a new framework for statistical inference on Riemannian manifolds that achieves high-order accuracy, addressing the challenges posed by non-Euclidean parameter spaces frequently encountered in modern data science. Our approach…
The envelope of an elliptical Gaussian complex vector, or equivalently, the amplitude or norm of a bivariate normal random vector, has applications in many weather and signal processing contexts. We explicitly characterize its distribution in…
Conditional independence testing (CIT) is essential for reliable scientific discovery. It prevents spurious findings and enables controlled feature selection. Recent CIT methods have used machine learning (ML) models as surrogates of the…
We propose a novel framework for measuring privacy from a Bayesian game-theoretic perspective. This framework enables the creation of new, purpose-driven privacy definitions that are rigorously justified, while also allowing for the…
Conformal prediction (CP) is a distribution-free method to construct reliable prediction intervals that has gained significant attention in recent years. Despite its success and various proposed extensions, a significant practical feature…
Stochastic optimization in learning and inference often relies on Markov chain Monte Carlo (MCMC) to approximate gradients when exact computation is intractable. However, finite-time MCMC estimators are biased, and reducing this bias…
Change point detection in covariance structures is a fundamental and crucial problem for sequential data. Under the high-dimensional setting, most of the existing research has focused on identifying change points in historical data.…
A well-defined distance on the parameter space is key to evaluating estimators, ensuring consistency, and building confidence sets. While there are typically standard distances to adopt in a continuous space, this is not the case for…
Recently, weighted cumulative residual Tsallis entropy has been introduced in the literature as a generalization of weighted cumulative residual entropy. We study some new properties of the weighted cumulative residual Tsallis entropy measure.…