Mathematics
We perform a mathematical and statistical analysis of the Wasserstein least squares problem, a regression method for vector-valued covariates and distribution-valued responses. Our proposal contrasts with other distributional regression…
A central question in high-dimensional statistics is to understand statistical--computational gaps: regimes in which recovering a hidden signal is information-theoretically possible but conjectured to be computationally intractable. The…
We study estimation in the low signal-to-noise ratio (SNR) regime for a broad class of Gaussian latent-variable models, including Gaussian mixtures and orbit recovery problems. We show that, in this regime, the generalized method-of-moments…
Hjort and Glad (1995) present a method for semiparametric density estimation. Relative to the ordinary kernel density estimator, this technique performs much better when a parametric vehicle distribution fits the data, and otherwise…
The term Gibbons conjecture is widely used in connection with symmetry results for the Allen-Cahn equation. However, its origin is less transparent than its frequent citation suggests. In this note, we revisit its emergence, tracing it to a…
We introduce the \emph{Topological Stability Index} (TSI), a variance-based scalar measure for persistence barcodes that quantifies the dispersion of persistence lifetimes. Unlike persistent entropy, which depends only on normalized…
We introduce an Indian-buffet-type model for multi-factorial innovation in which each arriving agent may exhibit both previously observed and new features. The number of new features exhibits power-law behavior, while the probability of…
We develop a natural Bayesian multiplicity-correcting prior distribution within the probabilistic forward stepwise representation of model space priors for regression problems. The proposed prior, obtained from making an analogy to the Holm…
We derive a scale-free bound on the density of the maximum of a centered Gaussian vector. The basic bound is non-uniform, depends logarithmically on the dimension, and allows any covariance matrix. When the largest marginal variance is…
Encouraged by [AKRS], which provides an amazing bridge between Statistics and Invariant Theory, and especially by [FM], where quiver semi-invariant techniques are applied to verify the existence of the MLE for a recent iPCA model, we provide an…
In this paper, we consider the problem of simultaneous testing of multivariate normal means under arbitrary covariance dependence. Specifically, let $\boldsymbol{X}\sim N_n(\boldsymbol{\theta},\boldsymbol{\Sigma})$, where…
Many Bayesian inference problems involve high-dimensional models where the performance of standard importance sampling (IS) methods often degrades rapidly as the dimensionality increases. Classical analyses of IS typically rely on the…
We study a generalization of the classical hidden clique problem to graphs with real-valued edge weights. Formally, we define a hypothesis testing problem. Under the null hypothesis, edges of a complete graph on $n$ vertices are associated…
We introduce a new version of the KL-divergence for Gaussian distributions, based on Wasserstein geometry and referred to as the WKL-divergence. We show that this version is consistent with the geometry of the sample space $\mathbb{R}^n$.…
This paper presents a multivariate normal integral representation for the joint survival function of the cumulative sums of the components of any multinomial random vector at interior lattice points. This result can be viewed as a…
The proposed goodness-of-fit (GoF) test for checking the linear autocorrelation model in a functional time series is based on an empirical process, whose residual marks and covariate index set are in a separable Hilbert space $\mathbb{H}$.…
Integrating probability and nonprobability survey samples is an important problem in modern survey sampling. Nonprobability samples often contain rich outcome information but may lack population representativeness, whereas probability…
In Grayson's combinatorial description of higher K-groups, the generators are bounded acyclic binary multi-complexes of arbitrary size. Generalising work by Kasprowski, Winges and the author, we show in this paper that multi-complexes of…
The literature on hypothesis testing with data-dependent and post-hoc significance levels relies on a particular extension of the Type-I error to data-dependent levels. Existing arguments for this extension are heuristic, and primarily…
For models evaluated at a random set of independent variables, the variance-based Shapley effects range between Sobol' indices, and the corresponding total indices admit derivative-based upper bounds. Such relationships fail when the inputs…