Statistics Theory
In this paper, we propose a variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate for the asymptotic variance. We focus on the particular case when control variates…
The least trimmed squares (LTS) estimator is popular in the location, regression, machine learning, and AI literature. Although the empirical version of LTS has been studied repeatedly in the literature, the population…
The functional linear regression model has been widely studied and utilized for dealing with functional predictors. In this paper, we study the Nystr\"om subsampling method, a strategy used to tackle the computational complexities inherent…
A variety of statistics based on sample spacings has been studied in the literature for testing goodness-of-fit to parametric distributions. To test the goodness-of-fit to a nonparametric class of univariate shape-constrained densities,…
Spectral algorithms are some of the main tools in optimization and inference problems on graphs. Typically, the graph is encoded as a matrix and eigenvectors and eigenvalues of the matrix are then used to solve the given graph problem.…
Consider the sum $Y=B+B(H)$ of a Brownian motion $B$ and an independent fractional Brownian motion $B(H)$ with Hurst parameter $H\in(0,1)$. Even though $B(H)$ is not a semimartingale, it was shown in [\textit{Bernoulli} \textbf{7} (2001)…
This work studies the distribution of the nonsymmetric matrix $\mathbf{E}^{-1}\mathbf{H}$. This random product is of fundamental interest under the general multivariate linear hypothesis setting. Specifically, when $\mathbf{H}$ and…
The scale function is central to the fluctuation theory of L\'evy processes, particularly in addressing exit problems. However, its definition is established through the Laplace transform, thereby lacking explicit…
Robust Bayesian methods for high-dimensional regression problems under diverse sparse regimes are studied. Traditional shrinkage priors are primarily designed to detect a handful of signals from tens of thousands of predictors in the…
The Gromov-Wasserstein (GW) distance enables comparing metric measure spaces based solely on their internal structure, making it invariant to isomorphic transformations. This property is particularly useful for comparing datasets that…
In this paper, we investigate inaccuracy measures based on record values, focusing on the relationship between the distribution of the n-th upper and lower k-record values and the parent distribution. We extend the classical Kerridge…
Clustering is a fundamental tool in statistical machine learning in the presence of heterogeneous data. Most recent results focus primarily on optimal mislabeling guarantees when data are distributed around centroids with sub-Gaussian…
Supervised learning problems with side information in the form of a network arise frequently in applications in genomics, proteomics and neuroscience. For example, in genetic applications, the network side information can accurately capture…
We develop a Nonparametric Empirical Bayes (NEB) framework for compound estimation in the discrete linear exponential family, which includes a wide class of discrete distributions frequently arising from modern big data applications. We…
Correlation matrices are omnipresent in multivariate data analysis. When the number d of variables is large, the sample estimates of correlation matrices are typically noisy and conceal underlying dependence patterns. We consider the case…
We investigate the complexity of covariance matrix estimation for Gibbs distributions based on dependent samples from a Markov chain. We show that when $\pi$ satisfies a Poincar\'e inequality and the chain possesses a spectral gap, we can…
Monte Carlo matrix trace estimation is a popular randomized technique to estimate the trace of implicitly-defined matrices via averaging quadratic forms across several observations of a random vector. The most common approach to analyze the…
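As a minimal illustration of the "averaging quadratic forms" idea mentioned in the abstract above, the sketch below estimates $\mathrm{tr}(A)$ by averaging $z^\top A z$ over random probe vectors $z$. The choice of Rademacher ($\pm 1$) probes, the function name `hutchinson_trace`, and the small test matrix are assumptions made for illustration; the abstract does not specify the probe distribution or any implementation.

```python
import random

def hutchinson_trace(matvec, dim, num_samples=2000, seed=0):
    """Estimate tr(A) using only matrix-vector products with A.

    Assumption: probe vectors have i.i.d. Rademacher (+1/-1) entries,
    so that E[z^T A z] = tr(A).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)  # A is accessed only through this product
        total += sum(zi * ai for zi, ai in zip(z, az))  # quadratic form z^T A z
    return total / num_samples

# Hypothetical 2x2 example with true trace 2 + 3 = 5.
A = [[2.0, 1.0], [1.0, 3.0]]

def matvec(v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

estimate = hutchinson_trace(matvec, dim=2)
print(estimate)  # close to 5.0, up to Monte Carlo error
```

Because the matrix enters only through `matvec`, the same sketch applies when the matrix is implicitly defined and never formed explicitly.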
Higher-order spectra (or polyspectra), defined as the Fourier transform of a stationary process' autocumulants, are useful in the analysis of nonlinear and non-Gaussian processes. Polyspectral means are weighted averages over Fourier…
We study the problem of bivariate discrete or continuous probability density estimation under low-rank constraints. For discrete distributions, we assume that the two-dimensional array to be estimated is a low-rank probability matrix. In the…
Many methods for estimating integrated volatility and related functionals of semimartingales in the presence of jumps require specification of tuning parameters for their use in practice. In much of the available theory, tuning parameters…