Statistics Theory
We conduct an in-depth analysis of the Bayes risk of clustering in the context of Hidden Markov and i.i.d. models. In both settings, we identify the situations where this risk is comparable to the Bayes risk of classification and those…
We study nonparametric Bayesian inference for the intensity function of a covariate-driven point process. We extend recent results from the literature, showing that a wide class of Gaussian priors, combined with flexible link functions,…
The graphical lasso (glasso) is an $\ell_1$-penalised likelihood estimator for a Gaussian precision matrix. A benefit of the glasso is that it exists even when the sample covariance matrix is not positive definite but only positive…
This work proposes a method for modeling and forecasting mortality rates. It constitutes an improvement over previous studies by incorporating both the historical evolution of the mortality phenomenon and its random behavior. In the first…
This paper introduces a novel framework to construct the probability density function (PDF) of non-negative continuous random variables. The proposed framework uses two functions: one is the survival function (SF) of a non-negative…
Sampling from binary quadratic distributions (BQDs) is a fundamental but challenging problem in discrete optimization and probabilistic inference. Previous work established theoretical guarantees for stochastic localization (SL) in…
We investigate nonparametric estimation of sliced inverse regression (SIR) via the $k$-nearest neighbors approach with a kernel. An estimator of the covariance matrix of the conditional expectation of the explanatory random vector given the…
Optimal Transport (OT) is a resource allocation problem with applications in biology, data science, economics and statistics, among others. In some of the applications, practitioners have access to samples which approximate the continuous…
We consider the problem of detecting a community of densely connected vertices in a high-dimensional bipartite graph of size $n_1 \times n_2$. Under the null hypothesis, the observed graph is drawn from a bipartite Erd\H{o}s-R\'enyi…
We study Bayesian posterior consistency in parametric density models with proper priors, challenging the perception that the problem is settled. Classical results established consistency via MLE convergence under regularity and…
In this paper, we propose two new estimators of Spearman's footrule, a multivariate rank correlation coefficient, both based on two general estimators for Average Orthant Dependence measures. We compare the new proposals with a…
This paper studies phase transitions for the existence of unregularized M-estimators under proportional asymptotics where the sample size $n$ and feature dimension $p$ grow proportionally with $n/p \to \delta \in (1, \infty)$. We study the…
The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d.\ samples required to distinguish between two distributions $p$ and $q$ in either: (i) the prior-free setting, with type-I error at most $\alpha$ and…
The spectral clustering algorithm is often used as a binary clustering method for unclassified data by applying principal component analysis. To study theoretical properties of the algorithm, the assumption of conditional…
Randomized singular value decomposition (RSVD) is a class of computationally efficient algorithms for computing the truncated SVD of large data matrices. Given an $m \times n$ matrix $\widehat{{\mathbf M}}$, the prototypical RSVD algorithm…
The composite binary hypothesis testing problem within the Neyman-Pearson framework is considered. The goal is to maximize the expectation of a nonlinear function of the detection probability, integrated with respect to a given probability…
We study the stochastic linear bandit problem with multiple arms over $T$ rounds, where the covariate dimension $d$ may exceed $T$, but each arm-specific parameter vector is $s$-sparse. We begin by analyzing the sequential estimation…
Patients with chronic diseases often receive treatments at multiple time points, or stages. Our goal is to learn the optimal dynamic treatment regime (DTR) from longitudinal patient data. When both the number of stages and the number of…
Statistical depth functions order the elements of a space with respect to their centrality in a probability distribution or dataset. Since many depth functions are maximized on the real line at the median, they provide a natural approach to…
Entropic optimal transport (EOT) presents an effective and computationally viable alternative to unregularized optimal transport (OT), offering diverse applications for large-scale data analysis. In this work, we derive novel statistical…