Statistics Theory
Post-data statistical inference concerns making probability statements about model parameters conditional on observed data. When a priori knowledge about parameters is available, post-data inference can be conveniently made from Bayesian…
Establishing the limiting distribution of Chatterjee's rank correlation for a general, possibly non-independent, pair of random variables has been eagerly awaited by many. This paper shows that (a) Chatterjee's rank correlation is…
We study the advantages of accelerated gradient methods, specifically based on the Frank-Wolfe method and projected gradient descent, for privacy and heavy-tailed robustness. Our approaches are as follows: For the Frank-Wolfe method, our…
Since its introduction as a computable approximation of the Reeb graph, the Mapper graph has become one of the most popular tools from topological data analysis for performing data visualization and inference. However, finding an…
We consider the problem of estimating the parameters of a non-stationary Hawkes process with time-dependent reproduction rate and baseline intensity. Our approach relies on the standard maximum likelihood estimator (MLE), coinciding with…
Square contingency tables are traditionally analyzed with a focus on the symmetric structure of the corresponding probability tables. We view probability tables as elements of a simplex equipped with the Aitchison geometry. This perspective…
Given an undirected and connected graph $G$ on $T$ vertices, suppose each vertex $t$ has a latent signal $x_t \in \mathbb{R}^n$ associated to it. Given partial linear measurements of the signals, for a potentially small subset of the…
A central problem in data science is to use potentially noisy samples of an unknown function to predict values for unseen inputs. In classical statistics, predictive error is understood as a trade-off between the bias and the variance that…
In practice, the use of rounding is ubiquitous. Although researchers have looked at the implications of rounding continuous random variables, rounding may also be applied to functions of discrete random variables. For example, to infer the…
This paper deals with testing the equality of $k$ ($k\ge 2$) distribution functions against possible stochastic ordering among them. Two classes of rank tests are proposed for this testing problem. The statistics of the tests under study…
We study the empirical version of halfspace depths with the objective of establishing a connection between the rates of convergence and the tail behaviour of the corresponding underlying distributions. The intricate interplay between the…
This paper considers the order estimation problem of stochastic autoregressive exogenous input (ARX) systems by using quantized data. Based on the least squares algorithm and inspired by the control systems information criterion (CIC), a…
We consider a regression framework where the design points are deterministic and the errors possibly non-i.i.d. and heavy-tailed (with a moment of order $p$ in $[1,2]$). Given a class of candidate regression functions, we propose a…
We study the high-dimensional uniformity testing problem, which involves testing whether the underlying distribution is the uniform distribution, given $n$ data points on the $p$-dimensional unit hypersphere. While this problem has been…
To our knowledge, the analysis of convergence rates for persistence diagrams estimation from noisy signals has predominantly relied on lifting signal estimation results through sup-norm (or other functional norm) stability theorems. We…
Scoring rules promote rational and honest decision-making, which is important for model evaluation and is becoming increasingly important for automated procedures such as 'AutoML'. In this paper we survey common squared and logarithmic scoring…
We consider generalization bounds for estimating the drift component of ergodic stochastic differential equations, when the estimator is a ReLU neural network and the estimation is non-parametric with…
In this paper, we investigate a class of approximate Gaussian processes (GP) obtained by taking a linear combination of compactly supported basis functions with the basis coefficients endowed with a dependent Gaussian prior distribution.…
We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which…
Kernel density estimation is a popular method for estimating unseen probability distributions. However, the convergence of these classical estimators to the true density slows down in high dimensions. Moreover, they do not define meaningful…