Statistical Theory
This study presents new closed-form estimators for the Dirichlet and the Multivariate Gamma distribution families, for which the maximum likelihood estimator cannot be derived explicitly. The methodology builds upon the score-adjusted estimators…
A simple way of obtaining robust estimates of the "center" (or the "location") and of the "scatter" of a dataset is to use the maximum likelihood estimate with a class of heavy-tailed distributions, regardless of the "true" distribution…
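The mechanism can be illustrated in one dimension. The sketch below is a minimal univariate version, assuming a Student-t model with a fixed degrees-of-freedom parameter `nu` (the value 3 is an arbitrary illustrative choice, and the function name is hypothetical). It is fitted by the standard EM / iteratively reweighted iteration. Points far from the current center get small weights, which is what makes the fit robust. In the multivariate case the squared standardized residual is replaced by a Mahalanobis distance.

```python
import math

def t_location_scale(xs, nu=3.0, iters=500, tol=1e-10):
    """MLE of location and scale for a Student-t with FIXED nu (an assumption), via EM."""
    n = len(xs)
    mu = sorted(xs)[n // 2]                     # start at the (upper) median
    sigma2 = sum((x - mu) ** 2 for x in xs) / n or 1.0
    for _ in range(iters):
        # E-step: weight (nu + 1) / (nu + standardized squared residual);
        # outlying points receive small weights.
        w = [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in xs]
        # M-step: weighted mean, then weighted scatter around the new mean.
        new_mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        new_sigma2 = sum(wi * (x - new_mu) ** 2 for wi, x in zip(w, xs)) / n
        if new_sigma2 == 0.0:                   # degenerate data: all points identical
            return new_mu, 0.0
        converged = abs(new_mu - mu) < tol and abs(new_sigma2 - sigma2) < tol
        mu, sigma2 = new_mu, new_sigma2
        if converged:
            break
    return mu, math.sqrt(sigma2)
```

On data such as `[0.1, -0.2, 0.05, 0.3, -0.1, 50.0]`, the sample mean is above 8. The t-based center stays close to the bulk of the data because the single outlier is almost fully downweighted.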
Constructing nonasymptotic confidence intervals (CIs) for the mean of a univariate distribution from independent and identically distributed (i.i.d.) observations is a fundamental task in statistics. For bounded observations, a classical…
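The truncated sentence does not say which classical bound it refers to. As an illustration only, the sketch below uses Hoeffding's inequality as one standard choice for observations in a known interval $[a, b]$. Setting $2\exp(-2nt^2/(b-a)^2) = \delta$ gives the half-width $t = (b-a)\sqrt{\log(2/\delta)/(2n)}$. The function name is hypothetical.

```python
import math

def hoeffding_ci(xs, a, b, delta=0.05):
    """Two-sided (1 - delta) Hoeffding CI for the mean of i.i.d. observations in [a, b]."""
    n = len(xs)
    mean = sum(xs) / n
    # Solve 2 * exp(-2 n t^2 / (b - a)^2) = delta for the half-width t.
    half_width = (b - a) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    # The true mean lies in [a, b], so clipping the interval loses nothing.
    return max(a, mean - half_width), min(b, mean + half_width)
```

The width depends only on $n$, $\delta$ and the range $b - a$, not on the variance. This is one reason sharper, variance-adaptive intervals are of interest.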
We consider the problem of sequential change detection, where the goal is to design a scheme for detecting any changes in a parameter or functional $\theta$ of the data stream distribution that has small detection delay, but guarantees…
This paper introduces a Factor Augmented Sparse Throughput (FAST) model that utilizes both latent factors and sparse idiosyncratic components for nonparametric regression. The FAST model bridges factor models on one end and sparse…
It is common to model a deterministic response function, such as the output of a computer experiment, as a Gaussian process with a Matérn covariance kernel. The smoothness parameter of a Matérn kernel determines many important…
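For the half-integer smoothness values $\nu \in \{1/2, 3/2, 5/2\}$, the Matérn covariance has a well-known closed form. General $\nu$ needs the modified Bessel function $K_\nu$. The sketch below covers only these three cases. The parameter names `lengthscale` and `variance` are illustrative. Larger $\nu$ makes the kernel flatter near $r = 0$, which corresponds to smoother sample paths.

```python
import math

def matern(r, nu, lengthscale=1.0, variance=1.0):
    """Matérn covariance k(r); only the closed-form cases nu in {0.5, 1.5, 2.5} are handled."""
    s = abs(r) / lengthscale
    if nu == 0.5:      # exponential kernel
        poly, rate = 1.0, 1.0
    elif nu == 1.5:
        poly, rate = 1.0 + math.sqrt(3) * s, math.sqrt(3)
    elif nu == 2.5:
        poly, rate = 1.0 + math.sqrt(5) * s + 5.0 * s * s / 3.0, math.sqrt(5)
    else:
        raise ValueError("general nu requires the Bessel-function form (not implemented here)")
    return variance * poly * math.exp(-rate * s)
```

At a small lag such as `r = 0.1`, the correlation increases with `nu`. This illustrates how strongly the assumed smoothness shapes the fitted model.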
The fixed-X knockoff filter is a flexible framework for variable selection with false discovery rate (FDR) control in linear models with arbitrary design matrices (of full column rank) and it allows for finite-sample selective inference via…
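The final selection step of the knockoff filter can be sketched once feature statistics $W_j$ are available. Large positive $W_j$ is evidence for variable $j$ over its knockoff. The sketch uses the knockoff+ threshold of Barber and Candès (2015), $T = \min\{t > 0 : (1 + \#\{j : W_j \le -t\}) / \max(1, \#\{j : W_j \ge t\}) \le q\}$, with $t$ ranging over the nonzero $|W_j|$. Constructing the knockoff matrix and computing $W$ are not shown. The function name is hypothetical.

```python
def knockoff_plus_select(W, q=0.1):
    """Indices selected by the knockoff+ threshold at target FDR level q, given statistics W."""
    # Candidate thresholds: the distinct nonzero |W_j|, in increasing order,
    # so the first one that passes is the minimum T.
    for t in sorted({abs(w) for w in W if w != 0}):
        fdp_estimate = (1 + sum(1 for w in W if w <= -t)) / max(1, sum(1 for w in W if w >= t))
        if fdp_estimate <= q:
            return [j for j, w in enumerate(W) if w >= t]
    return []  # T = +infinity: nothing is selected
```

The "+1" in the numerator is what gives knockoff+ exact finite-sample FDR control. It also means that at least about $1/q$ variables must be selected before anything is reported.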
We consider the problem of nonparametric estimation of the drift and diffusion coefficients of a Stochastic Differential Equation (SDE), based on $n$ independent replicates $\left\{X_i(t)\::\: t\in [0,1]\right\}_{1 \leq i \leq n}$, observed…
The present contribution investigates multivariate bootstrap procedures for general stabilizing statistics, with specific application to topological data analysis. Existing limit theorems for topological statistics prove difficult to use in…
It is conventionally believed that a permutation test should ideally use all permutations. If this is computationally unaffordable, it is believed one should use the largest affordable Monte Carlo sample or (algebraic) subgroup of…
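For reference, the conventional Monte Carlo approach described here can be sketched for a two-sample difference in means. The statistic, the function name and the number of draws `n_perm` are illustrative assumptions. The observed labelling is counted as one of the permutations, which keeps the p-value valid at every sample size.

```python
import random

def perm_test_mean_diff(x, y, n_perm=9999, seed=0):
    """Monte Carlo permutation p-value for |mean(x) - mean(y)| (illustrative statistic)."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n = len(x)

    def stat(values):
        a, b = values[:n], values[n:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(pooled)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # uniformly random relabelling
        if stat(pooled) >= observed:
            exceed += 1
    # Including the observed labelling gives p >= 1 / (n_perm + 1).
    return (1 + exceed) / (1 + n_perm)
```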
Feature alignment methods are used in many scientific disciplines for data pooling, annotation, and comparison. As an instance of a permutation learning problem, feature alignment presents significant statistical and computational…
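To make "permutation learning" concrete, here is a toy sketch, not the method of the work summarized above. Each feature is described by a profile vector (an assumption). The permutation aligning the features of dataset A with those of dataset B is chosen to minimize the total squared distance between matched profiles, by exhaustive search. Exhaustive search costs $p!$ and only works for a handful of features. Larger problems are usually solved as a linear assignment problem, for example with the Hungarian algorithm in `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations

def align_features(a, b):
    """Hypothetical helper: permutation pi minimising sum_i ||a[i] - b[pi[i]]||^2 (small p only)."""
    p = len(a)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(p)):
        # Squared Euclidean distance between matched feature profiles.
        cost = sum(sum((u - v) ** 2 for u, v in zip(a[i], b[perm[i]])) for i in range(p))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost
```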
Tensors have broad applications in neuroimaging, data mining, digital marketing, and other fields. CANDECOMP/PARAFAC (CP) tensor decomposition can substantially reduce the number of parameters, achieving dimensionality reduction, and thus plays a key role in…
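The parameter saving can be made concrete. A rank-$R$ CP model writes an $I \times J \times K$ tensor as $X_{ijk} = \sum_{r=1}^{R} A_{ir} B_{jr} C_{kr}$. This takes $R(I + J + K)$ numbers instead of $IJK$. For a $64 \times 64 \times 64$ tensor with $R = 5$, that is $960$ parameters instead of $262{,}144$. The pure-Python sketch below uses hypothetical function names and absorbs the usual component weights into the factor matrices.

```python
def cp_reconstruct(A, B, C):
    """Rebuild X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r] from CP factor matrices."""
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]

def cp_param_count(dims, rank):
    """(entries of the full tensor, parameters of a rank-`rank` CP model)."""
    full = 1
    for d in dims:
        full *= d
    return full, rank * sum(dims)
```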
There is growing interest in improving our algorithmic understanding of fundamental statistical problems such as mean estimation, driven by the goal of understanding the limits of what we can extract from valuable data. The state of the art…
For $d \ge 2$, let $X$ be a random vector having a Bingham distribution on $\mathcal{S}^{d-1}$, the unit sphere centered at the origin in $\mathbb{R}^d$, and let $\Sigma$ denote the symmetric matrix parameter of the distribution. Let $\Psi(\Sigma)$…
In this manuscript, we discuss a class of difference-based estimators of the autocovariance structure in a semiparametric regression model where the signal is discontinuous and the errors are serially correlated. The signal in this model…
Properties of strong mixing have been established for the stationary linear Hawkes process in the univariate case, and can serve as a basis for statistical applications. In this paper, we provide the technical arguments needed to extend the…
Nonparametric estimators built by minimizing the mean squared relative error are gaining popularity because they are more robust to outliers than the Nadaraya-Watson estimator. In this paper, we build a relative…
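The basic relative-error regression function follows from a one-line derivation. For $Y > 0$, minimizing $E[(1 - m/Y)^2 \mid X = x]$ over $m$ gives the first-order condition $E[Y^{-1}(1 - m/Y) \mid x] = 0$. Hence $m(x) = E[Y^{-1} \mid x] / E[Y^{-2} \mid x]$. Replacing both conditional expectations with kernel-weighted averages gives a simple estimator. The sketch below compares it with Nadaraya-Watson. It assumes a Gaussian kernel and a user-chosen bandwidth `h`, and it is not necessarily the estimator constructed in the paper.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted mean of Y near x0 (minimises squared error)."""
    w = [gaussian_kernel((x - x0) / h) for x in xs]
    return sum(wi * y for wi, y in zip(w, ys)) / sum(w)

def relative_error_kernel(x0, xs, ys, h):
    """Kernel estimate of E[1/Y | x0] / E[1/Y^2 | x0]; assumes all ys > 0."""
    w = [gaussian_kernel((x - x0) / h) for x in xs]
    num = sum(wi / y for wi, y in zip(w, ys))
    den = sum(wi / (y * y) for wi, y in zip(w, ys))
    return num / den
```

Each observation effectively enters with weight proportional to $1/Y^2$. Large outlying responses therefore have little influence, while responses close to zero become influential.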
Integer-valued time series arise in economics, finance, biology, computer science, medicine, insurance, and many other fields. In recent years, many types of models have been proposed to model integer-valued time series data, in…
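As one classical example of such a model, not necessarily one discussed in the work summarized above, the Poisson INAR(1) process is $X_t = \alpha \circ X_{t-1} + \varepsilon_t$. Here $\alpha \circ X$ is binomial thinning (each of the $X$ counts survives independently with probability $\alpha$) and $\varepsilon_t \sim \text{Poisson}(\lambda)$. Its stationary marginal is $\text{Poisson}(\lambda / (1 - \alpha))$. The simulation sketch below uses only the standard library. The function names are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_inar1(n, alpha, lam, seed=0):
    """Poisson INAR(1): X_t = alpha o X_{t-1} + eps_t, eps_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    x = [poisson(rng, lam / (1 - alpha))]   # draw X_0 from the stationary marginal
    for _ in range(n - 1):
        survivors = sum(1 for _ in range(x[-1]) if rng.random() < alpha)  # binomial thinning
        x.append(survivors + poisson(rng, lam))
    return x
```

With `alpha = 0.5` and `lam = 2.0`, the long-run mean is $2 / (1 - 0.5) = 4$, and every simulated value is a non-negative integer.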
In network data analysis, summary statistics of a network can provide us with meaningful insight into the structure of the network. The average clustering coefficient is one of the most popular and widely used network statistics. In this…
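For a node $v$ with degree $k_v$ whose neighbours share $e_v$ edges among themselves, the local clustering coefficient is $C_v = 2e_v / (k_v(k_v - 1))$. The average clustering coefficient is the mean of $C_v$ over nodes. The sketch assumes an undirected simple graph stored as a dict of neighbour sets with comparable node labels. It follows one common convention in which nodes of degree below 2 count as $C_v = 0$. Other conventions exclude those nodes.

```python
def average_clustering(adj):
    """Average local clustering of an undirected simple graph {node: set(neighbours)}."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # convention assumed here: C_v = 0 when degree < 2
        # Count each edge among the neighbours once (u < w).
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)
```

For a triangle with one pendant node attached, the per-node values are $1, 1, 1/3, 0$, so the average is $7/12$.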
Nonparametric regression problems with qualitative constraints such as monotonicity or convexity are ubiquitous in applications. For example, in predicting the yield of a factory in terms of the number of labor hours, the monotonicity of…
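For the monotone case, the least-squares fit has a classical exact algorithm, pool-adjacent-violators (PAVA). The sketch assumes the responses are already sorted by the covariate (for example, by labor hours). Optional weights are supported. Convexity constraints need a different algorithm and are not covered.

```python
def isotonic_regression(y, w=None):
    """Least-squares non-decreasing fit via pool-adjacent-violators; y must be sorted by x."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

Each violation is resolved by replacing a run of points with its weighted average. For `[1, 3, 2, 4, 3.5, 5]` this gives `[1, 2.5, 2.5, 3.75, 3.75, 5]`.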