Statistics Theory
Information generating functions have been used for generating various entropy and divergence measures. In the present work, we introduce a quantile-based relative information generating function and study its properties. The proposed…
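As a hedged illustration of how a generating function "generates" a divergence, the sketch below uses the classical discrete relative information generating function $R(\alpha)=\sum_i p_i^{\alpha} q_i^{1-\alpha}$, not the quantile-based version proposed in the work. Its derivative at $\alpha=1$ equals the Kullback-Leibler divergence $\sum_i p_i \log(p_i/q_i)$. The function names and the two example distributions are hypothetical, and the derivative is approximated by a central finite difference.

```python
import math

def relative_igf(p, q, alpha):
    """Discrete relative information generating function R(alpha) = sum_i p_i^alpha * q_i^(1 - alpha)."""
    return sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions on three points.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

# Central finite difference for dR/dalpha at alpha = 1.
h = 1e-6
deriv_at_one = (relative_igf(p, q, 1 + h) - relative_igf(p, q, 1 - h)) / (2 * h)
print(deriv_at_one, kl_divergence(p, q))  # the two numbers agree up to finite-difference error
```

Note that $R(1)=\sum_i p_i = 1$, so the generating function is normalized at $\alpha=1$; higher derivatives at the same point generate further moments of the log-likelihood ratio.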
Over the last quarter century, algebraic statistics has established itself as an expanding field that uses multilinear algebra, commutative algebra, computational algebra, geometry, and combinatorics to tackle problems in mathematical…
We study mixtures of decomposable graphical models, focusing on their ideals and dimensions. For mixtures of clique stars, we characterize the ideals in terms of ideals of mixtures of independence models. We also give a recursive formula…
Completion problems, in which a point is recovered from a set of observed coordinates, are abundant in applications to image reconstruction, phylogenetics, and data science. We consider a completion problem coming from algebraic statistics: to…
We study the maximum likelihood (ML) degree of discrete exponential independence models and models defined by the second hypersimplex. For models with two independent variables, we show that the ML degree is an invariant of a matroid…
This paper presents a unified framework for constructing Approximate Message Passing (AMP) algorithms for rotationally-invariant models. By employing a general iterative algorithm template and reducing it to long-memory Orthogonal AMP…
Semiparametric mixture models are parametric models with latent variables. They are defined by a kernel $p_\theta(x | z)$, where $z$ is the unknown latent variable and $\theta$ is the parameter of interest. We assume that the latent variables…
We study the problem of nonparametric estimation of the linear multiplier function $\theta(t)$ for processes satisfying stochastic differential equations of the type $$dX_t= \theta(t)X_t dt+ \epsilon\; \sigma_1(t,X_t)\sigma_2(t,Y_t)dW_t,…
Matrix factor models have become increasingly popular dimension-reduction tools for large-dimensional matrix time series. However, the heteroscedasticity of the idiosyncratic components has received little attention. Starting from the pseudo…
Establishing causality is a fundamental goal in fields like medicine and social sciences. While randomized controlled trials are the gold standard for causal inference, they are not always feasible or ethical. Observational studies can…
In clinical trials, it is especially important to compare survival times between the treatment group and the control group. Propensity score methods based on a logistic regression model are often used to reduce the effects of confounders.…
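The generic propensity-score step mentioned above can be sketched as follows, under loudly stated assumptions: a single simulated confounder, a logistic model fitted by plain gradient ascent (a stand-in for a proper logistic regression routine), and inverse-probability-of-treatment weighting as the adjustment. This is not the method of the work summarized here, and it does not model survival times; it only shows how estimated propensity scores turn into weights that rebalance the confounder between groups. All function names and data are hypothetical.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def fit_logistic(xs, ts, lr=1.0, n_iter=300):
    """Fit P(T = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, t in zip(xs, ts):
            r = t - sigmoid(b0 + b1 * x)  # score contribution of one observation
            g0 += r
            g1 += r * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def ipw_weights(xs, ts, b0, b1):
    """Weights 1/e(x) for treated units and 1/(1 - e(x)) for controls, e = estimated propensity score."""
    out = []
    for x, t in zip(xs, ts):
        e = sigmoid(b0 + b1 * x)
        out.append(1.0 / e if t == 1 else 1.0 / (1.0 - e))
    return out

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical simulated data: a larger confounder x raises the chance of treatment.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
ts = [1 if rng.random() < sigmoid(0.5 + 1.0 * x) else 0 for x in xs]

b0, b1 = fit_logistic(xs, ts)
w = ipw_weights(xs, ts, b0, b1)
treated = [i for i, t in enumerate(ts) if t == 1]
control = [i for i, t in enumerate(ts) if t == 0]

# Confounder imbalance before and after weighting.
raw_gap = (sum(xs[i] for i in treated) / len(treated)
           - sum(xs[i] for i in control) / len(control))
weighted_gap = (weighted_mean([xs[i] for i in treated], [w[i] for i in treated])
                - weighted_mean([xs[i] for i in control], [w[i] for i in control]))
print(round(b1, 2), round(raw_gap, 3), round(weighted_gap, 3))
```

In a survival comparison, weights of this kind would typically be passed to a weighted estimator such as a weighted Kaplan-Meier curve or a weighted Cox model; that step is omitted here.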
Given a composite null $ \mathcal P$ and composite alternative $ \mathcal Q$, when and how can we construct a p-value whose distribution is exactly uniform under the null, and stochastically smaller than uniform under the alternative?…
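The two requirements in the question above are easy to see in the simplest case of a simple null, which the composite setting generalizes. The sketch assumes a test statistic $Z \sim N(0,1)$ under the null and $Z \sim N(\mu,1)$ with $\mu>0$ under the alternative, and uses the one-sided p-value $p = 1-\Phi(Z)$. A Monte Carlo check shows that $P(p \le \alpha) \approx \alpha$ under the null and exceeds $\alpha$ under the alternative. The helper names, sample size, and seed are illustrative choices.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # null distribution of the test statistic (assumed N(0, 1))

def one_sided_p_value(z):
    """p = P(Z >= z) under H0: Z ~ N(0, 1); exactly Uniform(0, 1) when H0 holds."""
    return 1.0 - STD_NORMAL.cdf(z)

def rejection_rate(mu, alpha=0.05, n_sims=20000, seed=0):
    """Fraction of simulated p-values <= alpha when Z ~ N(mu, 1)."""
    rng = random.Random(seed)
    hits = sum(one_sided_p_value(rng.gauss(mu, 1.0)) <= alpha for _ in range(n_sims))
    return hits / n_sims

print(rejection_rate(0.0))  # close to 0.05: uniform p-value under the null
print(rejection_rate(1.0))  # clearly above 0.05: stochastically smaller p-value under the alternative
```

For a composite null the difficulty is that a single statistic rarely has the same null distribution under every member of $\mathcal P$, which is why exact uniformity is the nontrivial part of the question.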
This paper considers a multi-environment linear regression model in which data from multiple experimental settings are collected. The joint distribution of the response variable and covariates may vary across different environments, yet the…
For a multivariate normal setup, it is well known that the maximum likelihood estimator of the covariance matrix is neither admissible nor minimax under the Stein loss function. Over the past six decades, a large body of research has followed…
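For reference, the Stein loss is commonly written as below for an estimator $\hat{\Sigma}$ of a $p \times p$ positive definite covariance matrix $\Sigma$; this is the usual textbook normalization and is assumed rather than taken from the truncated abstract. Rewriting it through the eigenvalues $\lambda_1,\dots,\lambda_p > 0$ of $\hat{\Sigma}\Sigma^{-1}$ shows why it is a valid loss: each term $\lambda - \log\lambda - 1$ is nonnegative and vanishes only at $\lambda = 1$.

```latex
\begin{aligned}
L(\hat{\Sigma}, \Sigma)
  &= \operatorname{tr}\!\left(\hat{\Sigma}\Sigma^{-1}\right)
     - \log\det\!\left(\hat{\Sigma}\Sigma^{-1}\right) - p \\
  &= \sum_{i=1}^{p} \left(\lambda_i - \log\lambda_i - 1\right) \;\ge\; 0,
\end{aligned}
```

with equality if and only if every $\lambda_i = 1$, that is, if and only if $\hat{\Sigma} = \Sigma$ (since $\hat{\Sigma}\Sigma^{-1}$ is similar to the symmetric matrix $\Sigma^{-1/2}\hat{\Sigma}\Sigma^{-1/2}$).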
We study the problem of detecting or recovering a planted ranked subgraph from a directed graph, an analog for directed graphs of the well-studied planted dense subgraph model. We suppose that, among a set of $n$ items, there is a subset…
We consider the recovery of an unknown function $f$ from a noisy observation of the solution $u_f$ to a partial differential equation that can be written in the form $\mathcal{L} u_f=c(f,u_f)$, for a differential operator $\mathcal{L}$ that…
The coefficient of variation, which measures the variability of a distribution relative to its mean, is not uniquely defined in the multidimensional case, and neither is the multidimensional Gini index, which measures the inequality of a distribution…
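The one-dimensional quantities that the multidimensional versions generalize are unambiguous, and the sketch below computes them for a finite sample. It assumes the population standard deviation for the coefficient of variation and the mean-absolute-difference form of the Gini index, $G = \frac{1}{2n^2\mu}\sum_{i}\sum_{j} |x_i - x_j|$; other conventions (sample standard deviation, bias-corrected Gini) exist. The function names are illustrative.

```python
import statistics

def coefficient_of_variation(xs):
    """Univariate CV: population standard deviation divided by the mean."""
    mu = statistics.fmean(xs)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(xs) / mu

def gini_index(xs):
    """Univariate Gini index: mean absolute difference over all ordered pairs, divided by twice the mean."""
    n = len(xs)
    mu = statistics.fmean(xs)
    mean_abs_diff = sum(abs(a - b) for a in xs for b in xs) / (n * n)
    return mean_abs_diff / (2 * mu)

data = [1.0, 2.0, 3.0, 4.0]
print(coefficient_of_variation(data))  # sqrt(1.25) / 2.5
print(gini_index(data))                # 20 / (2 * 16 * 2.5) = 0.25
```

Both quantities are scale invariant and equal zero for a degenerate (constant) sample; in several dimensions the choice of how to replace "standard deviation over mean" and "pairwise absolute difference" is exactly what makes the definition non-unique.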
The weak convergence of the quantile processes, which are constructed based on different estimators of the finite population quantiles, is shown under various well-known sampling designs based on a superpopulation model. The results related…
This paper is devoted to the problem of determining the concentration bounds that are achievable in non-parametric regression. We consider the setting where features are supported on a bounded subset of $\mathbb{R}^d$, the regression…
For a multinormal distribution with a $p$-dimensional mean vector $\boldsymbol{\theta}$ and an arbitrary unknown dispersion matrix $\boldsymbol{\Sigma}$, Rao ([9], [10]) proposed two tests for the problem of testing $H_{0}: \boldsymbol{\theta}_{1} = \mathbf{0},…