Statistics Theory
We study a specific type of structural causal model (SCM), called a Dynamic Structural Causal Model (DSCM), whose endogenous variables represent functions of time; the model may be cyclic and allows for latent confounding. As a motivating use-case, we show that…
Consider a data matrix $Y = [\mathbf{y}_1, \cdots, \mathbf{y}_N]$ of size $M \times N$, where the columns are independent observations from a random vector $\mathbf{y}$ with zero mean and population covariance $\Sigma$. Let $\mathbf{u}_i$…
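Because $\mathbf{y}$ has zero mean, the usual sample covariance needs no centring. For reference, with the columns of $Y$ as the observations (the excerpt does not show which normalization the paper uses, and the definition of $\mathbf{u}_i$ is truncated):

```latex
\hat{\Sigma} \;=\; \frac{1}{N}\, Y Y^{\top} \;=\; \frac{1}{N} \sum_{j=1}^{N} \mathbf{y}_j \mathbf{y}_j^{\top},
\qquad
\mathbb{E}\bigl[\hat{\Sigma}\bigr] \;=\; \frac{1}{N} \sum_{j=1}^{N} \mathbb{E}\bigl[\mathbf{y}_j \mathbf{y}_j^{\top}\bigr] \;=\; \Sigma .
```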
There are several ways to establish the asymptotic normality of $L$-statistics, which depend on the choice of the weights-generating function and of the cumulative distribution function of the underlying model. In this study, we focus on…
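One common form of an $L$-statistic, assumed here for illustration since the excerpt does not give the study's exact definition, is $L_n = \frac{1}{n}\sum_{i=1}^{n} J\left(\frac{i}{n+1}\right) X_{(i)}$, a weighted sum of order statistics with weights produced by a weights-generating function $J$. The function name `l_statistic` is hypothetical.

```python
from typing import Callable, Sequence


def l_statistic(sample: Sequence[float], J: Callable[[float], float]) -> float:
    """Return (1/n) * sum_{i=1}^n J(i / (n + 1)) * X_(i).

    X_(1) <= ... <= X_(n) are the order statistics of `sample`, and J is the
    weights-generating function evaluated at the plotting positions i/(n+1).
    """
    order_stats = sorted(sample)
    n = len(order_stats)
    if n == 0:
        raise ValueError("sample must be non-empty")
    return sum(J(i / (n + 1)) * x for i, x in enumerate(order_stats, start=1)) / n


# J == 1 weights every order statistic equally and recovers the sample mean.
print(l_statistic([3.0, 1.0, 2.0], lambda u: 1.0))
```

Other choices of $J$ (for example, one that vanishes near $0$ and $1$) down-weight extreme order statistics, which is how trimmed-mean-type estimators arise.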
We analyze the stability of (strong) laws of large numbers in Hadamard spaces with respect to distributional perturbations. For the inductive means of a sequence of independent, but not necessarily identically distributed random variables,…
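The inductive mean is usually defined recursively by $S_1 = X_1$ and $S_k = S_{k-1} \#_{1/k} X_k$, where $a \#_t b$ denotes the point at fraction $t$ along the geodesic from $a$ to $b$. The sketch below is generic over a user-supplied geodesic function (a hypothetical interface) and is illustrated on two one-dimensional Hadamard spaces: the real line, where it reduces to the arithmetic mean, and the positive reals with metric $|\log a - \log b|$, where it gives the geometric mean. It does not address the stability results of the study.

```python
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")


def inductive_mean(points: Iterable[P], geodesic: Callable[[P, P, float], P]) -> P:
    """S_1 = X_1; S_k = point at fraction 1/k on the geodesic from S_{k-1} to X_k."""
    it = iter(points)
    try:
        s = next(it)
    except StopIteration:
        raise ValueError("need at least one point") from None
    for k, x in enumerate(it, start=2):
        s = geodesic(s, x, 1.0 / k)
    return s


def euclidean_geodesic(a: float, b: float, t: float) -> float:
    """Straight-line geodesic in R."""
    return (1.0 - t) * a + t * b


def log_geodesic(a: float, b: float, t: float) -> float:
    """Geodesic in (0, inf) with metric |log a - log b| (isometric to R, hence Hadamard)."""
    return a ** (1.0 - t) * b ** t


print(inductive_mean([1.0, 2.0, 3.0, 4.0], euclidean_geodesic))  # arithmetic mean
print(inductive_mean([1.0, 4.0, 16.0], log_geodesic))  # geometric mean
```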
We consider a model where a signal (discrete or continuous) is observed with an additive Gaussian noise process. The signal arises from a linear combination of a finite but increasing number of translated features. The features are…
Consider a convex function that is invariant under a group of transformations. If it has a minimizer, does it also have an invariant minimizer? Variants of this problem appear in nonparametric statistics and in a number of adjacent fields.…
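For orientation, the easiest special case has a short answer: if $G$ is finite and acts linearly, averaging a minimizer $x^{*}$ over the group yields an invariant minimizer, by convexity (Jensen's inequality). This is a standard argument, not the paper's result; the general question in the abstract goes beyond this case.

```latex
% Assumptions: G finite, acting linearly on \mathbb{R}^d; f convex with f(gx) = f(x) for all g \in G.
\bar{x} := \frac{1}{|G|} \sum_{g \in G} g x^{*},
\qquad
h\bar{x} = \frac{1}{|G|} \sum_{g \in G} (hg)\, x^{*} = \bar{x} \quad \text{for all } h \in G,
\qquad
f(\bar{x}) \le \frac{1}{|G|} \sum_{g \in G} f(g x^{*}) = f(x^{*}).
```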
Let $(x_{i}, y_{i})_{i=1,\dots,n}$ denote independent samples from a general mixture distribution $\sum_{c\in\mathcal{C}}\rho_{c}P_{c}^{x}$, and consider the hypothesis class of generalized linear models $\hat{y} = F(\Theta^{\top}x)$. In…
Motivated by the dynamic modeling of relative abundance data in ecology, we introduce a general approach to model stationary Markovian or non-Markovian time series on (relatively) compact spaces such as a hypercube, the simplex or a sphere…
We show how convergence to the Gumbel distribution in an extreme value setting can be understood in an information-theoretic sense. We introduce a new type of score function which behaves well under the maximum operation, and which implies…
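The information-theoretic argument is not reproduced here. As a reference point, the classical fact it explains can be checked numerically: for i.i.d. Exp(1) variables $E_1, \dots, E_n$, the shifted maximum $\max_i E_i - \log n$ has exact CDF $(1 - e^{-x}/n)^n$ for $x > -\log n$, which converges to the standard Gumbel CDF $\exp(-e^{-x})$. The helper names are hypothetical.

```python
import math


def max_exp_cdf(x: float, n: int) -> float:
    """Exact CDF of max(E_1, ..., E_n) - log(n) for i.i.d. Exp(1) variables.

    P(max <= t) = (1 - e^{-t})^n, evaluated at t = x + log(n).
    """
    t = x + math.log(n)
    if t <= 0.0:
        return 0.0
    # n * log1p(-e^{-t}) avoids cancellation when e^{-t} is tiny
    return math.exp(n * math.log1p(-math.exp(-t)))


def gumbel_cdf(x: float) -> float:
    """Standard Gumbel CDF exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


for n in (10, 1_000, 100_000):
    gap = max(abs(max_exp_cdf(x, n) - gumbel_cdf(x)) for x in (-1.0, 0.0, 1.0, 2.0))
    print(n, gap)  # the gap shrinks roughly like 1/n
```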
Bi-stochastic normalization provides an alternative normalization of graph Laplacians in graph-based data analysis and can be computed efficiently by Sinkhorn-Knopp (SK) iterations. This paper proves the convergence of bi-stochastically…
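A minimal sketch of the textbook Sinkhorn-Knopp iteration, assuming a positive symmetric affinity matrix $W$ (here a Gaussian kernel on three points with an arbitrarily chosen bandwidth): alternately rescale rows and columns until $P = D_r W D_c$ has unit row and column sums, then form $I - P$, one common convention for the bi-stochastically normalized Laplacian. For symmetric positive $W$ the limit $P$ is symmetric, because the doubly stochastic scaling of a positive matrix is unique. This illustrates the iteration only, not the paper's convergence analysis.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn_knopp(W: Matrix, max_iter: int = 10_000, tol: float = 1e-12) -> Matrix:
    """Scale a positive square matrix to P = diag(r) W diag(c) with unit row and column sums."""
    n = len(W)
    r = [1.0] * n
    c = [1.0] * n
    for _ in range(max_iter):
        # Row step: choose r so every row of diag(r) W diag(c) sums to 1.
        r = [1.0 / sum(W[i][j] * c[j] for j in range(n)) for i in range(n)]
        # Column step: choose c so every column sums to 1 (this may perturb row sums).
        c = [1.0 / sum(r[i] * W[i][j] for i in range(n)) for j in range(n)]
        # Columns are now exact; stop once the rows are within tolerance too.
        row_err = max(abs(r[i] * sum(W[i][j] * c[j] for j in range(n)) - 1.0) for i in range(n))
        if row_err < tol:
            break
    return [[r[i] * W[i][j] * c[j] for j in range(n)] for i in range(n)]


# Gaussian affinity matrix on a few 1-D points (bandwidth 1 is an arbitrary choice).
xs = [0.0, 1.0, 3.0]
W = [[math.exp(-(a - b) ** 2) for b in xs] for a in xs]
P = sinkhorn_knopp(W)
laplacian = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(3)] for i in range(3)]  # I - P
print([round(sum(row), 9) for row in P])
```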
Extreme value theory has established asymptotic properties of the sample maximum. This study concerns estimation of the probability distribution of the sample maximum. The traditional approach is parametric fitting to the limiting distribution --…
In this paper we propose a new method for the joint nonparametric estimation of a probability density and its support. As is well known, the nonparametric kernel density estimator has a "boundary bias problem" when the support of the population density…
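The boundary bias can be seen in a small experiment, sketched under stated assumptions: a Gaussian kernel, bandwidth $h = 0.1$, and a deterministic quantile grid standing in for an Exp(1) sample, whose support is $[0, \infty)$. At the boundary point $x = 0$ roughly half of each kernel's mass lies outside the support, so the estimate is near $1/2$ instead of the true $f(0) = 1$, while at the interior point $x = 2$ it is close to $e^{-2}$. This only illustrates the problem; it is not the joint density and support estimator proposed in the paper.

```python
import math
from typing import Sequence


def gaussian_kde(x: float, data: Sequence[float], h: float) -> float:
    """Standard Gaussian kernel density estimate (1/(n h)) * sum phi((x - X_i) / h)."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


# Deterministic stand-in for an Exp(1) sample: the quantiles F^{-1}((i - 0.5) / n).
n = 2000
data = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
h = 0.1

# True density is e^{-x} on [0, inf): f(0) = 1 and f(2) = e^{-2}.
print("boundary x=0:", gaussian_kde(0.0, data, h), "true:", 1.0)
print("interior x=2:", gaussian_kde(2.0, data, h), "true:", math.exp(-2.0))
```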
We propose new smoothed median and Wilcoxon rank-sum tests. As pointed out by Maesono et al. (2016), some discrete nonparametric tests have a problem with their significance probabilities. Because of this problem, the selection of the…
The hazard function is the ratio of a density function to the corresponding survival function, and it is a basic tool of survival analysis. In this paper we propose a kernel estimator of the hazard ratio function, which is based on a modification of Ćwik and…
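A naive plug-in version of the ratio $h(t) = f(t)/S(t)$ is sketched here under stated assumptions: a Gaussian kernel, the kernel-smoothed survival function $\hat S(t) = \frac{1}{n}\sum_i \Phi((X_i - t)/h)$, and an Exp(1) quantile grid, whose true hazard is constant at $1$. It is not the modified estimator of the paper, whose reference is truncated in the excerpt; the function names are hypothetical.

```python
import math
from typing import Sequence


def _phi(u: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def _Phi(u: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def kernel_hazard(t: float, data: Sequence[float], h: float) -> float:
    """Plug-in ratio f_hat(t) / S_hat(t) with Gaussian-kernel density and survival estimates."""
    n = len(data)
    f_hat = sum(_phi((t - x) / h) for x in data) / (n * h)
    # Kernel-smoothed survival: S_hat(t) = 1 - F_hat(t) = (1/n) * sum Phi((X_i - t) / h)
    s_hat = sum(_Phi((x - t) / h) for x in data) / n
    if s_hat == 0.0:
        raise ValueError("estimated survival is zero; t is too far in the tail")
    return f_hat / s_hat


# Deterministic stand-in for an Exp(1) sample (true hazard is 1 everywhere).
n = 2000
data = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
for t in (0.5, 1.0, 2.0):
    print(t, kernel_hazard(t, data, h=0.1))
```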
In this paper we propose new smoothed sign and Wilcoxon's signed rank tests, which are based on a kernel estimator of the underlying distribution function of data. We discuss approximations of $p$-values and asymptotic properties of these…
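One natural reading of a kernel-smoothed sign statistic, assumed here rather than taken from the paper, replaces the indicator $\mathbf{1}\{X_i > 0\}$ with $\Phi(X_i/h)$; this equals $n(1 - \hat F(0))$ for the Gaussian-kernel estimate $\hat F$ of the distribution function. As $h \to 0$ it recovers the ordinary sign statistic. The $p$-value approximations discussed in the paper are not reproduced, and the function names are hypothetical.

```python
import math
from typing import Sequence


def sign_statistic(data: Sequence[float]) -> int:
    """Classical sign-test statistic: number of observations above 0 (H0: median = 0)."""
    return sum(1 for x in data if x > 0)


def smoothed_sign_statistic(data: Sequence[float], h: float) -> float:
    """Smoothed analogue: the indicator 1{X_i > 0} is replaced by Phi(X_i / h).

    Equivalently n * (1 - F_hat(0)), where F_hat is the Gaussian-kernel
    estimate of the distribution function with bandwidth h.
    """
    return sum(0.5 * (1.0 + math.erf(x / (h * math.sqrt(2.0)))) for x in data)


sample = [-1.2, -0.1, 0.05, 0.4, 0.9, 2.3]
print(sign_statistic(sample), smoothed_sign_statistic(sample, h=0.2))
```

Unlike the integer-valued sign statistic, the smoothed version varies continuously with the data, which is the property that smoothing is meant to exploit.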
The generalized filtered method of moments was developed in recent papers by Alomari et al. (2020) and Ayache et al. (2022). It used functional data obtained from continuously sampled cyclic long-memory stochastic processes to…
Hidden Markov models (HMMs) have been widely used by scientists to model stochastic systems: the underlying process is a discrete Markov chain and the observations are noisy realizations of the underlying process. Determining the number of…
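To make the model concrete: with a hidden chain $Z_t$ and observations $Y_t$, the likelihood of an observed sequence can be computed by the forward recursion. The sketch below uses hypothetical parameters for a two-state, two-symbol HMM and cross-checks the result against brute-force summation over all hidden paths. Likelihood-based criteria for choosing the number of states commonly build on this quantity; the excerpt does not say which approach the paper takes.

```python
from itertools import product
from typing import Sequence


def forward_likelihood(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> float:
    """P(Y_1..Y_T = obs) for a discrete HMM via the forward recursion.

    init[i] = P(Z_1 = i), trans[i][j] = P(Z_{t+1} = j | Z_t = i),
    emit[i][y] = P(Y_t = y | Z_t = i).
    """
    k = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(k)]
    for y in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(k)) * emit[j][y] for j in range(k)]
    return sum(alpha)


def brute_force_likelihood(init, trans, emit, obs) -> float:
    """Same quantity by summing over all k^T hidden paths (feasible only for tiny T)."""
    k = len(init)
    total = 0.0
    for path in product(range(k), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state, two-symbol model.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.3, 0.7]]
obs = [0, 1, 1, 0]
print(forward_likelihood(init, trans, emit, obs), brute_force_likelihood(init, trans, emit, obs))
```

The forward recursion costs $O(Tk^2)$ operations instead of the $O(Tk^T)$ of the brute-force sum.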
Clustering is a pivotal challenge in unsupervised machine learning and is often investigated through the lens of mixture models. The optimal error rate for recovering cluster labels in Gaussian and sub-Gaussian mixture models involves ad…
It is often desirable to summarise a probability measure on a space $X$ in terms of a mode, or MAP estimator, i.e., a point of maximum probability. Such points can be rigorously defined using masses of metric balls in the small-radius…
We study the Bayesian density estimation of data living in the offset of an unknown submanifold of Euclidean space. From this perspective, we introduce a new notion of anisotropic Hölder regularity for the underlying density and obtain posterior…