Statistics Theory
We study estimators of the optimal transport (OT) map between two probability distributions. We focus on plugin estimators derived from the OT map between estimates of the underlying distributions. We develop novel stability bounds for OT…
We investigate some aspects of the problem of the estimation of birth distributions (BD) in multi-type Galton-Watson trees (MGW) with unobserved types. More precisely, we consider two-type MGW called spinal-structured trees. This kind of…
This note relates the calibration of models to the consistent loss functions for the target functional of the model. We demonstrate that a model is calibrated if and only if there is a parameter value that is optimal under all consistent…
This paper investigates the moment monotonicity property of Weibull, Gamma, and Log-normal distributions. We provide the first complete mathematical proofs for the monotonicity of the function $E(X^n)^{\frac{1}{n}}$ specific to these…
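As a brief illustration of the quantity in question (a standard closed-form calculation, not the paper's proof), take the log-normal case and assume $X = e^{\mu + \sigma Z}$ with $Z$ standard normal and $\sigma > 0$; the symbols $\mu$, $\sigma$ and $Z$ are introduced here for illustration only:

$$
E(X^n) = \exp\!\left(n\mu + \tfrac{1}{2} n^2 \sigma^2\right)
\quad\Longrightarrow\quad
\left(E(X^n)\right)^{\frac{1}{n}} = \exp\!\left(\mu + \tfrac{1}{2} n \sigma^2\right),
$$

which is strictly increasing in $n$ under that assumption.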
Classification with imbalanced data is a common challenge in data analysis, where certain classes (minority classes) account for a small fraction of the training data compared with other classes (majority classes). Classical statistical…
P-hacking poses challenges to traditional hypothesis testing. In this paper, we propose a robust method for the one-sample significance test that can protect against p-hacking from sample manipulation. Precisely, assuming a sequential…
This paper investigates the spectral properties of spatial-sign covariance matrices, a self-normalized version of sample covariance matrices, for data from $\alpha$-regularly varying populations with general covariance structures. By…
In the context of Gaussian conditioning, greedy algorithms iteratively select the most informative measurements, given an observed Gaussian random variable. However, the convergence analysis for conditioning Gaussian random variables…
A novel and comprehensive methodology designed to tackle the challenges posed by extreme values in the context of random censorship is introduced. The main focus is on the analysis of integrals based on the product-limit estimator of…
We study the problem of identifiability of the total effect of an intervention from observational time series in the situation, common in practice, where one only has access to abstractions of the true causal graph. We consider here two…
The conditional Aalen-Johansen estimator, a general-purpose non-parametric estimator of conditional state occupation probabilities, is introduced. The estimator is applicable for any finite-state jump process and supports conditioning on…
Spectral estimators are fundamental in low-rank matrix models and arise throughout machine learning and statistics, with applications including network analysis, matrix completion and PCA. These estimators aim to recover the leading…
Expected values weighted by the inverse of a multivariate density or, equivalently, Lebesgue integrals of regression functions with multivariate regressors occur in various areas of applications, including estimating average treatment…
Robust inference based on the minimization of statistical divergences has proved to be a useful alternative to classical techniques based on maximum likelihood and related methods. Basu et al. (1998) introduced the density power divergence…
Classification and Regression Tree (CART), Random Forest (RF) and Gradient Boosting Tree (GBT) are probably the most popular set of statistical learning methods. However, their statistical consistency can only be proved under very…
Relative survival methodology deals with a competing risks survival model where the cause of death is unknown. This lack of information occurs regularly in population-based cancer studies. Non-parametric estimation of the net survival is…
We consider robust location-scale estimators under contamination. We show that commonly used robust estimators such as the median and the Huber estimator are inconsistent under asymmetric contamination, while the Tukey estimator is…
The graph projection of a hypergraph is a simple graph with the same vertex set and with an edge between each pair of vertices that appear in a hyperedge. We consider the problem of reconstructing a random $d$-uniform hypergraph from its…
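The graph projection defined in the first sentence of the abstract above can be sketched in a few lines of Python. This is only an illustration of the definition, not of the paper's reconstruction method; the function name `graph_projection` and the representation of hyperedges as tuples of vertex labels are assumptions made here.

```python
from itertools import combinations


def graph_projection(hyperedges):
    """Return the edge set of the graph projection of a hypergraph.

    `hyperedges` is an iterable of vertex collections (hypothetical
    representation). Each projected edge is a sorted pair (u, v), u < v.
    """
    edges = set()
    for hyperedge in hyperedges:
        # Every pair of distinct vertices sharing a hyperedge becomes an edge.
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return edges


# Two 3-uniform hyperedges sharing vertex 3.
projected = graph_projection([(1, 2, 3), (3, 4, 5)])
```

Because the result is a set, a pair covered by several hyperedges appears only once, so the projection by itself does not record which hyperedges produced an edge.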
One of the first steps in applications of statistical network analysis is frequently to produce summary charts of important features of the network. Many of these features take the form of sequences of graph statistics counting the number…
We introduce a semiparametric latent space model for analyzing longitudinal network data. The model consists of a static latent space component and a time-varying node-specific baseline component. We develop a semiparametric efficient score…