Statistics Theory
This pedagogical document explains three variational representations that are useful when comparing the efficiencies of reversible Markov chains: (i) the Dirichlet form and the associated variational representations of the spectral gaps;…
This paper presents closed-form analytical formulas for pricing volatility and variance derivatives with nonlinear payoffs under discrete-time observations. The analysis is based on a probabilistic approach assuming that the underlying…
The identifiability problem for interventions aims at assessing whether the total effect of some given interventions can be written with a do-free formula, and thus be computed from observational data only. We study this problem,…
Estimating high-dimensional precision matrices is a fundamental problem in modern statistics, with the graphical lasso and its $\ell_1$-penalty being a standard approach for recovering sparsity patterns. However, many statistical models,…
We study the semiparametric efficient estimation of a class of linear functionals in settings where a complete multivariate dataset is supplemented by additional datasets recording subsets of the variables of interest. These datasets are…
We provide a comprehensive theory of multiple variants of ordinal multidimensional scaling, including internal unfolding and external unfolding. We first follow Shepard (1966) and work in a continuum model to gain insight. We then follow…
The key result of this paper is to characterize all the multivariate symmetric Bernoulli distributions whose sum is minimal under convex order. In doing so, we automatically characterize extremal negative dependence among Bernoulli random…
The Horvitz-Thompson estimate of a total can be seen as a differentially private mechanism applied to this population total. We provide formulae to compute the $\epsilon$ and $\delta$ parameters for this specific mechanism, coupled or not…
The identifiability problem for interventions aims at assessing whether the total causal effect can be written with a do-free formula, and thus be estimated from observational data only. We study this problem, considering multiple…
Studies using assays to quantify the expression of thousands of genes on tens to thousands of cell samples have been carried out for over 20 years. Such assays are based on microarrays, DNA sequencing or other molecular technologies. All…
In many high-dimensional problems, like sparse-PCA, planted clique, or clustering, the best known algorithms with polynomial time complexity fail to reach the statistical performance provably achievable by algorithms free of computational…
Two directed graphs are called covariance equivalent if they induce the same set of covariance matrices, up to a Lebesgue measure zero set, on the random variables of their associated linear structural equation models. For acyclic graphs,…
We consider the limiting distribution of the quantity $X^s/(X+Y)^r$, where $X$ and $Y$ are two independent Binomial random variables with a common success probability and numbers of trials $n$ and $m$, respectively, and $r,s$ are positive…
Traditional covariate selection methods for causal inference focus on achieving unbiasedness and asymptotic efficiency. In many practical scenarios, researchers must estimate causal effects from observational data with limited sample sizes…
In 2017, Jordanova and co-authors considered probabilities for p-outside values and later used them to construct distribution-sensitive IPO estimators. These works do not take into account the asymmetry of the distribution.…
Sparse linear regression methods such as Lasso require a tuning parameter that depends on the noise variance, which is typically unknown and difficult to estimate in practice. In the presence of heavy-tailed noise or adversarial outliers,…
This paper develops a general inferential framework for discrete copulas on finite supports in any dimension. The copula of a multivariate discrete distribution is defined as Csiszar's I-projection (i.e., the minimum-Kullback-Leibler…
We derive the information geometry induced by the statistical Rényi divergence, namely its metric tensor, its dual parametrized connections, as well as its dual Laplacians. Based on these results, we demonstrate that the Rényi-geometry,…
Comparison-based preference learning has become central to the alignment of AI models with human preferences. However, these methods may behave counterintuitively. After empirically observing that, when accounting for a preference for…
The work of Sprungk (Inverse Problems, 2020) established the local Lipschitz continuity of the misfit-to-posterior and prior-to-posterior maps with respect to the Kullback--Leibler divergence and the total variation, Hellinger, and…