Statistical Theory
We address the problem of representing context-specific causal models based on both observational and experimental data collected under general (e.g. hard or soft) interventions by introducing a new family of context-specific conditional…
In this largely expository note, we present an impossibility result for inner product recovery in a random geometric graph or latent space model using rate-distortion theory. More precisely, suppose that we observe a graph $A$ on $n$…
Introduced in 1962, the Langlie procedure is one of the most popular approaches to sensitivity testing. It aims to estimate an unknown sensitivity distribution based on the outcomes of binary trials. Officially recognized by the U.S.…
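The level-selection step of a Langlie-type test can be sketched as follows. This is a minimal sketch, assuming the "count-back" rule as it is commonly described for Langlie's one-shot procedure: the first trial is at the midpoint of the test interval, and each later level is the average of the current level and the most recent earlier level at which responses and non-responses balance (or the upper/lower limit if no such level exists). The function names, the normal critical-level model and all numbers are hypothetical and only illustrate the rule; the estimation of the sensitivity distribution from the outcomes is not shown.

```python
import random


def langlie_next_level(levels, responses, lower, upper):
    """Next stimulus level from the trial history (count-back rule, as commonly described).

    levels[i] is the stimulus used in trial i; responses[i] is True if the item
    responded (e.g. fired) and False otherwise. [lower, upper] is the test interval.
    """
    if not levels:
        # First trial: midpoint of the test interval.
        return (lower + upper) / 2
    n = len(levels) - 1
    # Count back from the latest trial to find the most recent trial p such that
    # trials p..n contain equally many responses and non-responses.
    balance = 0
    for p in range(n, -1, -1):
        balance += 1 if responses[p] else -1
        if balance == 0:
            return (levels[n] + levels[p]) / 2
    # No such trial: move halfway toward the upper limit after a non-response,
    # or halfway toward the lower limit after a response.
    bound = lower if responses[n] else upper
    return (levels[n] + bound) / 2


def run_langlie(n_trials, lower, upper, mean=5.0, sd=1.0, seed=0):
    """Simulate one-shot testing; each item has a hypothetical N(mean, sd) critical level."""
    rng = random.Random(seed)
    levels, responses = [], []
    for _ in range(n_trials):
        x = langlie_next_level(levels, responses, lower, upper)
        critical = rng.gauss(mean, sd)  # unobserved threshold of this item
        levels.append(x)
        responses.append(x >= critical)
    return levels, responses


levels, responses = run_langlie(n_trials=20, lower=0.0, upper=10.0)
for x, hit in zip(levels, responses):
    print(f"{x:7.4f}  {'response' if hit else 'no response'}")
```

Because every new level is an average of points already inside the interval, the design never leaves $[\text{lower}, \text{upper}]$, which is one reason the procedure needs a sensibly chosen test interval.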
The Grouped Horseshoe distribution arises from hierarchical structures in the recent Bayesian methodological literature aimed at selection of groups of regression coefficients. We isolate this distribution and study its properties…
Let us recall that the term 'k-th extreme' was introduced in a limiting sense. That is, if $X_{r:n}$ denotes the $r$-th order statistic, then for fixed $k$, as $n\to\infty$, $X_{n-k+1:n}$ is called the $k$-th extreme or $k$-th largest order statistic.…
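The indexing in this definition can be made concrete with a short sketch: $X_{r:n}$ is the $r$-th smallest value of a sample of size $n$, so $X_{n-k+1:n}$ is the $k$-th largest. The helper names and the exponential samples below are illustrative assumptions, not taken from the source; the last loop only shows the regime the definition refers to (fixed $k$, growing $n$).

```python
import random


def order_statistic(sample, r):
    """X_{r:n}: the r-th smallest value (1-indexed) of a sample of size n."""
    return sorted(sample)[r - 1]


def kth_extreme(sample, k):
    """X_{n-k+1:n}: the k-th largest order statistic."""
    return order_statistic(sample, len(sample) - k + 1)


sample = [3.2, 1.5, 7.8, 4.4, 6.1]
print(kth_extreme(sample, 1))  # 7.8, the sample maximum X_{n:n}
print(kth_extreme(sample, 2))  # 6.1, the second largest X_{n-1:n}

# Fixed k = 3, growing n: the limiting regime in which "k-th extreme" is defined.
rng = random.Random(1)
for n in (10, 100, 1000, 10000):
    xs = [rng.expovariate(1.0) for _ in range(n)]
    print(n, kth_extreme(xs, 3))
```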
We estimate densities on a compact interval that have isolated irregularities, such as discontinuities or discontinuities in some derivatives. From independent and identically distributed observations we construct a kernel estimator with…
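To see why isolated irregularities matter for kernel estimation, here is a minimal sketch of a plain fixed-bandwidth Gaussian kernel estimator $\hat f(x) = \frac{1}{nh}\sum_i K\big((x - X_i)/h\big)$. This is the textbook estimator, not the estimator constructed in the work above (whose description is truncated here); the evenly spread "sample" and the bandwidth $h = 0.05$ are assumptions chosen so the effect is visible without randomness. At the jump of the Uniform(0, 1) density at $x = 0$, the estimate is roughly halved.

```python
import math


def gaussian_kde(data, h):
    """Fixed-bandwidth Gaussian kernel density estimator f_hat(x) = (1/(n h)) sum K((x - X_i)/h)."""
    n = len(data)
    scale = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f_hat


# Evenly spread points standing in for a Uniform(0, 1) sample: the true density
# equals 1 on [0, 1] and 0 outside, so x = 0 is a discontinuity.
data = [(i + 0.5) / 1000 for i in range(1000)]
f_hat = gaussian_kde(data, h=0.05)
print(f_hat(0.5))  # close to 1 in the interior
print(f_hat(0.0))  # close to 0.5: the jump is smoothed over
```

A single global bandwidth cannot be small near the jump and large elsewhere at the same time, which is the difficulty that estimators designed for densities with isolated irregularities aim to address.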
We develop a nonparametric test for deciding whether volatility of an asset follows a standard semimartingale process, with paths of finite quadratic variation, or a rough process with paths of infinite quadratic variation. The test…
We introduce a general class of autoregressive models for studying the dynamics of multivariate binary time series with stationary exogenous covariates. Using a high-level set of assumptions, we show that the existence of a stationary path for…
We propose a computationally efficient algorithm for gradient-based linear dimension reduction and high-dimensional regression. The algorithm initially computes a Mondrian forest and uses this estimator to identify a relevant feature…
Recently, data depth has been widely used to rank multivariate data. The depth-based $Q$ statistic, originally proposed by Liu and Singh (1993), has been studied increasingly since it can be used as a quality index to…
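A minimal sketch of the empirical $Q$ statistic may help. Following Liu and Singh's definition $Q(F, G) = P\{D(X; F) \le D(Y; F)\}$ with $X \sim F$ and $Y \sim G$, the sample version averages, over the new observations $y_j$, the fraction of reference points $x_i$ whose depth does not exceed that of $y_j$. The choice of Mahalanobis depth $D(x; F) = [1 + (x - \mu)^\top \Sigma^{-1} (x - \mu)]^{-1}$, the restriction to two dimensions and the simulated data are assumptions made to keep the sketch short; any depth function could be substituted.

```python
import random


def mahalanobis_depth(reference):
    """Return D(.; F_m): Mahalanobis depth w.r.t. the 2-D reference sample."""
    m = len(reference)
    mx = sum(p[0] for p in reference) / m
    my = sum(p[1] for p in reference) / m
    sxx = sum((p[0] - mx) ** 2 for p in reference) / (m - 1)
    syy = sum((p[1] - my) ** 2 for p in reference) / (m - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in reference) / (m - 1)
    det = sxx * syy - sxy * sxy
    # Entries of the inverse of the 2x2 sample covariance matrix.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det

    def depth(p):
        dx, dy = p[0] - mx, p[1] - my
        d2 = ixx * dx * dx + 2.0 * ixy * dx * dy + iyy * dy * dy
        return 1.0 / (1.0 + d2)

    return depth


def q_statistic(reference, new):
    """Empirical Q(F_m, G_n): mean over new points y of the fraction of
    reference points whose depth is <= the depth of y."""
    depth = mahalanobis_depth(reference)
    ref_depths = [depth(x) for x in reference]
    m = len(reference)
    total = 0.0
    for y in new:
        d_y = depth(y)
        total += sum(1 for d_x in ref_depths if d_x <= d_y) / m
    return total / len(new)


rng = random.Random(0)
reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
same = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
shifted = [(x + 3.0, y) for x, y in same]
print(q_statistic(reference, same))     # near 0.5: no change in distribution
print(q_statistic(reference, shifted))  # well below 0.5: a location shift
```

Values near 1/2 are consistent with no change; values well below 1/2 flag a location shift or an increase in scale, which is what makes $Q$ usable as a quality index.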
These lecture notes were prepared for a special topics course in the Department of Statistics at the University of Washington, Seattle. They comprise the first eight chapters of a book currently in progress.
We are interested in the distribution of Wishart samples after forgetting their scaling factors. We call such a distribution a projective Wishart distribution. We show that projective Wishart distributions have strong links with the…
The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step…
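The recovery of the angular measure from data usually starts from a construction like the one sketched below: standardize each margin (here by ranks, to an approximately unit Pareto scale), keep the $k$ observations with the largest norm, and project them onto the unit sphere. This is a minimal sketch under stated assumptions: the rank transform $1/(1 - R/(n+1))$, the $L_1$ norm (so the "sphere" is the simplex) and the parameter $k$ are common choices, not necessarily those of the work above, and the data are simulated.

```python
import random


def empirical_angles(sample, k):
    """Angles (on the L1 unit simplex) of the k most extreme points after
    rank-standardising each margin to an approximately unit Pareto scale."""
    n, d = len(sample), len(sample[0])
    pareto = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: sample[i][j])
        for rank, i in enumerate(order, start=1):
            # rank / (n + 1) estimates F_j(X_ij); 1 / (1 - F) is unit Pareto.
            pareto[i][j] = 1.0 / (1.0 - rank / (n + 1))
    norms = [sum(v) for v in pareto]  # L1 norm (all coordinates are positive)
    top = sorted(range(n), key=lambda i: norms[i], reverse=True)[:k]
    return [[v / norms[i] for v in pareto[i]] for i in top]


rng = random.Random(0)
indep = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]
comono = [(x, x ** 3) for x, _ in indep]  # identical ranks in both margins
for name, data in (("independent", indep), ("comonotone", comono)):
    w = [a[0] for a in empirical_angles(data, 100)]
    spread = sum(abs(wi - 0.5) for wi in w) / len(w)
    print(name, round(spread, 3))  # mean distance of the first angle from 0.5
```

Under strong extremal dependence the angles pile up near the centre of the simplex, while for independent margins they drift toward the vertices, which is the kind of first-order dependence structure the angular measure encodes.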
In statistical inference, a discrepancy between the parameter-to-observable map that generates the data and the parameter-to-observable map that is used for inference can lead to misspecified likelihoods and thus to incorrect estimates. In…
In this work we address the problem of detecting whether a sampled probability distribution of a random variable $V$ has an infinite first moment. This issue is particularly important when the sample results from complex numerical simulation…
Despite the wide usage of parametric point processes in theory and applications, a sound goodness-of-fit procedure to test whether a given parametric model is appropriate for data coming from a self-exciting point process has been missing…
Probabilistic proofs of the Johnson-Lindenstrauss lemma imply that random projection can reduce the dimension of a data set and approximately preserve pairwise distances. If a distance being approximately preserved is called a success, and…
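The notion of "success" above can be illustrated with a minimal simulation: project points with a Gaussian random matrix whose entries are $N(0, 1/k)$, so squared lengths are preserved in expectation, and count the pairs whose distance stays within a factor $1 \pm \varepsilon$. The Gaussian construction, the tolerance $\varepsilon$ and the data sizes are assumptions for illustration; other random matrices (e.g. sign matrices) are also used in proofs of the lemma, and some statements use squared distances instead.

```python
import math
import random


def random_projection(points, k, rng):
    """Map d-dimensional points to k dimensions with a Gaussian matrix whose
    entries are N(0, 1/k), so squared lengths are preserved in expectation."""
    d = len(points[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [[sum(r_t * p_t for r_t, p_t in zip(row, p)) for row in R] for p in points]


def success_rate(points, projected, eps):
    """Fraction of pairs whose projected distance lies within (1 +/- eps) of the original."""
    successes = total = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            before = math.dist(points[i], points[j])
            after = math.dist(projected[i], projected[j])
            total += 1
            successes += (1 - eps) * before <= after <= (1 + eps) * before
    return successes / total


rng = random.Random(42)
points = [[rng.gauss(0, 1) for _ in range(300)] for _ in range(30)]
for k in (10, 50, 200):
    proj = random_projection(points, k, rng)
    print(k, success_rate(points, proj, eps=0.2))
```

Since each pair is a success with some probability depending on $k$ and $\varepsilon$, the number of successes over all pairs is a random count, and the success rate rises toward 1 as the target dimension $k$ grows.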
In this work, we address the longstanding puzzle that Sliced Inverse Regression (SIR) often performs poorly for sufficient dimension reduction when the structural dimension $d$ (the dimension of the central space) exceeds 4. We first show…
In epidemics, many interesting quantities, like the reproduction number, depend on the incubation period (time from infection to symptom onset) and/or the generation time (time until a new person is infected by an already infected person).…
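One standard way the reproduction number depends on the generation time is the Euler-Lotka relation $R = 1 / \int_0^\infty e^{-rt} g(t)\,dt$, where $r$ is the epidemic growth rate and $g$ the generation-time density. For a gamma density with shape $a$ and scale $\theta$ this gives $R = (1 + r\theta)^a$. The sketch below checks the closed form against numerical integration; the relation is brought in here as an illustration (it is not stated in the source), and the values $r = 0.1$ per day, $a = 2$, $\theta = 2.5$ days are hypothetical.

```python
import math


def gamma_pdf(t, shape, scale):
    """Gamma density; shape >= 1 assumed so that t = 0 is handled safely."""
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def reproduction_number_numeric(r, shape, scale, steps=50_000):
    """R = 1 / int_0^inf exp(-r t) g(t) dt, via the trapezoid rule on a long interval."""
    t_max = 60.0 * shape * scale  # the remaining tail mass is negligible
    h = t_max / steps

    def f(t):
        return math.exp(-r * t) * gamma_pdf(t, shape, scale)

    integral = h * (0.5 * f(0.0) + 0.5 * f(t_max) + sum(f(i * h) for i in range(1, steps)))
    return 1.0 / integral


def reproduction_number_gamma(r, shape, scale):
    """Closed form for a gamma generation time: R = (1 + r * scale) ** shape."""
    return (1.0 + r * scale) ** shape


print(reproduction_number_gamma(0.1, 2.0, 2.5))    # (1.25)^2 = 1.5625
print(reproduction_number_numeric(0.1, 2.0, 2.5))  # approximately 1.5625
```

With the growth rate held fixed, a longer or more dispersed generation time changes $R$, which is why misestimating the generation-time distribution propagates directly into the reproduction number.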
We will consider multivariate stochastic processes indexed either by vertices or pairs of vertices of a dynamic network. By a dynamic network we mean a network with a fixed vertex set and an edge set that changes randomly over…