Statistics Theory
We present a test for independence of two strictly stationary time series based on a bootstrap procedure for the distance covariance. Our test detects any kind of dependence between the two time series within an arbitrary maximum lag $L$.…
Kernel methods are widely used in machine learning, especially for classification problems. However, the theoretical analysis of kernel classification is still limited. This paper investigates the statistical performance of kernel…
We introduce a test for the conditional independence of random variables $X$ and $Y$ given a random variable $Z$, specifically by sampling from the joint distribution $(X,Y,Z)$, binning the support of the distribution of $Z$, and conducting…
This paper investigates the effect of the design matrix on the ability (or inability) to estimate a sparse parameter in linear regression. More specifically, we characterize the optimal rate of estimation when the smallest singular value of…
We study a random graph model for small-world networks which are ubiquitous in social and biological sciences. In this model, a dense cycle of expected bandwidth $n \tau$, representing the hidden one-dimensional geometry of vertices, is…
Stochastic optimization methods encounter new challenges in the realm of streaming, characterized by a continuous flow of large, high-dimensional data. While first-order methods, like stochastic gradient descent, are the natural choice,…
The volume function $V(t)$ of a compact set $S \subset \mathbb{R}^d$ is the Lebesgue measure of the set of points whose distance to $S$ is at most $t$. According to some classical results in geometric measure theory, the volume function turns out to…
Higher-order multiway data is ubiquitous in machine learning and statistics and often exhibits community-like structures, where each component (node) along each different mode has a community membership associated with it. In this paper we…
Single-chain Markov chain Monte Carlo simulates realizations from a Markov chain to estimate expectations with the empirical average. The single-chain simulation is generally of considerable length and restricts many advantages of modern…
In this paper, we propose a novel approach to test the equality of high-dimensional mean vectors of several populations via the weighted $L_2$-norm. We establish the asymptotic normality of the test statistics under the null hypothesis. We…
For stationary time series, it is common to use plots of the partial autocorrelation function (PACF) or PACF-based tests to explore the temporal dependence structure of such processes. To the best of our knowledge, such analogs for non-stationary…
In this paper, we establish the partial correlation graph for multivariate continuous-time stochastic processes, assuming only that the underlying process is stationary and mean-square continuous with expectation zero and spectral density…
Consider the task of estimating a random vector $X$ from noisy observations $Y = X + Z$, where $Z$ is a standard normal vector, under the $L^p$ fidelity criterion. This work establishes that, for $1 \leq p \leq 2$, the optimal Bayesian…
The behaviour of many dynamic real phenomena shows different phases, each following a sigmoidal-type pattern. This requires studying sigmoidal curves with more than one inflection point. In this work, a diffusion process is…
It is a challenge to manage infinite- or high-dimensional data in situations where storage, transmission, or computation resources are constrained. In the simplest scenario when the data consists of a noisy infinite-dimensional signal, we…
We establish nonuniform Berry-Esseen (B-E) bounds for Studentized U-statistics of the rate $1/\sqrt{n}$ under a third-moment assumption, which covers the t-statistic that corresponds to a kernel of degree $1$ as a special case. While an…
Mixtures of regression are a powerful class of models for regression learning with respect to a highly uncertain and heterogeneous response variable of interest. In addition to being a rich predictive model for the response given some…
Let $(X_t)$ be a reflected diffusion process in a bounded convex domain in $\mathbb R^d$, solving the stochastic differential equation $$dX_t = \nabla f(X_t) dt + \sqrt{2f (X_t)} dW_t, ~t \ge 0,$$ with $W_t$ a $d$-dimensional Brownian…
This paper addresses dimensionality reduction in a regression setting where the predictors are a mixture of a functional variable and a high-dimensional vector. A flexible model, combining both sparse linear ideas together with…
This paper develops a new automatic and location-adaptive procedure for estimating regression in a Functional Single-Index Model (FSIM). This procedure is based on $k$-Nearest Neighbours ($k$NN) ideas. The asymptotic study includes results…