Statistics Theory
Distributional approximation is a fundamental problem in machine learning with numerous applications across all fields of science and engineering and beyond. The key challenge in most approximation methods is the need to tackle the…
The ``sample amplification'' problem formalizes the following question: Given $n$ i.i.d. samples drawn from an unknown distribution $P$, when is it possible to produce a larger set of $n+m$ samples which cannot be distinguished from $n+m$…
Robust estimation has played an important role in statistical and machine learning. However, its applications to functional linear regression are still under-developed. In this paper, we focus on Huber's loss with a diverging robustness…
Data dispersed across multiple files are commonly integrated through probabilistic linkage methods, where even minimal error rates in record matching can significantly contaminate subsequent statistical analyses. In regression problems, we…
Quantitative measurement of ageing across systems and components is crucial for accurately assessing reliability and predicting failure probabilities. This measurement supports effective maintenance scheduling, performance optimisation, and…
We prove a formula for the maximal correlation coefficient of the bivariate Marshall-Olkin distribution that was conjectured in Lin, Lai, and Govindaraju (2016, Stat. Methodol., 29:1-9). The formula is applied to obtain a new proof for a…
We consider population modelling using parametrised ordinary differential equation initial value problems (ODE-IVPs). For each individual drawn randomly from the unknown population distribution, the corresponding parameters for the ODE-IVP…
We provide new non-asymptotic false discovery proportion (FDP) confidence envelopes in several multiple testing settings relevant for modern high-dimensional data methods. We revisit the multiple testing scenarios considered in the recent…
A new approach based on censoring and a moment criterion is introduced for parameter estimation of count distributions when the probability generating function is available even though a closed form of the probability mass function and/or…
Testing for pairwise independence when the number of variables may be of the same size as, or even larger than, the sample size has received increasing attention in recent years. We contribute to this branch of the literature…
The ageing intensity function is a powerful analytical tool that provides valuable insights into the ageing process across diverse domains such as reliability engineering, actuarial science, and healthcare. Its applications continue to…
In this article, we consider the complete independence test of high-dimensional data. Based on the Chatterjee coefficient, we pioneer the development of a quadratic test and an extreme value test which possess good testing performance for…
Measuring the degree of inequality expressed by a multivariate statistical distribution is a challenging problem that appears in many fields of science and engineering. In this paper, we propose to extend the well-known univariate Gini…
Fueled by the ever-increasing need for statistics that guarantee the privacy of their training sets, this article studies the centrally-private estimation of Sobolev-smooth probability densities over the hypercube in dimension $d$. The…
The probability of causation (PC) is often used in liability assessments. In a legal context, for example, where a patient suffered a side effect after taking a medication and sued the pharmaceutical company as a result, the value of the…
In bipartite incidence graph sampling, the target study units may be formed as connected population elements, which are distinct from the units of sampling, and there may generally exist more than one way by which a given study unit can be…
Exchangeable random graphs, which include some of the most widely studied network models, have emerged as the mainstay of statistical network analysis in recent years. Graphons, which are the central objects in graph limit theory, provide a…
Let $\alpha_n(\cdot)=P\bigl(X_{n+1}\in\cdot\mid X_1,\ldots,X_n\bigr)$ be the predictive distributions of a sequence $(X_1,X_2,\ldots)$ of $p$-dimensional random vectors. Suppose $$\alpha_n=\mathcal{N}_p(M_n,Q_n)$$ where…
We propose tensor time series imputation when the missing pattern in the tensor data can be general, as long as any two data positions along a tensor fibre are both observed for enough time points. The method is based on a tensor time…
The traditional method of computing the singular value decomposition (SVD) of a data matrix is based on a least squares principle and is thus very sensitive to the presence of outliers. Hence the resulting inferences across different applications…