Related papers: A Tutorial on Multivariate $k$-Statistics and thei…
We propose new algorithms for generating $k$-statistics, multivariate $k$-statistics, polykays and multivariate polykays. The resulting computational times are very fast compared with procedures existing in the literature. Such speeding up…
kStatistics is a package in R that serves as a unified framework for estimating univariate and multivariate cumulants as well as products of univariate and multivariate cumulants of a random sample, using unbiased estimators with minimum…
Through the classical umbral calculus, we provide a unifying syntax for single and multivariate $k$-statistics, polykays and multivariate polykays. From a combinatorial point of view, we revisit the theory as exposed by Stuart and Ord,…
Trough the classical umbral calculus, we provide new, compact and easy to handle expressions of k-statistics, and more in general of U-statistics. In addition such a symbolic method can be naturally extended to multivariate case and to…
We present multivariate unbiased estimators for second, third, and fourth order cumulants $C_2(x,y)$, $C_3(x,y,z)$, and $C_4(x,y,z,w)$. Many relevant new estimators are derived for cases where some variables are average-free or pairs of…
We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings. $K$-variance measures the expected cost of matching two sets of $k$ samples from a distribution to each other, capturing local…
This paper develops new combinatorial approaches to analyze and compute special set partitions, called complementary set partitions, which are fundamental in the study of generalized cumulants. Moving away from traditional graph-based and…
The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as is in spectral clustering. Highly sensitive to initializations, however, K-means encounters a…
We propose a statistical method for clustering of multivariate longitudinal data into homogeneous groups. This method relies on a time-varying extension on the classical K-means algorithm, where a multivariate vector autoregressive model is…
In this work, we consider a multivariate regression model with one-sided errors. We assume for the regression function to lie in a general H\"{o}lder class and estimate it via a nonparametric local polynomial approach that consists of…
We propose the K-series estimation approach for the recovery of unknown univariate and multivariate distributions given knowledge of a finite number of their moments. Our method is directly applicable to the probabilistic analysis of…
In the present work we have selected a collection of statistical and mathematical tools useful for the exploration of multivariate data and we present them in a form that is meant to be particularly accessible to a classically trained…
Using recurrences gotten from the Apagodu-Zeilberger Multivariate Almkvist-Zeilberger algorithm we present a linear-time, and constant-space, algorithm to compute the general mixed moments of the k-variate general normal distribution, with…
The purpose of many health studies is to estimate the effect of an exposure on an outcome. It is not always ethical to assign an exposure to individuals in randomised controlled trials, instead observational data and appropriate study…
Following the student t-statistic, normalization has been a widely used method in statistic and other disciplines including economics, ecology and machine learning. We focus on statistics taking the form of a ratio over (some power of) the…
How can we explain the predictions of a machine learning model? When the data is structured as a multivariate time series, this question induces additional difficulties such as the necessity for the explanation to embody the time dependency…
In this paper, the decades-old clustering method k-means is revisited. The original distortion minimization model of k-means is addressed by a pure stochastic minimization procedure. In each step of the iteration, one sample is tentatively…
Cumulant mapping employs a statistical reconstruction of the whole by sampling its parts. The theory developed in this work formalises and extends ad hoc methods of `multi-fold' or `multi-dimensional' covariance mapping. Explicit formulae…
We provide a general methodology for unbiased estimation for intractable stochastic models. We consider situations where the target distribution can be written as an appropriate limit of distributions, and where conventional approaches…
The importance of exploring a potential integration among surveys has been acknowledged in order to enhance effectiveness and minimize expenses. In this work, we employ the alignment method to combine information from two different surveys…