Related papers: Consistance d'un estimateur de minimum de variance…
We study the statistical meaning of the minimization of distortion measure and the relation between the equilibrium points of the SOM algorithm and the minima of distortion measure. If we assume that the observations and the map lie in an…
Estimating entropy and mutual information consistently is important for many machine learning applications. The Kozachenko-Leonenko (KL) estimator (Kozachenko & Leonenko, 1987) is a widely used nonparametric estimator for the entropy of…
We introduce and initiate the study of new parameters associated with any norm and any log-concave measure on $\mathbb R^n$, which provide sharp distributional inequalities. In the Gaussian context this investigation sheds light to the…
The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either…
The k Nearest Neighbors (kNN) method has received much attention in the past decades, where some theoretical bounds on its performance were identified and where practical optimizations were proposed for making it work fairly well in high…
Nonparametric estimation of mutual information is used in a wide range of scientific problems to quantify dependence between variables. The k-nearest neighbor (knn) methods are consistent, and therefore expected to work well for large…
In the regression framework, the empirical measure based on the responses resulting from the nearest neighbors, among the covariates, to a given point $x$ is introduced and studied as a central statistical quantity. First, the associated…
We introduce a class of dimension reduction estimators based on an ensemble of the minimum average variance estimates of functions that characterize the central subspace, such as the characteristic functions, the Box--Cox transformations…
The k-nearest neighbors (k-NN) is a basic machine learning (ML) algorithm, and several quantum versions of it, employing different distance metrics, have been presented in the last few years. Although the Euclidean distance is one of the…
This paper deals with the estimation of the modes of an univariate mixture when the number of components is known and when the component density are well separated. We propose an algorithm based on the minimization of the "kp" criterion we…
The K-means algorithm is one of the most widely studied clustering algorithms in machine learning. While extensive research has focused on its ability to achieve a globally optimal solution, there still lacks a rigorous analysis of its…
We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may…
An important feature of kernel mean embeddings (KME) is that the rate of convergence of the empirical KME to the true distribution KME can be bounded independently of the dimension of the space, properties of the distribution and smoothness…
Reduced k-means clustering is a method for clustering objects in a low-dimensional subspace. The advantage of this method is that both clustering of objects and low-dimensional subspace reflecting the cluster structure are simultaneously…
The traditional k nearest neighbor (kNN) approach uses a distance formula within a spherical region to determine the k closest training observations to a test sample point. However, this approach may not work well when test point is located…
In this work, we unify recent variable-clustering techniques within a common geometric framework which allows to extend clustering to variable-structures, i.e. variable-subsets within which links between variables are taken into…
Suppose $V$ is an $n$-element set where for each $x \in V$, the elements of $V \setminus \{x\}$ are ranked by their similarity to $x$. The $K$-nearest neighbor graph is a directed graph including an arc from each $x$ to the $K$ points of $V…
Estimating some mathematical expectations from partially observed data and in particular missing outcomes is a central problem encountered in numerous fields such as transfer learning, counterfactual analysis or causal inference. Matching…
In this paper, we consider a k-nearest neighbor kernel type estimator when the random variables belong in a Riemannian manifolds. We study asymptotic properties such as the consistency and the asymptotic distribution. A simulation study is…
In this paper, we study the strong consistency of the sparse K-means clustering for high dimensional data. We prove the consistency in both risk and clustering for the Euclidean distance. We discuss the characterization of the limit of the…