Related papers: Consistance d'un estimateur de minimum de variance… (Consistency of a minimum-variance estimator…)

Self Organizing Map algorithm and distortion measure

We study the statistical meaning of the minimization of the distortion measure and the relation between the equilibrium points of the SOM algorithm and the minima of the distortion measure. If we assume that the observations and the map lie in an…

Machine Learning · Statistics 2008-02-22 Joseph Rynkiewicz

Analysis of k-Nearest Neighbor Distances with Application to Entropy Estimation

Estimating entropy and mutual information consistently is important for many machine learning applications. The Kozachenko-Leonenko (KL) estimator (Kozachenko & Leonenko, 1987) is a widely used nonparametric estimator for the entropy of…

Statistics Theory · Mathematics 2016-07-22 Shashank Singh , Barnabás Póczos

Variance estimates and almost Euclidean structure

We introduce and initiate the study of new parameters associated with any norm and any log-concave measure on $\mathbb{R}^n$, which provide sharp distributional inequalities. In the Gaussian context, this investigation sheds light on the…

Functional Analysis · Mathematics 2017-10-23 Grigoris Paouris , Petros Valettas

A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set

The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either…

Machine Learning · Statistics 2025-11-25 Man-Chung Yue , Yves Rychener , Daniel Kuhn , Viet Anh Nguyen

Efficient Estimation of k for the Nearest Neighbors Class of Methods

The k-nearest neighbors (kNN) method has received much attention over the past decades: theoretical bounds on its performance have been identified, and practical optimizations have been proposed to make it work fairly well in high…

Machine Learning · Computer Science 2016-06-14 Aleksander Lodwich , Faisal Shafait , Thomas Breuel

Geometric k-nearest neighbor estimation of entropy and mutual information

Nonparametric estimation of mutual information is used in a wide range of scientific problems to quantify dependence between variables. The k-nearest neighbor (kNN) methods are consistent and are therefore expected to work well for large…

Statistics Theory · Mathematics 2018-04-18 Warren M. Lord , Jie Sun , Erik M. Bollt

Nearest neighbor empirical processes

In the regression framework, the empirical measure based on the responses resulting from the nearest neighbors, among the covariates, to a given point $x$ is introduced and studied as a central statistical quantity. First, the associated…

Statistics Theory · Mathematics 2024-04-11 François Portier

Sufficient dimension reduction based on an ensemble of minimum average variance estimators

We introduce a class of dimension reduction estimators based on an ensemble of the minimum average variance estimates of functions that characterize the central subspace, such as the characteristic functions, the Box–Cox transformations…

Statistics Theory · Mathematics 2012-03-16 Xiangrong Yin , Bing Li

A quantum k-nearest neighbors algorithm based on the Euclidean distance estimation

The k-nearest neighbors (k-NN) algorithm is a basic machine learning (ML) method, and several quantum versions of it, employing different distance metrics, have been presented in the last few years. Although the Euclidean distance is one of the…

Emerging Technologies · Computer Science 2024-04-25 Enrico Zardini , Enrico Blanzieri , Davide Pastorello

A non-iterative algorithm to estimate the modes of univariate mixtures with well separated components

This paper deals with the estimation of the modes of a univariate mixture when the number of components is known and the component densities are well separated. We propose an algorithm based on the minimization of the "kp" criterion we…

Data Analysis, Statistics and Probability · Physics 2007-05-23 Nicolas Paul , Luc Fety , Michel Terre

Modified K-means Algorithm with Local Optimality Guarantees

The K-means algorithm is one of the most widely studied clustering algorithms in machine learning. While extensive research has focused on its ability to achieve a globally optimal solution, there is still no rigorous analysis of its…

Machine Learning · Computer Science 2025-06-12 Mingyi Li , Michael R. Metel , Akiko Takeda

An adaptive nearest neighbor rule for classification

We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may…

Machine Learning · Computer Science 2019-05-31 Akshay Balsubramani , Sanjoy Dasgupta , Yoav Freund , Shay Moran

Variance-Aware Estimation of Kernel Mean Embedding

An important feature of kernel mean embeddings (KME) is that the rate of convergence of the empirical KME to the true distribution KME can be bounded independently of the dimension of the space, properties of the distribution and smoothness…

Statistics Theory · Mathematics 2025-04-17 Geoffrey Wolfer , Pierre Alquier

Strong Consistency of Reduced K-means Clustering

Reduced k-means clustering is a method for clustering objects in a low-dimensional subspace. The advantage of this method is that both the clustering of objects and a low-dimensional subspace reflecting the cluster structure are simultaneously…

Statistics Theory · Mathematics 2014-02-14 Yoshikazu Terada

Optimal Extended Neighbourhood Rule $k$ Nearest Neighbours Ensemble

The traditional k-nearest neighbor (kNN) approach uses a distance formula within a spherical region to determine the k training observations closest to a test sample point. However, this approach may not work well when the test point is located…

Machine Learning · Statistics 2024-02-19 Amjad Ali , Zardad Khan , Dost Muhammad Khan , Saeed Aldahmani

Classifying variable-structures: a general framework

In this work, we unify recent variable-clustering techniques within a common geometric framework that allows clustering to be extended to variable-structures, i.e., variable-subsets within which links between variables are taken into…

Methodology · Statistics 2018-04-25 Xavier Bry , Lionel Cucala

K-Nearest Neighbor Approximation Via the Friend-of-a-Friend Principle

Suppose $V$ is an $n$-element set where for each $x \in V$, the elements of $V \setminus \{x\}$ are ranked by their similarity to $x$. The $K$-nearest neighbor graph is a directed graph including an arc from each $x$ to the $K$ points of $V…

Combinatorics · Mathematics 2020-12-29 Jacob D. Baron , R. W. R. Darling

Convergence rate for Nearest Neighbour matching: geometry of the domain and higher-order regularity

Estimating mathematical expectations from partially observed data, and in particular from data with missing outcomes, is a central problem in numerous fields such as transfer learning, counterfactual analysis, and causal inference. Matching…

Statistics Theory · Mathematics 2025-05-01 Simon Viel , Lionel Truquet , Ikko Yamane

k-Nearest neighbor density estimation on Riemannian Manifolds

In this paper, we consider a k-nearest neighbor kernel-type estimator when the random variables belong to a Riemannian manifold. We study asymptotic properties such as the consistency and the asymptotic distribution. A simulation study is…

Statistics Theory · Mathematics 2011-06-24 Guillermo Henry , Andrés Muñoz , Daniela Rodriguez

Strong Consistency of Sparse K-means Clustering

In this paper, we study the strong consistency of the sparse K-means clustering for high dimensional data. We prove the consistency in both risk and clustering for the Euclidean distance. We discuss the characterization of the limit of the…

Statistics Theory · Mathematics 2025-04-15 Jeungju Kim , Johan Lim