Related papers: Efficient Distributed Algorithms for the $K$-Neare…
Algorithms often carry out equally many computations for "easy" and "hard" problem instances. In particular, algorithms for finding nearest neighbors typically have the same running time regardless of the particular problem instance. In…
K-Nearest Neighbours (k-NN) is a popular classification and regression algorithm, yet one of its main limitations is the difficulty in choosing the number of neighbours. We present a Bayesian algorithm to compute the posterior probability…
This paper presents how to perform minimax optimal classification, regression, and density estimation based on fixed-$k$ nearest neighbor (NN) searches. We consider a distributed learning scenario, in which a massive dataset is split into…
Motivated by the increasing need for fast processing of large-scale graphs, we study a number of fundamental graph problems in a message-passing model for distributed computing, called $k$-machine model, where we have $k$ machines that…
In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data…
Demands for minimum parameter setup in machine learning models are desirable to avoid time-consuming optimization processes. The $k$-Nearest Neighbors is one of the most effective and straightforward models employed in numerous problems.…
The k-nearest-neighbor method performs classification tasks for a query sample based on the information contained in its neighborhood. Previous studies into the k-nearest-neighbor algorithm usually achieved the decision value for a class by…
In this paper, we propose a distributed algorithm for the minimum dominating set problem. For some especial networks, we prove theoretically that the achieved answer by our proposed algorithm is a constant approximation factor of the exact…
We propose a new algorithm for k-means clustering in a distributed setting, where the data is distributed across many machines, and a coordinator communicates with these machines to calculate the output clustering. Our algorithm guarantees…
The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in applications in bioinformatics, image processing, social network analysis, etc.). As sizes of data sets have grown…
Given a set $P$ of $n$ points in $\mathbf{R}^d$, and a positive integer $k \leq n$, the $k$-dispersion problem is that of selecting $k$ of the given points so that the minimum inter-point distance among them is maximized (under Euclidean…
The $k$-center problem is a central optimization problem with numerous applications for machine learning, data mining, and communication networks. Despite extensive study in various scenarios, it surprisingly has not been thoroughly…
Suppose $V$ is an $n$-element set where for each $x \in V$, the elements of $V \setminus \{x\}$ are ranked by their similarity to $x$. The $K$-nearest neighbor graph is a directed graph including an arc from each $x$ to the $K$ points of $V…
k is the most important parameter in a text categorization system based on k-Nearest Neighbor algorithm (kNN).In the classification process, k nearest documents to the test one in the training set are determined firstly. Then, the…
Learning a robust classifier from a few samples remains a key challenge in machine learning. A major thrust of research has been focused on developing $k$-nearest neighbor ($k$-NN) based algorithms combined with metric learning that…
K-nearest neighbor classification algorithm is one of the most basic algorithms in machine learning, which determines the sample's category by the similarity between samples. In this paper, we propose a quantum K-nearest neighbor…
The k Nearest Neighbors (kNN) method has received much attention in the past decades, where some theoretical bounds on its performance were identified and where practical optimizations were proposed for making it work fairly well in high…
The weighted k-nearest neighbors algorithm is one of the most fundamental non-parametric methods in pattern recognition and machine learning. The question of setting the optimal number of neighbors as well as the optimal weights has…
We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance…
Big data mining is well known to be an important task for data science, because it can provide useful observations and new knowledge hidden in given large datasets. Proximity-based data analysis is particularly utilized in many real-life…