Related papers: Mahalanobis Distance Metric Learning Algorithm for…
For many machine learning algorithms such as $k$-Nearest Neighbor ($k$-NN) classifiers and $ k $-means clustering, often their success heavily depends on the metric used to calculate distances between different data points. An effective…
Effectively applying the K-means algorithm to clustering tasks with incomplete features remains an important research area due to its impact on real-world applications. Recent work has shown that unifying K-means clustering and imputation…
A number of machine learning algorithms are using a metric, or a distance, in order to compare individuals. The Euclidean distance is usually employed, but it may be more efficient to learn a parametric distance such as Mahalanobis metric.…
The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This…
This report provides an exploration of different distance measures that can be used with the $K$-means algorithm for cluster analysis. Specifically, we investigate the Mahalanobis distance, and critically assess any benefits it may have…
Common clustering algorithms require multiple scans of all the data to achieve convergence, and this is prohibitive when large databases, with data arriving in streams, must be processed. Some algorithms to extend the popular K-means method…
In recent years, the crucial importance of metrics in machine learning algorithms has led to an increasing interest for optimizing distance and similarity functions. Most of the state of the art focus on learning Mahalanobis distances…
A key element of any machine learning algorithm is the use of a function that measures the dis/similarity between data points. Given a task, such a function can be optimized with a metric learning algorithm. Although this research field has…
Mahalanobis metrics are widely used in machine learning in conjunction with methods like $k$-nearest neighbors, $k$-means clustering, and $k$-medians clustering. Despite their importance, there has not been any prior work on applying…
Online metric learning has been widely applied in classification and retrieval. It can automatically learn a suitable metric from data by restricting similar instances to be separated from dissimilar instances with a given margin. However,…
Distance metric learning has attracted much attention in recent years, where the goal is to learn a distance metric based on user feedback. Conventional approaches to metric learning mainly focus on learning the Mahalanobis distance metric…
Distance metric learning is a successful way to enhance the performance of the nearest neighbor classifier. In most cases, however, the distribution of data does not obey a regular form and may change in different parts of the feature…
A fundamental question in data analysis, machine learning and signal processing is how to compare between data points. The choice of the distance metric is specifically challenging for high-dimensional data sets, where the problem of…
In this paper, we discuss Mahalanobis k-NN: A Statistical Lens designed to address the challenges of feature matching in learning-based point cloud registration when confronted with an arbitrary density of point clouds. We tackle this by…
The concepts of similarity and distance are crucial in data mining. We consider the problem of defining the distance between two data sets by comparing summary statistics computed from the data sets. The initial definition of our distance…
For many data mining and machine learning tasks, the quality of a similarity measure is the key for their performance. To automatically find a good similarity measure from datasets, metric learning and similarity learning are proposed and…
Data streams are often defined as large amounts of data flowing continuously at high speed. Moreover, these data are likely subject to changes in data distribution, known as concept drift. Given all the reasons mentioned above, learning…
Data stream mining aims at extracting meaningful knowledge from continually evolving data streams, addressing the challenges posed by nonstationary environments, particularly, concept drift which refers to a change in the underlying data…
We propose a novel semiparametric classifier based on Mahalanobis distances of an observation from the competing classes. Our tool is a generalized additive model with the logistic link function that uses these distances as features to…
Metric learning is a key problem for many data mining and machine learning applications, and has long been dominated by Mahalanobis methods. Recent advances in nonlinear metric learning have demonstrated the potential power of…