Related papers: On pattern classification with weighted dimensions
Pattern recognition constitutes a particularly important task underlying a great deal of scientific and technologica activities. At the same time, pattern recognition involves several challenges, including the choice of features to…
This paper presents a new similarity measure to be used for general tasks including supervised learning, which is represented by the K-nearest neighbor classifier (KNN). The proposed similarity measure is invariant to large differences in…
K-Nearest Neighbors (KNN) is one of the most used ML classifiers. However, if we observe closely, standard distance-weighted KNN and relative variants assume all 'k' neighbors are equally reliable. In heterogeneous feature space, this…
kNN is a very effective Instance based learning method, and it is easy to implement. Due to heterogeneous nature of data, noises from different possible sources are also widespread in nature especially in case of large-scale databases. For…
The $k$-nearest neighbour ($k$-NN) classifier is one of the oldest and most important supervised learning algorithms for classifying datasets. Traditionally the Euclidean norm is used as the distance for the $k$-NN classifier. In this…
High dimensionality, i.e. data having a large number of variables, tends to be a challenge for most machine learning tasks, including classification. A classifier usually builds a model representing how a set of inputs explain the outputs.…
Imputation of missing data is a common application in various classification problems where the feature training matrix has missingness. A widely used solution to this imputation problem is based on the lazy learning technique, $k$-nearest…
The K-means algorithm is among the most commonly used data clustering methods. However, the regular K-means can only be applied in the input space and it is applicable when clusters are linearly separable. The kernel K-means, which extends…
Cosine similarity is a popular distance measure that measures the similarity between two vectors in the inner product space. It is widely used in many data classification algorithms like K-Nearest Neighbors, Clustering etc. This study…
The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the…
There are many distance-based methods for classification and clustering, and for data with a high number of dimensions and a lower number of observations, processing distances is computationally advantageous compared to the raw data matrix.…
Time series classification is an increasing research topic due to the vast amount of time series data that are being created over a wide variety of fields. The particularity of the data makes it a challenging task and different approaches…
Recent advances in deep learning have achieved impressive gains in classification accuracy on a variety of types of data, including images and text. Despite these gains, however, concerns have been raised about the calibration, robustness,…
Big data mining is well known to be an important task for data science, because it can provide useful observations and new knowledge hidden in given large datasets. Proximity-based data analysis is particularly utilized in many real-life…
The weighted Euclidean norm $\|x\|_w$ of a vector $x\in \mathbb{R}^d$ with weights $w\in \mathbb{R}^d$ is the Euclidean norm where the contribution of each dimension is scaled by a given weight. Approaches to dimensionality reduction that…
Recent findings by Jahn, T. Ullrich, Voigtlaender [10] relate non-linear sampling numbers for the square norm to quantities involving trigonometric best $m-$term approximation errors in the uniform norm. Here we establish new results for…
Semi-supervised classification, where unlabeled data are massive but labeled data are limited, often arises in machine learning applications. We address this challenge under high-dimensional data by leveraging the manifold and cluster…
Testing for the equality of two high-dimensional distributions is a challenging problem, and this becomes even more challenging when the sample size is small. Over the last few decades, several graph-based two-sample tests have been…
Bearings are among the most failure-prone components in rotating machinery, and their condition directly impacts overall performance. Therefore, accurately diagnosing bearing faults is essential for ensuring system stability. However,…
$K$-NN classifier is one of the most famous classification algorithms, whose performance is crucially dependent on the distance metric. When we consider the distance metric as a parameter of $K$-NN, learning an appropriate distance metric…