Related papers: Binary Bleed: Fast Distributed and Parallel Method…

200 papers

We develop novel clustering algorithms for functional data when the number of clusters $K$ is unknown and also when it is prefixed. These algorithms are developed based on the Maximum Mean Discrepancy (MMD) measure between two sets of…

Methodology · Statistics 2025-07-16 Sourav Chakrabarty, Anirvan Chakraborty, Shyamal K. De

Estimating the number of clusters (K) is a critical and often difficult task in cluster analysis. Many methods have been proposed to estimate K, including some top performers that use a resampling approach. When performing cluster analysis in…

Methodology · Statistics 2019-09-05 Yujia Li, Xiangrui Zeng, Chien-Wei Lin, George Tseng

Identifying the underlying models in a set of data points contaminated by noise and outliers leads to a highly complex multi-model fitting problem. This problem can be posed as a clustering problem by the projection of higher order…

Computer Vision and Pattern Recognition · Computer Science 2018-08-01 Ruwan Tennakoon, Alireza Sadri, Reza Hoseinnezhad, Alireza Bab-Hadiashar

Cross-validation plays a fundamental role in Machine Learning, enabling robust evaluation of model performance and preventing overestimation on training and validation data. However, one of its drawbacks is the potential to create data…

Machine Learning · Computer Science 2025-08-28 Afonso Martini Spezia, Thomas Fontanari, Mariana Recamonde-Mendoza

Clustering is a critical component of decision-making in today's data-driven environments. It has been widely used in a variety of fields such as bioinformatics, social network analysis, and image processing. However, clustering accuracy…

Machine Learning · Computer Science 2025-07-14 Krishnendu Das, Sumit Gupta, Awadhesh Kumar

Recent advancements in Mixed Integer Optimization (MIO) algorithms, paired with hardware enhancements, have led to significant speedups in resolving MIO problems. These strategies have been utilized for optimal subset selection,…

Methodology · Statistics 2024-03-27 Madhav Sankaranarayanan, Intekhab Hossain, Tom Chen

$K$-means clustering is a widely used machine learning method for identifying patterns in large datasets. Recently, semidefinite programming (SDP) relaxations have been proposed for solving the $K$-means optimization problem, which enjoy…

Machine Learning · Statistics 2024-04-16 Yubo Zhuang, Xiaohui Chen, Yun Yang, Richard Y. Zhang

K-Means clustering still plays an important role in many computer vision problems. While the conventional Lloyd method, which alternates between centroid update and cluster assignment, is primarily used in practice, it may converge to a…

Computer Vision and Pattern Recognition · Computer Science 2018-10-30 Huu Le, Anders Eriksson, Thanh-Toan Do, Michael Milford

The capability of classifying and clustering a desired set of data is an essential part of building knowledge from data. However, as the size and dimensionality of input data increase, the run-time for such clustering algorithms is…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-25 Hadi Mardani Kamali

Clustering samples according to an effective metric and/or vector space representation is a challenging unsupervised learning task with a wide spectrum of applications. Among several clustering algorithms, k-means and its kernelized version…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-10 Marco Jacopo Ferrarotti, Sergio Decherchi, Walter Rocchia

This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and uses the most significant data…

Computer Vision and Pattern Recognition · Computer Science 2022-05-25 Seyed Omid Mohammadi, Ahmad Kalhor, Hossein Bodaghi

We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on $k$-means clustering and…

Spectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction; continuous labels learning; discretizing the learned labels by k-means…

Machine Learning · Computer Science 2017-11-15 Zhao Kang, Chong Peng, Qiang Cheng, Zenglin Xu

The number of accidents and diseases is increasing at an alarming rate, resulting in a huge increase in the demand for blood. There is a necessity for the organized analysis of the blood donor database or blood banks…

Databases · Computer Science 2013-09-11 Bondu Venkateswarlu, Prof G. S. V. Prasad Raju

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

In this paper, we investigate the learning-augmented $k$-median clustering problem, which aims to improve the performance of traditional clustering algorithms by preprocessing the point set with a predictor of error rate $\alpha \in [0,1)$.…

Data Structures and Algorithms · Computer Science 2026-03-12 Kangke Cheng, Shihong Song, Guanlin Mo, Hu Ding

We study the topic of dimensionality reduction for $k$-means clustering. Dimensionality reduction encompasses the union of two approaches: feature selection and feature extraction. A feature selection based algorithm for…

Data Structures and Algorithms · Computer Science 2015-03-19 Christos Boutsidis, Anastasios Zouzias, Michael W. Mahoney, Petros Drineas

Symmetric nonnegative matrix factorization (SymNMF) is a powerful tool for clustering, which typically uses the $k$-nearest neighbor ($k$-NN) method to construct the similarity matrix. However, $k$-NN may mislead clustering since the neighbors…

Machine Learning · Computer Science 2024-12-06 Wenlong Lyu, Yuheng Jia

This paper considers the problem of model selection under domain shift. Motivated by principles from distributionally robust optimisation and domain adaptation theory, it is proposed that the training-validation split should maximise the…

Machine Learning · Computer Science 2025-08-19 Andrea Napoli, Paul White

Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal…
