Related papers: Nested Mini-Batch K-Means

An efficient K-means algorithm for Massive Data

Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm…

Machine Learning · Statistics 2016-05-11 Marco Capó, Aritz Pérez, José Antonio Lozano

Mini-Batch Kernel $k$-means

We present the first mini-batch kernel $k$-means algorithm, offering an order of magnitude improvement in running time compared to the full batch algorithm. A single iteration of our algorithm takes $\widetilde{O}(kb^2)$ time, significantly…

Machine Learning · Computer Science 2024-10-10 Ben Jourdan, Gregory Schwartzman

Fast k-means algorithm clustering

k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Since k-means depends mainly on distance calculation between all data points and the centers, the time cost will be high when the size of…

Data Structures and Algorithms · Computer Science 2011-08-08 Raied Salman, Vojislav Kecman, Qi Li, Robert Strack, Erik Test

Parallelization of the K-Means Algorithm with Applications to Big Data Clustering

The K-Means clustering using Lloyd's algorithm is an iterative approach to partition the given dataset into K different clusters. The algorithm assigns each point to the cluster based on the following objective function \[\ \min…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-21 Ashish Srivastava, Mohammed Nawfal

Convergence rate of stochastic k-means

We analyze online (Bottou and Bengio) and mini-batch (Sculley) $k$-means variants. Both scale up the widely used $k$-means algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised…

Machine Learning · Computer Science 2016-11-17 Cheng Tang, Claire Monteleoni

Fast Approximate $K$-Means via Cluster Closures

$K$-means, a simple and effective clustering algorithm, is one of the most widely used algorithms in the multimedia and computer vision communities. Traditional $k$-means is an iterative algorithm---in each iteration new cluster centers are…

Computer Vision and Pattern Recognition · Computer Science 2013-12-12 Jingdong Wang, Jing Wang, Qifa Ke, Gang Zeng, Shipeng Li

k-sums: another side of k-means

In this paper, the decades-old clustering method k-means is revisited. The original distortion minimization model of k-means is addressed by a pure stochastic minimization procedure. In each step of the iteration, one sample is tentatively…

Machine Learning · Computer Science 2020-05-20 Wan-Lei Zhao, Run-Qing Chen, Hui Ye, Chong-Wah Ngo

An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However, the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed…

Machine Learning · Computer Science 2016-04-19 Fouad Khan

Ball k-means

This paper presents a novel accelerated exact k-means algorithm called the Ball k-means algorithm, which uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation. The Ball k-means can accurately find…

Machine Learning · Computer Science 2020-05-05 Shuyin Xia, Daowan Peng, Deyu Meng, Changqing Zhang, Guoyin Wang, Zizhong Chen, Wei Wei

Inference with K-means

This thesis aims to invent new approaches for making inferences with the k-means algorithm. k-means is an iterative clustering algorithm that randomly assigns k centroids, then assigns data points to the nearest centroid, and updates…

Machine Learning · Computer Science 2024-10-24 Alfred K. Adzika, Prudence Djagba

Computing k-means in mixed precision

Motivated by the increasing availability of low- and mixed-precision arithmetic on modern hardware, we develop mixed-precision variants of Lloyd's algorithm for k-means clustering. The main ingredient is a family of mixed-precision kernels…

Numerical Analysis · Mathematics 2026-05-26 Erin Carson, Xinye Chen, Xiaobo Liu

Faster Balanced Clusterings in High Dimension

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…

Computational Geometry · Computer Science 2018-09-11 Hu Ding

Fast K-Means Clustering with Anderson Acceleration

We propose a novel method to accelerate Lloyd's algorithm for K-Means clustering. Unlike previous acceleration approaches that reduce computational cost per iteration or improve initialization, our approach is focused on reducing the…

Machine Learning · Computer Science 2018-05-29 Juyong Zhang, Yuxin Yao, Yue Peng, Hao Yu, Bailin Deng

Better Mini-Batch Algorithms via Accelerated Gradient Methods

Mini-batch algorithms have been proposed as a way to speed-up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard…

Machine Learning · Computer Science 2011-06-24 Andrew Cotter, Ohad Shamir, Nathan Srebro, Karthik Sridharan

Fast k-means based on KNN Graph

In the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost could be prohibitively high as the data size and the cluster number are large. It is well…

Machine Learning · Computer Science 2017-05-05 Cheng-Hao Deng, Wan-Lei Zhao

Elkan's k-Means for Graphs

This paper extends k-means algorithms from the Euclidean domain to the domain of graphs. To recompute the centroids, we apply subgradient methods for solving the optimization-based formulation of the sample mean of graphs. To accelerate the…

Artificial Intelligence · Computer Science 2009-12-24 Brijnesh J. Jain, Klaus Obermayer

Improving the performance of bagging ensembles for data streams through mini-batching

Often, machine learning applications have to cope with dynamic environments where data are collected in the form of continuous data streams with potentially infinite length and transient behavior. Compared to traditional (batch) data…

Machine Learning · Computer Science 2021-12-21 Guilherme Cassales, Heitor Gomes, Albert Bifet, Bernhard Pfahringer, Hermes Senger

Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D

The $k$-Means clustering problem on $n$ points is NP-Hard for any dimension $d\ge 2$; however, for the 1D case there exist exact polynomial-time algorithms. Previous literature reported an $O(kn^2)$ time dynamic programming algorithm that…

Data Structures and Algorithms · Computer Science 2018-04-26 Allan Grønlund, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, Mingzhou Song

Mini-batch $k$-means terminates within $O(d/\epsilon)$ iterations

We answer the question: "Does local progress (on batches) imply global progress (on the entire dataset) for mini-batch $k$-means?". Specifically, we consider mini-batch $k$-means which terminates only when the improvement in the quality of…

Machine Learning · Computer Science 2023-04-04 Gregory Schwartzman

Balanced k-Means and Min-Cut Clustering

Clustering is an effective technique in data mining for generating groups of interest. Among various clustering approaches, the families of k-means algorithms and min-cut algorithms have gained the most popularity due to their…

Machine Learning · Computer Science 2014-11-25 Xiaojun Chang, Feiping Nie, Zhigang Ma, Yi Yang