English
Related papers

Related papers: A Weighted K-Center Algorithm for Data Subset Sele…

200 papers

Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to ma- nipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm…

Machine Learning · Statistics 2016-05-11 Marco Capó , Aritz Pérez , José Antonio Lozano

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis , Omar Oubari , Gregor Lenz

Center-based clustering techniques are fundamental in some areas of machine learning such as data summarization. Generic $k$-center algorithms can produce biased cluster representatives so there has been a recent interest in fair $k$-center…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-21 Jinxiang Gan , Mordecai Golin , Zonghan Yang , Yuhao Zhang

In this paper, we investigate the learning-augmented $k$-median clustering problem, which aims to improve the performance of traditional clustering algorithms by preprocessing the point set with a predictor of error rate $\alpha \in [0,1)$.…

Data Structures and Algorithms · Computer Science 2026-03-12 Kangke Cheng , Shihong Song , Guanlin Mo , Hu Ding

Modern datasets span billions of samples, making training on all available data infeasible. Selecting a high quality subset helps in reducing training costs and enhancing model quality. Submodularity, a discrete analogue of convexity, is…

Machine Learning · Computer Science 2025-04-04 Maximilian Böther , Abraham Sebastian , Pranjal Awasthi , Ana Klimovic , Srikumar Ramalingam

As machine learning tasks continue to evolve, the trend has been to gather larger datasets and train increasingly larger models. While this has led to advancements in accuracy, it has also escalated computational costs to unsustainable…

Machine Learning · Computer Science 2024-06-03 Mohammad Jafari , Yimeng Zhang , Yihua Zhang , Sijia Liu

In real applications, database systems should be able to manage and process data with uncertainty. Any real dataset may have missing or rounded values, also the values of data may change by time. So, it becomes important to handle these…

Computational Geometry · Computer Science 2020-06-12 Sharareh Alipour

Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular $k$-median and $k$-means variants which, given a set $P$ of points from a metric…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-01 Alessio Mazzetto , Andrea Pietracaprina , Geppino Pucci

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…

Computational Geometry · Computer Science 2018-09-11 Hu Ding

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…

Machine Learning · Statistics 2020-11-13 Joshua Tobin , Mimi Zhang

We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on $k$-means clustering and…

In a metric space, a set of point sets of roughly the same size and an integer $k\geq 1$ are given as the input and the goal of data-distributed $k$-center is to find a subset of size $k$ of the input points as the set of centers to…

Computational Geometry · Computer Science 2023-09-11 Sepideh Aghamolaei , Mohammad Ghodsi

Coreset, which is a summary of the original dataset in the form of a small weighted set in the same sample space, provides a promising approach to enable machine learning over distributed data. Although viewed as a proxy of the original…

Machine Learning · Computer Science 2020-06-24 Hanlin Lu , Ming-Ju Li , Ting He , Shiqiang Wang , Vijaykrishnan Narayanan , Kevin S Chan

k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Since k-means depends mainly on distance calculation between all data points and the centers, the time cost will be high when the size of…

Data Structures and Algorithms · Computer Science 2011-08-08 Raied Salman , Vojislav Kecman , Qi Li , Robert Strack , Erik Test

We study two generalizations of classic clustering problems called dynamic ordered $k$-median and dynamic $k$-supplier, where the points that need clustering evolve over time, and we are allowed to move the cluster centers between…

Data Structures and Algorithms · Computer Science 2022-07-26 Shichuan Deng , Jian Li , Yuval Rabani

One of the applications of center-based clustering algorithms such as K-Means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can…

Machine Learning · Computer Science 2020-09-23 Ali Hassani , Amir Iranmanesh , Mahdi Eftekhari , Abbas Salemi

Center-based clustering has attracted significant research interest from both theory and practice. In many practical applications, input data often contain background knowledge that can be used to improve clustering results. In this work,…

Machine Learning · Computer Science 2025-06-13 Longkun Guo , Chaoqi Jia , Kewen Liao , Zhigang Lu , Minhui Xue

Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive…

Machine Learning · Statistics 2018-06-08 Olivier Bachem , Mario Lucic , Andreas Krause

The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó , Aritz Pérez , Jose A. Lozano

$K$-means, a simple and effective clustering algorithm, is one of the most widely used algorithms in multimedia and computer vision community. Traditional $k$-means is an iterative algorithm---in each iteration new cluster centers are…

Computer Vision and Pattern Recognition · Computer Science 2013-12-12 Jingdong Wang , Jing Wang , Qifa Ke , Gang Zeng , Shipeng Li
‹ Prev 1 2 3 10 Next ›