Related papers: Practical Coreset Constructions for Machine Learni…

200 papers

Coreset, which is a summary of the original dataset in the form of a small weighted set in the same sample space, provides a promising approach to enable machine learning over distributed data. Although viewed as a proxy of the original…

Machine Learning · Computer Science 2020-06-24 Hanlin Lu, Ming-Ju Li, Ting He, Shiqiang Wang, Vijaykrishnan Narayanan, Kevin S Chan

A coreset is a point set containing information about geometric properties of a larger point set. A series of previous works show that in many machine learning problems, especially in clustering problems, coreset could be very useful to…

Data Structures and Algorithms · Computer Science 2022-10-18 Yichuan Deng, Zhao Song, Yitan Wang, Yuanyuan Yang

A coreset (or core-set) of an input set is its small summation, such that solving a problem on the coreset as its input, provably yields the same result as solving the same problem on the original (full) set, for a given family of problems…

Machine Learning · Computer Science 2019-10-22 Ibrahim Jubran, Alaa Maalouf, Dan Feldman

Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, were proposed. While these techniques provide provably…

Machine Learning · Statistics 2018-02-21 Olivier Bachem, Mario Lucic, Silvio Lattanzi

In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering…

Machine Learning · Computer Science 2020-11-19 Dan Feldman

Coreset of a given dataset and loss function is usually a small weighted set that approximates this loss for every query from a given set of queries. Coresets have shown to be very useful in many applications. However, coresets construction…

Machine Learning · Computer Science 2021-11-05 Alaa Maalouf, Gilad Eini, Ben Mussay, Dan Feldman, Margarita Osadchy

Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive…

Machine Learning · Statistics 2018-06-08 Olivier Bachem, Mario Lucic, Andreas Krause

A coreset is a small set that can approximately preserve the structure of the original input data set. Therefore we can run our algorithm on a coreset so as to reduce the total computational complexity. Conventional coreset techniques…

Machine Learning · Computer Science 2022-10-11 Jiaxiang Chen, Qingyuan Yang, Ruomin Huang, Hu Ding

A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f:P\times Q\to\mathbb{R}$ is a…

Data Structures and Algorithms · Computer Science 2022-09-20 Vladimir Braverman, Dan Feldman, Harry Lang, Adiel Statman, Samson Zhou

The increasing availability of massive data sets poses a series of challenges for machine learning. Prominent among these is the need to learn models under hardware or human resource constraints. In such resource-constrained settings, a…

Machine Learning · Computer Science 2021-09-28 Zalán Borsos, Mojmír Mutný, Marco Tagliasacchi, Andreas Krause

Coresets are small data summaries that are sufficient for model training. They can be maintained online, enabling efficient handling of large data streams under resource constraints. However, existing constructions are limited to simple…

Machine Learning · Computer Science 2020-10-23 Zalán Borsos, Mojmír Mutný, Andreas Krause

We study the problem of constructing coresets for clustering problems with time series data. This problem has gained importance across many fields including biology, medicine, and economics due to the proliferation of sensors facilitating…

Machine Learning · Computer Science 2021-10-29 Lingxiao Huang, K. Sudhir, Nisheeth K. Vishnoi

A coreset is a tiny weighted subset of an input set, that closely resembles the loss function, with respect to a certain set of queries. Coresets became prevalent in machine learning as they have shown to be advantageous for many…

Machine Learning · Computer Science 2023-05-23 Alaa Maalouf, Murad Tukan, Vladimir Braverman, Daniela Rus

We devise coresets for kernel $k$-Means with a general kernel, and use them to obtain new, more efficient, algorithms. Kernel $k$-Means has superior clustering capability compared to classical $k$-Means, particularly when clusters are…

Data Structures and Algorithms · Computer Science 2024-04-09 Shaofeng H.-C. Jiang, Robert Krauthgamer, Jianing Lou, Yubo Zhang

A coreset is a subset of the training set, using which a machine learning algorithm obtains performances similar to what it would deliver if trained over the whole original data. Coreset discovery is an active and open line of research as…

Machine Learning · Computer Science 2020-02-21 Pietro Barbiero, Giovanni Squillero, Alberto Tonda

Coreset selection targets the challenge of finding a small, representative subset of a large dataset that preserves essential patterns for effective machine learning. Although several surveys have examined data reduction strategies before,…

Machine Learning · Computer Science 2026-01-30 Brian B. Moser, Arundhati S. Shanbhag, Stanislav Frolov, Federico Raue, Joachim Folz, Andreas Dengel

Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high performance coresets for clustering problems such as $k$-means in both theory and practice. Curiously, there exists no work on…

Data Structures and Algorithms · Computer Science 2022-07-05 Chris Schwiegelshohn, Omar Ali Sheikh-Omar

A \emph{strong coreset} for the mean queries of a set $P$ in ${\mathbb{R}}^d$ is a small weighted subset $C\subseteq P$, which provably approximates its sum of squared distances to any center (point) $x\in {\mathbb{R}}^d$. A \emph{weak…

Machine Learning · Computer Science 2021-11-05 Alaa Maalouf, Ibrahim Jubran, Dan Feldman

How can we train a statistical mixture model on a massive data set? In this work we show how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also…

Machine Learning · Statistics 2018-01-17 Mario Lucic, Matthew Faulkner, Andreas Krause, Dan Feldman

Geometric data summarization has become an essential tool in both geometric approximation algorithms and where geometry intersects with big data problems. In linear or near-linear time large data sets can be compressed into a summary, and…

Computational Geometry · Computer Science 2016-06-14 Jeff M. Phillips