Related papers: Coresets for Kernel Regression

200 papers

This paper introduces the problem of coresets for regression problems to panel data settings. We first define coresets for several variants of regression problems with panel data and then present efficient algorithms to construct coresets…

Machine Learning · Computer Science 2020-11-04 Lingxiao Huang, K. Sudhir, Nisheeth K. Vishnoi

Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show a negative result, namely, that no strongly…

Data Structures and Algorithms · Computer Science 2021-03-09 Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David P. Woodruff

Coresets have emerged as a powerful tool to summarize data by selecting a small subset of the original observations while retaining most of its information. This approach has led to significant computational speedups but the performance of…

Statistics Theory · Mathematics 2020-12-10 Paxton Turner, Jingbo Liu, Philippe Rigollet

Coreset (or core-set) is a small weighted subset $Q$ of an input set $P$ with respect to a given monotonic function $f:\mathbb{R}\to\mathbb{R}$ that provably approximates its fitting loss $\sum_{p\in P}f(p\cdot x)$ to…

Machine Learning · Computer Science 2021-12-24 Elad Tolochinsky, Ibrahim Jubran, Dan Feldman

We devise coresets for kernel $k$-Means with a general kernel, and use them to obtain new, more efficient, algorithms. Kernel $k$-Means has superior clustering capability compared to classical $k$-Means, particularly when clusters are…

Data Structures and Algorithms · Computer Science 2024-04-09 Shaofeng H.-C. Jiang, Robert Krauthgamer, Jianing Lou, Yubo Zhang

The size of large, geo-located datasets has reached scales where visualization of all data points is inefficient. Random sampling is a method to reduce the size of a dataset, yet it can introduce unwanted errors. We describe a method for…

Human-Computer Interaction · Computer Science 2017-09-14 Yan Zheng, Yi Ou, Alexander Lex, Jeff M. Phillips

When faced with a data set too large to be processed all at once, an obvious solution is to retain only part of it. In practice this takes a wide variety of different forms, and among them "coresets" are especially appealing. A coreset is a…

Machine Learning · Statistics 2020-01-07 Nicolas Tremblay, Simon Barthelmé, Pierre-Olivier Amblard

A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f:P\times Q\to\mathbb{R}$ is a…

Data Structures and Algorithms · Computer Science 2022-09-20 Vladimir Braverman, Dan Feldman, Harry Lang, Adiel Statman, Samson Zhou

Kernel methods, particularly kernel ridge regression (KRR), are time-proven, powerful nonparametric regression techniques known for their rich capacity, analytical simplicity, and computational tractability. The analysis of their predictive…

Statistics Theory · Mathematics 2025-09-23 Xin Bing, Xin He, Chao Wang

Coreset is usually a small weighted subset of $n$ input points in $\mathbb{R}^d$, that provably approximates their loss function for a given set of queries (models, classifiers, etc.). Coresets become increasingly common in machine learning…

Machine Learning · Computer Science 2020-06-22 Murad Tukan, Alaa Maalouf, Dan Feldman

A coreset (or core-set) of a dataset is its semantic compression with respect to a set of queries, such that querying the (small) coreset provably yields an approximate answer to querying the original (full) dataset. In the last decade,…

Robotics · Computer Science 2017-12-19 Soliman Nasser, Ibrahim Jubran, Dan Feldman

Coreset, which is a summary of the original dataset in the form of a small weighted set in the same sample space, provides a promising approach to enable machine learning over distributed data. Although viewed as a proxy of the original…

Machine Learning · Computer Science 2020-06-24 Hanlin Lu, Ming-Ju Li, Ting He, Shiqiang Wang, Vijaykrishnan Narayanan, Kevin S Chan

Coreset of a given dataset and loss function is usually a small weighted set that approximates this loss for every query from a given set of queries. Coresets have been shown to be very useful in many applications. However, coresets construction…

Machine Learning · Computer Science 2021-11-05 Alaa Maalouf, Gilad Eini, Ben Mussay, Dan Feldman, Margarita Osadchy

In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering…

Machine Learning · Computer Science 2020-11-19 Dan Feldman

Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms. It strives to identify a small subset from large-scale data, so that training only on the subset practically…

Machine Learning · Computer Science 2024-03-01 Xiaobo Xia, Jiale Liu, Shaokun Zhang, Qingyun Wu, Hongxin Wei, Tongliang Liu

Modern data analysis often involves massive datasets with hundreds of thousands of observations, making traditional inference algorithms computationally prohibitive. Coresets are selection methods designed to choose a smaller subset of…

Computation · Statistics 2025-02-13 Bernardo Flores

This chapter deals with kernel methods as a special class of techniques for surrogate modeling. Kernel methods have proven to be efficient in machine learning, pattern recognition and signal analysis due to their flexibility, excellent…

Numerical Analysis · Mathematics 2022-10-31 Gabriele Santin, Bernard Haasdonk

Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high performance coresets for clustering problems such as $k$-means in both theory and practice. Curiously, there exists no work on…

Data Structures and Algorithms · Computer Science 2022-07-05 Chris Schwiegelshohn, Omar Ali Sheikh-Omar

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly…

Machine Learning · Computer Science 2024-04-03 Andrew Draganov, David Saulpic, Chris Schwiegelshohn

A coreset is a point set containing information about geometric properties of a larger point set. A series of previous works show that in many machine learning problems, especially in clustering problems, coreset could be very useful to…

Data Structures and Algorithms · Computer Science 2022-10-18 Yichuan Deng, Zhao Song, Yitan Wang, Yuanyuan Yang