
Related papers: Optimal Coreset for Gaussian Kernel Density Estima…

200 papers

We construct near-optimal coresets for kernel density estimates for points in $\mathbb{R}^d$ when the kernel is positive definite. Specifically, we show a polynomial time construction for a coreset of size $O(\sqrt{d}/\varepsilon\cdot…

Machine Learning · Computer Science 2019-04-15 Jeff M. Phillips, Wai Ming Tai

We study the construction of coresets for kernel density estimates. That is, we show how to approximate the kernel density estimate described by a large point set with another kernel density estimate with a much smaller point set. For…

Machine Learning · Computer Science 2017-10-13 Jeff M. Phillips, Wai Ming Tai

We apply the discrepancy method and a chaining approach to give improved bounds on the coreset complexity of a wide class of kernel functions. Our results give randomized polynomial time algorithms to produce coresets of size…

Machine Learning · Computer Science 2023-10-13 Rainie Bozzai, Thomas Rothvoss

An $\varepsilon$-coreset for a given set $D$ of $n$ points is usually a small weighted set, such that querying the coreset \emph{provably} yields a $(1+\varepsilon)$-factor approximation to the original (full) dataset, for a given family…

Machine Learning · Computer Science 2019-06-13 Dan Feldman, Zahi Kfir, Xuan Wu

Given a set of points $P\subset \mathbb{R}^{d}$ and a kernel $k$, the Kernel Density Estimate at a point $x\in\mathbb{R}^{d}$ is defined as $\mathrm{KDE}_{P}(x)=\frac{1}{|P|}\sum_{y\in P} k(x,y)$. We study the problem of designing a data…

Data Structures and Algorithms · Computer Science 2018-09-03 Moses Charikar, Paris Siminelakis
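
To illustrate the definition $\mathrm{KDE}_{P}(x)=\frac{1}{|P|}\sum_{y\in P} k(x,y)$ quoted in the entry above, here is a minimal sketch that evaluates it directly. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration; the abstract does not fix a kernel, and this brute-force evaluation is not the data structure the paper studies.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    # Assumed kernel for illustration: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def kde(points, x, kernel=gaussian_kernel):
    # Direct evaluation of KDE_P(x) = (1/|P|) * sum over y in P of k(x, y).
    # Cost is linear in |P| per query, which is the baseline that
    # coresets and sublinear data structures aim to improve on.
    if not points:
        raise ValueError("points must be non-empty")
    return sum(kernel(x, y) for y in points) / len(points)

# Hypothetical toy data: two points in the plane, queried at the first point.
P = [(0.0, 0.0), (2.0, 0.0)]
value = kde(P, (0.0, 0.0))
```

With these two points, the query coincides with one point (kernel value 1) and lies at distance 2 from the other (kernel value $e^{-2}$), so `value` equals $(1 + e^{-2})/2$.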

We study the worst case error of kernel density estimates via subset approximation. A kernel density estimate of a distribution is the convolution of that distribution with a fixed kernel (e.g. Gaussian kernel). Given a subset (i.e. a point…

Computational Geometry · Computer Science 2012-04-05 Jeff M. Phillips

Coresets have emerged as a powerful tool to summarize data by selecting a small subset of the original observations while retaining most of its information. This approach has led to significant computational speedups but the performance of…

Statistics Theory · Mathematics 2020-12-10 Paxton Turner, Jingbo Liu, Philippe Rigollet

A \emph{strong coreset} for the mean queries of a set $P$ in ${\mathbb{R}}^d$ is a small weighted subset $C\subseteq P$, which provably approximates its sum of squared distances to any center (point) $x\in {\mathbb{R}}^d$. A \emph{weak…

Machine Learning · Computer Science 2021-11-05 Alaa Maalouf, Ibrahim Jubran, Dan Feldman

Let $P$ be a set of $n$ points in $\mathbb{R}^2$. For a parameter $\varepsilon\in (0,1)$, a subset $C\subseteq P$ is an \emph{$\varepsilon$-kernel} of $P$ if the projection of the convex hull of $C$ approximates that of $P$ within…

Computational Geometry · Computer Science 2023-03-15 Pankaj K. Agarwal, Sariel Har-Peled

A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f:P\times Q\to\mathbb{R}$ is a…

Data Structures and Algorithms · Computer Science 2022-09-20 Vladimir Braverman, Dan Feldman, Harry Lang, Adiel Statman, Samson Zhou

We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $O(n^2)$ time, kernel…

Machine Learning · Statistics 2024-05-14 Raaz Dwivedi, Lester Mackey

With the dramatic growth in the number of application domains that generate probabilistic, noisy and uncertain data, there has been an increasing interest in designing algorithms for geometric or combinatorial optimization problems over…

Data Structures and Algorithms · Computer Science 2016-05-24 Lingxiao Huang, Jian Li, Jeff M. Phillips, Haitao Wang

How can we train a statistical mixture model on a massive data set? In this work we show how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also…

Machine Learning · Statistics 2018-01-17 Mario Lucic, Matthew Faulkner, Andreas Krause, Dan Feldman

This study proposes a data condensation method for multivariate kernel density estimation by genetic algorithm. First, our proposed algorithm generates multiple subsamples of a given size with replacement from the original sample. The…

Methodology · Statistics 2022-03-04 Kiheiji Nishida

In this paper we revisit the kernel density estimation problem: given a kernel $K(x, y)$ and a dataset of $n$ points in high dimensional Euclidean space, prepare a data structure that can quickly output, given a query $q$, a…

Data Structures and Algorithms · Computer Science 2020-11-16 Moses Charikar, Michael Kapralov, Navid Nouri, Paris Siminelakis

In this paper, we show the existence of small coresets for the problems of computing $k$-median and $k$-means…

Computational Geometry · Computer Science 2018-10-31 Sariel Har-Peled, Soham Mazumdar

The size of large, geo-located datasets has reached scales where visualization of all data points is inefficient. Random sampling is a method to reduce the size of a dataset, yet it can introduce unwanted errors. We describe a method for…

Human-Computer Interaction · Computer Science 2017-09-14 Yan Zheng, Yi Ou, Alexander Lex, Jeff M. Phillips

We introduce the first iterative algorithm for constructing an $\varepsilon$-coreset that guarantees deterministic $\ell_p$ subspace embedding for any $p \in [1,\infty)$ and any $\varepsilon > 0$. For a given full rank matrix $\mathbf{X} \in…

Data Structures and Algorithms · Computer Science 2026-05-18 Rachit Chhaya, Anirban Dasgupta, Dan Feldman, Supratim Shit

The coresets approach, also called subsampling or subset selection, aims to select a subsample as a surrogate for the observed sample and has found extensive applications in large-scale data analysis. Existing coresets methods construct the…

Computation · Statistics 2024-09-17 Mengyu Li, Jun Yu, Tao Li, Cheng Meng

A coreset is a point set containing information about geometric properties of a larger point set. A series of previous works show that in many machine learning problems, especially in clustering problems, coresets can be very useful to…

Data Structures and Algorithms · Computer Science 2022-10-18 Yichuan Deng, Zhao Song, Yitan Wang, Yuanyuan Yang