Multi-Objective Weighted Sampling

Edith Cohen

Databases 2017-06-14 v6 Data Structures and Algorithms

Authors: Edith Cohen

Abstract

Multi-objective samples are powerful and versatile summaries of large data sets. For a set of keys $x\in X$ and associated values $f_x \geq 0$, a weighted sample taken with respect to $f$ allows us to approximate segment-sum statistics $\text{Sum}(f;H) = \sum_{x\in H} f_x$, for any subset $H$ of the keys, with statistically guaranteed quality that depends on the sample size and the relative weight of $H$. When estimating $\text{Sum}(g;H)$ for $g \neq f$, however, these quality guarantees are lost. A multi-objective sample with respect to a set of functions $F$ provides, for each $f\in F$, the same statistical guarantees as a dedicated weighted sample while minimizing the summary size. We analyze properties of multi-objective samples and present sampling schemes and meta-algorithms for estimation and optimization, showcasing two important application domains. The first is key-value data sets, where different functions $f\in F$ applied to the values correspond to different statistics such as moments, thresholds, capping, and sum. A multi-objective sample allows us to approximate all statistics in $F$. The second is metric spaces, where keys are points and each $f\in F$ is defined by a set of points $C$, with $f_x$ being the service cost of $x$ by $C$, so that $\text{Sum}(f;X)$ models the centrality or clustering cost of $C$. A multi-objective sample allows us to estimate costs for each $f\in F$. In these domains, multi-objective samples are often small, are efficient to construct, and enable scalable estimation and optimization. We aim here to facilitate further applications of this powerful technique.
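
The idea in the abstract can be illustrated with a minimal sketch. It assumes independent (Poisson) sampling in which each key is included with probability proportional to size (PPS) for every $f\in F$, and the multi-objective inclusion probability is the maximum of the per-objective probabilities; segment sums are then estimated by inverse-probability weighting. The function names `multi_objective_pps` and `estimate_sum`, the size parameter `k`, and the toy data are hypothetical choices for illustration, not taken from the paper.

```python
import random

def multi_objective_pps(values, funcs, k, rng):
    """Sketch of a multi-objective PPS sample (assumed scheme, see lead-in).

    values: dict mapping key -> value
    funcs:  list of functions f; each maps a value to a weight f_x >= 0
    k:      per-objective size parameter
    rng:    a random.Random instance
    Returns a dict mapping each sampled key -> (value, inclusion probability).
    """
    # Sum(f; X) for every objective f in F.
    totals = [sum(f(v) for v in values.values()) for f in funcs]
    sample = {}
    for key, v in values.items():
        # Per-objective PPS probability is min(1, k * f_x / Sum(f; X));
        # the multi-objective probability is the maximum over F.
        p = max(
            (min(1.0, k * f(v) / t) for f, t in zip(funcs, totals) if t > 0),
            default=0.0,
        )
        if p > 0 and rng.random() < p:
            sample[key] = (v, p)
    return sample

def estimate_sum(sample, f, segment):
    """Inverse-probability estimate of Sum(f; H), with H = {x : segment(x)}."""
    return sum(f(v) / p for key, (v, p) in sample.items() if segment(key))

# Toy key-value data with two objectives: plain sum and values capped at 5.
data = {f"key{i}": float(i % 17) for i in range(1000)}
objectives = [lambda v: v, lambda v: min(v, 5.0)]
summary = multi_objective_pps(data, objectives, k=50, rng=random.Random(1))
print(len(summary), estimate_sum(summary, objectives[1], lambda key: key.endswith("7")))
```

Because each key's probability is at least its dedicated PPS probability for every objective, the per-key estimates $f_x/p_x$ are bounded as they would be in a dedicated sample for that objective, while the expected sample size is at most $k\,|F|$.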

Keywords

randomized algorithm, machine learning, cluster analysis

Cite

@article{arxiv.1509.07445,
  title  = {Multi-Objective Weighted Sampling},
  author = {Edith Cohen},
  journal= {arXiv preprint arXiv:1509.07445},
  year   = {2017}
}

Comments

14 pages; full version of a HotWeb 2015 paper
