Multi-Objective Weighted Sampling
Abstract
{\em Multi-objective samples} are powerful and versatile summaries of large data sets. For a set of keys $x \in X$ and associated values $f_x \geq 0$, a weighted sample taken with respect to $f$ allows us to approximate {\em segment-sum statistics} $\text{Sum}(f;H) = \sum_{x \in H} f_x$, for any subset $H$ of the keys, with statistically guaranteed quality that depends on the sample size and the relative weight of $H$. When estimating $\text{Sum}(g;H)$ for $g \neq f$, however, quality guarantees are lost. A multi-objective sample with respect to a set of functions $F$ provides for each $f \in F$ the same statistical guarantees as a dedicated weighted sample while minimizing the summary size. We analyze properties of multi-objective samples and present sampling schemes and meta-algorithms for estimation and optimization while showcasing two important application domains. The first is key-value data sets, where different functions $f \in F$ applied to the values correspond to different statistics such as moments, thresholds, capping, and sum. A multi-objective sample allows us to approximate all statistics in $F$. The second is metric spaces, where keys are points, and each $f \in F$ is defined by a set of points $C$ with $f_x$ being the service cost of $x$ by $C$, and $\text{Sum}(f;X)$ models the centrality or clustering cost of $C$. A multi-objective sample allows us to estimate costs for each $f \in F$. In these domains, multi-objective samples are often of small size, are efficient to construct, and enable scalable estimation and optimization. We aim here to facilitate further applications of this powerful technique.
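To make the idea concrete, the following is a minimal sketch and not the paper's reference implementation. It assumes Poisson (independent) probability-proportional-to-size sampling, where a dedicated sample for one objective includes key x with probability min(1, k f_x / Sum(f;X)), and it assumes the multi-objective sample includes each key with the maximum of these probabilities over all objectives. Segment sums are then estimated by inverse-probability weighting. The function names, the parameter `k`, and the toy data are hypothetical, chosen only for illustration.

```python
import random


def pps_probabilities(values, k):
    """Inclusion probabilities min(1, k * f_x / sum(f)) for a single objective."""
    total = sum(values.values())
    if total == 0:
        return {x: 0.0 for x in values}
    return {x: min(1.0, k * v / total) for x, v in values.items()}


def multi_objective_probabilities(data, objectives, k):
    """Per-key maximum of the single-objective probabilities (assumed scheme)."""
    probs = {x: 0.0 for x in data}
    for f in objectives:
        f_values = {x: f(v) for x, v in data.items()}
        for x, p in pps_probabilities(f_values, k).items():
            probs[x] = max(probs[x], p)
    return probs


def poisson_sample(probs, rng):
    """Include each key independently; keep its probability for estimation."""
    return {x: p for x, p in probs.items() if rng.random() < p}


def estimate_sum(sample, data, f, segment):
    """Inverse-probability estimate of Sum(f; H) for the segment H."""
    return sum(f(data[x]) / p for x, p in sample.items() if x in segment)


if __name__ == "__main__":
    # Hypothetical key-value data and objectives: sum, second moment, capping.
    data = {f"key{i}": float(i % 50 + 1) for i in range(1000)}
    objectives = [lambda v: v, lambda v: v * v, lambda v: min(v, 10.0)]
    probs = multi_objective_probabilities(data, objectives, k=100)
    sample = poisson_sample(probs, random.Random(7))
    segment = {x for x in data if x.endswith("3")}
    for f in objectives:
        exact = sum(f(data[x]) for x in segment)
        print(len(sample), exact, estimate_sum(sample, data, f, segment))
```

Because each key's probability is at least its probability under any single objective, the expected sample size in this sketch is at most k times the number of objectives, and it is often much smaller when the objectives give similar relative weight to the same keys.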
Cite
@article{arxiv.1509.07445,
title = {Multi-Objective Weighted Sampling},
author = {Edith Cohen},
journal= {arXiv preprint arXiv:1509.07445},
year = {2017}
}
Comments
14 pages; full version of a HotWeb 2015 paper